
More Concurrency Exercises and OpenMP
Shuaiwen

Concurrency and Synchronization Exercise

Q1 (fork)
How many lines of output will it produce?
Answer: 7

Q2 (semaphores)
Consider the following program using semaphores. Which of the following execution orders of sem_wait and sem_post are possible (W represents sem_wait, P represents sem_post)?
(A) WPWWWP (B) WWPPWW (C) PPWWWW (D) WPWPWW (E) WWWWPP

Q3 (Zombie Processes)
Will the following programs create zombie processes?

Q4 (Scalability)
Consider a multithreaded web server running on a machine with N CPU cores. The server has M worker threads. Every incoming request is put in a request queue and served by one of the free worker threads. When the server is fully saturated, it has a certain throughput. Under which circumstances might increasing M lead to an approximately linear increase in the saturation throughput of the server?
(A) Under any circumstances
(B) Under no circumstances
(C) when M < N and the workload to the server is CPU-bound
(D) when M < N and the workload to the server is I/O-bound

Q5 (unintentional sharing or not)
Please choose all possible outputs of the following program.
A. 0123456789 B. 0234156789 C. 0987123456 D. 8905674321 E. 0114459876 F. 1124465789

Q6 (variable scope)
In the program below, which variables are shared among threads? Which are not?

Q6 (false sharing vs. true sharing)
Consider the program below: is there any potential performance issue with it? If there is, how can it be fixed? Note that the two functions are executed concurrently on two different processor cores, each of which has its own cache hierarchy. What about static struct foo f[3];?

False Sharing vs. True Sharing
True sharing: data that is genuinely shared between threads; it requires programmatic synchronization constructs to ensure ordered access.
False sharing: the frequent coordination required between processors when cache lines are marked 'Invalid' forces cache lines to be written back to memory and subsequently reloaded. False sharing increases this coordination and can significantly degrade application performance. OpenMP programs are known to be prone to false sharing.

Q7 (semaphore mechanism)
If a process performs a sem_post(sem_t *sem) operation on a counting semaphore whose value is zero, and there are one or more other processes waiting on the semaphore, how many of the processes could be unblocked?
A. Exactly one process
B. All processes
C. Zero
D. Could be any value from one to N (the number of blocked processes)

Q8 (comparison across synchronization primitives)
Which of the following descriptions of semaphores and spinlocks are correct?
A. Spinlocks waste CPU resources (busy waiting) while semaphores almost do not.
B. A semaphore has a counter and can be acquired by multiple threads, while a spinlock can be acquired by only one thread.
C.
When threads hold locks for a short period of time and contention between threads is not significant, spinlocks are preferred because they have no context-switch overheads.
D. When threads hold locks for a short period of time and contention between threads is not significant, semaphores are preferred since a spinlock's busy-waiting wastes CPU cycles.

Q9 (fork() and waitpid(-1))
Please write out all the possible printouts of the following programs A and B.

Q10 (Synchronization using mutex)
› Door opening and locking

Q11 (writing a barrier)
Lecture 7: Condition Variables. What about semaphores?

OpenMP Basics

Introduction to OpenMP
› What is OpenMP?
- An open specification for Multi-Processing
- A "standard" API for defining multi-threaded shared-memory programs
- openmp.org – talks, examples, forums, etc.
- computing.llnl.gov/tutorials/openMP/
- portal.xsede.org/online-training
- www.nersc.gov/assets/Uploads/XE62011OpenMP.pdf
› High-level API
- Preprocessor (compiler) directives (~80%)
- Library calls (~19%)
- Environment variables (~1%)

A Programmer's View of OpenMP
› OpenMP is a portable, threaded, shared-memory programming specification with "light" syntax
- Exact behavior depends on the OpenMP implementation!
- Requires compiler support (C, C++ or Fortran)
› OpenMP will:
- Allow a programmer to separate a program into serial regions and parallel regions, rather than into T concurrently-executing threads
- Hide stack management
- Provide synchronization constructs
› OpenMP will not:
- Parallelize automatically
- Guarantee speedup
- Provide freedom from data races

Motivation – OpenMP

int main() {
    // Do this part in parallel
    printf( "Hello, World!" );
    return 0;
}

Motivation – OpenMP

int main() {
    omp_set_num_threads(16);
    // Do this part in parallel
    #pragma omp parallel
    {
        printf( "Hello, World!" );
    }
    return 0;
}

Programming Model
Because OpenMP is designed for shared-memory parallel programming, it is largely limited to single-node parallelism.
Typically, the number of processing elements (cores) on a node determines how much parallelism can be implemented.

Programming Model – Concurrent Loops
› OpenMP easily parallelizes loops
- Requires: no data dependencies (read/write or write/write pairs) between iterations!
› The preprocessor calculates loop bounds for each thread directly from the serial source

#pragma omp parallel for
for( i=0; i < 25; i++ ) {
    printf("Foo");
}

Motivation for Using OpenMP

LLNL OpenMP Tutorial
› There are no better materials than the LLNL OpenMP tutorial.
› We are now going to go through some of the important knowledge points using the tutorial:
› https://hpc.llnl.gov/tuts/openMP/
