Take-home Test
What needs to be done?
• Take the sequential program in simtemp.c and write an OpenMP
version and an MPI version
• You will write a short report on these (described later)
• Run on 1, 2, 4 and 8 cores and processes.
• Use a single node to run these – this will give you faster turn-around
• Size your data so that the 8 core version takes less than 20 seconds to run,
and long enough that you see a difference in times over different runs. We
don’t need to run for long periods of time for this.
• Show a speedup on some number of processors. You may increase the
problem size, if necessary.
• A hint: printf("after h init\n"); fflush(stdout);
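Speedup on p cores is usually reported as the 1-core time divided by the p-core time, and efficiency as that speedup divided by p. A minimal sketch of the arithmetic for the report table follows. The function names are made up for illustration and are not part of the assignment.

```c
#include <assert.h>
#include <stdio.h>

/* Speedup of a p-way run relative to the 1-way run: S(p) = t1 / tp. */
double speedup(double t1, double tp) {
    assert(t1 > 0.0 && tp > 0.0);
    return t1 / tp;
}

/* Parallel efficiency: E(p) = S(p) / p. A value of 1.0 is perfect scaling. */
double efficiency(double t1, double tp, int p) {
    assert(p > 0);
    return speedup(t1, tp) / p;
}

/* Print one table row: cores, time in seconds, speedup, efficiency. */
void print_row(int p, double t1, double tp) {
    printf("%2d  %8.3f  %6.2f  %5.2f\n",
           p, tp, speedup(t1, tp), efficiency(t1, tp, p));
}
```

Calling print_row(p, t1, tp) once per measured run, with your own timings, gives the rows of a speedup table.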
The program (in simtemp.c)
#include <stdio.h>
#include <omp.h>
int main (void) {
    int T = 100;        // number of timesteps
    int L = 10000;      // length of the strip
    int H = 100;        // height of the strip
    double strip[H][L]; // array holding temperature of the strip.
    int l, h, t;
    double avg = 0.0;
    double exectime = -omp_get_wtime();

    // init the strip. The left and right sides are -30C, the top and
    // bottom edges are 100C.
    for (h = 0; h < H; h++) {
        strip[h][0] = -30.00;
        strip[h][L-1] = -30.00;
    }
    for (l = 1; l < L-1; l++) {
        strip[0][l] = 100.00;
        strip[H-1][l] = 100.00;
    }
    for (h = 1; h < H-1; h++)
        for (l = 1; l < L-1; l++)
            strip[h][l] = 0.0;

    for (t = 0; t < T; t++) {
        // printf("in t=%d\n", t); fflush(stdout);
        for (h = 1; h < H-1; h++)
            for (l = 1; l < L-1; l++)
                strip[h][l] = (strip[h][l] + (strip[h-1][l] + strip[h+1][l] +
                               strip[h][l-1] + strip[h][l+1])/4.0)/2.0;
    }

    for (h = 1; h < H-1; h++)
        for (l = 1; l < L-1; l++)
            avg += strip[h][l];
    // printf("after avg loop\n"); fflush(stdout);
    avg = avg / ((H-2)*(L-2));
    exectime = exectime + omp_get_wtime();
    printf("average temperature of the strip = %lf, time = %lf\n",
           avg, exectime);
    return 0;
}
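Written as a formula, the update applied to each interior point in one timestep is the following, where $T_{h,l}$ stands for strip[h][l] (a symbol introduced here only for illustration):

$$T_{h,l} \leftarrow \frac{1}{2}\left(T_{h,l} + \frac{T_{h-1,l} + T_{h+1,l} + T_{h,l-1} + T_{h,l+1}}{4}\right)$$

The update is done in place. By the time the sequential loop reaches point $(h,l)$, $T_{h-1,l}$ and $T_{h,l-1}$ already hold values from the current timestep, while $T_{h+1,l}$ and $T_{h,l+1}$ still hold values from the previous one. A parallel version has to decide how to handle this ordering.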
The MPI version
• Convert simtemp.c to MPI, and time it.
• The t loop should not be parallelized
• Leave the timing information where it is. All I/O should be outside of
timing loops.
• Use reductions where possible.
• Time this for 1, 2, 4, 8 and 16 processes using a single node.
• If the strip array is distributed across processes, we need to have halo
regions
• This is because each process will need to use data from the adjacent
process to compute the temperatures at the edges of the part of the
strip stored on a process.
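One possible way to work out which columns each process owns is sketched below. It assumes the strip is split by columns as evenly as possible; the type and function names are hypothetical. In an MPI program, rank and nprocs would come from MPI_Comm_rank and MPI_Comm_size.

```c
#include <assert.h>

/* Columns [start, start + count) of a strip of length L owned by `rank`
 * when the L columns are split as evenly as possible over `nprocs`.
 * The first L % nprocs ranks get one extra column. */
typedef struct {
    int start;  /* first global column owned by this rank */
    int count;  /* number of columns owned by this rank */
} col_range;

col_range owned_columns(int L, int nprocs, int rank) {
    assert(nprocs > 0 && rank >= 0 && rank < nprocs && L >= nprocs);
    int base = L / nprocs;
    int extra = L % nprocs;
    col_range r;
    r.count = base + (rank < extra ? 1 : 0);
    r.start = rank * base + (rank < extra ? rank : extra);
    return r;
}

/* Width of the local array: owned columns plus one halo column for each
 * neighbouring rank (the first and last ranks have only one neighbour). */
int local_width(int L, int nprocs, int rank) {
    col_range r = owned_columns(L, nprocs, rank);
    int halos = (rank > 0) + (rank < nprocs - 1);
    return r.count + halos;
}
```

With L = 10000 and 8 processes, each rank owns 1250 columns. Interior ranks store 1252 columns locally, and the two end ranks store 1251.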
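[Figure: the strip, with rows 0 to H-1 and columns 0 to L-1, divided by columns among processes P0, P1, P2, ..., Pn-1. P0 holds columns 0-9, P1 columns 10-19, P2 columns 20-29, and Pn-1 columns L-10 to L-1.]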
Shared data
Patterned data is shared between adjacent processes. The red top and
bottom are initialized to 100, the blue sides are initialized to -30.
Neither of these is updated.
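The sharing between adjacent processes can be pictured with a small serial model. This is only a sketch under assumptions that are not from the assignment: tiny sizes H, W and P, every rank owning the same number W of columns, and all ranks' local arrays kept in one array so the copies can be shown without an MPI runtime. In a real MPI program each rank holds only its own block, and each copy becomes a message to a neighbour, for example with MPI_Sendrecv. Because a column is not contiguous in a row-major C array, it is usually packed into a buffer or described with a derived datatype first.

```c
#include <assert.h>

enum { H = 4, W = 3, P = 3 };   /* tiny, hypothetical sizes for illustration */

/* block[p] models the local array of rank p: W owned columns stored at
 * indices 1..W, plus halo columns at index 0 and index W + 1.
 * The left halo of rank 0 and the right halo of rank P-1 are unused. */
static double block[P][H][W + 2];

/* Refresh every halo column from the neighbouring rank's edge column.
 * In MPI each rank would do only its own part, and each copy below
 * would become a message between ranks p and p + 1. */
void exchange_halos(void) {
    for (int p = 0; p + 1 < P; p++) {
        for (int h = 0; h < H; h++) {
            /* p's last owned column -> right neighbour's left halo */
            block[p + 1][h][0] = block[p][h][W];
            /* right neighbour's first owned column -> p's right halo */
            block[p][h][W + 1] = block[p + 1][h][1];
        }
    }
}

/* Fill each rank's owned columns with a value that encodes (rank, column)
 * so the result of the exchange is easy to check. */
void fill_owned(void) {
    for (int p = 0; p < P; p++)
        for (int h = 0; h < H; h++)
            for (int c = 1; c <= W; c++)
                block[p][h][c] = 10.0 * p + c;
}
```

The exchange has to happen between timesteps, so that each rank computes its edge columns from its neighbours' current values.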
Three solutions
1. Add buffer areas to the array based on the number of cores. Start
up tasks to operate on each region. I.e., use OpenMP tasks like MPI
processes
2. Use a “red-black” algorithm on the strip. First we do the red regions, then the
black. This eliminates races. Make sure you create enough iterations in your
parallel loop to make it run in parallel.
[Figure: first the red regions are updated in parallel, then the black regions are updated in parallel.]
3. Use a fine-grained red-black algorithm. Do all of the points
corresponding to red boxes in parallel, and then all of the points
corresponding to black boxes. This is the most standard on shared
memory machines.
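The fine-grained variant (solution 3) might look like the sketch below. It assumes the grid is passed as a flat row-major array and colours a point red when h + l is even; the function name and that colouring convention are choices made for this example. The results can differ slightly from the sequential program because the points are updated in a different order.

```c
#include <assert.h>

/* One red-black timestep over the interior of an H x L grid stored
 * row-major in `g`. Points with (h + l) even are "red", the rest "black".
 * Every neighbour of a red point is black and vice versa, so all points
 * of one colour can be updated in parallel without races. */
void red_black_step(int H, int L, double *g) {
    assert(H >= 3 && L >= 3 && g != 0);
    for (int colour = 0; colour < 2; colour++) {
        /* The implicit barrier at the end of the parallel loop makes sure
         * the red pass is finished before the black pass starts. */
        #pragma omp parallel for
        for (int h = 1; h < H - 1; h++) {
            /* first interior column of this colour in row h */
            int first = 1 + ((h + 1 + colour) % 2);
            for (int l = first; l < L - 1; l += 2) {
                double nb = g[(h - 1) * L + l] + g[(h + 1) * L + l]
                          + g[h * L + l - 1] + g[h * L + l + 1];
                g[h * L + l] = (g[h * L + l] + nb / 4.0) / 2.0;
            }
        }
    }
}
```

The t loop stays sequential and calls this once per timestep. Compiled without OpenMP enabled, the pragma is ignored and the same code runs serially.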
The OMP program
• Convert simtemp.c to OpenMP, and time it.
• The t loop should not be parallelized
• Leave the timing information where it is. All I/O should be outside of
timing loops.
• Use reductions where possible.
• Use SIMD and parallel for simd where appropriate.
• If the Intel compiler on scholar does not support SIMD, let me know.
• Time this for 1, 2, 4, 8 and 16 threads using a single node.
• Note that the same issues as we had with MPI and halo regions exist here.
• Updating and reading border values can lead to races, slow convergence,
etc.
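As an illustration of the reduction and SIMD bullets, the averaging loop could be written roughly as follows. This is a sketch that assumes a flat row-major array and a made-up function name, not a required structure. With GCC, OpenMP is enabled with -fopenmp.

```c
#include <assert.h>

/* Average of the interior points of an H x L grid stored row-major in `g`.
 * Rows are shared among threads; each thread keeps a private partial
 * sum that OpenMP combines at the end (the reduction clause). The inner
 * loop is marked for vectorisation with its own reduction. */
double interior_average(int H, int L, const double *g) {
    assert(H >= 3 && L >= 3 && g != 0);
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (int h = 1; h < H - 1; h++) {
        #pragma omp simd reduction(+:sum)
        for (int l = 1; l < L - 1; l++)
            sum += g[h * L + l];
    }
    return sum / ((double)(H - 2) * (L - 2));
}
```

Without the reduction clause, every thread would update the shared sum at the same time, which is a race. With it, the floating-point additions may be done in a different order than in the sequential loop, so the last digits of the average can differ between runs.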
What to turn in
• Your code
• A report
• Should have a table comparing OpenMP and MPI run times
• A high level description of how you distributed your data in MPI
• You should have some speedup
• You can use scanned hand-drawn pictures. Life is too short to do PowerPoint
for a take-home exam.
• Turn it in to Blackboard as a zip file of the directory containing your
code and report. The directory should be named
the zip file should be