An error in a parallel implementation of Red/Black Successive Over-Relaxation (SOR)

One of my tasks is to implement the red/black SOR algorithm using MPI. The grid is partitioned column-wise among the nodes, and each iteration is split into two phases, red and black. In each phase the algorithm updates either the red or the black non-boundary grid points. The rest of the implementation follows the Wikipedia definition.

Full code: serial here, parallel here.

Here is the serial implementation:

    iteration = 0;
    do {
        maxdiff = 0.0;
        for (phase = 0; phase < 2; phase++) {
            for (i = 1; i < N-1; i++) {
                for (j = 1 + (even(i) ^ phase); j < N-1; j += 2) {
                    Gnew = stencil(G, i, j);
                    diff = fabs(Gnew - G[i][j]);
                    if (diff > maxdiff)
                        maxdiff = diff;
                    G[i][j] = G[i][j] + omega * (Gnew - G[i][j]);
                }
            }
        }
        iteration++;
    } while (maxdiff > stopdiff);
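The snippet calls two helpers, stencil() and even(), whose bodies are not shown here. A minimal sketch of what they typically do in a red/black SOR code (the bodies, and the extra n parameter, are my assumptions; the post itself only shows the calls stencil(G,i,j) and even(i)):

    /* Assumed helpers, not taken from the post. */

    /* 4-point average of the neighbouring grid points: the value the SOR
     * update relaxes towards. */
    double stencil(int n, double G[n][n], int i, int j)
    {
        return (G[i-1][j] + G[i+1][j] + G[i][j-1] + G[i][j+1]) / 4.0;
    }

    /* 1 if the row index is even, 0 otherwise; XOR-ed with the phase to pick
     * the red or the black points of each row. */
    int even(int i)
    {
        return (i % 2) == 0;
    }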

For the parallel implementation, the grid is first divided as evenly as possible among the nodes, column-wise. For example, if the grid is 8x8 and there are 3 nodes, the grid is split into 8x3, 8x3 and 8x2 blocks. During the data exchange, each node exchanges boundary columns with its left and right neighbors. The figure below should give a clear picture of the whole process.
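The range() call in the parallel code below implements this split, but its body is not included in the post. A minimal block-decomposition sketch of what such a routine usually looks like (the implementation is my assumption; only the signature comes from the code):

    /* Split the index range [n1, n2] as evenly as possible among nprocs ranks
     * and return the sub-range [*ista, *iend] owned by rank irank. The first
     * 'rest' ranks get one extra index, giving splits like 3 + 3 + 2. */
    void range(int n1, int n2, int nprocs, int irank, int *ista, int *iend)
    {
        int total = n2 - n1 + 1;        /* indices to distribute          */
        int chunk = total / nprocs;     /* base block size                */
        int rest  = total % nprocs;     /* ranks that get one extra index */

        *ista = n1 + irank * chunk + (irank < rest ? irank : rest);
        *iend = *ista + chunk - 1 + (irank < rest ? 1 : 0);
    }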

    /* Initializing MPI */
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &totalnodes);
    MPI_Comm_rank(MPI_COMM_WORLD, &mynode);

    // divide the grid columns equally among the nodes
    range(1, N - 1, totalnodes, mynode, &jsta, &jend);

    // neighboring nodes
    inext = mynode + 1;
    iprev = mynode - 1;

    iteration = 0;
    do {
        maxdiff = 0.0;
        for (phase = 0; phase < 2; phase++) {
            // exchange boundary columns with the neighboring nodes
            exchange(phase);
            for (i = 1; i < N-1; i++) {                                    // row
                for (j = jsta + (even(i) ^ phase); j <= jend; j += 2) {    // column
                    Gnew = stencil(G, i, j);
                    diff = fabs(Gnew - G[i][j]);
                    if (diff > maxdiff)
                        maxdiff = diff;
                    G[i][j] = G[i][j] + omega * (Gnew - G[i][j]);
                }
            }
        }
        MPI_Allreduce(&maxdiff, &diff, 1, MPI_DOUBLE, MPI_MAX, MPI_COMM_WORLD);
        maxdiff = diff;
        iteration++;
    } while (maxdiff > stopdiff);

    MPI_Barrier(MPI_COMM_WORLD);
    MPI_Finalize();

The figure shows how the grid is divided and how the neighbors communicate.
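The exchange(phase) routine called in the loop above is not part of this excerpt, and the poster's version appears to send only the points of one colour per phase. For orientation, here is a rough sketch of a simpler whole-column halo exchange for this decomposition (everything in it, names included, is my assumption rather than the poster's code):

    /* Hypothetical halo exchange: each rank sends its first and last owned
     * columns (jsta and jend) to its left and right neighbours and receives
     * their boundary columns into the halo columns jsta-1 and jend+1.
     * MPI_PROC_NULL turns the edge ranks' sends and receives into no-ops. */
    void exchange_halo(int n, double G[n][n], int jsta, int jend,
                       int iprev, int inext, int totalnodes)
    {
        double sendl[n], sendr[n], recvl[n], recvr[n];
        int i;

        if (iprev < 0)           iprev = MPI_PROC_NULL;
        if (inext >= totalnodes) inext = MPI_PROC_NULL;

        for (i = 0; i < n; i++) { sendl[i] = G[i][jsta]; sendr[i] = G[i][jend]; }

        /* shift left: send my first column left, receive the right
           neighbour's first column into my right halo */
        MPI_Sendrecv(sendl, n, MPI_DOUBLE, iprev, 0,
                     recvr, n, MPI_DOUBLE, inext, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        /* shift right: send my last column right, receive the left
           neighbour's last column into my left halo */
        MPI_Sendrecv(sendr, n, MPI_DOUBLE, inext, 1,
                     recvl, n, MPI_DOUBLE, iprev, 1,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        for (i = 0; i < n; i++) {
            if (iprev != MPI_PROC_NULL) G[i][jsta - 1] = recvl[i];
            if (inext != MPI_PROC_NULL) G[i][jend + 1] = recvr[i];
        }
    }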

The problem is that the final results of the parallel and serial SOR differ slightly. I need a fresh pair of eyes to look over the code and track down this error; I think the communication between the nodes is working fine.

    $ cc -o sor-seq sor-seq.c -lm
    $ ./sor-seq 8 -print
    Running 8 x 8 SOR
    6.006 5.525 5.330 5.251 5.234 5.276 5.417 5.799
    6.621 6.204 5.984 5.879 5.850 5.892 6.032 6.338
    6.952 6.687 6.523 6.432 6.395 6.409 6.483 6.640
    7.181 7.069 6.988 6.931 6.891 6.864 6.852 6.857
    7.382 7.420 7.429 7.414 7.373 7.306 7.203 7.059
    7.607 7.799 7.896 7.920 7.884 7.782 7.595 7.294
    7.926 8.273 8.436 8.488 8.459 8.344 8.101 7.643
    8.506 8.929 9.088 9.136 9.120 9.033 8.821 8.298

    $ mpicc -o sor-par sor-par.c
    $ mpirun -n 3 ./sor-seq 8 -print
    Running 8 x 8 SOR
    5.940 5.383 5.092 4.882 4.677 4.425 4.072 3.507
    6.496 5.939 5.542 5.201 4.839 4.392 3.794 2.950
    6.786 6.334 5.938 5.542 5.086 4.512 3.761 2.773
    6.994 6.672 6.334 5.942 5.450 4.809 3.964 2.873
    7.197 7.028 6.784 6.442 5.965 5.308 4.414 3.228
    7.445 7.457 7.335 7.075 6.660 6.045 5.157 3.896
    7.807 8.020 8.022 7.864 7.555 7.055 6.273 5.032
    8.443 8.795 8.868 8.805 8.640 8.348 7.848 6.920

    Node: 0
    5.940 5.383 5.092 4.882 0.000 0.000 0.000 0.000
    6.496 5.939 5.542 5.201 0.000 0.000 0.000 0.000
    6.786 6.334 5.938 5.542 0.000 0.000 0.000 0.000
    6.994 6.672 6.334 5.942 0.000 0.000 0.000 0.000
    7.197 7.028 6.784 6.442 0.000 0.000 0.000 0.000
    7.445 7.457 7.335 7.075 0.000 0.000 0.000 0.000
    7.807 8.020 8.022 7.864 0.000 0.000 0.000 0.000
    8.443 8.795 8.868 8.805 0.000 0.000 0.000 0.000

    Node: 1
    0.000 0.000 5.092 4.882 4.677 4.425 4.072 0.000
    0.000 0.000 5.542 5.201 4.839 4.392 3.794 0.000
    0.000 0.000 5.938 5.542 5.086 4.512 3.761 0.000
    0.000 0.000 6.334 5.942 5.450 4.809 3.964 0.000
    0.000 0.000 6.784 6.442 5.965 5.308 4.414 0.000
    0.000 0.000 7.335 7.075 6.660 6.045 5.157 0.000
    0.000 0.000 8.022 7.864 7.555 7.055 6.273 0.000
    0.000 0.000 8.868 8.806 8.640 8.348 7.848 0.000

    Node: 2
    0.000 0.000 0.000 0.000 0.000 4.425 4.072 3.507
    0.000 0.000 0.000 0.000 0.000 4.392 3.794 2.950
    0.000 0.000 0.000 0.000 0.000 4.512 3.761 2.773
    0.000 0.000 0.000 0.000 0.000 4.809 3.964 2.873
    0.000 0.000 0.000 0.000 0.000 5.308 4.414 3.228
    0.000 0.000 0.000 0.000 0.000 6.045 5.157 3.896
    0.000 0.000 0.000 0.000 0.000 7.055 6.273 5.032
    0.000 0.000 0.000 0.000 0.000 8.348 7.848 6.920
2 answers

Have you considered trying a debugger? I am very biased, as I work on an MPI debugger, but I just downloaded your code and tried it in Allinea DDT, and it stopped in stencil() when it found a read past the end of your array, initially on process 7.

To reproduce this, compile your code with "-g" to include debugging information, get Allinea DDT from http://www.allinea.com, install it, and enable the "Memory Debugging" feature (making sure guard pages are turned on, "Above", with 1 page, in the memory debugging settings).

Launch your program with, say, 8 processes, and you should have the answer within seconds.
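If a memory debugger is not at hand, the same class of error can often be caught with plain assertions around the suspect access. A sketch, under the assumption that stencil() does a 4-point neighbour read (the body is guessed, not taken from the poster's code):

    #include <assert.h>

    /* Guarded stencil: abort immediately if the update would read outside the
     * n x n array, i.e. the kind of out-of-bounds read reported in stencil(). */
    double stencil_checked(int n, double G[n][n], int i, int j)
    {
        assert(i - 1 >= 0 && i + 1 < n);   /* row neighbours stay in bounds    */
        assert(j - 1 >= 0 && j + 1 < n);   /* column neighbours stay in bounds */
        return (G[i-1][j] + G[i+1][j] + G[i][j-1] + G[i][j+1]) / 4.0;
    }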

Good luck

David


This looks suspicious:

    MPI_Isend(bufs1, icnt1, MPI_DOUBLE, iprev, 1, MPI_COMM_WORLD, &ireqs1);
    MPI_Isend(bufs2, icnt2, MPI_DOUBLE, inext, 1, MPI_COMM_WORLD, &ireqs2);
    MPI_Irecv(bufr1, N, MPI_DOUBLE, iprev, 1, MPI_COMM_WORLD, &ireqr1);
    MPI_Irecv(bufr2, N, MPI_DOUBLE, inext, 1, MPI_COMM_WORLD, &ireqr2);
    MPI_Wait(&ireqs1, &istatus);
    MPI_Wait(&ireqs2, &istatus);
    MPI_Wait(&ireqr1, &istatus);
    MPI_Wait(&ireqr2, &istatus);

I would expect something like:

    MPI_Isend(bufs1, icnt1, MPI_DOUBLE, iprev, 1, MPI_COMM_WORLD, &ireqs1);
    MPI_Isend(bufs2, icnt2, MPI_DOUBLE, inext, 1, MPI_COMM_WORLD, &ireqs2);
    MPI_Wait(&ireqs1, &istatus);
    MPI_Wait(&ireqs2, &istatus);
    MPI_Irecv(bufr1, N, MPI_DOUBLE, iprev, 1, MPI_COMM_WORLD, &ireqr1);
    MPI_Irecv(bufr2, N, MPI_DOUBLE, inext, 1, MPI_COMM_WORLD, &ireqr2);
    MPI_Wait(&ireqr1, &istatus);   /* wait on the receive requests here */
    MPI_Wait(&ireqr2, &istatus);

And perhaps the middle two MPI_Wait() calls are not needed at all.
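Alternatively, the ordering question can be sidestepped by posting all four nonblocking calls and completing them with a single MPI_Waitall. A sketch using the same buffers, counts and neighbours as the snippets above (the rest is my assumption):

    MPI_Request reqs[4];

    /* Post the receives first, then the sends, and complete everything at
     * once. Nothing may read bufr1/bufr2 or reuse bufs1/bufs2 until
     * MPI_Waitall returns. */
    MPI_Irecv(bufr1, N,     MPI_DOUBLE, iprev, 1, MPI_COMM_WORLD, &reqs[0]);
    MPI_Irecv(bufr2, N,     MPI_DOUBLE, inext, 1, MPI_COMM_WORLD, &reqs[1]);
    MPI_Isend(bufs1, icnt1, MPI_DOUBLE, iprev, 1, MPI_COMM_WORLD, &reqs[2]);
    MPI_Isend(bufs2, icnt2, MPI_DOUBLE, inext, 1, MPI_COMM_WORLD, &reqs[3]);

    MPI_Waitall(4, reqs, MPI_STATUSES_IGNORE);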


Source: https://habr.com/ru/post/1392745/

