I think you have fundamentally misunderstood what the scatter operation does and how MPI expects memory to be allocated and used.
MPI_Scatter takes a source array and splits it into pieces, sending a unique piece to each member of the MPI communicator. In your example, the matrix would need to be a single allocation of p*p contiguous elements in linear memory, from which p values are sent to each process. Your source "matrix" is an array of pointers. There is no guarantee that the rows are laid out sequentially in memory, and MPI_Scatter does not know how to traverse the array of pointers you passed it. As a result, the call simply reads past the end of the first row, which it reaches by dereferencing the matrix pointer, and treats whatever follows in memory as data. That is why you get garbage values in the processes that receive data after the first row.
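If the matrix already exists as an array of row pointers, one option is to copy it into a contiguous buffer before scattering. The following is only a minimal sketch of that idea: `flatten_rows` is a hypothetical helper name (it is not part of MPI), and it assumes every row holds exactly `ncols` integers.

```c
#include <stdlib.h>
#include <string.h>

/* Copy an nrows x ncols matrix stored as an array of row pointers into one
 * contiguous row-major buffer. Returns NULL if the allocation fails; the
 * caller frees the result. (flatten_rows is a hypothetical helper name.) */
int *flatten_rows(int **rows, int nrows, int ncols)
{
    int *flat = malloc((size_t)nrows * ncols * sizeof *flat);
    if (flat == NULL) {
        return NULL;
    }
    for (int i = 0; i < nrows; i++) {
        /* Row i occupies elements [i*ncols, (i+1)*ncols) of the flat buffer. */
        memcpy(flat + (size_t)i * ncols, rows[i], (size_t)ncols * sizeof *flat);
    }
    return flat;
}
```

On the root process, the returned buffer (rather than the pointer array) would then be the send buffer handed to MPI_Scatter.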
All MPI data copy routines assume that the source and destination memory are "flat" linear arrays. Multidimensional C arrays should be stored in row-major order, not as arrays of pointers as you have done here. A cheap and nasty hack of your example, to illustrate the scatter call working correctly, would look like this:
    #include <stdio.h>
    #include <stdlib.h>
    #include <mpi.h>

    /* Allocate an nrows x ncols matrix as one contiguous row-major block
       and fill it with 1, 2, 3, ... */
    int *createMatrix (int nrows, int ncols) {
        int *matrix;
        int h;

        if ((matrix = malloc(nrows*ncols*sizeof(int))) == NULL) {
            printf("Malloc error");
            exit(1);
        }

        for (h=0; h<nrows*ncols; h++) {
            matrix[h] = h+1;
        }

        return matrix;
    }

    void printArray (int *row, int nElements) {
        int i;
        for (i=0; i<nElements; i++) {
            printf("%d ", row[i]);
        }
        printf("\n");
    }

    int main (int argc, char **argv) {

        if (MPI_Init(&argc, &argv) != MPI_SUCCESS) {
            fprintf(stderr, "Error initializing MPI\n");
            exit(1);
        }

        int p, id;
        MPI_Comm_size(MPI_COMM_WORLD, &p); // Get number of processes
        MPI_Comm_rank(MPI_COMM_WORLD, &id); // Get own ID

        int *matrix = NULL; // send buffer is only used on the root

        if (id == 0) {
            matrix = createMatrix(p, p); // Master process creates matrix
            printf("Initial matrix:\n");
            printArray(matrix, p*p);
        }

        int *procRow = malloc(sizeof(int) * p); // received row will contain p integers
        if (procRow == NULL) {
            perror("Error in malloc 3");
            exit(1);
        }

        if (MPI_Scatter(matrix, p, MPI_INT, // send one row, which contains p integers
                        procRow, p, MPI_INT, // receive one row, which contains p integers
                        0, MPI_COMM_WORLD) != MPI_SUCCESS) {
            fprintf(stderr, "Scatter error\n");
            exit(1);
        }

        printf("Process %d received elements: ", id);
        printArray(procRow, p);

        free(procRow);
        free(matrix); // NULL on non-root ranks, which free() accepts

        MPI_Finalize();

        return 0;
    }
which does this:
    $ mpicc -o scatter scatter.c
    $ mpiexec -np 4 scatter
    Initial matrix:
    1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
    Process 0 received elements: 1 2 3 4
    Process 1 received elements: 5 6 7 8
    Process 2 received elements: 9 10 11 12
    Process 3 received elements: 13 14 15 16
i.e. when you pass data stored in linear memory, it works. The equivalent row-major array would be statically allocated like this:
    int matrix[4][4] = { {  1,  2,  3,  4 },
                         {  5,  6,  7,  8 },
                         {  9, 10, 11, 12 },
                         { 13, 14, 15, 16 } };
Note the difference between a statically allocated two-dimensional array and the array of pointers that your code allocates dynamically. They are not at all the same thing, even though they look superficially similar.
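If you want to keep `matrix[i][j]` indexing with dynamic allocation, a common approach is to allocate all the elements as one block and point each row into it. The sketch below illustrates this under stated assumptions: the names `alloc_matrix_contiguous` and `free_matrix_contiguous` are hypothetical, and both dimensions are assumed to be at least 1.

```c
#include <stdlib.h>

/* Allocate an nrows x ncols int matrix whose elements live in ONE contiguous
 * row-major block, plus a table of row pointers into that block so that
 * m[i][j] still works. Assumes nrows >= 1 and ncols >= 1.
 * Returns NULL on allocation failure. (Hypothetical helper names.) */
int **alloc_matrix_contiguous(int nrows, int ncols)
{
    int **m = malloc((size_t)nrows * sizeof *m);
    if (m == NULL) {
        return NULL;
    }
    int *data = malloc((size_t)nrows * ncols * sizeof *data);
    if (data == NULL) {
        free(m);
        return NULL;
    }
    for (int i = 0; i < nrows; i++) {
        m[i] = data + (size_t)i * ncols;  /* row i starts at offset i*ncols */
    }
    return m;
}

/* Release a matrix created by alloc_matrix_contiguous. */
void free_matrix_contiguous(int **m)
{
    if (m != NULL) {
        free(m[0]);  /* m[0] points at the start of the data block */
        free(m);
    }
}
```

With this layout, `m[0]` (equivalently `&m[0][0]`) points at the start of the flat data, so that pointer, not `m` itself, is what should be passed as the send buffer.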