I ran into a problem with non-blocking MPI sends: the program crashes with a segmentation fault. All processes receive their data correctly, but the process with rank 0 crashes inside the MPI_Waitall() call. Can anyone determine the cause of the problem? Thanks!
Here is the source code of the program:
    #include <stdio.h>
    #include <stdlib.h>
    #include <mpi.h>

    /* Block decomposition of n elements over p owners */
    #define BLOCK_LOW(id,p,n)   ((id)*(n)/(p))
    #define BLOCK_HIGH(id,p,n)  (BLOCK_LOW((id)+1,p,n)-1)
    #define BLOCK_SIZE(id,p,n)  (BLOCK_HIGH(id,p,n)-BLOCK_LOW(id,p,n)+1)
    #define BLOCK_OWNER(id,p,n) (((p)*((id)+1)-1)/(n))

    #define LENGTH 100

    int main(int argc, char *argv[])
    {
        int id, p, i;
        MPI_Request* sendRequests;
        MPI_Status* sendStatuses;
        MPI_Request receiveRequest;
        MPI_Status receiveStatus;
        int array[LENGTH];
        int array2[LENGTH];

        MPI_Init(&argc, &argv);
        MPI_Barrier(MPI_COMM_WORLD);

        for (i = 0; i < LENGTH; i++)
        {
            array[i] = i * 5;
            array2[i] = 0;
        }

        MPI_Comm_rank(MPI_COMM_WORLD, &id);
        MPI_Comm_size(MPI_COMM_WORLD, &p);

        if (id == 0)
        {
            /* Rank 0 sends one block to each of the other p-1 processes */
            sendRequests = malloc((p-1) * sizeof(MPI_Request));
            for (i = 1; i < p; i++)
            {
                MPI_Isend(array + BLOCK_LOW(i-1, p-1, LENGTH), BLOCK_SIZE(i-1, p-1, LENGTH), MPI_INT, i, 0, MPI_COMM_WORLD, &sendRequests[i-1]);
            }
            MPI_Waitall(p-1, sendRequests, sendStatuses);
        }
        else
        {
            /* Every other rank receives its block and prints it */
            MPI_Recv(array2, BLOCK_SIZE(id-1, p-1, LENGTH), MPI_INT, 0, 0, MPI_COMM_WORLD, &receiveStatus);
            for (i = 0; i < BLOCK_SIZE(id-1, p-1, LENGTH); i++)
            {
                printf("Element %d (%d): %d\n", i, i + BLOCK_LOW(id-1, p-1, LENGTH), array2[i]);
            }
        }

        MPI_Barrier(MPI_COMM_WORLD);
        MPI_Finalize();
        return 0;
    }
This is the error I get when running the code:
    [lin12p5:13467] *** Process received signal ***
    [lin12p5:13467] Signal: Segmentation fault (11)
    [lin12p5:13467] Signal code: Invalid permissions (2)
    [lin12p5:13467] Failing at address: 0x400f30
    [lin12p5:13467] [ 0] /lib/libpthread.so.0(+0xeff0) [0x7fa96ab4eff0]
    [lin12p5:13467] [ 1] /usr/lib/libmpi.so.0(+0x37f01) [0x7fa96bad5f01]
    [lin12p5:13467] [ 2] /usr/lib/libmpi.so.0(PMPI_Waitall+0xb3) [0x7fa96bb06b73]
    [lin12p5:13467] [ 3] mpi-test(main+0x232) [0x400da6]
    [lin12p5:13467] [ 4] /lib/libc.so.6(__libc_start_main+0xfd) [0x7fa96a7fcc8d]
    [lin12p5:13467] [ 5] mpi-test() [0x400ab9]
    [lin12p5:13467] *** End of error message ***
    --------------------------------------------------------------------------
    mpirun noticed that process rank 0 with PID 13467 on node lab12p5 exited on signal 11 (Segmentation fault).
    --------------------------------------------------------------------------
    [lin13p5][[33088,1],1][../../../../../../ompi/mca/btl/tcp/btl_tcp_frag.c:216:mca_btl_tcp_frag_recv] mca_btl_tcp_frag_recv: readv failed: Connection reset by peer (104)