MPI Send and Recv freeze with a buffer size greater than 64 KB

I am trying to send data from process 0 to process 1. The program succeeds when the buffer size is less than 64 KB, but freezes if the buffer gets much larger. The following code should reproduce the problem (it should freeze), but should succeed if n is changed to 8000 or less.

    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char *argv[]){
        int world_size, world_rank, count;
        MPI_Status status;

        MPI_Init(NULL, NULL);

        MPI_Comm_size(MPI_COMM_WORLD, &world_size);
        MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
        if(world_size < 2){
            printf("Please add another process\n");
            exit(1);
        }

        int n = 8200;
        double *d = malloc(sizeof(double)*n);
        double *c = malloc(sizeof(double)*n);
        printf("malloc results %p %p\n", d, c);

        /* Rank 0 sends n doubles to rank 1 */
        if(world_rank == 0){
            printf("sending\n");
            MPI_Send(c, n, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
            printf("sent\n");
        }
        /* Rank 1 receives them and reports the status fields */
        if(world_rank == 1){
            printf("recv\n");
            MPI_Recv(d, n, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &status);

            MPI_Get_count(&status, MPI_DOUBLE, &count);
            printf("recved, count:%d source:%d tag:%d error:%d\n", count, status.MPI_SOURCE, status.MPI_TAG, status.MPI_ERROR);
        }

        MPI_Finalize();
    }

Output with n = 8200:

    malloc results 0x1cb05f0 0x1cc0640
    recv
    malloc results 0x117d5f0 0x118d640
    sending

Output with n = 8000:

    malloc results 0x183c5f0 0x184c000
    recv
    malloc results 0x1ea75f0 0x1eb7000
    sending
    sent
    recved, count:8000 source:0 tag:0 error:0

I found this question and this question, which are similar, but I believe the problem there was deadlock. I would not expect a deadlock here, because each process performs only one send or one receive.

EDIT: added status check.

EDIT2: It seems the problem is that I had installed OpenMPI, but had also installed Intel's MPI implementation when I installed MKL. My code was compiled with the OpenMPI header and libraries, but was run with Intel's mpirun. Everything works as expected when I make sure that I am running the mpirun executable from OpenMPI.

2 answers

The problem was having both Intel MPI and OpenMPI installed. I saw that /usr/include/mpi.h belongs to OpenMPI, but the mpicc and mpirun on my PATH were the ones from Intel:

    $ which mpicc
    /opt/intel/composerxe/linux/mpi/intel64/bin/mpicc
    $ which mpirun
    /opt/intel/composerxe/linux/mpi/intel64/bin/mpirun

I managed to solve the problem by running

 /usr/bin/mpicc 

and

 /usr/bin/mpirun 

so that I use OpenMPI.

Thanks to @Zulan and @gsamaras for suggesting checking my installation.


The code is OK! I just checked it with version 3.1.3 (`mpiexec --version`):

    linux16:/home/users/grad1459>mpicc -std=c99 -O1 -o px px.c -lm
    linux16:/home/users/grad1459>mpiexec -n 2 ./px
    malloc results 0x92572e8 0x9267330
    sending
    sent
    malloc results 0x9dc92e8 0x9dd9330
    recv
    recved, count:8200 source:0 tag:0 error:1839744

So the problem is related to your installation. Try the following troubleshooting steps:

  • Check the result of malloc *
  • Check status

I would bet that the return value of malloc() is NULL, since you mention that it fails when you request more memory. The system may be refusing to provide that memory.


I was partially right: the problem was with the installation, but as the OP said:

The problem seems to be that I had installed OpenMPI, but had also installed Intel's MPI implementation when I installed MKL. My code was compiled with the OpenMPI header and libraries, but was run with Intel's mpirun. Everything works as expected when I make sure that I am running the mpirun executable from OpenMPI.

* verify that `malloc` succeeded in C


Source: https://habr.com/ru/post/1246348/

