The cube-specific code does not matter here: consider
```cpp
#include <mpi.h>
#include <cstdlib>
#include <iostream>
using namespace std;

int main(int argc, char *argv[]) {
    int size, rank;
    const int root = 0;
    int datasize = atoi(argv[1]);

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank != root) {
        // Ring among ranks 1 .. size-1: send to the next rank, receive from the previous one
        int nodeDest = rank + 1;
        if (nodeDest > size - 1) {
            nodeDest = 1;
        }
        int nodeFrom = rank - 1;
        if (nodeFrom < 1) {
            nodeFrom = size - 1;
        }

        MPI_Status status;
        int *data = new int[datasize];
        for (int i = 0; i < datasize; i++)
            data[i] = rank;

        cout << "Before send" << endl;
        MPI_Send(data, datasize, MPI_INT, nodeDest, 0, MPI_COMM_WORLD);
        cout << "After send" << endl;
        MPI_Recv(data, datasize, MPI_INT, nodeFrom, 0, MPI_COMM_WORLD, &status);
        delete [] data;
    }

    MPI_Finalize();
    return 0;
}
```
where running gives
```
$ mpirun -np 4 ./send 1
Before send
After send
Before send
After send
Before send
After send
$ mpirun -np 4 ./send 65000
Before send
Before send
Before send
```
If you look at the message queue window in DDT, you will see that everyone is sending and no one is receiving, and you have a classic deadlock.
MPI_Send is a tricky one; its semantics are deliberately loose, and it is allowed to block until the matching receive has been posted. MPI_Ssend is clearer in this regard: it will always block until the matching receive is posted. More information about the different send modes can be found here.
The reason it worked for small messages is an accident of the implementation: for messages that are "small enough" (in your case, apparently under 64 kB), your MPI_Send implementation uses an "eager send" protocol and does not block on the receive; for larger messages, where it is not necessarily safe to just keep buffered copies of the message sitting around in memory, the send waits for the matching receive (which it is always allowed to do anyway).
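One way to confirm this is to swap the standard send for a synchronous send in the program above. Below is a minimal sketch of that one-line change (the "Before ssend"/"After ssend" strings are mine, not from the original code); since MPI_Ssend never completes before the matching receive is posted, the ring should hang even for a one-integer message, regardless of any eager threshold.

```cpp
// Sketch: the same ring, but with MPI_Ssend instead of MPI_Send.
// Expected to hang for any message size, confirming that eager buffering
// is what lets the small MPI_Send calls slip through.
#include <mpi.h>
#include <cstdlib>
#include <iostream>
using namespace std;

int main(int argc, char *argv[]) {
    int size, rank;
    const int root = 0;
    int datasize = atoi(argv[1]);

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank != root) {
        int nodeDest = rank + 1;
        if (nodeDest > size - 1) nodeDest = 1;
        int nodeFrom = rank - 1;
        if (nodeFrom < 1) nodeFrom = size - 1;

        MPI_Status status;
        int *data = new int[datasize];
        for (int i = 0; i < datasize; i++) data[i] = rank;

        cout << "Before ssend" << endl;
        // Synchronous send: does not complete until the receiver has posted
        // a matching receive, so every rank blocks here.
        MPI_Ssend(data, datasize, MPI_INT, nodeDest, 0, MPI_COMM_WORLD);
        cout << "After ssend" << endl;
        MPI_Recv(data, datasize, MPI_INT, nodeFrom, 0, MPI_COMM_WORLD, &status);
        delete [] data;
    }

    MPI_Finalize();
    return 0;
}
```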
There are several things you could do to avoid this; what you have to ensure is that not everyone is calling a blocking MPI_Send at the same time. You could (say) have even processors send first and then receive, and odd processors receive first and then send. You could use nonblocking communication (Isend/Irecv/Waitall; a sketch of that variant appears after the output at the end of this answer). But the simplest solution in this case is to use MPI_Sendrecv, which is a blocking (send + receive), rather than a blocking send plus a blocking receive. The send and the receive execute concurrently, and the function blocks until both are complete. So this works:
```cpp
#include <mpi.h>
#include <cstdlib>
#include <iostream>
using namespace std;

int main(int argc, char *argv[]) {
    int size, rank;
    const int root = 0;
    int datasize = atoi(argv[1]);

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank != root) {
        int nodeDest = rank + 1;
        if (nodeDest > size - 1) {
            nodeDest = 1;
        }
        int nodeFrom = rank - 1;
        if (nodeFrom < 1) {
            nodeFrom = size - 1;
        }

        MPI_Status status;
        int *outdata = new int[datasize];
        int *indata = new int[datasize];
        for (int i = 0; i < datasize; i++)
            outdata[i] = rank;

        cout << "Before sendrecv" << endl;
        // Send and receive happen concurrently; the call returns when both are done
        MPI_Sendrecv(outdata, datasize, MPI_INT, nodeDest, 0,
                     indata,  datasize, MPI_INT, nodeFrom, 0,
                     MPI_COMM_WORLD, &status);
        cout << "After sendrecv" << endl;
        delete [] outdata;
        delete [] indata;
    }

    MPI_Finalize();
    return 0;
}
```
Running this gives
```
$ mpirun -np 4 ./send 65000
Before sendrecv
Before sendrecv
Before sendrecv
After sendrecv
After sendrecv
After sendrecv
```
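For completeness, here is a minimal sketch of the nonblocking alternative mentioned above. It reuses the same ring layout and buffer names as the Sendrecv version, but the exact structure is my own, not from the original code: each rank posts MPI_Irecv and MPI_Isend, then waits on both with MPI_Waitall, so no rank blocks in a send before its receive is posted.

```cpp
// Sketch: the same ring exchange using nonblocking Isend/Irecv + Waitall.
// Both operations are posted before either is completed, so the ranks
// cannot deadlock waiting on each other.
#include <mpi.h>
#include <cstdlib>
using namespace std;

int main(int argc, char *argv[]) {
    int size, rank;
    const int root = 0;
    int datasize = atoi(argv[1]);

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank != root) {
        int nodeDest = rank + 1;
        if (nodeDest > size - 1) nodeDest = 1;
        int nodeFrom = rank - 1;
        if (nodeFrom < 1) nodeFrom = size - 1;

        int *outdata = new int[datasize];
        int *indata  = new int[datasize];
        for (int i = 0; i < datasize; i++) outdata[i] = rank;

        MPI_Request reqs[2];
        MPI_Status  stats[2];

        // Post the receive and the send without blocking...
        MPI_Irecv(indata,  datasize, MPI_INT, nodeFrom, 0, MPI_COMM_WORLD, &reqs[0]);
        MPI_Isend(outdata, datasize, MPI_INT, nodeDest, 0, MPI_COMM_WORLD, &reqs[1]);

        // ...then wait for both to complete.
        MPI_Waitall(2, reqs, stats);

        delete [] outdata;
        delete [] indata;
    }

    MPI_Finalize();
    return 0;
}
```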