When the remote machine dies, the MPI manager cannot detect it by calling MPI_Irecv

I am writing a program to detect a sudden failure of a remote machine. The manager process starts on machine 1, and the worker runs on machine 2. The manager sends a message to the worker by calling MPI_Isend. The worker receives the message by calling MPI_Irecv. After each call, I always check the return code to see whether there is a problem with MPI_COMM_WORLD. I also check the return code of MPI_Test, which runs after the send and receive calls.

However, the return code is always 0, even after rebooting machine 2. I can see that MPI_Isend always returns 0. Please give me some tips on how to detect a remote machine failure.

By the way, I set the error handler with the following call:

MPI_Errhandler_set(MPI_COMM_WORLD,MPI_ERRORS_RETURN);
1 answer

I probably should have come back to this a long time ago to make it easier for others to track down.


As discussed in other posts, the completion of MPI_Send and friends does not necessarily indicate that the message was received at the other end. Only MPI_Ssend implies any kind of completion on the remote side, and even that only indicates that the receiver has begun receiving the message into its buffer.



Source: https://habr.com/ru/post/1538954/

