There is a difference between when an MPI function call returns (blocking vs. non-blocking) and when the corresponding operation completes (standard, synchronous, buffered, or ready mode).
Non-blocking calls MPI_I... return immediately, regardless of whether the operation has completed or not. The operation continues in the background, that is, asynchronously. Blocking calls do not return until the operation has completed. A non-blocking operation is represented by its handle (a request), which can be used to perform a blocking wait (MPI_WAIT) or a non-blocking test (MPI_TEST) for completion.
Completion of an operation means that MPI no longer accesses the supplied data buffer, so the buffer can be reused. A send buffer becomes free for reuse either once the message has been put on the network in its entirety (including the case where part of it is still buffered by the network hardware and/or driver), or once the MPI implementation has buffered it somewhere. The buffered case does not require the receiver to have posted a matching receive operation, so it does not synchronize the two sides: the receive could be posted much later. The blocking synchronous send MPI_SSEND does not return until the receiver has posted a receive operation, so it synchronizes both ranks. The non-blocking synchronous send MPI_ISSEND returns immediately, but the asynchronous (background) operation does not complete until the receiver has posted a matching receive.
A blocking operation is equivalent to a non-blocking operation immediately followed by a wait. For instance:
MPI_Ssend(buf, len, MPI_TYPE, dest, tag, MPI_COMM_WORLD);
is equivalent to:
MPI_Request req;
MPI_Status status;
MPI_Issend(buf, len, MPI_TYPE, dest, tag, MPI_COMM_WORLD, &req);
MPI_Wait(&req, &status);
The standard send (MPI_SEND / MPI_ISEND) completes once the message has been constructed and the data buffer provided as its first argument can be reused. There is no synchronization guarantee: the message could be buffered locally or remotely. Most implementations have some kind of size threshold: messages up to this size are buffered, while longer messages are sent synchronously. The threshold is implementation-dependent.
Buffered sends always copy the message into an intermediate buffer supplied by the user, essentially performing a more complex memory copy operation. The difference between the blocking version (MPI_BSEND) and the non-blocking version (MPI_IBSEND) is that the former does not return before all of the message data has been buffered.
The ready send is a very special kind of operation. It completes successfully only if the destination rank has already posted the receive operation by the time the sender makes the send call. This can reduce communication latency by eliminating the need for any kind of handshake.