Usually the default advice with collectives is the opposite: use a collective operation whenever possible instead of coding your own. The more information the MPI library has about a communication pattern, the more opportunity it has to optimize internally.
Unless special hardware support is available, collective calls are ultimately implemented internally in terms of sends and receives. But the actual communication pattern is likely to be more than just a straightforward series of sends and receives. For example, using a tree to propagate the data may be faster than having a single root rank send directly to every receiver. A lot of work goes into optimizing collective communications, and it is difficult to do better yourself.
Having said that, MPI_Alltoallv is somewhat different. It is hard to optimize for every irregular communication scenario at the MPI level, so it is conceivable that custom communication code could do better. For example, an implementation of MPI_Alltoallv might be synchronizing: it could require that all processes "check in", even those that only have messages of length 0 to send. Such an implementation is perhaps unlikely, but implementations like that do exist in the wild.
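The "check in" point is a consequence of MPI_Alltoallv being a collective: every rank in the communicator must make the call, even with all-zero send counts. A minimal sketch (the even/odd pattern is a made-up example; run under an MPI launcher):

```c
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size, i;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int *sendcounts = calloc(size, sizeof(int));
    int *recvcounts = calloc(size, sizeof(int));
    int *sdispls    = calloc(size, sizeof(int));
    int *rdispls    = calloc(size, sizeof(int));
    int *recvbuf    = calloc(size, sizeof(int));
    int sendbuf[1]  = { rank };

    /* Hypothetical irregular pattern: only even ranks send one int
       to rank 0; odd ranks send nothing at all. */
    if (rank % 2 == 0) sendcounts[0] = 1;
    if (rank == 0)
        for (i = 0; i < size; i += 2) { recvcounts[i] = 1; rdispls[i] = i; }

    /* Every rank must still make this call -- ranks with all-zero
       sendcounts participate purely so the collective can complete.
       A synchronizing implementation would block until they do. */
    MPI_Alltoallv(sendbuf, sendcounts, sdispls, MPI_INT,
                  recvbuf, recvcounts, rdispls, MPI_INT, MPI_COMM_WORLD);

    free(sendcounts); free(recvcounts);
    free(sdispls); free(rdispls); free(recvbuf);
    MPI_Finalize();
    return 0;
}
```

With hand-rolled point-to-point code, the odd ranks in this pattern would simply post nothing and move on, which is exactly the kind of scenario where a custom scheme can beat the collective.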
So the real answer is "it depends." If the library implementation of MPI_Alltoallv is a bad match for the task, custom communication code will win. But before going down that route, check whether the MPI-3 neighborhood collectives are a good fit for your problem.
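Neighborhood collectives let you declare the sparse communication pattern up front, so only the real partners are involved. A sketch for a 1-D ring, where each rank exchanges one int with its two neighbours (the ring topology is my example; run under an MPI launcher):

```c
#include <mpi.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Each rank talks only to its left and right neighbour. */
    int left = (rank - 1 + size) % size, right = (rank + 1) % size;
    int neighbors[2] = { left, right };

    /* Describe the sparse pattern as a graph topology (MPI-3). */
    MPI_Comm ring;
    MPI_Dist_graph_create_adjacent(MPI_COMM_WORLD,
                                   2, neighbors, MPI_UNWEIGHTED,
                                   2, neighbors, MPI_UNWEIGHTED,
                                   MPI_INFO_NULL, 0, &ring);

    int sendbuf[2] = { rank, rank };
    int recvbuf[2];
    int counts[2] = { 1, 1 }, displs[2] = { 0, 1 };

    /* Variable-count exchange restricted to the declared neighbours,
       instead of a full MPI_Alltoallv over the whole communicator. */
    MPI_Neighbor_alltoallv(sendbuf, counts, displs, MPI_INT,
                           recvbuf, counts, displs, MPI_INT, ring);

    MPI_Comm_free(&ring);
    MPI_Finalize();
    return 0;
}
```

Because the library knows the exact neighbour lists, it can schedule only those transfers, which often recovers most of the benefit of a custom point-to-point scheme while staying a collective.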