Is there an MPI implementation that allows you to dynamically add / remove nodes at runtime?
There are really two questions here.

Adding nodes at runtime is usually possible with the MPI_Comm_spawn family of calls. As @Hristo noted in the comments, you have to set the correct info key in Open MPI. It is possible in other implementations as well.

Removing nodes is a big research area right now. Most MPI implementations currently have varying levels of success at staying up in the face of node failure. In the current releases of Open MPI, I do not believe there is any support for this kind of failure [citation needed], although there is work to improve this. In the current version of MPICH, you can pass the flag -disable-auto-cleanup to mpiexec, and it will not automatically clean up your application after a process/node failure. However, you still have to modify your MPI application to deal with this situation. The various MPICH derivatives (Intel MPI, Cray MPI, IBM MPI, MVAPICH, etc.) do not support this feature, AFAIK.

There are also research implementations that extend what the MPI standard supports. User-Level Failure Mitigation (ULFM) is currently being considered by the standardization body as a way to let the user deal with process failures. There is a research implementation available on the Open MPI site, and an experimental prototype will also be included in the next version of MPICH (3.2).