I am trying to understand how to set up a distributed MPI cluster with IPython/ipyparallel. I do not have a strong MPI background.
I followed the instructions in the ipyparallel docs (Using ipcluster in mpiexec/mpirun mode), and this works fine for distributing computation on a single-node machine. So I create an mpi profile, configure it as per the instructions above, and start the cluster:
$ ipython profile create --parallel --profile=mpi
$ vim ~/.ipython/profile_mpi/ipcluster_config.py
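For reference, the relevant change in ipcluster_config.py is the launcher line from that section of the docs (the comment is mine):

# ~/.ipython/profile_mpi/ipcluster_config.py
c = get_config()
# Have ipcluster start the engines through mpiexec instead of as
# plain local subprocesses.
c.IPClusterEngines.engine_launcher_class = 'MPIEngineSetLauncher'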
Next, on host A, I start the controller and 4 MPI engines:
$ ipcontroller --ip='*' --profile=mpi
$ ipcluster engines --n=4 --profile=mpi
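If I understand the MPIEngineSetLauncher correctly, that ipcluster engines call boils down to roughly the following single-host command (my paraphrase, not something I have traced):

$ mpiexec -n 4 ipengine --profile=mpi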
Running the following snippet:
from ipyparallel import Client
from mpi4py import MPI
c = Client(profile='mpi')
view = c[:]
print("Client MPI.COMM_WORLD.Get_size()=%s" % MPI.COMM_WORLD.Get_size())
print("Client engine ids %s" % c.ids)
def _get_rank():
    from mpi4py import MPI
    return MPI.COMM_WORLD.Get_rank()

def _get_size():
    from mpi4py import MPI
    return MPI.COMM_WORLD.Get_size()
print("Remote COMM_WORLD ranks %s" % view.apply_sync(_get_rank))
print("Remote COMM_WORLD size %s" % view.apply_sync(_get_size))
gives
Client MPI.COMM_WORLD.Get_size()=1
Client engine ids [0, 1, 2, 3]
Remote COMM_WORLD ranks [1, 0, 2, 3]
Remote COMM_WORLD size [4, 4, 4, 4]
Then, on host B, I launch 4 more MPI engines and run the snippet again, which gives
Client MPI.COMM_WORLD.Get_size()=1
Client engine ids [0, 1, 2, 3, 4, 5, 6, 7]
Remote COMM_WORLD ranks [1, 0, 2, 3, 2, 3, 0, 1]
Remote COMM_WORLD size [4, 4, 4, 4, 4, 4, 4, 4]
Note that Remote COMM_WORLD size is still 4 for every engine: each ipcluster engines invocation of 4 engines forms its own MPI world (presumably one per mpiexec, as above), and the ranks on host B simply duplicate those on host A. The 8 engines do not share a single MPI.COMM_WORLD.
My questions are:
- ipython/ipyparallel does not appear to handle the MPI side of the setup itself; does it delegate this to MPI? I had assumed from the ipyparallel docs that ipyparallel would manage the MPI environment, but it now looks as if I have to set up MPI-connected machines first (via an MPI machinefile?) and IPython then merely distributes its engines across them. Is that the intended division of labour?
- Is there any documentation or a worked example of how to set up a distributed MPI environment for use with ipyparallel? I have googled around but found nothing.
- Given all of the above, is ipython/ipyparallel even the right tool for distributed MPI work, or should I be using MPI directly?
From what I have gathered so far, it seems that the distributed MPI environment has to exist before the engines are started. Two things remain unclear to me, though:
First, I could not find a description of how engines launched separately on different hosts are supposed to join one shared MPI world.
Second, the scenarios I did find all involve a resource manager such as Torque/SLURM etc.; in my simple hand-run setup, where everything is started with mpiexec on each host, I do not see how mpiexec on one host would even know about the other.
In short: my question is how to set up a distributed MPI environment for ipyparallel, not how to use MPI from within ipyparallel.
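For concreteness, the following is the kind of cross-host launch I imagine should put all 8 engines into a single COMM_WORLD. The machinefile contents and the single spanning mpiexec are my guesses (Open MPI hostfile syntax; MPICH would write hostA:4), not something I have verified, and they assume passwordless SSH between the hosts:

$ cat machinefile
hostA slots=4
hostB slots=4
$ mpiexec -n 8 -machinefile machinefile ipengine --profile=mpi

Is something like this the intended pattern?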