Unable to start OpenMPI through more than two machines

When I tried to run the first example in boost :: mpi tutorial, I was unable to start more than two machines. In particular, it looked fine:

mpirun -hostfile hostnames -np 4 boost1

with each hostname in hostnames like <node_name> slots=2 max_slots=2. But, when I increase the number of processes to 5, it just freezes. I reduced the number slots/ max_slotsto 1 with the same result when I exceeded 2 cars. On nodes, this is displayed in the job list:

<user> Ss orted --daemonize -mca ess env -mca orte_ess_jobid 388497408 \
-mca orte_ess_vpid 2 -mca orte_ess_num_procs 3 -hnp-uri \
388497408.0;tcp://<node_ip>:48823

Also, when I kill him, I get this message:

node2- daemon did not report back when launched
node3- daemon did not report back when launched

The cluster is configured using mpiand boostlibs available on the installed NFS drive. Am I stumped with NFS? Or is something else going on?

Update: To be clear, the enhancement program that I run is

#include <boost/mpi/environment.hpp>
#include <boost/mpi/communicator.hpp>
#include <iostream>
namespace mpi = boost::mpi;

int main(int argc, char* argv[]) 
{
  mpi::environment env(argc, argv);
  mpi::communicator world;
  std::cout << "I am process " << world.rank() << " of " << world.size()
        << "." << std::endl;
  return 0;
}

@Dirk Eddelbuettel mpi hello_c.c, .

#include <stdio.h>
#include "mpi.h"

int main(int argc, char* argv[])
{
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf("Hello, world, I am %d of %d\n", rank, size);
    MPI_Barrier(MPI_COMM_WORLD);
    MPI_Finalize();

   return 0;
}

, sshing . node mpi/boost , NFS. ( node, boost/mpi ), . "hello world", mpirun -H node1,node2 -np 12 ./hello,

[<node name>][[2771,1],<process #>] \
[btl_tcp_endpoint.c:638:mca_btl_tcp_endpoint_complete_connect] \
connect() to <node-ip> failed: No route to host (113)

"Hello World" , . node node .

"Hello world" mpirun -H node1 -np 12 ./hello 2 . ( , : orted , .)

, , mpi libs node, , NFS. ? , , mpi, ? , , , , ?

+3
3

: mpi, ssh, tcp/ip . , ssh- , . , iptables, hello .

: , , mpi , .

+5

:

  • MPI ", "?
  • localhost?
  • ssh
  • - ,

,

mpirun -H host1,host2,host3 -n 12 ./helloworld

. , Boost... , Boost MPI , .

+2

- mca btl_tcp_if_include eth0, eth0- OpenMPI , . - mca btl_tcp_if_exclude eth0. eth0 .

/etc/hosts :

10.1.2.13 node13

...

10.1.3.13 node13-ib

mpirun, TCP, TCP, (20 ) OpenMPI IP- 10.1.3.XXX , .

,

+2

Source: https://habr.com/ru/post/1738087/


All Articles