Check MPI on the cluster

I am learning Open MPI on a cluster. Here is my first example. I expect the output to show responses from different nodes, but they all come from the same node, node062. I'm wondering why, and how I can get reports from different nodes to show that MPI actually distributes processes across nodes. Thanks and regards!

ex1.c

    /* test of MPI */
    #include "mpi.h"
    #include <stdio.h>
    #include <string.h>

    int main(int argc, char **argv)
    {
        char idstr[32];
        char buff[128];
        char processor_name[MPI_MAX_PROCESSOR_NAME];
        int numprocs;
        int myid;
        int i;
        int namelen;
        MPI_Status stat;

        MPI_Init(&argc, &argv);
        MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
        MPI_Comm_rank(MPI_COMM_WORLD, &myid);
        MPI_Get_processor_name(processor_name, &namelen);

        if (myid == 0) {
            printf("WE have %d processors\n", numprocs);
            for (i = 1; i < numprocs; i++) {
                sprintf(buff, "Hello %d", i);
                MPI_Send(buff, 128, MPI_CHAR, i, 0, MPI_COMM_WORLD);
            }
            for (i = 1; i < numprocs; i++) {
                MPI_Recv(buff, 128, MPI_CHAR, i, 0, MPI_COMM_WORLD, &stat);
                printf("%s\n", buff);
            }
        } else {
            MPI_Recv(buff, 128, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &stat);
            sprintf(idstr, " Processor %d at node %s ", myid, processor_name);
            strcat(buff, idstr);
            strcat(buff, "reporting for duty\n");
            MPI_Send(buff, 128, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }

        MPI_Finalize();
        return 0;
    }

ex1.pbs

    #!/bin/sh
    #
    # This is an example script example.sh
    #
    # These commands set up the Grid Environment for your job:
    #PBS -N ex1
    #PBS -l nodes=10:ppn=1,walltime=1:10:00
    #PBS -q dque
    #
    export OMP_NUM_THREADS=4

    mpirun -np 10 /home/tim/courses/MPI/examples/ex1

compile and run:

    [tim@user1 examples]$ mpicc ./ex1.c -o ex1
    [tim@user1 examples]$ qsub ex1.pbs
    35540.mgt
    [tim@user1 examples]$ nano ex1.o35540
    ----------------------------------------
    Begin PBS Prologue Sat Jan 30 21:28:03 EST 2010 1264904883
    Job ID:   35540.mgt
    Username: tim
    Group:    Brown
    Nodes:    node062 node063 node169 node170 node171 node172 node174 node175 node176 node177
    End PBS Prologue Sat Jan 30 21:28:03 EST 2010 1264904883
    ----------------------------------------
    WE have 10 processors
    Hello 1 Processor 1 at node node062 reporting for duty
    Hello 2 Processor 2 at node node062 reporting for duty
    Hello 3 Processor 3 at node node062 reporting for duty
    Hello 4 Processor 4 at node node062 reporting for duty
    Hello 5 Processor 5 at node node062 reporting for duty
    Hello 6 Processor 6 at node node062 reporting for duty
    Hello 7 Processor 7 at node node062 reporting for duty
    Hello 8 Processor 8 at node node062 reporting for duty
    Hello 9 Processor 9 at node node062 reporting for duty
    ----------------------------------------
    Begin PBS Epilogue Sat Jan 30 21:28:11 EST 2010 1264904891
    Job ID:    35540.mgt
    Username:  tim
    Group:     Brown
    Job Name:  ex1
    Session:   15533
    Limits:    neednodes=10:ppn=1,nodes=10:ppn=1,walltime=01:10:00
    Resources: cput=00:00:00,mem=420kb,vmem=8216kb,walltime=00:00:03
    Queue:     dque
    Account:
    Nodes:     node062 node063 node169 node170 node171 node172 node174 node175 node176 node177
    Killing leftovers...
    End PBS Epilogue Sat Jan 30 21:28:11 EST 2010 1264904891
    ----------------------------------------

UPDATE:

I would like to run several background jobs in one PBS script so that they run at the same time. For example, in the example above, I added a second call to run ex1 and put both runs in the background in ex1.pbs:

    #!/bin/sh
    #
    # This is an example script example.sh
    #
    # These commands set up the Grid Environment for your job:
    #PBS -N ex1
    #PBS -l nodes=10:ppn=1,walltime=1:10:00
    #PBS -q dque

    echo "The first job starts!"
    mpirun -np 5 --machinefile /home/tim/courses/MPI/examples/machinefile /home/tim/courses/MPI/examples/ex1 &
    echo "The first job ends!"
    echo "The second job starts!"
    mpirun -np 5 --machinefile /home/tim/courses/MPI/examples/machinefile /home/tim/courses/MPI/examples/ex1 &
    echo "The second job ends!"

(1) Here is the result after qsub of this script, using the previously compiled executable ex1:

    The first job starts!
    The first job ends!
    The second job starts!
    The second job ends!
    WE have 5 processors
    WE have 5 processors
    Hello 1 Processor 1 at node node063 reporting for duty
    Hello 2 Processor 2 at node node169 reporting for duty
    Hello 3 Processor 3 at node node170 reporting for duty
    Hello 1 Processor 1 at node node063 reporting for duty
    Hello 4 Processor 4 at node node171 reporting for duty
    Hello 2 Processor 2 at node node169 reporting for duty
    Hello 3 Processor 3 at node node170 reporting for duty
    Hello 4 Processor 4 at node node171 reporting for duty

(2) However, I think ex1 runs too fast, so the two background jobs probably don't overlap much, which is not the case in my real project. So I added sleep(30) to ex1.c to extend its running time, so that the two background runs of ex1 overlap almost the whole time:

    /* test of MPI */
    #include "mpi.h"
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        char idstr[32];
        char buff[128];
        char processor_name[MPI_MAX_PROCESSOR_NAME];
        int numprocs;
        int myid;
        int i;
        int namelen;
        MPI_Status stat;

        MPI_Init(&argc, &argv);
        MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
        MPI_Comm_rank(MPI_COMM_WORLD, &myid);
        MPI_Get_processor_name(processor_name, &namelen);

        if (myid == 0) {
            printf("WE have %d processors\n", numprocs);
            for (i = 1; i < numprocs; i++) {
                sprintf(buff, "Hello %d", i);
                MPI_Send(buff, 128, MPI_CHAR, i, 0, MPI_COMM_WORLD);
            }
            for (i = 1; i < numprocs; i++) {
                MPI_Recv(buff, 128, MPI_CHAR, i, 0, MPI_COMM_WORLD, &stat);
                printf("%s\n", buff);
            }
        } else {
            MPI_Recv(buff, 128, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &stat);
            sprintf(idstr, " Processor %d at node %s ", myid, processor_name);
            strcat(buff, idstr);
            strcat(buff, "reporting for duty\n");
            MPI_Send(buff, 128, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }

        sleep(30); /* newly added to extend the running time */
        MPI_Finalize();
        return 0;
    }

But after recompiling and submitting with qsub again, the results look wrong: the processes are interrupted. In ex1.o35571:

    The first job starts!
    The first job ends!
    The second job starts!
    The second job ends!
    WE have 5 processors
    WE have 5 processors
    Hello 1 Processor 1 at node node063 reporting for duty
    Hello 2 Processor 2 at node node169 reporting for duty
    Hello 3 Processor 3 at node node170 reporting for duty
    Hello 4 Processor 4 at node node171 reporting for duty
    Hello 1 Processor 1 at node node063 reporting for duty
    Hello 2 Processor 2 at node node169 reporting for duty
    Hello 3 Processor 3 at node node170 reporting for duty
    Hello 4 Processor 4 at node node171 reporting for duty
    4 additional processes aborted (not shown)
    4 additional processes aborted (not shown)

In ex1.e35571:

    mpirun: killing job...
    mpirun noticed that job rank 0 with PID 25376 on node node062 exited on signal 15 (Terminated).
    mpirun: killing job...
    mpirun noticed that job rank 0 with PID 25377 on node node062 exited on signal 15 (Terminated).

I wonder why the processes get interrupted. How can I run background jobs correctly in a PBS script?

+4
4 answers

A couple of things: you need to tell MPI where to launch its processes. Assuming you are using MPICH, look at the mpiexec help and find the machine file (or equivalent) option. If no machine file is provided, everything will run on the same host.
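For reference, a machine file is just a plain-text list of hostnames, one per line; MPICH-style files also accept an optional :n suffix for the number of processes to run on that host. A hypothetical example, reusing node names from the job output above:

    node063
    node169
    node170:2
    node171:2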

PBS automatically creates a node file for each job. Its name is stored in the PBS_NODEFILE environment variable, which is available inside the PBS batch script. Try the following:

 mpiexec -machinefile $PBS_NODEFILE ... 
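Applied to the script from the question, that might look like the sketch below (Open MPI's mpirun also accepts -machinefile, so the same idea works there):

    #!/bin/sh
    #PBS -N ex1
    #PBS -l nodes=10:ppn=1,walltime=1:10:00
    #PBS -q dque

    # PBS writes the allocated node list to the file named by $PBS_NODEFILE;
    # handing it to mpirun spreads the 10 ranks across those nodes
    mpirun -np 10 -machinefile $PBS_NODEFILE /home/tim/courses/MPI/examples/ex1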

If you are using MPICH2, you additionally have to boot the MPI runtime using mpdboot. I don't remember the details of the command; you will need to read the man page. Remember to create the secret file, otherwise mpdboot will fail.
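From what I remember of MPICH2's mpd process manager (a sketch; check your man pages, and the secret word is a placeholder): the secret file is ~/.mpd.conf, it must contain a secret word, and it must be readable only by you:

    # one-time setup: mpd's secret file must exist and be private
    echo "MPD_SECRETWORD=change_me" > ~/.mpd.conf
    chmod 600 ~/.mpd.conf

    # inside the job: boot one mpd per allocated node, run, then shut down
    mpdboot -n $(sort -u $PBS_NODEFILE | wc -l) -f $PBS_NODEFILE
    mpiexec -np 10 ./ex1
    mpdallexit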

Rereading your post: you are using Open MPI, so you still have to supply a machine file to the mpiexec command, but you don't need to bother with mpdboot.

+3

By default, PBS (I assume Torque) allocates nodes in exclusive mode, so there is only one job per node. It is slightly different if you have multiple processors: then it is most likely one process per processor. PBS can be configured to allocate nodes in time-sharing mode; see the man page of qmgr. Long story short: most likely you will not have overlapping nodes in your node file, since the node file is created when resources become available rather than at submission time.
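As an illustration only (the exact syntax varies between PBS flavors, and the node name is taken from the question):

    # Torque sketch: mark a node as time-shared via qmgr (run as PBS admin)
    qmgr -c "set node node062 ntype = time-shared"

    # classic OpenPBS alternative: tag the node with :ts in the server's
    # nodes file (server_priv/nodes), e.g.
    #   node062:ts np=8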

The purpose of PBS is resource management; most often that means walltime and node allocation (which it handles automatically).

The commands in a PBS script are executed sequentially. You can put processes in the background, but that may defeat the purpose of the resource allocation; it depends on your exact workflow, which I don't know. I have used background processes (with &) in PBS scripts to copy data before the main program runs in parallel. A PBS script is really just a shell script.
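A sketch of that pattern (paths and job name are hypothetical). Note the wait calls: a shell script exits as soon as its last foreground command finishes, and when the script ends PBS tears the job down and kills leftover background processes with signal 15, much like in your error file:

    #!/bin/sh
    #PBS -N stage_and_run
    #PBS -l nodes=10:ppn=1,walltime=1:10:00

    # start a long copy in the background and remember its PID
    cp /somewhere/big_input.dat /home/tim/scratch/ &
    COPY_PID=$!

    # ...other sequential setup work can happen here...

    # block until the copy has finished before starting the MPI run
    wait $COPY_PID

    mpirun -np 10 -machinefile $PBS_NODEFILE /home/tim/courses/MPI/examples/ex1

    # if anything else is still backgrounded, wait for it too, so the
    # script (and hence the job) does not exit before it completes
    wait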

You can assume that PBS knows nothing about the inner workings of your script. You can certainly run multiple processes/threads from it; if you do, it is up to you and your operating system to balance them across the available cores/processors. If you are using a multithreaded program, the most likely approach is to run one MPI process per node and then spawn OpenMP threads from each process.
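A sketch of such a hybrid launch (program and file names are hypothetical; assumes an Open MPI-style mpirun and an OpenMP-enabled program):

    # one MPI rank per node, OpenMP threads within each rank
    sort -u $PBS_NODEFILE > nodes.unique       # collapse duplicate entries
    NUM_NODES=$(wc -l < nodes.unique)

    export OMP_NUM_THREADS=4                   # threads spawned by each rank
    mpirun -np $NUM_NODES -machinefile nodes.unique ./my_hybrid_program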

Let me know if you need clarification.

+2

As a diagnostic, try inserting these statements right after your call to MPI_Get_processor_name:

 printf("Hello, world. I am %d of %d on %s\n", myid, numprocs, name); fflush(stdout); 

If every process reports the same node name, that suggests you don't fully understand what the job and cluster management system is doing; perhaps PBS (despite what you believe it is telling you) is placing all 10 processes on a single node. Do you have 10 cores per node?
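One quick way to see what PBS actually handed to the job (a sketch; add it near the top of the batch script):

    # show which nodes PBS allocated to this job
    echo "node file: $PBS_NODEFILE"
    cat $PBS_NODEFILE              # one line per allocated processor slot
    sort $PBS_NODEFILE | uniq -c   # slot count per node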

If that produces different results, it suggests something is wrong with your code, although it looks fine to me.

+1

There is an error in your code that is unrelated to MPICH: you have reused i in your two loops.

 for(i=1;i<numprocs;i++) { sprintf(buff, "Hello %d", i); MPI_Send(buff, 128, MPI_CHAR, i, 0, MPI_COMM_WORLD); } for(i=1;i<numprocs;i++) 

The second loop can get messed up.
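If you want to rule that out, here is a minimal variant (my sketch, not from the answer) that gives each loop its own counter and the receive path its own buffer; it drops into the rank-0 branch of the question's code, where numprocs and stat are already declared:

    int s, r;                          /* separate counters for the two loops */
    char sendbuff[128], recvbuff[128]; /* separate send and receive buffers */

    for (s = 1; s < numprocs; s++) {
        sprintf(sendbuff, "Hello %d", s);
        MPI_Send(sendbuff, 128, MPI_CHAR, s, 0, MPI_COMM_WORLD);
    }
    for (r = 1; r < numprocs; r++) {
        MPI_Recv(recvbuff, 128, MPI_CHAR, r, 0, MPI_COMM_WORLD, &stat);
        printf("%s\n", recvbuff);
    }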

0
