Running multiple SLURM worker daemons

I want to run several worker daemons on the same machine. According to damienfrancois' answer to "What is the minimum number of computers for a Slurm cluster?", this can be done. The problem is that I can currently run only one worker daemon per machine. For example:

When I start

 sudo slurmd -N linux1 -cDvv
 sudo slurmd -N linux2 -cDvv

linux1 goes down as soon as linux2 starts. Is it possible to run several worker daemons on the same machine? Here is my slurm.conf file

2 answers

Since your intention seems to be just testing Slurm's behavior, I would recommend front-end mode, which lets you create dummy compute nodes on a single machine.

The Slurm FAQ has more detailed information, but basically you need to build your installation with this mode enabled:

 ./configure --enable-front-end 

Then define the nodes in slurm.conf:

 NodeName=test[1-100] NodeHostName=localhost 

The same guide also explains how to run several real daemons on one node by giving each a different port, but I did not need that for my testing.

Good luck


I had the same problem and solved it by changing the paths of the log file, PID file, and spool directory, as described in the Slurm documentation on multiple slurmd support. For example, where your slurm.conf has

 SlurmdLogFile=/var/log/slurm/slurmd.log
 SlurmdPidFile=/var/run/slurmd.pid
 SlurmdSpoolDir=/var/spool/slurmd

it should be

 SlurmdLogFile=/var/log/slurm/slurmd.%n.log
 SlurmdPidFile=/var/run/slurmd.%n.pid
 SlurmdSpoolDir=/var/spool/slurmd.%n

Now you can run several slurmd daemons.
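
To see why this avoids the clash, here is a small sketch of what the %n placeholder does: slurmd replaces it with the node name, so each daemon gets its own log file, PID file, and spool directory. The expand_node_path helper is hypothetical and only imitates that substitution with sed; it is not part of Slurm.

```shell
#!/bin/sh
# Illustration only: imitate how %n in a slurm.conf path template
# is replaced by the node name. expand_node_path is a made-up helper.
expand_node_path() {
    template=$1
    node=$2
    # The template is passed as data, not as a printf format string.
    printf '%s\n' "$template" | sed "s/%n/$node/g"
}

# Print the per-daemon paths for the two node names from the question.
for node in linux1 linux2; do
    expand_node_path '/var/log/slurm/slurmd.%n.log' "$node"
    expand_node_path '/var/run/slurmd.%n.pid' "$node"
    expand_node_path '/var/spool/slurmd.%n' "$node"
done
```

Without %n, both daemons would point at the same PID file and spool directory, which matches the symptom in the question.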

Note: I tried your slurm.conf, and I think some parameters are missing. You need to define two NodeName entries instead of one and specify which port each of them uses. This works for me:

 # COMPUTE NODES
 NodeName=linux[1-10] NodeHostname=linux0 Port=17004 CPUs=1 State=UNKNOWN
 NodeName=linux[11-19] NodeHostname=linux0 Port=17005 CPUs=1 State=UNKNOWN
 # PARTITIONS
 PartitionName=main Nodes=linux1 Default=YES MaxTime=INFINITE State=UP
 PartitionName=dev Nodes=linux11 Default=YES MaxTime=INFINITE State=UP
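
With a configuration like this, the daemons could be started with a small loop, one slurmd per node name. This is only a sketch: start_slurmd and DRY_RUN are names made up for the example, and by default it just prints the commands instead of running them. Because two processes on one host cannot listen on the same port, it starts one node from each port range (linux1 and linux11, the nodes used by the partitions above). As far as I know, Slurm also has to be built with --enable-multiple-slurmd for this to work.

```shell
#!/bin/sh
# Sketch: start one slurmd per node name.
# DRY_RUN and start_slurmd are illustrative names, not Slurm features.
# DRY_RUN=1 (the default) only prints the command that would be run.
DRY_RUN=${DRY_RUN:-1}

start_slurmd() {
    node=$1
    if [ "$DRY_RUN" = 1 ]; then
        echo "sudo slurmd -N $node"
    else
        # Without -D, slurmd detaches and runs in the background.
        sudo slurmd -N "$node"
    fi
}

# One node per port range, so the daemons do not compete for a port.
for node in linux1 linux11; do
    start_slurmd "$node"
done
```

Run it with DRY_RUN=0 to actually launch the daemons. To debug one of them, start it by hand in its own terminal with the -cDvv flags from the question.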

Source: https://habr.com/ru/post/1272932/