I would like to run the same program on a large number of different input files. I could simply submit each one as a separate Slurm job, but I don't want to swamp the queue by dumping 1000 jobs on it at once. Instead, I've been trying to figure out how to process the same number of files by first creating an allocation and then, inside that allocation, looping over all the files with srun, giving each invocation a single core from the allocation. The problem is that no matter what I do, only one job step runs at a time. The simplest test case I could come up with is:
#!/usr/bin/env bash
srun --exclusive --ntasks 1 -c 1 sleep 1 &
srun --exclusive --ntasks 1 -c 1 sleep 1 &
srun --exclusive --ntasks 1 -c 1 sleep 1 &
srun --exclusive --ntasks 1 -c 1 sleep 1 &
wait
No matter how many cores I allocate:
time salloc -n 1 test
time salloc -n 2 test
time salloc -n 4 test
it always takes 4 seconds. Is it not possible to run multiple job steps in parallel?
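For reference, the real submission script I have in mind looks roughly like this (the program name and the input file pattern are just placeholders):
#!/usr/bin/env bash
# One allocation, many one-core job steps: launch each srun in the background
# so the steps can (in theory) overlap, then wait for all of them to finish.
# ./myprog and data/*.in stand in for the real program and input files.
for f in data/*.in; do
    srun --exclusive --ntasks 1 -c 1 ./myprog "$f" &
done
wait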