How to process files simultaneously with bash?

Suppose I have 10K files and a bash script that processes a single file. I would like to process all of these files by running k copies of the script in parallel, and (obviously) no file should be processed more than once.

How would you implement this in bash?

+4
3 answers

One way to run a limited number of jobs in parallel is with GNU parallel. For example, using this command:

 find . -type f -print0 | parallel -0 -P 3 ./myscript {} 

This passes every file in the current directory (and its subdirectories) to myscript as an argument, one file per job. The -0 option sets the delimiter to the null character (matching find's -print0), and -P sets the number of jobs run in parallel; by default parallel runs one job per CPU core. There are further options for distributing jobs across clusters and so on, documented in the GNU parallel manual.
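If GNU parallel is not installed, the same pipeline shape works with xargs from findutils. This is only a sketch: the scratch directory, demo file names, and the inline stand-in for ./myscript are all invented for the example.

```shell
#!/bin/bash
# Parallel per-file processing with xargs: -0 pairs with find's -print0,
# -n 1 passes one file per invocation, -P 3 caps concurrency at 3 jobs.
dir=$(mktemp -d)
for i in 1 2 3 4 5; do echo "data $i" > "$dir/f$i.txt"; done

# Stand-in for ./myscript: append each file's byte count to a shared log.
find "$dir" -name '*.txt' -print0 |
    xargs -0 -n 1 -P 3 sh -c "wc -c < \"\$1\" >> \"$dir/processed.log\"" _

echo "$(wc -l < "$dir/processed.log") files processed"   # one log line per file
```

Unlike GNU parallel, xargs -P does not serialize each job's output, so concurrent jobs writing more than a pipe buffer's worth of output can interleave.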

+12

In bash you can easily run part of a script in another process just by using '(' and ')'. If you add & , the parent process will not wait for the child. So you effectively use ( command1; command2; command3; ... ) & :

 while ...
 do
     ( your script goes here, executed in a separate process ) &
     CHILD_PID=$!
 done 

Also, $! gives you the PID of the child process. What else do you need to know? Once you have k processes running, you need to wait for one of them. This is done using wait <PID> :

 wait $CHILD_PID 

If you want to wait for them all, just use wait .
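A runnable sketch of this approach, assuming k=3 and a trivial stand-in job (the scratch directory, demo file names, and log name are invented for the example). It tracks child PIDs in an array and waits for the oldest one whenever k children are alive:

```shell
#!/bin/bash
# Keep at most k background children running at once.
k=3
dir=$(mktemp -d)
for i in 1 2 3 4 5 6 7; do echo "payload" > "$dir/f$i"; done

pids=()
for f in "$dir"/f*; do
    ( basename "$f" >> "$dir/done.log" ) &   # your per-file work goes here
    pids+=("$!")
    if [ "${#pids[@]}" -ge "$k" ]; then
        wait "${pids[0]}"            # block until the oldest child exits
        pids=("${pids[@]:1}")        # drop it from the queue
    fi
done
wait    # wait for the remaining children
echo "$(wc -l < "$dir/done.log") files processed"
```

Waiting on the oldest PID is a simple FIFO policy; on bash 4.3 and newer, `wait -n` would let the loop continue as soon as any child exits instead.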

This should be enough to implement the system.

+5
 cnt=0
 for f1 in *; do
     if [ "$cnt" -ge "$k" ]; then
         wait            # the batch of k jobs is full: wait for all of them
         cnt=0
     fi
     nohup ./script1 "$f1" &
     (( cnt = cnt + 1 ))
 done
 wait 

Check it out; I did not have time to test it.
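For reference, a self-contained demo of this batch strategy, with a stand-in command instead of ./script1 and invented demo file names. Note the trade-off against the PID-tracking approach in the previous answer: a single slow file stalls its entire batch of k.

```shell
#!/bin/bash
# Batch strategy: launch up to k children, then wait for the whole batch.
k=3
dir=$(mktemp -d)
for i in 1 2 3 4 5 6 7; do echo "x" > "$dir/in$i"; done

cnt=0
for f1 in "$dir"/in*; do
    if [ "$cnt" -ge "$k" ]; then
        wait                 # batch full: wait for all k children
        cnt=0
    fi
    ( basename "$f1" >> "$dir/batch.log" ) &   # stand-in for ./script1 "$f1"
    cnt=$((cnt + 1))
done
wait
echo "$(wc -l < "$dir/batch.log") files processed"
```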

+1

Source: https://habr.com/ru/post/1488879/

