How to (trivially) parallelize with the Linux shell by starting one task per core?

Today's processors usually contain several physical cores. They can also be multithreaded, so the Linux kernel sees a fairly large number of cores and accordingly runs the scheduler several times (once per core). When several tasks run on a Linux system, the scheduler usually distributes the total workload well across all the cores the kernel sees (some of which may belong to the same physical core).

Now, let's say I have a large number of files to process with the same executable. I usually do this with the find command:

find <path> <option> <exec> 

However, this launches only one task at a time and waits for it to complete before the next one begins. Thus only one core is in use at any time, which leaves most cores idle (if this find command is the only task running on the system). It would be much better to run N tasks at the same time, where N is the number of cores seen by the Linux kernel.

Is there a command that would do this?

3 answers

Use find with the -print0 option and pipe its output to xargs with the -0 option. xargs also accepts the -P option to run multiple processes in parallel. -P should be used in combination with -n or -L.

Read man xargs for more details.

Command example: find . -print0 | xargs -0 -P4 -n4 grep searchstring
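To match the number of processes to the number of cores, as the question asks, -P can take its value from nproc. The following is only a sketch: the scratch directory, the eight dummy files, and the use of wc -l as the per-file job are placeholders invented for illustration, and nproc is assumed to be available (it ships with GNU coreutils), with getconf as a fallback.

```shell
#!/bin/sh
# Sketch: process every file in a scratch directory, one process per core.
# The directory, the dummy files and the wc -l job are illustrative only.
workdir=$(mktemp -d)
for i in 1 2 3 4 5 6 7 8; do
    printf 'line %s\n' "$i" > "$workdir/file$i.txt"
done

# Number of processing units available; fall back to getconf without nproc.
ncores=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN)

# -print0 / -0 keep unusual file names intact, -P runs up to $ncores
# processes at once, and -n 1 hands each process a single file.
count=$(find "$workdir" -type f -print0 \
    | xargs -0 -P "$ncores" -n 1 wc -l \
    | wc -l)
echo "processed $count files on up to $ncores cores"

rm -rf "$workdir"
```

A larger -n reduces process start-up overhead when each file is cheap to process; -n 1 gives the most even distribution when files differ a lot in size.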


If you have GNU Parallel http://www.gnu.org/software/parallel/ installed, you can do this:

 find | parallel do stuff {} --option_a\; do more stuff {} 

You can install GNU Parallel simply:

 wget http://git.savannah.gnu.org/cgit/parallel.git/plain/src/parallel
 chmod 755 parallel
 cp parallel sem

Watch the intro videos for GNU Parallel to learn more: https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1
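As a concrete sketch of the same idea, the script below feeds NUL-separated file names from find into GNU Parallel. The scratch directory, the dummy files, and wc -l as the job are placeholders invented for illustration. The script checks that the parallel on the PATH really is GNU Parallel and does nothing otherwise.

```shell
#!/bin/sh
# Sketch only: the scratch files and the wc -l job are placeholders.
workdir=$(mktemp -d)
for i in 1 2 3 4 5 6 7 8; do
    printf 'line %s\n' "$i" > "$workdir/file$i.txt"
done

if parallel --version 2>/dev/null | grep -q 'GNU parallel'; then
    # -0 reads NUL-separated names (matching find -print0); {} is replaced
    # by each file name. With no -j option, GNU Parallel picks the number
    # of simultaneous jobs from the number of CPUs it detects.
    count=$(find "$workdir" -type f -print0 | parallel -0 wc -l {} | wc -l)
    echo "processed $count files"
else
    count=skipped
    echo "GNU Parallel not found; nothing was run"
fi

rm -rf "$workdir"
```

Because the default job count already follows the number of CPUs, no explicit core count is needed here; -j N can be added to override it.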


GNU Parallel or xargs -P is probably the best way to handle this, but you can also write a sort of multitasking framework in bash. It is a bit messy and unreliable, however, because the shell lacks certain features.

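 #!/bin/bash
 # gj uses ${1//.../}, a bash extension, so this needs bash rather than plain sh.
 MAXJOBS=3   # maximum number of jobs running at once
 CJ=0        # current number of running jobs
 SJ=""       # PID of the sleeper job we block on while the queue is full

 # Remove '[', ']' and '-' from the first argument (e.g. "[1]-" becomes "1").
 gj() {
     echo ${1//[][-]/}
 }

 # SIGCHLD handler: reap a finished job and wake up startj.
 endj() {
     trap "" sigchld
     ej=$(gj $(jobs | grep Done))
     jobs %$ej
     wait %$ej
     CJ=$(( $CJ - 1 ))
     if [ -n "$SJ" ]; then
         kill $SJ
         SJ=""
     fi
 }

 # Start a job in the background, waiting first if MAXJOBS are already running.
 startj() {
     j=$*
     while [ $CJ -ge $MAXJOBS ]; do
         sleep 1000 &
         SJ=$!
         echo too many jobs running: $CJ
         echo waiting for sleeper job [$SJ]
         trap endj sigchld
         wait $SJ 2>/dev/null
     done
     CJ=$(( $CJ + 1 ))
     echo $CJ jobs running. starting: $j
     eval "$j &"
 }

 set -m   # enable job control

 # test
 startj sleep 2
 startj sleep 10
 startj sleep 1
 startj sleep 1
 startj sleep 1
 startj sleep 1
 startj sleep 1
 startj sleep 1
 startj sleep 2
 startj sleep 10
 wait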
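A shorter variant of the same job-limiting idea is possible if one assumes bash 4.3 or newer, where the wait builtin accepts -n to block until any one background job finishes. This is a different technique from the SIGCHLD trap above, not a rewrite of it. The job body (a short sleep and a line appended to a temporary file) and the variable names are placeholders invented for illustration.

```shell
#!/bin/bash
# Sketch: cap the number of concurrent background jobs with `wait -n`.
# Assumes bash >= 4.3; the job body below is a placeholder.
MAXJOBS=3
running=0
results=$(mktemp)

for i in 1 2 3 4 5 6 7 8; do
    if [ "$running" -ge "$MAXJOBS" ]; then
        wait -n                      # block until any one job has finished
        running=$((running - 1))
    fi
    ( sleep 0.1; echo "job $i done" >> "$results" ) &
    running=$((running + 1))
done

wait                                 # wait for the remaining jobs
finished=$(wc -l < "$results")
echo "finished $finished jobs"
rm -f "$results"
```

This keeps the bookkeeping in one loop and avoids signal handlers, at the cost of requiring a reasonably recent bash.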

Source: https://habr.com/ru/post/906775/

