What is a simple mechanism for synchronous process pooling on Unix?

I need to limit the number of processes running in parallel. For example, I would like to be able to execute this pseudo-command line:

    export POOL_PARALLELISM=4
    for i in `seq 100` ; do
        pool foo -bar &
    done
    pool foo -bar # would not complete until the first 100 finished

Thus, even though 101 foos are queued to run, only 4 will be executing at any given time. pool would fork()/exit(), leaving the remaining queued processes to run to completion.
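In other words, the behavior I am after could be sketched in plain bash along these lines (a rough, untested sketch: it assumes bash 4.3+ for wait -n, and foo is the hypothetical command from above):

    # Keep at most $POOL_PARALLELISM jobs running at once.
    POOL_PARALLELISM=${POOL_PARALLELISM:-4}
    for i in `seq 100` ; do
        # If the pool is full, block until any one job finishes (bash 4.3+).
        while [ "$(jobs -rp | wc -l)" -ge "$POOL_PARALLELISM" ] ; do
            wait -n
        done
        foo -bar &
    done
    wait # block until everything queued has completed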

Is there a simple mechanism for this among the standard Unix tools? at and batch don't apply, since they generally fire at the top of the minute and run tasks sequentially. Using a queue is not necessarily best, because I want these to run synchronously.

Before I write a C program using semaphores and shared memory, and then debug the deadlocks I will surely introduce, can anyone recommend a bash/shell or other existing tool for this?

1 answer

There is no need to write this tool yourself; there are several good options.

make

make can do this quite easily, but it relies heavily on files to drive the process. (If you want to run some operation on every input file that produces an output file, it can be a great fit.) The -j command-line option will run up to the specified number of tasks at once, and the -l option specifies a load average above which no new tasks will be started. (Which may be nice if you want to do some work "in the background". Don't forget about the nice(1) command, which can also help here.)

So, a quick (and untested) Makefile for image conversion:

    ALL=$(patsubst cimg%.jpg,thumb_cimg%.jpg,$(wildcard *.jpg))

    .PHONY: all
    all: $(ALL)

    # note: the convert line must be indented with a tab
    thumb_cimg%.jpg: cimg%.jpg
    	convert $< -resize 100x100 $@

If you run this with make, it will convert the images one at a time. If you run it with make -j8, it will run eight at once. If you run make -j, it will start all of them at once. (When compiling source code, I find that somewhat more jobs than cores is a great starting point; that gives each CPU something to do while it waits on disk I/O. Different machines and different loads may want different settings.)
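For instance, combining the two flags described above (both are standard GNU make options; the numbers are only illustrative):

    # Run up to 4 conversions at once, but only start new ones
    # while the system load average is below 2:
    make -j4 -l2

    # Run the whole batch at reduced priority:
    nice make -j4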

xargs

xargs provides the --max-procs command-line option. This is best if the parallel processes are driven off a single input stream, with the inputs separated either by ASCII NUL or by newlines. (Well, the -d option lets you pick something else, but those two are common and easy.) This gives you the benefit of using the powerful find(1) file-selection syntax rather than writing funny expressions like the Makefile example above, or lets your input be completely unrelated to files. (Consider a program for factoring large composite numbers into prime factors: fitting that task into make would be awkward at best, while xargs could do it with ease; see the sketch after the next example.)

The previous example might look something like this:

    # -printf '%P\0' strips the leading "./" so that thumb_{} forms a valid filename
    find . -name '*jpg' -printf '%P\0' | xargs -0 --max-procs=16 -I {} convert {} -resize 100x100 thumb_{}
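And the factoring case mentioned above might be sketched like this (untested; factor(1) is from GNU coreutils, and the number range is arbitrary):

    # Four factoring processes at a time, driven by numbers rather than files.
    seq 100000000000 100000000100 | xargs --max-procs=4 -n 1 factor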

parallel

The moreutils package (available at least on Ubuntu) provides the parallel command. It can be used in two different ways: either running the specified command for each of several arguments, or running several different commands in parallel. The previous example might look like this:

 parallel -i -j 16 convert {} -resize 100x100 thumb_{} -- *.jpg 
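The other mode, running unrelated commands in parallel, might look something like this (a small sketch; the three commands are just placeholders):

    # Run three independent commands, at most two at a time.
    parallel -j 2 -- "gzip -9 big.log" "convert a.jpg -resize 100x100 thumb_a.jpg" "sync"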

beanstalkd

The beanstalkd program takes a completely different approach: it provides a message bus on which requests are posted, and job servers block on the queue, take a job, perform it, and then go back to waiting for a new job. If you want to write data back to the specific HTTP request that initiated the job, this may not be a good fit, since you have to provide that mechanism yourself (perhaps another "tube" on the beanstalkd server); but if the end result is submitting data to a database, or email, or something similarly asynchronous, it may be the easiest to integrate into your existing application.
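To give a flavor of the queue, its text protocol can even be poked at by hand (a rough sketch only: it assumes a beanstalkd server on the default port 11300, and a real producer or worker would use a client library):

    # Enqueue a job: put <priority> <delay> <ttr> <byte-count>, then the body.
    # "foo -bar 42" is 11 bytes; the server replies "INSERTED <id>".
    # (nc variants differ; netcat-traditional needs -q1 to wait for the reply.)
    printf 'put 0 0 60 11\r\nfoo -bar 42\r\n' | nc -q1 localhost 11300

    # A worker would then issue "reserve", run the job it receives,
    # and acknowledge it with "delete <id>".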


Source: https://habr.com/ru/post/1398711/

