It is just a matter of timing. Windows needs to spawn 4 processes for the Pool, which then need to be started up, initialized, and made ready to consume from the Queue. On Windows, this requires each child process to re-import the __main__ module, and it requires the Queue instances used internally by the Pool to be unpickled in each child. This takes a non-trivial amount of time. Long enough, in fact, that both of your map_async() calls have finished executing before all the processes in the Pool are even up and running. You can see this if you add some tracing to the function run by each worker in the Pool:
while maxtasks is None or (maxtasks and completed < maxtasks):
    try:
        print("getting {}".format(current_process()))
        task = get()
        print("got {}".format(current_process()))
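For context, the output below assumes a driver script roughly like the following; the function f and its inputs are my reconstruction from the printed results ([121] and [100] and the "process id" lines), not necessarily the exact original code:

import os
from multiprocessing import Pool

def f(x):
    print("process id = {}".format(os.getpid()))
    return x * x

if __name__ == '__main__':
    pool = Pool(processes=4)
    result = pool.map_async(f, (11,))   # first job
    result1 = pool.map_async(f, (10,))  # second job
    print("result = {}".format(result.get()))
    print("result1 = {}".format(result1.get()))
    pool.close()
    pool.join()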
Output:
getting <ForkServerProcess(ForkServerPoolWorker-1, started daemon)>
got <ForkServerProcess(ForkServerPoolWorker-1, started daemon)>
process id = 5145
getting <ForkServerProcess(ForkServerPoolWorker-1, started daemon)>
got <ForkServerProcess(ForkServerPoolWorker-1, started daemon)>
process id = 5145
getting <ForkServerProcess(ForkServerPoolWorker-1, started daemon)>
result = [121]
result1 = [100]
getting <ForkServerProcess(ForkServerPoolWorker-2, started daemon)>
getting <ForkServerProcess(ForkServerPoolWorker-3, started daemon)>
getting <ForkServerProcess(ForkServerPoolWorker-4, started daemon)>
got <ForkServerProcess(ForkServerPoolWorker-1, started daemon)>
As you can see, Worker-1 starts up and consumes both tasks before workers 2-4 ever try to consume from the Queue. If you add a sleep call after creating the Pool instance in the main process, but before calling map_async, you will see that different processes handle each request:
getting <ForkServerProcess(ForkServerPoolWorker-1, started daemon)>
getting <ForkServerProcess(ForkServerPoolWorker-2, started daemon)>
getting <ForkServerProcess(ForkServerPoolWorker-3, started daemon)>
getting <ForkServerProcess(ForkServerPoolWorker-4, started daemon)>
# <sleeping here>
got <ForkServerProcess(ForkServerPoolWorker-1, started daemon)>
process id = 5183
got <ForkServerProcess(ForkServerPoolWorker-2, started daemon)>
process id = 5184
getting <ForkServerProcess(ForkServerPoolWorker-1, started daemon)>
getting <ForkServerProcess(ForkServerPoolWorker-2, started daemon)>
result = [121]
result1 = [100]
got <ForkServerProcess(ForkServerPoolWorker-3, started daemon)>
got <ForkServerProcess(ForkServerPoolWorker-4, started daemon)>
got <ForkServerProcess(ForkServerPoolWorker-1, started daemon)>
got <ForkServerProcess(ForkServerPoolWorker-2, started daemon)>
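For reference, the only change relative to the sketch above is a pause after creating the Pool; something along these lines (the two-second duration is an arbitrary choice, anything long enough for the workers to finish starting will do):

import os
import time
from multiprocessing import Pool

def f(x):
    print("process id = {}".format(os.getpid()))
    return x * x

if __name__ == '__main__':
    pool = Pool(processes=4)
    time.sleep(2)  # let all 4 workers start and block on the task Queue
    result = pool.map_async(f, (11,))
    result1 = pool.map_async(f, (10,))
    print("result = {}".format(result.get()))
    print("result1 = {}".format(result1.get()))
    pool.close()
    pool.join()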
(Note that the extra "getting"/"got" statements you see come from the sentinels that are sent to each worker to shut it down gracefully.)
Using Python 3.x on Linux, I can reproduce this behavior with the 'spawn' and 'forkserver' start methods, but not with 'fork'. Presumably because forking the child processes is much faster than spawning them and re-importing __main__.
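If you want to try this yourself, the start method can be selected explicitly via multiprocessing.get_context(); a minimal sketch, reusing the same squaring function as above:

import os
import multiprocessing as mp

def f(x):
    print("process id = {}".format(os.getpid()))
    return x * x

if __name__ == '__main__':
    # Compare 'fork', 'spawn', and 'forkserver': with 'fork' the workers are
    # ready almost immediately, so the two jobs tend to land on different
    # workers; with 'spawn'/'forkserver' the first worker to come up often
    # consumes both tasks.
    ctx = mp.get_context('forkserver')
    with ctx.Pool(processes=4) as pool:
        result = pool.map_async(f, (11,))
        result1 = pool.map_async(f, (10,))
        print("result = {}".format(result.get()))
        print("result1 = {}".format(result1.get()))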