I tried to map some code using concurrent.futures.ProcessPoolExecutor
, but I had weird deadlocks that didn't occur with ThreadPoolExecutor
. Minimal example:
from concurrent import futures def test(): pass with futures.ProcessPoolExecutor(4) as executor: for i in range(100): print('submitting {}'.format(i)) executor.submit(test)
In python 3.2.2 (on 64-bit Ubuntu) this seems to freeze all the time after sending all jobs - and this seems to happen whenever the number of jobs is greater than the number of workers. If I replaced ProcessPoolExecutor
with ThreadPoolExecutor
, it works flawlessly.
As an exploration attempt, I gave each future a callback to print the value of i
:
from concurrent import futures def test(): pass with futures.ProcessPoolExecutor(4) as executor: for i in range(100): print('submitting {}'.format(i)) future = executor.submit(test) def callback(f): print('callback {}'.format(i)) future.add_done_callback(callback)
This confused me even more - the value i
printed by callback
is the value at the time it was called, and not at the time it was defined (so I never see callback 0
, but I get a lot of callback 99
s). Again, ThreadPoolExecutor
displays the expected value.
Wondering if this could be a bug, I tried the latest version of python. Now the code at least ends, but I still get the wrong value of i
.
So can someone explain:
what happened to the ProcessPoolExecutor
between python 3.2 and the current version of dev, which apparently fixed this deadlock
why the "wrong" value i
is printed
EDIT: as Yukevich pointed out, of course, printing i
will print the value during the callback call, I donβt know what I thought ... if I pass the called object with the value of of i
as one of its attributes, it works as expected.
EDIT: a bit more info: all callbacks are executed, so it looks like executor.shutdown
(called by executor.__exit__
), which cannot say that the processes are finished. This seems to be fully fixed in current python 3.3, but there seems to have been a lot of changes to multiprocessing
and concurrent.futures
, so I don't know what is fixed. Since I cannot use 3.3 (it does not seem to be compatible with either the version or the numpy version), I tried just copying the multiprocessor and parallel packages to my 3.2 installation, which seems to work fine. However, it seems a little strange that, as far as I see, ProcessPoolExecutor
completely broken in the latest version, but no one was hurt.