This means that pools must be initialized after the functions that will run on them have been defined. Using pools inside if __name__ == "__main__": blocks works if you are writing a stand-alone script, but that is not an option in large code bases or in server code (for example, a Django or Flask project). So if you are trying to use pools in one of those, make sure to follow these guidelines, which are explained in the sections below:
- Initialize the pools at the bottom of the modules or inside the functions.
- Do not call pool methods in the global scope of the module.
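As a rough sketch of those two guidelines, the pool below is created inside a function, and nothing touches a pool at import time. The names square, run_pool, pool_factory and workers are hypothetical, chosen only for illustration; pool_factory is an assumption that lets any class with the same interface as Pool be passed in.

```python
from multiprocessing import Pool
from multiprocessing.pool import ThreadPool


def square(x):
    # Defined at module level, before any pool exists,
    # so the pool workers can find it.
    return x * x


def run_pool(values, pool_factory=Pool, workers=2):
    """Create the pool inside the function, not at import time."""
    # The pool lives only for the duration of this call.
    with pool_factory(workers) as pool:
        return pool.map(square, values)
```

Calling run_pool(range(5)) from a request handler, or from under an if __name__ == "__main__": guard in a script, should return [0, 1, 4, 9, 16]. Passing pool_factory=ThreadPool runs the same code on threads instead of processes.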
Alternatively, if you only need better I/O concurrency (for example, database access or network calls), you can save yourself all this headache and use thread pools instead of process pools. This relies on the sparsely documented:
from multiprocessing.pool import ThreadPool
Its interface is exactly the same as Pool's, but because it uses threads rather than processes, none of the caveats of process pools apply. The one exception is that you do not get true parallel execution of Python code, only concurrency while waiting on blocking input/output.
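A minimal sketch of what that might look like for I/O-bound work. Here fetch and fetch_all are hypothetical names, and time.sleep stands in for a real blocking call such as a database query or HTTP request.

```python
import time
from multiprocessing.pool import ThreadPool


def fetch(item_id):
    """Hypothetical stand-in for a blocking database or network call."""
    time.sleep(0.1)  # simulate waiting on I/O
    return item_id * 2


def fetch_all(item_ids, workers=4):
    # Same map() interface as multiprocessing.Pool, but backed by threads,
    # so nothing has to be pickled or imported by a separate process.
    with ThreadPool(workers) as pool:
        return pool.map(fetch, item_ids)
```

Because the workers are threads, fetch could just as well be defined after the pool is created, or be a lambda or a nested function.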
Pools should be initialized after defining the functions that will be executed on them
The opaque wording in the Python documentation means that when the pool is created, the surrounding module is imported by the pool's worker processes. In the Python interactive terminal, this means all of the code you have run so far, and only that code.
Therefore, any function you want to use in the pool must be defined before the pool is initialized. This holds both for code in a module and for code in the terminal. The following modifications to the code in the question will work fine:
    from multiprocessing import Pool

    def f(x):
        return x
Or
    from multiprocessing import Pool

    def f(x):
        print(x)

Well, by "fine" I mean fine on Unix. Windows has its own problems, which I will not go into here.
Module Pool Cautions
But wait, there is more (for using pools in modules that you want to import elsewhere)!
If you define a pool inside a function, you have no problem. But if you use a Pool object as a global variable in a module, it must be defined at the bottom of the file, not at the top. This goes against most good code style, but it is necessary for the code to work. The only way to use a pool declared at the top of the file is to use it solely with functions imported from other modules, for example:
    from multiprocessing import Pool
    from other_module import f

    p = Pool(3)
    p.map(f, range(20))
Importing a pre-configured pool from another module is pretty horrible, because the import has to come after whatever you want to run on it, for example:
    from multiprocessing import Pool

    POOL = Pool(5)

    def f(x):
        ...
Second, if you run anything on a pool in the global scope of a module that gets imported, the program hangs. That is, this does not work:
    # module.py
    from multiprocessing import Pool

    def f(x):
        return x

    p = Pool(1)
    print(p.map(f, range(5)))

    # module2.py
    import module
This, however, works as long as nothing imports module2:
    # module.py
    from multiprocessing import Pool

    def f(x):
        return x

    p = Pool(1)

    def run_pool():
        print(p.map(f, range(5)))

    # module2.py
    import module

    module.run_pool()
The reasons for this are stranger still, and are probably related to the fact that the code in the question raises an AttributeError only once each, after which it appears to execute correctly. It also seems that the pool workers (at least with some reliability) reload the code in the module after execution.