Using python multiprocessing pool in terminal and in code modules for Django or Flask

Question


When using multiprocessing.Pool in Python with the following code, there is some bizarre behavior.

    from multiprocessing import Pool
    p = Pool(3)
    def f(x):
        return x
    threads = [p.apply_async(f, [i]) for i in range(20)]
    for t in threads:
        try:
            print(t.get(timeout=1))
        except Exception:
            pass

I get the following error three times (one for each thread in the pool), and it prints "3" through "19":

 AttributeError: 'module' object has no attribute 'f'

The first three calls to apply_async never return.

Meanwhile, if I try:

    from multiprocessing import Pool
    p = Pool(3)
    def f(x):
        print(x)
    p.map(f, range(20))

I get an AttributeError 3 times, the shell prints "6" through "19", and then it hangs and cannot be killed with Ctrl+C.

The multiprocessing documentation has the following to say:

The functionality in this package requires the main module to be imported by children.

What does it mean?

To clarify, I'm running the code in the terminal to test functionality, but ultimately I want to be able to put it into the modules of a web server. How do you properly use multiprocessing.Pool in the Python terminal and in code modules?

+19

python django flask multiprocessing pool

Zags Sep 22 '13 at 19:27


3 answers

The function that you want to execute in a pool must already be defined when you create the pool.

This should work:

    from multiprocessing import Pool

    def f(x):
        print(x)

    if __name__ == '__main__':
        p = Pool(3)
        p.map(f, range(20))

The reason is that (at least on systems that have fork) when you create a pool, the workers are created by forking the current process. So if the target function is not yet defined at that point, the workers will not be able to call it.

On Windows it is a bit different, since Windows has no fork. There, new worker processes are started and the main module is imported. That is why on Windows it is important to protect the executing code with if __name__ == '__main__'. Otherwise, each new worker re-executes the code and therefore spawns new processes endlessly, crashing the program (or the system).
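A minimal sketch of that guard, written so the same file behaves the same way under either start method. The names square and run are hypothetical, and get_context() is used here only as an assumption about how one might reproduce the Windows-style behavior on Unix by passing "spawn":

```python
import multiprocessing as mp


def square(x):
    # Defined at module top level so that worker processes can find it by name.
    return x * x


def run(start_method=None):
    # get_context() returns an object with the same API as the multiprocessing
    # module, bound to one start method. None selects the platform default;
    # "spawn" starts fresh interpreters, which is what Windows always does.
    ctx = mp.get_context(start_method)
    with ctx.Pool(3) as pool:
        return pool.map(square, range(5))


if __name__ == "__main__":
    # Workers started with "spawn" import this file again. The guard keeps
    # them from creating pools of their own when they do.
    print(run())
```

Keeping the pool creation inside a function and calling it only from the guarded block means that importing the file, which is what a spawned worker does, has no side effects.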

+4

mata Sep 22 '13 at 19:36


There is another possible source of this error. I got this error while running the sample code.

The cause was that, although multiprocessing was correctly installed, the C++ compiler was not installed on my system, something I was informed of when trying to update multiprocessing. So it may be worth checking that the compiler is installed.

0

ic_fl2 Nov 30 '16 at 14:06


Zags · Accepted Answer · 2013-09-23 15:10

This means that the pools must be initialized after defining the functions that will be performed on them. Using pools inside if __name__ == "__main__": blocks works if you are writing a stand-alone script, but this is not possible in large code bases or in server code (for example, in a Django or Flask project). Therefore, if you are trying to use pools in one of them, be sure to follow these recommendations, which are described in the following sections:

Initialize the pools at the bottom of the modules or inside the functions.
Do not call pool methods in the global scope of the module.

Alternatively, if you only need better I/O concurrency (for example, database access or network calls), you can save yourself all of this headache and use thread pools instead of process pools. This involves a class that, at the time of writing, was completely undocumented:

 from multiprocessing.pool import ThreadPool

Its interface is exactly the same as that of Pool, but since it uses threads and not processes, it comes with none of the caveats that process pools have. The only downside is that you do not get true parallelism of code execution, just concurrency in blocking I/O.
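A short sketch of the thread-pool variant. The function slow_double and its sleep are hypothetical stand-ins for a blocking call; the pool is created before the function on purpose, to show that the ordering constraint of process pools does not apply:

```python
from multiprocessing.pool import ThreadPool
import time

# The pool is deliberately created BEFORE the function is defined.
pool = ThreadPool(3)


def slow_double(x):
    # time.sleep() stands in for a blocking call such as a database query
    # or an HTTP request (hypothetical workload).
    time.sleep(0.05)
    return 2 * x


# Threads share the parent's memory, so the function is handed over as an
# ordinary object and does not need to be importable by name.
results = pool.map(slow_double, range(6))
print(results)

pool.close()
pool.join()
```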

Pools should be initialized after defining the functions that will be executed on them

The inscrutable text from the Python documentation means that when a pool is defined, the surrounding module is imported by the worker processes in the pool. In the case of the Python terminal, this means all and only the code you have run so far.

So any functions you want to use in the pool must be defined before the pool is initialized. This is true both of code in a module and of code in the terminal. The following modifications of the code in the question will work fine:

    from multiprocessing import Pool

    def f(x): return x  # FIRST
    p = Pool(3)  # SECOND
    threads = [p.apply_async(f, [i]) for i in range(20)]
    for t in threads:
        try:
            print(t.get(timeout=1))
        except Exception:
            pass

Or

    from multiprocessing import Pool

    def f(x): print(x)  # FIRST
    p = Pool(3)  # SECOND
    p.map(f, range(20))

By "fine" I mean fine on Unix. Windows has its own problems, which I will not go into here.
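One way to see why the worker needs to find the function by name: multiprocessing sends tasks to workers with pickle, and pickle stores a function as a reference (module name plus function name), not as code. The sketch below only illustrates that mechanism; is_picklable is a hypothetical helper, and json.dumps is used simply because it is a function that lives in an importable module:

```python
import json
import pickle


def is_picklable(obj):
    """Return True if pickle can serialize obj."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError):
        return False


# A module-level function is pickled as a reference: module name + function name.
payload = pickle.dumps(json.dumps)
print(b"json" in payload, b"dumps" in payload)

# Unpickling looks that name up again on the receiving side. If the name does
# not exist there, the lookup fails with an error like the one in the question.
print(pickle.loads(payload) is json.dumps)

# A lambda has no importable name, so it cannot be sent to a process pool.
print(is_picklable(lambda x: x))
```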

Module Pool Cautions

But wait, there's more (for using pools in modules that you want to import elsewhere)!

If you define a pool inside a function, you have no problems. But if you use a Pool object as a global variable in a module, it must be defined at the bottom of the module, not the top. Although this goes against most good code style, it is necessary for functionality. The way to use a pool declared at the top of a module is to use it only with functions imported from other modules, like so:

    from multiprocessing import Pool
    from other_module import f
    p = Pool(3)
    p.map(f, range(20))
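For server code, one possible way to follow the "inside a function" advice is to create the pool lazily on first use. This is only a sketch under that assumption: get_pool, work, handle_request and the _pool cache are hypothetical names, not part of multiprocessing, Django or Flask.

```python
from multiprocessing import Pool

_pool = None  # hypothetical module-level cache; nothing is created at import time


def work(x):
    # Defined at module top level, before any pool exists.
    return x + 1


def get_pool(processes=3):
    # Create the pool on first call, by which time the whole module
    # (including work) has been defined.
    global _pool
    if _pool is None:
        _pool = Pool(processes)
    return _pool


def handle_request(values):
    # Could be called from a view function; the pool is only touched here,
    # never in the global scope of the module.
    return get_pool().map(work, values)


if __name__ == "__main__":
    print(handle_request(range(5)))
    get_pool().close()
    get_pool().join()
```

Because importing the module neither creates a pool nor runs anything on one, the position of the import relative to other definitions stops mattering.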

Importing a pre-configured pool from another module is pretty horrendous, since the import must come after whatever you want to run on it, like so:

    ### module.py ###
    from multiprocessing import Pool
    POOL = Pool(5)

    ### module2.py ###
    def f(x):
        ...  # Some function
    from module import POOL
    POOL.map(f, range(10))

Second, if you run anything on a pool in the global scope of a module that you are importing, the system hangs. That is, this does not work:

    ### module.py ###
    from multiprocessing import Pool
    def f(x): return x
    p = Pool(1)
    print(p.map(f, range(5)))

    ### module2.py ###
    import module

This, however, works as long as nothing imports module2:

    ### module.py ###
    from multiprocessing import Pool
    def f(x): return x
    p = Pool(1)
    def run_pool():
        print(p.map(f, range(5)))

    ### module2.py ###
    import module
    module.run_pool()

Now, the reasons behind this are even more bizarre, and probably related to why the code in the question raises an AttributeError only once per worker and after that appears to execute correctly. It also seems that pool workers (at least with some reliability) reload the code in the module after executing.
