Multiprocessing pool hangs and the application cannot exit

I'm sure this is a rookie mistake, but I can't figure out what I'm doing wrong with multiprocessing. I have this code, which just sits and does nothing:

    from multiprocessing import Pool

    if __name__ == '__main__':
        data = [1, 2, 3, 4, 5]
        pool = Pool(processes=4)
        for i, x in enumerate(data):
            pool.apply_async(new_awesome_function, (i, x))
        pool.close()
        pool.join()

data is a list ([1,2,3,4,5]), and I'm trying to send each item of the list off to be executed across several processes. But when I wrap my working command in a function and submit it with the code above, nothing happens (when I call the function directly, without the code above, it works fine). So I think I'm using multiprocessing incorrectly, even though I took this from examples on various sites. Any suggestions?

Update: I noticed that when it hangs I can't even break out of it with Ctrl-C, which always works to kill my buggy programs. I looked at Python 2.5 multiprocessing Pool, tried to follow the recommendations there, and added the import inside the if statement, but no luck.

Update 2: Sorry, I realized that, as the answer below shows, the command does work; it just doesn't seem to let the program finish, or let me exit it.

+4
3 answers

I don't know which database you're using, but most likely you can't share database connections between your processes.

Linux uses fork(), which gives the subprocess a copy of everything in the parent's memory when it starts. However, things like sockets, open files, and database connections won't work properly in the child unless they were specifically designed to.

On Windows, fork() isn't available, so multiprocessing has to restart your script to launch each subprocess. In your case that would be very bad, because everything would run all over again; that is exactly what the if __name__ == '__main__': bit prevents.

You should be able to re-open a database connection inside new_awesome_function and interact with the database from there.
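For example, here is a minimal sketch of that per-worker reconnect, assuming a sqlite3 file database and a hypothetical results table (sqlite3 is just a stand-in; any driver with a connect() call works the same way):

    import sqlite3
    from multiprocessing import Pool

    DB_PATH = 'mydata.db'  # hypothetical path; point this at your real database

    def new_awesome_function(i, x):
        # Open a fresh connection inside the worker instead of
        # inheriting one from the parent across fork().
        conn = sqlite3.connect(DB_PATH)
        try:
            conn.execute('INSERT INTO results (idx, value) VALUES (?, ?)', (i, x))
            conn.commit()
        finally:
            conn.close()

    if __name__ == '__main__':
        data = [1, 2, 3, 4, 5]
        pool = Pool(processes=4)
        for i, x in enumerate(data):
            pool.apply_async(new_awesome_function, (i, x))
        pool.close()
        pool.join()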

Truthfully, though, you won't get much speed out of this; in fact I'd expect it to be slower. Databases are really, really slow, so your processes will spend most of their time waiting on the database. Now you'd just have several processes waiting on the database instead of one, and that really won't improve the situation.

Databases are for storing things. While you're still doing the processing, you should do it inside your code, before the database gets involved. You're basically using the database as a set, and your code would be much nicer using a Python set. If you really need to put this stuff in a database, do it at the end of your program.
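A minimal sketch of that idea (the doubling function and the final comment are placeholders for whatever your real processing and storage are):

    from multiprocessing import Pool

    def new_awesome_function(i, x):
        return x * 2  # stand-in for your real processing

    if __name__ == '__main__':
        data = [1, 2, 3, 4, 5]
        pool = Pool(processes=4)
        async_results = [pool.apply_async(new_awesome_function, (i, x))
                         for i, x in enumerate(data)]
        pool.close()
        pool.join()
        # Collect results into a plain Python set during processing...
        seen = {r.get() for r in async_results}
        # ...and write them to the database in one pass at the very end.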

+2

Your code works for me:

    from multiprocessing import Pool
    import time

    def new_awesome_function(a, b):
        print(a, b, 'start')
        time.sleep(1)
        print(a, b, 'end')

    if __name__ == '__main__':
        data = [1, 2, 3, 4, 5]
        pool = Pool(processes=4)
        for i, x in enumerate(data):
            pool.apply_async(new_awesome_function, (i, x))
        pool.close()
        pool.join()

gave me:

    0 1 start
    1 2 start
    2 3 start
    3 4 start
    1 2 end
    0 1 end
    4 5 start
    2 3 end
    3 4 end
    4 5 end

Why do you think this is not working?


Edit: Try running this and look at the output:

    from multiprocessing import Pool
    import time

    def new_awesome_function(a, b):
        print(a, b, 'start')
        time.sleep(1)
        print(a, b, 'end')
        return a + b

    if __name__ == '__main__':
        data = [1, 2, 3, 4, 5]
        pool = Pool(processes=4)
        results = []
        for i, x in enumerate(data):
            r = pool.apply_async(new_awesome_function, (i, x))
            results.append((i, r))
        pool.close()
        # Poll the AsyncResult objects until every task has finished.
        already = []
        while len(already) < len(data):
            for i, r in results:
                if r.ready() and i not in already:
                    already.append(i)
                    print(i, 'is ready!')
        pool.join()

My output:

    0 1 start
    1 2 start
    2 3 start
    3 4 start
    0 1 end
    4 5 start
    1 2 end
    2 3 end
    0 is ready!
    3 4 end
    1 is ready!
    2 is ready!
    3 is ready!
    4 5 end
    4 is ready!
+2

Multiprocessing doesn't share your program's state.

You are probably doing something like this:

    data = {}

    def new_awesome_function(a, b):
        data[a] = b

After the script runs, data will not have changed. That's because multiprocessing runs copies of your program: your function does run, but it runs in those copies, so it doesn't affect your original program.

To use multiprocessing, you need to explicitly pass information from one process to another. With threading everything is shared, but with multiprocessing nothing is shared unless you explicitly share it.

The easiest way is to use return values:

    def new_awesome_function(a, b):
        return a + b

    result = pool.apply_async(new_awesome_function, (1, 2))
    # later...
    value = result.get()

See the Python documentation ( http://docs.python.org/library/multiprocessing.html ) for other mechanisms such as queues, pipes, and managers. What you cannot do is mutate the state of your program and expect that to work.
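For instance, here's a minimal sketch (my own, not from the docs page) using a Manager dict, which makes the broken data[a] = b pattern from above actually work by routing the writes through a shared proxy:

    from multiprocessing import Pool, Manager

    def new_awesome_function(shared, a, b):
        # Writes go through the manager process, so the parent
        # sees them after the workers finish.
        shared[a] = b

    if __name__ == '__main__':
        manager = Manager()
        shared = manager.dict()  # a proxy object all processes can mutate
        pool = Pool(processes=4)
        for i, x in enumerate([1, 2, 3, 4, 5]):
            pool.apply_async(new_awesome_function, (shared, i, x))
        pool.close()
        pool.join()
        print(dict(shared))  # {0: 1, 1: 2, 2: 3, 3: 4, 4: 5}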

+2
