Multiprocessing: more processes than cpu_count()

Note : I "fell asleep" in the ground multiprocessing 2 days ago. Therefore, my understanding is very simple.

I am writing an application that uploads files to and downloads them from Amazon S3 buckets. For larger files (over 100 MB) I have implemented parallel downloads using Pool from the multiprocessing module. My machine is a Core i7 with a cpu_count of 8. I was under the impression that if I do pool = Pool(processes = 6), six cores are used: the file is downloaded in parts, and the downloads of the first six parts start at the same time.

To see what happens when processes is larger than cpu_count, I passed 20 (implying that I want to use 20 cores). To my surprise, instead of getting a block of errors, the program started downloading 20 parts at a time (I used a smaller chunk size to make sure there were plenty of parts). I don't understand this behaviour. I only have 8 cores, so how can the program accept 20 as input? When I say processes = 6, does it actually use 6 threads? That would be the only explanation for 20 being valid input, since there can easily be 1000 threads. Can someone please explain this to me?
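For illustration, here is a minimal sketch of what I mean (not my actual S3 code; download_part and the one-second sleep just stand in for downloading one chunk):

```python
import multiprocessing
import time

def download_part(part_number):
    # Stand-in for downloading one chunk from S3; the sleep simulates
    # the network wait that dominates a real download.
    time.sleep(1)
    return part_number

if __name__ == "__main__":
    print("cpu_count:", multiprocessing.cpu_count())  # e.g. 8 on a Core i7
    # Asking for 20 worker processes on an 8-core machine is accepted
    # without error; all 20 parts start downloading roughly at once.
    with multiprocessing.Pool(processes=20) as pool:
        parts = pool.map(download_part, range(20))
    print("parts finished:", parts)
```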

Edit:

I "borrowed" the code here . I changed it a bit and I will ask the user to use it for main use instead of setting parallel_processes to 4

1 answer

The number of processes running simultaneously on your computer is not limited by the number of cores. In fact, you probably have hundreds of programs running on your computer right now, each with its own process. To make this work, the OS assigns one of your 8 processors to each process or thread only temporarily; at some point it gets paused and another process takes its place. See What is the difference between concurrent programming and parallel programming? if you want to know more.
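A small sketch to illustrate the point (the count of 30 and the sleep durations are arbitrary): far more child processes than cores can be alive at the same time, because the scheduler, not the core count, decides who runs when.

```python
import multiprocessing
import time

def worker(i):
    # Sleep long enough that all the children exist at the same time.
    time.sleep(2)

if __name__ == "__main__":
    # 30 processes on an 8-core machine: all of them are created and
    # kept runnable; the OS just gives each one short time slices.
    procs = [multiprocessing.Process(target=worker, args=(i,)) for i in range(30)]
    for p in procs:
        p.start()
    time.sleep(0.5)
    print(len(multiprocessing.active_children()), "children alive at once")  # typically 30
    for p in procs:
        p.join()
```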

Edit: Assigning more processes to your download example may or may not help. Reading from disk and sending over the network are blocking operations in Python: a process that is waiting to read or send its part of the data can be suspended so that another process can start its I/O. On the other hand, with too many processes either disk I/O or network I/O becomes the bottleneck, and your program will slow down because of the extra overhead needed to switch between processes.
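A rough way to see this trade-off for yourself (purely illustrative: fake_download sleeps instead of hitting S3, and the exact timings will vary by machine):

```python
import multiprocessing
import time

def fake_download(part):
    # Simulate an I/O-bound chunk download: the worker mostly waits.
    time.sleep(0.5)
    return part

def timed(n_processes, n_parts=40):
    # Measure how long it takes n_processes workers to "download" n_parts.
    start = time.time()
    with multiprocessing.Pool(processes=n_processes) as pool:
        pool.map(fake_download, range(n_parts))
    return time.time() - start

if __name__ == "__main__":
    # More workers shorten the total wait until the real bottleneck
    # (in practice disk or network bandwidth plus the cost of switching
    # between processes) takes over.
    for n in (4, 8, 20, 40):
        print(f"{n:2d} processes: {timed(n):.2f}s")
```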


Source: https://habr.com/ru/post/983887/

