I have a simple main() function that processes a huge amount of data. Since I have an 8-core machine with plenty of RAM, I was asked to use Python's multiprocessing module to speed up the processing. Each subprocess will take about 18 hours.
In short, I doubt that I have correctly understood how the multiprocessing module behaves.
I start the subprocesses roughly like this:
    import multiprocessing
    def main():
        data = huge_amount_of_data()
        pool = multiprocessing.Pool(processes=cpu_cores)  # cpu_cores is 8 on this machine
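A minimal runnable sketch of where this snippet seems to be heading, assuming the data can be split into one chunk per core and each chunk processed independently. huge_amount_of_data, split_into_chunks and process_chunk are hypothetical stand-ins for the real workload, and cpu_cores is taken from os.cpu_count() here rather than hard-coded.

```python
import multiprocessing
import os


def huge_amount_of_data():
    # Hypothetical stand-in for the real data loader.
    return list(range(1_000))


def split_into_chunks(data, n):
    # Split data into n contiguous, nearly equal chunks (one per worker).
    k, r = divmod(len(data), n)
    chunks, start = [], 0
    for i in range(n):
        end = start + k + (1 if i < r else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def process_chunk(chunk):
    # Hypothetical CPU-bound work on one chunk.
    return sum(x * x for x in chunk)


def main():
    cpu_cores = os.cpu_count() or 1
    data = huge_amount_of_data()
    chunks = split_into_chunks(data, cpu_cores)
    # map() blocks until every chunk is processed; the with-block then shuts the pool down.
    with multiprocessing.Pool(processes=cpu_cores) as pool:
        results = pool.map(process_chunk, chunks)
    return results


if __name__ == "__main__":
    print(sum(main()))
```

The `if __name__ == "__main__":` guard matters on platforms that start workers with the spawn method (Windows, and macOS by default), because there each worker re-imports the main module.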
As I understand it, the script itself runs as its own process, the main process, which ends only after all subprocesses have completed. The main process evidently does not need many resources, since it only prepares the data and spawns the subprocesses. Will it also occupy a core for itself? In other words, should I start only 7 subprocesses instead of the 8 I wanted to start above?
The main question: can I create 8 subprocesses and be sure that they will run properly in parallel with each other?
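One way to check this empirically is a small experiment: run 8 CPU-bound tasks in a pool and compare the parent's own CPU time with the wall-clock time. time.process_time() counts only the calling process, not its children. The busy loop in cpu_bound and the run helper are hypothetical stand-ins for the real work. Typically the parent spends most of the run blocked inside pool.map, so its CPU time stays small while the workers keep the cores busy; the actual numbers depend on the machine and the OS scheduler.

```python
import multiprocessing
import os
import time


def cpu_bound(n):
    # Busy loop standing in for the real per-chunk work (hypothetical).
    total = 0
    for i in range(n):
        total += i * i
    # Report which process did the work; the sum itself is discarded here.
    return os.getpid()


def run(workers=8, n=1_000_000):
    wall_start = time.perf_counter()
    cpu_start = time.process_time()  # CPU time of this (parent) process only
    with multiprocessing.Pool(processes=workers) as pool:
        # The parent blocks here until all tasks are done; the work runs in the workers.
        worker_pids = set(pool.map(cpu_bound, [n] * workers))
    parent_cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return parent_cpu, wall, worker_pids


if __name__ == "__main__":
    parent_cpu, wall, pids = run()
    print(f"parent pid {os.getpid()}, worker pids {sorted(pids)}")
    print(f"wall {wall:.2f}s, parent CPU {parent_cpu:.2f}s")
```

If the parent's CPU time is a small fraction of the wall time, it is not competing with the 8 workers in any meaningful way.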
By the way, the subprocesses do not interact with each other in any way: when each one finishes, it writes its results to its own SQLite database file. So even result_storage is handled separately for each subprocess.
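Writing one SQLite file per worker also sidesteps SQLite's database-level write lock, under which concurrent writers to a single file would have to wait for each other. A minimal sketch, assuming a hypothetical result_<index>.sqlite naming scheme and a made-up results table; the real schema and data will differ.

```python
import multiprocessing
import os
import sqlite3
import tempfile


def process_chunk(args):
    # Each task carries its chunk index, its data, and the output directory.
    index, chunk, out_dir = args
    path = os.path.join(out_dir, f"result_{index}.sqlite")  # hypothetical naming scheme
    conn = sqlite3.connect(path)
    try:
        with conn:  # commits on success, rolls back on error (does not close)
            conn.execute("CREATE TABLE IF NOT EXISTS results (value INTEGER, square INTEGER)")
            conn.executemany(
                "INSERT INTO results VALUES (?, ?)",
                ((x, x * x) for x in chunk),
            )
    finally:
        conn.close()
    return path


def main(out_dir, workers=8):
    data = list(range(100))  # stand-in for huge_amount_of_data()
    chunks = [data[i::workers] for i in range(workers)]
    tasks = [(i, chunk, out_dir) for i, chunk in enumerate(chunks)]
    with multiprocessing.Pool(processes=workers) as pool:
        return pool.map(process_chunk, tasks)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for p in main(tmp):
            print(p)
```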
What I want to avoid is creating a process that keeps the others from running at full speed. I need the code to finish in approximately 16 hours, not twice that because I started more processes than there are cores. :-)
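To avoid starting more CPU-bound processes than there are cores, the worker count can be derived from the CPUs actually available. The sketch below assumes tasks of roughly equal length and ignores I/O; usable_cores, plan_workers and estimated_wall_hours are hypothetical helpers. os.sched_getaffinity is only available on some platforms (for example Linux), hence the fallback to os.cpu_count().

```python
import math
import os


def usable_cores():
    # Prefer the CPUs this process is allowed to run on (not available everywhere),
    # otherwise fall back to the total count reported by the OS.
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1


def plan_workers(n_tasks, cores):
    # Never start more CPU-bound workers than cores: extra processes are
    # time-sliced by the OS and every one of them runs slower.
    return max(1, min(n_tasks, cores))


def estimated_wall_hours(n_tasks, hours_per_task, cores):
    # With a pool of plan_workers(...) processes, equal-length tasks run in "waves".
    workers = plan_workers(n_tasks, cores)
    waves = math.ceil(n_tasks / workers)
    return waves * hours_per_task


if __name__ == "__main__":
    cores = usable_cores()
    print(f"usable cores: {cores}, workers for 8 tasks: {plan_workers(8, cores)}")
```

For example, 8 tasks of about 16 hours on 8 cores fit in one wave (about 16 hours), while 16 such tasks would need two waves (about 32 hours).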