Mandatory use of if __name__ == "__main__" in Windows when using multiprocessing

Question


When using multiprocessing in Python on Windows, you are expected to protect the entry point of the program. The documentation says: "Make sure that the main module can be safely imported by a new Python interpreter without causing unintended side effects (such as starting a new process)." Can anyone explain what exactly this means?

+6

python windows multiprocessing

pratheekms Dec 03 '13 at 20:09


2 answers

The multiprocessing module works by creating new Python processes that import your module. If you do not add the __name__ == '__main__' guard, you enter a never-ending loop of new process creation. It goes like this:

Your module is imported and executes code during the import, which causes multiprocessing to spawn 4 new processes.
Those 4 new processes in turn import the module and execute code during the import, which causes multiprocessing to spawn 16 new processes.
Those 16 new processes in turn import the module and execute code during the import, which causes multiprocessing to spawn 64 new processes.
Well, hopefully you get the picture.

So the idea is to make sure that process spawning happens only once. That is achieved most easily with the __name__ == '__main__' guard idiom.
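As a minimal sketch of that idiom (the names square and main are made up for illustration, they are not from the question), the process-creating code lives in a function that is only called under the guard, so a child that re-imports the module merely defines the functions and goes no further:

```python
from multiprocessing import Pool


def square(x):
    """Work done in the child processes (hypothetical example task)."""
    return x * x


def main():
    # Creating the pool is the step that starts new processes, so it must
    # only ever run in the main program, never as a side effect of import.
    with Pool(processes=4) as pool:
        print(pool.map(square, range(8)))


if __name__ == "__main__":
    # True only when this file is run as the program itself. A child that
    # imports the module skips this block and does not spawn again.
    main()
```

Moving the body of main() to module level, without the guard, is the situation described above: on Windows each child would try to create a pool of its own while being imported.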

+3

David Heffernan Dec 03 '13 at 20:16


Tim Peters · Accepted Answer · 2013-12-03T20:29:13+0000

Expanding a bit on the good answer you already got, it helps to understand what Linux-y systems do. They spawn new processes using fork(), which has two good consequences:

All data structures existing in the main program are visible to the child processes. They actually work on copies of the data.
The child processes start executing at the instruction immediately following the fork() in the main program, so any module-level code already executed in the module will not be executed again.

fork() isn't possible on Windows, so on Windows each module is imported anew by each child process. So:

On Windows, no data structures existing in the main program are visible to the child processes; and,
All module-level code is executed in each child process.

So you need to think a bit about which code you want executed only in the main program. The most obvious example is the code that creates child processes: you want it to run only in the main program, so it should be protected by __name__ == '__main__'. For a subtler example, consider code that builds a gigantic list which you intend to pass out to worker processes to crunch. You probably want to protect that too, because there is no point in each worker process wasting RAM and time constructing its own useless copy of the gigantic list.
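A rough sketch of the gigantic-list case follows. The helpers build_big_list and count_multiples and the start_method parameter are invented for illustration. The module-level print is there only to make re-imports visible: with a fork-based start it should appear once, while with "spawn" (the Windows behaviour) you would expect it once per worker as well.

```python
import multiprocessing as mp
import os

# Module-level code. Under "spawn" each child re-imports this file,
# so this line runs again in every worker process.
print(f"module-level code ran in pid {os.getpid()} with __name__={__name__!r}")


def build_big_list(n):
    """Stand-in for an expensive setup step (hypothetical helper)."""
    return list(range(n))


def count_multiples(task):
    """Worker: count how many numbers in the chunk are divisible by k."""
    chunk, k = task
    return sum(1 for x in chunk if x % k == 0)


def main(start_method=None):
    # None selects the platform default; pass "spawn" to mimic Windows.
    ctx = mp.get_context(start_method)
    data = build_big_list(100_000)           # built once, in the parent only
    chunks = [data[i::4] for i in range(4)]  # one slice per worker
    with ctx.Pool(processes=4) as pool:
        counts = pool.map(count_multiples, [(c, 7) for c in chunks])
    print("multiples of 7:", sum(counts))


if __name__ == "__main__":
    main()
```

Had build_big_list(100_000) been called at module level instead of inside main(), every re-importing worker would have built and discarded its own copy before receiving its slice.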

Note that it's a good idea to use __name__ == "__main__" appropriately even on Linux-y systems, because it makes the intended division of work clearer. Parallel programs can be confusing - every little bit helps ;-)
