Expanding a little on the good answer you have already received: it helps if you understand what Linux-y systems do. They spawn new processes using fork(), which has two good consequences:
- All data structures that exist in the main program are visible to child processes. They actually work with copies of data.
- The child processes start executing at the instruction immediately following the fork() in the main program, so any module-level code already executed in the module will not be executed again.
fork() isn't possible on Windows, so on Windows every module is imported anew by each child process. So:
- On Windows, no data structures that exist in the main program are visible to child processes; and,
- All module level code is executed in each child process.
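If you want to see the difference for yourself, here is a minimal sketch. It assumes the file is saved and run as a script; the function names report and available_methods are made up for illustration, and the start method is picked explicitly with multiprocessing.get_context() so that both behaviors can be compared on a system that supports fork. The module-level print should show up once for the parent and once more for the child started with "spawn" (where the module is re-imported under the name __mp_main__), but not for the child started with "fork".

```python
import multiprocessing as mp
import os

# Module-level code: runs once in the parent, and again in every child
# that has to re-import this module (the "spawn" start method).
print(f"module-level code: pid={os.getpid()} __name__={__name__}")


def report(label):
    """Runs in the child process."""
    print(f"child started via {label}: pid={os.getpid()}")


def available_methods():
    """Start methods this platform supports (no "fork" on Windows)."""
    return mp.get_all_start_methods()


if __name__ == "__main__":
    for method in available_methods():
        if method not in ("fork", "spawn"):
            continue  # keep the demo to the two methods discussed here
        ctx = mp.get_context(method)
        child = ctx.Process(target=report, args=(method,))
        child.start()
        child.join()
```

On Windows only the "spawn" half of the loop runs, since "fork" is not offered there.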
So, you need to think a little about which code you want to execute only in the main program. The most obvious example is the code that creates the child processes: you want it to run only in the main program, so it should be protected by __name__ == '__main__'. For a more subtle example, consider code that builds a giant list that you are going to pass out to worker processes to crunch. You probably want to protect this too, because in this case there is no point in making each worker process waste RAM and time building its own useless copy of the giant list.
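A rough sketch of both cases together might look like this. The names scan_chunk, split and giant_list are placeholders, summing stands in for whatever the workers really do, and the pool size of 4 is arbitrary:

```python
import multiprocessing as mp


def scan_chunk(chunk):
    """Work done in a worker process: here, just sum the chunk."""
    return sum(chunk)


def split(items, n_chunks):
    """Split items into at most n_chunks consecutive slices."""
    size = max(1, -(-len(items) // n_chunks))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


if __name__ == "__main__":
    # Built only in the main program; a worker that re-imports this
    # module skips this block and never builds its own copy.
    giant_list = list(range(1_000_000))

    # Creating the child processes is guarded too, so a re-import in a
    # child does not try to start yet more children.
    with mp.Pool(processes=4) as pool:
        partial_sums = pool.map(scan_chunk, split(giant_list, 4))

    print(sum(partial_sums) == sum(giant_list))
```

The functions stay at module level on purpose: a worker that re-imports the module still needs to find scan_chunk there, while everything under the guard belongs to the main program alone.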
Note that it is a good idea to use __name__ == "__main__" appropriately even on Linux-y systems, as it makes the intended division of work clearer. Parallel programs can be confusing - every little bit helps ;-)