Workaround for using __name__ == '__main__' in Python multiprocessing

As we all know, we need to protect the main entry point with if __name__ == '__main__' when running code that uses multiprocessing in Python.

I understand that in some cases this is necessary to give the subprocess access to functions defined in the main module, but I do not understand why it is necessary in this case:

file2.py

import numpy as np
from multiprocessing import Pool
class Something(object):
    def get_image(self):
        return np.random.rand(64,64)

    def mp(self):
        image = self.get_image()
        p = Pool(2)
        res1 = p.apply_async(np.sum, (image,))
        res2 = p.apply_async(np.mean, (image,))
        print(res1.get())
        print(res2.get())
        p.close()
        p.join()

main.py

from file2 import Something
s = Something()
s.mp()

All the functions and imports necessary for Something to work are part of file2.py. Why should the subprocess re-run main.py?
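
For context, the usual workaround is to guard the top-level code in main.py; a minimal sketch:

from file2 import Something

if __name__ == '__main__':
    # Only the original process executes this block; a spawned child
    # re-imports main.py with __name__ != '__main__' and skips it.
    s = Something()
    s.mp()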

So the __name__ guard matters for main.py even though the Pool lives in file2.py. Are you on Windows? (And how do you know main.py is re-run? Did you add a print there, or did something else give it away?)

Yes, I'm on Windows, where fork() is not available. I know main.py gets re-run, and not just file2.py, because code at the top level of main.py executes again even though the Pool is created in file2.py.

+5
4

The main module is imported (but with __name__ != '__main__', because Windows is trying to simulate forking-like behavior on a system that doesn't have forking). multiprocessing has no way to know that you didn't do anything important in your main module, so the import is done "just in case" to create an environment similar to the one in your main process. If it didn't do this, all sorts of things that happen by side effect in main (e.g. imports, configuration calls with persistent side effects, etc.) might not be properly performed in the child processes.

As such, if the main module isn't protected, the code is not multiprocessing-safe (nor is it unittest-safe, import-safe, etc.). The if __name__ == '__main__': protective wrapper should be part of all correct main modules.
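
A minimal sketch of the failure mode being described: if the Pool is created at import time with no guard, each spawned child re-imports the module, reaches the Pool creation again, and the child is aborted with a RuntimeError about starting a new process before the current process has finished its bootstrapping phase.

import multiprocessing as mp

def square(x):
    return x * x

# Unguarded: under the "spawn" start method every child re-imports this
# module and hits this line again, which triggers the bootstrapping
# RuntimeError in the child.
pool = mp.Pool(2)
print(pool.map(square, [1, 2, 3]))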

+1

"spawn" - Python, . , Python , , , , , . , - .

, Windows, fork, .

, if __name__ == "__main__":? , . , .., .
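
For completeness, a minimal sketch of opting into "fork" on a Unix-like platform (set_start_method exists since Python 3.4; "fork" is unavailable on Windows):

import multiprocessing as mp
from file2 import Something

if __name__ == '__main__':
    # "fork" copies the already-initialized parent process, so the main
    # module is not re-imported in the children.
    mp.set_start_method('fork')
    Something().mp()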

+4

The if __name__ == '__main__' guard is needed on Windows because Windows does not have a "fork" option for processes.

On Linux, for example, you can fork the process, so the parent process is copied, the copy becomes the child process, and it has access to all the code the parent had already loaded.

Since you can't fork on Windows, python simply imports, in the child process, all the code that was imported by the parent process. This creates a similar effect, but if you don't do the __name__ trick, the import executes your top-level code again in the child process (and that child would then create its own child, and so on).

So in your example main.py itself gets imported again, since everything gets imported again; python can't guess which specific python script the child process should import.

FYI there are other limitations to be aware of as well, such as not relying on global state; you can read about them here: https://docs.python.org/2/library/multiprocessing.html#windows
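
A small probe makes the re-import visible (a sketch; on Python 3 under "spawn" the child imports the main module under the name __mp_main__):

import multiprocessing as mp

# Module-level side effect: under "spawn" this fires once per process.
print('imported with __name__ =', __name__)

def work(x):
    return x + 1

if __name__ == '__main__':
    with mp.Pool(2) as p:
        print(p.map(work, [1, 2, 3]))

The parent prints __main__ and each worker prints __mp_main__, which shows that the main module really is imported again in every child.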

+2

As noted above, spawn() on Windows re-imports the main module in every child process, together with any module-level side effects (prints, etc.).

The workaround is to pull the multiprocessing code out into a separate file and then run that file from the main script with subprocess.

I pass the variables in to the script by pickling them to a temporary directory, and I pass that temporary directory to the subprocess with argparse.

I then pickle the results to the temporary directory, where the main script picks them up.

Here is an example file_hasher() function that I wrote:

main_program.py

import os, pickle, shutil, subprocess, sys, tempfile

def file_hasher(filenames):
    try:
        subprocess_directory = tempfile.mkdtemp()
        input_arguments_file = os.path.join(subprocess_directory, 'input_arguments.dat')
        with open(input_arguments_file, 'wb') as func_inputs:
            pickle.dump(filenames, func_inputs)
        current_path = os.path.dirname(os.path.realpath(__file__))
        file_hasher = os.path.join(current_path, 'file_hasher.py')
        python_interpreter = sys.executable
        proc = subprocess.call([python_interpreter, file_hasher, subprocess_directory],
                               timeout=60)
        output_file = os.path.join(subprocess_directory, 'function_outputs.dat')
        with open(output_file, 'rb') as func_outputs:
            hashlist = pickle.load(func_outputs)
    finally:
        shutil.rmtree(subprocess_directory)
    return hashlist

file_hasher.py

#! /usr/bin/env python
import argparse, hashlib, os, pickle
from multiprocessing import Pool

def file_hasher(input_file):
    with open(input_file, 'rb') as f:
        data = f.read()
        md5_hash = hashlib.md5(data)
    hashval = md5_hash.hexdigest()
    return hashval

if __name__ == '__main__':
    argument_parser = argparse.ArgumentParser()
    argument_parser.add_argument('subprocess_directory', type=str)
    subprocess_directory = argument_parser.parse_args().subprocess_directory

    arguments_file = os.path.join(subprocess_directory, 'input_arguments.dat')
    with open(arguments_file, 'rb') as func_inputs:
        filenames = pickle.load(func_inputs)

    hashlist = []
    with Pool() as p:
        for r in p.imap(file_hasher, filenames):
            hashlist.append(r)

    output_file = os.path.join(subprocess_directory, 'function_outputs.dat')
    with open(output_file, 'wb') as func_outputs:
        pickle.dump(hashlist, func_outputs)
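
Hypothetical usage from another script (the file names are placeholders for any readable files):

from main_program import file_hasher

if __name__ == '__main__':
    # 'a.bin' and 'b.bin' are hypothetical paths
    print(file_hasher(['a.bin', 'b.bin']))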

There must be a better way ...

+1
