Is it safe to call multiprocessing from a thread in Python?

According to https://github.com/joblib/joblib/issues/180 and the question "Is there a safe way to create a subprocess from a thread in Python?", the Python multiprocessing module does not allow use from within threads. Is this true?

My understanding is that it is safe to fork from threads as long as no threading.Lock is held when you do so (in the current thread? anywhere in the process?). However, the Python documentation is silent on whether threading.Lock objects can safely be used after a fork.

There is also this: locks shared with the logging module cause problems with fork: https://bugs.python.org/issue6721

I am not sure how this problem arises. It sounds like the state of any locks in the process is copied into the child process when the current thread forks (which seems like a design error and certain to lead to deadlock). If so, does using multiprocessing provide any protection against this (since I am free to create my multiprocessing.Pool after threading.Lock objects are created and held by other threads, and after threads are started that use the non-fork-safe logging module)? The multiprocessing module documentation is also silent about whether multiprocessing.Pools should be allocated before Locks.
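
A minimal sketch of how this bites in practice, assuming a POSIX system where the fork start method is in effect (the names holder and child are illustrative): the child inherits the lock in its locked state, but the thread that holds it does not exist in the child, so the child blocks forever.

    import multiprocessing
    import threading
    import time

    lock = threading.Lock()

    def holder():
        # A background thread acquires the lock and sits on it.
        with lock:
            time.sleep(10)

    def child():
        # Under the fork start method the child inherits a *locked* copy
        # of the lock; no thread in the child will ever release it.
        with lock:
            print("acquired (reachable only with spawn/forkserver)")

    if __name__ == "__main__":
        threading.Thread(target=holder, daemon=True).start()
        time.sleep(0.1)  # ensure the holder owns the lock before forking
        p = multiprocessing.Process(target=child)
        p.start()
        p.join()  # hangs under fork; completes under spawn/forkserver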

Does replacing threading.Lock with multiprocessing.Lock everywhere avoid this problem and allow us to safely combine threads and forks?

1 answer

It sounds like the state of any locks in the process is copied into the child process when the current thread forks (which seems like a design error and certain to lead to deadlock).

It is not a design error; rather, fork() predates single-process multithreading. The state of all locks is copied into the child process because they are just objects in memory; the entire address space of the process is copied as-is on fork. The only alternatives are bad ones: either copy all threads across the fork, or forbid forking in multithreaded applications.

Therefore, fork()ing in a multithreaded program has never been safe, unless it is followed by execve() or exit() in the child process.
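
In Python terms, the only pattern that stays within these rules is to exec immediately in the child; a minimal sketch, assuming a POSIX system:

    import os
    import sys

    pid = os.fork()
    if pid == 0:
        # Child: do nothing but replace the process image right away.
        # os.execv does not return on success.
        os.execv(sys.executable, [sys.executable, "-c", "print('child')"])
    else:
        # Parent: reap the child.
        os.waitpid(pid, 0)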

Does replacing threading.Lock with multiprocessing.Lock everywhere avoid this problem and allow us to safely combine threads and forks?

No. Nothing allows you to safely combine threads and forks; it cannot be done.


The problem is that if there are multiple threads in a process, you cannot safely continue running the program on POSIX systems after a fork() system call.

For example, the Linux fork(2) man page states:

  • After a fork() in a multithreaded program, the child can safely call only async-signal-safe functions (see signal(7)) until such time as it calls execve(2).

I.e. it is OK to fork() in a multithreaded program, but then the child should only call async-signal-safe C functions (which is a rather limited subset of C functions), until the child process has been replaced with another executable!

Unsafe C function calls in the child process include, for example:

  • malloc for dynamic memory allocation
  • any <stdio.h> functions for formatted input
  • most of the pthread_* functions required for handling thread state, including creation of new threads...

Thus, there is very little that the child process can safely do. Unfortunately, CPython core developers have been downplaying the problems this causes. Even now the documentation says:

Note that safely forking a multithreaded process is problematic.

Quite a euphemism for "impossible".


It is safe to use multiprocessing from a Python process that has multiple threads of control provided that you do not use the fork start method; in Python 3.4+ it is possible to change the start method. In previous Python versions, including all of Python 2, POSIX systems always behaved as if fork was specified as the start method; this results in undefined behaviour.
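
A minimal sketch of selecting a fork-free start method, assuming Python 3.4+ (the worker square and the pool size are illustrative):

    import multiprocessing as mp

    def square(x):
        return x * x

    if __name__ == "__main__":
        # May be called at most once, before any pools or processes exist.
        # "spawn" starts a fresh interpreter instead of forking;
        # "forkserver" (POSIX) forks from a clean single-threaded helper.
        mp.set_start_method("spawn")
        with mp.Pool(processes=4) as pool:
            print(pool.map(square, range(10)))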

The problems are not limited to threading.Lock objects but extend to all locks held by the standard C library, by C extensions, and so on. What is worse, most of the time people say "it works for me"... until it stops working.

There have even been cases where a seemingly single-threaded Python program is actually multithreaded on Mac OS X, causing failures and deadlocks upon using multiprocessing.

Another problem is that open file handles, their usage, and shared sockets might behave oddly in programs that fork, but that would be the case even in single-threaded programs.

TL;DR: using multiprocessing in multithreaded programs, with C extensions, with open sockets, etc.:

  • fine in 3.4+ on POSIX if you explicitly specify a start method that is not fork (see the get_context sketch after this list);
  • fine on Windows because it does not support forking;
  • in Python 2 - 3.3 on POSIX: you are mostly shooting yourself in the foot.
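
Since set_start_method may be called at most once per program, get_context is the less intrusive option when you cannot control global state; a sketch, again assuming Python 3.4+:

    import multiprocessing as mp

    def work(x):
        return x + 1

    if __name__ == "__main__":
        # A context is bound to one start method and leaves the
        # process-wide default untouched for other libraries.
        ctx = mp.get_context("spawn")
        with ctx.Pool(processes=2) as pool:
            print(pool.map(work, [1, 2, 3]))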