Problem with exiting a daemonized process

I am writing a daemon program that spawns several child processes. After I run the stop script, the main process keeps running when it is supposed to exit, which really confuses me.

    import os
    import time
    import signal
    import daemon
    from multiprocessing import Process, cpu_count, JoinableQueue

    from http import httpserv
    from worker import work

    class Manager:
        """
        This manager starts the http server processes and worker
        processes, and creates the input/output queues that keep
        the processes working together nicely.
        """
        def __init__(self):
            self.NUMBER_OF_PROCESSES = cpu_count()

        def start(self):
            self.i_queue = JoinableQueue()
            self.o_queue = JoinableQueue()

            # Create worker processes
            self.workers = [Process(target=work, args=(self.i_queue, self.o_queue))
                            for i in range(self.NUMBER_OF_PROCESSES)]
            for w in self.workers:
                w.daemon = True
                w.start()

            # Create the http server process
            self.http = Process(target=httpserv, args=(self.i_queue, self.o_queue))
            self.http.daemon = True
            self.http.start()

            # Keep the current process from returning
            self.running = True
            while self.running:
                time.sleep(1)

        def stop(self):
            print "quitting ..."

            # Stop accepting new requests from users
            os.kill(self.http.pid, signal.SIGINT)

            # Wait for all requests in the output queue to be delivered
            self.o_queue.join()

            # Put the sentinel None into the input queue to signal the
            # worker processes to terminate
            self.i_queue.put(None)
            for w in self.workers:
                w.join()
            self.i_queue.join()

            # Let the main process return
            self.running = False

    manager = Manager()

    context = daemon.DaemonContext()
    context.signal_map = {
        signal.SIGHUP: lambda signum, frame: manager.stop(),
    }

    context.open()
    manager.start()

The stop script is just a single line, os.kill(pid, signal.SIGHUP). After running it, the child processes (the worker processes and the http server process) exit cleanly, but the main process just stays there, and I don't know what prevents it from returning.
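For reference, a minimal sketch of what that stop script might look like, assuming the daemon records its pid in a pidfile (the path here is hypothetical):

    # stop.py -- sketch only; the pidfile path is an assumption
    import os
    import signal

    PIDFILE = "/var/run/mydaemon.pid"  # hypothetical location

    with open(PIDFILE) as f:
        pid = int(f.read().strip())

    os.kill(pid, signal.SIGHUP)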

2 answers

I tried a different approach, and it seems to work (note that I stripped out the daemon parts, since I did not have that module installed).

    import signal
    from multiprocessing import cpu_count

    class Manager:
        """
        This manager starts the http server processes and worker
        processes, and creates the input/output queues that keep
        the processes working together nicely.
        """
        def __init__(self):
            self.NUMBER_OF_PROCESSES = cpu_count()

        def start(self):
            # all your code minus the loop
            print "waiting to die"
            signal.pause()

        def stop(self):
            print "quitting ..."
            # all your code minus self.running

    manager = Manager()
    signal.signal(signal.SIGHUP, lambda signum, frame: manager.stop())
    manager.start()

One caveat is that signal.pause() will wake up on any signal, so you may want to adjust your code accordingly.
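If other signals are a concern, one way to guard against that (a sketch, not tested against your setup) is to keep a flag and re-enter pause() until stop() has actually run:

    import signal

    class Manager:
        def __init__(self):
            self.running = True

        def start(self):
            print "waiting to die"
            # Re-enter pause() if we were woken by some other signal
            while self.running:
                signal.pause()

        def stop(self):
            print "quitting ..."
            self.running = False

    manager = Manager()
    signal.signal(signal.SIGHUP, lambda signum, frame: manager.stop())
    manager.start()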

EDIT:

The following works great for me:

    import daemon
    import signal
    import time

    class Manager:
        """
        This manager starts the http server processes and worker
        processes, and creates the input/output queues that keep
        the processes working together nicely.
        """
        def __init__(self):
            self.NUMBER_OF_PROCESSES = 5

        def start(self):
            # all your code minus the loop
            print "waiting to die"
            self.running = 1
            while self.running:
                time.sleep(1)
            print "quit"

        def stop(self):
            print "quitting ..."
            # all your code minus self.running
            self.running = 0

    manager = Manager()
    context = daemon.DaemonContext()
    context.signal_map = {signal.SIGHUP: lambda signum, frame: manager.stop()}
    context.open()
    manager.start()

What version of Python are you using?


You are creating the http server process but never join()-ing it. What happens if, instead of using os.kill() to stop the http server process, you send it a sentinel (None, like you send to the workers) and then do self.http.join()?
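A sketch of what that change to Manager.stop() might look like; it assumes (the question does not show this) that httpserv reads from o_queue and breaks out of its loop when it sees None:

    def stop(self):
        print "quitting ..."

        # Wait for pending output to be delivered, as before
        self.o_queue.join()

        # Send the http server its own sentinel and reap it,
        # instead of os.kill(self.http.pid, signal.SIGINT)
        self.o_queue.put(None)
        self.http.join()

        # ... worker shutdown as before ...
        self.running = False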

Update: you also need to send the None sentinel to the input queue once for each worker. You can try:

    for w in self.workers:
        self.i_queue.put(None)
    for w in self.workers:
        w.join()

N.B. The reason you need two loops is that if you put the None into the queue in the same loop that does the join(), the None may be picked up by a worker other than w, so joining on w would deadlock the caller.

You do not show the code for the workers or the http server, so I will assume they behave well in terms of calling task_done() etc., and that each worker exits as soon as it sees None, without get()-ing anything further from the input queue.
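Under those assumptions, a well-behaved worker loop might look like this (process() is a hypothetical stand-in for your actual work):

    def work(i_queue, o_queue):
        while True:
            item = i_queue.get()
            if item is None:
                # Account for the sentinel itself, then exit without
                # fetching anything further from the input queue
                i_queue.task_done()
                return
            o_queue.put(process(item))  # process() is hypothetical
            i_queue.task_done()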

Also, note that there is at least one open, hard-to-reproduce issue with JoinableQueue.task_done() that may bite you.


Source: https://habr.com/ru/post/1286513/