Celery worker hangs without errors

I have a production setup of Celery workers that execute POST/GET requests to a remote service and save the result. It processes a load of about 20,000 tasks in 15 minutes.

The problem is that the workers sometimes hang for no apparent reason: no errors, no warnings.

I also tried adding multiprocessing, with the same result.

In the log, I can see the execution time of tasks increasing.

See https://github.com/celery/celery/issues/2621 for details

1 answer

If your Celery worker sometimes gets stuck, you can use strace and lsof to find out which system call it is stuck in.

For instance:

 $ strace -p 10268 -s 10000
 Process 10268 attached - interrupt to quit
 recvfrom(5,

Here 10268 is the PID of the Celery worker, and recvfrom(5 means the worker is blocked receiving data from file descriptor 5.

Then you can use lsof to check what file descriptor 5 is in this worker process.

 $ lsof -p 10268
 COMMAND   PID USER   FD   TYPE    DEVICE SIZE/OFF NODE NAME
 ......
 celery  10268 root    5u  IPv4 828871825      0t0  TCP 172.16.201.40:36162->10.13.244.205:wap-wsp (ESTABLISHED)
 ......

This shows that the worker is stuck on a TCP connection (you can see 5u in the FD column).
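You can reproduce the same kind of hang with a minimal stdlib sketch: a socket that reads from a peer that never sends anything blocks forever unless you set a timeout. All addresses here are local throwaways for demonstration.

```python
import socket

# A local server that accepts a connection but never sends any data,
# standing in for an unresponsive remote peer.
server = socket.socket()
server.bind(("127.0.0.1", 0))
server.listen(1)

client = socket.socket()
client.settimeout(0.5)  # without this line, recv() below would hang forever
client.connect(server.getsockname())
conn, _ = server.accept()

try:
    client.recv(1024)   # this is the call that shows up in strace as recvfrom(5, ...
    timed_out = False
except socket.timeout:
    timed_out = True    # with a timeout set, we get control back instead of hanging

client.close()
conn.close()
server.close()
print(timed_out)
```

With the timeout set, the blocked read turns into a catchable exception after half a second instead of an indefinite hang.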

Some Python packages, such as requests, block while waiting for data from the peer, and this can cause the Celery worker to freeze. If you use requests, be sure to set the timeout argument.
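A minimal sketch of what that looks like in practice. The URL and timeout values are illustrative, not from the original post; the point is simply that every requests call passes an explicit timeout.

```python
import requests

# Hypothetical endpoint for illustration -- substitute your own.
URL = "https://example.com/api/maintenance"

# (connect timeout, read timeout) in seconds. Without an explicit timeout,
# requests waits on the socket indefinitely -- the recvfrom() hang above.
TIMEOUT = (3.05, 27)

def fetch(url=URL, timeout=TIMEOUT):
    """GET with a timeout; returns the status code, or None on timeout."""
    try:
        return requests.get(url, timeout=timeout).status_code
    except requests.exceptions.Timeout:
        return None
```

A timed-out task can then be retried or logged instead of silently freezing the worker.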


Have you seen this page?

https://www.caktusgroup.com/blog/2013/10/30/using-strace-debug-stuck-celery-tasks/


Source: https://habr.com/ru/post/987324/
