If your Celery worker sometimes gets stuck, you can use strace and lsof to find out which system call it is stuck in.
For instance:
    $ strace -p 10268 -s 10000
    Process 10268 attached - interrupt to quit
    recvfrom(5,
10268 is the PID of the Celery worker. The unfinished recvfrom(5, line means the worker is blocked waiting to receive data on file descriptor 5.
Then you can use lsof to check what file descriptor 5 refers to in this process.
    $ lsof -p 10268
    COMMAND  PID    USER  FD  TYPE  DEVICE     SIZE/OFF  NODE  NAME
    ......
    celery   10268  root  5u  IPv4  828871825  0t0       TCP   172.16.201.40:36162->10.13.244.205:wap-wsp (ESTABLISHED)
    ......
This indicates that the worker is stuck waiting on a TCP connection (look for 5u in the FD column; the NAME column shows the remote address and port).
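To make the symptom concrete, here is a minimal sketch that reproduces a read against a peer that never answers, using only the standard library. The function name recv_from_silent_peer and the 0.5 second value are made up for illustration; nothing here comes from Celery itself.

```python
import socket


def recv_from_silent_peer(timeout_seconds):
    """Connect to a local peer that never replies; return True if the read timed out."""
    # A listening socket that never sends anything stands in for the silent remote host.
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server.listen(1)
    # With timeout=None instead, the recv() below would block indefinitely,
    # which is the state the stuck worker is in.
    client = socket.create_connection(server.getsockname(), timeout=timeout_seconds)
    try:
        client.recv(1024)
        return False
    except socket.timeout:
        return True
    finally:
        client.close()
        server.close()


timed_out = recv_from_silent_peer(0.5)
```

With a timeout set, the blocked read turns into an exception the task can handle instead of hanging the worker.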
Some Python packages, such as requests, block while waiting for data from the peer, and this can cause the Celery worker to freeze. requests does not apply a timeout by default, so if you use it, be sure to set the timeout argument on every call.
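A minimal sketch of what setting the timeout could look like. The helper name fetch, the optional session parameter, and the timeout values are assumptions chosen for illustration; only the timeout argument and requests.exceptions.Timeout are part of requests itself.

```python
import requests

# (connect timeout, read timeout) in seconds. Illustrative values, tune for your service.
DEFAULT_TIMEOUT = (3.05, 10)


def fetch(url, timeout=DEFAULT_TIMEOUT, session=None):
    """Return the response body, or None if the peer did not answer in time."""
    # Hypothetical convenience: accept a requests.Session, else use the module-level API.
    http = session if session is not None else requests
    try:
        response = http.get(url, timeout=timeout)
        response.raise_for_status()
        return response.text
    except requests.exceptions.Timeout:
        # Covers both ConnectTimeout and ReadTimeout; the caller can retry or fail the task.
        return None
```

Inside a Celery task you would probably retry or log on None rather than ignore it, but either way the worker is released once the timeout expires.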
See also this page:
https://www.caktusgroup.com/blog/2013/10/30/using-strace-debug-stuck-celery-tasks/