(1) If your Celery worker sometimes gets stuck, you can use strace and lsof to find out which system call it is stuck in.
For instance:
$ strace -p 10268 -s 10000
Process 10268 attached - interrupt to quit
recvfrom(5,
Here 10268 is the PID of the Celery worker, and recvfrom(5, means the worker is blocked waiting to receive data on file descriptor 5.
Then you can use lsof to check what file descriptor 5 refers to in this process.
lsof -p 10268
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
......
celery 10268 root 5u IPv4 828871825 0t0 TCP 172.16.201.40:36162->10.13.244.205:wap-wsp (ESTABLISHED)
......
This indicates that the worker is stuck on a TCP connection (see 5u in the FD column): it is waiting for data from 10.13.244.205 that never arrives.
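If lsof is not installed on the host, the same descriptor lookup can be approximated on Linux by reading the /proc filesystem. The following is a minimal sketch, not part of the original workflow; the helper name describe_fd is hypothetical, and it assumes a Linux host and permission to inspect the target process.

```python
import os
import socket


def describe_fd(pid, fd):
    """Return the target of /proc/<pid>/fd/<fd>, or None if it cannot be read.

    Hypothetical helper for illustration. Linux only. For a socket the
    result has the form 'socket:[<inode>]'.
    """
    try:
        return os.readlink(f"/proc/{pid}/fd/{fd}")
    except OSError:
        # No such process or descriptor, no permission, or no /proc at all.
        return None


if __name__ == "__main__":
    # Demonstrate on our own process: open a TCP socket and look it up.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        print(describe_fd(os.getpid(), s.fileno()))
```

For the worker above, the call would be describe_fd(10268, 5). This only tells you that the descriptor is a socket; lsof is still the easier way to see the remote address.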
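In Python, this typically happens with requests: requests does not apply a timeout by default, so a call can block forever. Always pass a timeout when calling requests.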
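A blocked recvfrom like the one above can be avoided by giving every HTTP call an explicit timeout. Here is a minimal sketch, assuming the task makes its request through requests.get; the function name fetch and the timeout values are illustrative assumptions, not taken from the original task code.

```python
import requests


def fetch(url, connect_timeout=3.05, read_timeout=10):
    """GET url and return the Response, or None if the request times out.

    Hypothetical wrapper for illustration; tune the timeouts to your service.
    """
    try:
        # timeout=(connect, read): the read timeout limits how long the
        # client waits for the server to send data, so a silent peer
        # raises an exception instead of hanging the worker.
        return requests.get(url, timeout=(connect_timeout, read_timeout))
    except requests.exceptions.Timeout:
        # Covers both ConnectTimeout and ReadTimeout.
        return None
```

Note that the read timeout bounds the wait between bytes received, not the total duration of the request, so a server that trickles data slowly can still keep the call alive longer than read_timeout.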
(2) RabbitMQ, , , .
Reference:
https://www.caktusgroup.com/blog/2013/10/30/using-strace-debug-stuck-celery-tasks/