We are having problems with our Celery daemon, which is very flaky. We use a Fabric deployment script to restart the daemon whenever we push changes, but for some reason this causes serious problems.
Whenever the deployment script is run, the celery processes are left in some pseudo-dead state. They (unfortunately) still consume tasks from RabbitMQ, but they do not actually do anything. Confusingly, a brief inspection suggests everything is "fine" in this state: celeryctl status shows one node online, and ps aux | grep celery shows 2 running processes.
However, running /etc/init.d/celeryd stop manually results in the following error:
start-stop-daemon: warning: failed to kill 30360: No such process
While in this state, running celeryd start appears to work correctly, but in fact does nothing. The only way to fix the problem is to manually kill the running celery processes and then start them again.
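The "No such process" warning makes us suspect that the pidfile holds a PID that no longer exists while the real workers run under different PIDs. A minimal sketch of how that could be checked is below. It is only a diagnostic idea, not what the init script does: the helper names are made up for illustration, and the pidfile path in the usage note is an assumption that should be replaced with whatever the init script is actually configured to use.

```python
import os


def read_pid(pidfile):
    """Return the PID stored in pidfile, or None if it is missing or unreadable."""
    try:
        with open(pidfile) as fh:
            return int(fh.read().strip())
    except (OSError, ValueError):
        return None


def pid_is_alive(pid):
    """Return True if a process with this PID exists."""
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0 delivers nothing, it only checks existence
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def pidfile_is_stale(pidfile):
    """Return True if the pidfile names a PID that is no longer running."""
    pid = read_pid(pidfile)
    return pid is not None and not pid_is_alive(pid)
```

Calling pidfile_is_stale("/var/run/celeryd.pid") (path assumed) while ps aux | grep celery still shows workers would indicate that the init script has lost track of the processes it is supposed to stop.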
Any ideas what is going on here? We have not fully confirmed this, but we think the problem also develops on its own after a few days, without any deployment (the machine is currently a test server with no activity).