Understanding Celery task prefetching

I just found out about the configuration option CELERYD_PREFETCH_MULTIPLIER (docs). The default is 4, but (I believe) I want prefetching to be as low as possible. I have set it to 1 for now, which is close enough to what I'm looking for, but there are still some things I don’t understand:

  • Why is this prefetching a good idea? I really don’t see a reason for it when there is little latency between the message queue and the workers (in my case they currently run on the same host, and in the worst case they may run on different hosts in the same data center). The documentation only mentions the drawbacks and does not explain what the benefits are.

  • Many people seem to set this value to 0, expecting that to turn prefetching off (a reasonable guess, in my opinion). However, 0 means unlimited prefetching. Why would anyone ever want unlimited prefetching? Doesn't that completely eliminate the concurrency/asynchrony you introduced the task queue for in the first place?

  • Why can't prefetching be turned off? Disabling it might be bad for performance in most cases, but is there a technical reason why it is not possible? Or is it just not implemented?

  • Sometimes this option is linked to CELERY_ACKS_LATE. For example, Roger Hu writes "[...] often what [users] really want is to have a worker only reserve as many tasks as there are child processes. But this is not possible without enabling late acknowledgements [...]" I don’t understand how these two options are related and why one is not possible without the other. Another mention of the connection can be found here. Can someone explain why these two options are related?

+48
python celery celeryd
Apr 16 '13 at
4 answers
  • Prefetching can improve performance. Workers do not need to wait for the next message from the broker to process. Communication with the broker once and processing a large number of messages gives a performance boost. Receiving a message from a broker (even from a local one) is expensive compared to accessing local memory. Workers are also allowed to receive messages in batches.

  • Prefetching set to zero means "no specific limit" rather than unlimited.

  • Setting prefetching to 1 is documented to be equivalent to turning it off, but this may not always be the case (see https://stackoverflow.com/a/167389/ )

  • Prefetching allows messages to be acknowledged in batches. CELERY_ACKS_LATE = True prevents messages from being acknowledged when they reach a worker.
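To put rough numbers on the performance argument in the first point, here is a back-of-the-envelope sketch. It is an illustrative model only, not a measurement: the function name `total_seconds` and all timings are hypothetical, and it assumes that with prefetching the next message is fetched while the current task runs.

```python
def total_seconds(n_tasks, task_seconds, fetch_seconds, prefetch):
    """Estimate wall-clock time for one worker process to drain n_tasks.

    Hypothetical model: without prefetching every task pays the full broker
    round trip; with prefetching the fetch overlaps with task execution.
    """
    if prefetch:
        # Only the first fetch is exposed; afterwards the slower of
        # "run a task" and "fetch a message" sets the pace.
        return fetch_seconds + n_tasks * max(task_seconds, fetch_seconds)
    # Fetch, run, fetch, run, ... strictly one after the other.
    return n_tasks * (fetch_seconds + task_seconds)


if __name__ == "__main__":
    # Many very short tasks: the round trip is a large share of the total.
    print(total_seconds(1000, 0.001, 0.0005, prefetch=False))
    print(total_seconds(1000, 0.001, 0.0005, prefetch=True))
    # A few long tasks: the round trip is negligible either way.
    print(total_seconds(5, 5.0, 0.0005, prefetch=False))
    print(total_seconds(5, 5.0, 0.0005, prefetch=True))
```

Under these assumed numbers the short-task workload finishes noticeably sooner with prefetching, while the long-task workload barely changes, which matches the intuition that a low prefetch value costs little when tasks take seconds.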

+19
Apr 17 '13 at 13:40

Just a warning: in my testing with the Redis broker + Celery 3.1.15, all the advice I have read saying that CELERYD_PREFETCH_MULTIPLIER = 1 disables prefetching is demonstrably false.

To demonstrate this:

  • Set CELERYD_PREFETCH_MULTIPLIER = 1
  • Queue up 5 tasks that will each take a few seconds (for example, time.sleep(5))
  • Start watching the length of the task queue in Redis: watch redis-cli -c llen default
  • Run celery worker -c 1
  • Notice that the queue length in Redis immediately drops from 5 to 3

CELERYD_PREFETCH_MULTIPLIER = 1 does not prevent prefetching, it simply limits prefetching to 1 task per queue.
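One way to read the 5-to-3 drop is with a simplified model. This is a sketch for illustration, not Celery's actual implementation: the function `messages_taken` is hypothetical, and it assumes the broker keeps delivering messages until concurrency × multiplier of them are unacknowledged.

```python
def messages_taken(queued, concurrency, prefetch_multiplier, acks_late):
    """Estimate how many messages one worker node pulls off the queue at once.

    Simplified model, not Celery's real logic: the broker delivers messages
    until `concurrency * prefetch_multiplier` of them are unacknowledged.
    """
    if prefetch_multiplier == 0:
        # 0 means no limit on prefetching, so everything can be taken.
        return queued
    prefetch_limit = concurrency * prefetch_multiplier
    if acks_late:
        # Running tasks stay unacknowledged and count against the limit.
        taken = prefetch_limit
    else:
        # Running tasks were acknowledged on receipt, which frees room for
        # `prefetch_limit` more messages on top of the ones executing.
        taken = concurrency + prefetch_limit
    return min(queued, taken)


if __name__ == "__main__":
    # The experiment above: 5 queued tasks, -c 1, multiplier 1, early acks.
    print(5 - messages_taken(5, 1, 1, acks_late=False))  # 3 left in the queue
    # Same setup with late acknowledgement, under this model.
    print(5 - messages_taken(5, 1, 1, acks_late=True))   # 4 left in the queue
```

If this model holds, it would also explain why the multiplier is so often mentioned together with CELERY_ACKS_LATE: only with late acknowledgement does the running task itself count against the limit.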

-Ofair, despite what the documentation says, also does not prevent prefetching.

Short of modifying the source code, I have not found any way to completely disable prefetching.

+13
Oct 26 '15 at 23:07

Old question, but I'm still adding my answer in case it helps someone. My understanding from initial testing was the same as in David Wolever’s answer. I just tested this further with Celery 3.1.19, and -Ofair really does work. It is just not meant to disable prefetching at the worker node level; that will continue to happen. Using -Ofair has a different effect, which is at the pool worker level. So, to completely disable prefetching, do this:

  • Set CELERYD_PREFETCH_MULTIPLIER = 1
  • Set CELERY_ACKS_LATE = True at global or task level
  • Use -Ofair when starting workers
  • If you set concurrency to 1, then step 3 is not needed. If you want higher concurrency, then step 3 is necessary to keep tasks from backing up in a node that may be busy with long-running tasks.
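The steps above can be sketched as a configuration module. This uses the Celery 3.1-style setting names that appear in this thread; the file name `celeryconfig.py` and the concurrency value in the comment are assumptions for illustration.

```python
# celeryconfig.py (hypothetical file name; Celery 3.1-style setting names)

# Reserve at most one message per pool process instead of the default 4.
CELERYD_PREFETCH_MULTIPLIER = 1

# Acknowledge a message after its task has run rather than on receipt.
# This can also be set per task with @app.task(acks_late=True).
CELERY_ACKS_LATE = True

# Then start the worker with the fair scheduling option, for example:
#   celery worker -c 4 -Ofair
# (-c 4 is an arbitrary example value; with -c 1 the -Ofair flag is not
# needed, according to the answer above.)
```

Keep in mind that with late acknowledgement a task may be redelivered if the worker dies mid-execution, so this setup fits best when tasks are safe to run more than once.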

Adding additional information:

I found that a worker node will always prefetch by default. You can only control how many tasks it prefetches, using CELERYD_PREFETCH_MULTIPLIER. If set to 1, it will only prefetch as many tasks as there are pool workers (concurrency) in the node. So if you have concurrency = n, the maximum number of tasks prefetched by the node will be n.

Without the -Ofair option, it seemed to me that if one of the pool worker processes was executing a long-running task, the other workers in the node also stopped processing tasks already prefetched by the node. With -Ofair, this changed: even though one of the workers in the node was executing a long-running task, the others did not stop and kept processing the tasks prefetched by the node. So I see two levels of prefetching: one at the worker node level, the other at the individual worker level. Using -Ofair seemed to disable it at the worker level for me.

How is ACKS_LATE related? ACKS_LATE = True means that the task will be acknowledged only when it has completed successfully. Otherwise, I believe it is acknowledged when the worker receives it. With prefetching, the task is first received by the worker (confirmed from the logs) but is executed later. I just realized that prefetched messages show up under "unacknowledged messages" in RabbitMQ, so I'm not sure that setting it to True is absolutely necessary. In any case, our tasks were set up that way (late ack) for other reasons.

+9
Jun 08 '16 at 10:30

I cannot comment on David Wolever's answer, since my stackcred is not high enough. So I have framed my comment as an answer, because I would like to share my experience with Celery 3.1.18 and a MongoDB broker. I managed to stop prefetching with the following:

  • add CELERYD_PREFETCH_MULTIPLIER = 1 to the celery configuration
  • add CELERY_ACKS_LATE = True to the celery configuration
  • Start a celery worker with options: --concurrency=1 -Ofair

With CELERY_ACKS_LATE left at its default, the worker still prefetches. Just like the OP, I do not fully understand the relationship between prefetching and late acks. I understand what David says: "CELERY_ACKS_LATE = True prevents messages from being acknowledged when they reach the worker," but I don’t understand why late acks would be incompatible with prefetching. In theory, prefetching would still allow acking late, right? Even if it has not been coded that way in Celery?

+5
Oct 29 '15 at 21:23


