Celery: one long monolithic task vs. several short tasks

Question

In my solution, I use distributed tasks to monitor equipment instances over a period of time (e.g. 10 minutes). I have to do some things when:

I am starting this monitoring session
I am ending a monitoring session
(Potentially) during a monitoring session

Is it safe to start one task for the entire session (10 minutes) and do all of this inside it, or should I split these actions into their own tasks?

The advantages of one task, as I see it, are that it would be easier to manage time constraints and enforce them. But:

Is it good practice to run a large pool of (mostly) sleeping workers? For example, if I know that I will have at most 200 sessions, should I run a pool of 500 workers to ensure that there are always free "session" slots?

python celery distributed-computing rabbitmq django-celery

Goro Sep 06 '12 at 19:38

1 answer

asksol · Accepted Answer · 2012-09-10T16:14:36+0000

There is no one-size-fits-all answer to this.

Dividing large task A into many small parts (A¹, A², A³, ...) will increase the potential for concurrency.

So if you have one worker instance with 10 worker threads/processes, A can now run in parallel on 10 threads instead of sequentially on one.
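As a rough illustration of this point, the sketch below splits one large job into ten parts and runs them on a pool of ten threads. It uses Python's standard `concurrent.futures` as a stand-in for a worker's pool, not Celery's own API. The `process_part` function and the chunking scheme are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def process_part(part):
    """Hypothetical unit of work: sum the squares of one chunk."""
    return sum(x * x for x in part)


def split(items, n_parts):
    """Split items into n_parts roughly equal contiguous chunks."""
    size, extra = divmod(len(items), n_parts)
    chunks, start = [], 0
    for i in range(n_parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def run_split(items, n_parts=10, pool_size=10):
    """Run the parts concurrently and combine the partial results."""
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        return sum(pool.map(process_part, split(items, n_parts)))


data = list(range(1000))
total_sequential = process_part(data)  # task A as one piece
total_parallel = run_split(data)       # task A as ten parts
```

Both calls produce the same total; only the scheduling differs. In CPython, threads mainly help when the parts wait on I/O, whereas Celery's default prefork pool runs tasks in separate processes.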

The size of the parts is called task granularity (fine or coarse).

If the task is too fine-grained, the overhead of messaging will reduce performance.

Each part should involve enough computation or I/O to offset the overhead of sending the task as a message to the broker, possibly writing it to disk if no worker is available to take it, having a worker receive the message, and so on. (Note that the messaging overhead can be tuned: for example, you can have a transient queue, whose messages are not persisted to disk, and send less important tasks there.)
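The trade-off can be made concrete with a toy cost model. This is an assumption made for illustration, not a measurement of Celery: parts run in waves of as many as there are worker slots, and each part pays a fixed messaging overhead. All numbers are invented.

```python
import math


def estimated_wall_time(total_work_s, n_parts, slots, overhead_s):
    """Toy model: parts run in waves of `slots`; each part pays a fixed overhead."""
    waves = math.ceil(n_parts / slots)
    per_part = total_work_s / n_parts + overhead_s
    return waves * per_part


# Assumed numbers: 600 s of work, 10 worker slots, 50 ms of overhead per task.
coarse = estimated_wall_time(600, 1, 10, 0.05)          # one monolithic task
balanced = estimated_wall_time(600, 10, 10, 0.05)       # one part per slot
too_fine = estimated_wall_time(600, 100_000, 10, 0.05)  # overhead dominates
```

Under these assumptions the single task takes about 600 s and ten parts take about 60 s, while 100,000 tiny parts take about 560 s, because almost all of the time goes to per-message overhead.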

A busy cluster can make all of this moot

Maximum parallelism may already be achieved if you have a busy cluster (for example, 3 worker instances with 10 threads/processes each, all of them executing tasks).

In that case you gain little by splitting the task, although tasks that do I/O are more likely to benefit than CPU-bound tasks (split at the I/O operations).

Long-running tasks are fine

The worker is not allergic to long-running tasks, be it 10 minutes or an hour.

But it is not ideal, because a long-running task blocks its slot from executing any waiting tasks. To mitigate this, people use routing: a dedicated queue, with dedicated workers, for the tasks that must run as soon as possible.
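A minimal sketch of such routing, assuming Celery 4.0 or later, where the setting is named `task_routes` (older releases used `CELERY_ROUTES`). The module, task, and queue names below are hypothetical.

```python
# celeryconfig.py (hypothetical module; task and queue names are illustrative)

# Send the long monitoring sessions and the short, urgent tasks to separate queues.
task_routes = {
    "monitoring.tasks.run_session": {"queue": "sessions"},
    "monitoring.tasks.send_alert": {"queue": "urgent"},
}

# Each queue is then served by its own workers, for example:
#   celery -A proj worker -Q sessions -c 20
#   celery -A proj worker -Q urgent -c 4
```

With this split, a pool that is full of 10-minute sessions cannot delay the urgent tasks, because they are consumed by a different set of workers.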
