Simulating the Google App Engine task queue with Gearman

One of the features I like most about the Google Task Queue is its simplicity. Specifically, to create a task you just need a URL and some parameters, and the queue then POSTs messages to that URL.

This structure means that tasks always execute against the latest version of the code. My Gearman workers, by contrast, run code from inside my Django project, so when I push a new version live I have to kill the old worker and start a new one so that it picks up the current code.

My goal is to decouple the task queue from the code base entirely, so that I can push a new version live without restarting any workers. So I thought: why not make tasks executable by URL, the same way the Google App Engine task queue does?

The process will work as follows:

  • A user request arrives and kicks off several tasks that should not block the response.
  • Each task has a unique URL, so I submit a relay job to Gearman with the URL and the POST data.
  • The Gearman server finds a worker and passes the URL and POST data to it.
  • The worker simply POSTs the data to the URL, thereby executing the task (a rough sketch follows this list).
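Here is a minimal sketch of that flow, assuming a python-gearman-style client library (the 2.x API) and a Gearman server on localhost; the task name `relay`, the URL, and the payload are all illustrative, and the client and worker would of course run in separate processes:

```python
import json
import urllib.request

import gearman  # assumed: a python-gearman-style client library

# In the Django view: enqueue a background relay job with the task URL and POST data.
client = gearman.GearmanClient(['localhost:4730'])
client.submit_job(
    'relay',
    json.dumps({'url': 'http://example.com/tasks/send_email', 'data': {'user_id': 42}}),
    background=True,  # do not block the user request waiting for the result
)

# In the relay worker process: receive the job and simply POST the data to the URL.
def relay(worker, job):
    payload = json.loads(job.data)
    body = json.dumps(payload['data']).encode('utf-8')
    request = urllib.request.Request(payload['url'], data=body,
                                     headers={'Content-Type': 'application/json'})
    urllib.request.urlopen(request, timeout=10)  # tasks are capped at 10 seconds
    return 'done'

worker = gearman.GearmanWorker(['localhost:4730'])
worker.register_task('relay', relay)
worker.work()  # blocks, serving jobs forever
```

Note that the relay worker never imports any application code: redeploying the Django project changes what lives behind the URL, not the worker.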

Assume the following:

  • Each request from the relay worker is signed in some way, so that we know it came from the relay server and not from a malicious client (see the sketch after this list).
  • Tasks are limited to running in under 10 seconds (there would be no long-running tasks that could time out).
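For the signing assumption, a minimal sketch using nothing but the standard library; the secret, the header name, and the choice of hash are illustrative:

```python
import hashlib
import hmac

SHARED_SECRET = b'change-me'  # known only to the relay worker and the Django app

def sign(body: bytes) -> str:
    # Relay worker: compute a digest to send alongside the POST,
    # e.g. in an X-Signature header.
    return hmac.new(SHARED_SECRET, body, hashlib.sha256).hexdigest()

def verify(body: bytes, signature: str) -> bool:
    # Task view: reject any request whose signature does not match.
    return hmac.compare_digest(sign(body), signature)
```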

What are the potential pitfalls of this approach? Here is what bothers me:

  • The server could become swamped by many simultaneous requests that were all triggered by a single earlier request: one user request could fan out into 10 concurrent HTTP requests. I suppose I could run a single worker that sleeps before each request to rate-limit it (a rough sketch follows).
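The single-worker-with-a-sleep idea amounts to crude fixed-interval throttling. A sketch of what that could look like inside the relay worker; the one-request-per-second cap is arbitrary, and this only works if a single worker process makes all the relay requests:

```python
import time

MIN_INTERVAL = 1.0  # seconds between outgoing requests; an arbitrary cap
_last_request = 0.0

def throttled_post(url, body):
    """Sleep just long enough to allow at most one request per MIN_INTERVAL."""
    global _last_request
    wait = MIN_INTERVAL - (time.monotonic() - _last_request)
    if wait > 0:
        time.sleep(wait)
    _last_request = time.monotonic()
    # ... perform the actual HTTP POST here ...
```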

Any thoughts?

1 answer

As a user of both Django and Google App Engine, I can certainly appreciate what you're getting at. At work I am currently tackling the exact same scenario with some pretty interesting open source tools.

  • Take a look at Celery. It is a distributed task queue built in Python that exposes three concepts: a queue, a set of workers, and a result store. It is pluggable, with different tools available for each part.

  • The queue should be robust and fast. Check out RabbitMQ for a great queue implementation written in Erlang, using the AMQP protocol.

  • The workers can ultimately just be Python functions. You can trigger workers either with messages in the queue or, perhaps more appropriate to what you are describing, with webhooks.

Check out the Celery webhook documentation. Using all of these tools you can build a production-ready distributed task queue that satisfies the requirements above.
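As a sketch of the queue-message flavor of this, here is what a task looks like with the modern Celery API; the broker URL, the task body, and the argument are illustrative:

```python
from celery import Celery

# RabbitMQ as the broker; the result store is optional and omitted here.
app = Celery('tasks', broker='amqp://guest@localhost//')

@app.task
def send_welcome_email(user_id):
    # Runs in a worker process, not in the web request.
    print('emailing user %s' % user_id)

# From the Django view: enqueue the task and return immediately.
send_welcome_email.delay(42)
```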

I should also mention that, with regard to the pitfall you raise, Celery implements per-task rate limits using the token bucket algorithm.
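Concretely, the rate limit is set as a per-task option; the figure here is arbitrary, and `app` is the Celery application from the sketch above:

```python
@app.task(rate_limit='10/m')  # at most 10 executions per minute, per worker instance
def relay_post(url, data):
    ...
```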

