Coordination of one periodic task between servers in a cluster

Question

(I will try to keep this question as short as possible, clearly describing the situation. Please comment if something is missing.)

Situation

I am running a cluster of three servers in the same data center.
Each server runs exactly the same application code to facilitate deployment.

Goal

Perform one task (call it Task X) every minute on exactly one server.

Under these conditions

The cluster remains distributed and available.
Each server runs the same application code. In other words, there is no such thing as "deploy code A to the main server and deploy code B to all secondary servers."

The reason I don't want to distinguish between types of servers is to maintain high availability (to avoid problems when the so-called master goes down), to keep redundancy (to distribute the load), and to avoid a complex deployment procedure in which I have to deploy different applications to different servers.

Why is it so hard? If I added code that performed this task every 5 minutes, each server would execute it, because every server runs the same application code. The servers therefore need to coordinate which one of them runs the task on each tick.

I can use distributed messaging engines such as Apache Kafka or Redis. If such a mechanism is used to coordinate the task, how would the "algorithm" work?

I asked someone else this question, and his answer was to use a task queue. However, this does not seem to solve the problem, because the question remains: which server is going to add the task to the task queue? If all servers add the task to the queue, this will result in duplicate entries. Also, which server will perform the next task in the queue? All of this must be solved by coordination within the cluster, without differentiating between types of servers.

+4

cluster-computing distribution high-availability task-queue

Tom Oct 31 '12 at 5:46


2 answers

You can also use JGroups to achieve this. An example implementation can be found here.

0

Pragalathan m Oct 16 '14 at 16:43


lastcanal · Accepted Answer · 2012-10-31T18:11:40+0000

It sounds like you are looking for a distributed lock. Redis handles this well with setnx. If you combine it with expire, you can create global locks that are released every N seconds.

setnx will only write the value and return true if the key does not already exist. Redis operations are atomic, so only the first server that calls setnx after the key expires will be able to complete the task.

Here is an example in ruby:

    # Attempt to get the lock for 'Task X' by setting the current server hostname
    if redis.setnx("lock:task:x", `hostname`.chomp)
      # Got the lock, now I set it to expire after 5 minutes
      redis.expire("lock:task:x", 60 * 5)
      # This server has the go ahead to execute the task
      execute_task_x
    else
      # Failed to get the lock. Another server is doing the work this time around
    end
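One caveat when setnx and expire are issued as two separate commands: if the server crashes between the two calls, the key never expires and no server will get the lock again. A possible variant is to use the SET command with its NX and EX options, which creates the key and attaches the expiry in one atomic step. The sketch below assumes a client that accepts set(key, value, nx:, ex:) the way redis-rb does; the method names try_acquire_task_lock and run_task_once are hypothetical and only serve the illustration.

```ruby
# Sketch only: `redis` is assumed to be a client that responds to
# set(key, value, nx:, ex:) like redis-rb. The method names below are
# hypothetical helpers, not part of any library.

# Returns true if this server obtained the lock for the current period.
def try_acquire_task_lock(redis, key, owner, ttl_seconds)
  # SET key value NX EX ttl: write the key only if it does not exist
  # and set its expiry in the same atomic command.
  redis.set(key, owner, nx: true, ex: ttl_seconds) ? true : false
end

# Yields to the given block only on the server that obtained the lock.
# Returns true if the block ran, false if another server holds the lock.
def run_task_once(redis, key, owner, ttl_seconds)
  return false unless try_acquire_task_lock(redis, key, owner, ttl_seconds)

  yield
  true
end
```

Each server would call something like run_task_once(redis, "lock:task:x", hostname, 60 * 5) { execute_task_x } on every tick; the lock is deliberately not released after the task finishes, so the expiry alone decides when the next run may happen.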

However, you are still relying on a single Redis master server unless you use redis-sentinel. Take a look at the redis-sentinel documentation for information on how to configure automatic failover.
