Coordination of one periodic task between servers in a cluster

(I will try to keep this question as short as possible, clearly describing the situation. Please comment if something is missing.)

Situation

  • I am running a cluster of three servers in the same data center.
  • Each server runs exactly the same application code to facilitate deployment.

Goal

  • Perform one task (call it Task X) every minute on only one server.

Under these conditions

  • The cluster remains distributed and available.
  • Each server runs the same application code. In other words, there is no such thing as "deploy code A to the main server and deploy code B to all secondary servers".

The reason I don’t want to distinguish between server types is to maintain high availability (to avoid problems when the so-called master goes down), to keep redundancy (to distribute the load), and to avoid a complex deployment procedure in which different applications must be deployed to different servers.

Why is this so hard? If I simply add code that performs the task every 5 minutes, every server will execute it, because every server runs the same application code. The servers therefore need to coordinate which one of them runs the task on each tick.

I can use distributed messaging mechanisms such as Apache Kafka or Redis. If we use such a mechanism to coordinate this task, how would the “algorithm” work?

I asked someone else this question, and the answer was to use a task queue. However, this does not seem to solve the problem, because the question remains: which server adds the task to the task queue? If all servers add the task to the queue, this results in duplicate entries. Also, which server picks up the next task in the queue? All of this must be solved by coordination within the cluster, without differentiating between types of servers.

2 answers

It looks like you are looking for a distributed lock. Redis handles this well with setnx. If you combine it with expire, you can create global locks that are released automatically after N seconds.

setnx writes the value and returns true only if the key does not already exist. Redis operations are atomic, so only the first server that calls setnx after the key expires will acquire the lock and run the task.
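To make these semantics concrete without a running Redis server, here is a minimal simulation. FakeRedis is a hypothetical in-memory stand-in invented for this sketch (it is not part of any Redis client), and the server names and the 60-second expiry are assumptions chosen for illustration.

```ruby
# Hypothetical in-memory stand-in for Redis. It mimics only the SETNX and
# EXPIRE behaviour described above. Real Redis runs commands one at a time;
# this sketch is single-threaded, so no locking is needed here.
class FakeRedis
  def initialize
    @values = {}
    @deadlines = {}
  end

  # Store the value only if the key is absent (or already expired).
  # Returns true when the value was written, false otherwise.
  def setnx(key, value, now: Time.now)
    drop_if_expired(key, now)
    return false if @values.key?(key)

    @values[key] = value
    true
  end

  # Make an existing key disappear `seconds` after `now`.
  def expire(key, seconds, now: Time.now)
    return false unless @values.key?(key)

    @deadlines[key] = now + seconds
    true
  end

  private

  def drop_if_expired(key, now)
    deadline = @deadlines[key]
    return unless deadline && now >= deadline

    @values.delete(key)
    @deadlines.delete(key)
  end
end

# Every server runs the same code on each tick; returns the servers that won.
def run_tick(store, servers, now)
  servers.select do |host|
    won = store.setnx("lock:task:x", host, now: now)
    store.expire("lock:task:x", 60, now: now) if won
    won
  end
end

store   = FakeRedis.new
servers = %w[server-a server-b server-c]
start   = Time.at(0)

first  = run_tick(store, servers, start)       # lock is free: one winner
during = run_tick(store, servers, start + 30)  # lock still held: no winner
second = run_tick(store, servers, start + 60)  # lock expired: one winner again
```

In the simulation exactly one server wins each time the lock is free, and nobody wins while it is still held, which is the property the real setnx call provides across machines.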

Here is an example in Ruby:

    require "redis"

    # Assumes a Redis server reachable at the client's default address
    redis = Redis.new

    # Attempt to get the lock for 'Task X' by setting the current server hostname
    if redis.setnx("lock:task:x", `hostname`.chomp)
      # Got the lock, now set it to expire after 5 minutes
      redis.expire("lock:task:x", 60 * 5)
      # This server has the go-ahead to execute the task
      execute_task_x
    else
      # Failed to get the lock. Another server is doing the work this time around
    end

However, you still depend on a single Redis master unless you use redis-sentinel. See the redis-sentinel documentation for how to configure automatic failover.
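One caveat about calling setnx and then expire as two separate commands: if the server that won the lock dies between the two calls, the key never expires and no server will run the task again. A possible variation, sketched here as an assumption rather than as part of the original answer, is to use the Redis SET command with its NX and EX options, which redis-rb exposes as keyword arguments to set. The helper names acquire_task_lock and run_task_x_once are hypothetical, and redis is assumed to be a redis-rb client such as Redis.new.

```ruby
require "socket"

# Try to become the runner for this tick. `redis` is assumed to be a redis-rb
# client; any object with a compatible `set(key, value, nx:, ex:)` also works.
def acquire_task_lock(redis, key: "lock:task:x", owner: Socket.gethostname, ttl: 60 * 5)
  # SET key value NX EX ttl creates the key and sets its expiry in a single
  # atomic command, so the lock can never be left without an expiry.
  redis.set(key, owner, nx: true, ex: ttl) ? true : false
end

# Run the given block only if this server acquired the lock.
# Returns true if the block ran, false if another server holds the lock.
def run_task_x_once(redis, **lock_options)
  return false unless acquire_task_lock(redis, **lock_options)

  yield
  true
end

# Usage with a real client (not executed here):
#   require "redis"
#   run_task_x_once(Redis.new) { execute_task_x }
```

The expiry doubles as the release mechanism, so it should be shorter than or equal to the task interval; otherwise a tick can be skipped because the previous lock is still held.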


You can also use JGroups to achieve this. An example implementation can be found here.


Source: https://habr.com/ru/post/1443154/

