How to create a distributed "debounce" task to drain a Redis list?

I have the following setup: several clients push to a shared Redis list. A separate worker process should drain this list (process the items and delete them). WATCH/MULTI/EXEC is in place to make sure this happens smoothly.

For performance reasons, I don't want to start the drain process right away. I want it to start x milliseconds after the first client pushes to the (then empty) list.

This is similar to a distributed version of the underscore/lodash debounce function, except that the timer starts when the first element arrives (i.e. "leading" instead of "trailing").

I am looking for the best way to do this reliably and in a fault-tolerant way.

I am currently leaning towards the following method:

  • Use Redis SET with the NX and PX options. This allows:
    • setting the value (a mutex) on the dedicated key only if that key does not already exist. This is what the NX option is for.
    • the key to expire after x milliseconds. This is what the PX option is for.
  • With NX, the command replies OK if the value was set, which means no value existed before. Otherwise it returns a null reply. A successful SET means the current client is the first one to push since the Redis list was last drained. Therefore:
  • this client places a job on the distributed queue, scheduled to run in x milliseconds.
  • After x milliseconds, the worker that receives the job starts draining the list.
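Here is a minimal, single-process sketch of the method above, assuming a manually advanced clock (`now_ms`). It does not talk to a real server. `FakeRedis` and `DelayedQueue` are hypothetical in-memory stand-ins that model only SET NX PX, a list push/drain, and a delayed job queue. In real code, the drain would have to be one atomic step, for example MULTI; LRANGE myList 0 -1; DEL myList; EXEC.

```python
import heapq

DELAY_MS = 100  # the "x milliseconds" from the question (example value)


class FakeRedis:
    """Hypothetical in-memory stand-in for the few Redis operations used here."""

    def __init__(self):
        self.now_ms = 0   # simulated clock, advanced manually
        self.keys = {}    # key -> (value, expires_at_ms)
        self.lists = {}   # list key -> items in push order

    def set_nx_px(self, key, value, px):
        # Models SET key value NX PX px: "OK" if set, None (null reply) if the key is alive.
        entry = self.keys.get(key)
        if entry is not None and entry[1] > self.now_ms:
            return None
        self.keys[key] = (value, self.now_ms + px)
        return "OK"

    def rpush(self, key, value):
        self.lists.setdefault(key, []).append(value)

    def drain(self, key):
        # Stands in for an atomic MULTI; LRANGE key 0 -1; DEL key; EXEC.
        return self.lists.pop(key, [])


class DelayedQueue:
    """Hypothetical distributed queue whose jobs become due at run_at_ms."""

    def __init__(self):
        self.heap = []

    def schedule(self, run_at_ms, job):
        heapq.heappush(self.heap, (run_at_ms, job))

    def due(self, now_ms):
        jobs = []
        while self.heap and self.heap[0][0] <= now_ms:
            jobs.append(heapq.heappop(self.heap)[1])
        return jobs


def client_push(r, q, item):
    r.rpush("myList", item)
    # Only the first push since the mutex expired wins the NX race and schedules a drain job.
    if r.set_nx_px("myListLock", 1, DELAY_MS) == "OK":
        q.schedule(r.now_ms + DELAY_MS, "drain myList")


def worker_tick(r, q):
    # A worker runs every job that is due; each job drains the whole list.
    return [r.drain("myList") for _ in q.due(r.now_ms)]


r, q = FakeRedis(), DelayedQueue()
client_push(r, q, "a")
r.now_ms = 30
client_push(r, q, "b")    # mutex still alive: no second job is scheduled
r.now_ms = 100
print(worker_tick(r, q))  # [['a', 'b']]
```

Only the first push in a window schedules a job. Later pushes simply join the batch, which gives the "timer starts at the first element" behaviour from the question.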

It works on paper, but it feels a bit complicated. Are there other ways to make this work in a distributed, fault-tolerant way?

Btw: Redis and the distributed queue are already in place, so I don't consider it an extra burden to use them for this problem.

1 answer

Sorry, but a complete answer would take a ton of text/theory. Since it is a good question, you have already written a good answer yourself :)

First of all, we need to define terms. Debounce in the underscore/lodash sense is well explained in David Corbacho's article:

Debounce: think of it as "grouping multiple events into one." Imagine that you are going home. You enter the elevator, the doors are closing... and suddenly your neighbor appears in the hall and tries to jump into the elevator. Be polite! Open the doors for him: you are debouncing the elevator's departure. The same situation could happen again with a third person, and so on, possibly delaying the departure for several minutes.

Throttle: think of it as a valve that regulates the flow of executions. We can set the maximum number of times a function can be called in a given time period. So, in the elevator analogy, you are polite enough to wait 10 seconds for people, but once that delay has passed, you must go!

Your "debounce" question: the timer starts when the first element is pushed into the list.

So, in elevator terms: the elevator should leave 10 minutes after the first person gets in, no matter how many more people squeeze in after that.
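The sketch below makes the difference between the three behaviours concrete. Given a sorted list of event timestamps in ms, each function computes when its strategy would fire. The function names are illustrative. `throttle` models a leading-edge-only throttle, and `first_push_timer` models the behaviour asked for in the question.

```python
def trailing_debounce(events, wait):
    """Trailing debounce: fire `wait` ms after an event, unless another event arrives first."""
    fires = []
    for i, t in enumerate(events):
        is_last = i == len(events) - 1
        if is_last or events[i + 1] - t >= wait:
            fires.append(t + wait)
    return fires


def throttle(events, wait):
    """Leading-edge-only throttle: fire on an event if `wait` ms have passed since the last firing."""
    fires, last = [], None
    for t in events:
        if last is None or t - last >= wait:
            fires.append(t)
            last = t
    return fires


def first_push_timer(events, wait):
    """Behaviour from the question: the first event of a batch starts a fixed timer,
    and later events join the batch without extending it."""
    fires, deadline = [], None
    for t in events:
        if deadline is None or t >= deadline:
            deadline = t + wait
            fires.append(deadline)
    return fires


events = [0, 3, 5, 20]
print(trailing_debounce(events, 10))  # [15, 30]
print(throttle(events, 10))           # [0, 20]
print(first_push_timer(events, 10))   # [10, 30]
```

With a steady stream of pushes every 5 ms, `trailing_debounce` fires only once, after the stream stops (the elevator that never leaves). `first_push_timer` fires every 10 ms regardless.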

For a distributed, fault-tolerant system, this should be treated as a set of requirements:

  • Processing of a new list must start within X time after the first element is inserted (i.e. after the list is created).
  • A worker crash must not break anything.
  • No deadlocks.
  • The first requirement must hold regardless of the number of workers, be it 1 or N.

In other words, you need to know, in a distributed way, whether the group of workers has to wait or can start processing the list. As soon as we say "distributed" and "fault-tolerant", two concepts always come along:

  • Atomicity (e.g. via locking)
  • Reservation

In practice

In practice, I'm afraid your system needs to be a bit more complicated (maybe you just didn't describe it, and you already have it).

Your method:

  • Pessimistic locking with a mutex via SET NX PX. NX guarantees that only one process at a time does the work (atomicity). PX ensures that if something happens to that process, Redis releases the lock (one piece of fault tolerance against deadlocks).
  • All workers try to grab the same mutex (one per list key), so only one of them succeeds and processes the list after X time. That process can refresh the mutex TTL if processing takes longer than originally planned. If the process crashes, the mutex is released when its TTL expires, and another worker grabs it.
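Below is a sketch of that mutex, assuming each holder uses a random token so that it only ever extends or releases its own lock. `FakeRedis`, `renew_if_owner` and `release_if_owner` are hypothetical in-memory stand-ins driven by a manual clock. SET NX PX exists on a real server as used here. The check-then-extend and check-then-delete steps, however, would need to run atomically there, typically as a Lua script via EVAL.

```python
import uuid


class FakeRedis:
    """Hypothetical in-memory stand-in; each method models one atomic Redis step."""

    def __init__(self):
        self.now_ms = 0   # simulated clock, advanced manually
        self.keys = {}    # key -> (value, expires_at_ms)

    def _alive(self, key):
        entry = self.keys.get(key)
        if entry is not None and entry[1] <= self.now_ms:
            del self.keys[key]   # expired, as Redis would evict it
            entry = None
        return entry

    def set_nx_px(self, key, value, px):
        # Models SET key value NX PX px.
        if self._alive(key) is not None:
            return None
        self.keys[key] = (value, self.now_ms + px)
        return "OK"

    def renew_if_owner(self, key, token, px):
        # Compare-and-extend; on a real server this needs a Lua script to be atomic.
        entry = self._alive(key)
        if entry is None or entry[0] != token:
            return False
        self.keys[key] = (token, self.now_ms + px)
        return True

    def release_if_owner(self, key, token):
        # Compare-and-delete; likewise a Lua script on a real server.
        entry = self._alive(key)
        if entry is None or entry[0] != token:
            return False
        del self.keys[key]
        return True


class ListMutex:
    """One mutex per list key, identified by a random per-holder token."""

    def __init__(self, r, key, ttl_ms):
        self.r, self.key, self.ttl_ms = r, key, ttl_ms
        self.token = uuid.uuid4().hex

    def acquire(self):
        return self.r.set_nx_px(self.key, self.token, self.ttl_ms) == "OK"

    def extend(self):
        return self.r.renew_if_owner(self.key, self.token, self.ttl_ms)

    def release(self):
        return self.r.release_if_owner(self.key, self.token)


r = FakeRedis()
a = ListMutex(r, "myList:mutex", 1000)
b = ListMutex(r, "myList:mutex", 1000)
print(a.acquire(), b.acquire())  # True False
r.now_ms = 800
a.extend()                       # a needs more time: TTL pushed to 1800
r.now_ms = 1800                  # a "crashed" and never released; the lock expires
print(b.acquire(), a.release())  # True False
```

The token check is what stops a slow worker whose lock has already expired from deleting the lock another worker now holds.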

My suggestion

Reliable, fault-tolerant queues in Redis are built around RPOPLPUSH:

  • RPOPLPUSH an item from the list being processed into a special list (one per worker).
  • Process the item.
  • Remove the item from the worker's special list.
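The three steps can be sketched as follows. `FakeRedisLists` is a hypothetical in-memory model of LPUSH, RPOPLPUSH and LREM (only the `count > 0` case), and the `<list>:processing:<worker_id>` key name is an assumption. Note that RPOPLPUSH has been deprecated since Redis 6.2 in favour of LMOVE, which does the same move with explicit directions.

```python
class FakeRedisLists:
    """Hypothetical in-memory model of the Redis list commands used below."""

    def __init__(self):
        self.lists = {}

    def lpush(self, key, value):
        self.lists.setdefault(key, []).insert(0, value)

    def rpoplpush(self, src, dst):
        # Atomic in Redis: pop from the tail of src, push onto the head of dst.
        items = self.lists.get(src)
        if not items:
            return None
        value = items.pop()
        self.lpush(dst, value)
        return value

    def lrem(self, key, count, value):
        # Simplified LREM: only count > 0 (remove the first `count` matches from the head).
        items = self.lists.get(key, [])
        removed, i = 0, 0
        while i < len(items) and removed < count:
            if items[i] == value:
                del items[i]
                removed += 1
            else:
                i += 1
        return removed

    def lrange_all(self, key):
        return list(self.lists.get(key, []))


def drain(r, main_key, worker_id, handle):
    processing_key = f"{main_key}:processing:{worker_id}"  # hypothetical naming scheme
    while True:
        item = r.rpoplpush(main_key, processing_key)  # 1. reserve the item
        if item is None:
            break
        handle(item)                                  # 2. process it (may raise)
        r.lrem(processing_key, 1, item)               # 3. acknowledge: drop it from the worker's list


r = FakeRedisLists()
for v in ["a", "b", "c"]:
    r.lpush("myList", v)
seen = []
drain(r, "myList", "w1", seen.append)
print(seen)  # ['a', 'b', 'c']
```

If `handle` raises, the item stays in the worker's special list instead of being lost.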

So if a worker crashes, we can always move an unfinished message from its special list back to the main list. Redis guarantees the atomicity of RPOPLPUSH/RPOP. The only remaining problem is making the group of workers wait for a while.
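Recovery can be sketched the same way: move everything left in a dead worker's special list back onto the main list, one atomic RPOPLPUSH at a time. This sketch does not cover how a worker is declared dead (for example, a heartbeat key with a TTL); that part is an assumption. `FakeRedisLists` is again a hypothetical in-memory model.

```python
class FakeRedisLists:
    """Hypothetical in-memory model of LPUSH and RPOPLPUSH."""

    def __init__(self):
        self.lists = {}

    def lpush(self, key, value):
        self.lists.setdefault(key, []).insert(0, value)

    def rpoplpush(self, src, dst):
        # Atomic in Redis: pop from the tail of src, push onto the head of dst.
        items = self.lists.get(src)
        if not items:
            return None
        value = items.pop()
        self.lpush(dst, value)
        return value

    def lrange_all(self, key):
        return list(self.lists.get(key, []))


def requeue_dead_worker(r, main_key, worker_id):
    """Move every reserved but unacknowledged item of a dead worker back to the main list."""
    processing_key = f"{main_key}:processing:{worker_id}"  # hypothetical naming scheme
    moved = []
    while True:
        item = r.rpoplpush(processing_key, main_key)
        if item is None:
            return moved
        moved.append(item)


r = FakeRedisLists()
r.lpush("myList", "a")
r.lpush("myList", "b")
r.rpoplpush("myList", "myList:processing:w1")  # w1 reserves "a", then crashes
print(requeue_dead_worker(r, "myList", "w1"))   # ['a']
print(r.lrange_all("myList"))                    # ['a', 'b']
```

RPOPLPUSH pushes onto the head of the main list while consumers pop from the tail, so requeued items go to the back of the line.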

From here there are two options. First, if there are many clients and fewer workers, use locking on the worker side: each worker tries to acquire the mutex and, if it succeeds, starts processing.

Or the other way around: run SET NX PX every time you do LPUSH/RPUSH. This gives a "wait N time before you process me" policy and suits the case of a lot of work and only a few pushing clients. So a push becomes:

    SET myListLock 1 PX 10000 NX
    LPUSH myList value

Each worker then simply checks whether myListLock exists. If it does, the worker must wait at least for the key's TTL before taking the processing mutex and starting to drain the list.
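Here is a sketch of this second option, using a hypothetical in-memory `FakeRedis` and a manual clock. The client runs SET myListLock 1 PX 10000 NX before LPUSH. A worker drains the list only once PTTL reports the marker gone (-2, which is what Redis returns for a missing key), and a separate processing mutex guards the drain. The `myList:processing` key and its 5000 ms TTL are assumptions. `take_all` stands in for MULTI; LRANGE myList 0 -1; DEL myList; EXEC.

```python
LOCK_TTL_MS = 10000        # as in: SET myListLock 1 PX 10000 NX
PROCESSING_TTL_MS = 5000   # hypothetical TTL for the processing mutex


class FakeRedis:
    """Hypothetical in-memory stand-in; each method models one atomic Redis step."""

    def __init__(self):
        self.now_ms = 0      # simulated clock, advanced manually
        self.expiring = {}   # key -> (value, expires_at_ms)
        self.lists = {}

    def _alive(self, key):
        entry = self.expiring.get(key)
        if entry is not None and entry[1] <= self.now_ms:
            del self.expiring[key]
            entry = None
        return entry

    def set_nx_px(self, key, value, px):
        if self._alive(key) is not None:
            return None
        self.expiring[key] = (value, self.now_ms + px)
        return "OK"

    def pttl(self, key):
        entry = self._alive(key)
        return -2 if entry is None else entry[1] - self.now_ms  # -2: key does not exist

    def delete(self, key):
        self.expiring.pop(key, None)

    def lpush(self, key, value):
        self.lists.setdefault(key, []).insert(0, value)

    def take_all(self, key):
        # Stands in for MULTI; LRANGE key 0 -1; DEL key; EXEC.
        return self.lists.pop(key, [])


def client_push(r, value):
    # The first push since the last drain starts the timer; later pushes do not extend it.
    r.set_nx_px("myListLock", 1, LOCK_TTL_MS)
    r.lpush("myList", value)


def worker_poll(r, worker_token):
    remaining = r.pttl("myListLock")
    if remaining > 0:
        return ("wait", remaining)       # timer still running: sleep at least this long
    if r.set_nx_px("myList:processing", worker_token, PROCESSING_TTL_MS) != "OK":
        return ("busy", None)            # another worker is already draining
    try:
        items = r.take_all("myList")
    finally:
        r.delete("myList:processing")    # real code should delete only if it still owns the token
    return ("drained", items[::-1])      # LPUSH stores newest first; reverse to arrival order


r = FakeRedis()
client_push(r, "a")
r.now_ms = 4000
client_push(r, "b")
print(worker_poll(r, "w1"))  # ('wait', 6000)
r.now_ms = 10000
print(worker_poll(r, "w1"))  # ('drained', ['a', 'b'])
```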


Source: https://habr.com/ru/post/976519/

