How to handle gaps in slot numbers with Paxos?

If we run Multi-Paxos, a node may see:

    Propose(N)
    Accept!(N,Vn)
    Accept!(N+1,Vm)
    Accept!(N+4,Vo) // huh? where is +2, +3?
    Accept!(N+5,Vp)

This can happen for one of two reasons:

  • There was a stable leader, but the network local to this node dropped or delayed the messages for +2 and +3.
  • There was a failover with competing proposal attempts, so +2 and +3 were failed proposal rounds.

In general, operations on a distributed finite state machine do not commute, so a node must apply all operations in order. This means the node must be able to distinguish between the two cases. If +2 and +3 were failed proposal rounds, the node has no problem. If they were lost messages, the node must either wait for them to arrive or try to recover the lost data (for example, by requesting a snapshot to reinitialize and catch up).
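To make the in-order requirement concrete, here is a minimal sketch of a node that buffers decisions arriving out of order and reports which slots are still missing. The `Replica` class and its method names are hypothetical, and the sketch assumes the node is told about decided values per slot (seeing an `Accept!` alone does not prove a value was chosen).

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Applies decided values strictly in slot order (hypothetical sketch)."""
    next_slot: int = 0                           # lowest slot not yet applied
    pending: dict = field(default_factory=dict)  # slot -> decided value, not yet applied
    applied: list = field(default_factory=list)  # stands in for the state machine

    def on_decided(self, slot, value):
        """Record a decided value, then apply every contiguous slot available."""
        if slot >= self.next_slot:
            self.pending[slot] = value
        while self.next_slot in self.pending:
            self.applied.append(self.pending.pop(self.next_slot))
            self.next_slot += 1

    def gaps(self):
        """Slots below the highest known slot for which no decision is known."""
        if not self.pending:
            return []
        return [s for s in range(self.next_slot, max(self.pending))
                if s not in self.pending]
```

With slots 0, 1, 4 and 5 decided, `gaps()` reports `[2, 3]`, and nothing from slot 4 onward is applied until both holes are filled, whether by a real value or by a no-op.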

What are the options or strategies for this, and what overhead do they incur?

This question was inspired by "In Paxos, can an acceptor accept a different value after it has already accepted one?"

1 answer

I can think of two methods to deal with this.

The easiest way would be for the node that is missing +2 and +3 to go back and try to propose no-ops in those slots. If values were already decided there, the node will learn them during the prepare round. Otherwise, the no-ops will be decided.
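A rough sketch of that first method for a single missing slot follows. The `Acceptor` class, `fill_slot` function and `NOOP` constant are hypothetical names, everything runs in memory with no networking or persistence, and the round is plain single-decree Paxos rather than any particular system's implementation.

```python
NOOP = "no-op"


class Acceptor:
    """In-memory acceptor state for one slot (hypothetical sketch)."""

    def __init__(self):
        self.promised = -1    # highest ballot promised so far
        self.accepted = None  # (ballot, value) of the last accepted proposal

    def prepare(self, ballot):
        if ballot > self.promised:
            self.promised = ballot
            return True, self.accepted
        return False, None

    def accept(self, ballot, value):
        if ballot >= self.promised:
            self.promised = ballot
            self.accepted = (ballot, value)
            return True
        return False


def fill_slot(acceptors, ballot):
    """Run one Paxos round for a missing slot, proposing a no-op.

    Returns the value chosen in this round, or None if the round failed.
    """
    majority = len(acceptors) // 2 + 1
    replies = [a.prepare(ballot) for a in acceptors]
    promises = [acc for ok, acc in replies if ok]
    if len(promises) < majority:
        return None  # caller may retry with a higher ballot
    # If any promise carries an accepted value, the one with the highest
    # ballot must be proposed instead of the no-op.
    prior = [acc for acc in promises if acc is not None]
    value = max(prior, key=lambda bv: bv[0])[1] if prior else NOOP
    acks = sum(a.accept(ballot, value) for a in acceptors)
    return value if acks >= majority else None
```

If the slot was in fact a lost message, the prepare round surfaces the accepted value and the node ends up re-proposing it; if it was a failed round with nothing accepted, the no-op is chosen and the hole closes. The cost is a full prepare/accept round per missing slot.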

Another approach would be to have an out-of-band catch-up (learning) process. This may be necessary in any case: how else can a node catch up if it joins the system after the others?
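One possible shape for such a catch-up exchange is sketched below: the lagging node tells a peer the first slot it needs, and the peer answers with decided entries, preceded by a snapshot if the older entries have been compacted away. The function name `catch_up_reply` and the reply layout are assumptions made for illustration only.

```python
def catch_up_reply(decided, snapshot, first_needed):
    """Build a peer's reply to a catch-up request (hypothetical sketch).

    decided:      dict of slot -> decided value still held in the peer's log
    snapshot:     (last_included_slot, state) covering all compacted slots
    first_needed: first slot the requesting node has not applied
    """
    snap_slot, snap_state = snapshot
    if first_needed <= snap_slot:
        # The requested entries were compacted: ship the snapshot and
        # continue with the log entries that follow it.
        start = snap_slot + 1
        reply = {"snapshot": (snap_slot, snap_state)}
    else:
        start = first_needed
        reply = {"snapshot": None}
    entries = []
    slot = start
    while slot in decided:  # stop at the peer's own first gap
        entries.append((slot, decided[slot]))
        slot += 1
    reply["entries"] = entries
    return reply
```

Unlike proposing no-ops, this adds no extra consensus rounds, but it needs a separate request/response path and some retained log or snapshot on the peers.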

Or you can use a combination of both. The leader can propose no-ops for any holes in its history, while the other nodes use the catch-up process. This is how my Paxos system works.


Source: https://habr.com/ru/post/955348/

