What to do if the leader fails in Multi-Paxos for master-slave systems?

Background:

Multi-Paxos is described in Section 3, titled "Implementing a State Machine", of Lamport's paper Paxos Made Simple. Multi-Paxos is used in Google's Paxos Made Live. (Multi-Paxos is used in Apache ZooKeeper.) In Multi-Paxos, gaps can appear:

In general, suppose a leader can get α commands ahead, that is, it can propose commands i + 1 through i + α after commands 1 through i are chosen. A gap of up to α − 1 commands could then arise.
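To make the gap bound concrete, here is a minimal Python sketch. The dictionary of chosen slots and the helper name `find_gaps` are illustrative assumptions, not something taken from the paper.

```python
def find_gaps(chosen):
    """Return the slot numbers below the highest chosen slot that hold no chosen command."""
    if not chosen:
        return []
    return [slot for slot in range(1, max(chosen) + 1) if slot not in chosen]


# Commands 1..4 are chosen (i = 4). With alpha = 3 the leader may already
# propose slots 5, 6 and 7. Suppose only the proposal for slot 7 is chosen so far.
alpha = 3
chosen = {1: "a", 2: "b", 3: "c", 4: "d", 7: "g"}
gaps = find_gaps(chosen)  # slots 5 and 6 are the gap
```

In this example the gap has length 2, which is the worst case of α − 1 for α = 3.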

Now consider the following scenario:

The whole system uses a master-slave architecture. Only the master serves client commands. The master and the slaves reach consensus on the sequence of commands via Multi-Paxos. The master is the leader in the Multi-Paxos instances. Now assume that the master and its two slaves have the states (chosen commands) shown in the following figure:

[Figure: Master and slaves]

Note that there is more than one gap in the master's state. Due to asynchrony, the two slaves lag behind. At this point, the master fails.

Problem:

  • What should the followers do after they discover the failure of the master (for example, using the heartbeat mechanism)?

  • In particular, how should they handle the gaps and the commands they are missing relative to the old master?

Update about Zab:

As @sbridges pointed out, ZooKeeper uses Zab instead of Paxos. To quote:

Zab is primarily designed for primary-backup (i.e., master-slave) systems, like ZooKeeper, rather than for state machine replication.

It seems that Zab is closely related to the problems I listed above. According to a short overview paper on Zab, the Zab protocol consists of two modes: recovery and broadcast. In recovery mode, two specific guarantees are made: never forgetting committed messages, and letting go of messages that were skipped. My confusion about Zab is:

  1. In recovery mode, does Zab also suffer from the gap problem? If so, what does Zab do about it?
+6
4 answers

A gap consists of Paxos instances that have not reached agreement. In the Paxos Made Simple paper, a gap is filled by proposing a special "no-op" command that leaves the state unchanged.

If you care about the order of the chosen values across Paxos instances, you are better off using Zab, because Paxos does not preserve causal order. https://cwiki.apache.org/confluence/display/ZOOKEEPER/PaxosRun

A missing command is a Paxos instance that has reached agreement but has not yet been learned by the learner. The value is immutable because it has been accepted by a quorum of acceptors. When you run a Paxos instance for that instance ID, the same value is recovered in phase 1b and chosen again.

When the slaves (followers) detect that the leader has failed, or the leader loses the support of a quorum of followers, they must elect a new leader.

In ZooKeeper there should be no gaps, because the follower communicates with the leader over TCP, which preserves FIFO order.

In recovery mode, after the leader is elected, the follower first synchronizes with the leader and applies the modifications to its state until it receives NEWLEADER.

In broadcast mode, the follower queues each PROPOSAL in pendingTxns and waits for the COMMIT messages in the same order. If the zxid of a COMMIT does not match the zxid at the head of pendingTxns, the follower exits.
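The broadcast-mode check described above can be sketched in Python as follows. This is a simplified illustration, not ZooKeeper's actual code: the `Follower` class, its method names, and the use of an exception in place of a process exit are all assumptions made for the example.

```python
from collections import deque


class Follower:
    """Toy follower that only commits proposals in the order it received them."""

    def __init__(self):
        self.pending = deque()  # plays the role of pendingTxns: (zxid, txn) pairs
        self.applied = []       # committed transactions, in commit order

    def on_proposal(self, zxid, txn):
        # Proposals arrive over a FIFO channel, so they are queued in leader order.
        self.pending.append((zxid, txn))

    def on_commit(self, zxid):
        # A COMMIT must refer to the oldest pending proposal; anything else
        # means the follower has diverged from the leader's order.
        if not self.pending or self.pending[0][0] != zxid:
            # Hypothetical stand-in for the follower exiting.
            raise RuntimeError(f"COMMIT for zxid {zxid} does not match head of pending queue")
        self.applied.append(self.pending.popleft())
```

Because every COMMIT has to match the head of the queue, a follower in this sketch can never apply a later transaction while an earlier one is still outstanding, which is why no gap can form in its applied history.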

https://cwiki.apache.org/confluence/display/ZOOKEEPER/Zab1.0

+2

"Multi-Paxos is used in Apache ZooKeeper"

ZooKeeper uses Zab, not Paxos. See this link for the difference.

In particular, each ZooKeeper node in the ensemble commits updates in the same order as every other node:

Unlike client requests, state updates must be applied in the exact original generation order of the primary, starting from the original initial state of the primary. If a primary fails, a new primary that executes recovery cannot arbitrarily reorder uncommitted state updates, or apply them starting from a different initial state.

+1

Specifically, the Zab paper says that a newly elected leader goes through a discovery phase to learn the next epoch number to set and which server has the most up-to-date commit history. Each follower sends an ACK-E message that states the maximum contiguous zxid it has seen. The leader then runs a synchronization phase in which it transmits the state that the followers have missed. The paper notes that an interesting optimization is to elect only a leader that already has the most up-to-date commit history.
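As a rough illustration of that optimization, the Python sketch below picks the server whose last zxid is highest. It relies on the fact that a ZooKeeper zxid is a 64-bit number with the epoch in the high 32 bits and a counter in the low 32 bits; the helper names and the tie-break on server ID are assumptions made for the example rather than a description of the real election code.

```python
def make_zxid(epoch, counter):
    """Pack an epoch and a per-epoch counter into one 64-bit zxid."""
    return (epoch << 32) | counter


def pick_leader(last_zxid_by_server):
    """Return the ID of the server with the most recent history.

    Ties on zxid are broken by the higher server ID (an assumption of this sketch).
    """
    return max(last_zxid_by_server, key=lambda sid: (last_zxid_by_server[sid], sid))


votes = {
    1: make_zxid(2, 5),
    2: make_zxid(3, 1),  # newer epoch wins even though its counter is small
    3: make_zxid(2, 9),
}
leader = pick_leader(votes)
```

Comparing the packed zxid as a single integer orders histories first by epoch and then by counter, so a server from a newer epoch always ranks above one from an older epoch.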

With Paxos, you do not have to allow gaps. If you do allow gaps, Paxos Made Simple explains how to resolve them from page 9. The new leader knows the last committed value it saw, and possibly some committed values above it. Starting from the lowest gap it knows about, it probes the slots by running phase 1 for them. If there are values in those slots, it runs phase 2 to fix those values; if it is free to set a value, it sets a no-op. Eventually it reaches a slot number for which no values have been proposed, and from there it runs as normal.
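The recovery procedure above could be sketched in Python roughly like this. It is a simplified model under stated assumptions: each promise is represented as a dictionary from slot number to the `(ballot, value)` pair an acceptor last accepted, and `recover_proposals` and `NOOP` are hypothetical names introduced for the example.

```python
NOOP = "no-op"


def recover_proposals(first_unknown_slot, promises):
    """Decide what a new leader should propose in phase 2 for each uncertain slot.

    promises: one dict per acceptor in the quorum, mapping slot -> (ballot, value)
    for every value that acceptor has accepted at or above first_unknown_slot.
    """
    highest_slot = max(
        (slot for promise in promises for slot in promise),
        default=first_unknown_slot - 1,
    )
    proposals = {}
    for slot in range(first_unknown_slot, highest_slot + 1):
        accepted = [promise[slot] for promise in promises if slot in promise]
        if accepted:
            # Some acceptor already accepted a value: re-propose the one
            # with the highest ballot, since it may already be chosen.
            _, value = max(accepted, key=lambda pair: pair[0])
            proposals[slot] = value
        else:
            # Nobody in the quorum accepted anything: the leader is free
            # to fill the gap with a no-op.
            proposals[slot] = NOOP
    return proposals


promises = [
    {3: (1, "x"), 5: (1, "z")},
    {3: (2, "y")},
]
plan = recover_proposals(3, promises)
```

Here slot 3 is re-proposed with the value from the highest ballot, slot 4 is filled with a no-op, and slot 5 keeps the only value any acceptor reported. Slots above 5 are untouched, so the leader can continue with new client commands from slot 6.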

Answering your questions:

  • What should the followers do after they discover the failure of the master (for example, using the heartbeat mechanism)?

They should attempt to lead after a randomized delay, to reduce the risk of two candidates proposing at the same time; that would waste messages and disk flushes, since only one of them can lead. Randomized leader timeouts are well covered in the Raft paper, and the same approach can be used for Paxos.
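A randomized timeout of this kind might look like the following Python sketch. The function name and parameters are assumptions for illustration; the 150 to 300 ms range is the one suggested in the Raft paper and may need tuning for a real deployment.

```python
import random


def election_timeout(base_ms=150.0, spread_ms=150.0, rng=random):
    """Return how long a follower waits without a heartbeat before trying to lead.

    Each follower draws its own value, so two followers rarely time out together.
    """
    return base_ms + rng.uniform(0.0, spread_ms)


# Three followers each draw an independent timeout in milliseconds.
rng = random.Random(42)  # seeded only to make the example repeatable
timeouts = [election_timeout(rng=rng) for _ in range(3)]
```

The follower with the shortest timeout usually starts its leadership attempt first and, if it reaches a quorum, its heartbeats reset the timers of the others before they fire.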

  • In particular, how to handle gaps and missing commands relative to the old master?

The new leader must probe the gaps and fix each one to either the highest-numbered value proposed in that slot or, failing that, a no-op. Once it has filled the gaps, it can lead as normal.

+1

@Hailin's answer explains the gap problem as follows:

"In ZooKeeper there should be no gaps, because the follower communicates with the leader over TCP, which preserves FIFO order."

To add:

The paper A simple totally ordered broadcast protocol mentions that ZooKeeper requires the prefix property:

If $m$ is the last message delivered for a leader $L$, any message proposed before $m$ by $L$ must also be delivered.

This property relies primarily on the TCP mechanism used in Zab. The Zab wiki mentions that an implementation of Zab must satisfy the following assumption (among others):

Servers must process packets in the order that they are received. Since TCP maintains ordering when sending packets, this means that packets will be processed in the order defined by the sender.

0

Source: https://habr.com/ru/post/955346/

