How does two-phase commit prevent a last-second failure?

I am studying how two-phase commit works in a distributed transaction. As far as I understand, in the last part of the phase the transaction coordinator asks each node whether it is ready to commit. If every node agrees, the coordinator tells them to go ahead and commit.

What prevents the following failure?

  • All nodes respond that they are ready to commit.
  • The transaction coordinator tells them to "go ahead and commit," but one of the nodes crashes before receiving this message.
  • All other nodes commit successfully, but now the distributed transaction is corrupt.
  • As I understand it, when the crashed node comes back, its transaction will be rolled back (since it never received the commit message).

I am assuming that each node runs a normal database that knows nothing about distributed transactions. What did I miss?

+57
database distributed-transactions
Oct 05 '08 at 12:12
5 answers

Summarizing all the answers:

  1. You cannot use regular databases with distributed transactions. The database must explicitly support the transaction coordinator.

  2. The nodes are not told to roll back, because some of them have already committed. What happens instead: when the crashed node comes back, the transaction coordinator tells it to finish the commit.

+18
Feb 13 '09 at 3:49

No, they are not instructed to roll back, because in the original poster's scenario some of the nodes have already committed. What happens is that when the crashed node becomes available again, the transaction coordinator tells it to commit the transaction again.

Because the node responded positively in the prepare phase, it is required to be able to commit, even after it comes back from a crash.
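
A minimal sketch of that behavior, assuming a toy in-memory model: the names `Participant`, `two_phase_commit`, `retry_pending`, and the `fail_after_vote` flag are hypothetical and exist only for illustration. It shows that once every node has voted yes the decision stays "committed", and the coordinator simply keeps the unreachable node on a retry list instead of rolling anyone back.

```python
class Participant:
    """Toy in-memory participant (hypothetical); a real one persists its state."""

    def __init__(self, name, fail_after_vote=False):
        self.name = name
        self.state = "idle"
        self.online = True
        self.fail_after_vote = fail_after_vote

    def prepare(self, txid):
        # Voting yes is a promise that a later commit will succeed.
        self.state = "prepared"
        if self.fail_after_vote:
            self.online = False  # simulate a crash right after voting yes
        return True

    def commit(self, txid):
        if not self.online:
            raise ConnectionError(f"{self.name} is unreachable")
        self.state = "committed"


def two_phase_commit(participants, txid):
    # Phase 1: collect votes. Rollback messages on a "no" vote are omitted here.
    if not all(p.prepare(txid) for p in participants):
        return "aborted", []
    # Phase 2: the decision is now "commit" and is never reversed.
    pending = []
    for p in participants:
        try:
            p.commit(txid)
        except ConnectionError:
            pending.append(p)  # remember the node and retry later
    return "committed", pending


def retry_pending(pending, txid):
    # Re-send the commit to nodes that missed it; return those still unreachable.
    still_pending = []
    for p in pending:
        try:
            p.commit(txid)
        except ConnectionError:
            still_pending.append(p)
    return still_pending
```

In this sketch the crashed node stays in the "prepared" state until `retry_pending` reaches it, which mirrors the point above: the commit is delayed for that node, not undone for the others.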

+34
Oct 05 '08 at 12:23

No. Point 4 is incorrect. Each node records in stable storage that it is able to either commit or roll back the transaction, so that it can do as commanded even across crashes. When the crashed node comes back up, it must realize that it has a transaction in the pre-committed state, reinstate any relevant locks or other controls, and then try to contact the coordinator site to obtain the status of the transaction.

Problems only arise if the crashed node never comes back up (then everything else believes that the transaction was OK, or will be once the crashed node comes back).
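
A rough sketch of the recovery step described above, under stated assumptions: the class `DurableParticipant`, its JSON-lines log format, and the `ask_coordinator` callback are all hypothetical, and restoring locks is left out. The node forces a "prepared" record to disk before voting yes, and after a restart it replays the log to find transactions that are still in doubt and asks the coordinator for their outcome.

```python
import json
import os
import tempfile


class DurableParticipant:
    """Hypothetical participant that logs its 2PC state to a local file."""

    def __init__(self, log_path):
        self.log_path = log_path

    def _append(self, txid, state):
        # Force the record to disk before replying to the coordinator.
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps({"txid": txid, "state": state}) + "\n")
            log.flush()
            os.fsync(log.fileno())

    def prepare(self, txid):
        self._append(txid, "prepared")
        return True  # vote yes only after the record is durable

    def finish(self, txid, decision):
        self._append(txid, decision)  # "committed" or "aborted"

    def in_doubt(self):
        # Replay the log; the last record for each transaction wins.
        last = {}
        if os.path.exists(self.log_path):
            with open(self.log_path, encoding="utf-8") as log:
                for line in log:
                    record = json.loads(line)
                    last[record["txid"]] = record["state"]
        return [txid for txid, state in last.items() if state == "prepared"]

    def recover(self, ask_coordinator):
        # After a restart, resolve every prepared-but-undecided transaction.
        for txid in self.in_doubt():
            self.finish(txid, ask_coordinator(txid))


# Usage: simulate a crash by creating a new object over the same log file.
path = os.path.join(tempfile.mkdtemp(), "participant.log")
DurableParticipant(path).prepare("tx1")
restarted = DurableParticipant(path)
restarted.recover(lambda txid: "committed")  # coordinator's answer is assumed
```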

+16
Oct 05 '08 at 12:19

Two-phase commit is not foolproof; it is designed to work in 99% of cases.

"The protocol assumes that there is stable storage at each node with a write-ahead log, that no node crashes forever, that the data in the write-ahead log is never lost or corrupted in a crash, and that any two nodes can communicate with each other."

http://en.wikipedia.org/wiki/Two-phase_commit_protocol

+9
Oct 05 '08 at 12:22

There are many ways to attack the problems with two-phase commit. Almost all of them end up as some variant of the Paxos three-phase commit algorithm. Mike Burrows, who designed Google's Chubby lock service, which is based on Paxos, said in a lecture I saw that there are two types of distributed commit algorithms - "Paxos, and incorrect ones."

One thing the crashed node could do when it wakes up is ask the coordinator, "I never heard about this transaction; should it have been committed?" The coordinator will then tell it what the vote was.

Keep in mind that this is an example of a more general problem: the crashed node may miss many transactions before it recovers. It is therefore important that, upon recovery, it talks to either the coordinator or another replica before making itself available. If the node itself cannot tell whether it has crashed, things get more involved, but they are still tractable.

If you use a quorum system for database reads, the inconsistency will be masked (and made known to the database itself).
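
As a small illustration of the quorum idea, here is a sketch that assumes each replica simply reports one value for the item being read; the function name `quorum_read` and the strict-majority rule are my assumptions, not something spelled out above. The reader gets the majority value, and the replicas that disagree are reported so the system can repair them.

```python
from collections import Counter


def quorum_read(replica_values):
    """Return (majority_value, indexes_of_disagreeing_replicas).

    Hypothetical helper: expects a non-empty list with one value per replica.
    """
    value, count = Counter(replica_values).most_common(1)[0]
    if count * 2 <= len(replica_values):
        raise RuntimeError("no strict majority among replicas")
    stale = [i for i, v in enumerate(replica_values) if v != value]
    return value, stale
```

With three replicas where the recovering node still holds the old value, the read returns the committed value and flags the lagging replica, so the caller never sees the inconsistency.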

+6
Oct 05 '08 at 12:43