How can I avoid this three-way deadlock in Postgres?

I ran into a three-way deadlock in Postgres, and I really don't understand what is happening or what causes it. The log message:

 Process 5671 waits for ExclusiveLock on tuple (33,12) of relation 31709 of database 16393; blocked by process 5652.
 Process 5652 waits for ShareLock on transaction 3382643; blocked by process 5670.
 Process 5670 waits for ExclusiveLock on tuple (33,12) of relation 31709 of database 16393; blocked by process 5671.
 Process 5671: UPDATE "user" SET seqno = seqno + 1, push_pending = true WHERE user_id = $1 RETURNING seqno
 Process 5652: UPDATE "user" SET seqno = seqno + 1, push_pending = true WHERE user_id = $1 RETURNING seqno
 Process 5670: UPDATE "user" SET seqno = seqno + 1, push_pending = true WHERE user_id = $1 RETURNING seqno

(Relation 31709 is the "user" table. user_id is the same in all three transactions.)

This is not like the classic deadlock described in the documentation: I am not updating multiple rows of this table out of order. I suspect the RETURNING clause has something to do with it, but I don't understand why. Any ideas on how I can solve this, or how to debug it further?

Update, for Erwin's questions in the comments: this is Postgres 9.3. There are other statements in these transactions, although I don't think they touch the user table. There is one trigger on the table, which sets the updated_at column to the current timestamp:

 CREATE OR REPLACE FUNCTION update_timestamp() RETURNS TRIGGER AS $$
 BEGIN
   NEW.updated_at = now();
   RETURN NEW;
 END;
 $$ LANGUAGE plpgsql;
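
It is wired up as a row-level BEFORE trigger, roughly like this (the trigger name below is illustrative, not the real one):

 CREATE TRIGGER user_update_timestamp
   BEFORE UPDATE ON "user"
   FOR EACH ROW
   EXECUTE PROCEDURE update_timestamp();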

I will capture the full transaction information when I can reproduce the issue; so far I am only analyzing the logs after the fact.

Update #2: I rebuilt Postgres with LOCK_DEBUG and ran with trace_locks=on, trying to wrap my head around the order in which the locks are taken.
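
For reference, turning the tracing on amounts to something like this in a superuser session (these parameters only exist in a server compiled with -DLOCK_DEBUG, and the trace_lock_oidmin value shown is just the default):

 -- only meaningful in a LOCK_DEBUG build
 SET trace_locks = on;
 SET trace_lock_oidmin = 16384;  -- skip system catalogs, trace only user relations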

Updated deadlock message:

 Process 54131 waits for ShareLock on transaction 4774; blocked by process 54157.
 Process 54157 waits for ExclusiveLock on tuple (1,16) of relation 18150 of database 18136; blocked by process 54131.

I can clearly see the ExclusiveLock being taken:

 2014-08-05 10:35:15 EDT apiary [54131] 2/316 4773 LOG: 00000: LockAcquire: new: lock(0x10f039f88) id(18136,18150,1,16,3,1) grantMask(0) req(0,0,0,0,0,0,0)=0 grant(0,0,0,0,0,0,0)=0 wait(0) type(ExclusiveLock)
 2014-08-05 10:35:15 EDT apiary [54131] 2/316 4773 LOG: 00000: GrantLock: lock(0x10f039f88) id(18136,18150,1,16,3,1) grantMask(80) req(0,0,0,0,0,0,1)=1 grant(0,0,0,0,0,0,1)=1 wait(0) type(ExclusiveLock)
 2014-08-05 10:35:15 EDT apiary [54157] 3/188 4774 LOG: 00000: LockAcquire: found: lock(0x10f039f88) id(18136,18150,1,16,3,1) grantMask(80) req(0,0,0,0,0,0,1)=1 grant(0,0,0,0,0,0,1)=1 wait(0) type(ExclusiveLock)
 2014-08-05 10:35:15 EDT apiary [54157] 3/188 4774 LOG: 00000: WaitOnLock: sleeping on lock: lock(0x10f039f88) id(18136,18150,1,16,3,1) grantMask(80) req(0,0,0,0,0,0,2)=2 grant(0,0,0,0,0,0,1)=1 wait(0) type(ExclusiveLock)

(Format: date program [pid] virtualtxid txid msg)

But I don't see where the ShareLock is taken, or why transaction 4773 would block transaction 4774. I see similar results when querying the pg_locks table: there is always one transaction waiting for a ShareLock on another transaction, which in turn is blocked on the tuple locked by the first transaction. Any suggestions on how to dig into the source of the ShareLock?
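
For what it's worth, the kind of query I am running against pg_locks looks roughly like this (a sketch; the join to pg_stat_activity is only there to see the statements):

 -- show tuple and transaction-id locks, granted and waiting, with the queries
 SELECT l.pid, l.locktype, l.mode, l.granted,
        l.relation::regclass AS relation, l.page, l.tuple,
        l.transactionid, a.query
 FROM pg_locks l
 JOIN pg_stat_activity a ON a.pid = l.pid
 WHERE l.locktype IN ('tuple', 'transactionid')
 ORDER BY l.granted, l.pid;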

Update #3: I had to change LOCK_DEBUG_ENABLED to unconditionally return true in order to see the ShareLock acquisitions. Once I did that, I started seeing them being created:

 2014-08-05 12:53:22 EDT apiary [76705] 2/471 6294 LOG: 00000: LockAcquire: lock [6294,0] ExclusiveLock
 2014-08-05 12:53:22 EDT apiary [76705] 2/471 6294 LOG: 00000: LockAcquire: new: lock(0x115818378) id(6294,0,0,0,4,1) grantMask(0) req(0,0,0,0,0,0,0)=0 grant(0,0,0,0,0,0,0)=0 wait(0) type(ExclusiveLock)
 2014-08-05 12:53:22 EDT apiary [76705] 2/471 6294 LOG: 00000: GrantLock: lock(0x115818378) id(6294,0,0,0,4,1) grantMask(80) req(0,0,0,0,0,0,1)=1 grant(0,0,0,0,0,0,1)=1 wait(0) type(ExclusiveLock)
 2014-08-05 12:53:22 EDT apiary [76706] 4/153 6295 LOG: 00000: LockAcquire: lock [6294,0] ShareLock
 2014-08-05 12:53:22 EDT apiary [76706] 4/153 6295 LOG: 00000: LockAcquire: found: lock(0x115818378) id(6294,0,0,0,4,1) grantMask(80) req(0,0,0,0,0,0,1)=1 grant(0,0,0,0,0,0,1)=1 wait(0) type(ShareLock)
 2014-08-05 12:53:22 EDT apiary [76706] 4/153 6295 LOG: 00000: WaitOnLock: sleeping on lock: lock(0x115818378) id(6294,0,0,0,4,1) grantMask(80) req(0,0,0,0,1,0,1)=2 grant(0,0,0,0,0,0,1)=1 wait(0) type(ShareLock)

But I'm still not sure why the ShareLock is being taken, or why 6295 (in this case) has to wait for 6294.

+6
2 answers

Perhaps this is the deadlock: each of your other tables has a user_id foreign key referencing the user table. When you insert into a table with a foreign key constraint, Postgres has to lock the referenced row to make sure it is not deleted while you insert the row that references it (which would break the FK constraint at commit time). That lock is taken as a shared lock.

It looks like all of the inserts/updates on the tables related to user also bump seqno in the user table after inserting into the main table. Those updates need an exclusive lock on the user row and are blocked by any shared locks held outside the current transaction. If two such transactions run at the same time, they deadlock.

For example, two transactions inserting into media_size and source could deadlock against each other like this:

  T1                                   T2
 ----------------------------------------------------------------------
 1. Insert media_size row
 1a.  Excl lock on media_size row
 1b.  Shared lock on user row (FK)
                                      2. Insert source row
                                      2a.  Excl lock on source row
                                      2b.  Shared lock on user row (FK)

 3. Update user seqno
 3a.  Excl lock on user row (blocks on 2b)
                                      4. Update user seqno
                                      4a.  Excl lock on user row (blocks on 1b)
 5. Deadlock
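
A concrete replay of that timeline in two sessions would look roughly like this (only "user".user_id, seqno and push_pending come from the question; the child-table columns and values are invented for the example):

 -- session 1
 BEGIN;
 INSERT INTO media_size (user_id, width, height) VALUES (42, 640, 480);
 -- FK check takes a shared lock on the user row for user_id = 42

 -- session 2
 BEGIN;
 INSERT INTO source (user_id, name) VALUES (42, 'camera');
 -- FK check takes the same shared lock on the same user row

 -- session 1
 UPDATE "user" SET seqno = seqno + 1, push_pending = true
 WHERE user_id = 42 RETURNING seqno;   -- wants an exclusive lock, waits for session 2

 -- session 2
 UPDATE "user" SET seqno = seqno + 1, push_pending = true
 WHERE user_id = 42 RETURNING seqno;   -- waits for session 1 -> deadlock detected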

I think moving the seqno update to the first step would make sense, since it would force T1 and T2 to block on each other right away, before trying to take the shared lock (which they would no longer need to wait for, since each would already hold the stronger exclusive lock on the user row).
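
In SQL, that reordering would look something like this (same invented columns as above):

 BEGIN;
 -- take the exclusive lock on the user row first
 UPDATE "user" SET seqno = seqno + 1, push_pending = true
 WHERE user_id = 42 RETURNING seqno;
 -- the FK check on the insert no longer has to wait on another transaction:
 -- this transaction already holds the strongest lock on the referenced row
 INSERT INTO media_size (user_id, width, height) VALUES (42, 640, 480);
 COMMIT;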

+5

Assuming the default transaction isolation level Read Committed:

An UPDATE always locks the rows it updates. Concurrent transactions trying to update the same row cannot proceed until the first transaction rolls back or commits.

The RETURNING clause is orthogonal to the problem.

The deadlock could have happened without Process 5671 as well; that is just the next transaction queuing for the same row, which happened to come in between. Process 5670 and Process 5652 are the ones actually deadlocking, most likely due to other statements in the same transactions. Less likely candidates are triggers on the table.

Try splitting your transactions into smaller pieces, and make them update tables, and rows within the same tables, in a consistent order. Then they cannot deadlock.
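
If one transaction has to touch several rows of the same table, one way to enforce a consistent order is to lock them up front, sorted by key (a sketch, not your code):

 BEGIN;
 -- lock all rows this transaction will update, in a fixed order,
 -- so two such transactions cannot grab them in opposite order
 SELECT user_id FROM "user"
 WHERE user_id IN (17, 42)   -- example ids
 ORDER BY user_id
 FOR UPDATE;
 -- ... the actual UPDATEs / INSERTs follow ...
 COMMIT;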

Foreign keys

Since you mention foreign keys in a later comment: they can play a part in the deadlock, too. Postgres 9.3 introduced new row-level lock modes to address exactly this:

FOR KEY SHARE and FOR NO KEY UPDATE

Details in this blog post by Michael Paquier.
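
Roughly speaking, since 9.3 the FK triggers lock the parent row like the first statement below, while an UPDATE that leaves key columns untouched only takes the second, weaker lock at row level, and the two do not conflict (illustrative statements, not taken from your code):

 -- what the FK check effectively does since 9.3:
 SELECT 1 FROM "user" WHERE user_id = 42 FOR KEY SHARE;
 -- what an UPDATE of non-key columns (seqno, push_pending) takes at row level:
 SELECT 1 FROM "user" WHERE user_id = 42 FOR NO KEY UPDATE;
 -- FOR KEY SHARE and FOR NO KEY UPDATE do not conflict with each other,
 -- so a concurrent FK insert no longer blocks such an UPDATE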

These should make FK constraints less of a problem. Still, it is possible.

Latest point release

There have been several fixes to the locking code since version 9.3.0 was released. Upgrading to the latest point release may help.

View current locks

In response to your comment: you can see (most) currently held locks in the pg_locks system view. Be sure to read the manual, and possibly more, before drawing conclusions.
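
For example, a self-join that pairs every waiting lock request with the granted locks on the same object (a sketch; 9.3 has no pg_blocking_pids() yet):

 SELECT w.pid AS waiting_pid, w.mode AS wanted_mode,
        h.pid AS holding_pid, h.mode AS held_mode,
        w.locktype, w.relation::regclass AS relation,
        w.page, w.tuple, w.transactionid
 FROM pg_locks w
 JOIN pg_locks h
   ON h.granted AND NOT w.granted
  AND h.pid <> w.pid
  AND h.locktype = w.locktype
  AND h.database      IS NOT DISTINCT FROM w.database
  AND h.relation      IS NOT DISTINCT FROM w.relation
  AND h.page          IS NOT DISTINCT FROM w.page
  AND h.tuple         IS NOT DISTINCT FROM w.tuple
  AND h.transactionid IS NOT DISTINCT FROM w.transactionid;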

+4
