I ran into a three-stage deadlock in Postgres, and I really don’t understand what is happening, what causes it. Log message,
Process 5671 waits for ExclusiveLock on tuple (33,12) of relation 31709 of database 16393; blocked by process 5652. Process 5652 waits for ShareLock on transaction 3382643; blocked by process 5670. Process 5670 waits for ExclusiveLock on tuple (33,12) of relation 31709 of database 16393; blocked by process 5671. Process 5671: UPDATE "user" SET seqno = seqno + 1, push_pending = true WHERE user_id = $1 RETURNING seqno Process 5652: UPDATE "user" SET seqno = seqno + 1, push_pending = true WHERE user_id = $1 RETURNING seqno Process 5670: UPDATE "user" SET seqno = seqno + 1, push_pending = true WHERE user_id = $1 RETURNING seqno
(Relation 31709 is the "user" table. user_id in all three transactions.)
This is not like your dead end, which you see in the documentation . I am not updating several rows of this table out of order. I suspect that the RETURNING clause has something to do with it, but I don’t understand why. Any ideas on how I can solve this problem, or debug it further?
Update for Erwin's questions in the comments: This is Postgres 9.3. There are other commands in this transaction, although I don’t think they relate to the user table. There is one trigger in the table for updating the updated_at column with current_timestamp() :
CREATE OR REPLACE FUNCTION update_timestamp() RETURNS TRIGGER AS $$ BEGIN NEW.updated_at = now(); RETURN NEW; END; $$ LANGUAGE plpgsql;
I will study transaction information when I can reproduce it; I analyze magazines after the fact.
Update # 2: I rebuilt Postgres using LOCK_DEBUG and worked with trace_locks=on , trying to wrap my head around the order in which the locks are made.
Updated lock message:
Process 54131 waits for ShareLock on transaction 4774; blocked by process 54157. Process 54157 waits for ExclusiveLock on tuple (1,16) of relation 18150 of database 18136; blocked by process 54131.
I clearly see the lock on ExclusiveLock :
2014-08-05 10:35:15 EDT apiary [54131] 2/316 4773 LOG: 00000: LockAcquire: new: lock(0x10f039f88) id(18136,18150,1,16,3,1) grantMask(0) req(0,0,0,0,0,0,0)=0 grant(0,0,0,0,0,0,0)=0 wait(0) type(ExclusiveLock) 2014-08-05 10:35:15 EDT apiary [54131] 2/316 4773 LOG: 00000: GrantLock: lock(0x10f039f88) id(18136,18150,1,16,3,1) grantMask(80) req(0,0,0,0,0,0,1)=1 grant(0,0,0,0,0,0,1)=1 wait(0) type(ExclusiveLock) 2014-08-05 10:35:15 EDT apiary [54157] 3/188 4774 LOG: 00000: LockAcquire: found: lock(0x10f039f88) id(18136,18150,1,16,3,1) grantMask(80) req(0,0,0,0,0,0,1)=1 grant(0,0,0,0,0,0,1)=1 wait(0) type(ExclusiveLock) 2014-08-05 10:35:15 EDT apiary [54157] 3/188 4774 LOG: 00000: WaitOnLock: sleeping on lock: lock(0x10f039f88) id(18136,18150,1,16,3,1) grantMask(80) req(0,0,0,0,0,0,2)=2 grant(0,0,0,0,0,0,1)=1 wait(0) type(ExclusiveLock)
(Format date program [pid] virtualtxid txid msg )
But I don’t see where ShareLock is created or why transaction 4773 will block transaction 4774. I see similar results when querying the pg_locks table: there is always a transaction waiting on ShareLock another transaction that blocks the tuple from the first transaction. Any suggestions on how to dig a ShareLock source?
Update 3: I need to update the built-in function LOCK_DEBUG_ENABLED to unconditionally return true in order to see ShareLock creation. As soon as I did this, I began to see their creation:
2014-08-05 12:53:22 EDT apiary [76705] 2/471 6294 LOG: 00000: LockAcquire: lock [6294,0] ExclusiveLock 2014-08-05 12:53:22 EDT apiary [76705] 2/471 6294 LOG: 00000: LockAcquire: new: lock(0x115818378) id(6294,0,0,0,4,1) grantMask(0) req(0,0,0,0,0,0,0)=0 grant(0,0,0,0,0,0,0)=0 wait(0) type(ExclusiveLock) 2014-08-05 12:53:22 EDT apiary [76705] 2/471 6294 LOG: 00000: GrantLock: lock(0x115818378) id(6294,0,0,0,4,1) grantMask(80) req(0,0,0,0,0,0,1)=1 grant(0,0,0,0,0,0,1)=1 wait(0) type(ExclusiveLock) 2014-08-05 12:53:22 EDT apiary [76706] 4/153 6295 LOG: 00000: LockAcquire: lock [6294,0] ShareLock 2014-08-05 12:53:22 EDT apiary [76706] 4/153 6295 LOG: 00000: LockAcquire: found: lock(0x115818378) id(6294,0,0,0,4,1) grantMask(80) req(0,0,0,0,0,0,1)=1 grant(0,0,0,0,0,0,1)=1 wait(0) type(ShareLock) 2014-08-05 12:53:22 EDT apiary [76706] 4/153 6295 LOG: 00000: WaitOnLock: sleeping on lock: lock(0x115818378) id(6294,0,0,0,4,1) grantMask(80) req(0,0,0,0,1,0,1)=2 grant(0,0,0,0,0,0,1)=1 wait(0) type(ShareLock)
But I'm still not sure why ShareLock is being created, and why 6295 (in this case) should wait for 6294.