Invalid PostgreSQL query results with explicit locks and concurrent transactions

Question

When writing some SQL queries for PostgreSQL, I found some unusual behavior that seems a little disturbing to me.

Let's say we have the following table "test":

    +----+-------+---------------------+
    | id | value | created_at          |
    +----+-------+---------------------+
    | 1  | A     | 2014-01-01 00:00:00 |
    | 2  | A     | 2014-01-02 00:00:00 |
    | 3  | B     | 2014-01-03 00:00:00 |
    | 4  | B     | 2014-01-04 00:00:00 |
    | 5  | A     | 2014-01-05 00:00:00 |
    | 6  | B     | 2014-01-06 00:00:00 |
    | 7  | A     | 2014-01-07 00:00:00 |
    | 8  | B     | 2014-01-08 00:00:00 |
    +----+-------+---------------------+

Two transactions, A and B, run concurrently:

    A: begin;  /* Begin transaction A */
    B: begin;  /* Begin transaction B */
    A: select * from test where id = 1 for update;  /* Lock one row */
    B: select * from test where value = 'B' order by created_at limit 3 for update;  /* This query returns immediately since it does not need to return row with id=1 */
    B: select * from test where value = 'A' order by created_at limit 3 for update;  /* This query blocks because row id=1 is locked by transaction A */
    A: update test set created_at = '2014-01-09 00:00:00' where id = 1;  /* Modify the locked row */
    A: commit;

As soon as transaction A commits and releases the lock on the row with id = 1, the blocked query in transaction B returns the following result:

    +----+-------+---------------------+
    | id | value | created_at          |
    +----+-------+---------------------+
    | 1  | A     | 2014-01-09 00:00:00 |
    | 2  | A     | 2014-01-02 00:00:00 |
    | 5  | A     | 2014-01-05 00:00:00 |
    +----+-------+---------------------+

These rows are clearly not sorted by "created_at", and the row with id = 1 should not even be among the returned rows. Running transactions A and B concurrently produced an incorrect result in transaction B, which would not have happened had the transactions run one after another. This is like breaking transaction isolation.

Is this a bug?

If it is not a bug and these results are expected, what does that mean for the reliability of the results the database returns? In a highly concurrent environment, any code that relied on the rows actually being ordered by date would break.

If, however, we execute the same sequence of instructions as above, but replace the update statement as follows:

 update test set value = 'B', created_at = '2014-01-09 00:00:00' where id = 1;

... then the blocked query returns the correct result:

    +----+-------+---------------------+
    | id | value | created_at          |
    +----+-------+---------------------+
    | 2  | A     | 2014-01-02 00:00:00 |
    | 5  | A     | 2014-01-05 00:00:00 |
    | 7  | A     | 2014-01-07 00:00:00 |
    +----+-------+---------------------+

In this case, is the blocked query executed a second time because its original result became invalid?

I'm most interested in PostgreSQL, but I would also like to know if this is the case with other RDBMSs that support row-level locking, such as Oracle, SQL Server, and MySQL.

sql locking postgresql transactions

Jaan Oct 6 '14 at 16:31

1 answer

Mike Sherrill 'Cat Recall' · Accepted Answer · 2014-10-06T17:49:00+0000

There are a couple of things going on here. First, this is documented behavior. Second, you don't see the whole story, because you didn't try to update anything in session "B".

"This is like breaking transaction isolation."

That depends on which isolation level you are running at. PostgreSQL's default transaction isolation level is READ COMMITTED.
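To make the dependence on isolation level concrete, here is a minimal sketch of session B run at a stricter level. Using REPEATABLE READ here is an assumption for illustration only; the question ran everything at the default level. The statements are standard PostgreSQL, and session A is assumed to behave exactly as in the question.

```sql
-- Check which isolation level the current session is using.
show transaction_isolation;

-- Session B, started at REPEATABLE READ instead of the default
-- (an assumption for illustration; not what the question did).
begin isolation level repeatable read;
select * from test where value = 'B' order by created_at limit 3 for update;
select * from test where value = 'A' order by created_at limit 3 for update;
-- If session A updates row 1 and commits while the statement above waits,
-- the statement should fail instead of returning rows out of order:
--   ERROR:  could not serialize access due to concurrent update
rollback;
```

At that level the application has to be prepared to retry the whole transaction when it receives this error.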

This is documented behavior in PostgreSQL.

"It is possible for a SELECT command running at the READ COMMITTED transaction isolation level and using ORDER BY and a locking clause to return rows out of order. This is because ORDER BY is applied first. The command sorts the result, but might then block trying to obtain a lock on one or more of the rows. Once the SELECT unblocks, some of the ordering column values might have been modified, leading to those rows appearing to be out of order (though they are in order in terms of the original column values)."

One workaround (also documented, in the same place) is to move FOR UPDATE into a subquery, but that locks every row the subquery returns rather than only the rows in the final result.
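Applied to the query from the question, the subquery workaround might look like the sketch below. It follows the pattern shown in the PostgreSQL documentation; the alias "ss" is arbitrary, and the query has not been tuned for this table.

```sql
-- Sketch only: take the row locks inside the subquery,
-- then sort and limit the already-locked rows outside it.
select *
from (
    select * from test where value = 'A' for update
) ss
order by created_at
limit 3;
```

Because the locks are taken before the sort, the ordering should reflect the current column values. The cost is that every row with value = 'A' is locked, not just the three that are returned.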

To see what PostgreSQL really does in this situation, run the update in session "B".

    create table test (
      id integer primary key,
      value char(1) not null,
      created_at timestamp not null
    );

    insert into test values
      (1, 'A', '2014-01-01 00:00:00'),
      (2, 'A', '2014-01-02 00:00:00'),
      (3, 'B', '2014-01-03 00:00:00'),
      (4, 'B', '2014-01-04 00:00:00'),
      (5, 'A', '2014-01-05 00:00:00'),
      (6, 'B', '2014-01-06 00:00:00'),
      (7, 'A', '2014-01-07 00:00:00'),
      (8, 'B', '2014-01-08 00:00:00');

    A: begin;  /* Begin transaction A */
    B: begin;  /* Begin transaction B */
    A: select * from test where id = 1 for update;  /* Lock one row */
    B: select * from test where value = 'B' order by created_at limit 3 for update;  /* This query returns immediately since it does not need to return row with id = 1 */
    B: select * from test where value = 'A' order by created_at limit 3 for update;  /* This query blocks because row id = 1 is locked by transaction A */
    A: update test set created_at = '2014-01-09 00:00:00' where id = 1;  /* Modify the locked row */
    A: commit;
    B: update test set value = 'C' where id in (select id from test where value = 'A' order by created_at limit 3);  /* Updates 3 rows */
    B: commit;

Now look at the table.

    scratch=# select * from test order by id;
     id | value |     created_at
    ----+-------+---------------------
      1 | A     | 2014-01-09 00:00:00
      2 | C     | 2014-01-02 00:00:00
      3 | B     | 2014-01-03 00:00:00
      4 | B     | 2014-01-04 00:00:00
      5 | C     | 2014-01-05 00:00:00
      6 | B     | 2014-01-06 00:00:00
      7 | C     | 2014-01-07 00:00:00
      8 | B     | 2014-01-08 00:00:00

Session "A" succeeded in updating the row with id 1 to "2014-01-09". Session "B" succeeded in updating the three remaining rows whose value was "A". The UPDATE statement acquired locks on the rows with ids 2, 5, and 7; we know that because those are the rows that were actually updated. The earlier SELECT statement locked different rows: 1, 2, and 5.

You can make session B's update block by starting a third terminal session and locking row 7 for update.
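A minimal sketch of that third session follows. The name "C" for it is hypothetical, and the statements are assumed to run before session B issues its update.

```sql
-- Session C (hypothetical third terminal), run before B's update:
begin;
select * from test where id = 7 for update;  -- lock row 7

-- Session B's "update test set value = 'C' where id in (...)" should now
-- wait, because it needs a lock on row 7.

-- Session C: release the lock so that session B can proceed.
commit;
```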
