Atomic UPSERT in SQL Server 2005

Question

What is the correct pattern for doing an atomic "UPSERT" (UPDATE where exists, INSERT otherwise) in SQL Server 2005?

I see a lot of code on SO (for example, see "Check if a row exists, otherwise insert") with the following two-part pattern:

    UPDATE ... FROM ... WHERE <condition>
    -- race condition risk here
    IF @@ROWCOUNT = 0
        INSERT ...

or

    IF (SELECT COUNT(*) FROM ... WHERE <condition>) = 0
        -- race condition risk here
        INSERT ...
    ELSE
        UPDATE ...

where <condition> is an evaluation of natural keys. Neither of the above approaches seems to handle concurrency well. If I cannot have two rows with the same natural key, both of them seem to risk inserting rows with the same natural key in race-condition scenarios.

I have been using the following approach, but I am surprised not to see it anywhere in people's answers, so I wonder what is wrong with it:

    INSERT INTO <table>
    SELECT <natural keys>, <other stuff...>
    FROM <table>
    WHERE NOT EXISTS
        -- race condition risk here?
        ( SELECT 1 FROM <table> WHERE <natural keys> )

    UPDATE ...
    WHERE <natural keys>

Note that the race condition mentioned here is different from the ones in the earlier code. In the earlier code, the issue was phantom reads (rows being inserted by another session between the UPDATE and the IF, or between the SELECT and the INSERT). In the above code, the race condition has to do with DELETEs. Is it possible for a matching row to be deleted by another session AFTER the (WHERE NOT EXISTS) executes but before the INSERT executes? It is unclear whether the WHERE NOT EXISTS puts a lock on anything in conjunction with the UPDATE.

Is this atomic? I cannot find where this would be documented in the SQL Server documentation.

EDIT: I realise this could be done with transactions, but I think I would need to set the transaction isolation level to SERIALIZABLE to avoid the phantom read problem? Surely that is overkill for such a common problem?

+46

sql-server atomic sql-server-2005 upsert

rabidpebble Mar 26 '10 at 10:10

5 answers

Remus Rusanu · Answer 1 · 2010-03-26 17:29

    INSERT INTO <table>
    SELECT <natural keys>, <other stuff...>
    FROM <table>
    WHERE NOT EXISTS
        -- race condition risk here?
        ( SELECT 1 FROM <table> WHERE <natural keys> )

    UPDATE ...
    WHERE <natural keys>

First, there is a race condition in the first INSERT. The key may not exist when the inner SELECT runs, but may exist by the time the INSERT runs, resulting in a key violation.
Second, there is a race condition between the INSERT and the UPDATE. The key may exist when it is checked by the inner query of the INSERT, but be gone by the time the UPDATE runs.

For the second race condition, one could argue that the key would have been deleted anyway by the concurrent thread, so it is not really a lost update.

The optimal solution is usually to try the most likely case and handle the error if it fails (inside a transaction, of course):

If the key is likely missing, always insert first. Handle the unique constraint violation and fall back to update.
If the key is likely present, always update first. Insert if no row was found. Handle a possible unique constraint violation and fall back to update.

Besides being correct, this pattern is also optimal for speed: it is more efficient to try the insert and handle the exception than to do spurious lookups. Lookups mean logical page reads (which may mean physical page reads), and IO (even logical) is more expensive than SEH.
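A minimal sketch of the insert-first variant, assuming a hypothetical table dbo.Widget (WidgetKey int primary key, Payload varchar(50)) and illustrative variable names that do not appear in the original answer. It relies on TRY...CATCH, which is available from SQL Server 2005 on, and on error numbers 2627 and 2601, which SQL Server raises for a duplicate key in a unique constraint and in a unique index respectively:

```sql
-- Hypothetical table, for illustration only:
-- CREATE TABLE dbo.Widget (WidgetKey int PRIMARY KEY, Payload varchar(50));
DECLARE @key int, @payload varchar(50), @msg nvarchar(2048);
SET @key = 42;
SET @payload = 'example';

BEGIN TRANSACTION;
BEGIN TRY
    -- Optimistic path: assume the key is missing and insert.
    INSERT INTO dbo.Widget (WidgetKey, Payload) VALUES (@key, @payload);
END TRY
BEGIN CATCH
    -- Duplicate key and the transaction is still committable: fall back to update.
    IF ERROR_NUMBER() IN (2627, 2601) AND XACT_STATE() = 1
        UPDATE dbo.Widget SET Payload = @payload WHERE WidgetKey = @key;
    ELSE
    BEGIN
        -- Any other error: undo the work and report the original message.
        SET @msg = ERROR_MESSAGE();
        IF XACT_STATE() <> 0 ROLLBACK TRANSACTION;
        RAISERROR('%s', 16, 1, @msg);
        RETURN;
    END
END CATCH;
COMMIT TRANSACTION;
```

The update-first variant would be the mirror image: run the UPDATE, check @@ROWCOUNT, and wrap only the fallback INSERT in the TRY block.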

Update @Peter

Why isn't a single statement "atomic"? Let's say we have a trivial table:

 create table Test (id int primary key);

Now, if I ran this single statement from two threads, in a loop, it would be "atomic", as you say, and no race condition could exist:

  insert into Test (id) select top (1) id from Numbers n where not exists (select id from Test where id = n.id);

Yet within a couple of seconds, a primary key violation occurs:

Msg 2627, Level 14, State 1, Line 4
Violation of PRIMARY KEY constraint 'PK__Test__24927208'. Cannot insert duplicate key in object 'dbo.Test'.

Why is that? You are correct that the SQL query plan will do the "right thing" on DELETE ... FROM ... JOIN, on WITH cte AS (SELECT ... FROM) DELETE FROM cte, and in many other cases. But there is a crucial difference in these cases: the "subquery" refers to the target of an update or delete operation. For such cases the query plan will indeed use an appropriate lock; in fact this behavior is critical in some cases, such as when implementing queues (see "Using tables as Queues").

But in the original question, as well as in my example, the query optimizer sees the subquery as just a subquery in a query, not as some special "scan for update" type of query that needs special lock protection. As a result, the execution of the subquery lookup can be observed as a distinct operation by a concurrent observer, which breaks the "atomic" behavior of the statement. Unless special precautions are taken, multiple threads can attempt to insert the same value, each convinced that it has checked and the value does not exist yet. Only one can succeed; the other will hit the PK violation. QED.

Cassius Porcus · Answer 2 · 2010-04-15 13:03

Pass the updlock, rowlock, and holdlock hints when testing for the existence of the row. Holdlock ensures that all inserts are serialized; rowlock permits concurrent updates to existing rows.

Updates may still block if your PK is bigint, since internal hashing is degenerate for 64-bit values.

    begin tran -- default read committed isolation level is fine

    if not exists (select * from <table> with (updlock, rowlock, holdlock) where <PK = ...>)
        -- insert
    else
        -- update

    commit

Peter Radocchia · Answer 3 · 2010-03-27 01:01

EDIT: Remus is correct, the conditional insert w/ where clause does not guarantee a consistent state between the correlated subquery and the table insert.

Perhaps the right table hints could force a consistent state. INSERT <table> WITH (TABLOCKX, HOLDLOCK) seems to work, but I have no idea whether that is the optimal level of locking for a conditional insert.

In a trivial test like the one Remus described, TABLOCKX, HOLDLOCK showed ~5x the insert volume of no table hints, and without the PK errors, of course.
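For illustration, the hinted conditional insert might look like the following sketch. The table dbo.Widget, its columns, and the variables are hypothetical; the hint placement follows the INSERT ... WITH (<table hint>) syntax. Whether this is the right lock level remains an open question, since an exclusive table lock serializes every writer to the table:

```sql
-- Hypothetical table, for illustration only:
-- CREATE TABLE dbo.Widget (WidgetKey int PRIMARY KEY, Payload varchar(50));
DECLARE @key int, @payload varchar(50);
SET @key = 42;
SET @payload = 'example';

BEGIN TRANSACTION;

-- The exclusive table lock is taken by the INSERT target and is held
-- until the transaction ends, so the UPDATE below runs under it too.
INSERT INTO dbo.Widget WITH (TABLOCKX, HOLDLOCK) (WidgetKey, Payload)
SELECT @key, @payload
WHERE NOT EXISTS (SELECT 1 FROM dbo.Widget WHERE WidgetKey = @key);

-- Nothing inserted means the key already existed: update it instead.
IF @@ROWCOUNT = 0
    UPDATE dbo.Widget SET Payload = @payload WHERE WidgetKey = @key;

COMMIT TRANSACTION;
```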

ORIGINAL ANSWER, INCORRECT:

Is this atomic?

Yes, the conditional insert w/ where clause is atomic, and your INSERT ... WHERE NOT EXISTS() ... UPDATE form is the proper way to perform an UPSERT.

I would add IF @@ROWCOUNT = 0 between INSERT and UPDATE:

    INSERT INTO <table>
    SELECT <natural keys>, <other stuff...>
    WHERE NOT EXISTS
        -- no race condition here
        ( SELECT 1 FROM <table> WHERE <natural keys> )

    IF @@ROWCOUNT = 0 BEGIN
        UPDATE ...
        WHERE <natural keys>
    END

Single statements always execute within a transaction, either their own (autocommit and implicit) or together with other statements (explicit).

Marcelo Cantos · Answer 4 · 2010-03-26 11:57

One trick I've seen is to try the INSERT and, if it fails, do the UPDATE.

thijs · Answer 5 · 2010-03-26 12:45

You can use application locks: (sp_getapplock) http://msdn.microsoft.com/en-us/library/ms189823.aspx
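A minimal sketch of how sp_getapplock could serialize an upsert per key. The resource naming scheme, the table dbo.Widget, and the variables are assumptions made for illustration; the parameters and return codes (0 or 1 on success, negative on failure) are those documented for sp_getapplock:

```sql
-- Hypothetical table, for illustration only:
-- CREATE TABLE dbo.Widget (WidgetKey int PRIMARY KEY, Payload varchar(50));
DECLARE @key int, @payload varchar(50), @resource nvarchar(255), @rc int;
SET @key = 42;
SET @payload = 'example';
SET @resource = N'Widget:' + CAST(@key AS nvarchar(20));  -- one lock name per key

BEGIN TRANSACTION;

-- A transaction-owned lock must be requested inside a transaction;
-- it is released automatically at COMMIT or ROLLBACK.
EXEC @rc = sp_getapplock
    @Resource = @resource,
    @LockMode = 'Exclusive',
    @LockOwner = 'Transaction',
    @LockTimeout = 5000;  -- milliseconds

IF @rc < 0
BEGIN
    ROLLBACK TRANSACTION;
    RAISERROR('Could not acquire application lock.', 16, 1);
    RETURN;
END

UPDATE dbo.Widget SET Payload = @payload WHERE WidgetKey = @key;
IF @@ROWCOUNT = 0
    INSERT INTO dbo.Widget (WidgetKey, Payload) VALUES (@key, @payload);

COMMIT TRANSACTION;
```

An application lock only serializes sessions that request the same resource name, so this protects the table only if every writer goes through the same code path.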
