SQL INSERT, but avoid duplication

Question

I want to do some quick inserts but avoid duplicates in a table. For the sake of argument, let's call it MarketPrices. I have been experimenting with two ways of doing it, but I am not sure how to benchmark which will be faster.

    INSERT INTO MarketPrices (SecurityCode, BuyPrice, SellPrice, IsMarketOpen)
    SELECT @SecurityCode, @BuyPrice, @SellPrice, @IsMarketOpen
    EXCEPT
    SELECT SecurityCode, BuyPrice, SellPrice, j.bool as IsActive
    FROM MarketPrices
    CROSS JOIN (SELECT 0 as bool UNION SELECT 1 as bool) as j

OR

    DECLARE @MktId int
    SET @MktId = (SELECT SecurityId
                  FROM MarketPrices
                  where SecurityCode = @SecurityCode
                    and BuyPrice = @BuyPrice
                    and SellPrice = @SellPrice)

    IF (@MktId is NULL)
    BEGIN
        INSERT INTO MarketPrices (SecurityCode, BuyPrice, SellPrice, IsMarketOpen)
        VALUES (@SecurityCode, @BuyPrice, @SellPrice, @IsMarketOpen)
    END

Assume that @whatever is an input parameter of the stored procedure.

I want to be able to insert a new record for every SecurityCode when the BuyPrice or SellPrice, or both, differ from every previous occurrence. I don't care about IsMarketOpen.

Is there anything glaringly dumb about either of the above approaches? Is one faster than the other?

+10

sql sql-server sql-server-2005

Ravi Nov 06 '09 at 16:17

6 answers

EDIT: To prevent race conditions in a concurrent environment, use WITH (UPDLOCK) in the correlated subquery.

I think this would be the standard method:

    INSERT INTO MarketPrices (SecurityCode, BuyPrice, SellPrice, IsMarketOpen)
    SELECT @SecurityCode, @BuyPrice, @SellPrice, @IsMarketOpen
    WHERE NOT EXISTS (
        SELECT *
        FROM MarketPrices WITH (UPDLOCK)
        WHERE SecurityCode = @SecurityCode
          AND BuyPrice = @BuyPrice
          AND SellPrice = @SellPrice
    )

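If any of your fields are nullable, you will have to add NULL handling to the condition.

Your first method is interesting, but the requirements of EXCEPT have you jumping through hoops: the CROSS JOIN is there only to make the column lists match. The NOT EXISTS method above is essentially the same, but it gets you around the column-matching issue.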

As an alternative:

    INSERT INTO MarketPrices (SecurityCode, BuyPrice, SellPrice, IsMarketOpen)
    SELECT SecurityCode, BuyPrice, SellPrice, @IsMarketOpen
    FROM (
        SELECT @SecurityCode, @BuyPrice, @SellPrice
        EXCEPT
        SELECT SecurityCode, BuyPrice, SellPrice
        FROM MarketPrices WITH (UPDLOCK)
    ) a (SecurityCode, BuyPrice, SellPrice)

The nice thing about EXCEPT in this case is that it handles NULLs without any extra coding on your part. To achieve the same thing in the first example, you would need to test each pair for NULLs as well as for equality, long-hand.
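To make the long-hand NULL handling concrete, here is a minimal sketch of the NOT EXISTS insert extended for nullable price columns. The table name dbo.MarketPricesDemo, the column types, and the sample values are assumptions made for illustration; the question does not give a schema.

```sql
-- Hypothetical demo table; column types are assumed, not taken from the question.
IF OBJECT_ID('dbo.MarketPricesDemo') IS NOT NULL DROP TABLE dbo.MarketPricesDemo;
CREATE TABLE dbo.MarketPricesDemo (
    SecurityId   int IDENTITY(1,1) PRIMARY KEY,
    SecurityCode varchar(10)   NOT NULL,
    BuyPrice     decimal(18,4) NULL,
    SellPrice    decimal(18,4) NULL,
    IsMarketOpen bit           NOT NULL
);
GO

-- Stand-ins for the stored procedure's input parameters (sample values).
DECLARE @SecurityCode varchar(10), @BuyPrice decimal(18,4),
        @SellPrice decimal(18,4), @IsMarketOpen bit;
SELECT @SecurityCode = 'ABC', @BuyPrice = NULL, @SellPrice = 101.5, @IsMarketOpen = 1;

-- Each nullable column needs an equality test plus an explicit "both are NULL" test,
-- because NULL = NULL does not evaluate to true.
INSERT INTO dbo.MarketPricesDemo (SecurityCode, BuyPrice, SellPrice, IsMarketOpen)
SELECT @SecurityCode, @BuyPrice, @SellPrice, @IsMarketOpen
WHERE NOT EXISTS (
    SELECT *
    FROM dbo.MarketPricesDemo WITH (UPDLOCK)
    WHERE SecurityCode = @SecurityCode
      AND (BuyPrice  = @BuyPrice  OR (BuyPrice  IS NULL AND @BuyPrice  IS NULL))
      AND (SellPrice = @SellPrice OR (SellPrice IS NULL AND @SellPrice IS NULL))
);
```

Running the second batch twice should leave a single row, since the NULL BuyPrice is matched by the explicit IS NULL test rather than by equality.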

Your second method is fine, but you don't need the variable. See Tomalak's solution; he cleaned it up nicely. Also, you would need to explicitly handle the possibility of concurrent inserts, if that were a concern.

+6

Peter Radocchia Nov 06 '09 at 16:33

I would go for a semantic solution any time. Your two proposals seem quite obscure to me (though the latter is better than the former).

    IF NOT EXISTS (
        SELECT 1
        FROM MarketPrices
        WHERE SecurityCode = @SecurityCode
          AND BuyPrice = @BuyPrice
          AND SellPrice = @SellPrice
    )
    BEGIN
        INSERT MarketPrices (SecurityCode, BuyPrice, SellPrice, IsMarketOpen)
        VALUES (@SecurityCode, @BuyPrice, @SellPrice, @IsMarketOpen)
    END

With a composite index over SecurityCode, BuyPrice, SellPrice, the EXISTS query should be fast enough.

Benchmarking is a matter of timing a WHILE loop, I would say. Test it and see for yourself.

+3

Tomalak Nov 06 '09 at 16:33

Another option: create a unique index on the fields (SecurityCode, BuyPrice, SellPrice), issue a simple insert, and let the database decide whether the records are duplicates. The insert will fail on an attempt to insert a duplicate.

Using code (whether an external language or a SQL procedure) to guarantee uniqueness is not strict enough and will ultimately lead to the very duplicates you hope to prevent.
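A minimal sketch of this approach, assuming a hypothetical dbo.MarketPricesDemo table with guessed column types (the question gives no schema). Swallowing errors 2601 and 2627, SQL Server's duplicate-key errors for unique indexes and for unique or primary key constraints, is one way to treat a rejected duplicate as a non-event; whether to swallow or surface the error is a design choice, not something the answer prescribes.

```sql
-- Hypothetical demo table; column types are assumed, not taken from the question.
IF OBJECT_ID('dbo.MarketPricesDemo') IS NOT NULL DROP TABLE dbo.MarketPricesDemo;
CREATE TABLE dbo.MarketPricesDemo (
    SecurityId   int IDENTITY(1,1) PRIMARY KEY,
    SecurityCode varchar(10)   NOT NULL,
    BuyPrice     decimal(18,4) NULL,
    SellPrice    decimal(18,4) NULL,
    IsMarketOpen bit           NOT NULL
);

-- The database, not the calling code, now enforces uniqueness.
CREATE UNIQUE INDEX UX_MarketPricesDemo_Code_Buy_Sell
    ON dbo.MarketPricesDemo (SecurityCode, BuyPrice, SellPrice);
GO

DECLARE @i int, @msg nvarchar(2048);
SET @i = 0;
WHILE @i < 2   -- attempt the same row twice; the second attempt is a duplicate
BEGIN
    SET @i = @i + 1;
    BEGIN TRY
        INSERT INTO dbo.MarketPricesDemo (SecurityCode, BuyPrice, SellPrice, IsMarketOpen)
        VALUES ('ABC', 100.0, 101.5, 1);
    END TRY
    BEGIN CATCH
        -- 2601 = duplicate key in a unique index,
        -- 2627 = unique or primary key constraint violation.
        -- Anything else is re-raised (RAISERROR keeps this runnable on SQL Server 2005).
        IF ERROR_NUMBER() NOT IN (2601, 2627)
        BEGIN
            SET @msg = ERROR_MESSAGE();
            RAISERROR('%s', 16, 1, @msg);
        END
    END CATCH
END

SELECT COUNT(*) AS RowsStored FROM dbo.MarketPricesDemo;
```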

+2

mlibby Nov 06 '09 at 16:50

Below I have added the top answers from the question "Just adding a row, if it does not already exist" to Peter Radocchia's excellent answer.

The conclusion is that the "race safe with try/catch" method is marginally (~1%) faster than "race safe with updlock, holdlock" when there are no actual collisions (i.e. you expect collisions to be very rare; this is the uniques scenario), and somewhat slower (~20%) when there are always collisions (this is the duplicates scenario). This does not take into account complex issues such as lock escalation.

Here are the results (SQL Server 2014, build 12.0.2000.8):

    duplicates (short table)
      try/catch: 15546 milliseconds / 100000 inserts
      conditional insert: 1460 milliseconds / 100000 inserts
      except: 1490 milliseconds / 100000 inserts
      merge: 1420 milliseconds / 100000 inserts
      race safe with try/catch: 1650 milliseconds / 100000 inserts
      race safe with updlock, holdlock: 1330 milliseconds / 100000 inserts

    uniques
      try/catch: 2266 milliseconds / 100000 inserts
      conditional insert: 2156 milliseconds / 100000 inserts
      except: 2273 milliseconds / 100000 inserts
      merge: 2136 milliseconds / 100000 inserts
      race safe with try/catch: 2400 milliseconds / 100000 inserts
      race safe with updlock, holdlock: 2430 milliseconds / 100000 inserts
      straight insert: 1686 milliseconds / 100000 inserts

    duplicates (tall table)
      try/catch: 15826 milliseconds / 100000 inserts
      conditional insert: 1530 milliseconds / 100000 inserts
      except: 1506 milliseconds / 100000 inserts
      merge: 1443 milliseconds / 100000 inserts
      race safe with try/catch: 1636 milliseconds / 100000 inserts
      race safe with updlock, holdlock: 1426 milliseconds / 100000 inserts

Duplicates section (short table):

    declare @x int, @y int, @now datetime, @duration int
    select @x = 1, @y = 0, @now = getdate()
    while @y < 100000 begin
      set @y = @y+1
      begin try
        insert #temp select @x where not exists (select * from #temp where col1 = @x)
      end try
      begin catch
        if error_number() <> 2627 throw
      end catch
    end
    set @duration = datediff(ms,@now,getdate())
    raiserror('duplicates (short table), race safe with try/catch: %i milliseconds / %i inserts',-1,-1,@duration,@y) with nowait
    go

    declare @x int, @y int, @now datetime, @duration int
    select @x = 1, @y = 0, @now = getdate()
    while @y < 100000 begin
      set @y = @y+1
      insert #temp select @x where not exists (select * from #temp with (updlock, holdlock) where col1 = @x)
    end
    set @duration = datediff(ms,@now,getdate())
    raiserror('duplicates (short table), race safe with updlock, holdlock: %i milliseconds / %i inserts',-1,-1,@duration, @y) with nowait
    go

Uniques section:

    truncate table #temp
    declare @x int, @now datetime, @duration int
    select @x = 0, @now = getdate()
    while @x < 100000 begin
      set @x = @x+1
      begin try
        insert #temp select @x where not exists (select * from #temp where col1 = @x)
      end try
      begin catch
        if error_number() <> 2627 throw
      end catch
    end
    set @duration = datediff(ms,@now,getdate())
    raiserror('uniques, race safe with try/catch: %i milliseconds / %i inserts',-1,-1,@duration, @x) with nowait
    go

    truncate table #temp
    declare @x int, @now datetime, @duration int
    select @x = 0, @now = getdate()
    while @x < 100000 begin
      set @x = @x+1
      insert #temp select @x where not exists (select * from #temp with (updlock, holdlock) where col1 = @x)
    end
    set @duration = datediff(ms,@now,getdate())
    raiserror('uniques, race safe with updlock, holdlock: %i milliseconds / %i inserts',-1,-1,@duration, @x) with nowait
    go

Duplicates section (tall table):

    declare @x int, @y int, @now datetime, @duration int
    select @x = 1, @y = 0, @now = getdate()
    while @y < 100000 begin
      set @y = @y+1
      begin try
        insert #temp select @x where not exists (select * from #temp where col1 = @x)
      end try
      begin catch
        if error_number() <> 2627 throw
      end catch
    end
    set @duration = datediff(ms,@now,getdate())
    raiserror('duplicates (tall table), race safe with try/catch: %i milliseconds / %i inserts',-1,-1,@duration,@y) with nowait
    go

    declare @x int, @y int, @now datetime, @duration int
    select @x = 1, @y = 0, @now = getdate()
    while @y < 100000 begin
      set @y = @y+1
      insert #temp select @x where not exists (select * from #temp with (updlock, holdlock) where col1 = @x)
    end
    set @duration = datediff(ms,@now,getdate())
    raiserror('duplicates (tall table), race safe with updlock, holdlock: %i milliseconds / %i inserts',-1,-1,@duration, @y) with nowait
    go

+1

Jared Moore May 18 '15 at 20:35

If you don't need to trap duplicates, you can always create a unique index with "ignore duplicates" set to true. SQL Server will take care of this for you.
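In T-SQL, the "ignore duplicates" setting corresponds to the IGNORE_DUP_KEY index option. A minimal sketch, assuming a hypothetical dbo.MarketPricesDemo table whose column types are guesses rather than the asker's actual schema:

```sql
-- Hypothetical demo table; column types are assumed, not taken from the question.
IF OBJECT_ID('dbo.MarketPricesDemo') IS NOT NULL DROP TABLE dbo.MarketPricesDemo;
CREATE TABLE dbo.MarketPricesDemo (
    SecurityId   int IDENTITY(1,1) PRIMARY KEY,
    SecurityCode varchar(10)   NOT NULL,
    BuyPrice     decimal(18,4) NULL,
    SellPrice    decimal(18,4) NULL,
    IsMarketOpen bit           NOT NULL
);

-- With IGNORE_DUP_KEY = ON, a duplicate key produces a warning and the row is
-- skipped, instead of the statement failing with an error.
CREATE UNIQUE INDEX UX_MarketPricesDemo_Code_Buy_Sell
    ON dbo.MarketPricesDemo (SecurityCode, BuyPrice, SellPrice)
    WITH (IGNORE_DUP_KEY = ON);
GO

INSERT INTO dbo.MarketPricesDemo (SecurityCode, BuyPrice, SellPrice, IsMarketOpen)
VALUES ('ABC', 100.0, 101.5, 1);

-- Same (SecurityCode, BuyPrice, SellPrice); only IsMarketOpen differs, so this
-- row is discarded as a duplicate.
INSERT INTO dbo.MarketPricesDemo (SecurityCode, BuyPrice, SellPrice, IsMarketOpen)
VALUES ('ABC', 100.0, 101.5, 0);

SELECT COUNT(*) AS RowsStored FROM dbo.MarketPricesDemo;
```

One trade-off to keep in mind: because duplicates are dropped silently, the caller cannot tell from an error whether a given row was actually stored.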

0

IamIC Dec 11 '10 at 8:26

Peter Radocchia · Accepted Answer · 2009-11-06 17:31

EDIT: To prevent race conditions in concurrent environments, use WITH (UPDLOCK) in the correlated subquery or the EXCEPT'd SELECT. The test script I wrote below does not require it, since it uses temporary tables that are visible only to the current connection, but in a real environment, operating against user tables, it would be necessary.

MERGE does not require UPDLOCK.
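For reference, the MERGE form applied to a MarketPrices-style table might look like the sketch below (SQL Server 2008 or later). The table dbo.MarketPricesDemo, its column types, and the sample values are hypothetical. The HOLDLOCK hint is an addition that does not appear in this answer: it is commonly recommended so that the existence check and the insert are protected by a range lock under concurrency, so treat it as an assumption to verify against your own workload.

```sql
-- Hypothetical demo table; column types are assumed, not taken from the question.
IF OBJECT_ID('dbo.MarketPricesDemo') IS NOT NULL DROP TABLE dbo.MarketPricesDemo;
CREATE TABLE dbo.MarketPricesDemo (
    SecurityId   int IDENTITY(1,1) PRIMARY KEY,
    SecurityCode varchar(10)   NOT NULL,
    BuyPrice     decimal(18,4) NOT NULL,
    SellPrice    decimal(18,4) NOT NULL,
    IsMarketOpen bit           NOT NULL
);
GO

-- Stand-ins for the stored procedure's input parameters (sample values).
DECLARE @SecurityCode varchar(10), @BuyPrice decimal(18,4),
        @SellPrice decimal(18,4), @IsMarketOpen bit;
SELECT @SecurityCode = 'ABC', @BuyPrice = 100.0, @SellPrice = 101.5, @IsMarketOpen = 1;

-- Insert only when no row with the same code and prices exists.
-- HOLDLOCK is an assumption added here, not part of the original answer.
MERGE dbo.MarketPricesDemo WITH (HOLDLOCK) AS t
USING (SELECT @SecurityCode, @BuyPrice, @SellPrice, @IsMarketOpen)
      AS s (SecurityCode, BuyPrice, SellPrice, IsMarketOpen)
   ON t.SecurityCode = s.SecurityCode
  AND t.BuyPrice     = s.BuyPrice
  AND t.SellPrice    = s.SellPrice
WHEN NOT MATCHED BY TARGET THEN
    INSERT (SecurityCode, BuyPrice, SellPrice, IsMarketOpen)
    VALUES (s.SecurityCode, s.BuyPrice, s.SellPrice, s.IsMarketOpen);
```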

Inspired by mcl's answer re: a unique index and letting the database throw an error, I decided to benchmark conditional inserts against try/catch.

The results appear to favor the conditional insert over try/catch, but YMMV. It is a very simple scenario (one column, small table, etc.), executed on a single machine, etc.

Here are the results (SQL Server 2008, build 10.0.1600.2):

    duplicates (short table)
      try/catch: 14440 milliseconds / 100000 inserts
      conditional insert: 2983 milliseconds / 100000 inserts
      except: 2966 milliseconds / 100000 inserts
      merge: 2983 milliseconds / 100000 inserts

    uniques
      try/catch: 3920 milliseconds / 100000 inserts
      conditional insert: 3860 milliseconds / 100000 inserts
      except: 3873 milliseconds / 100000 inserts
      merge: 3890 milliseconds / 100000 inserts
      straight insert: 3173 milliseconds / 100000 inserts

    duplicates (tall table)
      try/catch: 14436 milliseconds / 100000 inserts
      conditional insert: 3063 milliseconds / 100000 inserts
      except: 3063 milliseconds / 100000 inserts
      merge: 3030 milliseconds / 100000 inserts

Note that even on unique inserts, try/catch carries slightly more overhead than a conditional insert. I wonder whether this varies by version, CPU, number of cores, etc.

I did not benchmark the IF conditional inserts, just WHERE. I assume the IF variety would show more overhead, since a) you would have two statements, and b) you would need to wrap the two statements in a transaction and set the isolation level to serializable (!). If someone wanted to test this, they would need to change the temp table to a regular user table (serializable does not apply to local temp tables).
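For anyone who wants to try the IF variety, a minimal sketch of what it might look like follows. It uses a regular user table, the hypothetical dbo.MarketPricesDemo with guessed column types, and wraps the check and the insert in a serializable transaction as described above. The UPDLOCK hint is an addition carried over from the EDIT note at the top of this answer; it is an assumption for this variant, not something that was benchmarked.

```sql
-- Hypothetical demo table; column types are assumed, not taken from the question.
IF OBJECT_ID('dbo.MarketPricesDemo') IS NOT NULL DROP TABLE dbo.MarketPricesDemo;
CREATE TABLE dbo.MarketPricesDemo (
    SecurityId   int IDENTITY(1,1) PRIMARY KEY,
    SecurityCode varchar(10)   NOT NULL,
    BuyPrice     decimal(18,4) NOT NULL,
    SellPrice    decimal(18,4) NOT NULL,
    IsMarketOpen bit           NOT NULL
);
GO

-- Stand-ins for the stored procedure's input parameters (sample values).
DECLARE @SecurityCode varchar(10), @BuyPrice decimal(18,4),
        @SellPrice decimal(18,4), @IsMarketOpen bit;
SELECT @SecurityCode = 'ABC', @BuyPrice = 100.0, @SellPrice = 101.5, @IsMarketOpen = 1;

-- The check and the insert are two statements, so they share one transaction;
-- serializable keeps another session from inserting the same key in between.
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN TRANSACTION;

IF NOT EXISTS (
    SELECT 1
    FROM dbo.MarketPricesDemo WITH (UPDLOCK)   -- assumption: hint added for this sketch
    WHERE SecurityCode = @SecurityCode
      AND BuyPrice = @BuyPrice
      AND SellPrice = @SellPrice
)
BEGIN
    INSERT INTO dbo.MarketPricesDemo (SecurityCode, BuyPrice, SellPrice, IsMarketOpen)
    VALUES (@SecurityCode, @BuyPrice, @SellPrice, @IsMarketOpen);
END

COMMIT TRANSACTION;

-- Restore the default isolation level for the rest of the session.
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
```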

Here is the script:

    -- tested on SQL 2008.
    -- to run on SQL 2005, comment out the statements using MERGE
    set nocount on

    if object_id('tempdb..#temp') is not null drop table #temp
    create table #temp (col1 int primary key)
    go

    -------------------------------------------------------
    -- duplicate insert test against a table w/ 1 record
    -------------------------------------------------------

    insert #temp values (1)
    go

    declare @x int, @y int, @now datetime, @duration int
    select @x = 1, @y = 0, @now = getdate()
    while @y < 100000 begin
      set @y = @y+1
      begin try
        insert #temp select @x
      end try
      begin catch end catch
    end
    set @duration = datediff(ms,@now,getdate())
    raiserror('duplicates (short table), try/catch: %i milliseconds / %i inserts',-1,-1,@duration,@y) with nowait
    go

    declare @x int, @y int, @now datetime, @duration int
    select @x = 1, @y = 0, @now = getdate()
    while @y < 100000 begin
      set @y = @y+1
      insert #temp select @x where not exists (select * from #temp where col1 = @x)
    end
    set @duration = datediff(ms,@now,getdate())
    raiserror('duplicates (short table), conditional insert: %i milliseconds / %i inserts',-1,-1,@duration, @y) with nowait
    go

    declare @x int, @y int, @now datetime, @duration int
    select @x = 1, @y = 0, @now = getdate()
    while @y < 100000 begin
      set @y = @y+1
      insert #temp select @x except select col1 from #temp
    end
    set @duration = datediff(ms,@now,getdate())
    raiserror('duplicates (short table), except: %i milliseconds / %i inserts',-1,-1,@duration, @y) with nowait
    go

    -- comment this batch out for SQL 2005
    declare @x int, @y int, @now datetime, @duration int
    select @x = 1, @y = 0, @now = getdate()
    while @y < 100000 begin
      set @y = @y+1
      merge #temp t using (select @x) s (col1) on t.col1 = s.col1
      when not matched by target then insert values (col1);
    end
    set @duration = datediff(ms,@now,getdate())
    raiserror('duplicates (short table), merge: %i milliseconds / %i inserts',-1,-1,@duration, @y) with nowait
    go

    -------------------------------------------------------
    -- unique insert test against an initially empty table
    -------------------------------------------------------

    truncate table #temp
    declare @x int, @now datetime, @duration int
    select @x = 0, @now = getdate()
    while @x < 100000 begin
      set @x = @x+1
      insert #temp select @x
    end
    set @duration = datediff(ms,@now,getdate())
    raiserror('uniques, straight insert: %i milliseconds / %i inserts',-1,-1,@duration, @x) with nowait
    go

    truncate table #temp
    declare @x int, @now datetime, @duration int
    select @x = 0, @now = getdate()
    while @x < 100000 begin
      set @x = @x+1
      begin try
        insert #temp select @x
      end try
      begin catch end catch
    end
    set @duration = datediff(ms,@now,getdate())
    raiserror('uniques, try/catch: %i milliseconds / %i inserts',-1,-1,@duration, @x) with nowait
    go

    truncate table #temp
    declare @x int, @now datetime, @duration int
    select @x = 0, @now = getdate()
    while @x < 100000 begin
      set @x = @x+1
      insert #temp select @x where not exists (select * from #temp where col1 = @x)
    end
    set @duration = datediff(ms,@now,getdate())
    raiserror('uniques, conditional insert: %i milliseconds / %i inserts',-1,-1,@duration, @x) with nowait
    go

    truncate table #temp
    declare @x int, @now datetime, @duration int
    select @x = 0, @now = getdate()
    while @x < 100000 begin
      set @x = @x+1
      insert #temp select @x except select col1 from #temp
    end
    set @duration = datediff(ms,@now,getdate())
    raiserror('uniques, except: %i milliseconds / %i inserts',-1,-1,@duration, @x) with nowait
    go

    -- comment this batch out for SQL 2005
    truncate table #temp
    declare @x int, @now datetime, @duration int
    select @x = 0, @now = getdate()  -- start at 0, like the other uniques batches, so 100000 rows are inserted
    while @x < 100000 begin
      set @x = @x+1
      merge #temp t using (select @x) s (col1) on t.col1 = s.col1
      when not matched by target then insert values (col1);
    end
    set @duration = datediff(ms,@now,getdate())
    raiserror('uniques, merge: %i milliseconds / %i inserts',-1,-1,@duration, @x) with nowait
    go

    -------------------------------------------------------
    -- duplicate insert test against a table w/ 100000 records
    -------------------------------------------------------

    declare @x int, @y int, @now datetime, @duration int
    select @x = 1, @y = 0, @now = getdate()
    while @y < 100000 begin
      set @y = @y+1
      begin try
        insert #temp select @x
      end try
      begin catch end catch
    end
    set @duration = datediff(ms,@now,getdate())
    raiserror('duplicates (tall table), try/catch: %i milliseconds / %i inserts',-1,-1,@duration,@y) with nowait
    go

    declare @x int, @y int, @now datetime, @duration int
    select @x = 1, @y = 0, @now = getdate()
    while @y < 100000 begin
      set @y = @y+1
      insert #temp select @x where not exists (select * from #temp where col1 = @x)
    end
    set @duration = datediff(ms,@now,getdate())
    raiserror('duplicates (tall table), conditional insert: %i milliseconds / %i inserts',-1,-1,@duration, @y) with nowait
    go

    declare @x int, @y int, @now datetime, @duration int
    select @x = 1, @y = 0, @now = getdate()
    while @y < 100000 begin
      set @y = @y+1
      insert #temp select @x except select col1 from #temp
    end
    set @duration = datediff(ms,@now,getdate())
    raiserror('duplicates (tall table), except: %i milliseconds / %i inserts',-1,-1,@duration, @y) with nowait
    go

    -- comment this batch out for SQL 2005
    declare @x int, @y int, @now datetime, @duration int
    select @x = 1, @y = 0, @now = getdate()
    while @y < 100000 begin
      set @y = @y+1
      merge #temp t using (select @x) s (col1) on t.col1 = s.col1
      when not matched by target then insert values (col1);
    end
    set @duration = datediff(ms,@now,getdate())
    raiserror('duplicates (tall table), merge: %i milliseconds / %i inserts',-1,-1,@duration, @y) with nowait
    go
