How to quickly filter a large batch of data against existing records in the database

I'm currently merging data into a database. It works great and is fast. The only problem arises when some of the rows are already in the database (duplicate key).

To deal with this, I changed my program to first check, for every new record, whether it already exists in the database or not. Which ... is slow (right now I only have a few records, but later there will be more than 200 thousand records to check, and that several times). So I need to make this faster than it currently is (if possible).

The DataTable is structured as follows:

    DataTable transactionTable = new DataTable();
    transactionTable.Columns.Add("DeviceId", typeof(Int32));
    transactionTable.Columns.Add("LogDate", typeof(DateTime));
    transactionTable.Columns.Add("LogType", typeof(Int32));
    transactionTable.Columns.Add("LogText", typeof(String));
    transactionTable.PrimaryKey = new DataColumn[3]
    {
        transactionTable.Columns[0],
        transactionTable.Columns[1],
        transactionTable.Columns[2]
    };

So far I have the following:

    DataTable insertTable = transactionTable.Copy();
    insertTable.Clear();

    using (SqlConnection sqlCon = new SqlConnection(this.GetConnString()))
    {
        sqlCon.Open();

        foreach (var entry in transactionTable.AsEnumerable())
        {
            using (SqlCommand sqlCom = sqlCon.CreateCommand())
            {
                sqlCom.Parameters.Clear();
                sqlCom.CommandText = "SELECT 1 FROM myTable WHERE"
                                   + " DeviceId = @DeviceId AND LogDate = @LogDate"
                                   + " AND LogType = @LogType";
                sqlCom.Parameters.AddWithValue("@DeviceId", entry.Field<Int32>("DeviceId"));
                sqlCom.Parameters.AddWithValue("@LogDate", entry.Field<DateTime>("LogDate"));
                sqlCom.Parameters.AddWithValue("@LogType", entry.Field<Int32>("LogType"));

                using (SqlDataReader myRead = sqlCom.ExecuteReader())
                {
                    myRead.Read();
                    if (myRead.HasRows == false)
                    {
                        insertTable.Rows.Add(entry.ItemArray);
                    }
                }
            }
        }
    }

    // Afterwards comes the bulk insert (using insertTable), which I think is
    // out of scope for the question itself.

Now my question is: is there a faster way to do this that still avoids the key violation problem?

+5
3 answers

In this case, I would use some kind of staging table. Here are a few steps:

  • Bulk insert into a staging table (using SqlBulkCopy)
  • Insert into the base table using a stored procedure with a left join to eliminate existing rows
  • Truncate the staging table

This way you can remove the foreach statement from your code, add a stored proc that inserts into the base table, and add a stored proc that truncates the staging table. Or you can combine the last two steps into one, as in the sketch below.
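A minimal C# sketch of this approach, reusing transactionTable and GetConnString() from the question; the staging table name Staging_MyTable is an assumption, and the insert/truncate is shown as inline SQL rather than the suggested stored procedures:

    // Minimal sketch of the staging-table approach (staging table name Staging_MyTable is assumed).
    // Requires: using System.Data.SqlClient;
    using (SqlConnection sqlCon = new SqlConnection(this.GetConnString()))
    {
        sqlCon.Open();

        // Step 1: bulk insert everything into the staging table.
        using (SqlBulkCopy bulk = new SqlBulkCopy(sqlCon))
        {
            bulk.DestinationTableName = "Staging_MyTable";
            bulk.WriteToServer(transactionTable);
        }

        // Steps 2 + 3 combined: insert only rows that do not exist yet, then empty the staging table.
        // (The answer suggests wrapping this in a stored procedure instead of inline SQL.)
        string sql =
            "INSERT INTO myTable (DeviceId, LogDate, LogType, LogText) " +
            "SELECT s.DeviceId, s.LogDate, s.LogType, s.LogText " +
            "FROM Staging_MyTable s " +
            "LEFT JOIN myTable t " +
            "  ON t.DeviceId = s.DeviceId AND t.LogDate = s.LogDate AND t.LogType = s.LogType " +
            "WHERE t.DeviceId IS NULL; " +
            "TRUNCATE TABLE Staging_MyTable;";

        using (SqlCommand sqlCom = new SqlCommand(sql, sqlCon))
        {
            sqlCom.ExecuteNonQuery();
        }
    }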

+2

I have a similar setup.

I am using a stored procedure with a table-valued parameter and MERGE. See also Table-Valued Parameters for an example of how to use them in .NET.

I would shift the focus of the problem from a plain bulk insert to merging a batch of rows into a table with existing data.

Destination table

    CREATE TABLE [dbo].[MyTable](
        [DeviceId] [int] NOT NULL,
        [LogDate] [datetime] NOT NULL,
        [LogType] [int] NOT NULL,
        [LogText] [nvarchar](50) NOT NULL,
        CONSTRAINT [PK_MyTable] PRIMARY KEY CLUSTERED
        (
            [DeviceId] ASC,
            [LogDate] ASC,
            [LogType] ASC
        )
    )

Create a custom table type

    CREATE TYPE [dbo].[MyTableType] AS TABLE(
        [DeviceId] [int] NOT NULL,
        [LogDate] [datetime] NOT NULL,
        [LogType] [int] NOT NULL,
        [LogText] [nvarchar](50) NOT NULL,
        PRIMARY KEY CLUSTERED
        (
            [DeviceId] ASC,
            [LogDate] ASC,
            [LogType] ASC
        )
    )

Test and measure whether specifying the PRIMARY KEY for the TYPE makes the overall process faster or slower.

Stored procedure with TVP

    CREATE PROCEDURE [dbo].[MergeMyTable]
        @ParamRows dbo.MyTableType READONLY
    AS
    BEGIN
        -- SET NOCOUNT ON added to prevent extra result sets from
        -- interfering with SELECT statements.
        SET NOCOUNT ON;

        BEGIN TRANSACTION;
        BEGIN TRY
            MERGE INTO dbo.MyTable AS Dest
            USING (
                SELECT
                    TT.[DeviceId],
                    TT.[LogDate],
                    TT.[LogType],
                    TT.[LogText]
                FROM @ParamRows AS TT
            ) AS Src
            ON (Dest.[DeviceId] = Src.[DeviceId])
                AND (Dest.[LogDate] = Src.[LogDate])
                AND (Dest.[LogType] = Src.[LogType])
            WHEN MATCHED THEN
                UPDATE SET Dest.[LogText] = Src.[LogText]
            WHEN NOT MATCHED BY TARGET THEN
                INSERT ([DeviceId], [LogDate], [LogType], [LogText])
                VALUES (Src.[DeviceId], Src.[LogDate], Src.[LogType], Src.[LogText]);

            COMMIT TRANSACTION;
        END TRY
        BEGIN CATCH
            ROLLBACK TRANSACTION;
        END CATCH;
    END

Call this stored procedure by passing it a batch of rows to merge. Test and measure how performance changes with the batch size. Try batches of 1K, 10K and 100K rows.
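For reference, a rough sketch of the .NET side of that call, reusing transactionTable and GetConnString() from the question (the column order of the DataTable already matches dbo.MyTableType defined above):

    // Sketch: pass the question's DataTable as a table-valued parameter.
    // Requires: using System.Data; using System.Data.SqlClient;
    using (SqlConnection sqlCon = new SqlConnection(this.GetConnString()))
    using (SqlCommand sqlCom = new SqlCommand("dbo.MergeMyTable", sqlCon))
    {
        sqlCom.CommandType = CommandType.StoredProcedure;

        SqlParameter tvpParam = sqlCom.Parameters.AddWithValue("@ParamRows", transactionTable);
        tvpParam.SqlDbType = SqlDbType.Structured;
        tvpParam.TypeName = "dbo.MyTableType";

        sqlCon.Open();
        sqlCom.ExecuteNonQuery();
    }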

If you never want to update existing rows with new values, remove the WHEN MATCHED THEN clause from the MERGE; it will run faster.

+2

You can drop and recreate your index with IGNORE_DUP_KEY set to ON. Something like this:

    ALTER TABLE datatable
    ADD CONSTRAINT PK_datatable PRIMARY KEY CLUSTERED
        (DeviceId, LogDate, LogType, LogText)
        WITH (IGNORE_DUP_KEY = ON)

What this option does is report the duplicate key error with a different severity, as a warning message, when an insert attempts to add a duplicate key to the index. It will not allow duplicates to be inserted, but it will keep inserting all records that are not duplicates, and only emit a warning message if duplicates were found and ignored.
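With that option in place, the client side could look roughly like the minimal sketch below, which simply bulk-copies the whole transactionTable from the question; it assumes the constraint above exists on myTable and that duplicates are ignored for bulk-copied rows just as for regular inserts:

    // Sketch: with IGNORE_DUP_KEY = ON on the primary key, duplicate rows are
    // skipped with a warning, so the DataTable can be bulk-copied without pre-filtering.
    // Requires: using System.Data.SqlClient;
    using (SqlConnection sqlCon = new SqlConnection(this.GetConnString()))
    {
        sqlCon.Open();

        using (SqlBulkCopy bulk = new SqlBulkCopy(sqlCon))
        {
            bulk.DestinationTableName = "myTable";
            bulk.WriteToServer(transactionTable);
        }
    }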

More information at this link: Create Unique Indexes.

+2
source
