The fastest way to insert 1 million rows in SQL Server

I am writing a stored procedure to insert rows into a table. The problem is that for some operations we might want to insert more than 1 million rows, and we want to do it quickly. Another thing is that one of the columns is NVARCHAR(MAX), and on average we put about 1,000 characters into it.

First, I wrote the procedure to insert row by row. Then I generated random data to insert, with the NVARCHAR(MAX) column filled with a 1,000-character string, and used a loop to call the procedure for each row. The performance is very bad: it takes 48 minutes if I run it directly on the database server, and more than 90 minutes if I connect to the server from a C# program on my desktop (which is what we usually want to do).

Then I changed the procedure to take a table type as an input parameter. I prepare the rows, put them into the table-valued parameter, and insert them with the following command:

    INSERT INTO tableA SELECT * FROM @tableTypeParameterB

I tried batch sizes of 1,000 and 3,000 rows (putting 1,000-3,000 rows into @tableTypeParameterB per insert). Performance is still poor: it takes about 3 minutes to insert 1 million rows when running directly on the database server, and about 10 minutes when connecting from a C# program on my desktop.

tableA has a clustered index with two columns.

My goal is to make the insert as fast as possible (ideally within 1 minute). Is there any way to optimize this?


Just an update:

I tried the bulk copy option, which was suggested by some people below. I tried using SqlBulkCopy to insert 1,000 rows and 10,000 rows at a time. Performance is still 10 minutes to insert 1 million rows (each row has a column with 1,000 characters). There is no performance improvement. Are there any other suggestions?
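For reference, this is roughly the pattern I used, simplified; the connection string and the DataTable contents are placeholders:

    // Roughly the SqlBulkCopy pattern I tried (placeholder connection string and DataTable).
    using System.Data;
    using System.Data.SqlClient;

    static class BulkCopyAttempt
    {
        public static void BulkCopyRows(string connectionString, DataTable rows)
        {
            using (SqlBulkCopy bulk = new SqlBulkCopy(connectionString))
            {
                bulk.DestinationTableName = "tableA";
                bulk.BatchSize = 10000;     // I tried 1,000 and 10,000 rows per batch
                bulk.BulkCopyTimeout = 0;   // 0 = no timeout for the large load
                bulk.WriteToServer(rows);   // rows include the ~1,000-character NVARCHAR(MAX) column
            }
        }
    }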


Update based on the comments:

The data actually comes from the user interface. The user will use the UI to bulk-select, say, a million rows and change one column from an old value to a new value. That operation is performed in a separate procedure, but first a mid-tier service has to take the old and new values from the UI and insert them into this table. The old and new values can be up to 4,000 characters, and the average is 1,000 characters. I think the long old/new value strings are what slows things down: when I test with old/new values of 20-50 characters, the insert is very fast, regardless of whether I use SqlBulkCopy or a table type variable.

+6
5 answers

I think you are looking for Bulk Insert if you prefer to use SQL.

Or there is the ADO.NET option for batch operations, so you keep the logic in your C# application. This article is also very complete.

Update

Yes, I am afraid bulk insert only works with imported files (read from within the database server).

I have experience from a Java project where we needed to insert millions of rows (the data came from outside the application, by the way).

The database was Oracle, so of course we used Oracle's multi-row insert. It turned out that JDBC batch updates were much faster than Oracle's multi-valued inserts (the so-called "bulk updates").

My suggestion:

If the data you are going to manipulate comes from outside your application (i.e. it is not already in the database), I would say just go with ADO.NET batch inserts. I think that is your case.

Note: keep in mind that batch inserts usually reuse the same query; that is what makes them so fast.
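For illustration, here is a minimal sketch of one way to do ADO.NET batch inserts, using SqlDataAdapter.UpdateBatchSize. tableA comes from the question, but the column names (KeyCol1, KeyCol2, BigText) and the connection handling are assumptions you would adapt to your schema:

    // A minimal sketch of ADO.NET batched inserts via SqlDataAdapter.UpdateBatchSize.
    // tableA is from the question; the column names and connection string are placeholders.
    using System.Data;
    using System.Data.SqlClient;

    static class AdoNetBatchInsert
    {
        public static void InsertBatched(string connectionString, DataTable rows)
        {
            using (SqlConnection conn = new SqlConnection(connectionString))
            using (SqlDataAdapter adapter = new SqlDataAdapter())
            {
                conn.Open();

                SqlCommand insert = new SqlCommand(
                    "INSERT INTO tableA (KeyCol1, KeyCol2, BigText) VALUES (@k1, @k2, @txt)", conn);
                insert.Parameters.Add("@k1", SqlDbType.Int, 0, "KeyCol1");
                insert.Parameters.Add("@k2", SqlDbType.Int, 0, "KeyCol2");
                insert.Parameters.Add("@txt", SqlDbType.NVarChar, -1, "BigText");
                insert.UpdatedRowSource = UpdateRowSource.None;  // required for batching

                adapter.InsertCommand = insert;
                adapter.UpdateBatchSize = 1000;  // rows sent per round trip

                adapter.Update(rows);  // inserts all rows whose RowState is Added
            }
        }
    }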

+6

Calling the procedure in a loop incurs many round trips to SQL Server.

Not sure which batching approach you used, but you should look into table-valued parameters: docs here. You will still want to write in batches.

You will also want to consider memory on your server. Batching (say 10K rows at a time) can be a bit slower, but it can reduce memory pressure on your server, since you buffer and process one set at a time.

Table-valued parameters provide an easy way to marshal multiple rows of data from a client application to SQL Server without requiring multiple round trips or special server-side logic to process the data. You can use table-valued parameters to encapsulate rows of data in a client application and send them to the server in a single parameterized command. The incoming data rows are stored in a table variable that can then be operated on with Transact-SQL.
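For illustration, a minimal sketch of that client-side pattern; it reuses @tableTypeParameterB and tableA from the question, while the table type name dbo.TableTypeB and the DataTable layout are assumptions (the type must already exist on the server, created with CREATE TYPE ... AS TABLE):

    // A minimal sketch of sending a batch of rows in one parameterized command via a TVP.
    // Assumes a user-defined table type dbo.TableTypeB already exists on the server
    // with columns matching the DataTable being passed.
    using System.Data;
    using System.Data.SqlClient;

    static class TvpInsert
    {
        public static void InsertViaTvp(string connectionString, DataTable rows)
        {
            using (SqlConnection conn = new SqlConnection(connectionString))
            using (SqlCommand cmd = new SqlCommand(
                "INSERT INTO tableA SELECT * FROM @tableTypeParameterB", conn))
            {
                SqlParameter tvp = cmd.Parameters.AddWithValue("@tableTypeParameterB", rows);
                tvp.SqlDbType = SqlDbType.Structured;  // marks the parameter as a table-valued parameter
                tvp.TypeName = "dbo.TableTypeB";       // the server-side table type

                conn.Open();
                cmd.ExecuteNonQuery();                 // one round trip for the whole batch
            }
        }
    }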

Another option is bulk insert. TVPs benefit from reuse, however, so it depends on your usage pattern. The first link has a note comparing the two:

Using table-valued parameters is comparable to other ways of using set-based variables; however, using table-valued parameters frequently can be faster for large data sets. Compared to bulk operations, which have a greater startup cost than table-valued parameters, table-valued parameters perform well for inserting fewer than 1,000 rows.

Table-valued parameters that are reused benefit from temporary table caching. This table caching enables better scalability than equivalent BULK INSERT operations.

Another comparison here: bcp / BULK INSERT performance versus table parameters

+2

Here is an example I used previously with SqlBulkCopy. Granted, I was only dealing with about 10,000 records, but it inserted them within a few seconds of running the query. My field names were the same, so it was pretty easy. You may have to modify the DataTable field names. Hope this helps.

    private void UpdateMemberRecords(Int32 memberId)
    {
        string sql = string.Format("select * from Member where mem_id > {0}", memberId);
        try
        {
            // Load the source rows into a DataTable.
            DataTable dt = new DataTable();
            using (SqlDataAdapter da = new SqlDataAdapter(new SqlCommand(sql, _sourceDb)))
            {
                da.Fill(dt);
            }
            Console.WriteLine("Member Count: {0}", dt.Rows.Count);

            // Bulk-copy the rows to the destination table, keeping identity values.
            using (SqlBulkCopy sqlBulk = new SqlBulkCopy(
                ConfigurationManager.AppSettings["DestDb"], SqlBulkCopyOptions.KeepIdentity))
            {
                sqlBulk.BulkCopyTimeout = 600;
                sqlBulk.DestinationTableName = "Member";
                sqlBulk.WriteToServer(dt);
            }
        }
        catch (Exception)
        {
            throw;
        }
    }
0

Depending on your ultimate goal, it may be a good idea to look into Entity Framework (or similar). It abstracts away the SQL, so you don't have to worry about it in your client application, which is how it should be.

In the end, you can get something like this:

    using (DatabaseContext db = new DatabaseContext())
    {
        for (int i = 0; i < 1000000; i++)
        {
            db.Table.Add(new Row() { /* column data goes here */ });
        }
        db.SaveChanges();
    }

The key part here (and it boils down to the same thing as many of the other answers) is that Entity Framework handles building the actual insert statements and committing the data to the database.

In the code above, nothing will actually be sent to the database until SaveChanges is called and then everything is sent.

I can't remember where I found it, but there is research around suggesting that it is worth calling SaveChanges every so often. From memory, I think committing every 1,000 records is a good choice. Committing every record, compared to every 100 records, doesn't give much of a performance gain, and 10,000 pushes past the point of diminishing returns. Don't take my word for it, though; the numbers may be wrong. You seem to have a good handle on testing, so play around with it regardless.
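If you try that, a rough sketch of committing in chunks might look like the following; it reuses the placeholder DatabaseContext/Row names from the snippet above, and the 1,000-row interval is just the ballpark figure mentioned:

    // Rough sketch: commit every 1,000 rows instead of once at the end (interval is approximate).
    using (DatabaseContext db = new DatabaseContext())
    {
        for (int i = 0; i < 1000000; i++)
        {
            db.Table.Add(new Row() { /* column data goes here */ });

            if ((i + 1) % 1000 == 0)   // flush the change tracker in chunks
            {
                db.SaveChanges();
            }
        }
        db.SaveChanges();              // save any remaining rows
    }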

-2

If you have SQL Server 2014, the speed of In-Memory OLTP is amazing: http://msdn.microsoft.com/en-au/library/dn133186.aspx

-2

Source: https://habr.com/ru/post/972624/

