SQL Server 2008 Big INSERT Still Slow Even After Wrapping in a Transaction

I have a problem with a big INSERT query:

INSERT table_name SELECT .....; 

The table has no indexes, and about 20 million rows are inserted into it. I am running the query on SQL Server 2008 R2 on one of our servers. The initial run took about 40 minutes. Then I read posts suggesting wrapping the INSERT in BEGIN TRANSACTION / COMMIT. I did this, and the time dropped to 6 minutes.
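Roughly what I did (the column list and source_table here are placeholders, not the real names):

    BEGIN TRANSACTION;

    INSERT table_name          -- same INSERT ... SELECT as above
    SELECT col1, col2          -- col1, col2 are placeholder columns
    FROM source_table;         -- source_table is a placeholder name

    COMMIT TRANSACTION;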

However, when I ran the transaction-wrapped query again over the next few runs, the time went back to 40 minutes; the TRANSACTION effect disappeared. I do not know what changed between those runs. Any ideas?

EDIT:

Another post says TRANSACTION is for data consistency, not performance, and suggests inserting in batches of about 5,000 rows. How can I break the single INSERT ... SELECT statement above into batches like that? I am at a loss.

UPDATE:

In fact, I now believe the performance improvement is not related to the transaction at all, but perhaps to server-side caching, because after running the query several times the time settles at around 5 minutes.

+4
5 answers

When you wrap many INSERT ... VALUES statements in a transaction, the massive speedup mostly comes from not having to flush dirty data pages to disk after every single insert. But when you wrap one INSERT ... SELECT statement in an explicit transaction, there is no speedup to be had, because it already ran inside an implicit transaction before, and the mechanics have not changed. Most likely something else changed in your environment at the same time.

A gradual drop in performance points to the target table growing, or to the database file growing. The former will never stop growing; the latter can make timings a little more variable / unpredictable as your database keeps growing. So this is probably not a one-off drop, it is a trend.

If you can always ensure the data goes into an empty table, consider being more radical and dropping the table every time. Use SELECT INTO instead of INSERT ... SELECT. This may or may not work with your referential integrity requirements. The advantage of the different syntax is a different logging strategy.
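A minimal sketch of that approach, assuming placeholder table names (dbo.SourceTable, dbo.DestinationTable) and columns (col1, col2):

    -- Drop the target if it exists, then recreate and load it in one SELECT INTO.
    IF OBJECT_ID('dbo.DestinationTable', 'U') IS NOT NULL
        DROP TABLE dbo.DestinationTable;

    SELECT col1, col2
    INTO dbo.DestinationTable
    FROM dbo.SourceTable;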

If the table cannot be dropped before the next insert, but you can make sure no other connection touches it during the INSERT, you could use isolation levels or table hints to get locking out of your way; a much safer option for a similar purpose, however, is the TABLOCK hint. This hint goes to the opposite extreme: it locks the entire table up front, excluding everyone else, so no time is wasted on row-level locking.
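A sketch of the hint with the same placeholder names; it assumes no other session needs the table while the load runs:

    -- Lock the whole table up front instead of taking row locks one by one.
    INSERT INTO dbo.DestinationTable WITH (TABLOCK) (col1, col2)
    SELECT col1, col2
    FROM dbo.SourceTable;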

Insert the data sorted by the (clustered) primary key of the target table. You can temporarily disable other indexes during the INSERT, but do not do this lightly, as it is just another way to seriously hurt any concurrent traffic, if there is any.
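A hedged sketch of that idea; IX_DestinationTable_col2 is a hypothetical nonclustered index name, and col1 stands in for the clustered key column:

    -- Disable a secondary index for the duration of the load...
    ALTER INDEX IX_DestinationTable_col2 ON dbo.DestinationTable DISABLE;

    -- ...insert in clustered-key order...
    INSERT INTO dbo.DestinationTable (col1, col2)
    SELECT col1, col2
    FROM dbo.SourceTable
    ORDER BY col1;

    -- ...then rebuild it afterwards.
    ALTER INDEX IX_DestinationTable_col2 ON dbo.DestinationTable REBUILD;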

Watch the size of the mdf file. Avoid situations where it auto-grows in small increments.
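One way to do that is to pre-grow the data file before the load instead of letting it autogrow in small steps; MyDatabase and MyDatabase_Data are placeholder database and logical file names, and the sizes are only illustrative:

    -- Grow the data file once, up front, and use a larger growth increment.
    ALTER DATABASE MyDatabase
    MODIFY FILE (NAME = MyDatabase_Data, SIZE = 20GB, FILEGROWTH = 1GB);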

Last resort: plan for hardware usage and partition the target table. That requires switching from thinking "faster, please" to "I need to hit this specific speed", and it is much harder to maintain.

+1

There are many things that can affect this: the type of logging still happening on the server, whether the operation can obtain an exclusive lock on the table, the hardware (mainly disk IO), which indexes already exist on the table, and so on.

Inserting 20 million rows will generate a huge amount of log records. You want to make sure you are performing a minimally logged operation. For that, consider SELECT INTO (if possible). If you are stuck with INSERT ... SELECT, look at the factors that allow it to be minimally logged. See http://msdn.microsoft.com/en-us/library/dd425070%28v=sql.100%29.aspx
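A hedged sketch of what those prerequisites can look like in practice; MyDatabase and the table/column names are placeholders, the linked article is the authority on the exact conditions, and switching recovery models affects your log backup chain, so check with your DBA first:

    -- Minimal logging for INSERT ... SELECT generally needs SIMPLE or BULK_LOGGED
    -- recovery plus a TABLOCK hint on a heap (or empty clustered) target.
    ALTER DATABASE MyDatabase SET RECOVERY BULK_LOGGED;

    INSERT INTO dbo.DestinationTable WITH (TABLOCK) (col1, col2)
    SELECT col1, col2
    FROM dbo.SourceTable;

    ALTER DATABASE MyDatabase SET RECOVERY FULL;  -- switch back when the load is done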

+1

To insert bulk data quickly, use a bulk copy mechanism.

You can use the BCP command-line utility, or drive a bulk copy from .NET or your own SQL client.

Alternatively, do the copy in batches inside the database:

    -- Walk the source in batches of 500 rows (int is assumed here; use the actual key type).
    declare @todo table (primaryKeyFieldName int primary key)
    declare @batch table (primaryKeyFieldName int primary key)

    insert @todo
    select primaryKeyFieldName from SourceTable

    while exists (select 1 from @todo)
    begin
        -- start each iteration with an empty batch
        delete @batch

        insert @batch
        select top 500 primaryKeyFieldName from @todo

        -- remove the current batch from the to-do list
        delete todo
        from @todo todo
        inner join @batch b on b.primaryKeyFieldName = todo.primaryKeyFieldName

        -- copy just this batch into the destination
        insert DestinationTable (fields....)
        select s.fields, ....
        from SourceTable s
        inner join @batch b on s.primaryKeyFieldName = b.primaryKeyFieldName
    end
0

If this is a recurring task, you have to ask why you cannot create an SSIS package and run it whenever it needs to run.

0

I just tested with the WITH (TABLOCK) hint and finally got satisfactory performance. The whole query now completes within 3 minutes, compared to the original 40 minutes. That is a huge improvement, and since this query does the initial load of the table, I do not need to worry about access conflicts.

Thanks to everyone for the helpful comments.

0

Source: https://habr.com/ru/post/1441554/

