Using LINQ2SQL to insert a large number of records

We have a small C# tool that we put together to parse a data file, create some objects, and insert them into the database.

The logic is essentially:

    string[] lines = File.ReadAllLines("C:\\Temp\\Data.dat");

    foreach (string line in lines)
    {
        MyDataObject obj = ParseObject(line);
        myDataContext.MyDataObjects.InsertOnSubmit(obj);
    }

    myDataContext.SubmitChanges();

This worked fine from the start, since the data file was only ~1,000 lines per day, but the file has recently grown to ~30,000 lines and the process has become very slow.

Everything up to the call to SubmitChanges() is fine, but once it starts the process of dumping 30,000 inserts to the database, it just grinds. As a test, I scripted out the 30,000 insert statements and ran them directly from QA (Query Analyzer). That took about 8 minutes.

After 8 minutes, the C#/LINQ version had completed only about 25% of the inserts.

Does anyone have any suggestions on how I can optimize this?

+4

5 answers

If you are writing a large volume of homogeneous data, SqlBulkCopy may be a more suitable tool, for example, perhaps with a CsvReader to read the lines (SqlBulkCopy can accept an IDataReader, which means you don't have to buffer all 30k lines into memory).

If the data is CSV, it could just be:

    // CsvReader here is any CSV reader that implements IDataReader
    // (e.g. the LumenWorks CsvReader); the exact constructor varies by library
    using (CsvReader reader = new CsvReader(path))
    using (SqlBulkCopy bcp = new SqlBulkCopy(CONNECTION_STRING))
    {
        bcp.DestinationTableName = "SomeTable";
        bcp.WriteToServer(reader);
    }

If the data is more complex (not CSV), then SimpleDataReader may be useful: you just subclass it and add the code to present your data as rows.
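If subclassing a reader is more trouble than it's worth, another option (at the cost of buffering everything in memory, which is fine at 30k rows) is to load the parsed objects into a DataTable, since WriteToServer accepts one of those too. A minimal sketch, reusing the question's ParseObject; the A and B columns are illustrative assumptions that would have to match SomeTable's schema:

    // Buffer parsed rows into a DataTable, then bulk copy it.
    // Namespaces: System.Data, System.Data.SqlClient, System.IO.
    DataTable table = new DataTable();
    table.Columns.Add("A", typeof(string)); // illustrative columns; align
    table.Columns.Add("B", typeof(int));    // these with SomeTable's schema

    foreach (string line in File.ReadAllLines("C:\\Temp\\Data.dat"))
    {
        MyDataObject obj = ParseObject(line);
        table.Rows.Add(obj.A, obj.B);
    }

    using (SqlBulkCopy bcp = new SqlBulkCopy(CONNECTION_STRING))
    {
        bcp.DestinationTableName = "SomeTable";
        bcp.WriteToServer(table); // WriteToServer also accepts a DataTable
    }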

+6

I had the same question a while ago. I was inserting 1,000,000 new records into the db, and I found that calling SubmitChanges after every 500 was the fastest way.

I can't promise that 500 rows at a time is the fastest, though; our environment is rather strange...
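A minimal sketch of that batching pattern. The batch size of 500 comes from this answer; the MyDataContext name and the fresh context per batch are assumptions (a new context per batch keeps LINQ to SQL's change tracker from growing):

    // Submit in batches of 500 instead of one giant SubmitChanges().
    // Namespaces: System.IO, System.Linq.
    const int BatchSize = 500;
    string[] lines = File.ReadAllLines("C:\\Temp\\Data.dat");

    for (int i = 0; i < lines.Length; i += BatchSize)
    {
        using (MyDataContext ctx = new MyDataContext()) // assumed context type
        {
            foreach (string line in lines.Skip(i).Take(BatchSize))
            {
                ctx.MyDataObjects.InsertOnSubmit(ParseObject(line));
            }
            ctx.SubmitChanges(); // one database round of ~500 inserts
        }
    }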

+1

You might want to try a multi-threaded approach.

  • Break the records into smaller batches (1,000 each?) and place them on a stack
  • Have a class that pops a batch off the top of the stack and starts inserting it via a worker that opens its own DataContext and inserts independently
  • While that batch is inserting, a second worker pops the next set of records
  • Internal logic would dictate how many inserts can run at once (5? 10?)

This could get the data in faster than simply executing SubmitChanges() every few records, since several inserts can run at the same time; a sketch follows below.
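A rough sketch of that scheme using Parallel.ForEach with a capped degree of parallelism. The batch size, the parallelism cap, and the MyDataContext name are all assumptions, and since DataContext is not thread-safe, each batch deliberately gets its own:

    // Partition the lines into batches and insert them concurrently.
    // Namespaces: System.IO, System.Linq, System.Threading.Tasks.
    const int BatchSize = 1000;
    string[] lines = File.ReadAllLines("C:\\Temp\\Data.dat");

    var batches = lines
        .Select((line, index) => new { line, index })
        .GroupBy(x => x.index / BatchSize, x => x.line);

    Parallel.ForEach(
        batches,
        new ParallelOptions { MaxDegreeOfParallelism = 5 }, // 5? 10?
        batch =>
        {
            using (MyDataContext ctx = new MyDataContext()) // one per batch
            {
                foreach (string line in batch)
                {
                    ctx.MyDataObjects.InsertOnSubmit(ParseObject(line));
                }
                ctx.SubmitChanges();
            }
        });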

+1

This is a job for the database and should be run through SSIS, using a bulk insert.

I can insert 30,000 records in seconds or milliseconds (depending on the number of columns and how complicated the data mapping is). I have imports with more than a million records that insert in less time than it takes to loop through the records one at a time. I even have one 20-million-record file that takes only 16 minutes.
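The bulk-insert half of that advice can also be driven without SSIS, straight from T-SQL. A hedged sketch: SomeTable, the comma field terminator, and the one-to-one mapping between the file's fields and the table's columns are all illustrative, and the path must be visible to the SQL Server machine, not the client:

    // Let SQL Server load the flat file itself via T-SQL BULK INSERT.
    // Namespace: System.Data.SqlClient.
    using (SqlConnection conn = new SqlConnection(CONNECTION_STRING))
    using (SqlCommand cmd = new SqlCommand(
        @"BULK INSERT SomeTable
          FROM 'C:\Temp\Data.dat'
          WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n')", conn))
    {
        conn.Open();
        cmd.ExecuteNonQuery();
    }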

+1

An old question, but while searching for my own solution I came across this Code Project article, which is excellent. It essentially uses the Linq2Sql attributes to build a DataTable and then uses SqlBulkCopy to insert, which was much faster than the basic Linq2Sql implementation. The code in the article could use a little cleanup and may fall over in more complex scenarios with foreign keys (I didn't have any in my model, although they were in the database), but it was ideal for my needs.
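The general shape of that technique, sketched from the description above rather than from the article's actual code: reflect over the [Table] and [Column] attributes that LINQ to SQL generates on the entity, build a matching DataTable, and bulk copy it:

    // Sketch of the idea (not the article's code): map a Linq2Sql entity
    // to a DataTable via its mapping attributes, then SqlBulkCopy it.
    // Namespaces: System.Collections.Generic, System.Data,
    // System.Data.Linq.Mapping, System.Data.SqlClient, System.Linq,
    // System.Reflection.
    static void BulkInsert<T>(IEnumerable<T> rows, string connectionString)
    {
        TableAttribute tableAttr = (TableAttribute)typeof(T)
            .GetCustomAttributes(typeof(TableAttribute), false)[0];

        PropertyInfo[] props = typeof(T).GetProperties()
            .Where(p => p.GetCustomAttributes(typeof(ColumnAttribute), false).Any())
            .ToArray();

        DataTable table = new DataTable();
        foreach (PropertyInfo p in props)
        {
            table.Columns.Add(p.Name,
                Nullable.GetUnderlyingType(p.PropertyType) ?? p.PropertyType);
        }

        foreach (T row in rows)
        {
            table.Rows.Add(props
                .Select(p => p.GetValue(row, null) ?? (object)DBNull.Value)
                .ToArray());
        }

        using (SqlBulkCopy bcp = new SqlBulkCopy(connectionString))
        {
            bcp.DestinationTableName = tableAttr.Name;
            bcp.WriteToServer(table);
        }
    }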

0
