SSIS data import with resume support

I need to push a large SQL table from my local instance to Azure SQL. The transfer is a simple, clean upload: just insert the data into a new, empty table.

The table is extremely large (~100 million rows) and consists only of GUIDs and other simple types (no timestamps or anything like that).

I created an SSIS package using the Import / Export Data Wizard in SSMS, and the package works great.

The problem is that the package runs over a slow and intermittent connection. If the Internet connection drops halfway through, there is no way to "resume" the transfer.

What is the best approach to designing an SSIS package so that this data transfer is resumable? That is, so it can recover from a connection failure, or so the task can run only within certain time windows.

2 answers

Normally, in a situation like this, I would design the package to iterate through batches of N rows (1k rows, 10M rows, etc.) and log to a processing table the last batch that transferred successfully, so a restart can pick up where it left off (see the sketch below). However, with GUIDs as the key, you cannot easily split the rows into buckets like that.
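For reference, a minimal sketch of that batch-and-watermark pattern, assuming a sequential integer key. The table, column, and variable names here are hypothetical, and @BatchSize / @LastId / @LastBatchMaxId would be SSIS package variables:

    -- Hypothetical watermark table recording the last row that made it to Azure
    CREATE TABLE dbo.TransferLog (LastIdTransferred BIGINT NOT NULL);

    -- Source query for one batch: the next N rows past the watermark
    SELECT TOP (@BatchSize) *
    FROM dbo.MyTable
    WHERE Id > @LastId
    ORDER BY Id;

    -- After the batch commits on the Azure side, advance the watermark
    UPDATE dbo.TransferLog SET LastIdTransferred = @LastBatchMaxId;

On restart the package reads the watermark and resumes from it; with random GUIDs there is no natural "next N rows", which is why the approach below is different.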

In this particular case, I would modify the data flow so that it looks like Source → Lookup → Destination. In the Lookup transformation, query the Azure side and pull back only the keys (SELECT myGuid FROM myTable). We only care about the rows that have no match in the lookup set, since those are the ones that still need to be transferred.
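In the Lookup you would redirect unmatched rows to the "no match output" and wire only that output to the destination. As a rough T-SQL analogue of what the data flow computes on each run (the azure-prefixed name is just a placeholder for the destination table; in the real package the comparison happens against the Lookup's in-memory cache):

    -- Only rows whose key is not yet present on the Azure side get sent
    SELECT s.*
    FROM dbo.myTable AS s                 -- local source
    WHERE NOT EXISTS (
        SELECT 1
        FROM azure.dbo.myTable AS d       -- placeholder for the destination table
        WHERE d.myGuid = s.myGuid
    );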

A full cache will cost about 1.5 GB of memory (100 million rows * 16 bytes per GUID), assuming the Azure side is fully populated, plus the associated cost of transferring those keys back down. That cost should still be far less than truncating and re-sending all the data, but I wanted to call it out.


Just order by your GUID on the way out. Then make sure you use MAX(guid) from the Azure side as the starting point when recovering from a failure or restart.
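A minimal sketch of that, reusing the myGuid / myTable names from the first answer and a hypothetical @ResumePoint package variable. It assumes both sides compare GUIDs the same way (SQL Server orders uniqueidentifier values by its own byte order, so keep both the ordering and the comparison in SQL Server):

    -- On the Azure side, find how far the previous run got
    SELECT @ResumePoint = MAX(myGuid) FROM dbo.myTable;

    -- Source query on the local side: only rows past that point, in the same order
    SELECT *
    FROM dbo.myTable
    WHERE (@ResumePoint IS NULL OR myGuid > @ResumePoint)
    ORDER BY myGuid;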


Source: https://habr.com/ru/post/1390287/

