Logarithmically increasing run time per iteration of a ForEach Loop Container

First, some background: I am new to SSIS, and I've just completed my second data import project.

The package is very simple: a Data Flow imports a delimited file of ~30,000 client records into an ADO recordset variable, which in turn drives a ForEach Loop Container that executes a piece of SQL, passing in the values from each row of the recordset.

Importing the first ~21,000 records took 59 hours before the package failed! The remaining ~9,000 took another 8 hours. Yes, a total of 67 hours!

The SQL consists of a check to see whether the record already exists, a call to a procedure that generates a new password, and finally a call to another procedure that inserts the client data into our system. That final procedure returns a recordset, but I am not interested in the result, so I just ignore it; I don't know whether SSIS discards the recordset or not. I know this is the slowest possible way to get data into the system, but I didn't expect it to be this slow, nor for it to fail two-thirds of the way through and then again while processing the last ~9,000 records.

When I tested a subset of ~3,000 records on my local machine, the Execute Package Utility reported that each insert was taking roughly one second. A little quick math (roughly 30,000 records at about a second each is a little over 8 hours) suggested the full import would take around 8 hours. That seemed like a long time, which I was expecting, given everything I had read about SSIS and RBAR. I figured the final import would be somewhat faster, since the server is considerably more powerful. Although I am accessing the server remotely, I wouldn't expect that to be a problem, since I have run imports in the past from bespoke C# console applications using simple ADO connections, and they never ran anywhere near this slowly.

Initially the destination table was not indexed for the existence check, and I thought that might be the cause of the poor performance. I added an appropriate index to the table to turn the check from a scan into a seek, expecting that to get rid of the performance problem. Oddly enough, it appeared to have no visible effect!
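
For anyone curious, an index along these lines is what turns that existence check (the one in the script further down) into a seek; the exact definition and name here are illustrative rather than the precise index I created:

-- Illustrative only: a narrow index covering the three columns used in the
-- existence check (strSubscriberEmail, ProductId, strTrialSource).
-- The index name is made up.
create nonclustered index IX_tblSubscriber_TrialLookup
    on [dbo].[tblSubscriber] (strSubscriberEmail, ProductId, strTrialSource);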

We use a sproc to insert the data into our system for consistency: it follows the same route the data takes when it is entered through our web interface. Inserting the data also fires several triggers and updates various other objects in the database.

However, and this is where I started scratching my head, during this import the run times for the SQL, as reported by the output of the Execute Package Utility, increased logarithmically over the course of the run. What starts out as an execution time of around one second grows to more than 20 seconds by the end of the import, and eventually the import package simply grinds to a complete halt.

I have searched the Internet repeatedly, both via Google and here on StackOverflow, and haven't found anything that describes these symptoms.

Hope someone has some tips.

thanks

In response to ErikE: (I could not put this in a comment, so I added it here.)

Erik, as requested, I ran the profiler against the database while putting my three-thousand-record test file through its paces.

I couldn't easily work out how to get SSIS to put a distinguishable marker into the SQL that the profiler would see, so I just ran the profiler for the whole run. I know there is some overhead associated with that, but in theory it should be more or less consistent across the run.

The duration reported for each item remains fairly constant throughout the run.

A trimmed extract of the trace output is below. In this particular run the first 800 items overlapped data that had already been imported, so the system was doing essentially no work for them (yay, indexes!). Once the index stops short-circuiting the work and the system is actually inserting new data, you can see the figures jump accordingly, but they don't seem to change much, if at all, between the first and last items, with the reads being by far the largest figure.

 ------------------------------------------
| Item | CPU | Reads | Writes | Duration |
 ------------------------------------------
| 0001 |   0 |    29 |      0 |        0 |
| 0002 |   0 |    32 |      0 |        0 |
| 0003 |   0 |    27 |      0 |        0 |
| ...  |     |       |        |          |
| 0799 |   0 |    32 |      0 |        0 |
| 0800 |  78 |  4073 |     40 |      124 |
| 0801 |  32 |  2122 |      4 |       54 |
| 0802 |  46 |  2128 |      8 |      174 |
| 0803 |  46 |  2128 |      8 |      174 |
| 0804 |  47 |  2131 |     15 |      242 |
| ...  |     |       |        |          |
| 1400 |  16 |  2156 |      1 |       54 |
| 1401 |  16 |  2167 |      3 |       72 |
| 1402 |  16 |  2153 |      4 |       84 |
| ...  |     |       |        |          |
| 2997 |  31 |  2193 |      2 |       72 |
| 2998 |  31 |  2195 |      2 |       48 |
| 2999 |  31 |  2184 |      2 |       35 |
| 3000 |  31 |  2180 |      2 |       53 |
 ------------------------------------------

Overnight I also set the full import running again from scratch with the profiler switched on, to see how it fared. It managed to get through about a third of the import in 15.5 hours on my local machine. I exported the trace data to a SQL table so I could get some statistics out of it. Looking at the trace data, the delta between inserts increases by roughly 1 second per thousand records processed, so by the time it reaches record 10,000 it is taking 10 seconds per record to perform the insert. The actual code executed for each record is shown below. Don't bother critiquing the procedure; the SQL was written by a self-taught developer who was originally our registrar, long before the company employed anyone with any real development training. We are well aware that it is not good. The main point is that I believe it should execute at a constant rate, and it very clearly does not.

if not exists (
    select 1
    from [dbo].[tblSubscriber]
    where strSubscriberEmail = @EmailAddress
      and ProductId = @ProductId
      and strTrialSource = @Source
)
begin
    declare @ThePassword varchar(20)
    select @ThePassword = [dbo].[DefaultPassword]()
    exec [dbo].[MemberLookupTransitionCDS5]
          @ProductId
        , @EmailAddress
        , @ThePassword
        , NULL          --IP Address
        , NULL          --BrowserName
        , NULL          --BrowserVersion
        , 2             --blnUpdate
        , @FirstName    --strFirstName
        , @Surname      --strLastName
        , @Source       --strTrialSource
        , @Comments     --strTrialComments
        , @Phone        --strSubscriberPhone
        , @TrialType    --intTrialType
        , NULL          --Redundant MonitorGroupID
        , NULL          --strTrialFirstPage
        , NULL          --strTrialRefererUrl
        , 30            --intTrialSubscriptionDaysLength
        , 0             --SourceCategoryId
end
GO

The results of measuring the time difference between each execution (trimmed for brevity):

 ----------------------
|   Row | Delta (ms) |
 ----------------------
|   500 |        510 |
|  1000 |        976 |
|  1500 |       1436 |
|  2000 |       1916 |
|  2500 |       2336 |
|  3000 |       2816 |
|  3500 |       3263 |
|  4000 |       3726 |
|  4500 |       4163 |
|  5000 |       4633 |
|  5500 |       5223 |
|  6000 |       5563 |
|  6500 |       6053 |
|  7000 |       6510 |
|  7500 |       6926 |
|  8000 |       7393 |
|  8500 |       7846 |
|  9000 |       8503 |
|  9500 |       8820 |
| 10000 |       9296 |
| 10500 |       9750 |
 ----------------------
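
For completeness, the deltas above came from the trace data I exported to a SQL table; a minimal sketch of the kind of query involved, assuming a hypothetical dbo.ImportTrace table with a StartTime column (the real table and column names may differ):

-- Sketch only: pair each trace row with the previous one in start-time order
-- and take the difference in milliseconds. dbo.ImportTrace is a made-up name
-- for the table the trace was exported to.
with Ordered as
(
    select  row_number() over (order by StartTime) as RowNum,
            StartTime
    from    dbo.ImportTrace
)
select  cur.RowNum,
        datediff(ms, prev.StartTime, cur.StartTime) as DeltaMs
from    Ordered as cur
join    Ordered as prev
    on  prev.RowNum = cur.RowNum - 1
order by cur.RowNum;
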
1 answer

A few steps to take:

  • Tip: work out whether this is a server-side or a client-side problem. Run a trace and see how long the 1st insert takes compared with the 3000th. Include some distinguishable difference in the SQL of the 1st and 3000th iterations that you can filter on in the trace, so you don't capture all the other events. Try to avoid the statement-completed events; use SQL:BatchCompleted or RPC:Completed instead. (One way to embed such a marker is sketched below, after these steps.)

    The answer: the CPU, reads and duration recorded in the profiler trace do not increase; it is the actual elapsed/effective insert time that does.

  • Tip: assuming the pattern above holds true out past the 10,000th insert (please let me know if it doesn't), I think some blocking is occurring, perhaps something like a constraint check performing a nested loop join that scales logarithmically with the number of rows in the table, just as you are seeing. Could you do the following (a quick way to time a single insert by hand is also sketched below):

    If none of this uncovers the problem, simply update your question with any new information and comment here, and I will continue to do my best to help.
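
To illustrate the first tip: one way to make the 1st and 3000th iterations distinguishable in the trace is to prepend a comment carrying the iteration number to the statement text (built with an SSIS expression) and then filter on it. This is only a sketch; the variable names, the comment format and the trace table name are assumptions, not part of the actual package:

-- The batch the server receives on iteration 1 might look like
--   /*IMPORT_ITER:0001*/ if not exists ( select 1 from [dbo].[tblSubscriber] ...
-- built from an SSIS expression along the lines of
--   "/*IMPORT_ITER:" + RIGHT("000" + (DT_WSTR, 10) @[User::Counter], 4) + "*/ " + @[User::SqlText]
-- With the trace saved to a table, the tagged iterations are then easy to pull out:
select  TextData, CPU, Reads, Writes, Duration, StartTime
from    dbo.ImportTrace     -- hypothetical table the trace was saved to
where   TextData like '%IMPORT_ITER:0001%'
     or TextData like '%IMPORT_ITER:3000%';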
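
To illustrate the second tip: a single insert can be timed by hand with statistics switched on, to see where the reads go (a scan of a growing table, an unindexed foreign key check, trigger work and so on). The literal values below are placeholders for one test record, not real data; the parameter order matches the script in the question:

-- Sketch only: measure one insert in isolation and read the IO/time output.
set statistics io on;
set statistics time on;

exec [dbo].[MemberLookupTransitionCDS5]
      1                    -- ProductId (placeholder)
    , 'test@example.com'   -- EmailAddress (placeholder)
    , 'TempPassword1'      -- password (placeholder)
    , NULL                 -- IP Address
    , NULL                 -- BrowserName
    , NULL                 -- BrowserVersion
    , 2                    -- blnUpdate
    , 'Test'               -- strFirstName (placeholder)
    , 'User'               -- strLastName (placeholder)
    , 'Import test'        -- strTrialSource (placeholder)
    , NULL                 -- strTrialComments
    , NULL                 -- strSubscriberPhone
    , 1                    -- intTrialType (placeholder)
    , NULL                 -- Redundant MonitorGroupID
    , NULL                 -- strTrialFirstPage
    , NULL                 -- strTrialRefererUrl
    , 30                   -- intTrialSubscriptionDaysLength
    , 0;                   -- SourceCategoryId

set statistics io off;
set statistics time off;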
