First, some background: I am new to SSIS, and I've just completed my second data import project.
The package is very simple. A Data Flow task imports a delimited file of ~30,000 client records into an ADO recordset variable, which in turn drives a Foreach Loop container that executes a snippet of SQL, passing in the values from each row of the recordset.
Importing the first ~21,000 records took 59 hours before the package failed! The last ~9,000 took another 8 hours. Yes, 67 hours in total!
The SQL consists of a check to determine whether the record already exists, a call to a procedure that generates a new password, and finally a call to another procedure that inserts the client data into our system. The final procedure returns a result set, but I'm not interested in it, so I simply ignore it; I don't know whether SSIS discards the recordset or not. I know this is the slowest possible way to get data into the system, but I did not expect it to be *this* slow, nor to fail two-thirds of the way through and again while processing the last ~9,000.
When I tested a subset of ~3,000 records on my local machine, the Execute Package Utility reported that each insert took about 1 second. Some quick math suggested the full import would take about 8 hours. That seemed like a long time, but it was what I expected given everything I had read about SSIS and RBAR. I figured the final import would run somewhat faster, since the server is significantly more powerful. Although I am accessing the server remotely, I wouldn't expect that to be a problem, since in the past I have done imports with ad-hoc C# console applications using plain ADO connections and never seen anything run even remotely this slowly.
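For reference, the back-of-the-envelope extrapolation looks like this (a quick Python sketch; the 1 s/insert figure is the one reported by the Execute Package Utility on the test subset):

```python
# Extrapolating the test run (~1 second per insert) to the full file.
records = 30_000
seconds_per_insert = 1.0          # observed on the ~3,000-record test subset

projected_hours = records * seconds_per_insert / 3600
print(f"projected: {projected_hours:.1f} hours")   # ~8.3 hours

# What actually happened on the server:
actual_hours = 59 + 8             # 59 h to the failure + 8 h for the remainder
print(f"actual: {actual_hours} hours, ~{actual_hours / projected_hours:.0f}x slower")
```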
Initially, the destination table was not indexed for the existence check, and I thought that might be the cause of the slow performance. I added an appropriate index to the table to change the check from a scan to a seek, expecting that to eliminate the performance problem. Oddly enough, it appeared to have no visible effect!
We use a sproc to insert the data into our system for the sake of consistency: it follows the same route the data takes when it is inserted through our web interface. The insert also fires several triggers and updates various other objects in the database.
The thing that has me scratching my head, however, is that the execution time reported for the SQL task by the Execute Package Utility grew steadily over the course of the run. What started as a roughly one-second execution time per insert had grown to more than 20 seconds by the end of the import, and ultimately the package simply ground to a complete halt.
I have searched the Internet repeatedly (thanks, Google), as well as StackOverflow, and haven't found anything that describes these symptoms.
Hopefully someone has some tips.

Thanks.
In response to ErikE (I couldn't fit this into a comment, so I have added it here):
Erik, as you requested, I ran the profiler against the database while putting the 3,000-record test file through its paces.

I couldn't easily work out how to get SSIS to insert a marker into the code that would be visible to the profiler, so I just ran the profiler for the whole run. I know there is some overhead associated with this, but in theory it should be more or less consistent across the run.
The duration for each item remains fairly constant throughout the run.
A trimmed version of the trace output appears below. In the run shown here, the first 800 records overlapped data entered previously, so the system was doing essentially no work (yay, indexes!). Once the index stopped short-circuiting the work and the system actually began inserting new data, you can see the times jump accordingly, but they do not seem to change much, if at all, between the first and last items, with the number of reads being the most notable figure.
------------------------------------------
| Item | CPU | Reads | Writes | Duration |
------------------------------------------
| 0001 | 0 | 29 | 0 | 0 |
| 0002 | 0 | 32 | 0 | 0 |
| 0003 | 0 | 27 | 0 | 0 |
| ... |
| 0799 | 0 | 32 | 0 | 0 |
| 0800 | 78 | 4073 | 40 | 124 |
| 0801 | 32 | 2122 | 4 | 54 |
| 0802 | 46 | 2128 | 8 | 174 |
| 0803 | 46 | 2128 | 8 | 174 |
| 0804 | 47 | 2131 | 15 | 242 |
| ... |
| 1400 | 16 | 2156 | 1 | 54 |
| 1401 | 16 | 2167 | 3 | 72 |
| 1402 | 16 | 2153 | 4 | 84 |
| ... |
| 2997 | 31 | 2193 | 2 | 72 |
| 2998 | 31 | 2195 | 2 | 48 |
| 2999 | 31 | 2184 | 2 | 35 |
| 3000 | 31 | 2180 | 2 | 53 |
------------------------------------------
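To sanity-check the "fairly constant" claim, here is a quick least-squares fit (a Python sketch using only the sample rows reproduced above, so it is indicative rather than exhaustive) over the items that are doing real inserts; the slope comes out essentially flat:

```python
# Least-squares slope of statement duration vs. item number, using only the
# trace rows shown above (items 800+ are the ones actually inserting data).
items     = [800, 801, 802, 803, 804, 1400, 1401, 1402, 2997, 2998, 2999, 3000]
durations = [124,  54, 174, 174, 242,   54,   72,   84,   72,   48,   35,   53]

n = len(items)
mean_x = sum(items) / n
mean_y = sum(durations) / n
slope = (sum(x * y for x, y in zip(items, durations)) - n * mean_x * mean_y) / \
        (sum(x * x for x in items) - n * mean_x ** 2)
print(f"slope: {slope:.4f} ms per item")  # slightly negative, i.e. no growth
```

In other words, as seen from inside the database, per-statement cost is not growing at all.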
Overnight, I also set the system up to run a full restart of the import with the profiler running, to see how it fared. It managed to get through one third of the import in 15.5 hours on my local machine. I exported the trace data to a SQL table so I could derive some statistics. Looking at the trace data, the delta between inserts grows by roughly 1 second per thousand records processed, so by the time it reaches record 10,000 it is taking 10 seconds per record to perform the insert. The actual code executed for each record is shown below. Don't bother critiquing the procedure; the SQL was written by a self-taught developer who was originally our registrar, long before the company employed anyone with actual developer training. We are well aware that it is not good. The main point is that I believe it should execute at a constant rate, and it very clearly does not.
if not exists
(
    select 1
    from [dbo].[tblSubscriber]
    where strSubscriberEmail = @EmailAddress
        and ProductId = @ProductId
        and strTrialSource = @Source
)
begin
    declare @ThePassword varchar(20)
    select @ThePassword = [dbo].[DefaultPassword]()
    exec [dbo].[MemberLookupTransitionCDS5]
        @ProductId
        ,@EmailAddress
        ,@ThePassword
        ,NULL       --IP Address
        ,NULL       --BrowserName
        ,NULL       --BrowserVersion
        ,2          --blnUpdate
        ,@FirstName --strFirstName
        ,@Surname   --strLastName
        ,@Source    --strTrialSource
        ,@Comments  --strTrialComments
        ,@Phone     --strSubscriberPhone
        ,@TrialType --intTrialType
        ,NULL       --Redundant MonitorGroupID
        ,NULL       --strTrialFirstPage
        ,NULL       --strTrialRefererUrl
        ,30         --intTrialSubscriptionDaysLength
        ,0          --SourceCategoryId
end
GO
Results of calculating the time delta between each execution (trimmed for brevity):
----------------------
| Row | Delta (ms) |
----------------------
| 500 | 510 |
| 1000 | 976 |
| 1500 | 1436 |
| 2000 | 1916 |
| 2500 | 2336 |
| 3000 | 2816 |
| 3500 | 3263 |
| 4000 | 3726 |
| 4500 | 4163 |
| 5000 | 4633 |
| 5500 | 5223 |
| 6000 | 5563 |
| 6500 | 6053 |
| 7000 | 6510 |
| 7500 | 6926 |
| 8000 | 7393 |
| 8500 | 7846 |
| 9000 | 8503 |
| 9500 | 8820 |
| 10000 | 9296 |
| 10500 | 9750 |
----------------------
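Fitting a line through that table (a quick Python sketch over the rows above) confirms the trend: the per-insert delta grows by roughly 0.9 ms per record processed, i.e. each insert takes about a second longer per thousand records, which makes the total runtime quadratic in the number of rows:

```python
# Least-squares fit of delta (ms) vs. row number, from the table above.
rows   = list(range(500, 11000, 500))   # 500, 1000, ..., 10500
deltas = [510, 976, 1436, 1916, 2336, 2816, 3263, 3726, 4163, 4633, 5223,
          5563, 6053, 6510, 6926, 7393, 7846, 8503, 8820, 9296, 9750]

n = len(rows)
mean_x = sum(rows) / n
mean_y = sum(deltas) / n
slope = (sum(x * y for x, y in zip(rows, deltas)) - n * mean_x * mean_y) / \
        (sum(x * x for x in rows) - n * mean_x ** 2)
intercept = mean_y - slope * mean_x

print(f"slope: {slope:.3f} ms/record")  # ~0.93 ms per record processed
print(f"predicted delta at row 10,000: {slope * 10_000 + intercept:.0f} ms")  # ~9.3 s
```

If each insert really costs about `slope * n` milliseconds at row n, the total time for N rows scales as N squared, which lines up with a run that starts at one second per insert and ends up effectively stalled.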