Concurrent SQL BULK INSERT fails when using the ErrorFile parameter

Question

Windows Server 2008 R2 Enterprise, SQL Server 2008 X64, SP3, Developer Edition

I build and dynamically execute (via sp_executesql) the BULK INSERT command. General form:

BULK INSERT #HeaderRowCheck
from '\\Server\Share\Develop\PKelley\StressTesting\101\DataSet.csv'
with (
    lastrow = 1
   ,rowterminator = '\n'
   ,tablock
   ,maxerrors = 0
   ,errorfile = 'C:\SQL_Packages\TempFiles\#HeaderRowCheck_257626FB-A5CD-41B8-B862-FAF8C591C7A9.log'
)

(The error file name is built from the configured local folder, the table being loaded, and a freshly generated GUID for each bulk insert run. This routine is wrapped in its own stored procedure.)
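A minimal sketch of how such a command might be built and executed follows. The variable names are hypothetical and the folder and file paths are simply the ones from the example above; this is not the actual routine. BULK INSERT takes its file names as literals, which is why the command is assembled as a string and run through sp_executesql.

```sql
-- Sketch only: @ErrorFolder, @TableName and @SourceFile are hypothetical names.
-- Assumes #HeaderRowCheck has already been created in this session.
DECLARE @ErrorFolder nvarchar(260) = N'C:\SQL_Packages\TempFiles\';
DECLARE @TableName   sysname       = N'#HeaderRowCheck';
DECLARE @SourceFile  nvarchar(260) = N'\\Server\Share\Develop\PKelley\StressTesting\101\DataSet.csv';

-- A fresh GUID per run keeps error file names unique across concurrent loads.
DECLARE @ErrorFile nvarchar(400) =
    @ErrorFolder + @TableName + N'_' + CAST(NEWID() AS nvarchar(36)) + N'.log';

-- Double up any single quotes so the paths are safe inside the string literals.
DECLARE @Command nvarchar(max) =
      N'BULK INSERT ' + @TableName
    + N' FROM ''' + REPLACE(@SourceFile, N'''', N'''''') + N''''
    + N' WITH (lastrow = 1, rowterminator = ''\n'', tablock, maxerrors = 0'
    + N', errorfile = ''' + REPLACE(@ErrorFile, N'''', N'''''') + N''')';

EXEC sp_executesql @Command;
```

The temp table is visible inside sp_executesql because it was created in the calling session.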

An outside process (it used to be SQL Agent, now it is a WCF service) launches DTEXEC, which runs an SSIS package, which calls stored procedures in the database that loop through the set of files, building the query and running it for each one. Up to four loads from/to a given database can run at the same time, and several databases on the SQL instance might be doing this simultaneously, although historically volume has been low and, as a rule, only one instance ran at a time. We have done this a lot, and it has worked flawlessly for more than two years now: security is configured properly, files and folders are where they need to be, all the usual. (Lucky? I like to think not.)

Now we anticipate some serious workloads, so we have been stress testing. I launch 8 runs, each with four processes that divide up a set of files and load them one after another (i.e. up to 32 simultaneous bulk inserts. Like I said, stress testing.) Lo and behold, one or more of the runs fail during execution with an error message like:

Error #4861 encountered while loading header information from file "DataSet.csv": Cannot bulk load because the file "C:\SQL_Packages\TempFiles\#HeaderRowCheck_D0070742-76A5-4175-A1A7-16494103EF25.log" could not be opened. Operating system error code 80(The file exists.).

From run to run, the error does not occur on the same file, the same dataset, or at the same point in overall processing.

At first glance, it seems that two processes are trying to access the same error file, which would mean that they independently generated the same GUID (!). My understanding is that this should be all but impossible. An alternative theory is that so much is happening at once (possibly up to 32 simultaneous BULK INSERT commands) that SQL Server and/or the OS is somehow getting confused (I'm a DBA, not a network administrator). I could work around it by building a try-catch block that checks for error 4861 and retries up to three times, but I'd rather avoid such a kludge.

I have since added a routine that logs the error file name (with its GUID) to a table before using it. After many runs and several failures, I see that (a) the file with the GUID named in the error is logged in my table, and (b) no duplicate entries are logged.

Does anyone know what might be going on?
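A minimal sketch of that kind of logging check, assuming a hypothetical table dbo.BulkInsertErrorFileLog (the real table and routine are not shown in the post):

```sql
-- Sketch only: table and column names are hypothetical.
CREATE TABLE dbo.BulkInsertErrorFileLog (
    LogId     int IDENTITY(1,1) PRIMARY KEY,
    ErrorFile nvarchar(400) NOT NULL,
    LoggedAt  datetime      NOT NULL DEFAULT GETDATE()
);

-- Record the generated name before the BULK INSERT uses it.
DECLARE @ErrorFile nvarchar(400) =
    N'C:\SQL_Packages\TempFiles\#HeaderRowCheck_' + CAST(NEWID() AS nvarchar(36)) + N'.log';

INSERT INTO dbo.BulkInsertErrorFileLog (ErrorFile) VALUES (@ErrorFile);

-- After the stress test: any row returned here would mean a name was generated twice.
SELECT ErrorFile, COUNT(*) AS TimesLogged
FROM dbo.BulkInsertErrorFileLog
GROUP BY ErrorFile
HAVING COUNT(*) > 1;
```

An empty result from the last query is what rules out duplicate GUIDs as the cause.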

Philip

guid sql-server operating-system bulkinsert

Philip Kelley Jan 20 '12 at 17:06

1 answer

Philip Kelley · Accepted Answer · 2012-02-03T18:25:05+0000

I opened a case with Microsoft technical support, and after a small amount of back and forth, Pradeep MM of SQL Server technical support figured it out.

The general process: read the list of files in a folder and, one after another, perform a series of bulk inserts on those files (first to read the first row, which we parse for column names, and then to read the data from the second row onward). All the bulk inserts use the "ErrorFile" parameter so that we can give users whatever information we can when their data is incorrectly formatted. The process has been working for 3 years, but during recent stress tests (up to 8 simultaneous runs against one instance of SQL Server, all with properly formatted files), we got the errors listed above.

We initially thought there were problems with GUID generation, because of that "file exists" error, but this idea was ultimately discarded: if the newid() function were not working properly, many more people would be having much more serious problems.

According to Pradeep, this is the step-by-step process of a bulk insert operation:

1. The BULK INSERT command is submitted and parsed for syntax errors.
2. The BULK INSERT command is then compiled to generate an execution plan.
3. During the compilation phase, if ERRORFILE was specified in the query, we create ErrorFile.log and ErrorFile.Error.Txt in the specified folder. (The important thing to understand here is that the files are 0 KB in size.)
4. Once the files are created, we delete both of them using Windows API calls.
5. Once the execution plan is ready, we move to the execution phase and try to run the BULK INSERT command. As part of this, we re-create ErrorFile.log and ErrorFile.Error.Txt in the specified folder. (Per the Books Online documentation, ErrorFile.log and ErrorFile.Error.Txt must not already be present in that location, or the execution fails: http://msdn.microsoft.com/en-us/library/ms188365.aspx)
6. Once execution completes, any errors from the bulk insert are written to the error files; if there are no errors, both files are deleted.

Running ProcMon (Process Monitor) during failed runs showed that the ErrorFile was successfully created and opened in step 3, but was NOT closed in step 4, and as a result step 5 generated the error we saw. (For successful runs, the file was created and closed as expected.)

Further analysis of the ProcMon trace showed that a separate process, running CMD.EXE, issued the "close handle" operation on the file after the bulk insert attempt. We use a routine that calls xp_cmdshell to retrieve the list of files to be processed, and that is what launches the CMD.EXE process. Here's the kicker:

... there is some business logic that runs CMD.EXE from inside SQL Server, and since CMD.EXE is a child process, it inherits all the handles opened by the parent process (so this is probably a timing issue in which CMD.EXE holds the handles of the files that were open when it was launched; any file whose handle has been inherited by CMD.EXE cannot be deleted, and the handle is released only when CMD.EXE exits)

And that's that. A single run never hits this issue, since its xp_cmdshell call completes before its bulk inserts start. But with parallel runs, especially many parallel runs (I only hit the problem with 5 or more going), there was a timing problem, like so:

1. One of the SSIS packages executes and calls a stored procedure that internally uses XP_CMDSHELL, which launches CMD.EXE to enumerate the files.
2. The same connection to SQL Server completes the file enumeration and then starts the bulk load operation, entering the compilation phase for the BULK INSERT command.
3. As per the design of BULK INSERT, we create the ErrorFile during the compilation phase and delete it once the compilation phase is completed.
4. At the same moment, another SSIS package starts and calls a stored procedure that internally uses XP_CMDSHELL, which launches CMD.EXE to enumerate the files.
5. CMD.EXE is a child process launched under the parent process SQLServr.exe, so by default it inherits all the handles created by SQLServr.exe (and thus this process gets the handles for the ERRORFILE created by the BULK INSERT on the first connection).
6. Now, on the first connection, the compilation phase is completed and so we try to delete the file, for which all of its handles must be closed. CMD.EXE is holding a handle to the file and it is still open, so we cannot delete the file. Without having deleted the file, we move to the execution phase, where we try to create a new ERRORFILE with the same name. Since the file already exists, we fail with the error "Operating system error code 80(The file exists.).".

My short-term workaround was: (1) implement a retry loop that generates a new ErrorFile name and retries the bulk insert up to three times before giving up, and (2) add another routine to our nightly processes to delete all the files found in our "ErrorFile" folder.

The long-term fix is to revise our code so that we don’t list files through xp_cmdshell. This seems doable, since the entire ETL process is wrapped in and managed by an SSIS package; alternatively, CLR routines could be built and called. For now, given our expected workload, the workaround is sufficient (especially considering everything else that needs working on just now), so it may be a while before we get to the final fix.

Posted for posterity, in case this ever happens to you!
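The retry loop in (1) might look roughly like the sketch below. It is an illustration under stated assumptions, not the production code: the variable names are hypothetical, the paths are the ones from the question, and #HeaderRowCheck is assumed to exist already in the session. Each attempt gets a new GUID and so a new error file name, and only error 4861 is retried.

```sql
-- Sketch only: names and paths are illustrative assumptions.
DECLARE @ErrorFolder nvarchar(260) = N'C:\SQL_Packages\TempFiles\';
DECLARE @SourceFile  nvarchar(260) = N'\\Server\Share\Develop\PKelley\StressTesting\101\DataSet.csv';
DECLARE @Attempt int = 1, @MaxAttempts int = 3;
DECLARE @Command nvarchar(max), @ErrNum int, @ErrMsg nvarchar(2048);

WHILE @Attempt <= @MaxAttempts
BEGIN
    -- New GUID, and therefore a new error file name, on every attempt.
    SET @Command =
          N'BULK INSERT #HeaderRowCheck FROM ''' + @SourceFile + N''''
        + N' WITH (lastrow = 1, rowterminator = ''\n'', tablock, maxerrors = 0'
        + N', errorfile = ''' + @ErrorFolder + N'#HeaderRowCheck_'
        + CAST(NEWID() AS nvarchar(36)) + N'.log'')';

    BEGIN TRY
        EXEC sp_executesql @Command;
        BREAK;  -- loaded successfully, stop retrying
    END TRY
    BEGIN CATCH
        SELECT @ErrNum = ERROR_NUMBER(), @ErrMsg = ERROR_MESSAGE();

        -- Give up on any other error, or once the last attempt has failed.
        IF @ErrNum <> 4861 OR @Attempt = @MaxAttempts
        BEGIN
            RAISERROR(N'Bulk insert failed (error %d): %s', 16, 1, @ErrNum, @ErrMsg);
            BREAK;
        END

        SET @Attempt = @Attempt + 1;
    END CATCH
END
```

Failed attempts leave their 0 KB error files behind, which is why the nightly cleanup in (2) is still needed.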
