How to check a CSV file before importing to a database using SSIS?

I have a CSV file with three columns.

    sno  sname  quantity
    ---  -----  --------
    1    aaa    23
    2    bbb    null
    3    ccc    34
    4    ddd    ddd
    5    eee    xxx
    6    fff    87

The table in the SQL Server database is as follows:

    CREATE TABLE csvtable
    (
        sno      int
      , sname    varchar(100)
      , quantity numeric(5,2)
    )

I created an SSIS package to import the CSV file data into the database table. The package fails at runtime because some quantity values are strings rather than numbers. I created another table to store the invalid rows.

    CREATE TABLE wrongcsvtable
    (
        sno      nvarchar(10)
      , sname    nvarchar(100)
      , quantity nvarchar(100)
    )

In csvtable, I would like to save the following data.

    sno  sname  quantity
    ---  -----  --------
    1    aaa    23
    3    ccc    34
    6    fff    87

In wrongcsvtable, I would like to save the following data.

    sno  sname  quantity
    ---  -----  --------
    2    bbb    null
    4    ddd    ddd
    5    eee    xxx

Can someone point me in the right direction to achieve the above output?

2 answers

Here is one option. You can achieve this by using the Data Conversion transformation within a Data Flow Task. The following example shows how. It uses SSIS 2005 with a SQL Server 2008 database.

Step by step:

  • Create a file called FlatFile.CSV and fill it with data, as shown in screenshot # 1 .

  • In the SQL database, create two tables named dbo.CSVCorrect and dbo.CSVWrong using the scripts specified in the SQL Scripts section. Fields in the dbo.CSVWrong table must have the data types VARCHAR or NVARCHAR or CHAR so that they can accept invalid entries.

  • In the SSIS package, create an OLE DB connection named SQLServer to connect to the SQL Server database, and create a Flat File connection named CSV. See screenshot # 2 . Configure the Flat File connection CSV as shown in screenshots # 3 - # 7 . All columns in the Flat File connection must be configured as a string data type so that the package does not fail while reading the file.

  • On the “Control Flow” tab of the package, place the Data Flow Task , as shown in screenshot # 8 .

  • On the Data Flow tab of the package, place a Flat File Source and configure it as shown in screenshots # 9 and # 10 .

  • On the Data Flow tab of the package, place a Data Conversion transformation and configure it as shown in screenshot # 11 . Click Configure Error Output and change the Error and Truncation column values from Fail component to Redirect row . See screenshot # 12 .

  • On the Data Flow tab, place an OLE DB Destination and connect the green arrow from the Data Conversion transformation to this OLE DB Destination. Configure the OLE DB Destination as shown in screenshots # 13 and # 14 .

  • On the Data Flow tab, place another OLE DB Destination and connect the red arrow from the Data Conversion transformation to this OLE DB Destination. Configure the OLE DB Destination as shown in screenshots # 15 and # 16 .

  • Screenshot # 17 shows the data flow task after it is fully configured.

  • Screenshot # 18 shows the data in the tables before executing the package.

  • Screenshot # 19 shows the package execution within the Data Flow Task.

  • Screenshot # 20 shows the data in the tables after the package execution.
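The routing that the steps above configure can also be sketched outside SSIS. The Python below is only an illustration of the logic, not part of the package: every column is read as a string, a conversion is attempted per row, and rows that fail are redirected with their original string values. The comma-separated layout of the sample and the names SAMPLE and split_rows are assumptions made for this sketch.

```python
import csv
import io
from decimal import Decimal, InvalidOperation

# Sample rows from the question, assumed here to be comma-separated.
SAMPLE = """sno,sname,quantity
1,aaa,23
2,bbb,null
3,ccc,34
4,ddd,ddd
5,eee,xxx
6,fff,87
"""


def split_rows(text):
    """Read all columns as strings, then attempt the conversion row by row."""
    correct, wrong = [], []
    for row in csv.DictReader(io.StringIO(text)):
        try:
            quantity = Decimal(row["quantity"])
            if not quantity.is_finite():
                # Decimal accepts "NaN" and "Infinity"; treat them as invalid.
                raise ValueError("quantity is not a finite number")
            converted = {
                "sno": int(row["sno"]),
                "sname": row["sname"],
                "quantity": quantity,
            }
        except (ValueError, InvalidOperation):
            # Conversion failed: keep the original strings,
            # similar to the red error output with Redirect row.
            wrong.append(row)
        else:
            # Conversion succeeded: similar to the green output.
            correct.append(converted)
    return correct, wrong


correct_rows, wrong_rows = split_rows(SAMPLE)
```

With the sample data, correct_rows holds rows 1, 3 and 6 with typed values, and wrong_rows holds rows 2, 4 and 5 as plain strings, which is why the error table needs string columns.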

Hope this helps.

SQL scripts:

    CREATE TABLE [dbo].[CSVCorrect](
        [Id] [int] IDENTITY(1,1) NOT NULL,
        [SNo] [int] NULL,
        [SName] [varchar](50) NULL,
        [QuantityNumeric] [numeric](18, 0) NULL,
    CONSTRAINT [PK_CSVCorrect] PRIMARY KEY CLUSTERED ([Id] ASC)) ON [PRIMARY]
    GO

    CREATE TABLE [dbo].[CSVWrong](
        [Id] [int] IDENTITY(1,1) NOT NULL,
        [SNo] [varchar](50) NULL,
        [Quantity] [varchar](50) NULL,
        [SName] [varchar](50) NULL,
        [ErrorCode] [int] NULL,
        [ErrorColumn] [int] NULL,
    CONSTRAINT [PK_CSVWrong] PRIMARY KEY CLUSTERED ([Id] ASC)) ON [PRIMARY]
    GO

Screenshots # 1 - # 20: [images not preserved]


Put a Conditional Split in the data flow. Add a condition that checks the quantity value; the output created for that condition goes to wrongcsvtable , and the default output goes to csvtable .

EDIT: I forgot that there is no numeric test in the Conditional Split. What you have to do is add a Derived Column transformation that converts the quantity field to an integer. In the Configure Error Output dialog box, set the Error and Truncation values to Ignore failure. This passes the row through with the new field set to NULL if the data is not numeric. After that, in the Conditional Split, check whether the new field is NULL. Rows where the new field is NULL go to wrongcsvtable , and all other rows go to csvtable .
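The two-stage idea in this answer (convert with failures ignored, then split on NULL) can be illustrated with a short Python sketch. This is not SSIS code, and the helper name to_decimal_or_none is hypothetical. Decimal is used instead of an integer here as an assumption, since the target column in the question is numeric(5,2).

```python
from decimal import Decimal, InvalidOperation


def to_decimal_or_none(text):
    """Mimic a conversion with Ignore failure: bad input yields None, not an error."""
    try:
        value = Decimal(text)
    except (InvalidOperation, TypeError):
        return None
    # Decimal also parses "NaN" and "Infinity"; treat those as non-numeric.
    return value if value.is_finite() else None


# Rows from the question, with every field still a string.
rows = [
    ("1", "aaa", "23"),
    ("2", "bbb", "null"),
    ("3", "ccc", "34"),
    ("4", "ddd", "ddd"),
    ("5", "eee", "xxx"),
    ("6", "fff", "87"),
]

# Derived column step: append the converted value as a new field.
derived = [(sno, sname, qty, to_decimal_or_none(qty)) for sno, sname, qty in rows]

# Conditional split step: a None in the new field marks an invalid row.
wrong = [r[:3] for r in derived if r[3] is None]
good = [(r[0], r[1], r[3]) for r in derived if r[3] is not None]
```

One consequence of this design is that a quantity that is genuinely missing and one that is non-numeric both end up as NULL in the new field, so both are routed to the wrong table.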


Source: https://habr.com/ru/post/891268/

