SSIS task for inconsistent column imports?

Question

Problem

I regularly receive feed files from different vendors. Although the column names are consistent, the problem occurs when some vendors send text files with more or fewer columns in their feed file.

In addition, the order of the columns in these files is not consistent.

Apart from the Dynamic Data Flow Task provided by Cozy Roc, is there another way I could import these files? I am not a C# guru, but I am leaning toward using a Script Task in the control flow or a Script Component in the data flow.

Any suggestions, samples or direction will be appreciated.

http://www.cozyroc.com/ssis/data-flow-task

Some forums

http://www.sqlservercentral.com/Forums/Topic525799-148-1.aspx#bm526400

http://www.bidn.com/forums/microsoft-business-intelligence/integration-services/26/dynamic-data-flow

+6

c # sql-server ssis

Kip real Nov 17 '11 at 14:15

2 answers

Our solution: we use parent and child packages. In the parent package, we take the individual client files and convert them to files in our standard format, and then call the child package to handle the standard import using the file we created. This only works if the client is consistent in what they send; if they try to change their format from what they agreed to send us, we return the file.
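The conversion step in the parent package can be sketched roughly as follows. This is an illustration in Python rather than SSIS, and the standard column list and the function name are assumptions made for the example, not the answerer's actual layout.

```python
import csv
import io

# Hypothetical standard layout; the real list is whatever the child package expects.
STANDARD_COLUMNS = ["col1", "col2", "col3", "col4", "col5"]


def to_standard_format(client_text, columns=STANDARD_COLUMNS):
    """Rewrite a client CSV into the standard column order.

    Extra columns are dropped. A missing agreed column raises ValueError,
    which corresponds to returning the file to the client.
    """
    reader = csv.DictReader(io.StringIO(client_text))
    header = reader.fieldnames or []
    missing = [name for name in columns if name not in header]
    if missing:
        raise ValueError("file rejected, missing columns: " + ", ".join(missing))

    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=columns, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        writer.writerow(row)  # columns are written in the standard order
    return out.getvalue()
```

The child package then only ever sees one layout, so its metadata never changes.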

+4

Hlgem Nov 17 '11 at 19:49

billinkc · Accepted Answer · 2011-11-17T19:34:51+0000

Off the top of my head, I have a 50% solution for you.

Problem

SSIS really cares about metadata, so variations in it tend to result in exceptions. DTS was far more forgiving in this sense. That strong need for consistent metadata makes the Flat File Source troublesome for files like these.

Query Based Solution

If the problem is the component, let's not use it. What I like about this approach is that conceptually it is the same as querying a table: the order of the columns doesn't matter, and neither does the presence of extra columns.

Variables

I created 3 variables, all of type String: CurrentFileName, InputFolder and Query.

InputFolder is hard-wired to the source folder. In my example, this is C:\ssisdata\Kipreal
CurrentFileName is the name of the file. At design time it was input5columns.csv, but that will change at run time.
Query is the expression "SELECT col1, col2, col3, col4, col5 FROM " + @[User::CurrentFileName]

Connection manager

Set up a connection to the input file using the JET OLE DB driver. After creating it as described in the linked article, I renamed it to FileOLEDB and set an expression on its ConnectionString property: "Data Source=" + @[User::InputFolder] + ";Provider=Microsoft.Jet.OLEDB.4.0;Extended Properties=\"text;HDR=Yes;FMT=CSVDelimited;\";"
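To see what that expression evaluates to at run time, here is a small Python sketch that performs the same string concatenation. The function name is made up for the illustration; the folder is the example value of InputFolder given above.

```python
def connection_string(input_folder):
    """Mirror the SSIS expression set on FileOLEDB's ConnectionString."""
    return (
        "Data Source=" + input_folder
        + ";Provider=Microsoft.Jet.OLEDB.4.0"
        + ';Extended Properties="text;HDR=Yes;FMT=CSVDelimited;";'
    )


# The Data Source is the folder, not a file; each file acts as a table in it.
print(connection_string(r"C:\ssisdata\Kipreal"))
```

Because the Data Source points at the folder, the query only needs the bare file name in its FROM clause.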

Control flow

My Control Flow is a Data Flow Task nested inside a Foreach Loop container that uses the Foreach File Enumerator.

Foreach File Enumerator

My Foreach File Enumerator is configured to operate on files. I put an expression on the Directory property so that it uses @[User::InputFolder]. Note that if the folder ever needs to change, changing this one variable updates both the connection manager and the file enumerator. Under "Retrieve file name", select "Name and extension" instead of the default "Fully qualified".

On the Variable Mappings tab, assign the value to our variable @[User::CurrentFileName].

At this point, each iteration of the loop changes @[User::Query] to reflect the current file name.
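As a rough picture of what the loop does, the following Python sketch walks a folder and produces the file name and query for each iteration. The `*.csv` mask and the function name are assumptions for the example; the answer does not say which file mask the enumerator uses.

```python
from pathlib import Path


def iterate_files(input_folder, pattern="*.csv"):
    """Yield (CurrentFileName, Query) once per matching file."""
    for path in sorted(Path(input_folder).glob(pattern)):
        if path.is_file():
            # "Name and extension": keep only the last path component,
            # because the folder is already part of the connection string.
            current_file_name = path.name
            query = "SELECT col1, col2, col3, col4, col5 FROM " + current_file_name
            yield current_file_name, query
```

Had the enumerator returned the fully qualified path, the folder would end up in the FROM clause as well as in the connection string.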

Data Flow

This is actually the easiest piece. Use an OLE DB Source and configure it as follows.

Use the FileOLEDB connection manager and change the data access mode to "SQL command from variable". Select the variable @[User::Query] there, click OK, and you are ready to go.

Sample data

I created two sample files, input5columns.csv and input7columns.csv. All the columns from the 5-column file are in the 7-column file, but the latter has them in a different order (col2 is at zero-based ordinal position 2 in the first file and 6 in the second). I negated all the values in the 7-column file to make it obvious which file is being processed.

    col1,col3,col2,col5,col4
    1,3,2,5,4
    1111,3333,2222,5555,4444
    11,33,22,55,44
    111,333,222,555,444

and

    col1,col3,col7,col5,col4,col6,col2
    -1111,-3333,-7777,-5555,-4444,-6666,-2222
    -111,-333,-777,-555,-444,-666,-222
    -1,-3,-7,-5,-4,-6,-2
    -11,-33,-77,-55,-44,-666,-222

Running the package results in these two screenshots.
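The effect of selecting by column name can be approximated outside SSIS. The sketch below uses Python's csv module, not the Jet driver, so every value comes back as a string and no type inference takes place; it only shows that both sample files yield the same five columns in the same order.

```python
import csv
import io

WANTED = ["col1", "col2", "col3", "col4", "col5"]

FILE5 = """col1,col3,col2,col5,col4
1,3,2,5,4
1111,3333,2222,5555,4444
11,33,22,55,44
111,333,222,555,444
"""

FILE7 = """col1,col3,col7,col5,col4,col6,col2
-1111,-3333,-7777,-5555,-4444,-6666,-2222
-111,-333,-777,-555,-444,-666,-222
-1,-3,-7,-5,-4,-6,-2
-11,-33,-77,-55,-44,-666,-222
"""


def select_by_name(text, wanted=WANTED):
    """Return each data row as a list of values in the order of `wanted`."""
    reader = csv.DictReader(io.StringIO(text))
    return [[row[name] for name in wanted] for row in reader]


for text in (FILE5, FILE7):
    for row in select_by_name(text):
        print(row)
```

The extra columns col6 and col7 in the second file are simply never read.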

What is missing

I do not know of a way to tell the query-based approach that it is OK if a column does not exist. If there is a unique key, I suppose you could define your query to have only the columns that must be there, and then perform lookups against the file to try to obtain the columns that ought to be there, without failing the lookup if a column does not exist. Pretty kludgey though.
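One untested idea for that gap, which is not part of the answer above, is to read the header line of each file first and build the query text from it, substituting a NULL placeholder for any expected column that is absent. The Python sketch below only shows the string construction; the function name is hypothetical. In a package this logic would have to live in something like a Script Task that assigns the result to User::Query, and whether the Jet text driver and the OLE DB Source accept such a query with stable data types is an assumption.

```python
import csv
import io

EXPECTED = ["col1", "col2", "col3", "col4", "col5"]


def build_query(header_line, file_name, expected=EXPECTED):
    """Build a SELECT with a fixed column list.

    Columns missing from the header become "NULL AS <name>" so that the
    shape of the result set stays the same for every file.
    """
    header = next(csv.reader(io.StringIO(header_line)), [])
    present = {name.strip() for name in header}
    select_list = [
        name if name in present else "NULL AS " + name for name in expected
    ]
    return "SELECT " + ", ".join(select_list) + " FROM " + file_name
```

For a file whose header is only col1,col3,col5, this produces a query that still returns five columns, two of them empty.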
