SSIS task for inconsistent column imports?

Problem.

I regularly receive feed files from different vendors. Although the column names are consistent, the problem is that some vendors send text files with more or fewer columns in their feed file.

In addition, the arrangement (column order) of these files is inconsistent.

Other than the Dynamic Data Flow Task provided by CozyRoc, is there another way I could import these files? I am not a C# guru, but I am leaning toward using a Script Task in the control flow or a Script Component in the data flow.

Any suggestions, samples, or direction would be appreciated.

http://www.cozyroc.com/ssis/data-flow-task

Some forums

http://www.sqlservercentral.com/Forums/Topic525799-148-1.aspx#bm526400

http://www.bidn.com/forums/microsoft-business-intelligence/integration-services/26/dynamic-data-flow

2 answers

Off the top of my head, I have a 50% solution for you.

Problem

SSIS really cares about metadata, so variations in it tend to result in exceptions. DTS was far more forgiving in this sense. That strong need for consistent metadata is what makes the Flat File Source troublesome here.

Query Based Solution

If the problem is the component, do not use it. What I like about this approach is that conceptually it is the same as querying a table: the order of the columns does not matter, and neither does the presence of extra columns.

Variables

I created 3 variables, all of type String: CurrentFileName, InputFolder and Query.

  • InputFolder is hard-wired to the source folder. In my example, this is C:\ssisdata\Kipreal
  • CurrentFileName is the name of the file. At design time, this was input5columns.csv , but that will change at runtime.
  • Query is the expression "SELECT col1, col2, col3, col4, col5 FROM " + @[User::CurrentFileName]

variables window

Connection manager

Set up a connection to the input file using the JET OLEDB driver. After creating it as described in the linked article, I renamed it to FileOLEDB and set an expression on its ConnectionString property:

 "Data Source=" + @[User::InputFolder] + ";Provider=Microsoft.Jet.OLEDB.4.0;Extended Properties=\"text;HDR=Yes;FMT=CSVDelimited;\";"
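To make the expression concrete, here is a small sketch of the string it evaluates to for the example folder. Python is used purely for illustration (SSIS evaluates the expression itself), and the helper name `jet_text_connection_string` is hypothetical. Note that the Data Source is the folder, not a file: the Jet text driver treats the folder as the database and each file in it as a table.

```python
def jet_text_connection_string(input_folder: str) -> str:
    """Mirror the SSIS expression: concatenate the folder into the Jet string."""
    return (
        "Data Source=" + input_folder
        + ";Provider=Microsoft.Jet.OLEDB.4.0"
        + ';Extended Properties="text;HDR=Yes;FMT=CSVDelimited;";'
    )


if __name__ == "__main__":
    # Folder taken from the InputFolder variable in the example above.
    print(jet_text_connection_string(r"C:\ssisdata\Kipreal"))
```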

Control flow

My control flow is a Data Flow Task nested inside a Foreach Loop Container that uses the Foreach File Enumerator.

control flow

Foreach File Enumerator

My Foreach File Enumerator is configured to operate on files. I put an expression on the Directory property so that it is set to @[User::InputFolder]. Note that at this point, if the value of that folder needs to change, it will be updated correctly in both the connection manager and the file enumerator. Under "Retrieve file name", instead of the default "Fully qualified", choose "Name and extension".

Foreach File Enumerator - Collection tab

On the Variable Mappings tab, assign the value to our @[User::CurrentFileName] variable.

Foreach File Enumerator - Variable Mappings tab

At this point, each iteration of the loop changes @[User::CurrentFileName], and the expression on @[User::Query] is re-evaluated to reflect the current file name.
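The effect of the loop can be sketched outside SSIS as follows. This is only an illustration in Python of what the enumerator and the Query expression do together; the function names are hypothetical and nothing here is SSIS API.

```python
import os
import tempfile

COLUMNS = ["col1", "col2", "col3", "col4", "col5"]


def build_query(current_file_name: str) -> str:
    """Mirror the expression on the Query variable."""
    return "SELECT " + ", ".join(COLUMNS) + " FROM " + current_file_name


def queries_for_folder(input_folder: str) -> list[str]:
    """One query per file, using name and extension only (no folder path)."""
    names = sorted(
        name
        for name in os.listdir(input_folder)
        if os.path.isfile(os.path.join(input_folder, name))
    )
    return [build_query(name) for name in names]


if __name__ == "__main__":
    # Stand-in for InputFolder: a temporary folder holding two empty files.
    with tempfile.TemporaryDirectory() as folder:
        for name in ("input5columns.csv", "input7columns.csv"):
            open(os.path.join(folder, name), "w").close()
        for query in queries_for_folder(folder):
            print(query)
```

The file name carries no path because the folder is already supplied through the connection string, which is why "Name and extension" is selected in the enumerator.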

Data Flow

This is actually the easiest piece. Use an OLE DB Source and wire it up as shown.

Data flow

Use the FileOLEDB connection manager and change the data access mode to "SQL command from variable". Use the @[User::Query] variable there, click OK, and you are ready to go.

OLE DB file source

Sample data

I created two sample files, input5columns.csv and input7columns.csv. All the columns of the 5-column file are in the 7-column file, but the 7-column file has them in a different order (col2 is at zero-based ordinal position 2 in the first file and 6 in the second). I negated all the values in the 7-column file to make it readily apparent which file is being processed.

 col1,col3,col2,col5,col4
 1,3,2,5,4
 1111,3333,2222,5555,4444
 11,33,22,55,44
 111,333,222,555,444

and

 col1,col3,col7,col5,col4,col6,col2
 -1111,-3333,-7777,-5555,-4444,-6666,-2222
 -111,-333,-777,-555,-444,-666,-222
 -1,-3,-7,-5,-4,-6,-2
 -11,-33,-77,-55,-44,-666,-222

Running the package produces the results shown in these two screenshots.

5 column file

7 column file
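The reason the query handles both files is that columns are picked by name, not by position. The same idea can be shown outside SSIS with a short Python sketch using the first rows of the two sample files. This is not what the package runs; it only illustrates by-name selection, and `select_by_name` is a hypothetical helper.

```python
import csv
import io

WANTED = ["col1", "col2", "col3", "col4", "col5"]

FIVE = """col1,col3,col2,col5,col4
1,3,2,5,4
1111,3333,2222,5555,4444
"""

SEVEN = """col1,col3,col7,col5,col4,col6,col2
-1111,-3333,-7777,-5555,-4444,-6666,-2222
-111,-333,-777,-555,-444,-666,-222
"""


def select_by_name(text: str, wanted=WANTED) -> list[list[str]]:
    """Return the wanted columns in a fixed order, ignoring any extras."""
    reader = csv.DictReader(io.StringIO(text))
    return [[row[name] for name in wanted] for row in reader]


if __name__ == "__main__":
    print(select_by_name(FIVE)[0])   # ['1', '2', '3', '4', '5']
    print(select_by_name(SEVEN)[0])  # ['-1111', '-2222', '-3333', '-4444', '-5555']
```

Like the query, this sketch fails (here with a `KeyError`) if one of the wanted columns is absent from a file.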

What is missing

I do not know of a way to tell the query-based approach that it is OK if a column does not exist. If there is a unique key, I suppose you could define your query to include only the columns that must be there, then perform lookups against the file to try to obtain the columns that ought to be there, and not fail the lookup if a column does not exist. Pretty kludgey, though.
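One possible direction, which is not part of the original answer and is untested against SSIS, would be to read the header line of each file first (for example in a Script Task) and build the query so that absent columns are replaced by NULL placeholders. The Python sketch below only shows the string-building logic. The function name and the expected column list are hypothetical, and whether the Jet text driver and the OLE DB Source accept the resulting query with stable metadata is an assumption.

```python
import csv
import io

# Hypothetical full list of columns the destination expects.
EXPECTED = ["col1", "col2", "col3", "col4", "col5", "col6", "col7"]


def build_tolerant_query(header_line: str, file_name: str, expected=EXPECTED) -> str:
    """Select present columns by name; emit NULL AS <name> for absent ones."""
    present = set(next(csv.reader(io.StringIO(header_line))))
    select_list = [
        name if name in present else "NULL AS " + name
        for name in expected
    ]
    return "SELECT " + ", ".join(select_list) + " FROM " + file_name


if __name__ == "__main__":
    print(build_tolerant_query("col1,col3,col2,col5,col4", "input5columns.csv"))
```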


Our solution: we use parent-child packages. In the parent package, we take the individual client files and transform them into our standard-format files, then call the child package to process the standard import using the file we created. This only works if the client is consistent in what they send; if they try to change their format from what they agreed to send us, we return the file.
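The normalization step done by the parent package can be sketched as follows. This is a Python illustration of the idea, not the answerer's actual package: the standard column list is borrowed from the first answer's example, and the names `to_standard_format` and `FormatRejected` are hypothetical.

```python
import csv
import io

# Assumed standard layout; the second answer does not state its real columns.
STANDARD_COLUMNS = ["col1", "col2", "col3", "col4", "col5"]


class FormatRejected(Exception):
    """Raised when a client file lacks a column from the agreed format."""


def to_standard_format(client_text: str) -> str:
    """Rewrite a client CSV into the standard column set and order."""
    reader = csv.DictReader(io.StringIO(client_text))
    header = reader.fieldnames or []
    missing = [name for name in STANDARD_COLUMNS if name not in header]
    if missing:
        # Corresponds to returning the file to the client.
        raise FormatRejected("missing columns: " + ", ".join(missing))
    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=STANDARD_COLUMNS,
        extrasaction="ignore",  # drop columns outside the standard format
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(reader)
    return out.getvalue()


if __name__ == "__main__":
    sample = "col1,col3,col7,col5,col4,col6,col2\n-1,-3,-7,-5,-4,-6,-2\n"
    print(to_standard_format(sample))
```

The child package then only ever sees files with one fixed layout, so its metadata never changes.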


Source: https://habr.com/ru/post/901717/

