How to remove duplicate rows from an SSIS data flow, but write the duplicates out

I learned from "Deleting Duplicates in an SSIS Data Stream" how to use the Sort transformation to remove rows with duplicate data values.

In my case, I am reading a delimited file. I need to exclude the duplicates and write out the rows that have duplicate keys to another delimited file, which is sent back to the client so that they can correct the data and try again.

I can't figure out how to do this. I will experiment with Aggregate and Merge Join, but I am hoping there is a well-known pattern for this.

+4
3 answers

My answer works with any data. Some solutions on the Internet require the rows to have a primary key; mine does not. Here is the sample structure and sample data:

a   b
1   23
1   23
16  59
12  12
13  45
12  12
45  56


Group by all columns and add one last column that counts the rows. In the Aggregate transformation, set the operation to "Group by" for every column, however many there are, and put "Count all" at the end.


Then add a Conditional Split and route all the rows whose count is greater than one to a separate output.
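The same logic can be sketched outside SSIS to make the two steps concrete. This is only an illustration in Python, not SSIS code: `split_by_count` is a hypothetical helper name, and the rows are the sample data from this answer. The `Counter` plays the role of the Aggregate step (group by every column, count all), and the two list comprehensions play the role of the Conditional Split.

```python
from collections import Counter

# Sample data from this answer: columns a and b.
rows = [(1, 23), (1, 23), (16, 59), (12, 12), (13, 45), (12, 12), (45, 56)]

def split_by_count(rows):
    """Group by all columns, count each group, then split on count > 1."""
    # Aggregate step: one entry per distinct row, with the number of occurrences.
    counts = Counter(rows)
    # Conditional Split step: groups seen more than once vs. exactly once.
    duplicated = [row for row, n in counts.items() if n > 1]
    unique = [row for row, n in counts.items() if n == 1]
    return duplicated, unique

duplicated, unique = split_by_count(rows)
print(duplicated)  # [(1, 23), (12, 12)]
print(unique)      # [(16, 59), (13, 45), (45, 56)]
```

Note that grouping by all columns finds rows that are identical in every column. To find rows that only share a key, the grouping would have to be restricted to the key columns.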



+6

On my blog I review a few options for removing duplicates from the data stream, with a small footnote on how to "save" the duplicate rows for alternative processing.

+2

Perhaps this is possible with scripts.

First, you use a script to iterate over the dataset and programmatically identify the duplicates. You can then write entries to a log file for the duplicates you found.
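A minimal sketch of that idea, written in Python rather than as an SSIS Script Component. The function name `split_duplicate_keys`, the sample input, and the choice of the first column as the key are all assumptions made for illustration. It also assumes that every row sharing a key should go to the reject file, not only the second and later occurrences.

```python
import csv
import io
from collections import Counter

def split_duplicate_keys(src, good_out, dup_out, key_index=0, delimiter=","):
    """Send rows whose key occurs once to good_out and all other rows to dup_out."""
    # Read everything first so each key can be counted before any row is routed.
    rows = [row for row in csv.reader(src, delimiter=delimiter) if row]
    key_counts = Counter(row[key_index] for row in rows)
    good_writer = csv.writer(good_out, delimiter=delimiter, lineterminator="\n")
    dup_writer = csv.writer(dup_out, delimiter=delimiter, lineterminator="\n")
    for row in rows:
        if key_counts[row[key_index]] > 1:
            dup_writer.writerow(row)   # goes back to the client for correction
        else:
            good_writer.writerow(row)  # safe to load

# Hypothetical input; the first column is assumed to be the key.
src = io.StringIO("1,apple\n2,pear\n1,plum\n3,fig\n")
good, dups = io.StringIO(), io.StringIO()
split_duplicate_keys(src, good, dups)
print(good.getvalue())  # rows with keys 2 and 3
print(dups.getvalue())  # both rows with key 1
```

Because the whole input is held in memory to count the keys, this suits files of moderate size. For real files, the `io.StringIO` objects would be replaced with files opened using `newline=""`.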

0

Source: https://habr.com/ru/post/1432788/

