SSIS data flow: how to remove duplicate rows, but write the duplicates out

Question


I learned from "Deleting Duplicates in an SSIS Data Stream" how to use the Sort transformation to remove rows with duplicate sort values.

In my case, I am reading a delimited file. I need to exclude the duplicates, but I also need to capture the rows that have duplicate keys, write them to a separate delimited file, and send that file back to the client so that they can correct the data and try again.

I can't figure out how to do this. I will experiment with Aggregate and Merge Join, but I am hoping there is a well-known pattern for this.

+4

duplicate-removal duplicates ssis

John Saunders Sep 06 '12 at 18:30

3 answers

On my blog I review a few options for removing duplicates from the data flow, with a small footnote on how to "save" the duplicate rows for alternative processing.

+2

Todd McDermid Sep 08 '12 at 16:10


This may be possible with a script.

First, use a script to iterate over the data set and programmatically identify the duplicates. You can then write an entry to a log file for each duplicate you found.
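
The idea above can be sketched outside SSIS. This is a minimal Python illustration of the logic only, not an SSIS Script Component (which would be written in C# or VB.NET). The pipe delimiter, the `id` key column, the sample rows, and the helper name `split_duplicates` are all assumptions made for the example. It also assumes that every row sharing a duplicated key is rejected, rather than keeping one copy.

```python
import csv
import io
from collections import Counter


def split_duplicates(rows, key_columns):
    """Split rows into (unique, duplicates) by the values in key_columns."""
    rows = list(rows)

    def key_of(row):
        return tuple(row[col] for col in key_columns)

    # First pass: count how often each key occurs.
    counts = Counter(key_of(row) for row in rows)
    # Second pass: route each row by the count of its key.
    unique = [row for row in rows if counts[key_of(row)] == 1]
    duplicates = [row for row in rows if counts[key_of(row)] > 1]
    return unique, duplicates


# Hypothetical pipe-delimited input; "id" is the assumed key column.
sample = "id|name\n1|Ann\n2|Bob\n1|Anne\n3|Cy\n"
reader = csv.DictReader(io.StringIO(sample), delimiter="|")
unique, duplicates = split_duplicates(reader, ["id"])

# Write the rejected rows in the same delimited layout for the client.
reject_file = io.StringIO()
writer = csv.DictWriter(
    reject_file, fieldnames=reader.fieldnames, delimiter="|", lineterminator="\n"
)
writer.writeheader()
writer.writerows(duplicates)
reject_text = reject_file.getvalue()
```

Two passes are needed because a row cannot be classified as a duplicate until the whole input has been seen. In a data flow, that makes this kind of logic a blocking step.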

0

dev_etter Sep 06 '12 at 20:55


Justin · Accepted Answer · 2012-09-06T21:05:50+0000

Hi, my answer works with any data. Some solutions on the Internet require the rows to have a primary key, but mine does not. Here is the sample structure and sample data:

a   b
1   23
1   23
16  59
12  12
13  45
12  12
45  56

Just group by all columns and add a final column that counts everything. If there are more than two columns, put all of them in the Aggregate transformation, set each one to "Group by", and put "Count all" at the end.

Then just add a Conditional Split and take all the rows where the count is greater than one.
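
To make the two steps concrete, here is a rough Python sketch of what the Aggregate and Conditional Split stages compute. It is an illustration of the logic, not SSIS configuration. The row values assume the sample data is seven (a, b) pairs, and the variable names are made up for the example.

```python
from collections import Counter

# Assumed sample data: seven rows with columns a and b.
rows = [(1, 23), (1, 23), (16, 59), (12, 12), (13, 45), (12, 12), (45, 56)]

# Aggregate step: group by every column and count the rows in each group.
grouped = Counter(rows)

# Conditional Split step: groups with a count above one go to the
# duplicates output, everything else goes to the default output.
duplicates = [(a, b, n) for (a, b), n in grouped.items() if n > 1]
singles = [(a, b) for (a, b), n in grouped.items() if n == 1]
```

Note that the aggregate emits one row per distinct group, so the duplicates output here holds each duplicated combination once, together with its count, rather than every original copy of the row.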

