Delete all occurrences of duplicated rows

If I want to delete rows where some field is duplicated, I use sort -u -k n,n. But this retains one occurrence of each duplicated row. If I want to remove all occurrences of a duplicate, is there a quick bash or awk way to do this?

For example, I have:

 1 apple 30
 2 banana 21
 3 apple 9
 4 mango 2

I want to get:

 2 banana 21
 4 mango 2

I could read the whole file into a hash in Perl, but for large files this will be slow.
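To illustrate why sort -u alone does not solve this, here is what it does to the sample data (assuming it is in a file called fruits, a name made up for this sketch):

 $ sort -u -k2,2 fruits
 1 apple 30
 2 banana 21
 4 mango 2

One of the two apple rows survives (which one is not guaranteed), whereas the goal is to drop both.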

2 answers

Try:

 sort -k <your fields> | awk '{print $3, $1, $2}' | uniq -f2 -u | awk '{print $2, $3, $1}'

This deletes all lines that are duplicated on the key field (without saving any copy). The trick is that uniq can only skip leading fields, so the first awk rotates the key field to the end of the line, uniq -f2 -u keeps only the lines whose key appears exactly once, and the second awk restores the original field order. If you do not need the last field, change the first awk command to cut -f 1-5 -d ' ', change -f2 in uniq to -f1, and drop the second awk command.
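For the sample data above, the key is field 2 (the fruit name). A minimal sketch of how the placeholders might be filled in (the file name fruits is made up):

 $ sort -k2,2 fruits | awk '{print $3, $1, $2}' | uniq -f2 -u | awk '{print $2, $3, $1}'
 2 banana 21
 4 mango 2

Note that the result comes out sorted by the key field rather than in the original input order.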


This will keep your output in the same order as your input:

 awk '{seen[$2]++; a[++count]=$0; key[count]=$2} END {for (i=1;i<=count;i++) if (seen[key[i]] == 1) print a[i]}' inputfile 
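If the input is too big to hold in memory, a two-pass variant of the same idea stores only the per-key counts instead of every line (a sketch, again keying on field 2; the file is named twice so awk reads it twice):

 awk 'NR==FNR {seen[$2]++; next} seen[$2] == 1' inputfile inputfile

On the first pass (NR==FNR holds only while the first copy of the file is read) it counts how often each key occurs; on the second pass it prints only the lines whose key was seen exactly once, preserving input order. This does not work when the input comes from a pipe, since a pipe cannot be read twice.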
