Delete all occurrences of duplicated rows

If I want to delete rows where some field is duplicated, I use sort -u -k n,n. But this retains one occurrence of each duplicated row. If I want to remove all occurrences of a duplicate, is there a quick bash or awk way to do this?

For example, I have:

 1 apple 30
 2 banana 21
 3 apple 9
 4 mango 2

I want to get:

 2 banana 21
 4 mango 2

I could read the whole file into a hash in Perl, but for large files this will be slow.
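To illustrate why sort -u alone does not solve this, here is what it does to the sample data (assuming it is in a file called fruits, a name made up for this sketch):

 $ sort -u -k2,2 fruits
 1 apple 30
 2 banana 21
 4 mango 2

One of the two apple rows survives (which one is not guaranteed), whereas the goal is to drop both.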

2 answers

Try:

 sort -k <your fields> | awk '{print $3, $1, $2}' | uniq -f2 -u | awk '{print $2, $3, $1}'

This deletes all lines that are duplicated on the key field (without saving any copy). The trick is that uniq can only skip leading fields, so the first awk rotates the key field to the end of the line, uniq -f2 -u keeps only the lines whose key appears exactly once, and the second awk restores the original field order. If you do not need the last field, change the first awk command to cut -f 1-5 -d ' ', change -f2 in uniq to -f1, and drop the second awk command.
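For the sample data above, the key is field 2 (the fruit name). A minimal sketch of how the placeholders might be filled in (the file name fruits is made up):

 $ sort -k2,2 fruits | awk '{print $3, $1, $2}' | uniq -f2 -u | awk '{print $2, $3, $1}'
 2 banana 21
 4 mango 2

Note that the result comes out sorted by the key field rather than in the original input order.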


This will keep your output in the same order as your input:

 awk '{seen[$2]++; a[++count]=$0; key[count]=$2} END {for (i=1;i<=count;i++) if (seen[key[i]] == 1) print a[i]}' inputfile 
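If the input is too big to hold in memory, a two-pass variant of the same idea stores only the per-key counts instead of every line (a sketch, again keying on field 2; the file is named twice so awk reads it twice):

 awk 'NR==FNR {seen[$2]++; next} seen[$2] == 1' inputfile inputfile

On the first pass (NR==FNR holds only while the first copy of the file is read) it counts how often each key occurs; on the second pass it prints only the lines whose key was seen exactly once, preserving input order. This does not work when the input comes from a pipe, since a pipe cannot be read twice.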
