Finding duplicates in a field and printing them in unix bash

Question

I have a file containing

 apple
 apple
 banana
 orange
 apple
 orange

I want the script to detect that apple and orange are duplicated and tell the user: apple and orange are repeated. I tried

 nawk '!x[$1]++' FS="," filename

to find the duplicate elements. How can I print them in unix bash?

+4

unix bash awk

t28292 Jul 29 '13 at 6:41

3 answers

devnull · Answer 1 · 2013-07-29T06:52:15+0000

To print duplicate lines, you can say:

 $ sort filename | uniq -d
 apple
 orange

If you also want to print the counts, add the -c option to uniq:

 $ sort filename | uniq -dc
 3 apple
 2 orange
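To get the single-line message the question asks for, the duplicated lines can be joined before printing. The following is a minimal sketch, not a definitive script: the temporary file, the sample data, and the variable names `tmpfile` and `dups` are assumptions made for illustration.

```shell
# Hypothetical sample data: one word per line, as sort | uniq -d expects.
tmpfile=$(mktemp)
printf '%s\n' apple apple banana orange apple orange > "$tmpfile"

# Collect the duplicated lines and join them with single spaces.
dups=$(sort "$tmpfile" | uniq -d | paste -s -d ' ' -)
rm -f "$tmpfile"

# Report the result to the user.
if [ -n "$dups" ]; then
  echo "Repeated: $dups"
else
  echo "No duplicates found"
fi
```

With the sample data above this prints `Repeated: apple orange`.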

Varun · Answer 2 · 2013-07-29T07:00:19+0000

+1 for devnull's answer. However, if the file uses spaces instead of newlines as the delimiter, the following will work:

 tr '[:blank:]' '\n' < filename | sort | uniq -d
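If the words may be separated by runs of several blanks, translating each blank to a newline produces empty lines, and a repeated empty line would itself be reported as a duplicate. A small sketch of one way around this, assuming it is acceptable to squeeze repeated delimiters with `tr -s`; the sample text in `words` is made up for illustration:

```shell
# Hypothetical input mixing spaces, a double space, and a newline.
words='apple apple banana
orange  apple orange'

# -s squeezes the runs of newlines produced by consecutive blanks,
# so no empty lines reach sort and uniq.
dups=$(printf '%s\n' "$words" | tr -s '[:blank:]' '\n' | sort | uniq -d)
printf '%s\n' "$dups"
```

Quoting `'[:blank:]'` also keeps the shell from treating it as a glob pattern if a matching single-character file name happens to exist in the current directory.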

hek2mgl · Answer 3 · 2013-07-29T06:50:57+0000

Update:

The question has changed significantly. When I wrote this answer, the input file looked like this:

 apple apple banana orange apple orange banana orange apple ...

The solution still works for the new input, although it may be more complicated than this particular use case needs.

The following awk script will complete the task:

 awk '{i=1;while(i <= NF){a[$(i++)]++}}END{for(i in a){if(a[i]>1){print i,a[i]}}}' your.file

Output:

 apple 3
 orange 2

This is more understandable in this form:

 #!/usr/bin/awk -f
 {
     i = 1
     # iterate through every field
     while (i <= NF) {
         a[$(i++)]++   # count occurrences of every field
     }
 }
 # after all input lines have been read ...
 END {
     for (i in a) {
         # ... print those fields which occurred more than once
         if (a[i] > 1) {
             print i, a[i]
         }
     }
 }

Then make the script executable and run it, passing the name of the input file:

 chmod +x script.awk
 ./script.awk your.file
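Note that awk does not specify the order in which `for (i in a)` visits the array, so the output order of the script above can vary between implementations. As an alternative sketch (not part of the original answer), each field can be printed the moment it is seen for the second time, which gives a stable order and needs no END block. The array name `seen` and the sample input are assumptions for illustration:

```shell
# Hypothetical two-line input with several fields per line.
dups=$(printf '%s\n' 'apple apple banana' 'orange apple orange' |
  awk '{
    # Walk every field; print it exactly once, on its second occurrence.
    for (i = 1; i <= NF; i++)
      if (++seen[$i] == 2)
        print $i
  }')
printf '%s\n' "$dups"
```

This variant reports each duplicate once but does not print counts; if the counts are needed, the END-based script remains the simpler choice.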
