Finding duplicates in a field and printing them in unix bash

I have a file containing

apple apple banana orange apple orange 

I want the script to detect that apple and orange are duplicated and tell the user: apple and orange are repeated. I tried

 nawk '!x[$1]++' FS="," filename 

to find the duplicate elements. How can I print them in Unix bash?

3 answers

To print duplicate lines, you can say:

    $ sort filename | uniq -d
    apple
    orange

If you also want to print the count, add the -c option to uniq:

    $ sort filename | uniq -dc
          3 apple
          2 orange
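The question also asks for a message such as "apple and orange are repeated". A minimal sketch of one way to build it from the `uniq -d` output, assuming one word per line; the temporary file and the `message` variable are only for illustration:

```shell
# Throwaway sample file, one word per line (assumed input layout).
tmp=$(mktemp)
printf '%s\n' apple apple banana orange apple orange > "$tmp"

# uniq -d emits each duplicated word once; awk joins them with " and "
# and appends the suffix only if at least one duplicate was found.
message=$(sort "$tmp" | uniq -d |
    awk 'NR > 1 { printf " and " }
         { printf "%s", $0 }
         END { if (NR) print " are repeated" }')

echo "$message"
rm -f "$tmp"
```

If the file has no duplicates, `message` stays empty, so the caller can test it with `[ -n "$message" ]` before printing.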

+1 for devnul's answer. However, if the file uses spaces instead of newlines as the delimiter, the following will work:

    tr -s '[:blank:]' '\n' < filename | sort | uniq -d
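A small sketch of this pipeline on a throwaway file with mixed spaces and a tab. Quoting the character class keeps the shell from treating it as a glob pattern, and `-s` squeezes runs of blanks so that no empty lines reach `sort`. The sample data and the `dups` variable are assumptions made for the example:

```shell
# Throwaway sample file: words separated by single spaces, a double
# space, a tab, and a trailing space (assumed input, for illustration).
tmp=$(mktemp)
printf 'apple apple  banana\torange apple orange \n' > "$tmp"

# Turn every run of blanks into one newline, then list duplicated words.
dups=$(tr -s '[:blank:]' '\n' < "$tmp" | sort | uniq -d)

printf '%s\n' "$dups"
rm -f "$tmp"
```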

Update:

The question has changed significantly. When this answer was written, the input file looked like this:

 apple apple banana orange apple orange banana orange apple ... 

The solution below still works, although it may be more complicated than this special use case requires.


The following awk script will complete the task:

 awk '{i=1;while(i <= NF){a[$(i++)]++}}END{for(i in a){if(a[i]>1){print i,a[i]}}}' your.file 

Output:

    apple 3
    orange 2

It is easier to read in this form:

    #!/usr/bin/awk -f
    {
        i=1;
        # iterate through every field
        while(i <= NF) {
            a[$(i++)]++; # count occurrences of every field
        }
    }

    # after all input lines have been read ...
    END {
        for(i in a) {
            # ... print those fields which occurred more than 1 time
            if(a[i] > 1) {
                print i,a[i];
            }
        }
    }

Then make the file executable and run it, passing it the name of the input file:

    chmod +x script.awk
    ./script.awk your.file
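The attempt in the question sets FS=",", which suggests the real data may be comma-separated. A hedged sketch of the same counting idea for that case, assuming fields are separated by plain commas with no surrounding spaces. The order of `for (w in seen)` is unspecified in awk, so the output is piped through `sort` to make it stable; the sample file and variable names are illustrative only:

```shell
# Throwaway sample file with comma-separated fields (assumed layout).
tmp=$(mktemp)
printf '%s\n' 'apple,apple,banana' 'orange,apple,orange' > "$tmp"

# Count every field on every line, then print the fields seen more than once.
counts=$(awk -F',' '
    { for (i = 1; i <= NF; i++) seen[$i]++ }
    END { for (w in seen) if (seen[w] > 1) print w, seen[w] }
' "$tmp" | sort)

printf '%s\n' "$counts"
rm -f "$tmp"
```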

Source: https://habr.com/ru/post/1494018/

