Perl - find duplicate lines in a file or array

I am trying to print the duplicate lines read from a filehandle, rather than remove them or do any of the other things I see in other questions. I don't have enough experience with Perl to work this out quickly, so I'm asking here. How can I do it?

+6
4 answers

Using standard Perl shorthand:

 my %seen;
 while ( <> ) {
     print if $seen{$_}++;
 }

As a one-liner:

 perl -ne 'print if $seen{$_}++' 

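Want more detail? This prints <file name>:<line number>:<line> (the + keeps perl from parsing print (...) as a function call that takes only the parenthesized part):

 perl -ne 'print +( $ARGV eq "-" ? "" : "$ARGV:" ), "$.:$_" if $seen{$_}++' 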

Explanation of %seen:

  • my %seen declares a hash. For each unique line in the input (which here comes from while (<>)), $seen{$_} is a scalar slot in the hash, keyed by the text of the line (that is what $_ inside the {} braces supplies).
  • The postfix increment operator (x++) yields the current value for our expression and increments the variable afterwards. So if we haven't seen the line yet, $seen{$_} is undefined, but in a numeric context like this one it is treated as 0, which is false.
  • It is then incremented to 1.

So when the while loop starts, every line counts as "zero" (if it helps, think of the lines as "not %seen"). The first time we see a line, perl takes the undefined value, which fails the if, and increments the counter in that scalar slot to 1. From then on the value is 1 or more, so every later occurrence of the line passes the if condition and is printed.
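
The behaviour of the postfix increment on a slot that does not exist yet can be checked in isolation. This is a minimal sketch; the sub name postincrement_values and the sample input are made up for illustration.

```perl
use strict;
use warnings;

# Collect the value that $seen{$line}++ yields for each line, in order.
sub postincrement_values {
    my @lines = @_;
    my %seen;
    my @values;
    for my $line (@lines) {
        # Postfix ++ yields the old value; a missing slot is treated as 0,
        # and no "uninitialized" warning is raised for ++ on undef.
        my $old = $seen{$line}++;
        push @values, $old;
    }
    return @values;
}

# Hypothetical sample input: "a" appears three times, "b" once.
print join(",", postincrement_values("a", "b", "a", "a")), "\n";   # prints 0,0,1,2
```

The zeros are the first sightings (false, so nothing is printed in the one-liner); every non-zero value is a repeat.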

Now, as I said above, my %seen declares a hash, but with strict turned off, any variable can be brought into existence just by using it. So the first time perl sees $seen{$_}, it knows I'm looking for %seen, finds that it doesn't exist yet, and creates it. That is why the one-liner works without a declaration.

A neat side effect is that at the end, if you want to use it, you have a count of how many times each line occurred.
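
As a rough sketch of using those counts afterwards, the hash can be walked once the input is exhausted. The sub name count_lines and the sample lines are assumptions for illustration, standing in for lines read from a file.

```perl
use strict;
use warnings;

# Count occurrences of each line; returns a hash reference of line => count.
sub count_lines {
    my @lines = @_;
    my %seen;
    $seen{$_}++ for @lines;
    return \%seen;
}

# Hypothetical sample input standing in for the lines of a file.
my $counts = count_lines("apple\n", "pear\n", "apple\n", "apple\n");

# Report only the lines that occurred more than once, in sorted order.
for my $line (sort keys %$counts) {
    next if $counts->{$line} < 2;
    chomp(my $text = $line);
    printf "%dx %s\n", $counts->{$line}, $text;   # prints 3x apple
}
```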

+22

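Try this:

 #!/usr/bin/perl -w
 use strict;
 use warnings;

 my %duplicates;
 # DATA reads the lines that follow a __DATA__ marker at the end of the script;
 # use <> instead to read from files or STDIN.
 while (<DATA>) {
     # print the line only if it has been seen before
     print if $duplicates{$_};
     $duplicates{$_}++;
 }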
+3

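To print each duplicated line only once (single quotes for a Unix shell; cmd.exe on Windows needs double quotes instead):

 perl -ne 'print if $seen{$_}++ == 1' 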
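
For an array rather than a file, the same two tests can be applied with grep. This is only a sketch; the sub names repeats and duplicated_once and the sample data are invented for illustration.

```perl
use strict;
use warnings;

# Every repeated occurrence (second, third, ...) of an element.
sub repeats {
    my %seen;
    return grep { $seen{$_}++ } @_;
}

# Each duplicated element exactly once, reported at its second occurrence.
sub duplicated_once {
    my %seen;
    return grep { $seen{$_}++ == 1 } @_;
}

my @items = qw(a b a c b a);   # hypothetical sample data
print "repeats: @{[ repeats(@items) ]}\n";           # prints repeats: a b a
print "once:    @{[ duplicated_once(@items) ]}\n";   # prints once:    a b
```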
+2

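If you have a Unix-like system, you can use uniq, which with -d prints one copy of each repeated line. It only compares adjacent lines, so this works when foo is already sorted:

 uniq -d foo 

or, for unsorted input:

 sort foo | uniq -d 

That should do what you want. For more information, see man uniq.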

0

Source: https://habr.com/ru/post/887280/