How to sort lines in a text file in Perl?

Question

I have a couple of text files (A.txt and B.txt) that look like this (each can have ~10,000 lines):

processa,id1=123,id2=5321
processa,id1=432,id2=3721
processa,id1=3,id2=521
processb,id1=9822,id2=521
processa,id1=213,id2=1
processc,id1=822,id2=521

I need to check whether every line in A.txt is also present in B.txt (B.txt may contain additional lines; that is fine).

The lines can be in any order in the two files, so my plan is to sort both files into the same order in O(n log n), and then match each line in A.txt against the next candidate line in B.txt in O(n). I could use a hash instead, but the files are large and this comparison happens only once, after which the files are regenerated, so I don't think that is a good idea.
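
The sort-then-scan idea described above could be sketched roughly as follows. This is only an illustration under assumptions: the helper names `is_subset_sorted` and `read_lines` are hypothetical, both files are assumed to fit in memory as arrays, and Perl's default string `sort` is used because any consistent ordering works for this check.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if every line in @$a_lines also occurs
# in @$b_lines, and 0 otherwise. A line repeated in A only needs to
# appear once in B.
sub is_subset_sorted {
    my ($a_lines, $b_lines) = @_;
    my @sorted_a = sort @$a_lines;    # default string order, O(n log n)
    my @sorted_b = sort @$b_lines;

    my $j = 0;                        # current position in @sorted_b
    for my $line (@sorted_a) {
        # Skip lines of B that sort before the current line of A.
        $j++ while $j < @sorted_b && $sorted_b[$j] lt $line;
        # Either B is exhausted or B's current line is not a match.
        return 0 if $j >= @sorted_b || $sorted_b[$j] ne $line;
    }
    return 1;
}

# Hypothetical helper: read a file into a list of chomped lines.
sub read_lines {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Can't read $path: $!";
    chomp(my @lines = <$fh>);
    close $fh;
    return \@lines;
}

# Usage, assuming A.txt and B.txt exist in the current directory:
# print is_subset_sorted(read_lines('A.txt'), read_lines('B.txt'))
#     ? "every line of A is in B\n"
#     : "some line of A is missing from B\n";
```

The index into the sorted B array only ever moves forward, so the scan after sorting is linear in the combined number of lines.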

What is the best way to sort a file in Perl? Any ordering will do; it just has to be consistent.

For example, in dictionary order it would be

processa,id1=123,id2=5321
processa,id1=213,id2=1
processa,id1=3,id2=521
processa,id1=432,id2=3721
processb,id1=9822,id2=521
processc,id1=822,id2=521

As I mentioned earlier, any ordering is fine as long as Perl can do it quickly.

I want to do this from inside Perl code, after opening the file like this

open (FH, "<A.txt");

Any comments, ideas, etc. will be helpful.

sorting scripting perl

Lazer Aug 27 '10 at 18:46

6 answers

zigdon · Answer 1 · 2010-08-27T19:03:17+0000

To sort the file in a script, you still have to load the whole thing into memory. If you are doing that anyway, I'm not sure what advantage sorting has over just loading the lines into a hash.

Something like this will work:

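use strict;
use warnings;

# Record every line of A.txt as a hash key.
my %seen;
open(my $fh_a, '<', 'A.txt') or die "Can't read A: $!";
while (my $line = <$fh_a>) {
    chomp $line;
    $seen{$line} = 1;
}
close $fh_a;

# Remove every line that also appears in B.txt.
open(my $fh_b, '<', 'B.txt') or die "Can't read B: $!";
while (my $line = <$fh_b>) {
    chomp $line;
    delete $seen{$line};
}
close $fh_b;

# Whatever is left was in A but not in B.
print "Lines found in A, missing in B:\n";
print "$_\n" for keys %seen;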

FMc · Answer 2 · 2010-08-27T22:50:26+0000

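Another approach: load every line into a hash keyed by line and file name, then answer whatever questions you have with grep.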

use strict;
use warnings;

my ($fileA, $fileB) = @ARGV;

# Load all lines: $h{LINE}{FILE_NAME} = TALLY
my %h;
$h{$_}{$ARGV} ++ while <>;

# Do whatever you need.
my @all_lines = keys %h;
my @in_both   = grep {     keys %{$h{$_}} == 2       } keys %h;
my @in_A      = grep {     exists $h{$_}{$fileA}     } keys %h;
my @only_in_A = grep { not exists $h{$_}{$fileB}     } @in_A;
my @in_A_mult = grep {            $h{$_}{$fileA} > 1 } @in_A;
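
To make the shape of `%h` concrete, here is a small self-contained sketch that builds the same line-by-file tally from in-memory data instead of reading `<>`. The sample lines and the `%files` hash are made up for illustration and are not part of the answer.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the contents of two files.
my %files = (
    'A.txt' => [
        'processa,id1=3,id2=521',
        'processa,id1=3,id2=521',
        'processb,id1=9822,id2=521',
    ],
    'B.txt' => [
        'processa,id1=3,id2=521',
        'processc,id1=822,id2=521',
    ],
);

# Build the same structure as above: $h{LINE}{FILE_NAME} = TALLY
my %h;
for my $file (keys %files) {
    $h{$_}{$file}++ for @{ $files{$file} };
}

my @in_A      = grep { exists $h{$_}{'A.txt'} } keys %h;
my @only_in_A = grep { not exists $h{$_}{'B.txt'} } @in_A;

# A.txt is fully contained in B.txt exactly when @only_in_A is empty.
print @only_in_A
    ? "Missing from B.txt:\n"
    : "Every line of A.txt is in B.txt\n";
print "  $_\n" for sort @only_in_A;
```

With this sample data the only line reported as missing is the `processb` one, since the repeated `processa` line does occur in B.txt.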

recursive9 · Answer 3 · 2010-08-27T19:02:09+0000

, (600 ) Apache Perl, . 30 script, . , , .

DVK · Answer 4 · 2010-08-27T19:22:02+0000

Does it have to be done in pure Perl? If running 3 external commands (sort, comm, wc) is acceptable, you can do this:

my $cmd = "sort $file1 > $file1.sorted";
$cmd .= "; sort $file2 > $file2.sorted";
$cmd .= "; comm -23 $file1.sorted $file2.sorted |wc -l";
my $count = `$cmd`;
$count =~ s/\s+//g;
if ($count != 0) {
    print "Stuff in A exists that aren't in B\n";
}

, comm , .

HerbN · Answer 5 · 2010-08-27T20:39:32+0000

To test whether A.txt is a subset of B.txt:

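use strict;
use warnings;

open(my $fh_b, '<', 'B.txt') or die "Can't read B.txt: $!";
open(my $fh_a, '<', 'A.txt') or die "Can't read A.txt: $!";

# Index every row of B.txt by its three comma-separated fields.
my %bFile;
while (my $line = <$fh_b>) {
    chomp $line;
    my ($process, $id1, $id2) = split /,/, $line;
    $bFile{$process}{$id1}{$id2}++;
}

my $missingRows = 0;

while (my $line = <$fh_a>) {
    chomp $line;
    # Each row of A.txt has to be split too before it can be looked up.
    my ($process, $id1, $id2) = split /,/, $line;
    $missingRows++ unless $bFile{$process}{$id1}{$id2};
    last if $missingRows; # One miss means they aren't all verified
}

# 1 if every row of A.txt was found in B.txt, 0 otherwise
my $is_Atxt_Subset_Btxt = $missingRows ? 0 : 1;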

A, B, B, A.

cjm · Answer 6 · 2010-08-27T21:04:32+0000

For sorting files that are too large to hold in memory, CPAN has modules such as Sort::External and File::Sort.

Alternatively, AnyDBM_File lets you tie a hash to a DBM file on disk, so the hash does not have to fit in memory.
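
A disk-backed hash built on the AnyDBM_File module mentioned in this answer might look like the sketch below. It rests on several assumptions: the helper name `lines_missing_from_b` and the DBM path argument are hypothetical, the DBM file is assumed not to exist yet, whichever backend AnyDBM_File selects must be installed (SDBM_File ships with Perl), and lines must be short enough for that backend's record-size limit. Empty lines are skipped here as a precaution, since some DBM backends may reject empty keys.

```perl
use strict;
use warnings;
use AnyDBM_File;
use Fcntl;    # exports O_RDWR and O_CREAT

# Hypothetical helper: return the lines of $file_a that do not occur in
# $file_b, keeping B's lines in a DBM file instead of an in-memory hash.
# $dbm_path should name a DBM file that does not exist yet.
sub lines_missing_from_b {
    my ($file_a, $file_b, $dbm_path) = @_;

    tie(my %in_b, 'AnyDBM_File', $dbm_path, O_RDWR | O_CREAT, 0644)
        or die "Can't tie $dbm_path: $!";

    # Store every non-empty line of B as a key in the tied hash.
    open(my $fh_b, '<', $file_b) or die "Can't read $file_b: $!";
    while (my $line = <$fh_b>) {
        chomp $line;
        next unless length $line;
        $in_b{$line} = 1;
    }
    close $fh_b;

    # Look up every non-empty line of A and collect the misses.
    my @missing;
    open(my $fh_a, '<', $file_a) or die "Can't read $file_a: $!";
    while (my $line = <$fh_a>) {
        chomp $line;
        next unless length $line;
        push @missing, $line unless exists $in_b{$line};
    }
    close $fh_a;

    untie %in_b;
    return @missing;
}

# Usage, with 'seen_lines' as a placeholder DBM file name:
# my @missing = lines_missing_from_b('A.txt', 'B.txt', 'seen_lines');
# print "$_\n" for @missing;
```

The trade-off is speed: every lookup may touch the disk, so for files of around 10,000 lines an ordinary in-memory hash is likely to be faster.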
