Directory sorting in perl based on numbers

Question

I think I need something like a Schwartzian Transform to make this work, but I find it hard to understand, since Perl is not my strongest language.

I have a directory with contents like this:

album1.htm album2.htm album3.htm .... album99.htm album100.htm

I am trying to get the album with the highest number from this directory (in this case, album100.htm). Please note that file timestamps are not a reliable means of identifying things, as people add old "missing" albums after the fact.

The previous developer simply used the code snippet below, but it clearly breaks when there are more than 9 albums in the catalog.

    opendir(DIR, PATH) || print $!;
    @files = readdir(DIR);
    foreach $file ( sort(@files) ) {
        if ( $file =~ /album/ ) {
            $last_file = $file;
        }
    }

+1

sorting perl readdir

AvatarKava Jun 2 '10 at 18:36

6 answers

To find the highest number, try a custom sort:

    sub sort_files {
        (my $num_a = $a) =~ s/^album(\d+)\.htm$/$1/;
        (my $num_b = $b) =~ s/^album(\d+)\.htm$/$1/;
        return $num_a <=> $num_b;
    }

    my @sorted = sort \&sort_files @files;
    my $last = pop @sorted;

Also consider the File::Next module. It lets you select only the files that begin with the word "album". I find it a little easier than readdir.

+3

Robert Wohlfarth Jun 2 '10 at 19:04

The reason you are running into trouble is the operator: <=> is the numeric comparison, while cmp, the default, is a string comparison.

    $ perl -E'say for sort qw/01 1 02 200/';
    01
    02
    1
    200

With a small change, we get something much closer to correct:

    $ perl -E'say for sort { $a <=> $b } qw/01 1 02 200/';
    01
    1
    02
    200

However, in your case you need to strip out the non-digit characters first.

    $ perl -E'say for sort { my ($s1) = $a =~ m/(\d+)/; my ($s2) = $b =~ /(\d+)/; $s1 <=> $s2 } qw/01 1 02 200/';
    01
    1
    02
    200

Here it is, formatted more readably:

    sort {
        # list context on the left captures the digits rather than the match result
        my ($s1) = $a =~ m/(\d+)/;
        my ($s2) = $b =~ m/(\d+)/;
        $s1 <=> $s2
    }

This is not perfect, but it should give you a good idea of what is going wrong with your sort.
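To pull the pieces above together, here is a minimal sketch of a numeric sort on album names. The helper names album_number and sort_albums are made up for illustration, and falling back to -1 for names without digits is an assumption, not something from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first run of digits in a file name,
# or -1 if the name contains no digits (an assumed fallback).
sub album_number {
    my ($name) = @_;
    my ($num) = $name =~ /(\d+)/;    # list context yields the captured digits
    return defined $num ? $num : -1;
}

# Hypothetical helper: sort names by their embedded number, ascending.
sub sort_albums {
    my @names = @_;
    return sort { album_number($a) <=> album_number($b) } @names;
}

my @sorted = sort_albums(qw(album10.htm album2.htm album100.htm album1.htm));
print "$_\n" for @sorted;
print "highest: $sorted[-1]\n";
```

The regex runs twice per comparison here, which is fine for a directory of a few hundred files.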

Oh, and as a follow-up: the Schwartzian Transform solves a different problem. It saves you from running an expensive computation (unlike the one you need here, a simple regex) many times inside the sort's comparison function, at the cost of caching the results (which is to be expected). Essentially, you map each input to a pair holding the input and its computed output (typically an array reference), [$input, $output], then sort on the outputs with $a->[1] <=> $b->[1]. Once the data is sorted, you map back over it to recover the original inputs, $_->[0].

    map  $_->[0],
    sort { $a->[1] <=> $b->[1] }
    map  [ $_, fn($_) ],
    qw/input list here/
    ;

It's neat because it is so compact while still being efficient.
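If the one-expression form is hard to follow, the same three steps can be written out with intermediate variables. This is only a sketch: sort_by_number is a made-up name, and treating names without digits as 0 is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: the Schwartzian Transform unrolled into three steps.
sub sort_by_number {
    my @names = @_;

    # Step 1: pair each name with its precomputed key, [ $input, $output ].
    my @pairs = map { [ $_, /(\d+)/ ? $1 : 0 ] } @names;

    # Step 2: sort the pairs on the cached key.
    my @sorted_pairs = sort { $a->[1] <=> $b->[1] } @pairs;

    # Step 3: drop the keys and keep the original names.
    return map { $_->[0] } @sorted_pairs;
}

print join( ' ', sort_by_number(qw(album12.htm album1.htm album2.htm album10.htm)) ), "\n";
```

Each name is matched against the regex exactly once, no matter how many comparisons the sort performs.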

+2

Evan Carroll Jun 2 '10 at 19:18

Here you go, using the Schwartzian Transform:

    my @files = <DATA>;

    print join '',
        map  { $_->[1] }
        sort { $a->[0] <=> $b->[0] }
        map  { [ m/album(\d+)/, $_ ] } @files;

    __DATA__
    album12.htm
    album1.htm
    album2.htm
    album10.htm

+1

Pedro Silva Jun 2 '10 at 19:28

Here's an alternative solution using reduce :

    use strict;
    use warnings;
    use List::Util 'reduce';

    my $max = reduce {
        my ($aval, $bval) = ($a =~ m/album(\d+)/, $b =~ m/album(\d+)/);
        $aval > $bval ? $a : $b
    } <DATA>;

    print "max album is $max\n";

    __DATA__
    album1.htm
    album100.htm
    album2.htm
    album3.htm
    album99.htm

+1

Ether Jun 2 '10 at 19:47

Here's a general solution:

    my @sorted_list
        = map { $_->[0] } # we stored it at the head of the list, so we can pull it out
          sort {
              # first test a normalized version
              my $v = $a->[1] cmp $b->[1];
              return $v if $v;

              my $lim = @$a > @$b ? @$a : @$b;
              # we alternate between ascii sections and numeric
              for ( my $i = 2; $i < $lim; $i++ ) {
                  $v = ( $a->[$i] || '' ) cmp ( $b->[$i] || '' );
                  return $v if $v;

                  $i++;
                  $v = ( $a->[$i] || 0 ) <=> ( $b->[$i] || 0 );
                  return $v if $v;
              }
              return 0;
          }
          map {
              # split on digits and retain captures in place.
              my @parts = split /(\d+)/;
              my $nstr  = join( '', map { m/\D/ ? $_ : '0' x length() } @parts );
              [ $_, $nstr, @parts ];
          } @directory_names
          ;

+1

Axeman Jun 2 '10 at 20:35

Matteo iva · Accepted Answer · 2010-06-02T18:52:31+0000

If you just need to find the album with the highest number, you do not need to sort the list at all. Just walk through it once and keep track of the maximum.

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $max = 0;

    while ( <DATA> ) {
        my ($album) = $_ =~ m/album(\d+)/;
        $max = $album if $album > $max;
    }

    print "album$max.htm";

    __DATA__
    album1.htm
    album100.htm
    album2.htm
    album3.htm
    album99.htm
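The same single-pass idea can be applied to a real directory with readdir instead of the DATA section. This is a rough sketch: highest_album is a made-up name, and the strict albumN.htm pattern and the default of the current directory are assumptions based on the file names in the question.

```perl
use strict;
use warnings;

# Hypothetical helper: return the name of the highest-numbered albumN.htm
# file in $dir, or undef if the directory contains none.
sub highest_album {
    my ($dir) = @_;
    opendir( my $dh, $dir ) or die "Cannot open $dir: $!";

    my ( $max, $max_file );
    while ( defined( my $file = readdir $dh ) ) {
        # Skip anything that is not named albumN.htm (assumed naming scheme).
        next unless $file =~ /^album(\d+)\.htm$/;
        my $num = $1;
        if ( !defined $max || $num > $max ) {
            ( $max, $max_file ) = ( $num, $file );
        }
    }
    closedir $dh;
    return $max_file;
}

# Usage: pass the directory as the first argument, or scan the current one.
my $dir  = @ARGV ? $ARGV[0] : '.';
my $last = highest_album($dir);
print defined $last ? "$last\n" : "no albums found\n";
```

Because nothing is sorted, this does one regex match per file and keeps only the current maximum in memory.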
