Directory sorting in Perl based on numbers

I think I need something like a Schwartzian Transform to make this work, but I find it hard to understand, since Perl is not my strongest language.

I have a directory with content as such:

    album1.htm album2.htm album3.htm .... album99.htm album100.htm

I am trying to get the album with the highest number from this directory (in this case, album100.htm). Please note that file timestamps are not a reliable means of identifying things, as people add old "missing" albums after the fact.

The previous developer simply used the code snippet below, but it clearly breaks when there are more than 9 albums in the catalog.

    opendir(DIR, PATH) || print $!;
    @files = readdir(DIR);
    foreach $file ( sort(@files) ) {
        if ( $file =~ /album/ ) {
            $last_file = $file;
        }
    }
+1
6 answers

If you just need to find the album with the highest number, you do not need to sort the list at all. Just walk through it once and keep track of the maximum.

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $max = 0;
    while ( <DATA> ) {
        my ($album) = $_ =~ m/album(\d+)/;
        next unless defined $album;    # skip lines that are not albums
        $max = $album if $album > $max;
    }
    print "album$max.htm\n";

    __DATA__
    album1.htm
    album100.htm
    album2.htm
    album3.htm
    album99.htm
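The example above reads its names from `__DATA__`. As a minimal sketch of the same single-pass idea applied to a real directory, something like the following might work. The sub name `highest_album` and the choice to take the directory from the command line are assumptions made for illustration, not part of the original answer.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the name of the highest-numbered
# albumN.htm file in $dir, or undef if there is no such file.
sub highest_album {
    my ($dir) = @_;
    opendir( my $dh, $dir ) or die "Cannot open $dir: $!";

    my ( $max, $best );
    while ( defined( my $file = readdir $dh ) ) {
        # Skip anything that is not exactly albumN.htm (including . and ..)
        my ($num) = $file =~ m/^album(\d+)\.htm$/ or next;
        ( $max, $best ) = ( $num, $file ) if !defined $max || $num > $max;
    }
    closedir $dh;
    return $best;
}

# Assumed usage: directory given as the first argument, default is "."
my $dir  = @ARGV ? $ARGV[0] : '.';
my $last = highest_album($dir);
print defined $last ? "$last\n" : "no album files in $dir\n";
```

Keeping the file name alongside the number means the result is the name as it exists on disk, even if some files were written with leading zeros.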
+7

To find the largest number, try using a custom sort:

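    # Compare two file names by the number embedded in them.
    sub sort_files {
        (my $num_a = $a) =~ s/^album(\d+)\.htm$/$1/;
        (my $num_b = $b) =~ s/^album(\d+)\.htm$/$1/;
        return $num_a <=> $num_b;
    }

    # sort accepts a bare subroutine name as its comparator
    my @sorted = sort sort_files @files;
    my $last   = pop @sorted;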

Also consider the File::Next module. It lets you select only the files that begin with the word "album". I find it a little easier to use than readdir.

+3

The reason you run into difficulties is the comparison operator: <=> is numeric comparison, while cmp, which sort uses by default, is string comparison.

    $ perl -E'say for sort qw/01 1 02 200/';
    01
    02
    1
    200

With a small change, we get something much closer to what we want:

    $ perl -E'say for sort { $a <=> $b } qw/01 1 02 200/';
    01
    1
    02
    200

However, in your case you first need to strip out everything that is not a digit.

    $ perl -E'say for sort { my ($s1) = $a =~ m/(\d+)/; my ($s2) = $b =~ m/(\d+)/; $s1 <=> $s2 } qw/01 1 02 200/';
    01
    1
    02
    200

Here it is, laid out more readably:

    sort {
        my ($s1) = $a =~ m/(\d+)/;    # list context captures the digits
        my ($s2) = $b =~ m/(\d+)/;
        $s1 <=> $s2
    }

This is not perfect, but it should give you a good idea of what is wrong with your sort.
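As a rough illustration of the same comparison applied to the file names from the question, here is a small sketch. The helper name `album_number` and the sample list are assumptions made for the example. Note that the parentheses in `my ($num)` matter: in scalar context the match returns only true or false, not the captured digits.

```perl
use strict;
use warnings;

# Hypothetical helper: extract the first run of digits from a name,
# falling back to 0 when the name contains no digits.
sub album_number {
    my ($name) = @_;
    my ($num) = $name =~ m/(\d+)/;    # list context returns the capture
    return defined $num ? $num : 0;
}

my @files  = qw/album10.htm album2.htm album100.htm album1.htm/;
my @sorted = sort { album_number($a) <=> album_number($b) } @files;

print "$_\n" for @sorted;
print "highest: $sorted[-1]\n";
```

With this sample list the names come out as album1.htm, album2.htm, album10.htm, album100.htm, so the last element is the highest-numbered album.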

Oh, and as a follow-up: the Schwartzian Transform solves a different problem. It saves you from running an expensive computation (unlike the one you need here, a simple regular expression) many times over inside the sort algorithm. It does this by caching the results, which is not too surprising. Essentially, you map each input to a pair of the input and its computed output (usually in an array), [$input, $output], then you sort on the outputs with $a->[1] <=> $b->[1]. Once the data is sorted, you map back over it to recover the original inputs with $_->[0].

    map  $_->[0],
    sort { $a->[1] <=> $b->[1] }
    map  [ $_, fn($_) ],
    qw/input list here/
    ;

It is great because it is so compact while still being efficient.

+2

Here you go, using the Schwartzian Transform:

    my @files = <DATA>;

    print join '',
        map  { $_->[1] }
        sort { $a->[0] <=> $b->[0] }
        map  { [ m/album(\d+)/, $_ ] } @files;

    __DATA__
    album12.htm
    album1.htm
    album2.htm
    album10.htm
+1

Here's an alternative solution using reduce:

    use strict;
    use warnings;
    use List::Util 'reduce';

    my $max = reduce {
        my ($aval, $bval) = ($a =~ m/album(\d+)/, $b =~ m/album(\d+)/);
        $aval > $bval ? $a : $b
    } <DATA>;

    print "max album is $max\n";

    __DATA__
    album1.htm
    album100.htm
    album2.htm
    album3.htm
    album99.htm
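If only the number is needed, a shorter variation is possible with `max` from the same List::Util module. This is a sketch rather than part of the original answer. The sub name `highest_number` is made up for the example, and it assumes the names follow the albumN.htm pattern exactly.

```perl
use strict;
use warnings;
use List::Util 'max';

# Hypothetical helper: return the largest N among names of the form
# albumN.htm, or undef if none of the names match.
sub highest_number {
    # In list context each match returns its captured number, or an
    # empty list on failure, so map yields only the album numbers.
    return max map { /^album(\d+)\.htm$/ } @_;
}

my @files   = qw/album1.htm album100.htm album2.htm album3.htm album99.htm/;
my $highest = highest_number(@files);

print defined $highest ? "album$highest.htm\n" : "no albums found\n";
```

Unlike the reduce version, this returns the number rather than the file name, so names that do not match the pattern are simply ignored instead of taking part in the comparison.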
+1

Here's a general solution:

    my @sorted_list
        = map { $_->[0] }    # we stored it at the head of the list, so we can pull it out
          sort {
              # first test a normalized version
              my $v = $a->[1] cmp $b->[1];
              return $v if $v;

              my $lim = @$a > @$b ? @$a : @$b;

              # we alternate between ascii sections and numeric
              for ( my $i = 2; $i < $lim; $i++ ) {
                  $v = ( $a->[$i] || '' ) cmp ( $b->[$i] || '' );
                  return $v if $v;

                  $i++;
                  $v = ( $a->[$i] || 0 ) <=> ( $b->[$i] || 0 );
                  return $v if $v;
              }
              return 0;
          }
          map {
              # split on digits and retain captures in place.
              my @parts = split /(\d+)/;
              my $nstr  = join( '', map { m/\D/ ? $_ : '0' x length() } @parts );
              [ $_, $nstr, @parts ];
          } @directory_names
        ;
+1

Source: https://habr.com/ru/post/1200041/

