Sort file names by numerical value

Preamble: I hate to ask questions like this, but I'm stuck and just learning Perl ... it seems like an easy task, but I don't know where to look.

I have a folder with a lot of XML files, all of them named like "<number>.xml". I need to process these files in numerical order, so "9123.xml" should come before "2384747.xml". I have successfully sorted the list alphabetically as follows:

    opendir(XMLDIR, $xmldirname);
    my @files = sort { $a cmp $b } readdir(XMLDIR);

but that’s not what I need.

I also tried

 my @files = sort {$a <=> $b} readdir(XMLDIR); 

which obviously fails because the file names contain ".xml" and are not numeric in general.

Can someone open their heart and save me a week of viewing Perl tutorials?

4 answers

Despite your claim, sort { $a <=> $b } readdir(XMLDIR) does work. When Perl treats the string 2384747.xml as a number (as <=> does), it uses the leading numeric part, so the string has the value 2384747.

    $ perl -wE'say 0+"2384747.xml"'
    Argument "2384747.xml" isn't numeric in addition (+) at -e line 1.
    2384747

Of course, those warnings are a problem. The solution you accepted tries to get rid of them but does not remove them all, because it does not take into account that readdir also returns "." and "..". You have to filter out the entries you do not want first.

Here are two simple solutions:

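    # Option 1: silence the warning and let <=> use the leading digits.
    my @files =
        sort { no warnings 'numeric'; $a <=> $b }
        grep { /^\d+\.xml\z/ }
        readdir(XMLDIR);

    # Option 2: compare the extracted numbers explicitly.
    my @files =
        sort { ( $a =~ /(\d+)/ )[0] <=> ( $b =~ /(\d+)/ )[0] }
        grep { /^\d+\.xml\z/ }
        readdir(XMLDIR);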

In this particular case, you can optimize your code:

    my @files =
        map  { "$_.xml" }           # Recreate the file name.
        sort { $a <=> $b }          # Compare the numbers.
        map  { /^(\d+)\.xml\z/ }    # Extract the number from desired files.
        readdir(XMLDIR);

The simplest and fastest solution, though, is to use a natural sort.

    use Sort::Key::Natural qw( natsort );

    # Drop only "." and ".." before sorting.
    my @files = natsort grep !/^\.\.?\z/, readdir(XMLDIR);
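If you want to try the extract, sort, rebuild pipeline without touching a real directory, here is a minimal sketch. It assumes a hard-coded list (the names in @entries are made up) standing in for whatever readdir would return, including "." and "..", and uses only core Perl.

```perl
use strict;
use warnings;

# Hypothetical stand-in for readdir output, including "." and "..".
my @entries = ('.', '..', '2384747.xml', '9123.xml', 'notes.txt', '10.xml');

my @files =
    map  { "$_.xml" }           # rebuild the file name
    sort { $a <=> $b }          # compare the extracted numbers
    map  { /^(\d+)\.xml\z/ }    # keep the number; non-matching names drop out
    @entries;

print "$_\n" for @files;
```

Because the inner map returns an empty list for names that do not match, ".", ".." and notes.txt disappear before the sort ever sees them, so no 'numeric' warnings are triggered.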

You are actually pretty close. Just strip the ".xml" inside your comparison:

    opendir(XMLDIR, $xmldirname);
    my @files = sort {
        substr($a, 0, index($a, '.')) <=> substr($b, 0, index($b, '.'))
    } readdir(XMLDIR);
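One caveat with the substr/index comparison: for the "." and ".." entries the dot is at position 0, so the extracted prefix is an empty string, which still warns under use warnings. A minimal sketch of the same idea with a filter in front, assuming a made-up list in place of readdir output and a hypothetical helper named number_of:

```perl
use strict;
use warnings;

# Hypothetical sample standing in for readdir output.
my @entries = ('..', '11139.xml', '.', '6810.xml', '9698.xml');

# Numeric part of a name: everything before the first dot.
# number_of is an illustrative helper, not part of the original answer.
sub number_of {
    my ($name) = @_;
    return substr($name, 0, index($name, '.'));
}

my @sorted =
    sort { number_of($a) <=> number_of($b) }
    grep { /^\d+\.xml\z/ }      # drop ".", ".." and anything non-numeric
    @entries;

print "$_\n" for @sorted;
```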

The problem is that <=> is meant for operands that are entirely numeric. In fact, if you use warnings; you will get a message similar to this at runtime:

    Argument "11139.xml" isn't numeric in sort at testort.pl line 9.

What you can do is split the file name from the extension, sort numerically on the file name, and then join the extension back on. This can be done quite simply with a Schwartzian transform:

    use strict;
    use warnings;
    use Data::Dumper;

    # get all of the XML files
    my @xml_files = glob("*.xml");

    print 'Unsorted: ' . Dumper \@xml_files;

    @xml_files = map  { join '.', @$_ }        # join filename and extension
                 sort { $a->[0] <=> $b->[0] }  # sort against filename
                 map  { [split /\./] }         # split on '.'
                 @xml_files;

    print 'Sorted: ' . Dumper \@xml_files;

    __END__
    Unsorted: $VAR1 = [
              '11139.xml',
              '18136.xml',
              '28715.xml',
              '6810.xml',
              '9698.xml'
            ];
    Sorted: $VAR1 = [
              '6810.xml',
              '9698.xml',
              '11139.xml',
              '18136.xml',
              '28715.xml'
            ];

Another option is to split each name on the dot and compare the first field:

    my @files = sort {
        my ($x) = split /\./, $a;
        my ($y) = split /\./, $b;
        $x <=> $y
    } readdir(XMLDIR);

Or without temporary variables:

 my @files = sort {(split /\./, $a)[0] <=> (split /\./, $b)[0]} readdir(XMLDIR); 
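For completeness, here is a rough sketch that wraps the split-based comparison in a sub, with a lexical directory handle, an error check on opendir, and a filter so that "." and ".." never reach the comparison. The name sorted_xml_files and the demo file names are made up for illustration; the demo runs against a throwaway directory created with File::Temp.

```perl
use strict;
use warnings;
use File::Temp qw( tempdir );

# Return the "<number>.xml" names in $dir, sorted by number.
# sorted_xml_files is a hypothetical helper, not part of any module.
sub sorted_xml_files {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Cannot open $dir: $!";
    my @names = grep { /^\d+\.xml\z/ } readdir($dh);
    closedir($dh);
    return sort { (split /\./, $a)[0] <=> (split /\./, $b)[0] } @names;
}

# Demo on a temporary directory with made-up file names.
my $dir = tempdir( CLEANUP => 1 );
for my $name ('2384747.xml', '9123.xml', '42.xml', 'readme.txt') {
    open(my $fh, '>', "$dir/$name") or die "Cannot create $name: $!";
    close($fh);
}

my @files = sorted_xml_files($dir);
print "$_\n" for @files;
```

Call the sub in list context, as above; the result of return sort in scalar context is not something to rely on.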

Source: https://habr.com/ru/post/1200033/

