Sort strings with specific letter order with Perl

Question


I am trying to sort name lists with Perl in a specific letter order to perform some special functions.
Sorting should work the same as sort { $a cmp $b }, but with a different letter sequence.
For example, an arbitrary ordering of characters such as "abdrtwsuiopqe987654" ...

I tried to write sort { $a myFunction $b }, but I'm new to Perl and I don't see how to write myFunction properly to get what I want.

Is there a specific function (or package) that provides this functionality?
Do you have an example of a custom sort function for strings?
Do you know how (or in which source file) the cmp operator is implemented in Perl, so I can see how it works?


sorting perl cpan

Jeanjoux Apr 13 '15 at 18:00


2 answers

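 use Sort::Key qw(keysort);

 # keysort runs the block once per element, with $_ set to that element;
 # the block's return value is the sort key.
 my @sorted = keysort { tr/abdrtwsuiopqe987654/abcdefghijklmnopqrs/r } @data;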

Or, in older perls that do not support the r flag in tr/.../.../r (it was added in Perl 5.14):

 # Without /r: translate a copy of $_ and return it as the key.
 my @sorted = keysort {
     my $key = $_;
     $key =~ tr/abdrtwsuiopqe987654/abcdefghijklmnopqrs/;
     $key
 } @data;
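If Sort::Key isn't installed, a similar key-based sort can be sketched with core Perl alone using a Schwartzian transform. This is a different technique from Sort::Key, swapped in here only as a fallback, and the sample data is made up. As with keysort, each key is computed once per element:

```perl
use strict;
use warnings;

my @data = qw(4 e ba b a);    # made-up sample data

# Schwartzian transform: pair each element with its key, sort on the key,
# then strip the keys off again. tr///r needs Perl 5.14+.
my @sorted =
    map  { $_->[1] }
    sort { $a->[0] cmp $b->[0] }
    map  { [ tr/abdrtwsuiopqe987654/abcdefghijklmnopqrs/r, $_ ] }
    @data;

print "@sorted\n";    # a b ba e 4
```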

You can also create a specialized sort routine for this kind of data as follows:

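 # Generates a sub named my_special_sort that sorts by the key the
 # anonymous sub returns, compared as a string.
 use Sort::Key::Maker 'my_special_sort', sub { tr/abdrtwsuiopqe987654/abcdefghijklmnopqrs/r }, qw(string);

 my @sorted  = my_special_sort @data;
 my @sorted2 = my_special_sort @data2;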
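All of the tr-based keys assume every character belongs to the 19-character alphabet. A character outside it is left unchanged by tr and can then tie with a translated one: c stays c, while d is translated to c, so "c" and "d" would compare equal. A minimal check, assuming you would rather detect such input than sort it silently (the sample data is made up):

```perl
use strict;
use warnings;

my @data = qw(bad dab cab 94);    # made-up sample data

# tr/LIST//c counts the characters NOT in LIST; with an empty replacement
# list (and no /d flag) it only counts and leaves the string unchanged.
my @bad = grep { tr/abdrtwsuiopqe987654//c } @data;

print "Not in the sort alphabet: @bad\n" if @bad;    # Not in the sort alphabet: cab
```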


salva Apr 23 '15 at 8:55


ikegami · Accepted Answer · 2015-04-13T18:11:22+0000

Perhaps this is the fastest[1]:

 # The ($$) prototype makes sort pass the two elements in @_.
 sub my_compare($$) {
     $_[0] =~ tr{abdrtwsuiopqe987654}{abcdefghijklmnopqrs}r
         cmp
     $_[1] =~ tr{abdrtwsuiopqe987654}{abcdefghijklmnopqrs}r
 }

 my @sorted = sort my_compare @unsorted;
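Without the ($$) prototype, a named comparison sub reads the package variables $a and $b instead. This is also how the sort { $a myFunction $b } idea from the question is spelled in Perl: the sub's name goes right after sort, not between $a and $b. A minimal sketch; the name by_custom_order and the sample data are illustrative:

```perl
use strict;
use warnings;

# Named comparison sub without a prototype: sort sets the package
# variables $a and $b, and the sub returns -1, 0 or 1 like cmp.
sub by_custom_order {
    # tr///r (Perl 5.14+) returns a translated copy; $a and $b are untouched.
    my $ka = $a =~ tr/abdrtwsuiopqe987654/abcdefghijklmnopqrs/r;
    my $kb = $b =~ tr/abdrtwsuiopqe987654/abcdefghijklmnopqrs/r;
    return $ka cmp $kb;
}

my @names  = qw(4 e ba b a);
my @sorted = sort by_custom_order @names;
print "@sorted\n";    # a b ba e 4
```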

Or, if you want something more dynamic, the following may be the fastest[2]:

 # Rank of each symbol, indexed by its character code.
 my @syms = split //, 'abdrtwsuiopqe987654';
 my @map;
 $map[ord($syms[$_])] = $_ for 0..$#syms;

 # unpack 'C*' already yields character codes, so index @map with them directly.
 sub my_compare($$) {
     (pack 'C*', map $map[$_], unpack 'C*', $_[0])
         cmp
     (pack 'C*', map $map[$_], unpack 'C*', $_[1])
 }

 my @sorted = sort my_compare @unsorted;
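A variant that takes the order string at runtime and also rejects unknown characters can be sketched with a hash and a closure. It is a different technique from the pack/unpack code: keys are computed lazily and cached (sometimes called the Orcish maneuver). make_order_cmp is a hypothetical name, and //= needs Perl 5.10+:

```perl
use strict;
use warnings;

# Hypothetical helper: turns an order string into a comparison sub.
sub make_order_cmp {
    my ($order) = @_;

    # Rank of each symbol in the order string: a => 0, b => 1, d => 2, ...
    my %rank;
    @rank{ split //, $order } = 0 .. length($order) - 1;

    # Key: one character per symbol, rank shifted by 1 so it is never "\0".
    my $key_of = sub {
        my ($s) = @_;
        return join '', map {
            my $c = $_;
            die "Character '$c' is not in the sort order\n" unless exists $rank{$c};
            chr(1 + $rank{$c});
        } split //, $s;
    };

    # Cache keys so each distinct string is converted only once.
    my %cache;
    return sub {
        ($cache{$a} //= $key_of->($a)) cmp ($cache{$b} //= $key_of->($b));
    };
}

my $by_order = make_order_cmp('abdrtwsuiopqe987654');
my @sorted   = sort $by_order qw(e4 ba a 9 b);
print "@sorted\n";    # a b ba e4 9
```

sort accepts a scalar holding a code reference in place of a sub name, and since the closure is created in the same package that calls sort, it sees the same $a and $b.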

We could compare character by character, but it would be much slower.

 use List::Util qw( min );

 my @syms = split //, 'abdrtwsuiopqe987654';
 my @map;
 $map[ord($syms[$_])] = $_ for 0..$#syms;

 sub my_compare($$) {
     my $l0 = length($_[0]);
     my $l1 = length($_[1]);
     # Compare only the positions that exist in both strings.
     for (0 .. min($l0, $l1) - 1) {
         my $ch0 = $map[ord(substr($_[0], $_, 1))];
         my $ch1 = $map[ord(substr($_[1], $_, 1))];
         return -1 if $ch0 < $ch1;
         return +1 if $ch0 > $ch1;
     }
     # Equal up to the shorter length: the shorter string sorts first.
     return -1 if $l0 < $l1;
     return +1 if $l0 > $l1;
     return 0;
 }

 my @sorted = sort my_compare @unsorted;

Technically, this can be done faster with the GRT (Guttman-Rosler Transform):

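 # Build "key\0original", sort those strings with the default (block-less)
 # sort, then keep only what follows the first "\0".
 my @sorted = map /\0(.*)/s,
              sort
              map { tr{abdrtwsuiopqe987654}{abcdefghijklmnopqrs}r . "\0" . $_ } @unsorted;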
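The "\0" separator works because the translated keys only contain the letters a to s, and "\0" sorts before all of them, so a key that is a prefix of another (b versus ba) still sorts first. A small sketch with made-up data that checks the GRT result against an explicit comparison block:

```perl
use strict;
use warnings;

my @unsorted = qw(4 e ba b a bad dab);    # made-up sample data

# GRT: "key\0original" strings, default string sort, strip the key again.
my @grt = map /\0(.*)/s,
          sort
          map { tr/abdrtwsuiopqe987654/abcdefghijklmnopqrs/r . "\0" . $_ } @unsorted;

# Reference order from an explicit comparison block.
my @ref = sort {
    ($a =~ tr/abdrtwsuiopqe987654/abcdefghijklmnopqrs/r)
        cmp
    ($b =~ tr/abdrtwsuiopqe987654/abcdefghijklmnopqrs/r)
} @unsorted;

print "@grt\n";    # a b ba bad dab e 4
print "GRT and comparison block agree\n" if "@grt" eq "@ref";
```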

Likewise, the dynamic version can be done with the GRT:

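 # Reuses @map from the dynamic version above. Ranks are shifted by 1 so no
 # key byte is "\0"; otherwise the first symbol ('a', rank 0) would be
 # mistaken for the separator.
 my @sorted = map /\0(.*)/s,
              sort
              map { ( pack 'C*', map 1 + $map[$_], unpack 'C*', $_ ) . "\0" . $_ } @unsorted;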
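Whether the GRT actually wins depends on the data and the perl build, so it is worth measuring. A minimal sketch using the core Benchmark module; the random test data and the sub names by_tr_cmp and grt_sort are hypothetical, and only the two tr-based approaches are compared:

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical test data: 1000 random 8-character strings over the alphabet.
my @alphabet = split //, 'abdrtwsuiopqe987654';
srand(42);
my @unsorted = map {
    join '', map { $alphabet[ rand @alphabet ] } 1 .. 8
} 1 .. 1000;

# Translates both strings on every comparison.
sub by_tr_cmp {
    ($a =~ tr/abdrtwsuiopqe987654/abcdefghijklmnopqrs/r)
        cmp
    ($b =~ tr/abdrtwsuiopqe987654/abcdefghijklmnopqrs/r)
}

# Translates each string once, then uses the block-less sort.
sub grt_sort {
    return map /\0(.*)/s,
           sort
           map { tr/abdrtwsuiopqe987654/abcdefghijklmnopqrs/r . "\0" . $_ } @_;
}

# Timing only means something if both produce the same order.
my @by_cmp = sort by_tr_cmp @unsorted;
my @by_grt = grt_sort(@unsorted);
die "orders differ\n" unless "@by_cmp" eq "@by_grt";

# Run each for at least 1 CPU second and print a comparison table.
cmpthese(-1, {
    tr_cmp => sub { my @s = sort by_tr_cmp @unsorted },
    grt    => sub { my @s = grt_sort(@unsorted) },
});
```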

cmp is implemented by the scmp operator.

 $ perl -MO=Concise,-exec -e'$x cmp $y'
 1  <0> enter
 2  <;> nextstate(main 1 -e:1) v:{
 3  <#> gvsv[*x] s
 4  <#> gvsv[*y] s
 5  <2> scmp[t3] vK/2
 6  <@> leave[1 ref] vKP/REFC

The scmp operator is implemented by the pp_scmp function in pp.c, which, when use locale; is not in effect, is really just a wrapper around sv_cmp_flags in sv.c. sv_cmp_flags uses either the C library function memcmp or a UTF-8-aware version, depending on the type of scalar.
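The default ordering this produces can be seen by comparing a few pairs directly. A small sketch, assuming plain ASCII strings and no use locale:

```perl
use strict;
use warnings;

# Outside "use locale", cmp orders strings by character code, one character
# at a time; if one string is a prefix of the other, the shorter one is smaller.
my @pairs = (
    [ 'B',   'a'   ],    # 'B' is 66, 'a' is 97
    [ 'abc', 'abd' ],    # first difference: 'c' < 'd'
    [ 'ab',  'abc' ],    # a prefix sorts first
    [ 'b',   'B'   ],    # 98 > 66
    [ 'x',   'x'   ],    # equal
);

my @results = map { $_->[0] cmp $_->[1] } @pairs;
printf "%-3s cmp %-3s => %2d\n", @{ $pairs[$_] }, $results[$_] for 0 .. $#pairs;
```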
