How to optimize two-dimensional hash traversal in Perl?

I have a hash of hashes, %signal_db. A typical element is $signal_db{$cycle}{$key}. There are on the order of 10,000 cycles and 10,000 keys.

Is there any way to optimize (time-wise) this piece of code:

    foreach my $cycle (sort numerically keys %signal_db) {
        foreach my $key (sort keys %{$signal_db{$cycle}}) {
            print $signal_db{$cycle}{$key} . $key . "\n";
        }
    }

Elements should be printed in the same order as in my code.

+6
3 answers

Two micro-optimizations: fetch the inner hash once instead of dereferencing it repeatedly, and build up a buffer instead of printing constantly. You can get rid of the sorting altogether with an alternative storage format; I tested two variants. Results:

                    Rate original  try3 alternative alternative2
    original      46.1/s       --  -12%        -21%         -32%
    try3          52.6/s      14%    --        -10%         -22%
    alternative   58.6/s      27%   11%          --         -13%
    alternative2  67.5/s      46%   28%         15%           --

Conclusion:

Using a pre-sorted storage format is better, but short of rewriting it in C the gain will probably stay within 100% (on my test dataset). The information you gave suggests that the keys of the outer hash are near-consecutive numbers, which screams for an array.
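As an illustration of that conclusion, here is a hedged sketch (sample data and variable names are assumed, not taken from the benchmark below) of converting the hash of hashes into an array indexed by cycle number, so the outer sort disappears entirely:

```perl
use strict;
use warnings;

# Hypothetical sample data: cycles are near-consecutive integers.
my %signal_db = map { $_ => { a => 1, b => 2 } } 1 .. 5;

# One-time conversion: index directly by cycle number.
my @by_cycle;
$by_cycle[$_] = $signal_db{$_} for keys %signal_db;

# Traversal is now a plain array walk; only the inner sort remains.
foreach my $cycle (0 .. $#by_cycle) {
    my $samples = $by_cycle[$cycle] or next;   # skip holes in the numbering
    foreach my $key (sort keys %$samples) {
        print $samples->{$key} . $key . "\n";
    }
}
```

The conversion cost is paid once; every later traversal skips the O(n log n) outer sort.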

Script:

    #!/usr/bin/env perl
    use strict;
    use warnings;
    use Benchmark qw/timethese cmpthese/;

    my %signal_db = map { $_ => {} } 1..1000;
    %$_ = map { $_ => $_ } 'a'..'z' foreach values %signal_db;

    my @signal_db = map { { cycle => $_ } } 1..1000;
    $_->{'samples'} = { map { $_ => $_ } 'a'..'z' } foreach @signal_db;

    my @signal_db1 = map { $_ => [] } 1..1000;
    @$_ = map { $_ => $_ } 'a'..'z' foreach grep ref $_, @signal_db1;

    use Sort::Key qw(nsort);
    sub numerically { $a <=> $b }

    my $result = cmpthese( -2, {
        'original' => sub {
            open my $out, '>', 'tmp.out';
            foreach my $cycle (sort numerically keys %signal_db) {
                foreach my $key (sort keys %{$signal_db{$cycle}}) {
                    print $out $signal_db{$cycle}{$key}.$key."\n";
                }
            }
        },
        'try3' => sub {
            open my $out, '>', 'tmp.out';
            foreach my $cycle (map $signal_db{$_}, sort numerically keys %signal_db) {
                my $tmp = '';
                foreach my $key (sort keys %$cycle) {
                    $tmp .= $cycle->{$key}.$key."\n";
                }
                print $out $tmp;
            }
        },
        'alternative' => sub {
            open my $out, '>', 'tmp.out';
            foreach my $cycle (map $_->{'samples'}, @signal_db) {
                my $tmp = '';
                foreach my $key (sort keys %$cycle) {
                    $tmp .= $cycle->{$key}.$key."\n";
                }
                print $out $tmp;
            }
        },
        'alternative2' => sub {
            open my $out, '>', 'tmp.out';
            foreach my $cycle (grep ref $_, @signal_db1) {
                my $tmp = '';
                for (my $i = 0; $i < @$cycle; $i += 2) {
                    $tmp .= $cycle->[$i+1].$cycle->[$i]."\n";
                }
                print $out $tmp;
            }
        },
    } );
+7
    my %signal_db = map {$_ => {1 .. 1000}} 1 .. 1000;

    sub numerically {$a <=> $b}

    sub orig {
        my $x;
        foreach my $cycle (sort numerically keys %signal_db) {
            foreach my $key (sort keys %{$signal_db{$cycle}}) {
                $x += length $signal_db{$cycle}{$key}.$key."\n";
            }
        }
    }

    sub faster {
        my $x;
        our ($cycle, $key, %hash);  # move allocation out of the loop
        local *hash;                # and use package variables which are faster to alias into

        foreach $cycle (sort {$a <=> $b}   # the {$a <=> $b} literal is optimized
                        keys %signal_db) {
            *hash = $signal_db{$cycle};    # alias into %hash
            foreach $key (sort keys %hash) {
                $x += length $hash{$key}.$key."\n";  # simplify the lookup
            }
        }
    }

    use Benchmark 'cmpthese';

    cmpthese -5 => {
        orig   => \&orig,
        faster => \&faster,
    };

which gets:

             Rate   orig faster
    orig   2.56/s     --   -15%
    faster 3.03/s    18%     --

Not a huge win, but it is something. There is not much more you can optimize without changing the data structure to use pre-sorted arrays (or writing the whole thing in XS).

Switching the foreach loops to external package variables saves a little time, since perl does not have to create a fresh lexical on every pass through the loop. Package variables also seem slightly faster to alias into. Flattening the inner lookup by one level helps as well.
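The glob-aliasing trick in isolation, for readers unfamiliar with it (a minimal sketch with made-up data):

```perl
use strict;
use warnings;

our %hash;     # a package variable, so its glob can be aliased
local *hash;   # restore the original binding when the scope ends

my $ref = { a => 1, b => 2 };
*hash = $ref;              # %hash is now an alias for %$ref; nothing is copied

print $hash{a}, "\n";      # reads go through the alias
$hash{c} = 3;              # writes go through to the underlying hash
print $ref->{c}, "\n";
```

This is why the benchmark above can write `$hash{$key}` instead of `$signal_db{$cycle}{$key}` without paying for a copy of the inner hash.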

I am guessing you print to STDOUT and then redirect the output to a file? If so, having Perl open the output file directly and printing to that handle may improve the file I/O performance. Another micro-optimization would be to experiment with different write sizes: for example, does it save time to build up an array in the inner loop and then join/print it at the bottom of the outer loop? That is fairly system dependent (and possibly moot because of the other layers of I/O caching), so I will leave that test to you.

+4

First I would experiment with the Sort::Key module, because the sorting takes longer than the mere looping and printing. Also, if the inner hash keys are (mostly) identical across cycles, you could sort them just once up front; but I assume they are not, or you would already be doing that.
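Under that assumption (every cycle carrying essentially the same key set, which is not confirmed by the question), hoisting the sort out of the loop would look roughly like this:

```perl
use strict;
use warnings;

# Assumed for illustration: every cycle carries the same key set.
my %signal_db = map { $_ => { c => 3, a => 1, b => 2 } } 1 .. 3;

# Sort the shared key set once, outside the cycle loop.
my ($first) = values %signal_db;
my @sorted_keys = sort keys %$first;

foreach my $cycle (sort { $a <=> $b } keys %signal_db) {
    foreach my $key (@sorted_keys) {
        next unless exists $signal_db{$cycle}{$key};   # tolerate the odd missing key
        print $signal_db{$cycle}{$key} . $key . "\n";
    }
}
```

This trades one sort per cycle for a single sort plus a cheap exists check per element.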

An obvious thing to try is assigning the $signal_db{$cycle} reference to a variable. You may find that each is faster than keys plus a lookup, especially when used together with Sort::Key. I would also check whether map runs faster than foreach; it may come out the same, but who knows. You might find that print is faster if you pass it a list, or if you call it several times.
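For reference, the each variant mentioned above would look roughly like this (toy data assumed). Note that it only applies when the inner ordering does not matter, since each walks the hash in its internal order:

```perl
use strict;
use warnings;

my %signal_db = map { $_ => { a => 1, b => 2 } } 1 .. 3;

my $output = '';
foreach my $cycle (sort { $a <=> $b } keys %signal_db) {
    my $r = $signal_db{$cycle};
    while (my ($key, $value) = each %$r) {   # one pass, no keys plus lookup
        $output .= $value . $key . "\n";
    }
}
print $output;
```

Since the question requires sorted output, each could at best replace the inner iteration in combination with one of the pre-sorted storage formats.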

I have not tried this code, but throwing in all of those ideas except each gives:

    foreach my $cycle (nsort keys %signal_db) {
        my $r = $signal_db{$cycle};
        map { print($r->{$_}, $_, "\n"); } (nsort keys %$r);
    }

There is an article on Perl sorting here; look up the Schwartzian transform if you want to see how each could be used.
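For completeness, a generic sketch of the Schwartzian transform (sample data assumed); it pays off when the sort key is expensive to compute, which plain hash keys are not:

```perl
use strict;
use warnings;

my @words = ('bb', 'a', 'ccc');

my @by_length =
    map  { $_->[1] }                 # 3. unwrap the original value
    sort { $a->[0] <=> $b->[0] }     # 2. compare the precomputed keys
    map  { [ length $_, $_ ] }       # 1. pair each value with its sort key, once
    @words;

print "@by_length\n";                # a bb ccc
```

The middle sort never recomputes length; each key is evaluated exactly once in the first map.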

If your code does not have to be secure, you could disable Perl's protection against algorithmic complexity attacks by setting PERL_HASH_SEED or the related variables, and/or by recompiling Perl with changed settings, so that keys and values would hand back the keys or values in sorted order already, saving you the considerable cost of sorting them. But before doing that, watch this 28C3 talk. I do not even know whether this works; you would have to read that part of the Perl source, or perhaps just implement your loop in C.

+3

Source: https://habr.com/ru/post/906036/

