Cached Schwartzian Transform

I am reading "Intermediate Perl", and it's pretty cool. I just finished the section on the Schwartzian transform, and after it sank in, I began to wonder why the transform doesn't use a cache. In lists with many duplicate values, the transform recomputes the value for each of them, so I thought: why not use a hash to cache the results? Here is the code:

# a place to keep our results
my %cache;

# the transformation we are interested in
sub foo {
    # expensive operations
}

# some data
my @unsorted_list = ....;

# sorting with the help of the cache
my @sorted_list = sort {
    ($cache{$a} //= &foo($a)) <=> ($cache{$b} //= &foo($b))
} @unsorted_list;

Am I missing something? Why isn't the cached version of the Schwartzian transform the one shown in the books, and isn't it simply better? At first glance, the cached version seems like it should be more efficient.

Edit: daxim pointed out in the comments that this is called the Orcish Maneuver, so at least I have not gone crazy, although I do not quite understand the name.
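For reference, a minimal sketch of what I understand the classic form of the maneuver to look like: it uses ||= where my version uses //=, so a cached result that happens to be false (0 or the empty string) would be recomputed.

# Classic Orcish Maneuver (sketch), using the same foo() and @unsorted_list as above
my %om_cache;
my @om_sorted = sort {
    ($om_cache{$a} ||= foo($a)) <=> ($om_cache{$b} ||= foo($b))
} @unsorted_list;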

+4
2 answers

(updated after many comments)

To the extent that array lookups are more efficient than hash lookups (that is, $a->[1] is faster than $cache{$a}), the canonical form can be more efficient than your code, even with lots of duplicates.
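To see that difference in isolation, here is a minimal sketch (separate from the benchmark below; the variable names are only illustrative) that times the two kinds of lookup directly:

use Benchmark qw(cmpthese);

my @pair  = (42, 3.14);      # stands in for a [$value, foo($value)] tuple
my %cache = (42 => 3.14);    # stands in for the comparator's cache

cmpthese(-1, {
    'array lookup' => sub { my $v; $v = $pair[1]   for 1 .. 1000 },
    'hash lookup'  => sub { my $v; $v = $cache{42} for 1 .. 1000 },
});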


Test results:

Here is my benchmark code:

# when does an additional layer of caching improve the performance of
# the Schwartzian transform?
# methods:
#   1. canonical Schwartzian transform
#   2. cached transform
#   3. canonical with memoized function
# inputs:
#   1. few duplicates (rand)
#   2. many duplicates (int(rand))
# functions:
#   1. fast
#   2. slow

use Benchmark;
use Math::BigInt;
use strict qw(vars subs);
use warnings;
no warnings 'uninitialized';

# fast_foo: a cheap operation, slow_foo: an expensive operation
sub fast_foo { my $x = shift; exp($x) }
sub slow_foo { my $x = shift; my $y = Math::BigInt->new(int(exp($x))); $y->bfac() }

# XXX_memo_foo: put the caching optimization inside the call to 'foo'
my %fast_memo = ();
sub fast_memo_foo {
    my $x = shift;
    if (exists($fast_memo{$x})) {
        return $fast_memo{$x};
    } else {
        return $fast_memo{$x} = fast_foo($x);
    }
}

my %slow_memo = ();
sub slow_memo_foo {
    my $x = shift;
    if (exists($slow_memo{$x})) {
        return $slow_memo{$x};
    } else {
        return $slow_memo{$x} = slow_foo($x);
    }
}

my @functions = qw(fast_foo slow_foo fast_memo_foo slow_memo_foo);
my @input1 = map { 5 * rand } 1 .. 1000;   # 1000 random floats with few duplicates
my @input2 = map { int } @input1;          # 1000 random ints with many duplicates

sub canonical_ST {
    my $func = shift @_;
    my @sorted = map  { $_->[0] }
                 sort { $a->[1] <=> $b->[1] }
                 map  { [$_, $func->($_)] } @_;
    return;
}

sub cached_ST {
    my $func = shift @_;
    my %cache = ();
    my @sorted = sort {
        ($cache{$a} //= $func->($a)) <=> ($cache{$b} //= $func->($b))
    } @_;
    return;
}

foreach my $input ('few duplicates', 'many duplicates') {
    my @input = $input eq 'few duplicates' ? @input1 : @input2;
    foreach my $func (@functions) {
        print "\nInput: $input\nFunction: $func\n-----------------\n";
        Benchmark::cmpthese($func =~ /slow/ ? 30 : 1000, {
            'Canonical' => sub { canonical_ST($func, @input) },
            'Cached'    => sub { cached_ST($func, @input) },
        });
    }
}

and the results (Strawberry Perl 5.12):

 Input: few duplicates
 Function: fast_foo
 -----------------
             Rate Canonical Cached
 Canonical 160/s         --   -18%
 Cached    196/s        22%     --

 Input: few duplicates
 Function: slow_foo
 -----------------
              Rate Canonical Cached
 Canonical 7.41/s        --    -0%
 Cached    7.41/s        0%     --

 Input: few duplicates
 Function: fast_memo_foo
 -----------------
             Rate Canonical Cached
 Canonical 153/s         --   -25%
 Cached    204/s        33%     --

 Input: few duplicates
 Function: slow_memo_foo
 -----------------
              Rate Cached Canonical
 Cached    20.2/s     --       -7%
 Canonical 21.8/s     8%        --

 Input: many duplicates
 Function: fast_foo
 -----------------
             Rate Canonical Cached
 Canonical 179/s         --   -50%
 Cached    359/s       101%     --

 Input: many duplicates
 Function: slow_foo
 -----------------
              Rate Canonical Cached
 Canonical 11.8/s        --   -62%
 Cached    31.0/s      161%     --

 Input: many duplicates
 Function: fast_memo_foo
 -----------------
             Rate Canonical Cached
 Canonical 179/s         --   -50%
 Cached    360/s       101%     --

 Input: many duplicates
 Function: slow_memo_foo
 -----------------
              Rate Canonical Cached
 Canonical 28.2/s        --    -9%
 Cached    31.0/s       10%     --

I am a little surprised by these results: the canonical Schwartzian transform has only a slight advantage under the most favorable conditions (an expensive function call, few duplicates, or no memoization), and in the other cases it is at a rather significant disadvantage. The OP's caching scheme inside the sort function even outperforms memoization outside the sort. I did not expect this when I set up the benchmark, but I think the OP is onto something.

+5

Caching in the Schwartzian transform would be useful if you call foo() in several transforms:

@sorted1 = map  { $_->[0] }
           sort { $a->[1] cmp $b->[1] }
           map  { [$_, foo($_)] } @unsorted1;

@sorted2 = map  { $_->[0] }
           sort { $a->[1] cmp $b->[1] }
           map  { [$_, foo($_)] } @unsorted2;

If @unsorted1 and @unsorted2 contain mostly the same values, you will call foo() twice for the same values. If that function is computationally expensive, you probably want to cache the results.

The easiest way to do this is to use the Memoize module:

use Memoize;
memoize('foo');

If you add these two lines, you do not need to worry about setting up the cache for foo() yourself; Memoize handles it for you.
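A minimal sketch of what that buys you (the counter and the stand-in foo() below are only for illustration):

use Memoize;

my $calls = 0;
sub foo { $calls++; my $x = shift; return lc $x }   # stand-in for the real, expensive foo()
memoize('foo');

foo('ABC') for 1 .. 3;                    # only the first call runs the body; the rest hit the cache
print "foo() ran $calls time(s)\n";       # prints: foo() ran 1 time(s)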

Edit: I just noticed that your sort does not actually perform a Schwartzian transform. The whole point of the ST is that you call your expensive function only once for each member of the list, which is why you build the whole map-sort-map construct. Although you could probably do some hand-rolled caching like this, it would be non-standard Perl (in the sense that someone expecting to see an ST would have to sit there and work out what your code does) and could quickly become a maintenance nightmare.
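For comparison, a sketch of what your sort would look like as a standard Schwartzian transform, using the foo() and @unsorted_list from your question:

my @sorted_list = map  { $_->[0] }
                  sort { $a->[1] <=> $b->[1] }
                  map  { [$_, foo($_)] } @unsorted_list;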

But yes, if your list has duplicate values, using a cache (hand-rolled or via Memoize) can lead to a faster Schwartzian transform. I say "can" because there may be cases where the hash lookup is actually more expensive than calling foo() (the Memoize docs use sub foo { my $x = shift; return $x * $x } as an example of one such case).

+2
