Strange work observed with memoized function

Question

Strange work observed with memoized function

I was going with something that uses the Euclidean algorithm to calculate the GCD of two numbers. I, as usual, implemented a standard single-line liner, and it worked perfectly. It is used in an algorithm that calculates a series and calls gcd() several times per element, as n gets larger. I decided to see if I could do it better while retaining my memoirs, so here is what I tried:

 size_t const gcd(size_t const a, size_t const b) { return b == 0 ? a : gcd(b, a % b); } struct memoized_gcd : private std::unordered_map<unsigned long long, size_t> { size_t const operator()(size_t const a, size_t const b) { unsigned long long const key = (static_cast<unsigned long long>(a) << 32) | b; if (find(key) == end()) (*this)[key] = b == 0 ? a : (*this)(b, a % b); return (*this)[key]; } }; //std::function<size_t (size_t, size_t)> gcd_impl = gcd<size_t,size_t>; std::function<size_t (size_t, size_t)> gcd_impl = memoized_gcd();

I call the selected function through an instance of std::function later. Interestingly, when, for example, n = 10000, the calculation is performed for 8 seconds on this computer, and with the version stored in memory it is close to the minute, all other things being equal.

Am I missing something obvious? I use key as appropriate, so I don't need to specialize std::hash for the hash map. The only thing I can think of is perhaps that the memoized version does not receive TCO and gcd() , or that the call via std::function slow for the functor (although I use it for both), or it is possible that I am set aside. Guru, show me the way.

Notes

I tried this on win32 and win64 with g ++ 4.7.0 and linux x86 with g ++ 4.6.1 and 4.7.1.

I also tried the version with std::map<std::pair<size_t, size_t>, size_t> , which had comparable performance with the unmemoized version.

+6

c ++ performance optimization c ++ 11

Keith layne Jul 13 '12 at 1:15

source share

3 answers

No wonder. On modern processors, memory access is very slow, especially if it is not in the cache. It is often faster to recalculate a value than to store it in memory.

0

Antimony Jul 13 '12 at 1:22

source share

Frequent heap allocation (when creating new entries). Also std :: unordered_map overhead (while it may be constant time, it is of course slower than shifting a simple array). The cache also skips (access pattern function and size).

If you want to make a “clean” comparison, you can try converting it to using a static, stacked simple array; it may be a rare lookup table that uses more memory, but it will be more specific to memoization iff , you can put the entire memoized array in your processor cache.

0

Preet kukreti Jul 13 '12 at 2:07

source share

Igor ostrovsky · Accepted Answer · 2012-07-13T05:23:27+0000

The main problem with your version of GCD is that it can use a huge amount of memory depending on the usage pattern.

For example, if you calculate GCD (a, b) for all pairs 0 <= a <10,000, 0 <= b <10,000, the reminder table will contain 100,000,000 entries. Since on x86 each record is 12 bytes, the hash table occupies at least 1.2 GB of memory. Work with this amount of memory will be slow.

And of course, if you evaluate a GCD with values> = 10,000, you can make the table arbitrarily large ... at least until you finish the address space or commit limit.

Summary In general, remembering GCD is a bad idea because it leads to unlimited memory usage.

There are a few subtle points that could be discussed:

Since the table exceeds various sizes, it will be stored in slow and slow memory: first L1 cache, then L2 cache, L3 cache (if any), physical memory, disk. Obviously, the cost of memoization rises sharply as the table grows.
If you know that all inputs are in a small range (e.g. 0 <= x <100), using memoization or a pre-computed table can still be an optimization. It's hard to be sure - you will have to measure in your specific scenario.
There are potentially other ways to optimize GCD. For example, I'm not sure if g ++ will automatically recognize tail recursion in this example. If not, you can improve performance by rewriting recursion in a loop.

But, as I said, it is not at all surprising that the algorithm you published does not work well.

Strange work observed with memoized function

More articles: