Improving treap implementation

Here is my implementation of the kind of treap (with implicit keys and some additional information stored in nodes): http://hpaste.org/42839/treap_with_implicit_keys

According to profiling data, GC takes 80% of the time for this program. As far as I understand, this is due to the fact that every time a node is "changed", each node on the path to the root is recreated.

Is there something I can do here to improve performance, or do I need to go down to the realm of the ST monad?

+17
performance optimization garbage-collection data-structures haskell
Jan 07 '11 at 15:07
source share
1 answer

Using GHC 7.0.3, I can reproduce your heavy GC behavior:

$ time ./A +RTS -s %GC time 92.9% (92.9% elapsed) ./A +RTS -s 7.24s user 0.04s system 99% cpu 7.301 total 

I spent 10 minutes going through the program. Here is what I did, in order:

  • Set the flag GHC -H, increasing the limits in GC
  • Check unpacking
  • Investment improvement
  • Adjust the selection area of ​​the first generation

The result is 10x acceleration and GC about 45% of the time.




In order to use the magic -H GHC magic flag, we can slightly reduce the execution time:

  $ time ./A +RTS -s -H %GC time 74.3% (75.3% elapsed) ./A +RTS -s -H 2.34s user 0.04s system 99% cpu 2.392 total 

Not bad!

UNPACK pragmas on Tree nodes will do nothing, so delete them.

Attaching update reduces execution time:

  ./A +RTS -s -H 1.84s user 0.04s system 99% cpu 1.883 total 

like insert height

  ./A +RTS -s -H 1.74s user 0.03s system 99% cpu 1.777 total 

So, although it's fast, the GC is still dominant - since we are testing the distribution, after all. One thing we can do is increase the size of the first generator:

  $ time ./A +RTS -s -A200M %GC time 45.1% (40.5% elapsed) ./A +RTS -s -A200M 0.71s user 0.16s system 99% cpu 0.872 total 

And increasing the rollout threshold, as JohnL suggested, helps a bit,

  ./A +RTS -s -A100M 0.74s user 0.09s system 99% cpu 0.826 total 

which is 10 times faster than we started? Not bad.




Using ghc-gc-tune , you can see the runtime as a function of -A and -H ,

Time and gc

Interestingly, the best working times use very large -A values, for example.

 $ time ./A +RTS -A500M ./A +RTS -A500M 0.49s user 0.28s system 99% cpu 0.776s 
+26
Apr 17 2018-11-11T00:
source share



All Articles