Haskell: Not a memory leak from ST / GC?

I have a computation inside ST that allocates memory through Data.Vector.Unboxed.Mutable. The vector is never read or written, and no reference to it is retained outside of runST (as far as I know). The problem is that when I run my ST computation multiple times, the memory for the vector sometimes seems to be kept around.

Allocation statistics:

  5,435,386,768 bytes allocated in the heap
      5,313,968 bytes copied during GC
    134,364,780 bytes maximum residency (14 sample(s))
      3,160,340 bytes maximum slop
            518 MB total memory in use (0 MB lost due to fragmentation)

Here I call runST 20 times with different values for my computation and a 128 MB vector (again: unused, not returned, and not referenced outside of ST). The maximum residency looks good, basically just my vector plus a few MB of other stuff. But the total memory in use indicates that I have four copies of the vector alive at the same time. This scales perfectly with the size of the vector: for 256 MB we get 1030 MB, as expected.
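For illustration, the setup described above can be sketched roughly as follows. This is a minimal sketch, not my real program: `compute` and its arithmetic are hypothetical stand-ins, and the element type Word8 is an assumption chosen so that the element count equals the size in bytes. A program this small is not expected to reproduce the problem; it only shows the shape of the code.

```haskell
import Control.Monad (forM_)
import Control.Monad.ST (ST, runST)
import qualified Data.Vector.Unboxed.Mutable as MV
import Data.Word (Word8)

-- Hypothetical stand-in for the real computation. It allocates a buffer of
-- `size` bytes that is never read, never written, and never returned.
compute :: Int -> Int -> Int
compute size x = runST $ do
  _buf <- MV.new size :: ST s (MV.MVector s Word8)
  return (x * 2)

main :: IO ()
main =
  -- 20 runs, each with its own 128 MB buffer and a different input value.
  forM_ [1 .. 20] $ \i ->
    print (compute (128 * 1024 * 1024) i)
```

Compiling with -rtsopts and running with +RTS -s prints the kind of summary shown above.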

Using a 1 GB vector runs out of memory (4 x 1 GB + overhead > 32-bit address space). I don't understand why the RTS holds on to apparently unused, unreferenced memory instead of just GC'ing it, at least at the point where an allocation would otherwise fail.

Running with +RTS -S shows the following:

      Alloc  Copied       Live    GC    GC   TOT   TOT  Page Flts
      bytes   bytes      bytes  user  elap  user  elap
  134940616   13056  134353540  0.00  0.00  0.09  0.19    0    0  (Gen: 1)
     583416    6756  134347504  0.00  0.00  0.09  0.19    0    0  (Gen: 0)
     518020   17396  134349640  0.00  0.00  0.09  0.19    0    0  (Gen: 1)
     521104   13032  134359988  0.00  0.00  0.09  0.19    0    0  (Gen: 0)
     520972    1344  134360752  0.00  0.00  0.09  0.19    0    0  (Gen: 0)
     521100     828  134360684  0.00  0.00  0.10  0.19    0    0  (Gen: 0)
     520812     592  134360528  0.00  0.00  0.10  0.19    0    0  (Gen: 0)
     520936    1344  134361324  0.00  0.00  0.10  0.19    0    0  (Gen: 0)
     520788    1480  134361476  0.00  0.00  0.10  0.20    0    0  (Gen: 0)
  134438548    5964  268673908  0.00  0.00  0.19  0.38    0    0  (Gen: 0)
     586300    3084  268667168  0.00  0.00  0.19  0.38    0    0  (Gen: 0)
     517840     952  268666340  0.00  0.00  0.19  0.38    0    0  (Gen: 0)
     520920     544  268666164  0.00  0.00  0.19  0.38    0    0  (Gen: 0)
     520780     428  268666048  0.00  0.00  0.19  0.38    0    0  (Gen: 0)
     520820    2908  268668524  0.00  0.00  0.19  0.38    0    0  (Gen: 0)
     520732    1788  268668636  0.00  0.00  0.19  0.39    0    0  (Gen: 0)
     521076     564  268668492  0.00  0.00  0.19  0.39    0    0  (Gen: 0)
     520532     712  268668640  0.00  0.00  0.19  0.39    0    0  (Gen: 0)
     520764     956  268668884  0.00  0.00  0.19  0.39    0    0  (Gen: 0)
     520816     420  268668348  0.00  0.00  0.20  0.39    0    0  (Gen: 0)
     520948    1332  268669260  0.00  0.00  0.20  0.39    0    0  (Gen: 0)
     520784     616  268668544  0.00  0.00  0.20  0.39    0    0  (Gen: 0)
     521416     836  268668764  0.00  0.00  0.20  0.39    0    0  (Gen: 0)
     520488    1240  268669168  0.00  0.00  0.20  0.40    0    0  (Gen: 0)
     520824    1608  268669536  0.00  0.00  0.20  0.40    0    0  (Gen: 0)
     520688    1276  268669204  0.00  0.00  0.20  0.40    0    0  (Gen: 0)
     520252    1332  268669260  0.00  0.00  0.20  0.40    0    0  (Gen: 0)
     520672    1000  268668928  0.00  0.00  0.20  0.40    0    0  (Gen: 0)
  134553500    5640  402973292  0.00  0.00  0.29  0.58    0    0  (Gen: 0)
     586776    2644  402966160  0.00  0.00  0.29  0.58    0    0  (Gen: 0)
     518064   26784  134342772  0.00  0.00  0.29  0.58    0    0  (Gen: 1)
     520828    3120  134343528  0.00  0.00  0.29  0.59    0    0  (Gen: 0)
     521108     756  134342668  0.00  0.00  0.30  0.59    0    0  (Gen: 0)

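Here it looks like the "live bytes" do exceed ~128 MB at times: they climb to roughly 268 MB and then 402 MB, and only drop back to ~134 MB after a Gen 1 collection.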

The +RTS -hy profile basically just says that we allocate 128 MB:

http://imageshack.us/a/img69/7765/45q8.png

I tried to reproduce this behavior in a simpler program, but even when replicating the exact setup (ST, a Reader containing the vector, the same monad / program structure, etc.), the simple test program does not show it. When simplifying my large program, the behavior also eventually disappears once I remove apparently completely unrelated code.

Qs:

  • Am I really keeping this vector around 4 times out of 20?
  • If so, how can I tell, given that +RTS -hy and the maximum residency claim I am not, and what can I do to stop this behavior?
  • If not, why does Haskell not GC the memory and instead run out of address space / memory, and what can I do to stop this behavior?

Thanks!

1 answer

I suspect this is a bug in GHC and/or the RTS.

First of all, I'm sure there is no actual space leak or anything like that.

Reasons:

  • The vector is never used anywhere. It is not read, not written, not referenced. It should be collected once runST is done. Even when the ST computation returns a single Int that is immediately printed to force its evaluation, the memory problem persists. There is no reference to that data.
  • Every profiling mode the RTS offers is in violent agreement that I never have more than a single vector's worth of memory allocated / referenced. Every statistic and every pretty chart says so.

Now here is the interesting bit. If I manually force a GC by calling System.Mem.performGC after every run of my function, the problem goes away completely.
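The workaround can be sketched like this. It is only a minimal sketch: `compute` is a hypothetical placeholder for the real ST computation and `runThenCollect` is a name invented for illustration. The only library calls involved are `evaluate` from Control.Exception and `performGC` from System.Mem.

```haskell
import Control.Exception (evaluate)
import Control.Monad (forM_)
import System.Mem (performGC)

-- Hypothetical placeholder for the real ST computation.
compute :: Int -> Int
compute x = x * 2

-- Force the result to weak head normal form first, so that the work (and its
-- allocation) has actually happened, then request a major collection.
runThenCollect :: Int -> IO Int
runThenCollect x = do
  r <- evaluate (compute x)
  performGC
  return r

main :: IO ()
main =
  forM_ [1 .. 20] $ \i ->
    runThenCollect i >>= print
```

Forcing the result before calling performGC matters: otherwise the collection could run while the computation is still an unevaluated thunk, before the buffer has even been allocated.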

So we have a case where the runtime holds on to GBs of memory that (demonstrably!) can be reclaimed by the GC and that, even according to its own statistics, nobody references anymore. When it runs out of its memory pool, the runtime does not collect, but instead asks the OS for more memory. And even when that finally fails, the runtime still does not collect (which would reclaim GBs of memory, demonstrably), but instead chooses to terminate the program with an out-of-memory error.
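The gap between what the RTS considers live and what it has taken from the OS can also be watched from inside the program. The following is a minimal sketch, assuming a GHC recent enough to provide getRTSStats in GHC.Stats (base 4.10 or later) and a program started with +RTS -T; `reportMemory` is a hypothetical helper, not something from my original code.

```haskell
import GHC.Stats
  ( GCDetails (..)
  , RTSStats (..)
  , getRTSStats
  , getRTSStatsEnabled
  )

-- Print the live bytes and memory in use as of the most recent GC, plus the
-- maxima seen so far. Without +RTS -T the RTS collects no statistics, so
-- only a hint is printed in that case.
reportMemory :: String -> IO ()
reportMemory label = do
  enabled <- getRTSStatsEnabled
  if not enabled
    then putStrLn (label ++ ": RTS stats disabled, run with +RTS -T")
    else do
      stats <- getRTSStats
      let lastGC = gc stats
      putStrLn $
        label
          ++ ": live=" ++ show (gcdetails_live_bytes lastGC)
          ++ " in_use=" ++ show (gcdetails_mem_in_use_bytes lastGC)
          ++ " max_live=" ++ show (max_live_bytes stats)
          ++ " max_in_use=" ++ show (max_mem_in_use_bytes stats)

main :: IO ()
main = reportMemory "after startup"
```

Calling such a helper after each run would show whether memory in use keeps growing while the live bytes stay near one vector's worth.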

I am no expert on Haskell, GHC or GC. But this looks awfully broken to me. I will report it as a bug.


Source: https://habr.com/ru/post/951908/

