Making sense of the GHC profiler

I am trying to understand the output of the GHC profiler. I have a fairly simple application that uses wreq and lens-aeson, and while studying GHC profiling, I decided to play with it a little.

Using various tools (the time utility, +RTS -p -RTS and +RTS -p -h), I got completely different numbers for memory usage. With all these numbers, I am now completely lost trying to understand what is going on and how much memory the application actually uses.

This situation reminds me of Arthur Bloch's phrase: "A man with a watch knows what time it is. A man with two watches is never sure."

Can you please suggest how I should read all these numbers, and what each of them means?

Here are the numbers:

time -l reports 19M

 #/usr/bin/time -l ./simple-wreq
 ...
        3.02 real         0.39 user         0.17 sys
  19070976  maximum resident set size
         0  average shared memory size
         0  average unshared data size
         0  average unshared stack size
     21040  page reclaims
         0  page faults
         0  swaps
         0  block input operations
         0  block output operations
        71  messages sent
        71  messages received
      2991  signals received
        43  voluntary context switches
      6490  involuntary context switches

The +RTS -p -RTS flag reports 92M. Although it says "total alloc", it seems strange to me that a simple application like this could allocate and release 91M.

 # ./simple-wreq +RTS -p -RTS
 # cat simple-wreq.prof
         Fri Oct 14 15:08 2016 Time and Allocation Profiling Report  (Final)

            simple-wreq +RTS -N -p -RTS

         total time  =        0.07 secs   (69 ticks @ 1000 us, 1 processor)
         total alloc =  91,905,888 bytes  (excludes profiling overheads)

 COST CENTRE                             MODULE                           %time %alloc

 main.g                                  Main                              60.9   88.8
 MAIN                                    MAIN                              24.6    2.5
 decodeLenient/look                      Data.ByteString.Base64.Internal    5.8    2.6
 decodeLenientWithTable/fill             Data.ByteString.Base64.Internal    2.9    0.1
 decodeLenientWithTable.\.\.fill         Data.ByteString.Base64.Internal    1.4    0.0
 decodeLenientWithTable.\.\.fill.\       Data.ByteString.Base64.Internal    1.4    0.1
 decodeLenientWithTable.\.\.fill.\.\.\.\ Data.ByteString.Base64.Internal    1.4    3.3
 decodeLenient                           Data.ByteString.Base64.Lazy        1.4    1.4

                                                                          individual     inherited
 COST CENTRE            MODULE                            no.   entries  %time %alloc   %time %alloc

 MAIN                   MAIN                              443        0   24.6    2.5   100.0  100.0
  main                  Main                              887        0    0.0    0.0    75.4   97.4
   main.g               Main                              889        0   60.9   88.8    75.4   97.4
    object_             Data.Aeson.Parser.Internal        925        0    0.0    0.0     0.0    0.2
     jstring_           Data.Aeson.Parser.Internal        927       50    0.0    0.2     0.0    0.2
    unstream/resize     Data.Text.Internal.Fusion         923      600    0.0    0.3     0.0    0.3
    decodeLenient       Data.ByteString.Base64.Lazy       891        0    1.4    1.4    14.5    8.1
     decodeLenient      Data.ByteString.Base64            897      500    0.0    0.0    13.0    6.7
 ....

+RTS -p -h together with hp2ps shows me the following image and two numbers: 114K in the header and something around 1.8 MB on the graph.

[memory profiling graph]

And, just in case, here is the application:

 module Main where

 import Network.Wreq
 import Control.Lens
 import Data.Aeson.Lens
 import Control.Monad

 main :: IO ()
 main = replicateM_ 10 g
   where
     g = do
         r <- get "http://httpbin.org/get"
         print $ r ^. responseBody . key "headers" . key "User-Agent" . _String

UPDATE 1: Thank you all for the incredibly helpful answers. As suggested, I am adding the +RTS -s output, so that everyone reading this has the full picture.

 #./simple-wreq +RTS -s
 ...
      128,875,432 bytes allocated in the heap
       32,414,616 bytes copied during GC
        2,394,888 bytes maximum residency (16 sample(s))
          355,192 bytes maximum slop
                7 MB total memory in use (0 MB lost due to fragmentation)

                                        Tot time (elapsed)  Avg pause  Max pause
   Gen  0       194 colls,     0 par    0.018s   0.022s     0.0001s    0.0022s
   Gen  1        16 colls,     0 par    0.027s   0.031s     0.0019s    0.0042s

UPDATE 2: executable file size:

 #du -h simple-wreq
 63M    simple-wreq
2 answers

A man with a watch knows what time it is. A man with two watches is never sure.

Ah, but what do the watches show? Are both supposed to display the current time in UTC? Or is one of them supposed to show the time in UTC, and the other one the time at a certain point on Mars? As long as they are in sync, the second scenario would not be a problem, right?

And that is exactly what is happening here. You are comparing different memory measurements:

  • maximum residency
  • total allocated memory

Maximum residency is the highest amount of memory your program ever uses at any one time. That is the 19 MB. However, the total amount of allocated memory is much larger, because that is how GHC works: it "allocates" memory for objects that get garbage collected, which is almost everything that is not unboxed.

Consider this C example:

 #include <stdlib.h>

 int main() {
     int i;
     char *mem;
     for (i = 0; i < 5; ++i) {
         mem = malloc(19 * 1000 * 1000);
         free(mem);
     }
     return 0;
 }

Every time we call malloc, we allocate 19 megabytes of memory, but we free it immediately afterwards. The largest amount of memory we ever hold at any one point is therefore 19 megabytes (plus a little more for the stack and the program itself).

In total, however, we allocate 5 * 19M, 95M overall. Still, we could run our little program with only 20 megabytes of RAM. That is the difference between total allocated memory and maximum residency. Note that the residency reported by time is always at least du <executable>, since the executable has to reside in memory too.
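For comparison, here is a rough Haskell analogue of the C loop above. This is just a sketch, not part of the original answer: it builds a roughly 19 MB ByteString five times and discards each copy, so total allocation is about 5 * 19M while only one chunk is ever live at a time.

    module Main where

    import Control.Monad (forM_)
    import qualified Data.ByteString as BS

    main :: IO ()
    main = forM_ [1 .. 5 :: Int] $ \i -> do
        -- The size depends on i so the ByteString cannot be shared
        -- between iterations; each copy becomes garbage as soon as
        -- the next one has been built.
        let chunk = BS.replicate (19 * 1000 * 1000 + i) 0
        print (BS.length chunk)

Compiled and run with +RTS -s, this should report total allocation close to 95 MB, while maximum residency stays around the size of a single chunk.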

That being said, the easiest way to get such statistics is -s, which shows the maximum residency from the Haskell program's point of view. In your case, that is the 1.9M, the number in your heap profile (or double that amount, due to profiling). And yes, Haskell executables tend to get extremely large, since libraries are statically linked.


time -l displays the (resident, i.e. not swapped out) size of the process as seen by the operating system. This includes twice the maximum size of the Haskell heap (because of how GHC's garbage collector works), plus anything else allocated by the RTS or other C libraries, plus the code of your executable itself and the libraries it depends on, and so on. I would guess the main contributor to the 19M here is the size of your executable.

total alloc is the total amount ever allocated onto the Haskell heap. It is not at all a measure of maximum heap size (which is what people usually mean by "how much memory is my program using"). Allocation is very cheap, and allocation rates of around 1 GB/s are typical for a Haskell program.
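If you want to see these numbers from inside the program rather than from the RTS summary, GHC.Stats exposes the same counters at run time. A minimal sketch, not from the original answer, assuming GHC 8.2 or later and that the program is run with +RTS -T:

    module Main where

    import Control.Monad (unless, when)
    import Data.List (foldl')
    import GHC.Stats (RTSStats (..), getRTSStats, getRTSStatsEnabled)

    main :: IO ()
    main = do
        -- Allocate a lot but retain almost nothing: every element of the
        -- list is a short-lived heap object. (With -O2 the list may be
        -- fused away entirely; compile without optimisation to see the
        -- allocation.)
        print (foldl' (+) 0 [1 .. 1000000 :: Int])

        enabled <- getRTSStatsEnabled
        unless enabled $
            putStrLn "Run with +RTS -T to enable runtime statistics."
        when enabled $ do
            stats <- getRTSStats
            putStrLn $ "allocated_bytes = " ++ show (allocated_bytes stats)
            putStrLn $ "max_live_bytes  = " ++ show (max_live_bytes stats)

Here allocated_bytes corresponds to the "total alloc" figure, while max_live_bytes is closer to the maximum residency (it is only updated at major GCs, so it can lag behind the true peak).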

The number in the header of the hp2ps output, "114,272 bytes x seconds", is something different again: it is the integral of the graph, measured in bytes * seconds, not in bytes. For example, if your program holds onto a 10 MB structure for 4 seconds, that will increase this number by 40 MB*s.

The number of about 1.8 MB shown on the graph is the actual maximum size of the Haskell heap, which is probably the number of most interest to you.

You have left out the most useful source of numbers about your program's execution: running it with +RTS -s (this does not even require the program to be built with profiling).


Source: https://habr.com/ru/post/1258211/

