Efficient string data logging in Haskell ST Monad

Question

I have a Haskell program that generates ~280M of logging text data during a run inside the ST monad. This is where virtually all of the memory consumption goes (with logging disabled, the program allocates a grand total of 3 MB of real memory).

The problem is that I run out of memory. While the program runs, memory consumption exceeds 1.5 GB, and it finally runs out when it tries to write the log string to a file.

The log function takes a String and accumulates the log data into a string builder that is stored in an STRef in the environment:

    import qualified Data.ByteString.Lazy.Builder as BB
    ...
    myLogFunction s = do
      ...
      lift $ modifySTRef myStringBuilderRef (<> BB.stringUtf8 s)

I tried to introduce strictness using bang patterns and modifySTRef', but this made memory consumption even worse.

I write the log string as recommended by the hPutBuilder documentation, like this:

    hSetBinaryMode h True
    hSetBuffering h $ BlockBuffering Nothing
    BB.hPutBuilder h trace

This consumes several additional GBs of memory. I tried different buffering settings, and also converting to a lazy ByteString first (slightly better).

Qs:

How can I minimize memory consumption while the program is running? I would expect that, given a tight ByteString representation and the appropriate amount of strictness, I would need little more memory than the ~280M of actual log data that I am storing.
How can I write the result to a file without allocating memory? I don't understand why Haskell needs GBs of memory just to stream some resident data into a file.

Edit:

Here's the memory profile for a small run (~42 MB of log data). Total memory use is 3 MB with logging disabled.

      15,632,058,700 bytes allocated in the heap
       4,168,127,708 bytes copied during GC
         343,530,916 bytes maximum residency (42 sample(s))
           7,149,352 bytes maximum slop
                 931 MB total memory in use (0 MB lost due to fragmentation)

                                        Tot time (elapsed)  Avg pause  Max pause
      Gen  0     29975 colls,     0 par    5.96s    6.15s     0.0002s    0.0104s
      Gen  1        42 colls,     0 par    6.01s    7.16s     0.1705s    1.5604s

      TASKS: 3 (1 bound, 2 peak workers (2 total), using -N1)

      SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

      INIT    time    0.00s  (  0.00s elapsed)
      MUT     time   32.38s  ( 33.87s elapsed)
      GC      time   11.97s  ( 13.31s elapsed)
      RP      time    0.00s  (  0.00s elapsed)
      PROF    time    0.00s  (  0.00s elapsed)
      EXIT    time    0.00s  (  0.00s elapsed)
      Total   time   44.35s  ( 47.18s elapsed)

      Alloc rate    482,749,347 bytes per MUT second

      Productivity  73.0% of total user, 68.6% of total elapsed

Edit:

I ran a memory profile with a small logging run:

profile http://imageshack.us/a/img14/9778/6a5o.png

I tried adding bang patterns, $!, deepseq / $!!, force and the like in the relevant places, but it doesn't seem to make any difference. How do I force Haskell to actually take my string / printf expression etc. and put it in a tight ByteString, instead of keeping all those [Char] lists and unevaluated thunks around?

Edit:

Here's the actual full trace function:

    trace s = do
      enable <- asks envTraceEnable
      when (enable) $ do
        envtrace <- asks envTrace
        let b = B8.pack s
        lift $ b `seq` modifySTRef' envtrace (<> BB.byteString b)

Is this "strict" enough? Do I need to watch out for anything when I call this typeclass function inside my ReaderT / ST monad, just so that it is actually called and not deferred in any way? For example, is

    do trace $ printf "%i" myint

fine?

Thanks!


logging out-of-memory haskell lazy-evaluation bytestring

NBFGRTW Aug 15 '13 at 8:55


1 answer

danidiaz · Accepted Answer · 2013-08-15T12:16:26+0000

Since the log messages take up that much memory, it would be more efficient to write them to a file as soon as they are produced. This seems impossible, because we are inside the ST monad and you can't perform IO while in the ST monad.

But there is a way out: use some kind of coroutine monad transformer, like those of the "pipes" package. Here is an example using pipes-3.3.0:

    {-# LANGUAGE ExplicitForAll #-}
    {-# LANGUAGE RankNTypes #-}
    {-# LANGUAGE LiberalTypeSynonyms #-}

    import Control.Monad
    import Control.Monad.ST
    import Control.Monad.ST (stToIO) -- Transforms ST computations into IO computations
    import Control.Monad.Trans
    import Control.Monad.Morph (hoist) -- Changes the base monad of a monad transformer
    import Control.Proxy.Prelude (stdoutD) -- Consumer that prints to stdout
    import Control.Proxy.Core
    import Control.Proxy.Core.Correct
    import Data.STRef

    simpleST :: ST s Bool
    simpleST = do
        ref <- newSTRef True
        writeSTRef ref False
        readSTRef ref

    -- Like simpleST, but emits log messages during the computation
    loggingST :: Producer ProxyCorrect String (ST s) Bool
    loggingST = do
        ref <- lift $ newSTRef True
        respond "Before writing"
        lift $ writeSTRef ref False
        respond "After writing"
        lift $ readSTRef ref

    adapt :: (forall s . Producer ProxyCorrect String (ST s) a) -> Producer ProxyCorrect String IO a
    adapt x = hoist stToIO x

    main :: IO ()
    main = do
        result <- runProxy $ (\_ -> adapt loggingST) >-> stdoutD
        putStrLn . show $ result

It prints the log to stdout. When run, it outputs the following:

    Before writing
    After writing
    False
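The example above is tied to the pipes-3.3.0 API. As a rough sketch only, assuming pipes 4.x is installed, the same program might look like the following. Here yield takes the place of respond, runEffect takes the place of runProxy, and P.toHandle stdout stands in for stdoutD; this is one possible translation, not something taken from the original answer.

```haskell
{-# LANGUAGE RankNTypes #-}
module Main (main) where

import Control.Monad.ST (ST, stToIO)
import Data.STRef (newSTRef, readSTRef, writeSTRef)
import Pipes
import qualified Pipes.Prelude as P
import System.IO (stdout)

-- Emits log messages while still running in ST (assumes the pipes 4.x API).
loggingST :: Producer String (ST s) Bool
loggingST = do
    ref <- lift $ newSTRef True
    yield "Before writing"
    lift $ writeSTRef ref False
    yield "After writing"
    lift $ readSTRef ref

-- Swap the base monad from ST to IO.
adapt :: (forall s. Producer String (ST s) a) -> Producer String IO a
adapt x = hoist stToIO x

main :: IO ()
main = do
    -- P.toHandle writes each message followed by a newline to the handle.
    result <- runEffect $ adapt loggingST >-> P.toHandle stdout
    print result
```

Passing a file handle obtained from withFile to P.toHandle instead of stdout would send the messages to a file.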

It works as follows: you emit the log messages in the producer using respond, while still residing in the ST monad. That way you can log and still be sure that your computation doesn't perform any weird IO. It does force you to pepper your code with lifts, though.

Once you have constructed your ST computation, you transform the base monad of the producer from ST to IO using hoist. hoist is a useful function that lets you change the tablecloth while the dishes are still on the table.

Now we are in IO-land! The only thing left to do is to connect the producer to a consumer that actually writes the messages (here they are printed to stdout, but you can just as easily connect to a consumer that writes to a file).
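To make the mechanism less magical, here is a minimal hand-rolled sketch of the same three steps using only base. It is not how pipes is implemented; the names Logger, emit, liftL, hoistL and drainTo are made up for illustration and play the roles of the producer type, respond, lift, hoist and the consumer respectively.

```haskell
{-# LANGUAGE RankNTypes #-}
module Main (main) where

import Control.Monad.ST (ST, stToIO)
import Data.STRef (newSTRef, readSTRef, writeSTRef)
import System.IO (Handle, hPutStrLn, stdout)

-- Hypothetical stand-in for a producer (not the pipes type).
data Logger m r
  = Done r                    -- finished with a result
  | Emit String (Logger m r)  -- one log message, then the rest
  | Step (m (Logger m r))     -- one base-monad action, then the rest

instance Functor m => Functor (Logger m) where
  fmap f (Done r)   = Done (f r)
  fmap f (Emit s k) = Emit s (fmap f k)
  fmap f (Step m)   = Step (fmap (fmap f) m)

instance Functor m => Applicative (Logger m) where
  pure = Done
  lf <*> lx = lf >>= \f -> fmap f lx

instance Functor m => Monad (Logger m) where
  Done r   >>= f = f r
  Emit s k >>= f = Emit s (k >>= f)
  Step m   >>= f = Step (fmap (>>= f) m)

-- Emit one log message (the role of respond).
emit :: String -> Logger m ()
emit s = Emit s (Done ())

-- Run an action of the base monad (the role of lift).
liftL :: Functor m => m a -> Logger m a
liftL = Step . fmap Done

-- Change the base monad (the role of hoist).
hoistL :: Functor m => (forall x. m x -> n x) -> Logger m r -> Logger n r
hoistL _  (Done r)   = Done r
hoistL nt (Emit s k) = Emit s (hoistL nt k)
hoistL nt (Step m)   = Step (nt (fmap (hoistL nt) m))

-- Write each message to the handle as soon as it is produced (the consumer).
drainTo :: Handle -> Logger IO r -> IO r
drainTo _ (Done r)   = return r
drainTo h (Emit s k) = hPutStrLn h s >> drainTo h k
drainTo h (Step m)   = m >>= drainTo h

loggingST :: Logger (ST s) Bool
loggingST = do
  ref <- liftL (newSTRef True)
  emit "Before writing"
  liftL (writeSTRef ref False)
  emit "After writing"
  liftL (readSTRef ref)

adapt :: (forall s. Logger (ST s) a) -> Logger IO a
adapt x = hoistL stToIO x

main :: IO ()
main = do
  result <- drainTo stdout (adapt loggingST)
  print result
```

Because drainTo takes a Handle, writing to a file instead is a matter of wrapping the call, for example withFile "trace.log" WriteMode (\h -> drainTo h (adapt loggingST)), where the file name is only a placeholder.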
