Why doesn't Data.Binary encodeFile act lazily?

In GHCi, I ran this simple test:

encodeFile "test" [0..10000000] 

The line runs quickly (under 10 seconds), but my memory usage shoots up to ~500 MB before it finishes. Shouldn't encodeFile be lazy, since it uses ByteString.Lazy?


Edit: Roman's answer below is great! I also want to point to this answer to another question, which explains why Data.Binary encodes lists strictly and provides a slightly more elegant workaround.

1 answer

Here's how serialization of lists is defined:

    instance Binary a => Binary [a] where
        put l = put (length l) >> mapM_ put l

That is, first we serialize the length of the list, and then serialize the list itself.

To find the length of the list, we need to evaluate the entire list. But the list cannot be garbage-collected, because its elements are still needed for the second part, mapM_ put l. Thus the entire list must be held in memory after the length has been computed and before the elements are serialized.
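To make the layout concrete, here is a minimal sketch that prints the bytes produced for a three-element list. The name encodedBytes is illustrative only; the rest is the standard Data.Binary API. The length comes first as a 64-bit big-endian Int, which is why it has to be known before any element is written.

```haskell
import Data.Binary (encode)
import qualified Data.ByteString.Lazy as L
import Data.Word (Word8)

-- Bytes produced by the standard list instance for a small list.
-- (encodedBytes is a hypothetical name used only for this illustration.)
encodedBytes :: [Word8]
encodedBytes = L.unpack (encode [1, 2, 3 :: Word8])

main :: IO ()
main = print encodedBytes
-- prints [0,0,0,0,0,0,0,3,1,2,3]: eight length bytes, then the elements
```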

Here's what the heap profile looks like:

[Figure: heap profile]

Notice how it grows while the list is being built to compute its length, and then shrinks as the elements are serialized and can be collected by the GC.

So how do we fix this? In your example, you already know the length. So you can write a function that takes the known length instead of computing it:

    import Data.Binary
    import Data.Binary.Put
    import qualified Data.ByteString.Lazy as L

    main :: IO ()
    main = do
      let len = 10000001 :: Int
          bs = encodeWithLength len [0..len-1]
      L.writeFile "test" bs

    -- Write the caller-supplied length, then stream the elements.
    putWithLength :: Binary a => Int -> [a] -> Put
    putWithLength len list = put len >> mapM_ put list

    encodeWithLength :: Binary a => Int -> [a] -> L.ByteString
    encodeWithLength len list = runPut $ putWithLength len list

This program runs within 53k of heap space.
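Since putWithLength writes the same length prefix followed by the same elements, its output should be readable with the ordinary decode for lists. A small sketch of that check, assuming the supplied length is correct (sample is a hypothetical test value, not part of the answer's program):

```haskell
import Data.Binary (Binary, decode, encode, put)
import Data.Binary.Put (Put, runPut)
import qualified Data.ByteString.Lazy as L

-- Same definitions as in the answer, repeated so the sketch is self-contained.
putWithLength :: Binary a => Int -> [a] -> Put
putWithLength len list = put len >> mapM_ put list

encodeWithLength :: Binary a => Int -> [a] -> L.ByteString
encodeWithLength len list = runPut (putWithLength len list)

-- Hypothetical small input used only for the comparison below.
sample :: [Int]
sample = [0 .. 9]

main :: IO ()
main = do
  let bs = encodeWithLength (length sample) sample
  print (bs == encode sample)             -- same bytes as the stock instance
  print ((decode bs :: [Int]) == sample)  -- and it decodes as a normal list
```

If the length passed in is wrong, the file is still written but no longer decodes to the original list, which is the motivation for the check described next.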

You can also add a safety feature to putWithLength: count the elements while serializing the list and compare the count with the first argument at the end. If there is a mismatch, throw an error.
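One possible shape for that check is sketched below. The name putWithLengthChecked and the error message are my own assumptions, not part of Data.Binary. The counter is kept strict so it does not build up thunks, and the list is still consumed one element at a time. Because Put is lazy, the error should only surface once the output has been consumed up to the end, by which point most of the data may already have been written.

```haskell
{-# LANGUAGE BangPatterns #-}

import Control.Exception (ErrorCall (..), evaluate, try)
import Data.Binary (Binary, encode, put)
import Data.Binary.Put (Put, runPut)
import qualified Data.ByteString.Lazy as L
import Data.Word (Word8)

-- Hypothetical variant of putWithLength: writes the supplied length, then
-- the elements, counting them as it goes and failing at the end on mismatch.
putWithLengthChecked :: Binary a => Int -> [a] -> Put
putWithLengthChecked len list = put len >> go 0 list
  where
    go !n []
      | n == len  = return ()
      | otherwise = error ("putWithLengthChecked: expected " ++ show len
                           ++ " elements, got " ++ show n)
    go !n (x : xs) = put x >> go (n + 1) xs

main :: IO ()
main = do
  -- Correct length: the output matches the stock list encoding.
  print (runPut (putWithLengthChecked 3 [1, 2, 3 :: Word8])
           == encode [1, 2, 3 :: Word8])
  -- Wrong length: forcing the whole output raises the error.
  r <- try (evaluate (L.length (runPut (putWithLengthChecked 5 [1, 2, 3 :: Word8]))))
  case r of
    Left (ErrorCall msg) -> putStrLn ("caught: " ++ msg)
    Right n              -> print n
```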

Exercise: why do you still need to pass the length to putWithLength, instead of using the computed value as described above?


Source: https://habr.com/ru/post/921353/

