Haskell Alex: memory usage of a basic lexer

I am trying to write a simple lexer that prints all the words in its input, where a word is a maximal sequence of letters a-zA-Z. All other characters should be ignored.

My Alex program for this, which uses the basic-bytestring wrapper, uses as much memory as the size of the input. I would expect it to run in constant memory.
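For comparison, the behaviour I am after can be sketched without Alex at all. This is only a minimal sketch, assuming lazy ByteString input read from stdin; the names `isWordChar` and `wordsOf` are made up for illustration and are not part of Alex or of my program. Because the words are produced lazily and the input is consumed chunk by chunk, I would expect something like this to run in roughly constant memory, although I have not profiled it.

```haskell
module Main where

import qualified Data.ByteString.Lazy.Char8 as BL
import Data.Char (isAsciiLower, isAsciiUpper)

-- Hypothetical helper: True only for the characters a-z and A-Z.
isWordChar :: Char -> Bool
isWordChar c = isAsciiLower c || isAsciiUpper c

-- Hypothetical helper: split the input into maximal runs of letters,
-- dropping everything else. The list is produced lazily, one word at a time.
wordsOf :: BL.ByteString -> [BL.ByteString]
wordsOf input =
  case BL.span isWordChar (BL.dropWhile (not . isWordChar) input) of
    (word, rest)
      | BL.null word -> []                 -- nothing but non-letters was left
      | otherwise    -> word : wordsOf rest

main :: IO ()
main = BL.getContents >>= mapM_ BL.putStrLn . wordsOf
```

This is the output I expect the lexer to produce: one word per line, in input order.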

A heap profile taken with -hc shows only one block of pinned memory, which grows rapidly to the size of the input and then slowly decreases to 0.

Interestingly, when using the basic wrapper and regular Strings, only constant memory is used.

The Alex input file:

{
module Main where
import qualified Data.ByteString.Lazy.Char8 as B
}

%wrapper "basic-bytestring"

$letters = [a-zA-Z]
$nonletters = [~$letters\n]

tokens :-
  $nonletters+  ;
  $letters+     {B.copy}

{
main = do
  buf <- B.getContents
  let toks = alexScanTokens buf
  mapM_ B.putStrLn toks
}

When run on an input of size 10 MB, the output of +RTS -s is:

   2,924,029,784 bytes allocated in the heap
       7,869,696 bytes copied during GC
       9,958,560 bytes maximum residency (5 sample(s))
       1,423,704 bytes maximum slop
              22 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0      5634 colls,     0 par    0.06s    0.05s     0.0000s    0.0002s
  Gen  1         5 colls,     0 par    0.00s    0.00s     0.0004s    0.0011s

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time    2.79s  (  2.81s elapsed)
  GC      time    0.06s  (  0.06s elapsed)
  EXIT    time    0.00s  (  0.00s elapsed)
  Total   time    2.85s  (  2.86s elapsed)

  %GC     time       2.0%  (1.9% elapsed)

  Alloc rate    1,047,072,808 bytes per MUT second

  Productivity  98.0% of total user, 97.6% of total elapsed


Source: https://habr.com/ru/post/1568303/

