Running out of memory when counting characters in a large file

I want to count the number of occurrences of each character in a large file. Although I know that the counting should be done strictly in Haskell (which I tried to achieve with foldl'), I still run out of memory. For comparison: the file is about 2 GB, and the machine has 100 GB of memory. There are not many distinct characters in this file - maybe 20. What am I doing wrong?

import System.Environment (getArgs)
import Data.List (foldl')

-- Walk the association list, bumping the count for the given character.
ins :: [(Char,Int)] -> Char -> [(Char,Int)]
ins [] c = [(c,1)]
ins ((c,i):cs) d
    | c == d = (c,i+1):cs
    | otherwise = (c,i) : ins cs d

main :: IO ()
main = do
    [file] <- getArgs
    txt <- readFile file
    print $ foldl' ins [] txt
1 answer

The problem is that ins builds up thunks: foldl' only forces the accumulator to weak head normal form, so the (c, i+1) counters inside the list are never evaluated and pile up. One fix is to force the whole accumulator on every step with deepseq (or force) from Control.DeepSeq.
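A minimal sketch of that fix, reusing ins from the question (force fully evaluates the list of pairs on each step, so no thunks can accumulate):

import System.Environment (getArgs)
import Data.List (foldl')
import Control.DeepSeq (force)

ins :: [(Char,Int)] -> Char -> [(Char,Int)]
ins [] c = [(c,1)]
ins ((c,i):cs) d
    | c == d = (c,i+1):cs
    | otherwise = (c,i) : ins cs d

main :: IO ()
main = do
    [file] <- getArgs
    txt <- readFile file
    -- force reduces the accumulator to normal form after every character
    print $ foldl' (\acc c -> force (ins acc c)) [] txt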

A better solution is a strict map: Data.Map.Strict evaluates the counts as they are inserted. Also, reading a 2 GB file through String-based IO is very wasteful; a lazy ByteString streams the file in chunks instead.

Below is a version using both:

import System.Environment (getArgs)
import Data.Map.Strict (empty, alter)
import qualified Data.ByteString.Lazy.Char8 as B

main :: IO ()
main = getArgs >>= B.readFile . head >>= print . B.foldl' go empty
  where
  -- alter runs inc on the map entry for the current character
  go = flip $ alter inc
  inc :: Maybe Int -> Maybe Int
  inc Nothing  = Just 1        -- first occurrence of this character
  inc (Just i) = Just $ i + 1  -- bump an existing count
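As a side note (not part of the original answer), the same strict update can be written more directly with insertWith from Data.Map.Strict, which also forces the new value:

import Data.Map.Strict (Map, insertWith)

-- insertWith (+) adds 1 to an existing count, or inserts 1 for a new key
go :: Map Char Int -> Char -> Map Char Int
go m c = insertWith (+) c 1 m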

Source: https://habr.com/ru/post/1664690/

