I have a 300 MB file (link) with UTF-8 characters. I want to write a Haskell program equivalent to:
cat bigfile.txt | grep "^en " | wc -l
This runs in 2.6 seconds on my system.
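For comparison, here is a rough Haskell counterpart of that pipeline, sketched with lazy `ByteString`s. It never decodes UTF-8 at all. That is safe for this particular task because the newline and the prefix "en " are ASCII bytes, and UTF-8 never uses bytes below 0x80 inside a multi-byte character, so matching on raw bytes cannot split or misread a character. `countEnLines` is a hypothetical helper name, and the sketch assumes the `bytestring` package is available.

```haskell
import qualified Data.ByteString.Lazy.Char8 as BL

-- Count lines that start with the ASCII prefix "en ", like grep "^en " | wc -l.
-- Works on raw bytes: '\n', 'e', 'n' and ' ' are single ASCII bytes that
-- never occur inside a multi-byte UTF-8 sequence, so no decoding is needed.
countEnLines :: BL.ByteString -> Int
countEnLines = length . filter (BL.isPrefixOf (BL.pack "en ")) . BL.lines

main :: IO ()
main = do
  -- Lazy readFile streams the file in chunks instead of loading it all.
  contents <- BL.readFile "bigfile.txt"
  print (countEnLines contents)
```

Compiling with `ghc -O2` is advisable when comparing timings against the shell pipeline.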
Right now, I am reading the file as a regular String (readFile) and have this:
main = do
  contents <- readFile "bigfile.txt"
  putStrLn $ show $ length $ lines contents
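As written, this counts every line rather than only those starting with "en ". Below is a minimal sketch of the `String` version that adds that filter. It also opens the file with an explicit UTF-8 encoding via `hSetEncoding`, because `readFile` decodes using the system locale's encoding; this assumes the file itself is valid UTF-8 and that the locale may not be. `countEnLines` is a hypothetical helper name.

```haskell
import Data.List (isPrefixOf)
import System.IO

-- Keep only lines beginning with "en ", mirroring grep "^en ".
countEnLines :: String -> Int
countEnLines = length . filter ("en " `isPrefixOf`) . lines

main :: IO ()
main = do
  h <- openFile "bigfile.txt" ReadMode
  -- Decode as UTF-8 regardless of the system locale
  -- (assumption: the file really is valid UTF-8).
  hSetEncoding h utf8
  -- Lazy read; the handle is closed once the contents are fully consumed.
  contents <- hGetContents h
  print (countEnLines contents)
```

This keeps the simplicity of `String` but not its speed; `String` is a linked list of characters, so it is likely to remain the slowest option on a 300 MB file.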
After a couple of seconds I get this error:
Dictionary.hs: bigfile.txt: hGetContents: invalid argument (Illegal byte sequence)
I assume I need to do something more to handle UTF-8? How can I do this quickly while still handling UTF-8 correctly? I read about Data.ByteString.Lazy for speed, but Real World Haskell says it does not support UTF-8.
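One way to get both speed and real UTF-8 handling is sketched below, assuming the `text` package is available. It reads raw bytes with lazy `ByteString` and decodes them explicitly with `Data.Text.Lazy.Encoding.decodeUtf8`, which does not depend on the locale. This version is only needed if you actually work with characters; for the prefix count alone, byte-level matching already suffices. `countEnLines` is again a hypothetical name.

```haskell
import qualified Data.ByteString.Lazy as BL
import qualified Data.Text.Lazy as TL
import qualified Data.Text.Lazy.Encoding as TLE

-- Count lines of already-decoded text that start with "en ".
countEnLines :: TL.Text -> Int
countEnLines = length . filter (TL.isPrefixOf (TL.pack "en ")) . TL.lines

main :: IO ()
main = do
  -- Read raw bytes lazily, then decode them as UTF-8 explicitly.
  -- decodeUtf8 throws an exception on invalid UTF-8 input.
  bytes <- BL.readFile "bigfile.txt"
  print (countEnLines (TLE.decodeUtf8 bytes))
```

If the file might contain invalid byte sequences, `TLE.decodeUtf8With lenientDecode` (with `lenientDecode` from `Data.Text.Encoding.Error`) replaces them with U+FFFD instead of throwing.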