I've been playing with several Haskell XML libraries, including hexpat and xml-enumerator. After reading the I/O chapter of Real World Haskell (http://book.realworldhaskell.org/read/io.html), I was under the impression that if I ran the following code, the input would be garbage collected as I went through it.
However, when I run it on a large file, memory usage keeps climbing as it runs.
runghc parse.hs bigfile.xml
What am I doing wrong? Is my assumption wrong? Does the map/filter force it to evaluate everything?
import qualified Data.ByteString.Lazy as BSL
import qualified Data.ByteString.Lazy.UTF8 as U
import Prelude hiding (readFile)
import Text.XML.Expat.SAX
import System.Environment (getArgs)

main :: IO ()
main = do
    args <- getArgs
    contents <- BSL.readFile (head args)
    -- putStrLn $ U.toString contents
    let events = parse defaultParseOptions contents
    mapM_ print $ map getTMSId $ filter isEvent events

isEvent :: SAXEvent String String -> Bool
isEvent (StartElement "event" as) = True
isEvent _ = False

getTMSId :: SAXEvent String String -> Maybe String
getTMSId (StartElement _ as) = lookup "TMSId" as
My end goal is to parse a huge XML file with a simple SAX-like interface. I don't want to have to know the whole structure just to be notified that I've found an "event".