Lazy IO
Lazy IO works like this:

```haskell
readFile :: FilePath -> IO ByteString
```

where the ByteString is guaranteed to be read only chunk-by-chunk, as it is consumed. To do this, we could (almost) write
```haskell
-- given 'readChunk' which reads a chunk beginning at n
readChunk :: FilePath -> Int -> IO (Int, ByteString)

readFile fp = readChunks 0 where
  readChunks n = do
    (n', chunk) <- readChunk fp n
    chunks      <- readChunks n'
    return (chunk <> chunks)
```
but here we note that the IO action readChunks n' is performed before even the partial result available as chunk is returned. This means we are not lazy at all. To combat this, we use unsafeInterleaveIO:
```haskell
readFile fp = readChunks 0 where
  readChunks n = do
    (n', chunk) <- readChunk fp n
    chunks      <- unsafeInterleaveIO (readChunks n')
    return (chunk <> chunks)
```
which causes readChunks n' to return immediately, thunking its IO action; that action is only executed when the thunk is forced.
This is the dangerous part: with unsafeInterleaveIO we have deferred a bunch of IO actions to non-deterministic points in the future, which depend on how we consume our ByteString chunks.
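A minimal sketch of this hazard, using hGetContents from base (which is built on the same deferral trick); the file name "data.txt" is a placeholder:

```haskell
import System.IO

main :: IO ()
main = do
  -- hGetContents reads the handle lazily: no bytes are read here
  s <- withFile "data.txt" ReadMode hGetContents
  -- by the time we force s, withFile has already closed the handle,
  -- so the lazily-read string is silently truncated (typically empty)
  putStrLn s
```

Whether any of the file's contents survive depends entirely on whether s happens to be forced before the handle closes - exactly the non-determinism described above.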
Fixing the problem with coroutines
What we would like is to interleave a chunk-processing step between the call to readChunk and the recursive call to readChunks.
```haskell
readFileCo :: Monoid a => FilePath -> (ByteString -> IO a) -> IO a
readFileCo fp action = readChunks 0 where
  readChunks n = do
    (n', chunk) <- readChunk fp n
    a           <- action chunk
    as          <- readChunks n'
    return (a <> as)
```
Now we can perform arbitrary IO actions after each small chunk is loaded. This lets us do much more work incrementally, without loading the entire ByteString into memory. Unfortunately, it is not terribly compositional either - we have to build our consumption action and pass it to our ByteString producer in order to run it.
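To make the sketch concrete, here is one plausible way to fill in readChunk (via hSeek and hGet) and use readFileCo to count a file's bytes incrementally. The 4096-byte chunk size, the end-of-file check, and the name countBytes are our additions, not from the text - the version above recurses forever:

```haskell
import Data.Monoid (Sum(..))
import qualified Data.ByteString as BS
import System.IO

-- read the chunk starting at byte offset n; also return the next offset
-- (re-opening the file per chunk is simple, though not efficient)
readChunk :: FilePath -> Int -> IO (Int, BS.ByteString)
readChunk fp n = withFile fp ReadMode $ \h -> do
  hSeek h AbsoluteSeek (fromIntegral n)
  chunk <- BS.hGet h 4096
  return (n + BS.length chunk, chunk)

-- readFileCo from above, with a stop condition added at end of file
readFileCo :: Monoid a => FilePath -> (BS.ByteString -> IO a) -> IO a
readFileCo fp action = readChunks 0 where
  readChunks n = do
    (n', chunk) <- readChunk fp n
    if BS.null chunk
      then return mempty
      else do
        a  <- action chunk
        as <- readChunks n'
        return (a <> as)

-- count bytes without ever holding the whole file in memory
countBytes :: FilePath -> IO Int
countBytes fp = getSum <$> readFileCo fp (return . Sum . BS.length)
```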
IO-based pipes
This is essentially what pipes solves - it lets us compose effectful coroutines with ease. For instance, we can now write our file reader as a Producer, which can be thought of as "streaming" the chunks of the file when its effect finally gets run.
```haskell
produceFile :: FilePath -> Producer ByteString IO ()
produceFile fp = produce 0 where
  produce n = do
    (n', chunk) <- liftIO (readChunk fp n)
    yield chunk
    produce n'
```
Note the similarity between this code and readFileCo above - we have simply replaced the call to the coroutine action with yielding the chunk we have produced so far. This call to yield builds a Producer type instead of a raw IO action, which we can compose with other Pipes types to build a nice consumption pipeline called an Effect IO ().
All of this pipe construction happens statically, without invoking any IO actions. This is how pipes lets you write your coroutines more easily. The effects are all triggered at once when we call runEffect in our main IO action.
```haskell
runEffect :: Effect IO () -> IO ()
```
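For instance, assuming the pipes and pipes-bytestring packages, and using Pipes.ByteString.fromHandle as a stand-in for produceFile above, a complete pipeline that streams a file and prints each chunk's length could look like this ("data.txt" is a placeholder file name):

```haskell
import Pipes
import qualified Pipes.Prelude as P
import qualified Pipes.ByteString as PB
import qualified Data.ByteString as BS
import System.IO

main :: IO ()
main = withFile "data.txt" ReadMode $ \h ->
  -- nothing is read from the file until runEffect runs the Effect IO ()
  runEffect $ PB.fromHandle h >-> P.map BS.length >-> P.print
```

The >-> operator composes the Producer, Pipe, and Consumer into a single Effect IO (), and only runEffect performs any IO.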
Attoparsec
So why would you want to hook attoparsec up to pipes? Well, attoparsec is optimized for lazy parsing. If you are producing the chunks fed to an attoparsec parser in an effectful way, then you are at an impasse. You could:
- Use strict IO and load the whole string into memory only to consume it lazily with your parser. This is simple and predictable, but inefficient.
- Use lazy IO and lose the ability to reason about when your production IO effects will actually get triggered, risking resource leaks or closed-handle exceptions depending on the consumption schedule of your parsed items. This is more efficient than (1), but can easily become unpredictable; or,
- Use pipes (or conduit) to build a system of coroutines that includes your lazy attoparsec parser, allowing it to operate on as little input as it needs while producing parsed values as lazily as possible across the entire stream.