As has been pointed out, `many anyChar` is the problem, and not only in `prose` but also in `code`. The problem with `code` is that `content <- many anyChar` will consume everything: the newlines and the `\end{code}` tag as well.
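To see the greedy behaviour in isolation, here is a minimal sketch. The name `greedy` and the sample input are made up for illustration; only `string`, `many`, `anyChar` and `parse` come from Parsec.

```haskell
import Text.ParserCombinators.Parsec

-- Hypothetical parser mirroring the problematic shape of `code`.
greedy :: Parser String
greedy = do
    _ <- string "\\begin{code}"
    content <- many anyChar      -- swallows the rest of the input, end tag included
    _ <- string "\\end{code}"    -- can never match: nothing is left to read
    return content

main :: IO ()
main = print (parse greedy "" "\\begin{code}\nx = 1\n\\end{code}")
```

Running this should print a `Left` with a parse error at the end of the input, because `many anyChar` never backtracks to give the end tag back.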
So you need some way to tell prose and code apart. An easy (but perhaps too naive) way to do this is to look for backslashes:

    literateFile = many codeOrProse <* eof

    code = do
        string "\\begin{code}"
        content <- many $ noneOf "\\"
        string "\\end{code}"
        return $ Haskell content

    prose = do
        content <- many1 $ noneOf "\\"
        return $ Text content

This does not quite give the desired result yet, because the `Haskell` parts will also contain newlines, but you can filter these out quite easily (given a function `filterNewlines`, you could say `content <- filterNewlines <$> (many $ noneOf "\\")`).
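The answer only names `filterNewlines` without defining it, so the definition below is an assumption: one plausible version that simply drops every newline character.

```haskell
-- Assumed definition: the answer mentions filterNewlines but does not show it.
-- This version removes every '\n' from the string.
filterNewlines :: String -> String
filterNewlines = filter (/= '\n')

main :: IO ()
main = putStrLn (filterNewlines "\nmain = print 42\n")
-- prints: main = print 42
```

If the line structure inside code blocks matters, stripping only the leading and trailing newline would be the gentler choice.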
Edit
Ok, I think I found a solution (it requires a newer version of Parsec, because of `lookAhead`):

    import Text.ParserCombinators.Parsec
    import Control.Applicative hiding (many, (<|>))

    main = do
        contents <- readFile "hello.lhs"
        let results = parseLiterate contents
        print results

    data Element = Text String | Haskell String
        deriving (Show)

    parseLiterate :: String -> Either ParseError [Element]
    parseLiterate input = parse literateFile "" input

    literateFile = many codeOrProse

    codeOrProse = code <|> prose

    code = do
        string "\\begin{code}\n"
        c <- untilP (string "\\end{code}\n")
        string "\\end{code}\n"
        return $ Haskell c

    prose = do
        t <- untilP $ (string "\\begin{code}\n") <|> (eof >> return "")
        return $ Text t

    untilP p = do
        s <- many $ noneOf "\n"
        newline
        s' <- try (lookAhead p >> return "") <|> untilP p
        return $ s ++ s'

`untilP p` parses one line, then checks whether the beginning of the next line can be parsed successfully by `p`. If so, it returns the empty string; otherwise it carries on with the next line. The `lookAhead` is needed because otherwise the begin/end tags would be consumed and `code` would no longer be able to recognize them.
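The effect of `lookAhead` can be checked on its own with a small sketch. The names `peeked` and `consumed` are made up for this illustration.

```haskell
import Text.ParserCombinators.Parsec

-- With lookAhead the tag is only peeked at, so the next parser still sees it.
peeked :: Either ParseError String
peeked = parse (lookAhead (string "\\end{code}") >> many anyChar) "" "\\end{code}"

-- Without lookAhead the tag is consumed, so nothing is left for the next parser.
consumed :: Either ParseError String
consumed = parse (string "\\end{code}" >> many anyChar) "" "\\end{code}"

main :: IO ()
main = do
    print peeked    -- Right "\\end{code}"
    print consumed  -- Right ""
```

This is the situation inside `untilP`: if the check for the next tag consumed it, the `string "\\end{code}\n"` that follows in `code` would find nothing to match.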
I suppose it could still be made more concise (i.e. without having to repeat `string "\\end{code}\n"` inside `code`).
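One possible way to avoid the repetition, as a sketch rather than a definitive version: wrap `untilP` in a small helper that also consumes the terminator. The helper name `upTo` is hypothetical, and the type signatures and the `Eq` instance are additions; `untilP` itself is unchanged from the code above.

```haskell
import Text.ParserCombinators.Parsec

data Element = Text String | Haskell String
    deriving (Show, Eq)

-- Same untilP as in the answer, with a type signature added.
untilP :: Parser String -> Parser String
untilP p = do
    s <- many $ noneOf "\n"
    _ <- newline
    s' <- try (lookAhead p >> return "") <|> untilP p
    return $ s ++ s'

-- Hypothetical helper: collect lines until `end` matches, then consume `end`.
upTo :: Parser String -> Parser String
upTo end = untilP end <* end

-- The end tag is now written only once.
code :: Parser Element
code = do
    _ <- string "\\begin{code}\n"
    Haskell <$> upTo (string "\\end{code}\n")

main :: IO ()
main = print (parse code "" "\\begin{code}\nmain = print 42\n\\end{code}\n")
-- prints: Right (Haskell "main = print 42")
```

`prose` cannot use `upTo` as it stands, because it has to leave the `\begin{code}` tag in place for `code` to pick up.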