Consider the following simple grammar of a CSV document (in ABNF):
    csv   = *crow
    crow  = *(ccell ',') ccell CR
    ccell = "'" *(ALPHA / DIGIT) "'"
We want to write a converter that turns documents in this format into TSV (tab-separated values):
    tsv   = *trow
    trow  = *(tcell HTAB) tcell CR
    tcell = DQUOTE *(ALPHA / DIGIT) DQUOTE
First of all, create an algebraic data type that describes our abstract syntax tree. Type synonyms are included to make it easier to read:
    data XSV = XSV [Row]
    type Row = [Cell]
    type Cell = String
Writing a parser for this grammar is pretty simple. We write the parser as if we were describing ABNF:
    import Text.Parsec
    import Text.Parsec.String (Parser)

    csv :: Parser XSV
    csv = XSV <$> many crow

    crow :: Parser Row
    crow = do
        cells <- ccell `sepBy` char ','
        newline
        return cells

    ccell :: Parser Cell
    ccell = do
        char '\''
        content <- many (digit <|> letter)
        char '\''
        return content
This parser uses `do` notation. A `do` is followed by a sequence of statements; for parsers, these statements are simply other parsers. To bind the result of a parser, use `<-`. This way you build a large parser by combining several smaller ones. For more interesting behavior, you can also combine parsers with special combinators, for example `a <|> b`, which parses either `a` or `b`, or `many a`, which parses `a` as many times as possible. Keep in mind that Parsec does not backtrack by default: if a parser can fail after consuming characters, wrap it in `try` to enable backtracking for that one instance. Be aware that `try` slows down parsing.
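To make the backtracking point concrete, here is a minimal sketch. The keyword parsers `withoutTry` and `withTry`, the `digits` parser, and the `accepts` helper are hypothetical names invented for this illustration; only `string`, `digit`, `many`, `try`, `<|>` and `parse` come from Parsec.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical keyword parsers, for illustration only.

-- Without try: on the input "lambda", string "let" consumes the 'l'
-- before failing, so <|> does not attempt the second alternative.
withoutTry :: Parser String
withoutTry = string "let" <|> string "lambda"

-- With try: a failure of the first alternative rewinds the input,
-- so the second alternative starts again at the 'l'.
withTry :: Parser String
withTry = try (string "let") <|> string "lambda"

-- many collects zero or more results of the given parser.
digits :: Parser String
digits = many digit

-- Hypothetical helper: True if the parser accepts a prefix of the input.
accepts :: Parser a -> String -> Bool
accepts p input = either (const False) (const True) (parse p "" input)
```

In GHCi, `parse withoutTry "" "lambda"` should report a parse error, while `parse withTry "" "lambda"` returns `Right "lambda"`.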
The result is a `csv` parser that parses our CSV document into an abstract syntax tree. Now it's easy to turn this into another language (e.g. TSV):
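    import Data.List (intercalate)

    xsvToTSV :: XSV -> String
    xsvToTSV (XSV rows) = unlines (map toLine rows)
      where
        toLine = intercalate "\t" . map quote   -- join the cells of a row with tabs
        quote cell = "\"" ++ cell ++ "\""       -- tcell = DQUOTE ... DQUOTE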
By combining these two pieces, you get a conversion function:
    csvToTSV :: String -> Maybe String
    csvToTSV document =
        case parse csv "" document of
            Left _    -> Nothing
            Right xsv -> Just (xsvToTSV xsv)
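To try the converter end to end, the pieces can be assembled into a single file. This is only a sketch under a few assumptions: `XSV` is given a constructor of the same name, cells are joined with `intercalate` and wrapped in double quotes as the `tcell` rule requires, and the sample input used in the note afterwards is made up.

```haskell
import Data.List (intercalate)
import Text.Parsec
import Text.Parsec.String (Parser)

-- Abstract syntax tree shared by both formats.
data XSV = XSV [Row]
type Row = [Cell]
type Cell = String

-- A document is a sequence of rows.
csv :: Parser XSV
csv = XSV <$> many crow

-- A row is a comma-separated list of cells ended by a newline.
crow :: Parser Row
crow = do
    cells <- ccell `sepBy` char ','
    _ <- newline
    return cells

-- A cell is alphanumeric content between single quotes.
ccell :: Parser Cell
ccell = do
    _ <- char '\''
    content <- many (digit <|> letter)
    _ <- char '\''
    return content

-- Render the tree as TSV: tab-separated, double-quoted cells.
xsvToTSV :: XSV -> String
xsvToTSV (XSV rows) = unlines (map toLine rows)
  where
    toLine = intercalate "\t" . map quote
    quote cell = "\"" ++ cell ++ "\""

-- Parse CSV and render TSV; Nothing signals a parse error.
csvToTSV :: String -> Maybe String
csvToTSV document =
    case parse csv "" document of
        Left _    -> Nothing
        Right xsv -> Just (xsvToTSV xsv)
```

With these definitions, `csvToTSV "'a','b'\n'1','2'\n"` evaluates to `Just "\"a\"\t\"b\"\n\"1\"\t\"2\"\n"`, and an input with an unterminated cell such as `"'a','b"` yields `Nothing`.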
And that's all! Parsec has many other features for building extremely complex parsers. Real World Haskell has a good chapter on parsers; it is a bit outdated, but most of it still applies. If you have further questions, feel free to ask.