Choosing multiple valid parsers at one input

Question

Choosing multiple valid parsers at one input

I would like to know a better way to parse input when multiple parsers can succeed. I set out my first unsuccessful attempt and an inelegant solution, which, I hope, may be more idiomatic.

For example, I would like to use lex "the", "quick" and "fox" from the following sentence in my own data constructors:

"the quick brown fox jumps over the lazy dog".

Therefore, the following type constructors are specified:

 data InterestingWord = Quick | The | Fox deriving Show data Snippet = Word InterestingWord | Rest String deriving Show

I would like the result of the parsing to be:

 [Word The, Rest " ", Word Quick, Rest " brown ", Word Fox, Rest " jumped over ", Word The, Rest " lazy dog"]

Here are two solutions:

 import Text.Parsec import Data.Maybe import Data.Ord import Data.List data InterestingWord = Quick | The | Fox deriving Show data Snippet = Word InterestingWord | Rest String deriving Show testCase = "the quick brown fox jumped over the lazy dog" -- Expected output: -- [Word The, -- Rest " ", Word Quick, -- Rest " brown ", Word Fox, -- Rest " jumped over ", Word The, -- Rest " lazy dog"] toString Quick = "quick" toString The = "the" toString Fox = "fox" -- First attempt -- Return characters upto the intended word along -- with the word itself upto word = do pre <- manyTill anyChar $ lookAhead $ string (toString word) word' <- try $ string (toString word) return [Rest pre, Word word] -- Parsers for the interesting words parsers = [upto Quick, upto The, upto Fox] -- Try each parser and return its results with the -- rest of the input. -- An incorrect result is produced because "choice" -- picks the first successful parse result. wordParser = do snippets <- many $ try $ choice parsers leftOver <- many anyChar return $ concat $ snippets ++ [[Rest leftOver]] -- [Rest "the ",Word Quick,Rest " brown fox jumped over the lazy dog"] test1 = parseTest wordParser testCase -- Correct -- In addition to the characters leading upto the -- word and the word, the position is also returned upto' word = do result <- upto word pos <- getPosition return (pos, result) -- The new parsers parsers' = [upto' Quick, upto' The, upto' Fox] -- Try each of the given parsers and -- possibly returning the results and -- the parser but don't consume -- input. tryAll = mapM (\p -> do r <- optionMaybe $ try (lookAhead p) case r of Just result -> return $ Just (p, result) Nothing -> return $ Nothing ) -- Pick the parser that has consumed the least. firstSuccess ps = do successes <- tryAll ps >>= return . catMaybes if not (null successes) then return $ Just (fst $ head (sortBy (comparing (\(_,(pos,_)) -> pos)) successes)) else return $ Nothing -- Return the parse results for the parser that -- has consumed the least wordParser' = do parser <- firstSuccess parsers' case parser of Just p -> do (_,snippet) <- p return snippet Nothing -> parserZero -- Returns the right result test2 = parseTest (many wordParser' >>= return . concat) testCase

The first attempt, "test1" does not produce the desired result, because the "choice" returns the first parser that succeeds when I really want it, this is the first parser that succeeds when consuming the smallest characters. This is what I am trying to do next, holding the source position when the input was parsed, and using the parser with the smallest initial position.

This case seems widespread enough that I feel that I am missing some obvious combinator spell. Can anyone suggest the best deals?

Thanks!

-deech

+4

haskell parsec

Deech Feb 10 '12 at 17:30

source share

2 answers

As I understand it, you want to parse until the first interesting word that you see. At the moment, you understand every interesting word and check what interesting word you found closer.

My suggestion is to define a parser that checks if you have an interesting word (only one of the options can be successful, so there is no need to do anything more interesting than a simple choice). Then you move forward until the first parser completes successfully, which happens when you come across an interesting word. This gives you the first interesting word, because you know nothing before it contains any interesting word.

Yes, this does not answer the question of determining which parser match is the shortest. This wraps around this question, giving a solution to your real problem that doesn't care about which parser matches the shortest.

0

Retief Feb 10 '12 at 21:33

source share

Louis Wasserman · Accepted Answer · 2012-02-10T18:42:40+0000

This is not a particularly common need, but here is the implementation:

 import Control.Monad import "parsec3" Text.Parsec import Data.Maybe import Data.List import Data.Ord longestParse :: [Parsec String () a] -> Parsec String () a longestParse parsers = do allParses <- sequence [lookAhead $ optionMaybe $ try $ liftM2 (,) parse getPosition | parse <- parsers] -- allParses :: [Maybe (a, SourcePos)] (bestParse, bestPos) <- case catMaybes allParses of [] -> fail "No valid parse" -- maybe we can do something better? successfulParses -> return $ minimumBy (comparing snd) successfulParses setPosition bestPos return bestParse

Choosing multiple valid parsers at one input

More articles: