What is an elegant way to drop a routine into a parser?

Question

Problem

I am writing a non-deterministic parser for self-education. It looks like this:

    newtype Parser a b = P { runP :: [a] -> [(b, [a])] }

I want to be able to drop in a subroutine of the form [a] -> [b], which takes a buffer of characters and maps it to a list of results. The catch is that the subroutine itself is a stateful computation, and it moves through its states on every successful call (think of it as a finite state machine). Specifically:

1. If the subroutine outputs an empty list [], the parser appends another char to the buffer and passes the buffer back to the subroutine, which runs again.
2. If the subroutine outputs a nonempty list [b], the [b] is stored somewhere, the buffer is cleared, and the parser appends another char to the buffer and passes it to the subroutine.
3. Steps 1 and 2 are repeated until an escape condition is reached. All intermediate results must be combined somehow.
4. Once the escape condition is reached, the subroutine returns its results bs to the parser, which combines them with the remaining stream (call it as) like this:

    rs = fmap (flip (,) as) bs :: [(b, [a])]

which satisfies the signature of runP.
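To make step 4 concrete, here is a minimal sketch of the combining expression as a standalone function. The name combine and the parameter rest (standing in for the remaining stream as) are illustrative assumptions, not part of the question.

```haskell
-- Step 4: pair every result in bs with the same unconsumed input,
-- which yields exactly the [(b, [a])] shape that runP must return.
-- `combine` is a hypothetical name used only for illustration.
combine :: [b] -> [a] -> [(b, [a])]
combine bs rest = fmap (flip (,) rest) bs
```

For example, combine "xy" "rest" evaluates to [('x',"rest"),('y',"rest")]: one pair per result, all sharing the same leftover input.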

So the function could have this signature:

    withf :: ([a] -> [b]) -> Parser a b

The important thing is that withf g must itself be a parser, so that I can build bigger parsers with <*>. Note that this signature suggests g is a pure function, so it is probably not right.

Tried solutions

I tried implementing this with various coroutine packages, but then it seems more natural to lift the parser into a coroutine computation and compose it with a transducer that is also lifted into that context, which means it is no longer a parser.

I also tried implementing withf as a primitive function with access to the Parser value constructor, basically translating steps 1..4 into code. The biggest problem I have here is deciding who is responsible for which piece of information:

the state of the buffer
the state of the intermediate results
the logic for combining intermediate results
how the escape condition should be implemented, or better yet, expressed in withf
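One possible way to split the responsibilities listed above in a primitive withf, sketched under explicit assumptions that do not come from the question: the subroutine is modelled as a hypothetical step function SubStep s a b over its own state s, the escape condition is a caller-supplied predicate on the accumulated results, intermediate results are combined with (++), and running out of input before escaping counts as failure. The names withfS, SubStep and countAB are illustrative only.

```haskell
newtype Parser a b = P { runP :: [a] -> [(b, [a])] }

-- Hypothetical subroutine type: given its current state and the buffer,
-- return the next state and a (possibly empty) list of results.
type SubStep s a b = s -> [a] -> (s, [b])

-- In this sketch withfS owns the buffer and the accumulated results,
-- the subroutine owns its state s, and the caller supplies the escape
-- condition. Combining intermediate results with (++) is an assumption.
withfS :: ([b] -> Bool) -> SubStep s a b -> s -> Parser a b
withfS escape step s0 = P (go s0 [] [])
  where
    go s buf acc input
      | escape acc = fmap (flip (,) input) acc   -- step 4: pair results with rest
      | otherwise = case input of
          []     -> []                           -- input exhausted: fail
          x : xs ->
            let buf'     = buf ++ [x]            -- append one more char
                (s', bs) = step s buf'
            in if null bs
                 then go s' buf' acc xs          -- step 1: keep growing the buffer
                 else go s' [] (acc ++ bs) xs    -- step 2: store bs, clear buffer

-- Hypothetical example subroutine: its state counts how many times the
-- buffer has been exactly "ab", and each match emits the count so far.
countAB :: SubStep Int Char Int
countAB n buf
  | buf == "ab" = (n + 1, [n])
  | otherwise   = (n, [])
```

With an escape condition of "at least two results", runP (withfS ((>= 2) . length) countAB 0) "abab!" gives [(0,"!"),(1,"!")]: the counter is threaded through every call, while the buffer is reset after each match.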

I also tried various homebrew coroutine implementations, baked directly into the parser (therefore, not using the Parser type defined above), with little success.

Any pointers in the right direction are welcome.

coroutine subroutine parsing haskell

chibro2 Aug 2 '13 at 1:40

2 answers

First, let's define a new data type to represent the possible parse results.

 data Step r = Done | Fail | Succ r

The parser can either finish with Done, signal a failed parse with Fail, or successfully return a parsed value r with Succ r.

We will make our Step data type an instance of the Monoid typeclass (since GHC 8.4, Monoid also requires a Semigroup instance):

    instance Semigroup (Step r) where
        Done   <> _ = Done
        Fail   <> x = x
        Succ r <> _ = Succ r

    instance Monoid (Step r) where
        -- Fail is the identity: Fail <> x == x and x <> Fail == x
        mempty = Fail

If our parser is Done, we must terminate immediately. A Fail means that we must check the next Step for possible success. Succ r, of course, means that we have successfully parsed a value.
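The semantics above can be checked by folding a few Step values directly. This self-contained sketch restates Step with derived Eq and Show (added only for testing) and uses Fail as mempty, the value for which Fail <> x == x and x <> Fail == x hold.

```haskell
data Step r = Done | Fail | Succ r
  deriving (Eq, Show)

-- Left-most non-Fail wins: Done and Succ never inspect their right
-- operand, and Fail defers to whatever comes next.
instance Semigroup (Step r) where
  Done   <> _ = Done
  Fail   <> x = x
  Succ r <> _ = Succ r

instance Monoid (Step r) where
  mempty = Fail
```

For example, foldMap id [Fail, Succ 'x', Done] is Succ 'x'. Because Done and Succ ignore their right operand, foldMap id (Fail : Succ 1 : repeat Fail) still returns Succ 1 on an infinite list, which is why the parser in this answer stops trying longer prefixes as soon as one of them gives Done or Succ.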

Now let's define a type synonym for Parser. It should be able to accumulate the parsed results (Writer) and keep track of pure state, namely the input that has not been consumed yet (State).

    {-# LANGUAGE FlexibleContexts #-}
    import Control.Monad.State
    import Control.Monad.Writer
    import Data.List
    import Data.Foldable

    type Parser w s = WriterT w (State s)

    evalParser :: Parser w s r -> s -> w
    evalParser = evalState . execWriterT

Here is the actual parser

    parser :: (MonadState [s] m, MonadWriter [w] m) => ([s] -> Step [w]) -> m ()
    parser sub = do
        bufs <- gets inits
        -- try our subroutine on increasingly long prefixes until we are done,
        -- or there is nothing left to parse, or we successfully parse something
        case foldMap sub bufs of
            Done   -> return ()
            Fail   -> return ()
            Succ r -> do
                -- record our parsed result
                tell r
                -- remove the parsed result from the state
                modify (drop $ length r)
                -- parse some more
                parser sub

And a simple test case:

    test :: String
    test = evalParser (parser sub) "aabbcdde"
      where
        sub "aabb" = Succ "aabb"
        sub "cdd"  = Succ "cdd"
        sub "e"    = Done
        sub _      = Fail
    -- test == "aabbcdd"

cdk Aug 2 '13 at 4:15

Petr Pudlák · Accepted Answer · 2013-08-02T07:31:07+0000

First, use a MonadPlus instead of [] in the parser. This makes it more general and simplifies the code a bit (we won't have as many nested []s):

    newtype Parser a m b = P { runP :: [a] -> m (b, [a]) }

I suggest changing the signature of your subroutines. You need to:

signal whether the subroutine requires more input or not, and
keep its state somewhere.

This can easily be done with this type:

    newtype Sub a b = Sub { runSub :: Either (a -> Sub a b) [b] }

A subroutine either produces a result, or requests a new input and produces a new subroutine. That way you can keep any state you need by passing it to the returned subroutine. The conversion function then looks like this:

    import Control.Monad (MonadPlus, mzero, msum)

    withf :: (MonadPlus m) => Sub a b -> Parser a m b
    withf start = P $ f (runSub start)
      where
        f (Right bs) xs     = msum [ return (b, xs) | b <- bs ]
        f (Left _)   []     = mzero  -- No more input, can't proceed.
        f (Left r)   (x:xs) = f (runSub (r x)) xs
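A runnable sketch of this first approach, together with a hypothetical example subroutine untilDot (not from the answer) whose state is the characters it has read so far. Because Parser is generic in the MonadPlus m, the same parser can be run in the list monad (all results) or in Maybe (at most one result).

```haskell
import Control.Monad (MonadPlus, mzero, msum)

newtype Parser a m b = P { runP :: [a] -> m (b, [a]) }

newtype Sub a b = Sub { runSub :: Either (a -> Sub a b) [b] }

withf :: (MonadPlus m) => Sub a b -> Parser a m b
withf start = P $ f (runSub start)
  where
    f (Right bs) xs     = msum [ return (b, xs) | b <- bs ]  -- emit every result
    f (Left _)   []     = mzero                            -- no more input
    f (Left r)   (x:xs) = f (runSub (r x)) xs              -- feed one element

-- Hypothetical example subroutine: remembers the characters seen so far
-- (in reverse) and yields them as one word when it reads '.'.
untilDot :: String -> Sub Char String
untilDot acc = Sub (Left step)
  where
    step '.' = Sub (Right [reverse acc])
    step c   = untilDot (c : acc)
```

runP (withf (untilDot "")) "ab.cd" evaluates to [("ab","cd")] when m is [] and to Just ("ab","cd") when m is Maybe. On "abc", which contains no '.', the subroutine keeps asking for input until none is left, so the result is [] or Nothing.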

Update: Another approach is to realize that the parser is actually a StateT monad transformer whose state is [a]:

    import Control.Monad.State

    type Parser a m b = StateT [a] m b

    runP :: (Monad m) => Parser a m b -> [a] -> m (b, [a])
    runP = runStateT

Indeed, runP is exactly runStateT!

Thus we get a Monad instance for Parser for free. Now we can split the task into smaller pieces. First, we create a parser that consumes a single input element or fails:

    receive :: (MonadPlus m) => Parser a m a
    receive = get >>= f
      where
        f []     = mzero  -- No more input, can't proceed.
        f (x:xs) = put xs >> return x

and then use it to express withf:

    withf :: (MonadPlus m) => Sub a b -> Parser a m b
    withf start = f (runSub start)
      where
        f (Right bs) = msum (map return bs)
        f (Left r)   = receive >>= f . runSub . r

Note that if m is MonadPlus, then StateT s m is also MonadPlus, so we can use mzero and msum with Parser.
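To check that the StateT version composes with <*>, as the question requires, here is a self-contained sketch that restates the answer's receive and withf and adds a hypothetical example subroutine untilDot (it collects characters until '.'). The parser twoWords is likewise an illustrative name. FlexibleContexts is enabled because the inferred type of the local helper in receive has a MonadState [t] n constraint.

```haskell
{-# LANGUAGE FlexibleContexts #-}
import Control.Monad (MonadPlus, mzero, msum)
import Control.Monad.State (StateT, runStateT, get, put)

type Parser a m b = StateT [a] m b

newtype Sub a b = Sub { runSub :: Either (a -> Sub a b) [b] }

-- Consume a single input element, or fail if the input is empty.
receive :: (MonadPlus m) => Parser a m a
receive = get >>= f
  where
    f []     = mzero
    f (x:xs) = put xs >> return x

-- Feed input to the subroutine until it produces its results.
withf :: (MonadPlus m) => Sub a b -> Parser a m b
withf start = f (runSub start)
  where
    f (Right bs) = msum (map return bs)
    f (Left r)   = receive >>= f . runSub . r

-- Hypothetical example subroutine: remembers the characters seen so far
-- (in reverse) and yields them as one word when it reads '.'.
untilDot :: String -> Sub Char String
untilDot acc = Sub (Left step)
  where
    step '.' = Sub (Right [reverse acc])
    step c   = untilDot (c : acc)

-- Two subroutine-backed parsers combined with <$> and <*>.
twoWords :: (MonadPlus m) => Parser Char m (String, String)
twoWords = (,) <$> withf (untilDot "") <*> withf (untilDot "")
```

In Maybe, runStateT twoWords "ab.cd.e" gives Just (("ab","cd"),"e"), while runStateT twoWords "ab" gives Nothing because the first subroutine runs out of input.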
