Fighting with applicative parsers

So, I'm trying to write a complex parser using only the Applicative interface (the parser in question does not even implement Monad at all).

For trivial parsers, this is pretty simple. For non-trivial ones ... not so much. The applicative interface seems to force you to write everything in point-free style, which I find hard to deal with.

Consider, for example:

```haskell
call = do
  n <- name
  char '('
  trim
  as <- sepBy argument (char ',' >> trim)
  char ')'
  trim
  char '='
  trim
  r <- result
  return $ Call {name = n, args = as, result = r}
```

Now try writing that using applicative:

```haskell
call = (\n _ _ as _ _ _ _ r -> Call {name = n, args = as, result = r})
    <$> name
    <*> char '('
    <*> trim
    <*> sepBy argument (const const () <$> char ',' <*> trim)
    <*> char ')'
    <*> trim
    <*> char '='
    <*> trim
    <*> result
```

The applicative style forces me to put the bound variables very far away from the actual parsers they bind to. (For example, try to confirm that as is actually bound to the sepBy argument result ... it's not at all easy to verify that I didn't get the wrong number of _ patterns!)

Another very unintuitive thing is that <*> applies the function to the value, while *> and <* are pure sequencing. It took me ages to wrap my mind around that; differently named methods would have made this much, much clearer. (But apparently Monad grabbed >> and <<, unfortunately.) It turns out these operators can be chained, which gives things like:

```haskell
exit = "EXIT:" *> trim *> name <* char '(' <* trim <* char ')' <* trim
```

It is not obvious that you can even do this. And, to me, this code is still not terribly readable. More importantly, I still do not understand how to collect multiple values while discarding several other values in between them.

All in all, I almost wish I could just use do-notation! I really don't need to change effects based on previous results; I don't need the full power of Monad. But do-notation is so much more readable. (I keep wondering whether it would be possible to implement this: can you tell, syntactically, when a particular do block can be mechanically converted to applicative style?)

Does anyone have any insight on this? In particular, how can I move the variable bindings closer to the parsers they are bound to?

+5
3 answers

I don't really have a solution to the problem, but maybe some intuition can make writing applicative parsers easier. With applicatives, there are two kinds of "sequencing" to consider:

  • Sequencing of the parsing operations: this is fixed; it determines the order in which the parsers are written and run.
  • Sequencing of the underlying values: this one is more flexible, since you can combine the values in any order you like.

When the two sequences line up well, the result is a very nice and compact parser in applicative notation. For instance:

```haskell
data Infix = Infix Double Operator Double

-- 'infix' is a reserved word in Haskell, hence the name infixExpr
infixExpr = Infix <$> number <*> operator <*> number
```

The problem is that when the two sequences do not line up exactly, you have to massage the underlying values to make things work out (you cannot change the order of the parsers):

```haskell
number = f <$> sign <*> decimal <*> exponent
  where f sign decimal exponent = sign * decimal * 10 ^^ exponent
```

Here, to compute the number, you need a somewhat non-trivial combination of the three parts, which is performed by the local function f.
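As a concrete, runnable version of this idea, here is a hedged sketch using ReadP from base; sign, decimal and the exponent parser (called exponent' to avoid clashing with Prelude.exponent) are my own guesses at the helpers, not taken from the answer:

```haskell
import Text.ParserCombinators.ReadP
import Data.Char (isDigit)

-- Assumed helper parsers, not from the original answer.
sign :: ReadP Double
sign = (char '-' *> pure (-1)) <++ pure 1

decimal :: ReadP Double
decimal = fromInteger . read <$> munch1 isDigit

exponent' :: ReadP Integer
exponent' = (satisfy (`elem` "eE") *> (read <$> munch1 isDigit)) <++ pure 0

-- The parsers run in a fixed order; the local f massages their values.
number :: ReadP Double
number = f <$> sign <*> decimal <*> exponent'
  where f s d e = s * d * 10 ^^ e

-- Run the parser, demanding that all input is consumed.
parseNumber :: String -> Maybe Double
parseNumber s = case [x | (x, "") <- readP_to_S number s] of
  (x : _) -> Just x
  _       -> Nothing
```

For example, parseNumber "-12e2" yields Just (-1200.0) and parseNumber "5" yields Just 5.0.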

Another typical situation is that you need to drop some value:

```haskell
exponent = oneOf "eE" *> integer
```

Here *> discards the value on the left and keeps the one on the right, while <* does the opposite: it discards the right and keeps the left. When you have a chain of such operations, you should read it left-associatively:

```haskell
p1 *> p2 <* p3 *> p4 <* p5  ≡  (((p1 *> p2) <* p3) *> p4) <* p5
```
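Since each operator keeps exactly one side, such a chain keeps exactly one value overall. This can be checked concretely with the Maybe applicative (no parser library needed; the numbers stand in for parser results):

```haskell
-- Reading left-associatively, each operator keeps the side its angle
-- bracket points at, so only Just 4's value survives the whole chain.
chain :: Maybe Int
chain = Just 1 *> Just 2 <* Just 3 *> Just 4 <* Just 5
```

Evaluating chain gives Just 4: 2 wins over 1, survives 3, loses to 4, which then survives 5.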

That example is deliberately contrived: you would not want to write anything like it. It is better to break the expression into meaningful parts (and, preferably, give those parts meaningful names). One common pattern you will see is:

```haskell
-- discard the result of everything except `p3`
p1 *> p2 *> p3 <* p4 <* p5
```

There is a small caveat, however: if you want to apply something else to p3, or if p3 itself consists of several parts, you will have to use parentheses:

```haskell
-- applying a pure function
f <$> (p1 *> p2 *> p3 <* p4 <* p5)  ≡  p1 *> p2 *> (f <$> p3) <* p4 <* p5

-- p3 consists of multiple parts
p1 *> p2 *> (p3' <*> p3'') <* p4 <* p5
```

Again, in these situations it is often better to simply split the expression into meaningful, named fragments.

Applicative notation in a sense forces you to split parsers into logical pieces so that they are easier to read, unlike monadic notation, where you can do everything in one monolithic block.

+4

Well, your example parser is artificially complicated.

There are many occurrences of trim, which you can abstract away:

```haskell
token p = p <* trim
```

You can also abstract the pattern of something enclosed between a pair of matching parentheses:

```haskell
parens p = token (char '(') *> p <* token (char ')')
```

Now what remains:

```haskell
call = (\n as _ r -> Call {name = n, args = as, result = r})
    <$> name
    <*> parens (sepBy argument (() <$ token (char ',')))
    <*> token (char '=')
    <*> result
```

Finally, you should not count occurrences of _ at all, but rather learn to use <$ and <*. Here are some useful rules:

  • Use *> only in the combination foo *> p <* bar, as in parens above, and nowhere else.

  • Write your parsers in the form f <$> p1 <*> ... <*> pn, and then choose between <$> and <$ in the first position, or between <*> and <* in all the other positions, purely based on whether you are interested in the result of the respective parser. If you are, use the variant with the >; otherwise use the one without it. Then you never need to ignore any arguments in f, because you never even get access to them. In the example above, only the token (char '=') remains as something we are not interested in, so we can say:

```haskell
call = Call <$> name
            <*> parens (sepBy argument (() <$ token (char ',')))
            <*  token (char '=')
            <*> result
```

(This assumes Call actually takes just these three arguments.) I would claim that this version is easier to read than the original do-based version.
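The keep/drop rule can also be checked in isolation with the Maybe applicative, without any parser library; collect and its arguments are made-up names, with the Just values standing in for parser results:

```haskell
-- Keep the 1st and 3rd results, drop the 2nd and 4th: <*> keeps the
-- next result, <* drops it, and the pairing function (,) never even
-- sees the dropped values.
collect :: Applicative f => f a -> f b -> f c -> f d -> f (a, c)
collect p1 junk1 p2 junk2 = (,) <$> p1 <* junk1 <*> p2 <* junk2
```

For example, collect (Just 1) (Just "x") (Just 2) (Just "y") evaluates to Just (1, 2).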

To answer your more general question: yes, it is possible to recognize do blocks that do not need the power of monads. Simply put, they are the ones that consist of a plain sequence of bindings with a return at the very end, where all the bound variables are used only in that final return and nowhere else. There is a proposal to add exactly this to GHC. (Personally, though, I am not a big fan of it; I find the applicative notation more functional in spirit than do-notation.)
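The proposal mentioned here became GHC's ApplicativeDo extension: a do block that is a straight sequence of independent bindings ending in pure or return desugars to <$> and <*>, needing only an Applicative constraint. A small sketch (triple is a made-up name, not from this answer):

```haskell
{-# LANGUAGE ApplicativeDo #-}

-- With ApplicativeDo this desugars to (,,) <$> fa <*> fb <*> fc:
-- no binding refers to an earlier one, and all bound variables are
-- used only in the final `pure`, so Applicative f suffices.
triple :: Applicative f => f a -> f b -> f c -> f (a, b, c)
triple fa fb fc = do
  a <- fa
  b <- fb
  c <- fc
  pure (a, b, c)
```

For example, triple (Just 1) (Just 'x') (Just True) evaluates to Just (1, 'x', True), and a Nothing in any slot gives Nothing.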

+9

Write smaller parsers. For example, your argument list looks like (argument[, argument…]), which is easy to express as:

```haskell
argListP :: Parser [Argument]
argListP = char '(' *> trim *> argument `sepBy` (char ',' *> trim) <* char ')'
```

which is still quite readable: an opening '(' followed by whitespace, arguments separated by a comma and whitespace, and a closing ')'. The same can be done for result:

```haskell
resultP :: Parser Result
resultP = trim *> char '=' *> result
```

As you can see, this is still readable: arbitrary whitespace followed by an equals sign and some kind of result. Now call is almost trivial:

```haskell
call :: Parser Call
call = Call <$> name <*> argListP <*> resultP
```
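Putting the advice from the answers together, here is a hedged, runnable reconstruction using ReadP from base; the Call record, the ident parser, and the exact grammar are my own guesses, since the question never shows them:

```haskell
import Text.ParserCombinators.ReadP
import Data.Char (isAlphaNum)

-- Hypothetical concrete result type; the question never shows it.
data Call = Call { name :: String, args :: [String], result :: String }
  deriving (Eq, Show)

trim :: ReadP ()
trim = skipSpaces

token :: ReadP a -> ReadP a
token p = p <* trim

parens :: ReadP a -> ReadP a
parens p = token (char '(') *> p <* token (char ')')

-- Stand-in for the question's `name`, `argument` and `result` parsers.
ident :: ReadP String
ident = token (munch1 isAlphaNum)

call :: ReadP Call
call = Call <$> ident
            <*> parens (ident `sepBy` token (char ','))
            <*  token (char '=')
            <*> ident

-- Run the parser, demanding that all input is consumed.
parseCall :: String -> Maybe Call
parseCall s = case [c | (c, "") <- readP_to_S call s] of
  (c : _) -> Just c
  _       -> Nothing
```

For example, parseCall "f(a, b) = r" yields Just (Call {name = "f", args = ["a","b"], result = "r"}).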
+4

Source: https://habr.com/ru/post/1209226/

