How does backtracking work in peg.js (with examples)?

I defined the following minimal Peg.js grammar:

start = "A1" / "A123" 

which you can try in the sandbox .

I would expect it to match โ€œA1โ€ as well as โ€œA123โ€ (according to my idea of โ€‹โ€‹how the rollback works). But this is not so: the grammar recognizes "A1", but not "A123".

Note I am not looking for advice "to cancel the order of your conditions", as in the related question How to turn a simple grammar into something that works in PEG.js (expected "a" but "found" ). Rather, I am looking to understand the behavior that I see, and why Peg.js backtracking does not apply in this case. For an explanation of why reordering my terms doesn't help, see a More Realistic Example Below.


For a more realistic example, consider parsing units. The grammar must recognize metric units (for example, "m", "mol") with additional prefixes such as "mm", "mmol", as well as non-metric units such as "yr", "week" or "mo".

The following Peg.js grammar will not recognize moles because it receives spontaneous consumption of mo and does not back down. (Changing the order of the terms does not help, or, rather, leads to the recognition of "mo" due to "mole" or "mmol."

 start = nonmetric / metric / prefix metric metric = "mol" / "l" / "m" / "g" nonmetric = "yr" / "mo" / "week" / "day" / "hour" prefix = "m" / "k" / "c" 

I can do a similar thing in Antlr with good success:

 grammar units; start : nonmetric | metric | prefix metric; metric : 'mol' | 'l' | 'm' | 'g'; nonmetric : 'yr' | 'mo' | 'week' | 'day' | 'hour'; prefix : 'm' | 'k' | 'c'; 
+6
source share
2 answers

The problem is the concept of reverse tracking. PEG analyzers do not return, like other recursive descent parsers, or Prolog do. Rather, when you are faced with a choice, the PEG parser will try to use each option until it is successful. Once this succeeds, he will execute it no matter how this rule was invoked.

From Wikipedia article :

Unlike context-free grammars and regular expressions, these operators always behave greedily, consuming as much input as possible and never backtrack.

What you ask in a difficult case is the same as in this question . The answer so far is yes: you have to adjust the rules in PEG grammars to make sure that the longest option is always the first one, even if the result is a slightly uglier grammar.

One way to customize PEG grammars is to use lookaheads (one of the main reasons why images are displayed in PEG):

 start = nonmetric / metric / prefix metric metric = "mol" / "l" / !"mo" "m" / "g" nonmetric = "yr" / !"mol" "mo" / "week" / "day" / "hour" prefix = !("mol"/"mo") "m" / "k" / "c" 
+8
source

This is by design. It is up to you to specify the correct order or rules to be used for matching.

Quote from the original white page :

These tools do not make language syntax design easy, of course. In place to determine whether the two possible alternatives in CFG are possible, PEGs represent language designers with a similar challenge to determining whether the two alternatives in the expression can be reordered without affecting the language. This question is often obvious, but sometimes not, and in general it cannot be unresolved. However, as in the case of detecting ambiguity in CFG, we have the hope of finding automatic algorithms to determine order sensitivity or insensitivity in conservative situations.

In this simple case, PEG.js might be a little smarter and admit that the rules you specified are ambiguous. Probably worth asking the author.

0
source

Source: https://habr.com/ru/post/972290/


All Articles