Disambiguate abstract syntax in another to write DCG Prolog parser

Question

Disambiguate abstract syntax in another to write DCG Prolog parser

P => Program K => Block

S => Single team

C => Commands

E => Expression

B => Boolean-expr

I => ID

N> Digit

P :: = K.

K :: = begin C end

C :: = C1; C2 | S

E :: = - E | E1 + E2 | E1 - E2 | E1 E2 | E1 div E2 | E1 mod E2 | (E) | me | N

B :: = E1 = E2 | E1> E2 | E1 <E2 | E1! = E2 | not B | B1 and B2 | B1 or B2 | (IN)

I have to remove the ambiguities in E and B so that I can write the DCG parser in the prolog.

+5

prolog dcg left-recursion

cgadjoro Oct 30 '12 at 0:33

source share

2 answers

@chac already gave you a pretty good answer, showing you the usual way to resolve this.

Let me read your question again: "You must remove the ambiguities in E and B so that" you "can write the DCG parser in Prolog." This means that you only need to eliminate the ambiguity so far that you can write a DCG parser in Prolog. The good news is: you don’t have to remove any ambiguities at all to write a DCG parser! Here's how to do it:

The source of ambiguity is such works as

C :: = C; WITH

or other + operators - matching div mod and

Let me stick with a simplified grammar:

E :: = E + E | "1"

We could encode this as

 e --> "1". e --> e, "+", e.

Unfortunately, Prolog does not end for a request like

 ?- L = "1+1+1", phrase(e,L). L = "1+1+1" ; ERROR: Out of local stack

In fact, it ends, but only because my computer memory is finite ...

Even for:

 ?- L = "1", phrase(e,L). L = "1" ; ERROR: Out of local stack

Is this a problem of ambiguity? No! This is just a Prolog procedural issue that cannot handle left recursions directly. Here is a way to get Prolog to process it:

  e ([_ | S], S) -> "1".
 e ([_ | S0], S) -> e (S0, S1), "+", e (S1, S).

 ? - L = "1 + 1 + 1", phrase (e (L, []), L).
 L = "1 + 1 + 1";
 L = "1 + 1 + 1";
 false

 ? - L = "1", phrase (e (L, []), L).
 L = "1";
 false

At the moment, we have only defined the grammar, most of the time when you are also interested in seeing the corresponding syntax tree:

  e ( integer (1), [_ | S], S) -> "1".
 e ( plus (L, R), [_ | S0], S) -> e ( L, S0, S1), "+", e ( R, S1, S).

 ? - L = "1 + 1 + 1", phrase (e (Tree, L, []), L).
 L = "1 + 1 + 1",
 Tree = plus (integer (1), plus (integer (1), integer (1)));
 L = "1 + 1 + 1",
 Tree = plus (plus (integer (1), integer (1)), integer (1));
 false

Now we see that there is ambiguity with plus ! Your original grammar took this as (1 + 1) +1 and 1+ (1 + 1), which in itself is not a problem if the corresponding semantics ensure that associativity is observed. In most cases, it is ambiguous to be left-associative, which means (1 + 1) +1, but this does not apply to all infix operators.

+3

false Oct 30 '12 at 16:44

source share

Capellic · Accepted Answer · 2012-10-30T08:37:57+0000

Prolog evaluates from top to bottom, then LL grammar methods can be adapted. DCGs are more powerful than LL (1), but residual recursion should still be eliminated.

B easier to handle: the left factor is production.

 B ::= E Bx | not B | (B) Bx ::= = E | > E | < E | != E | and B | or B

E is harder, since the mul token introduces even more ambiguity. Tentatively

 E ::= − E | (E) | I | N | El El ::= Ex E | epsilon Ex ::= + El | − El | div El | mod El

epsilon (empty setting) in DCG can be written this way

 epsilon --> [].

If you need to handle priority and associativity (in both B and E), you will need more work. You can refer to these older workflow answers.

Disambiguate abstract syntax in another to write DCG Prolog parser

More articles: