Eliminate left recursion on this PEG.js grammar

Question

Eliminate left recursion on this PEG.js grammar

(Note: I read other questions, such as, but I could not figure it out).

I wrote this grammar:

start = call ident = [az]+ spaces = [ ]+ call = f:ident spaces g:(call / ident) { return f + "(" + g + ")"; }

With this input

 abcd

he returns

 "a(b(c(d)))"

I want too

 "a(b)(c)(d)"

I think this abandoned recursive rule may give me something similar, but PEG.js does not support left recursion.

 call = f:(call / ident) spaces g:ident { return f + "(" + g + ")"; }

How can I remove left recursion in this case?

PS: You can check it out for an online demo of PEG.js

+4

parsing grammar peg pegjs

user1527166 Nov 19 '12 at 15:01

source share

3 answers

user1201210 · Answer 1 · 2012-11-23T03:26:05+0000

Good question. Start with the branch of your first ident from everything else, as it receives a special treatment (without parentheses). Then set aside the rule for processing spaces ident recursion, which will collect the values that go in parentheses. The loop wraps the ident text and adds any new text that is collected recursively.

Here is a short version of the rules (note that tail is a separate rule):

 head: ident tail?; //the "head" ident is separated tail: spaces ident tail?; //each "tail" ident is looped over

Here is the PEG script:

 start = head ident = [az]+ spaces = [ ]+ head = head:ident tail:tail? { return head + tail; } tail = spaces next:ident tail:tail? { return "(" + next + ")" + tail }

Edit: Here is an alternative that does the work in one rule and is more like yours.

 start = head ident = [az]+ spaces = [ ]+ head = head:ident tail:(spaces next:ident{return "(" + next + ")" })* { return head + tail.join("") }

The output for abcd is "a(b)(c)(d)" for both scripts.

Wolftivy · Answer 2 · 2013-11-13T07:42:29+0000

If I understand correctly, your problem is not left behind by recursion, this is the structure of the parsing tree.

You correctly eliminated left recursion, but unfortunately the only way to get rid of left recursion is to exclude left recursion in the raw parsing tree. Most of the theory for this material comes down to the correct set of lines. You are still consistent with the same set of rows, so the theory is happy, but you need a left recursive parse tree. Read more about this issue on wikipedia .

AFAIK, you cannot get the original output of the PEG parser so that it remains recursive. However, you can do whatever you want with the exit. So, parse it as an array and then postprocess to give it a nice left structure.

Doing this with a simplified (no spaces, no multi-valued identifiers) grammar:

  start = call id = [az] call = arr:id+ { var acc = arr[0] for (i = 1; i < arr.length; i++) { acc = [acc, arr[i]] } return acc; }

This parses abcd to [ [ [ 'a', 'b' ], 'c' ], 'd' ] . I just used + instead of recursion, and then ran through the resulting array, building the structure we needed. Wikipedia has a few notes on how to leave recursion using PEG .

Suppose you need a data structure. If you just want to use parens, replace the action with the following:

 var acc = arr[0] for (i = 1; i < arr.length; i++) { acc = acc + '(' + arr[i] + ')' } return acc;

Which gives a(b)(c)(d) .

To return spaces and a multi-line identifier, you can do this:

  start = call id = [az]+ _ = [ ]+ call = a:id as:arg* { arr = [a].concat(as) var acc = arr[0] for (i = 1; i < arr.length; i++) { acc = acc + '(' + arr[i] + ')' } return acc; } arg = _ a:id {return a}

Boriel · Answer 3 · 2016-09-01T11:30:59+0000

You can reformat the call without terminals and put its repeating part in a split rule using the + operator, as follows:

 start = call ident = i:[az]+ { return i.join(''); } spaces = [ ]+ call = f:ident g:args+ { return f + g.join(''); } args = spaces a:ident { return "(" + a + ")"; }

Eliminate left recursion on this PEG.js grammar

More articles: