Left recursion analysis

Description:

When reading the compiler construct in C , I described the following rules for describing grammar without context:

A grammar that recognizes a list of one or more operators, each of which is an arithmetic expression, followed by a semicolon. Statements consist of a series of expressions with comma delimiters, each of which contains a series of numbers separated by either asterisks (for multiplication) or plus signs (for addition).

And here is the grammar:

1. statements ::= expression; 2. | expression; statements 3. expression ::= expression + term 4. | term 5. term ::= term * factor 6. | factor 7. factor ::= number 8. | (expression) 

The book says that this recursive grammar has a serious problem. The right side of several works appears on the left side, as in work 3 (And this property is called left recursion ), and some parsers, such as a recursive descent parser, cannot process left recursion of a production. They just cling forever.

You can understand the problem by looking at how the parser decides to apply a specific production when it replaces a non-terminal that has more than one right side. A simple case appears in Productions 7 and 8. The parser can choose which production should be applied when it expands by looking at the next input character. If this character is a number, then the compiler applies Production 7 and replaces the coefficient with a number. If the next input character was an open parenthesis, the parser will use Production 8. However, the choice between Productions 5 and 6 cannot be resolved this way. In the case of Product 6, the right-hand side of the term begins with a factor, which in the bundle begins with either a numerical or a left bracket. Therefore, the parser would like to apply Production 6 when the term is replaced and the next input character is replaced by a number or left bracket. Production 5 - the other right side - begins with a term that can start with a factor that can start with a numerical or left bracket, and these are the same characters that were used to select Production 6.


Question:

This second quote from the book completely lost me. Thus, using an example of some operators like (for example) 5 + (7*4) + 14 :

  • What is the difference between factor and term? using the same example
  • Why can't a recursive descent parser process outputs with left recursion? (Explain the second quote).
+6
source share
2 answers
  • What is the difference between factor and term? using the same example

I am not giving the same example as I will not give you a clear idea of ​​what you doubt!

Considering,

 term ::= term * factor | factor factor ::= number | (expression) 

Now suppose, if I ask you to find the factors and terms in the expression 2 * 3 * 4. Now that the multiplication remains associative, it will be evaluated as: -

 (2*3)*4 

As you can see, here (2 * 3) is the term and the coefficient is 4 (number). Likewise, you can extend this approach to any level to draw on an idea of ​​terms.

In accordance with this grammar, if in this expression there is a chain of multiplication, then its part, leaving one factor, is a term that, in turn, gives another part - another term, leaving another one factor and so on. Here's how expressions are evaluated.

  1. Why can't a recursive descent parser process outputs with left recursion? (Explain the second quote).

Your second statement is completely clear in essence. A recursive descent parser is a kind of top-down parser built from a set of mutually recursive procedures (or a non-recursive equivalent), where each such procedure usually implements one of the grammar derivatives.

This is said so because it is clear that the recursive descent parser will enter an infinite loop if the nonterminal continues to expand in itself.

Similarly, speaking of a recursive descent parser, even with backtracking --- When we try to expand a nonterminal, we may eventually try to expand the same nonterminal again without consuming any input.

 A-> Ab 

Here, when expanding the non-terminal A, you can continue to expand in

 A-> AAb -> AAAb -> ... -> infinite loop of A. 

Therefore, we prevent left-recursive statements when working with recursive descent parsers.

+3
source
  • The rule coefficient corresponds to the string β€œ1 * 3”, the term of the rule does not match (although it will match β€œ(1 * 3).” In essence, each rule represents one priority level. expression contains the operators with the lowest priority, factor second lowest and term highest: If you use a term and want to use a lower priority operator, you need to add parentheses.

  • If you implement a recursive descent parser using recursive functions, a rule like a ::= b "*" c | d a ::= b "*" c | d can be implemented as follows:

     // Takes the entire input string and the index at which we currently are // Returns the index after the rule was matched or throws an exception // if the rule failed parse_a(input, index) { try { after_b = parse_b(input, index) after_star = parse_string("*", input, after_b) after_c = parse_c(input, after_star) return after_c } catch(ParseFailure) { // If one of the rules b, "*" or c did not match, try d instead return parse_d(input, index) } } 

    Something like this will work fine (in practice, you might not want to use recursive functions, but the approach you would use will still behave the same way). Now consider the left recursive rule a ::= a "*" b | c a ::= a "*" b | c :

     parse_a(input, index) { try { after_a = parse_a(input, index) after_star = parse_string("*", input, after_a) after_b = parse_c(input, after_star) return after_b } catch(ParseFailure) { // If one of the rules a, "*" or b did not match, try c instead return parse_c(input, index) } } 

Now the first thing the parse_a function parse_a is call itself again at the same index. This recursive call will call itself again. And this will continue indefinitely, or rather, until the stack overflows and the entire program crashes. If instead of recursive functions we use a more efficient approach, we get an infinite loop, not a stack overflow. In any case, we do not get the desired result.

+2
source

Source: https://habr.com/ru/post/987670/


All Articles