Infinite recursion in ANTLR grammar

I am writing a simple grammar to recognize some expressions. Here I am posting a simpler version that I wrote just to simplify my explanation. This simpler version can recognize expressions like:

  • this is text
  • [n] this is another text [/ n]
  • [n] [n] this compound Expression [/ p] [/ p]

My problem is when I express the following expression: [i] this should only generate a recognition exception [/ n]

A recognition exception is thrown, but the parser is injected into infinte recursion because it matches "[", but if it does not match "i", it loses itself. I think this is because my text component of the grammar cannot contain square brackets. So, I am posting a grammar.

grammar ErrorTest; expression : rawText EOF | command EOF ; rawText : word+ ; word : ESPACE* TEXT ESPACE* ; command : simpleCommand | compoundCommand ; simpleCommand : HELP ; compoundCommand : rawText | BEGIN compoundCommand END ; HELP : '[help]'; BEGIN : '[n]'; END : '[/n]'; ESPACE : ' '; TEXT : ~(' '|'['|']')*; 

How can I solve it?

+6
source share
2 answers
Word

matches an empty string because in

 word : ESPACE* TEXT ESPACE* ; 

TEXT corresponds to an empty line that calls

 rawText : word+ ; 

for an infinite loop.

Edit

 TEXT : ~(' '|'['|']')*; 

to

 TEXT : ~(' '|'['|']')+; 

that will make your grammar only completely ambiguous.

The way to think about this is that rawText can match an empty string in many ways.

  • Zero Text Tones
  • One TEXT token with a length of 0.
  • Two TEXT tokens with a length of 0.
  • Three TEXT tokens with a length of 0.
  • ...

This manifests itself when you have a syntax error ( [i] ), because it tries each of these alternatives to see if any of them resolve the error.


To get rid of any quadratic behavior, you must make it absolutely unique.

 rawText : ign (word (ign word)*)? ign; ign : ESPACE*; word : TEXT; 

The problem with the naive fix is ​​that rawText can match "foo" several ways:

  • TEXT("foo")
  • TEXT("fo"), ESPACE(""), TEXT("o")
  • TEXT("f"), ESPACE(""), TEXT("oo")
  • TEXT("f"), ESPACE(""), TEXT("o"), ESPACE(""), TEXT("o")
+6
source

Why not do something like this:

 grammar Test; expression : atom+ EOF ; atom : TEXT | ESPACE | command ; command : simpleCommand | compoundCommand ; simpleCommand : HELP ; compoundCommand : BEGIN atom+ END ; HELP : '[help]'; BEGIN : '[n]'; END : '[/n]'; ESPACE : ' '; TEXT : ~(' '|'['|']')+; 

which will look opaque for example

 this is [n][n]a [help][n]compound[/n] expression[/n][/n] 

into the following parse tree:

enter image description here

(click to enlarge)

+1
source

Source: https://habr.com/ru/post/914175/


All Articles