When you generate a lexer and parser from your grammar, ANTLR prints the following error to the console:
error(211): CoffeeScript.g:52:3: [fatal] rule expression has non-LL(*) decision due to recursive rule invocations reachable from alts 1,3. Resolve by left-factoring or using syntactic predicates or using backtrack=true option.
warning(200): CoffeeScript.g:52:3: Decision can match input such as "{NUMBER, STRING}" using multiple alternatives: 1, 3
As a result, alternative(s) 3 were disabled for that input
(I emphasized the important bits)
This is only the first error, but start with it: with a little luck, the errors below it will also disappear once you fix the first one.
The above error means that when you try to parse either a NUMBER or a STRING with the parser generated from your grammar, the parser can go two ways when it ends up in the expression rule:
expression
: value // choice 1
| assign // choice 2
| operation // choice 3
;
Namely, choice 1 and choice 3 can both parse a NUMBER or a STRING, as you can see from the "paths" the parser can follow to match these two choices:
choice 1:
  expression
    value
      literal
        alphaNumeric: {NUMBER, STRING}
choice 3:
  expression
    operation
      logicOp
        relationOp
          shiftOp
            additiveOp
              mathOp
                questionOp
                  term
                    value
                      literal
                        alphaNumeric: {NUMBER, STRING}
In the last part of the warning, ANTLR informs you that it ignores choice 3 whenever either a NUMBER or a STRING is parsed, causing choice 1 to match such input (since it is defined before choice 3).
So either the CoffeeScript grammar itself is ambiguous in this respect (and resolves the ambiguity somehow), or your implementation of it is wrong (I'm guessing the latter :)). You need to fix this ambiguity in your grammar, i.e. do not let choices 1 and 3 of the expression rule both match the same input.
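One possible way to remove the overlap is sketched below. It assumes that operation really does reach value through the chain of rules listed above, so that a lone NUMBER or STRING can be parsed as an operation without any operators, and that nothing else in your grammar relies on the separate value alternative. I have not run this against your full grammar, so treat it as a starting point rather than a verified fix:

```
expression
  : assign      // may still need left-factoring if it can start with the same tokens as operation
  | operation   // also covers a bare value: operation -> ... -> term -> value
  ;
```

With the value alternative gone, there is only one path from expression to alphaNumeric, which is the overlap the warning complains about.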
I noticed 3 more things in your grammar:
1
Take the following lexer rules:
NEW: 'new';
...
UNARY: '!' | '~' | NEW;
Be aware that the UNARY token can never match the text 'new', since the NEW token is defined before it. If you want UNARY to match it, remove the NEW rule and write:
UNARY: '!' | '~' | 'new';
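If other parser rules still need to refer to NEW on its own, an alternative is to keep the two lexer rules disjoint and combine them in a parser rule instead. A minimal sketch, where the rule name unaryOp is hypothetical and not taken from your grammar:

```
NEW   : 'new';
UNARY : '!' | '~';

// hypothetical parser rule: accepts either token wherever a unary operator is expected
unaryOp
  : UNARY
  | NEW
  ;
```

This way no two lexer rules compete for the same text, and the parser still sees 'new' as a distinct token type.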
2
Suppose you collect several kinds of tokens into a single one, for example LOGIC:
LOGIC: '&&' | '||';
and then you use this token in the parser rules, for example:
logicOp
: compareOp (LOGIC compareOp) *
;
If you are going to evaluate such an expression at a later stage, you do not know which operator this LOGIC token matched ('&&' or '||'), and you will need to inspect the token's inner text to find out. You are better off doing something like this (at least if you do some kind of evaluation at a later stage):
AND: '&&';
OR: '||';
...
logicOp
: compareOp ( AND compareOp // easier to evaluate: you know it is an AND expression
            | OR compareOp  // easier to evaluate: you know it is an OR expression
            )*
;
3
You are skipping spaces (and no tabs?) with:
WS: (' ')+ {skip();};
but doesn't CoffeeScript delimit its code blocks by indentation with spaces (and tabs), just like Python? Or perhaps you are going to handle that at a later stage?
I just saw that the grammar you are looking at is a jison grammar (jison is more or less an implementation of Bison in JavaScript). But Bison, and therefore jison, generates LR parsers, while ANTLR generates LL parsers. So trying to stay close to the rules of the original grammar will only lead to more problems.