Antlr error: the following token definition can never be matched since previous tokens match the same input

Question


I'm writing a simple language with ANTLR. I defined a lexer grammar in ANTLRWorks, but when I try to generate Java code, it gives me an error:

Antlr error: the following token definitions can never be matched because prior tokens match the same input: FLOAT_OR_INT, OPEN_PAR, CLOSE_PAR, ... (for almost all rules!)

I am new to ANTLR. I assume this is caused by the order of the rules, but I don't know how they should be ordered. What is my mistake?

Here is the grammar:

    lexer grammar OurCompiler;

    options { k=5; }

    ID : ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')* ;
    protected INT : ('0'..'9')+ ;
    protected FLOAT : INT '.' INT ;
    FLOAT_OR_INT
      : ( INT '.' ) => FLOAT { $setType(FLOAT); }
      | INT { $setType(INT); }
      ;
    OPENPAR_OR_OUTPUT_OPERATOR
      : '(' { $setType(OPEN_PAR); }
      | '(' '(' { $setType(OUTPUT_OPERATOR); }
      ;
    CLOSEPAR_OR_INPUT_OPERATOR
      : ')' { $setType(CLOSE_PAR); }
      | ')' ')' { $setType(INPUT_OPERATOR); }
      ;
    protected OPEN_PAR : '(' ;
    protected CLOSE_PAR : ')' ;
    protected INPUT_OPERATOR : ')' ')' ;
    protected OUTPUT_OPERATOR : '(' '(' ;
    BOOLEAN : 't' 'r' 'u' 'e' | 'f' 'a' 'l' 's' 'e' ;
    LOWER : '<' ;
    LOWER_EQUAL : LOWER '=' ;
    UPPER : '>' ;
    UPPER_EQUAL : UPPER '=' ;
    ASSIGN : '=' ;
    EQUAL : '=' '=' ;
    NOT : '!' ;
    NOT_EQUAL : NOT '=' ;
    ADD : '+' ;
    ADD_TO_PREVIOUS : ADD '=' ;
    INCREMENT : ADD ADD ;
    MINUS : '-' ;
    MINUS_FROM_PREVIOUS : MINUS '=' ;
    DECREMENT : MINUS MINUS ;
    MULTIPLY : '*' ;
    MULTIPLY_TO_PREVIOUS : MULTIPLY '=' ;
    DIVIDE : '/' ;
    DIVIDE_FROM_PREVIOUS : DIVIDE '=' ;
    MODE : '%' ;
    OPEN_BRAKET : '[' ;
    CLOSE_BRAKET : ']' ;
    OPEN_BRACE : '{' ;
    CLOSE_BRACE : '}' ;
    COLON : ':' ;
    SEMICOLON : ';' ;
    COMMA : ',' ;
    SINGLE_LINE_COMMENT
      : '#' '#' ( ~ ('\n'|'\r') )* ( '\n' | '\r' ('\n')? )? { $setType(Token.SKIP); newline(); }
      ;
    MULTIPLE_LINE_COMMENT
      : '#' ( options {greedy=false;} : . )* '#' { $setType(Token.SKIP); }
      ;
    WS
      : ( ' ' | '\t' | '\r' { newline(); } | '\n' { newline(); } ) { $setType(Token.SKIP); }
      ;
    protected ESC_SEQ : '\\' ('b'|'t'|'n'|'f'|'r'|'\"'|'\''|'\\') ;
    STRING : '"' ( ESC_SEQ | ~('\\'|'"') )* '"' ;
    CHAR : '\'' ( ESC_SEQ | ~('\''|'\\') ) '\'' ;
    INT_KEYWORD : 'i' 'n' 't' ;
    FLOAT_KEYWORD : 'f' 'l' 'o' 'a' 't' ;
    CHAR_KEYWORD : 'c' 'h' 'a' 'r' ;
    STRING_KEYWORD : 's' 't' 'r' 'i' 'n' 'g' ;
    BOOLEAN_KEYWORD : 'b' 'o' 'o' 'l' 'e' 'a' 'n' ;
    INPUT_KEYWORD
      : 'i' 'n' ID { $setType(ID); }
      | 'i' 'n'
      ;
    OUTPUT_KEYWORD
      : 'o' 'u' 't' ID { $setType(ID); }
      | 'o' 'u' 't'
      ;
    IF_KEYWORD : 'i' 'f' ;
    FOR_KEYWORD : 'f' 'o' 'r' ;
    SWITCH_KEYWORD : 's' 'w' 'i' 't' 'c' 'h' ;
    CASE_KEYWORD : 'c' 'a' 's' 'e' ;
    BREAK_KEYWORD : 'b' 'r' 'e' 'a' 'k' ;
    DEFAULT_KEYWORD : 'd' 'e' 'f' 'a' 'u' 'l' 't' ;
    WHILE_KEYWORD : 'w' 'h' 'i' 'l' 'e' ;
    ELSE_KEYWORD : 'e' 'l' 's' 'e' ;
    ELSEIF_KEYWORD : 'e' 'l' 's' 'e' 'i' 'f' ;
    AND_KEYWORD : 'a' 'n' 'd' ;
    OR_KEYWORD : 'o' 'r' ;
    NOT_KEYWORD : 'n' 'o' 't' ;
    CONSTANT_KEYWORD : 'c' 'o' 'n' 's' 't' 'a' 'n' 't' ;


nafiseh Feb 14 '12 at 7:29


Bart Kiers · Accepted Answer · 2012-02-14T08:59:50+0000

Looking at your grammar, I have 7 remarks:

1

k=? sets the lookahead for parser rules, and since yours is a lexer grammar, remove it;

2

Although it is not wrong, BOOLEAN_KEYWORD : 'b' 'o' 'o' 'l' 'e' 'a' 'n'; is quite verbose. Write BOOLEAN_KEYWORD : 'boolean'; instead.

3

The protected keyword was replaced by fragment in ANTLR 3. But you are also doing something odd here. Look at these rules:

    fragment INT : ('0'..'9')+ ;
    fragment FLOAT : INT '.' INT ;
    FLOAT_OR_INT
      : ( INT '.' ) => FLOAT { $setType(FLOAT); }
      | INT { $setType(INT); }
      ;

You create two fragments, and then FLOAT_OR_INT uses a syntactic predicate to check whether it "sees" an INT followed by a '.', and if so, changes the token type to FLOAT. The following does the same thing and is much more readable, and therefore preferable:

    FLOAT : DIGIT+ '.' DIGIT+ ;
    INT : DIGIT+ ;
    fragment DIGIT : '0'..'9' ;
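To see why no predicate or $setType is needed, here is a hand-written Java sketch of the choice those two rules express. It is only an illustration, not the code ANTLR generates, and NumberScan, Tok and scanNumber are hypothetical names: consume DIGIT+, and continue into FLOAT only when the '.' is followed by at least one more digit.

```java
public class NumberScan {

    /** Hypothetical token holder: a type name plus the matched text. */
    static final class Tok {
        final String type;
        final String text;

        Tok(String type, String text) {
            this.type = type;
            this.text = text;
        }

        @Override
        public String toString() {
            return type + ":" + text;
        }
    }

    // DIGIT : '0'..'9' ;  (ASCII digits only, unlike Character.isDigit)
    static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    /**
     * Scans one number starting at pos, following
     *   FLOAT : DIGIT+ '.' DIGIT+ ;
     *   INT   : DIGIT+ ;
     * Returns null if there is no digit at pos.
     */
    static Tok scanNumber(String in, int pos) {
        int i = pos;
        while (i < in.length() && isDigit(in.charAt(i))) i++;      // first DIGIT+
        if (i == pos) return null;                                 // no number here
        // Extend to FLOAT only when '.' is followed by at least one more digit.
        if (i + 1 < in.length() && in.charAt(i) == '.' && isDigit(in.charAt(i + 1))) {
            int j = i + 1;
            while (j < in.length() && isDigit(in.charAt(j))) j++;  // second DIGIT+
            return new Tok("FLOAT", in.substring(pos, j));
        }
        return new Tok("INT", in.substring(pos, i));
    }

    public static void main(String[] args) {
        System.out.println(scanNumber("42", 0));    // INT:42
        System.out.println(scanNumber("3.14;", 0)); // FLOAT:3.14
        System.out.println(scanNumber("7.x", 0));   // INT:7
    }
}
```

The type is decided purely by how far the input can be matched, which is what lets the plain FLOAT and INT rules replace the fragments, the predicate and the $setType actions.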

4

In ANTLR 3, .* is non-greedy by default, so change:

 '#' ( options {greedy=false;} : . )* '#'

into

 '#' .* '#'

or even better:

 '#' ~'#'+ '#'
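ANTLR grammar syntax is not Java regex, but java.util.regex gives a rough analogy of the two comment rules above (CommentScan and matchAtStart are hypothetical names): the reluctant #.*?# stops at the first closing '#', like ANTLR 3's non-greedy '#' .* '#', and #[^#]+# corresponds to '#' ~'#'+ '#'. Pattern.DOTALL is set so that '.' also crosses newlines.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CommentScan {
    // Reluctant quantifier: analogous to ANTLR 3's non-greedy '#' .* '#'
    static final Pattern NON_GREEDY = Pattern.compile("#.*?#", Pattern.DOTALL);
    // Negated character class (matches newlines too): analogous to '#' ~'#'+ '#'
    static final Pattern NOT_HASH = Pattern.compile("#[^#]+#");

    /** Returns the text matched at the very start of input, or null if there is no match. */
    static String matchAtStart(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.lookingAt() ? m.group() : null;
    }

    public static void main(String[] args) {
        String src = "# multi\nline # x = 1; # other #";
        System.out.println(matchAtStart(NON_GREEDY, src));       // stops at the first closing '#'
        System.out.println(matchAtStart(NOT_HASH, src));         // same match
        System.out.println(matchAtStart(NON_GREEDY, "## note")); // "##", an empty comment
        System.out.println(matchAtStart(NOT_HASH, "## note"));   // null: needs 1+ non-'#' chars
    }
}
```

In this analogy the ~'#'+ form needs at least one non-'#' character, so "##" (the start of a single-line comment) can never match it as an empty multi-line comment.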

5

The rule:

 OPENPAR_OR_OUTPUT_OPERATOR : '(' { $setType(OPEN_PAR); } | '(' '(' { $setType(OUTPUT_OPERATOR); } ;

should simply be:

    OUTPUT_OPERATOR : '((' ;
    OPEN_PAR : '(' ;

6

The ANTLR lexer tries to match as many characters as possible. Whenever two rules match the same number of characters, the rule defined first "wins". That is why you must define all your *_KEYWORD rules before the ID rule.
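Here is a simplified Java model of those two rules. It uses java.util.regex rather than the DFA ANTLR generates, and MaxMunch, tokenize and rules are hypothetical names. At every position it keeps the longest match, and on a tie it keeps the rule defined first. With ID listed before K_IF, "if" comes out as an ID and K_IF can never produce a token, which is the kind of situation the error in the question complains about.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MaxMunch {

    /**
     * At each position, try every rule and keep the longest match.
     * Rules are visited in definition order, and a later rule must match
     * strictly more characters to replace the current best, so ties go
     * to the rule defined first.
     */
    static List<String> tokenize(Map<String, Pattern> rules, String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            if (input.charAt(pos) == ' ') { // stand-in for a WS rule with skip()
                pos++;
                continue;
            }
            String bestType = null;
            int bestLen = 0;
            for (Map.Entry<String, Pattern> rule : rules.entrySet()) {
                Matcher m = rule.getValue().matcher(input);
                m.region(pos, input.length());
                if (m.lookingAt() && m.end() - pos > bestLen) {
                    bestType = rule.getKey();
                    bestLen = m.end() - pos;
                }
            }
            if (bestType == null) {
                throw new IllegalStateException("no rule matches at index " + pos);
            }
            tokens.add(bestType + ":" + input.substring(pos, pos + bestLen));
            pos += bestLen;
        }
        return tokens;
    }

    /** A tiny, hypothetical rule set; LinkedHashMap preserves definition order. */
    static Map<String, Pattern> rules(boolean keywordsBeforeId) {
        Map<String, Pattern> r = new LinkedHashMap<>();
        Pattern kIf = Pattern.compile("if");
        Pattern id = Pattern.compile("[a-zA-Z_][a-zA-Z0-9_]*");
        if (keywordsBeforeId) {
            r.put("K_IF", kIf);
            r.put("ID", id);
        } else {
            r.put("ID", id);
            r.put("K_IF", kIf);
        }
        r.put("OUTPUT_OPERATOR", Pattern.compile("\\(\\("));
        r.put("OPEN_PAR", Pattern.compile("\\("));
        return r;
    }

    public static void main(String[] args) {
        // [K_IF:if, ID:iffy, OUTPUT_OPERATOR:((, OPEN_PAR:(]
        System.out.println(tokenize(rules(true), "if iffy (( ("));
        // [ID:if, ID:iffy, OUTPUT_OPERATOR:((, OPEN_PAR:(]  (K_IF never wins)
        System.out.println(tokenize(rules(false), "if iffy (( ("));
    }
}
```

The same longest-match behaviour is why OUTPUT_OPERATOR : '((' ; and OPEN_PAR : '(' ; need no actions: "((" is simply the longer match.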

7

Finally, you don't need to check whether "in" or "out" is followed by an ID (and then change the token type). Whenever the lexer sees input like "inside", it always creates a single ID token, never an INPUT_KEYWORD followed by an ID, because the lexer matches as much as possible (see remark 6).
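Hand-written lexers often reach the same result with a different technique: scan the longest identifier-shaped word first, then look the whole word up in a keyword table. This is not what ANTLR generates, and KeywordLookup, KEYWORDS and scanWord are hypothetical names, but it makes the point of the remark above concrete: "inside" is consumed whole before any keyword check happens, so it can only become one ID.

```java
import java.util.HashMap;
import java.util.Map;

public class KeywordLookup {

    // Hypothetical keyword table: word -> token type (names taken from the question's grammar).
    static final Map<String, String> KEYWORDS = new HashMap<>();
    static {
        KEYWORDS.put("in", "INPUT_KEYWORD");
        KEYWORDS.put("out", "OUTPUT_KEYWORD");
        KEYWORDS.put("int", "INT_KEYWORD");
    }

    static boolean isIdStart(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_';
    }

    static boolean isIdPart(char c) {
        return isIdStart(c) || (c >= '0' && c <= '9');
    }

    /**
     * Consumes the longest ID-shaped word at pos (like the ID rule), and only then
     * checks whether the whole word is a keyword. Returns null if no word starts at pos.
     */
    static String scanWord(String input, int pos) {
        if (pos >= input.length() || !isIdStart(input.charAt(pos))) return null;
        int i = pos + 1;
        while (i < input.length() && isIdPart(input.charAt(i))) i++;
        String word = input.substring(pos, i);
        return KEYWORDS.getOrDefault(word, "ID") + ":" + word;
    }

    public static void main(String[] args) {
        System.out.println(scanWord("inside", 0)); // ID:inside
        System.out.println(scanWord("in x", 0));   // INPUT_KEYWORD:in
        System.out.println(scanWord("output", 0)); // ID:output
        System.out.println(scanWord("out;", 0));   // OUTPUT_KEYWORD:out
    }
}
```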

It seems you are trying to learn ANTLR by trial and error, or from outdated documentation. That is not the way to learn ANTLR. Try to get a copy of Parr's The Definitive ANTLR Reference to learn it properly.

Good luck

EDIT

In case you can't get it to work, here is a working version of your grammar:

    lexer grammar OurCompiler; // A bit of an odd name for a lexer...

    K_INT : 'int';
    K_FLOAT : 'float';
    K_CHAR : 'char';
    K_STRING : 'string';
    K_BOOLEAN : 'boolean';
    K_INPUT : 'in';
    K_OUTPUT : 'out';
    K_IF : 'if';
    K_FOR : 'for';
    K_SWITCH : 'switch';
    K_CASE : 'case';
    K_BREAK : 'break';
    K_DEFAULT : 'default';
    K_WHILE : 'while';
    K_ELSE : 'else';
    K_ELSEIF : 'elseif';
    K_AND : 'and';
    K_OR : 'or';
    K_NOT : 'not';
    K_CONSTANT : 'constant';

    BOOLEAN : 'true' | 'false';
    FLOAT : DIGIT+ '.' DIGIT+;
    INT : DIGIT+;
    STRING : '"' ( ESC_SEQ | ~('\\'|'"') )* '"';
    CHAR : '\'' ( ESC_SEQ | ~('\''|'\\') ) '\'';
    ID : ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*;

    INPUT_OPERATOR : '))';
    OUTPUT_OPERATOR : '((';
    OPEN_PAR : '(';
    CLOSE_PAR : ')';
    LOWER : '<';
    LOWER_EQUAL : '<=';
    UPPER : '>';
    UPPER_EQUAL : '>=';
    ASSIGN : '=';
    EQUAL : '==';
    NOT : '!';
    NOT_EQUAL : '!=';
    ADD : '+';
    ADD_TO_PREVIOUS : '+=';
    INCREMENT : '++';
    MINUS : '-';
    MINUS_FROM_PREVIOUS : '-=';
    DECREMENT : '--';
    MULTIPLY : '*';
    MULTIPLY_TO_PREVIOUS : '*=';
    DIVIDE : '/';
    DIVIDE_FROM_PREVIOUS : '/=';
    MODE : '%';
    OPEN_BRAKET : '[';
    CLOSE_BRAKET : ']';
    OPEN_BRACE : '{';
    CLOSE_BRACE : '}';
    COLON : ':';
    SEMICOLON : ';';
    COMMA : ',';

    SINGLE_LINE_COMMENT : '##' ~('\r' | '\n')* {skip();};
    MULTIPLE_LINE_COMMENT : '#' ~'#'+ '#' {skip();};
    WS : ( ' ' | '\t' | '\r' | '\n') {skip();};

    fragment ESC_SEQ : '\\' ('b' | 't' | 'n' | 'f' | 'r' | '\"' | '\'' | '\\');
    fragment DIGIT : '0'..'9';
