How does the ANTLR lexer disambiguate its rules (or why does my parser produce "mismatched input" errors)?

Question

Note: This is a self-answered question that aims to provide a reference on one of the most common mistakes made by ANTLR users.

When I test this very simple grammar:

grammar KeyValues;

keyValueList: keyValue*;
keyValue: key=IDENTIFIER '=' value=INTEGER ';';

IDENTIFIER: [A-Za-z0-9]+;
INTEGER: [0-9]+;

WS: [ \t\r\n]+ -> skip;

With the following input:

foo = 42;

I end up with the following runtime error:

line 1:6 mismatched input '42' expecting INTEGER
line 1:8 mismatched input ';' expecting '='

Why doesn't ANTLR recognize 42 as an INTEGER in this case?
It should match the pattern [0-9]+ just fine.

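Also, when I swap the order in which INTEGER and IDENTIFIER are defined, it seems to work, but why does the order matter in the first place?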

parsing antlr antlr4 lexer

Asked by Lucas Trzesniewski, Sep 17 '17 at 19:21

Lucas Trzesniewski · Accepted Answer · 2017-09-17T19:21:42+0000

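In ANTLR, the lexer is isolated from the parser: it splits the text into typed tokens according to the lexer rules alone, and the parser has no influence on this process (it cannot say "give me an INTEGER now", for instance). The lexer produces the token stream by itself. Furthermore, the parser does not care about the token text; it only looks at token types when matching its rules.

This can easily become a problem when several lexer rules can match the same input text. In that case, the token type is chosen according to these precedence rules:

First, select the lexer rules that match the longest input substring.
If the longest matched substring is equal to an implicitly defined token (like '='), use the implicit rule as the token type.
If several lexer rules match the same input, choose the first one, based on definition order.

These rules are very important to keep in mind in order to use ANTLR effectively.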
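
The three precedence rules can be mimicked in a few lines of Python. This is only a rough sketch for building intuition, not how ANTLR is implemented: the tokenize function and its parameters are hypothetical names, Python regular expressions stand in for lexer rules, and it assumes each pattern is simple enough that a greedy regex match is also the longest match (true for the character classes used here).

```python
import re

def tokenize(text, implicit, rules, skip=("WS",)):
    """Return the token types for text.

    implicit: literals used directly in parser rules, e.g. "=".
    rules: (name, regex) pairs in the order the lexer rules are declared.
    """
    # Implicit tokens behave as if declared before every named lexer rule.
    ordered = [(repr(lit), re.escape(lit)) for lit in implicit] + list(rules)
    types, pos = [], 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, pattern in ordered:
            m = re.compile(pattern).match(text, pos)
            # Longest match wins; the strict '>' keeps the earliest rule on a tie.
            if m and len(m.group()) > best_len:
                best_name, best_len = name, len(m.group())
        if best_name is None:
            raise ValueError(f"no rule matches at offset {pos}")
        if best_name not in skip:  # mimic '-> skip'
            types.append(best_name)
        pos += best_len
    return types

# The lexer rules of the grammar from the question, in declaration order.
rules = [
    ("IDENTIFIER", r"[A-Za-z0-9]+"),
    ("INTEGER", r"[0-9]+"),
    ("WS", r"[ \t\r\n]+"),
]
print(" ".join(tokenize("foo = 42;", ["=", ";"], rules)))
# prints: IDENTIFIER '=' IDENTIFIER ';'
```

Both IDENTIFIER and INTEGER match the two characters of 42, so the tie goes to whichever rule is declared first.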

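In the example from the question, the parser expects the following token stream in order to match the keyValue rule: IDENTIFIER '=' INTEGER ';', where '=' and ';' are implicit token types.

Since 42 can match both INTEGER and IDENTIFIER, and IDENTIFIER is defined first, the parser receives the following input instead: IDENTIFIER '=' IDENTIFIER ';', which it cannot match against the keyValue rule. Remember, the parser cannot communicate with the lexer, it can only receive data from it, so it cannot say "try to match INTEGER next".

It is advisable to minimize the overlap between lexer rules to limit this effect. In the example above, there are several options:

Redefine IDENTIFIER as [A-Za-z] [A-Za-z0-9]* (require it to start with a letter). This avoids the problem entirely, but it no longer allows identifiers that start with a digit, so it changes the intent of the grammar.
Swap INTEGER and IDENTIFIER. This solves the problem in most cases, but it no longer allows fully numeric identifiers, so it also changes the intent of the grammar, in a subtle and not so obvious way.
Make the parser accept both token types where the lexer rules overlap:
First, swap INTEGER and IDENTIFIER to give priority to INTEGER. Then define a parser rule id: IDENTIFIER | INTEGER;, and use it instead of IDENTIFIER in the other parser rules, which changes keyValue to key=id '=' value=INTEGER ';'.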

To sum up, here is a second example of lexer behavior, using the following combined grammar:

grammar LexerPriorityRulesExample;

// Parser rules

randomParserRule: 'foo'; // Implicitly declared token type

// Lexer rules

BAR: 'bar';
IDENTIFIER: [A-Za-z]+;
BAZ: 'baz';

WS: [ \t\r\n]+ -> skip;

Given the following input:

aaa foo bar baz barz

The lexer will produce the following token sequence:

IDENTIFIER 'foo' BAR IDENTIFIER IDENTIFIER EOF

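aaa is of type IDENTIFIER
Only the IDENTIFIER rule can match this token, so there is no ambiguity.
foo is of type 'foo'
The parser rule randomParserRule introduces the implicit 'foo' token type, which takes priority over the IDENTIFIER rule.
bar is of type BAR
This text matches the BAR rule, which is defined before the IDENTIFIER rule and therefore takes precedence.
baz is of type IDENTIFIER
This text matches the BAZ rule, but it also matches the IDENTIFIER rule. The latter is chosen because it is defined before BAZ.
Given this grammar, BAZ will never be able to match, since the IDENTIFIER rule already covers everything BAZ can match.
barz is of type IDENTIFIER
The BAR rule can match the first 3 characters of this string (bar), but the IDENTIFIER rule matches 4 characters. Since IDENTIFIER matches a longer substring, it is chosen over BAR.
EOF (end of file) is an implicitly defined token type that always occurs at the end of the input.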

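As a rule of thumb, specific rules should be defined before more generic rules. If a rule can only match input that is already covered by a previously defined rule, it will never be used.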
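
For keyword-style rules, this rule of thumb can be checked mechanically. The following is a rough sketch, not an ANTLR feature: shadowed_literals is a hypothetical helper, rules are given as Python regular expressions in declaration order, and only rules whose pattern is a plain literal are examined. If an earlier rule matches the whole keyword, it will always match at least as much text as the keyword rule, so the keyword rule can never win.

```python
import re

def shadowed_literals(rules):
    """Return (literal_rule, earlier_rule) pairs where the literal can never win.

    rules: (name, regex) pairs in the order the lexer rules are declared.
    """
    found = []
    for i, (name, pattern) in enumerate(rules):
        if re.escape(pattern) != pattern:
            continue  # not a plain literal; this sketch only handles keywords
        for earlier_name, earlier_pattern in rules[:i]:
            # An earlier rule that matches the whole keyword always ties or beats it.
            if re.fullmatch(earlier_pattern, pattern):
                found.append((name, earlier_name))
                break
    return found

# The named lexer rules of LexerPriorityRulesExample, in declaration order.
rules = [("BAR", "bar"), ("IDENTIFIER", "[A-Za-z]+"), ("BAZ", "baz")]
print(shadowed_literals(rules))
# prints: [('BAZ', 'IDENTIFIER')]
```

BAR is not reported because it is declared before IDENTIFIER; moving BAZ above IDENTIFIER would clear the report for it as well.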

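Implicitly defined rules such as 'foo' act as if they were defined before all other lexer rules. Because they add complexity, it is advisable to avoid them altogether and declare explicit lexer rules instead. Having the list of tokens in one place, rather than scattered across the grammar, is a compelling advantage of this approach.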
