How does the ANTLR lexer disambiguate its rules (or why does my parser produce "mismatched input" errors)?

Note: This is a self-answered question that aims to provide a reference for one of the most common mistakes made by ANTLR users.


When I test this very simple grammar:

grammar KeyValues;

keyValueList: keyValue*;
keyValue: key=IDENTIFIER '=' value=INTEGER ';';

IDENTIFIER: [A-Za-z0-9]+;
INTEGER: [0-9]+;

WS: [ \t\r\n]+ -> skip;

With the following input:

foo = 42;

I end up with the following runtime error:

line 1:6 mismatched input '42' expecting INTEGER
line 1:8 mismatched input ';' expecting '='

Why does ANTLR not recognize 42 as an INTEGER in this case?
It should match the pattern [0-9]+ just fine.

Also, swapping the order in which INTEGER and IDENTIFIER are defined seems to solve the problem, but why?

Answer:

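In ANTLR, the lexer is isolated from the parser: it splits the text into typed tokens according to the lexer rules of the grammar, and the parser has no influence on this process (it cannot say "give me an INTEGER now", for instance). The lexer produces the token stream on its own. Furthermore, the parser does not care about the token text; it only looks at token types to match its rules.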

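This can easily become a problem when several lexer rules can match the same input text. In that case, the token type is chosen according to these priority rules:

  • First, select the lexer rules that match the longest input substring,
  • If the longest matched substring is equal to an implicitly defined token (such as '='), use the implicit rule as the token type,
  • If several lexer rules match the same input, choose the first one, based on definition order.

These rules are essential to keep in mind in order to use ANTLR effectively.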


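In the example from the question, the parser expects the following token stream in order to match the keyValue parser rule: IDENTIFIER '=' INTEGER ';', where '=' and ';' are implicit token types.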

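Since 42 can match both INTEGER and IDENTIFIER, and IDENTIFIER is defined first, the parser receives the following input: IDENTIFIER '=' IDENTIFIER ';', which it can't match against the keyValue rule. Remember, the parser cannot talk to the lexer, it can only receive data from it, so it cannot say "try to match INTEGER next".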
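The three priority rules, and the token stream they produce for the question's input, can be sketched with a small Python model. This is a toy approximation and not ANTLR's actual implementation: the `tokenize` helper, its `(name, pattern, skip)` rule format and the `literals` parameter are hypothetical names introduced only for illustration. It also assumes Python's greedy `re` matching finds the longest match, which holds for the simple character-class patterns used here.

```python
import re


def tokenize(text, rules, literals=()):
    """Toy model of the lexer priority rules (hypothetical helper, not ANTLR code).

    rules    -- list of (token_type, regex, skip) in definition order
    literals -- implicit tokens such as '=' that appear inside parser rules
    """
    # Implicit literals behave as if they were defined before every named rule.
    candidates = [(f"'{lit}'", re.compile(re.escape(lit)), False) for lit in literals]
    candidates += [(name, re.compile(pattern), skip) for name, pattern, skip in rules]

    tokens, pos = [], 0
    while pos < len(text):
        best = None  # (match_length, token_type, skip)
        for name, regex, skip in candidates:
            match = regex.match(text, pos)
            # A strictly longer match wins; on a tie the earlier rule is kept.
            if match and match.end() > pos and (best is None or match.end() - pos > best[0]):
                best = (match.end() - pos, name, skip)
        if best is None:
            raise ValueError(f"no rule matches at offset {pos}")
        length, name, skip = best
        if not skip:  # mimics '-> skip'
            tokens.append((name, text[pos:pos + length]))
        pos += length
    return tokens


# Lexer rules of the KeyValues grammar, in definition order.
KEY_VALUES_RULES = [
    ("IDENTIFIER", r"[A-Za-z0-9]+", False),
    ("INTEGER", r"[0-9]+", False),
    ("WS", r"[ \t\r\n]+", True),
]

types = [name for name, _ in tokenize("foo = 42;", KEY_VALUES_RULES, literals=["=", ";"])]
print(" ".join(types))  # IDENTIFIER '=' IDENTIFIER ';'
```

In this model, 42 ties between IDENTIFIER and INTEGER on length, so the rule defined first is selected, which is the stream the parser then fails to match.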

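It is advisable to minimize the overlap between lexer rules to limit the impact of this effect. In the example above, there are several options:

  • Redefine IDENTIFIER as [A-Za-z] [A-Za-z0-9]* (require it to start with a letter). This avoids the problem entirely, but it prevents identifiers that start with a digit from being defined, so it changes the intent of the grammar.
  • Swap the order of INTEGER and IDENTIFIER. This solves the problem in most cases, but it prevents fully numeric identifiers from being defined, so it also changes the intent of the grammar, in a subtle and less obvious way.
  • Make the parser accept both token types where the lexer rules overlap:
    First, swap INTEGER and IDENTIFIER so that INTEGER takes priority. Then define a parser rule id: IDENTIFIER | INTEGER;, and use it in other parser rules instead of IDENTIFIER, which changes keyValue to key=id '=' value=INTEGER ';'.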
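The effect of the first two options on the token types can be compared with a short Python sketch. It is only a toy model of the priority rules, not ANTLR itself: `tokenize`, `token_types`, `OPTION_1` and `OPTION_2` are hypothetical names introduced for illustration, and the model assumes Python's greedy `re` matching yields the longest match for these simple patterns.

```python
import re


def tokenize(text, rules, literals=()):
    """Toy lexer: longest match first, implicit literals next, then definition order."""
    candidates = [(f"'{lit}'", re.compile(re.escape(lit)), False) for lit in literals]
    candidates += [(name, re.compile(pattern), skip) for name, pattern, skip in rules]
    tokens, pos = [], 0
    while pos < len(text):
        best = None  # (match_length, token_type, skip)
        for name, regex, skip in candidates:
            match = regex.match(text, pos)
            # Only a strictly longer match replaces an earlier candidate.
            if match and match.end() > pos and (best is None or match.end() - pos > best[0]):
                best = (match.end() - pos, name, skip)
        if best is None:
            raise ValueError(f"no rule matches at offset {pos}")
        length, name, skip = best
        if not skip:
            tokens.append((name, text[pos:pos + length]))
        pos += length
    return tokens


WS = ("WS", r"[ \t\r\n]+", True)
# Option 1: IDENTIFIER must start with a letter.
OPTION_1 = [("IDENTIFIER", r"[A-Za-z][A-Za-z0-9]*", False), ("INTEGER", r"[0-9]+", False), WS]
# Option 2: INTEGER is defined before the original IDENTIFIER rule.
OPTION_2 = [("INTEGER", r"[0-9]+", False), ("IDENTIFIER", r"[A-Za-z0-9]+", False), WS]


def token_types(text, rules):
    return [name for name, _ in tokenize(text, rules, literals=["=", ";"])]


print(token_types("foo = 42;", OPTION_1))  # the value is now an INTEGER
print(token_types("4x = 1;", OPTION_1))    # 4x is split into INTEGER and IDENTIFIER
print(token_types("foo = 42;", OPTION_2))  # the value is now an INTEGER
print(token_types("4x = 1;", OPTION_2))    # longest match still makes 4x an IDENTIFIER
print(token_types("42 = 1;", OPTION_2))    # a fully numeric key becomes an INTEGER
```

The last call shows the case the third option addresses: with INTEGER defined first, a numeric key arrives as INTEGER, so the parser needs a rule such as id: IDENTIFIER | INTEGER; to accept it as a key.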

Here is a second example that sums up the lexer's behavior.

Consider the following combined grammar:

grammar LexerPriorityRulesExample;

// Parser rules

randomParserRule: 'foo'; // Implicitly declared token type

// Lexer rules

BAR: 'bar';
IDENTIFIER: [A-Za-z]+;
BAZ: 'baz';

WS: [ \t\r\n]+ -> skip;

Given the following input:

aaa foo bar baz barz

the lexer will produce the following token sequence:

IDENTIFIER 'foo' BAR IDENTIFIER IDENTIFIER EOF

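  • aaa is of type IDENTIFIER

    Only the IDENTIFIER rule can match this token, so there is no ambiguity.

  • foo is of type 'foo'

    The parser rule randomParserRule introduces the implicit token type 'foo', which takes priority over the IDENTIFIER rule.

  • bar is of type BAR

    This text matches the BAR rule, which is defined before the IDENTIFIER rule and therefore takes precedence.

  • baz is of type IDENTIFIER

    This text matches the BAZ rule, but it also matches the IDENTIFIER rule. The latter is chosen because it is defined before BAZ.

    Given this grammar, BAZ can never match, because the IDENTIFIER rule already covers everything BAZ can match.

  • barz is of type IDENTIFIER

    The BAR rule can match the first 3 characters of this string (bar), but the IDENTIFIER rule matches 4 characters. Since IDENTIFIER matches a longer substring, it is chosen over BAR.

  • EOF (end of file) is an implicitly defined token type that always occurs at the end of the input.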
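The walkthrough above can be reproduced with a toy Python model of the priority rules. This is a sketch under assumptions rather than ANTLR's real algorithm: the `tokenize` helper and the `RULES` list are hypothetical names, the model relies on Python's greedy `re` matching for the longest match, and it does not emit EOF itself, so EOF is appended by hand.

```python
import re


def tokenize(text, rules, literals=()):
    """Toy lexer: longest match first, implicit literals next, then definition order."""
    candidates = [(f"'{lit}'", re.compile(re.escape(lit)), False) for lit in literals]
    candidates += [(name, re.compile(pattern), skip) for name, pattern, skip in rules]
    tokens, pos = [], 0
    while pos < len(text):
        best = None  # (match_length, token_type, skip)
        for name, regex, skip in candidates:
            match = regex.match(text, pos)
            # Only a strictly longer match replaces an earlier candidate.
            if match and match.end() > pos and (best is None or match.end() - pos > best[0]):
                best = (match.end() - pos, name, skip)
        if best is None:
            raise ValueError(f"no rule matches at offset {pos}")
        length, name, skip = best
        if not skip:
            tokens.append((name, text[pos:pos + length]))
        pos += length
    return tokens


# Lexer rules of LexerPriorityRulesExample, in definition order.
RULES = [
    ("BAR", r"bar", False),
    ("IDENTIFIER", r"[A-Za-z]+", False),
    ("BAZ", r"baz", False),  # never selected: IDENTIFIER matches the same text first
    ("WS", r"[ \t\r\n]+", True),
]

# 'foo' is the implicit token introduced by randomParserRule.
tokens = tokenize("aaa foo bar baz barz", RULES, literals=["foo"])
# The toy model does not generate EOF, so it is added manually here.
token_types = [name for name, _ in tokens] + ["EOF"]
print(" ".join(token_types))  # IDENTIFIER 'foo' BAR IDENTIFIER IDENTIFIER EOF
```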

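As a rule of thumb, specific rules should be defined before more generic ones. If a rule can only match input that is already covered by a previously defined rule, it will never be used.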

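Implicitly defined rules such as 'foo' act as if they were defined before all other lexer rules. Since they add complexity, it is advisable to avoid them altogether and declare explicit lexer rules instead. Simply having the list of tokens in one place, rather than scattered across the grammar, is a compelling advantage of this approach.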


Source: https://habr.com/ru/post/1685829/

