How can I distinguish between reserved words and variables using ANTLR?

I am using ANTLR to define a simple grammar, and I need to distinguish between an identifier:

 ID : LETTER (LETTER | DIGIT)* ;
 fragment DIGIT : '0'..'9' ;
 fragment LETTER : 'a'..'z' | 'A'..'Z' ;

and a reserved word:

 RESERVED_WORD : 'class' | 'public' | 'static' | 'extends' | 'void' | 'int' | 'boolean' | 'if' | 'else' | 'while' | 'return' | 'null' | 'true' | 'false' | 'this' | 'new' | 'String' ; 

Say I run the lexer on this input:

 class abc 

I get two ID tokens, one for "class" and one for "abc", but I want "class" to be recognized as a RESERVED_WORD. How can I do this?

1 answer

Whenever two (or more) rules match the same number of characters, the one defined first "wins". So if you define RESERVED_WORD before ID, for example:

 RESERVED_WORD : 'class' | 'public' | 'static' | 'extends' | 'void' | 'int' | 'boolean' | 'if' | 'else' | 'while' | 'return' | 'null' | 'true' | 'false' | 'this' | 'new' | 'String' ;
 ID : LETTER (LETTER | DIGIT)* ;
 fragment DIGIT : '0'..'9' ;
 fragment LETTER : 'a'..'z' | 'A'..'Z' ;

the input "class" will be tokenized as a RESERVED_WORD.

Note that creating a single token that matches any reserved word does not make much sense, because the parser would then see every keyword as the same token type. Usually each keyword gets its own rule:

 // ...
 NULL : 'null';
 TRUE : 'true';
 FALSE : 'false';
 // ...
 ID : LETTER (LETTER | DIGIT)* ;
 fragment DIGIT : '0'..'9' ;
 fragment LETTER : 'a'..'z' | 'A'..'Z' ;

Now "false" will become a FALSE token and "falser" an ID, because ID matches all six characters of "falser" while FALSE matches only the first five.
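
The two rules this answer relies on (the longest match wins, and on a tie the rule defined first wins) can be tried out with a small Python model. This is only a sketch of the behavior as described here, not ANTLR's actual implementation: the `tokenize` function, the regex versions of the rules, and the rule tables `id_first`, `reserved_first` and `separate` are all made up for illustration.

```python
import re

# Regex stand-ins for the grammar rules (assumed equivalents, not ANTLR syntax).
KEYWORDS = (r"class|public|static|extends|void|int|boolean|if|else|while"
            r"|return|null|true|false|this|new|String")
ID = r"[a-zA-Z][a-zA-Z0-9]*"


def tokenize(rules, text):
    """Split text into (token name, lexeme) pairs.

    rules is a list of (name, regex) pairs in definition order. At each
    position every rule is tried; the longest match wins, and on a tie
    the rule listed first wins. Whitespace is skipped.
    """
    compiled = [(name, re.compile(pattern)) for name, pattern in rules]
    tokens, pos = [], 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        best_name, best_len = None, 0
        for name, regex in compiled:
            m = regex.match(text, pos)
            # Strict '>' keeps the earlier rule when lengths are equal.
            if m and len(m.group()) > best_len:
                best_name, best_len = name, len(m.group())
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}")
        tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens


# The question's setup: ID is defined first, so it wins the tie on "class".
id_first = [("ID", ID), ("RESERVED_WORD", KEYWORDS)]
# The fix: RESERVED_WORD is defined first.
reserved_first = [("RESERVED_WORD", KEYWORDS), ("ID", ID)]
# One rule per keyword, all placed before ID.
separate = [("NULL", "null"), ("TRUE", "true"), ("FALSE", "false"), ("ID", ID)]

print(tokenize(id_first, "class abc"))
# [('ID', 'class'), ('ID', 'abc')]
print(tokenize(reserved_first, "class abc"))
# [('RESERVED_WORD', 'class'), ('ID', 'abc')]
print(tokenize(separate, "false falser"))
# [('FALSE', 'false'), ('ID', 'falser')]
```

Only the order of the rule list changes between the first two calls, which is why moving the keyword rule above ID is enough to fix the original grammar.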


Source: https://habr.com/ru/post/1401679/

