How can I distinguish between reserved words and variables using ANTLR?

Question

I am using ANTLR to define a simple grammar, and I need to distinguish between an identifier:

 ID : LETTER (LETTER | DIGIT)* ;
 fragment DIGIT : '0'..'9' ;
 fragment LETTER : 'a'..'z' | 'A'..'Z' ;

and a RESERVED_WORD:

 RESERVED_WORD : 'class' | 'public' | 'static' | 'extends' | 'void' | 'int' | 'boolean' | 'if' | 'else' | 'while' | 'return' | 'null' | 'true' | 'false' | 'this' | 'new' | 'String' ;

Say I run the lexer on this input:

 class abc

I get two ID tokens, one for "class" and one for "abc", but I want "class" to be recognized as a RESERVED_WORD. How can I do this?

antlr antlr3

Chris covert Mar 15 '12 at 19:18

1 answer

Bart kiers · Accepted Answer · 2012-03-15T19:24:59+0000

Whenever two (or more) rules match the same number of characters, the one defined first will “win”. So if you define RESERVED_WORD before ID, for example:

 RESERVED_WORD : 'class' | 'public' | 'static' | 'extends' | 'void' | 'int' | 'boolean' | 'if' | 'else' | 'while' | 'return' | 'null' | 'true' | 'false' | 'this' | 'new' | 'String' ;
 ID : LETTER (LETTER | DIGIT)* ;
 fragment DIGIT : '0'..'9' ;
 fragment LETTER : 'a'..'z' | 'A'..'Z' ;

the input "class" will be tokenized as a RESERVED_WORD.
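The tie-breaking rule can be illustrated outside ANTLR with a small Python simulation. This is only a sketch of the "longest match, and on a tie the first rule wins" behavior described in this answer, not ANTLR's actual implementation. The names `RULES`, `next_token`, and `tokenize` are hypothetical, and the keyword list is shortened.

```python
import re

# Rules in definition order: RESERVED_WORD is listed before ID.
RULES = [
    ("RESERVED_WORD", re.compile(r"class|public|static|extends|void|int")),
    ("ID", re.compile(r"[a-zA-Z][a-zA-Z0-9]*")),
]


def next_token(text, pos, rules):
    """Return (rule_name, lexeme) for the longest match starting at pos."""
    best = None
    for name, pattern in rules:
        m = pattern.match(text, pos)
        # Only a strictly longer match replaces the current best,
        # so on a tie the rule that was defined first is kept.
        if m and (best is None or len(m.group()) > len(best[1])):
            best = (name, m.group())
    return best


def tokenize(text, rules=RULES):
    """Split text into (rule_name, lexeme) pairs, skipping whitespace."""
    tokens, pos = [], 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        tok = next_token(text, pos, rules)
        if tok is None:
            raise ValueError(f"no rule matches at position {pos}")
        tokens.append(tok)
        pos += len(tok[1])
    return tokens


# RESERVED_WORD defined first: "class" is a reserved word.
print(tokenize("class abc"))
# ID defined first (the situation in the question): both words become ID.
print(tokenize("class abc", list(reversed(RULES))))
```

Reversing the rule list reproduces the problem from the question, since ID then wins the tie on "class".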

Note that creating a single token type that matches every reserved word does not make much sense. Usually each keyword gets its own lexer rule, like this:

 // ...
 NULL : 'null';
 TRUE : 'true';
 FALSE : 'false';
 // ...
 ID : LETTER (LETTER | DIGIT)* ;
 fragment DIGIT : '0'..'9' ;
 fragment LETTER : 'a'..'z' | 'A'..'Z' ;

Now "false" will become a FALSE token and "falser" an ID, because ID matches more characters of "falser" than FALSE does.
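To see why "false" and "falser" end up as different tokens, here is a rough Python sketch of the selection step, assuming the lexer prefers the longest match and falls back to rule order on a tie. It is an illustration only, not how ANTLR is implemented. `RULES` and `classify` are hypothetical names, and only three keyword rules are included.

```python
import re

# One rule per keyword, all defined before ID (keyword list shortened).
RULES = [
    ("NULL", r"null"),
    ("TRUE", r"true"),
    ("FALSE", r"false"),
    ("ID", r"[a-zA-Z][a-zA-Z0-9]*"),
]


def classify(word):
    """Return (rule_name, lexeme) for the best match at the start of word."""
    matches = []
    for name, pattern in RULES:
        m = re.match(pattern, word)
        if m:
            matches.append((name, m.group()))
    if not matches:
        return None  # no rule matches at all
    # max() returns the first of several equally long matches,
    # which mirrors "the rule defined first wins".
    return max(matches, key=lambda pair: len(pair[1]))


print(classify("false"))   # ('FALSE', 'false')
print(classify("falser"))  # ('ID', 'falser')
```

For "false", both FALSE and ID match five characters, so the earlier rule is chosen. For "falser", ID matches six characters and FALSE only five, so ID is chosen regardless of order.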
