How to do Unicode escape decoding in Antlr tokenizer

I created an ANTLR grammar using ANTLRWorks and built a localization tool for internal use. I would like to convert Unicode escape sequences into the actual Java characters during parsing, but I'm not sure of the best way to do this. Here are the token definitions in my grammar. Is there a way to specify an action for the UNICODE_ESC fragment so that it returns the actual character instead of the six-character escape sequence?

ID  :   ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*
    ;

INT :   '0'..'9'+
    ;

COMMENT
    :   '//' ~('\n'|'\r')* '\r'? '\n' {$channel=HIDDEN;}
    |   '/*' ( options {greedy=false;} : . )* '*/' {$channel=HIDDEN;}
    ;

WS  :   ( ' '
        | '\t'
        | '\r'
        | '\n'
        ) {$channel=HIDDEN;}
    ;

STRING
    :  '"' ( ESC_SEQ | ~('\\'|'"') )* '"'
    ;

fragment
HEX_DIGIT : ('0'..'9'|'a'..'f'|'A'..'F') ;

fragment
ESC_SEQ
    :   '\\' ('b'|'t'|'n'|'f'|'r'|'\"'|'\''|'\\')
    |   UNICODE_ESC
    |   OCTAL_ESC
    ;

fragment
OCTAL_ESC
    :   '\\' ('0'..'3') ('0'..'7') ('0'..'7')
    |   '\\' ('0'..'7') ('0'..'7')
    |   '\\' ('0'..'7')
    ;

fragment
UNICODE_ESC
    :   '\\' 'u' HEX_DIGIT HEX_DIGIT HEX_DIGIT HEX_DIGIT
    ;
1 answer

Michael wrote:

This is in Java, so representing the character or string should not be a problem.

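You can change the text a lexer rule produces by calling setText(...) in the rule's action. In this way, the UNICODE_ESC escape could, for instance, be replaced with '?'.

For example, given the rule: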

Token : 'x' {setText("?");} ;

Here the Token rule matches x, but the text of the resulting token is ?.
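
Building on the setText(...) idea, one possible approach is to decode the escapes in plain Java and hand the result to setText(...) from the STRING rule. The sketch below is only an illustration under stated assumptions: the class name EscapeDecoder and its decode method are hypothetical and not part of ANTLR, and the input is assumed to have already been matched by the ESC_SEQ, OCTAL_ESC and UNICODE_ESC rules from the question, so it contains no malformed escapes.

```java
public class EscapeDecoder {

    /**
     * Hypothetical helper: decodes the escapes accepted by the grammar's ESC_SEQ rule
     * (simple escapes, octal escapes, and backslash-u followed by four hex digits).
     * Assumes the lexer has already validated the input.
     */
    public static String decode(String s) {
        StringBuilder out = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i++);
            if (c != '\\' || i >= s.length()) {
                out.append(c); // ordinary character, or a lone trailing backslash
                continue;
            }
            char e = s.charAt(i++);
            switch (e) {
                case 'b': out.append('\b'); break;
                case 't': out.append('\t'); break;
                case 'n': out.append('\n'); break;
                case 'f': out.append('\f'); break;
                case 'r': out.append('\r'); break;
                case 'u':
                    // UNICODE_ESC: exactly four hex digits follow the 'u'
                    out.append((char) Integer.parseInt(s.substring(i, i + 4), 16));
                    i += 4;
                    break;
                default:
                    if (e >= '0' && e <= '7') {
                        // OCTAL_ESC: three digits only when the first is 0-3, else at most two
                        int max = (e <= '3') ? 3 : 2;
                        int value = e - '0';
                        int n = 1;
                        while (n < max && i < s.length()
                                && s.charAt(i) >= '0' && s.charAt(i) <= '7') {
                            value = value * 8 + (s.charAt(i++) - '0');
                            n++;
                        }
                        out.append((char) value);
                    } else {
                        out.append(e); // \" \' \\ stand for the character itself
                    }
            }
        }
        return out.toString();
    }
}
```

With this helper on the classpath, the STRING rule could end with an action such as {setText(EscapeDecoder.decode(getText()));}. Note that the text passed in would still include the surrounding double quotes. Doing the conversion once in STRING, rather than in the UNICODE_ESC fragment, keeps the change to the token's text in a single place.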


Source: https://habr.com/ru/post/1767597/

