Get the source text matched by an ANTLR rule

I am new to ANTLR and want to calculate the SHA-1 hash of the characters matched by a rule.

My simplified sample grammar:

    grammar Example;

    method
    @after { calculateSha1($text); }
        : 'call' ID
        ;

    ID      : 'A'..'Z'+ ;
    WS      : (' ' | '\n' | '\r')+ { skip(); } ;
    COMMENT : '/*' (options {greedy=false;} : .)* '*/' { $channel=HIDDEN; } ;

Since the lexer discards all whitespace, different inputs such as callABC and call /* DEF */ ABC unfortunately get the same SHA-1 hash.
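To make the collision concrete, here is a minimal sketch that does not use ANTLR at all. The strip method is a hypothetical stand-in for a lexer that throws away whitespace and comments (it is a pair of regular expressions, not the generated lexer), and the class and method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

public class CollisionDemo {
    // Hypothetical stand-in for the lexer: drops /* ... */ comments, then all whitespace.
    static String strip(String source) {
        return source.replaceAll("(?s)/\\*.*?\\*/", "").replaceAll("\\s+", "");
    }

    // SHA-1 of the UTF-8 bytes of the text.
    static byte[] sha1(String text) {
        try {
            return MessageDigest.getInstance("SHA-1")
                    .digest(text.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }

    public static void main(String[] args) {
        String a = "callABC";
        String b = "call /* DEF */ ABC";
        // After stripping, both inputs are the same string, so the hashes match.
        System.out.println(Arrays.equals(sha1(strip(a)), sha1(strip(b))));
        // Hashing the raw text keeps them apart.
        System.out.println(Arrays.equals(sha1(a), sha1(b)));
    }
}
```

The first comparison is the situation I am in; the second is what I would like to get.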

Is it possible to get the "source" text of the rule between its start and end tokens, including the skipped whitespace and the text of tokens on other channels?

(One possibility that comes to mind is to collect all the characters in the WS and COMMENT lexer rules, but there are many more rules, so this is not very practical.)

I use the standard ANTLRInputStream to feed the lexer, but I don't know how to get the source text back.

1 answer

Instead of skip()-ing the WS token, put it on the HIDDEN channel as well:

    grammar Example;

    @parser::members {
      void calculateSha1(String text) {
        try {
          java.security.MessageDigest md = java.security.MessageDigest.getInstance("SHA-1");
          byte[] sha1 = md.digest(text.getBytes());
          System.out.println(text + "\n" + java.util.Arrays.toString(sha1) + "\n");
        } catch (Exception e) {
          e.printStackTrace();
        }
      }
    }

    parse
      : method+ EOF
      ;

    method
    @after { calculateSha1($text); }
      : 'call' ID
      ;

    ID      : 'A'..'Z'+ ;
    WS      : (' ' | '\t' | '\n' | '\r')+ { $channel=HIDDEN; } ;
    COMMENT : '/*' .* '*/' { $channel=HIDDEN; } ;

The grammar above can be tested with:

    import org.antlr.runtime.*;

    public class Main {
      public static void main(String[] args) throws Exception {
        String source = "call ABC call /* DEF */ ABC";
        ExampleLexer lexer = new ExampleLexer(new ANTLRStringStream(source));
        ExampleParser parser = new ExampleParser(new CommonTokenStream(lexer));
        parser.parse();
      }
    }

which will output the following to the console:

    call ABC
    [48, -45, 113, 5, -52, -128, -78, 75, -52, -97, -35, 25, -55, 59, -85, 96, -58, 58, -96, 10]

    call /* DEF */ ABC
    [-57, -2, -115, -104, 77, -37, 4, 93, 116, -123, -47, -4, 33, 42, -68, -95, -43, 91, 94, 77]

i.e., the same parser rule matches both inputs, but $text differs (and therefore so does the SHA-1).
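The signed byte arrays printed above are hard to compare by eye. As a minimal sketch, the digest could instead be rendered as a hex string. This assumes the text should be hashed as UTF-8 (the grammar's helper uses the platform default charset via getBytes()). The class name Sha1Hex and the method sha1Hex are illustrative names, not part of the grammar.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Sha1Hex {
    // Hypothetical helper: hashes the text as UTF-8 and returns 40 lowercase hex digits.
    static String sha1Hex(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            byte[] digest = md.digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                // %02x prints each byte as two hex digits, treating it as unsigned.
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }

    public static void main(String[] args) {
        // The two $text values from the example input hash to different strings.
        System.out.println(sha1Hex("call ABC"));
        System.out.println(sha1Hex("call /* DEF */ ABC"));
    }
}
```

A method like this could be dropped into the @parser::members block in place of the Arrays.toString call. Fixing the charset also keeps the hash stable across machines with different default encodings.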


Source: https://habr.com/ru/post/897461/

