Can I use ANTLR to parse only part of the input?

I am trying to use ANTLR to parse a log file. Since I'm only interested in part of the log, I only want to write a partial parser that handles the relevant part.

For example, I want to analyze a segment like this:

[ 123 begin ] 

So I wrote a grammar:

    log     : '[' INT 'begin' ']' ;
    INT     : '0'..'9'+ ;
    NEWLINE : '\r'? '\n' ;
    WS      : (' '|'\t')+ {skip();} ;

But the segment may appear in the middle of the line, for example:

  111 [ 123 begin ] 222 

From the discussion in "What happened to the simple ANTLR grammar?" I understand why my grammar cannot handle the input above.

Is there a way to make ANTLR ignore any errors and continue processing the remaining text?

Thanks for any advice! Leon

1 answer

Since a '[' may also need to be skipped when it appears outside of [ 123 begin ], this cannot be handled in the lexer. You will need a parser rule that matches the token(s) to be skipped (see the noise rule).

You also need a fall-through lexer rule that matches any single character when none of the other lexer rules match (see the ANY rule).

Quick demo:

    grammar T;

    parse
      : ( log {System.out.println("log=" + $log.text);}
        | noise
        )*
        EOF
      ;

    log
      : OBRACK INT BEGIN CBRACK
      ;

    noise
      : ~OBRACK                  // any token except '['
      | OBRACK ~INT              // a '[' followed by any token except an INT
      | OBRACK INT ~BEGIN        // a '[', an INT and any token except a BEGIN
      | OBRACK INT BEGIN ~CBRACK // a '[', an INT, a BEGIN and any token except ']'
      ;

    BEGIN   : 'begin';
    OBRACK  : '[';
    CBRACK  : ']';
    INT     : '0'..'9'+;
    NEWLINE : '\r'? '\n';
    WS      : (' '|'\t')+ {skip();};
    ANY     : .;
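
To see how the log/noise alternation behaves on an input such as `111 [ 123 begin ] 222`, here is a rough hand-written Java sketch of the same idea. It is not ANTLR-generated code and does not use the ANTLR runtime. The names `LogScanner`, `tokenize` and `findLogs` are hypothetical, NEWLINE is folded into skipped whitespace for brevity, and a truncated segment at the end of the input is silently dropped, where the real grammar would probably report an error.

```java
import java.util.ArrayList;
import java.util.List;

class LogScanner {
    // Token kinds loosely mirroring the lexer rules of the grammar.
    enum Kind { OBRACK, CBRACK, INT, BEGIN, ANY }

    static final class Token {
        final Kind kind;
        final String text;
        Token(Kind kind, String text) { this.kind = kind; this.text = text; }
    }

    // The token sequence of the 'log' rule: OBRACK INT BEGIN CBRACK.
    static final Kind[] LOG = { Kind.OBRACK, Kind.INT, Kind.BEGIN, Kind.CBRACK };

    // Hand-rolled lexer. Assumption: newlines are skipped like other whitespace.
    static List<Token> tokenize(String input) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == ' ' || c == '\t' || c == '\r' || c == '\n') {
                i++;                                    // WS: skipped
            } else if (c == '[') {
                tokens.add(new Token(Kind.OBRACK, "["));
                i++;
            } else if (c == ']') {
                tokens.add(new Token(Kind.CBRACK, "]"));
                i++;
            } else if (c >= '0' && c <= '9') {
                int start = i;                          // INT: one or more digits
                while (i < input.length() && input.charAt(i) >= '0' && input.charAt(i) <= '9') {
                    i++;
                }
                tokens.add(new Token(Kind.INT, input.substring(start, i)));
            } else if (input.startsWith("begin", i)) {
                tokens.add(new Token(Kind.BEGIN, "begin"));
                i += 5;
            } else {
                // Fall-through, like the ANY rule: one token per unmatched character.
                tokens.add(new Token(Kind.ANY, String.valueOf(c)));
                i++;
            }
        }
        return tokens;
    }

    // Returns the INT text of every complete '[ INT begin ]' segment.
    static List<String> findLogs(String input) {
        List<Token> tokens = tokenize(input);
        List<String> logs = new ArrayList<>();
        int pos = 0;
        while (pos < tokens.size()) {
            // Count how many tokens of the log pattern match, starting at pos.
            int matched = 0;
            while (matched < LOG.length
                    && pos + matched < tokens.size()
                    && tokens.get(pos + matched).kind == LOG[matched]) {
                matched++;
            }
            if (matched == LOG.length) {
                logs.add(tokens.get(pos + 1).text);     // 'log' alternative
                pos += LOG.length;
            } else {
                // 'noise' alternative: skip the matched prefix plus the offending token.
                pos += matched + 1;
            }
        }
        return logs;
    }

    public static void main(String[] args) {
        System.out.println(findLogs("111 [ 123 begin ] 222")); // prints [123]
    }
}
```

Like the noise rule, the sketch consumes the token that breaks a partial match, so an input such as `[ [ 123 begin ]` yields no log here: the first two brackets are skipped together as noise.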

Source: https://habr.com/ru/post/1443951/

