Can I use antlr to analyze partial data?

Question


I am trying to use ANTLR to parse a log file. Since I'm only interested in part of the log, I only want to write a partial parser that processes the important parts.

For example, I want to analyze a segment like this:

[ 123 begin ]

So I wrote a grammar:

    log     : '[' INT 'begin' ']' ;
    INT     : '0'..'9'+ ;
    NEWLINE : '\r'? '\n' ;
    WS      : (' '|'\t')+ {skip();} ;

But the segment may also appear in the middle of a line, for example:

  111 [ 123 begin ] 222

From the discussion in "What happened to the simple ANTLR grammar?", I understand why my grammar cannot handle the line above.

Is there a way to force ANTLR to ignore any errors and continue processing the remaining text?

Thanks for any advice! Leon


antlr

Leon chen Nov 04 '12 at 14:34


1 answer

Bart kiers · Accepted Answer · 2012-11-04T18:35:17+0000

Since a '[' can also occur outside of [ 123 begin ] (where it must be skipped), there is no way to handle this in the lexer. You will need to create a parser rule that matches the token(s) to be skipped (see the noise rule).

You will also need to create a fall-through rule that matches any single character if none of the other lexer rules match (see the ANY rule).

Quick demo:

    grammar T;

    parse
      : ( log {System.out.println("log=" + $log.text);}
        | noise
        )*
        EOF
      ;

    log
      : OBRACK INT BEGIN CBRACK
      ;

    noise
      : ~OBRACK                  // any token except '['
      | OBRACK ~INT              // a '[' followed by any token except an INT
      | OBRACK INT ~BEGIN        // a '[', an INT and any token except a BEGIN
      | OBRACK INT BEGIN ~CBRACK // a '[', an INT, a BEGIN and any token except ']'
      ;

    BEGIN   : 'begin';
    OBRACK  : '[';
    CBRACK  : ']';
    INT     : '0'..'9'+;
    NEWLINE : '\r'? '\n';
    WS      : (' '|'\t')+ {skip();};
    ANY     : .;
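For readers without the ANTLR 3 tool and runtime at hand, the same strategy can be tried in plain Java. The following is a hand-written sketch, not the code ANTLR generates from the grammar above, and the names LogScanner, tokenize and findLogs are hypothetical. It mirrors the two ingredients of the answer: a lexer whose catch-all branch plays the role of the ANY rule, and a loop that either matches OBRACK INT BEGIN CBRACK as a log or skips a token as noise. It assumes Java 16 or later (for the record type).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-written sketch (not ANTLR-generated code) of the
// "parse : ( log | noise )* EOF" strategy from the grammar above.
class LogScanner {

    enum Type { OBRACK, CBRACK, INT, BEGIN, ANY }

    // start/end are character offsets into the input (end is exclusive)
    record Token(Type type, int start, int end) {}

    // Lexer: spaces and tabs are skipped like WS; NEWLINE and every other
    // unrecognized character fall into the catch-all ANY type here.
    static List<Token> tokenize(String input) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == ' ' || c == '\t') {
                i++;
                continue;
            }
            int j = i + 1; // default: a one-character token
            Type type;
            if (c == '[') {
                type = Type.OBRACK;
            } else if (c == ']') {
                type = Type.CBRACK;
            } else if (c >= '0' && c <= '9') {
                // INT : '0'..'9'+
                while (j < input.length() && input.charAt(j) >= '0' && input.charAt(j) <= '9') {
                    j++;
                }
                type = Type.INT;
            } else if (input.startsWith("begin", i)) {
                j = i + "begin".length();
                type = Type.BEGIN;
            } else {
                type = Type.ANY; // ANY : . (fall-through)
            }
            tokens.add(new Token(type, i, j));
            i = j;
        }
        return tokens;
    }

    // Parser loop: match "log" when the next four tokens fit, otherwise
    // treat the current token as noise and move on by one token.
    static List<String> findLogs(String input) {
        List<Token> t = tokenize(input);
        List<String> logs = new ArrayList<>();
        int i = 0;
        while (i < t.size()) {
            if (i + 3 < t.size()
                    && t.get(i).type() == Type.OBRACK
                    && t.get(i + 1).type() == Type.INT
                    && t.get(i + 2).type() == Type.BEGIN
                    && t.get(i + 3).type() == Type.CBRACK) {
                // log : OBRACK INT BEGIN CBRACK (keep the original source text)
                logs.add(input.substring(t.get(i).start(), t.get(i + 3).end()));
                i += 4;
            } else {
                i++; // noise
            }
        }
        return logs;
    }

    public static void main(String[] args) {
        for (String log : findLogs("111 [ 123 begin ] 222")) {
            System.out.println("log=" + log);
        }
    }
}
```

Running main prints log=[ 123 begin ]. One deliberate difference from the grammar: its noise rule can consume up to four tokens at once (OBRACK INT BEGIN ~CBRACK also swallows a following '['), whereas this sketch skips a single token on a mismatch, so an input such as "[ [ 5 begin ]" still yields a match here.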
