Using ANTLR to analyze a log file

Question

I am just starting with ANTLR and am trying to parse some patterns out of a log file.

For example, the log file contains:

7114422 2009-07-16 15:43:07,078 [LOGTHREAD] INFO StatusLog - Task 0 input : uk.project.Evaluation.Input.Function1(selected=["red","yellow"]){}
7114437 2009-07-16 15:43:07,093 [LOGTHREAD] INFO StatusLog - Task 0 output : uk.org.project.Evaluation.Output.Function2(selected=["Rocket"]){}
7114422 2009-07-16 15:43:07,078 [LOGTHREAD] INFO StatusLog - Task 0 input : uk.project.Evaluation.Input.Function3(selected=["blue","yellow"]){}
7114437 2009-07-16 15:43:07,093 [LOGTHREAD] INFO StatusLog - Task 0 output : uk.org.project.Evaluation.Output.Function4(selected=["Speech"]){}

Now I have to parse this file to find only "Evaluation.Input.Function1" with its values "red" and "yellow", and "Evaluation.Output.Function2" with its value "Rocket", ignoring everything else. The same applies to the other two input and output functions, 3 and 4. There are many such input and output functions, and I have to find these sets of input/output functions. Below is my attempt at a grammar, which does not work. Any help would be greatly appreciated. As this is my first attempt at writing a grammar with ANTLR, it is proving quite complex.

    grammar test;

    tag : inputtag+ outputtag+ ;

    // Input tag consists of at least one input function with one or more values
    inputtag : INPUTFUNCTIONS INPUTVALUES+ ;

    // Output tag consists of at least one output function with one or more output values
    outputtag : OUTPUTFUNCTIONS OUTPUTVALUES+ ;

    INPUTFUNCTIONS : INFUNCTION1 | INFUNCTION2 ;
    OUTPUTFUNCTIONS : OUTFUNCTION1 | OUTFUNCTION2 ;

    // Possible input functions in the log file
    fragment INFUNCTION1 : 'Evaluation.Input.Function1' ;
    fragment INFUNCTION2 : 'Evaluation.Input.Function3' ;

    // Possible values in the input functions
    INPUTVALUES : 'red' | 'yellow' | 'blue' ;

    // Possible output functions in the log file
    fragment OUTFUNCTION1 : 'Evaluation.Output.Function2' ;
    fragment OUTFUNCTION2 : 'Evaluation.Output.Function4' ;

    // Possible output values in the output functions
    fragment OUTPUTVALUES : 'Rocket' | 'Speech' ;

antlr

RC Feb 16 '10 at 23:19

1 answer

Bart Kiers · Accepted Answer · 2010-02-17T12:57:30+0000

When you are only interested in part of the file you are parsing, you do not need a parser, nor a grammar for the entire file format. A lexer grammar with ANTLR's options{filter=true;} is enough. This way you only get the tokens you defined in your grammar, and the rest of the file is ignored.
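To make the filtering idea concrete before bringing ANTLR in, here is a minimal sketch in plain Java that uses java.util.regex as a stand-in for a filtering lexer: it reports only the substrings that match a pattern and skips everything else. This is an analogy, not how ANTLR implements filter mode. The class name FilterSketch, the method scan and the pattern itself are assumptions made for illustration, and the pattern is looser than a real string rule (it stops at the first "]").

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FilterSketch {
    // Hypothetical pattern for the two function families seen in the log.
    private static final Pattern CALL = Pattern.compile(
        "Evaluation\\.(?:Input|Output)\\.Function\\d+\\(selected=\\[[^\\]]*\\]\\)");

    // Returns every substring that matches CALL; all other text is skipped.
    static List<String> scan(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = CALL.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String log =
            "7114422 2009-07-16 15:43:07,078 [LOGTHREAD] INFO StatusLog - Task 0 input : "
            + "uk.project.Evaluation.Input.Function1(selected=[\"red\",\"yellow\"]){}\n"
            + "7114437 2009-07-16 15:43:07,093 [LOGTHREAD] INFO StatusLog - Task 0 output : "
            + "uk.org.project.Evaluation.Output.Function2(selected=[\"Rocket\"]){}";
        for (String match : scan(log)) {
            System.out.println(match);
        }
    }
}
```

The timestamps, thread name and package prefixes never show up in the result, which is the same effect the filter option gives you at the token level.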

Here is a quick demo:

    lexer grammar TestLexer;

    // filter mode: text that matches no token rule is skipped silently
    options{filter=true;}

    @lexer::members {
      public static void main(String[] args) throws Exception {
        String text =
            "7114422 2009-07-16 15:43:07,078 [LOGTHREAD] INFO StatusLog - Task 0 input : uk.project.Evaluation.Input.Function1(selected=[\"red\",\"yellow\"]){}\n"+
            "\n"+
            "7114437 2009-07-16 15:43:07,093 [LOGTHREAD] INFO StatusLog - Task 0 output : uk.org.project.Evaluation.Output.Function2(selected=[\"Rocket\"]){}\n"+
            "\n"+
            "7114422 2009-07-16 15:43:07,078 [LOGTHREAD] INFO StatusLog - Task 0 input : uk.project.Evaluation.Input.Function3(selected=[\"blue\",\"yellow\"]){}\n"+
            "\n"+
            "7114437 2009-07-16 15:43:07,093 [LOGTHREAD] INFO StatusLog - Task 0 output : uk.org.project.Evaluation.Output.Function4(selected=[\"Speech\"]){}";
        ANTLRStringStream in = new ANTLRStringStream(text);
        TestLexer lexer = new TestLexer(in);
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        // print the text of every token the lexer produced
        for(Object obj : tokens.getTokens()) {
          Token token = (Token)obj;
          System.out.println("> token.getText() = "+token.getText());
        }
      }
    }

    Input
      :  'Evaluation.Input.Function' '0'..'9'+ Params
      ;

    Output
      :  'Evaluation.Output.Function' '0'..'9'+ Params
      ;

    // fragment rules only help build other tokens; they never become tokens themselves
    fragment
    Params
      :  '(selected=[' String ( ',' String )* '])'
      ;

    fragment
    String
      :  '"' ( ~'"' )* '"'
      ;

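Now save the grammar as TestLexer.g, generate the lexer from it, then compile and run it:

    # generate TestLexer.java from the grammar
    java -cp antlr-3.2.jar org.antlr.Tool TestLexer.g

    # compile
    javac -cp antlr-3.2.jar TestLexer.java

    # run (Linux/macOS)
    java -cp .:antlr-3.2.jar TestLexer

    # run (Windows)
    java -cp .;antlr-3.2.jar TestLexer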

and you will see the following printed to the console:

    > token.getText() = Evaluation.Input.Function1(selected=["red","yellow"])
    > token.getText() = Evaluation.Output.Function2(selected=["Rocket"])
    > token.getText() = Evaluation.Input.Function3(selected=["blue","yellow"])
    > token.getText() = Evaluation.Output.Function4(selected=["Speech"])
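Each token's text still holds the function name and its values in one string. If you need them separately, as the question asks, one possible follow-up step is to split the text with a small regex helper. The sketch below is plain Java and independent of ANTLR; the class TokenTextSketch and its methods functionName and selectedValues are hypothetical names, and it assumes values never contain an embedded double quote, matching the String rule in the grammar.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TokenTextSketch {
    // Shape of the token text printed above: <name>(selected=[<values>])
    private static final Pattern CALL = Pattern.compile(
        "(Evaluation\\.(?:Input|Output)\\.Function\\d+)\\(selected=\\[(.*)\\]\\)");
    // One double-quoted value without embedded quotes.
    private static final Pattern VALUE = Pattern.compile("\"([^\"]*)\"");

    // Matches the whole token text or fails loudly on unexpected input.
    private static Matcher match(String tokenText) {
        Matcher m = CALL.matcher(tokenText);
        if (!m.matches()) {
            throw new IllegalArgumentException("unexpected token text: " + tokenText);
        }
        return m;
    }

    // The function name, e.g. Evaluation.Input.Function1
    static String functionName(String tokenText) {
        return match(tokenText).group(1);
    }

    // The quoted values inside selected=[...], with the quotes removed.
    static List<String> selectedValues(String tokenText) {
        List<String> values = new ArrayList<>();
        Matcher v = VALUE.matcher(match(tokenText).group(2));
        while (v.find()) {
            values.add(v.group(1));
        }
        return values;
    }

    public static void main(String[] args) {
        String t = "Evaluation.Input.Function1(selected=[\"red\",\"yellow\"])";
        // prints: Evaluation.Input.Function1 -> [red, yellow]
        System.out.println(functionName(t) + " -> " + selectedValues(t));
    }
}
```

An alternative that stays inside ANTLR would be to turn the name and the strings into separate tokens and let a parser rule group them, at the cost of a slightly larger grammar.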
