ANTLR - Token enumeration mismatch between grammar and tree grammar

Question

Background

I am trying to write a simple grammar in AntlrWorks for Boolean expressions that test a set of values for the presence (or absence) of certain elements. I created a combined lexer / parser grammar that produces the desired AST. I also wrote an accompanying tree grammar that seems to work (it passes the AntlrWorks debugger).

Problem

However, when I try to hook the two together in a test program (lexing, parsing, and tree parsing in the same program), I get errors like ...

node from line 1:5 required (...)+ loop did not match anything at input 'and'

and

node from after line 1:8 mismatched tree node: UP expecting <DOWN>

As a sanity check, I had the test program print the result of toStringTree() on the generated AST and of toTokenTypeString() on the resulting TreeNodeStream.

What I discovered is that the token type values listed by the TreeNodeStream do not match the token type constants enumerated in the auto-generated tree parser code.

Example

Sample input: "true and false"
Output of toStringTree() on the tree produced by the parser: (and true false)
Output of toTokenTypeString() on the TreeNodeStream built from that AST: 19 2 22 20 3 8

This sequence should read AND <DOWN> 'true' 'false' <UP> NEWLINE, but the tree parser sees it as CLOSEPAREN <DOWN> OR 'false' <UP> OPENPAREN (I looked up each node's token type in the constants defined in the code generated from the tree grammar), and it throws the error

1:5 required (...)+ loop did not match anything at input 'and'
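To make the mismatch concrete, here is a minimal Java sketch that decodes the same type string under two token tables. The tables contain only the numbers mentioned above (2 and 3 are ANTLR 3's fixed DOWN and UP types); the class and method names are hypothetical, and in a real project the names would come from the generated classes rather than from hand-written maps.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TokenTypeDecoder {

    // Token numbering as the parser sees it (only the types from the example).
    public static Map<Integer, String> parserView() {
        Map<Integer, String> names = new LinkedHashMap<>();
        names.put(2, "<DOWN>");
        names.put(3, "<UP>");
        names.put(8, "NEWLINE");
        names.put(19, "AND");
        names.put(20, "'false'");
        names.put(22, "'true'");
        return names;
    }

    // Token numbering as the tree parser sees it, per the constants in its generated code.
    public static Map<Integer, String> walkerView() {
        Map<Integer, String> names = new LinkedHashMap<>();
        names.put(2, "<DOWN>");
        names.put(3, "<UP>");
        names.put(8, "OPENPAREN");
        names.put(19, "CLOSEPAREN");
        names.put(20, "'false'");
        names.put(22, "OR");
        return names;
    }

    // Translates a whitespace-separated list of type numbers into token names.
    public static String decode(String typeString, Map<Integer, String> names) {
        List<String> out = new ArrayList<>();
        for (String part : typeString.trim().split("\\s+")) {
            int type = Integer.parseInt(part);
            // Unknown numbers are kept visible instead of being dropped.
            out.add(names.getOrDefault(type, "<" + type + "?>"));
        }
        return String.join(" ", out);
    }

    public static void main(String[] args) {
        String types = "19 2 22 20 3 8"; // output of toTokenTypeString()
        System.out.println("parser: " + decode(types, parserView()));
        System.out.println("walker: " + decode(types, walkerView()));
    }
}
```

The two printed lines differ at types 19, 22 and 8, which is exactly where the tree parser reports its errors.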

Bottom line

Why does my tree parser not correctly identify my AST?

Below is my source. I appreciate any feedback on the stupid mistakes I must have made :)

Lexer / parser grammar

    grammar INTc;

    options {
      output=AST;
      ASTLabelType=CommonTree;
    }

    tokens {
      OR='or';
      AND='and';
      NOT='not';
      ALLIN='+';
      PARTIN='^';
      NOTIN='!';
      SET;
      OPENPAREN='(';
      CLOSEPAREN=')';
      OPENSET='{';
      CLOSESET='}';
    }

    @header {
      package INTc;
    }

    @lexer::header {
      package INTc;
    }

    @members {
    }

    /**Begin Parser Rules*/
    prog
      : stat+
      ;

    stat
      : expr
      | NEWLINE
      ;

    expr
      : orExpr
      ;

    orExpr returns [boolean value]
      : a=andExpr (OR^ b=andExpr)*
      ;

    andExpr returns [boolean value]
      : a=notExpr (AND^ b=notExpr)*
      ;

    notExpr returns [boolean value]
      : a=atom
      | '!' a=atom -> ^(NOT atom)
      ;

    atom returns [boolean value]
      : ALLIN OPENSET ((INT) (',' INT)*) CLOSESET  -> ^(ALLIN ^(SET INT+))
      | PARTIN OPENSET ((INT) (',' INT)*) CLOSESET -> ^(PARTIN ^(SET INT+))
      | NOTIN OPENSET ((INT) (',' INT)*) CLOSESET  -> ^(NOTIN ^(SET INT+))
      | TIMERANGE
      | OPENPAREN! e=expr CLOSEPAREN!
      | 'true'
      | 'false'
      ;

    /**Begin Lexer Rules*/
    ID
      : ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*
      ;

    DIGIT
      : ('0'..'9')
      ;

    INT
      : DIGIT+
      ;

    NEWLINE
      : '\r'? '\n'
      ;

    WS
      : ( ' ' | '\t' | '\r' | '\n') {$channel=HIDDEN;}
      ;

    COMMENT
      : '//' ~('\n'|'\r')* '\r'? '\n' {$channel=HIDDEN;}
      | '/*' ( options {greedy=false;} : . )* '*/' {$channel=HIDDEN;}
      ;

Tree grammar

    tree grammar INTcWalker;

    options {
      tokenVocab=INTc;
      ASTLabelType=CommonTree;
    }

    @header {
      package INTc;
      import java.util.ArrayList;
      import java.util.Arrays;
    }

    @members {
      ArrayList<String> intSet;
      boolean isFit = false;

      public boolean getResult() {
        return isFit;
      }

      public void setINTSet(ArrayList newSet) {
        intSet = newSet;
        isFit = false;
      }

      public ArrayList getINTSET() {
        return intSet;
      }
    }

    prog
      : stat+
      ;

    stat
      : expr
        {
          isFit = $expr.value;
          //System.out.println(isFit);
        }
      | NEWLINE {}
      ;

    expr returns [boolean value]
      : ^(OR a=expr b=expr) {}
      | ^(AND a=expr b=expr) {}
      | ^(NOT a=expr) {}
      | ^(ALLIN ^(SET INT+)) {}
      | ^(PARTIN ^(SET INT+)) {}
      | ^(NOTIN ^(SET INT+)) {}
      | 'true'  {$value = true;}
      | 'false' {$value = false;}
      ;

Test program

    // ANTLR 3 runtime classes used below
    import org.antlr.runtime.*;
    import org.antlr.runtime.tree.*;

    public class setTest {
      public static void main(String args[]) throws Exception {
        // Lex and parse the input file into an AST
        INTcLexer lex = new INTcLexer(new ANTLRFileStream("input.txt"));
        CommonTokenStream tokens = new CommonTokenStream(lex);
        INTcParser parser = new INTcParser(tokens);
        INTcParser.prog_return r = parser.prog();
        CommonTree t = (CommonTree)r.getTree();

        // Feed the AST to the tree walker
        CommonTreeNodeStream nodes = new CommonTreeNodeStream(t);
        INTcWalker evaluator = new INTcWalker(nodes);

        // Debug output: the tree and the token types the walker will see
        System.out.println(t.toStringTree());
        System.out.println(nodes.toTokenTypeString());
        nodes.reset();

        try {
          evaluator.prog();
        } catch (RecognitionException e) {
          e.printStackTrace();
        }
        System.out.println(evaluator.getResult());
      }
    }

java antlr antlr3 antlrworks dsl

Rex redi Dec 08 '11 at 21:22

1 answer

Bart kiers · Accepted Answer · 2011-12-08T22:07:25+0000

If I use your combined grammar and tree grammar to create the lexer, parser and tree-walker classes and run the following class:

    import org.antlr.runtime.*;
    import org.antlr.runtime.tree.*;

    public class Main {
      public static void main(String args[]) throws Exception {
        INTcLexer lex = new INTcLexer(new ANTLRStringStream("true and false\n"));
        CommonTokenStream tokens = new CommonTokenStream(lex);
        INTcParser parser = new INTcParser(tokens);
        CommonTree t = (CommonTree)parser.prog().getTree();
        CommonTreeNodeStream nodes = new CommonTreeNodeStream(t);
        INTcWalker evaluator = new INTcWalker(nodes);
        System.out.println(t.toStringTree());
        CommonTree tr;
        while(true) {
          Token token = ((CommonTree)nodes.nextElement()).getToken();
          if(token.getType() == INTcParser.EOF) break;
          System.out.printf("%-10s '%s'\n", INTcParser.tokenNames[token.getType()], token.getText());
        }
        System.out.println("\nresult=" + evaluator.getResult());
      }
    }

the following is displayed on the console:

    (and true false)
    AND        'and'
    <DOWN>     'DOWN'
    'true'     'true'
    'false'    'false'
    <UP>       'UP'
    NEWLINE    '
    '

    result=false

I.e., I see the expected result:

the tree is okay ((and true false));
CommonTreeNodeStream contains the correct tokens (or better: trees);
and the correct value, false, is printed without any errors from either the parser or the tree walker.
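If the walker misbehaves on your machine while the same grammars behave here, one thing worth checking is whether the parser and walker classes you compiled agree on token numbering. Recognizers generated by ANTLR 3 expose a static tokenNames array (the loop above reads INTcParser.tokenNames), so the two tables can be diffed. The sketch below is only an illustration: the helper class is hypothetical and the sample tables are made up, not the real INTc numbering.

```java
import java.util.ArrayList;
import java.util.List;

public class TokenTableDiff {

    /** Returns one line per token type where the two tables disagree. */
    public static List<String> diff(String[] parserNames, String[] walkerNames) {
        List<String> problems = new ArrayList<>();
        int max = Math.max(parserNames.length, walkerNames.length);
        for (int type = 0; type < max; type++) {
            // A table that is too short counts as a disagreement as well.
            String p = type < parserNames.length ? parserNames[type] : "<missing>";
            String w = type < walkerNames.length ? walkerNames[type] : "<missing>";
            if (!p.equals(w)) {
                problems.add(type + ": parser=" + p + " walker=" + w);
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        // Made-up tables for illustration only.
        String[] parser = {"<invalid>", "<EOR>", "<DOWN>", "<UP>", "OR", "AND", "NOT"};
        String[] walker = {"<invalid>", "<EOR>", "<DOWN>", "<UP>", "AND", "OR", "NOT"};
        for (String line : diff(parser, walker)) {
            System.out.println(line);
        }
    }
}
```

In a real project, the call would be diff(INTcParser.tokenNames, INTcWalker.tokenNames), assuming the generated walker exposes the same field as the parser. An empty result means both classes share one numbering; any reported line points at generated code that is out of sync.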

A few tips:

create tokens for 'true' and 'false' (i.e. TRUE='true'; ...);
do not use literals inside your tree grammar (not 'true' , but TRUE );
make DIGIT a fragment rule, so it will never become a token of its own but is only used inside INT (or other lexer rules). Just put the keyword fragment in front of it;
both .* and .+ are ungreedy by default, so you can remove options {greedy=false;} : from the COMMENT rule.
