ANTLR Grammar if statement

Question


I am learning ANTLR in order to create a domain-specific language (DSL). One of the requirements is to translate this DSL to C. I was able to write a basic grammar that recognizes the DSL, but I am having problems translating it to C. Specifically, I am trying to translate the DSL if statement into a C if statement. I tried using print statements in the grammar actions, to no avail (I am using C#).

Here is the grammar I tested:

**ifTest.g**

    grammar ifTest;

    options {
        backtrack=true;
        output=AST;
        language=CSharp2;
    }

    /*************************
     * PARSER RULES
     *************************/

    prog : lambda
         | statements EOF;

    lambda : /* Empty */;

    statements : statement+;

    statement : logical
              | assignment
              | NEWLINE;

    logical : IF a=logical_Expr THEN b=statements
              {
                  System.Console.Write("\tif (" + $a.text + ")\n\t{\n\t" + "\t" + $b.text + "\n\n\t}");
              }
              ( ELSE c=statements
                {
                    System.Console.Write("\n\telse {\n\t\t\t" + $c.text + "\n\t}");
                }
              )?
              ENDIF
              {
                  System.Console.Write("\n}");
              }
            ;

    logical_Expr : expr;

    expr : (simple_Expr) (op expr)*;

    simple_Expr : MINUS expr
                | identifier
                | number;

    identifier : parameter
               | VARIABLE;

    parameter : norm_parameter;

    norm_parameter : spec_label
                   | reserved_parm;

    spec_label : LABEL;

    reserved_parm : RES_PARM;

    op : PLUS | MINUS | MULT | DIV | EQUALS | GT | LT | GE | LE;

    number : INT | FLOAT | HEX;

    assignment : identifier GETS expr;

    /*************************
     * LEXER RULES
     *************************/

    WS          : (' '|'\t')+ {$channel=HIDDEN;};
    COMMENT     : '/*' (options {greedy=false;}: .)* '*/' {$channel=HIDDEN;};
    LINECOMMENT : '#' ~('\n'|'\r')* NEWLINE {$channel=HIDDEN;};
    NEWLINE     : '\r'?'\n' {$channel=HIDDEN;};

    IF    : I F;
    THEN  : T H E N;
    ELSE  : E L S E;
    ENDIF : E N D I F;

    PLUS   : '+';
    MINUS  : '-';
    MULT   : '*';
    DIV    : '/';
    EQUALS : '=';
    GT     : '>';
    LT     : '<';
    GE     : '>=';
    LE     : '<=';
    ULINE  : '_';
    DOT    : '.';
    GETS   : ':=';

    LABEL    : (LETTER|ULINE)(LETTER|DIGIT|ULINE)*;
    INT      : '-'?DIGIT+;
    FLOAT    : '-'? DIGIT* DOT DIGIT+;
    HEX      : ('0x'|'0X')(HEXDIGIT)HEXDIGIT*;
    RES_PARM : DIGIT LABEL;
    VARIABLE : '\$' LABEL;

    fragment A:'A'|'a';  fragment B:'B'|'b';  fragment C:'C'|'c';  fragment D:'D'|'d';
    fragment E:'E'|'e';  fragment F:'F'|'f';  fragment G:'G'|'g';  fragment H:'H'|'h';
    fragment I:'I'|'i';  fragment J:'J'|'j';  fragment K:'K'|'k';  fragment L:'L'|'l';
    fragment M:'M'|'m';  fragment N:'N'|'n';  fragment O:'O'|'o';  fragment P:'P'|'p';
    fragment Q:'Q'|'q';  fragment R:'R'|'r';  fragment S:'S'|'s';  fragment T:'T'|'t';
    fragment U:'U'|'u';  fragment V:'V'|'v';  fragment W:'W'|'w';  fragment X:'X'|'x';
    fragment Y:'Y'|'y';  fragment Z:'Z'|'z';

    fragment DIGIT    : '0'..'9';
    fragment LETTER   : A|B|C|D|E|F|G|H|I|J|K|L|M|N|O|P|Q|R|S|T|U|V|W|X|Y|Z;
    fragment HEXDIGIT : '0'..'9'|'a'..'f'|'A'..'F';

I test it with this C# class:

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;
    using Antlr.Runtime;

    namespace ConsoleApplication1
    {
        class Program
        {
            static void Main(string[] args)
            {
                string inputString = "if $variable1 = 0 then\n if $variable2 > 250 then\n $variable3 := 0\n endif\n endif";
                Console.WriteLine("Here is the input string:\n " + inputString + "\n");

                ANTLRStringStream input = new ANTLRStringStream(inputString);
                ifTestLexer lexer = new ifTestLexer(input);
                CommonTokenStream tokens = new CommonTokenStream(lexer);
                ifTestParser parser = new ifTestParser(tokens);

                parser.prog();
                Console.Read();
            }
        }
    }

The result is not quite what I expected.

**Output**

    if ($variable2 > 250)
    {
        $variable3 := 0
    }
    }
    if ($variable1 = 0)
    {
        if $variable2 > 250 then
        $variable3 := 0
        endif
    }
    }

The problem is that the nested (second) if statement is printed twice, and not in the order I was hoping for. I suppose this is because I'm simply emitting blocks of statements from print actions, but I'm not sure how to make this work correctly. Should I be reading up on StringTemplate, or building an AST and walking it with a tree walker? Either way, how can I fix the above so that I get something like this?

    if ($variable1 = 0)
    {
        if ($variable2 > 250)
        {
            $variable3 := 0
        }
    }

Any help on which direction I should take would be greatly appreciated. Would it be better to jump into StringTemplate, or is there a way to do this using plain action code? If I left out any information, please feel free to ask.


antlr

almostProgramming Feb 16 '12 at 15:29


2 answers

Yes, the problem is that you are trying to emit your "compilation results" (the C program) during the parsing phase. The parser will backtrack, and in general you cannot expect each part of the parser to run only once and to take the correct path every time.
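To illustrate the general hazard, here is a small Java sketch of a hand-rolled backtracking parser. It is not ANTLR's generated code: the class `BacktrackDemo`, the rule `'x' 'y' | 'x' 'z'` and the `sideEffects` buffer (standing in for `Console.Write`) are all hypothetical. Rewinding the input position undoes the token consumption, but it cannot take back output an action has already written; returning values and letting the caller use only the winning alternative avoids that.

```java
import java.util.List;

// Hypothetical hand-rolled backtracking parser (not ANTLR-generated code) for the rule
//   r : 'x' 'y' | 'x' 'z' ;
class BacktrackDemo {
    private final List<String> tokens;
    private int pos = 0;
    // Stands in for Console.Write: actions append here immediately, like printing.
    final StringBuilder sideEffects = new StringBuilder();

    BacktrackDemo(List<String> tokens) { this.tokens = tokens; }

    private boolean match(String expected) {
        if (pos < tokens.size() && tokens.get(pos).equals(expected)) { pos++; return true; }
        return false;
    }

    // Each alternative "prints" as soon as it has matched 'x'.
    boolean ruleWithSideEffects() {
        int mark = pos;
        if (match("x")) { sideEffects.append("alt1 saw x;"); if (match("y")) return true; }
        pos = mark; // rewinding the input cannot take back text that was already written
        if (match("x")) { sideEffects.append("alt2 saw x;"); if (match("z")) return true; }
        pos = mark;
        return false;
    }

    // Same rule, but each alternative returns its result and only the winning one is used.
    String ruleWithReturnValue() {
        int mark = pos;
        if (match("x") && match("y")) return "x y";
        pos = mark;
        if (match("x") && match("z")) return "x z";
        pos = mark;
        return null; // no alternative matched
    }

    public static void main(String[] args) {
        BacktrackDemo printing = new BacktrackDemo(List.of("x", "z"));
        printing.ruleWithSideEffects();
        System.out.println(printing.sideEffects); // the failed alternative's output leaked
        System.out.println(new BacktrackDemo(List.of("x", "z")).ruleWithReturnValue());
    }
}
```

For the input `x z`, the first method records output from both alternatives even though only the second one matched, while the second method yields just `x z`.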

Producing an AST is what I would suggest, and then walking that AST to produce your output. A tree walker certainly sounds like a useful tool here.

In general, no, I don't think it is possible to produce the desired output for any non-trivial grammar using parser actions alone.
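A minimal sketch of the AST-then-walk approach, written in Java to match the other answer's demo. The node types `Assign` and `If` and the class `CEmitter` are hypothetical stand-ins, not ANTLR classes; in a real setup the tree would come from the parser's AST output and the walk would typically be an ANTLR tree grammar or a visitor. Conditions are stored as already-translated text purely for brevity. Because the walk starts only after parsing has finished, each node is visited exactly once, so nothing is emitted twice.

```java
import java.util.List;

// Hypothetical AST-then-walk sketch: the node types below are stand-ins, not ANTLR classes.
class CEmitter {
    interface Node {}
    record Assign(String target, String value) implements Node {}
    record If(String cond, List<Node> thenBody, List<Node> elseBody) implements Node {}

    // Emit a list of statements at the given indent.
    static String emitAll(List<Node> stmts, String indent) {
        StringBuilder sb = new StringBuilder();
        for (Node n : stmts) sb.append(emit(n, indent));
        return sb.toString();
    }

    // Emit one node; the tree is complete before this runs, so each node is visited once.
    static String emit(Node n, String indent) {
        if (n instanceof Assign a) {
            return indent + a.target() + " = " + a.value() + ";\n";
        }
        If i = (If) n;
        StringBuilder sb = new StringBuilder();
        sb.append(indent).append("if (").append(i.cond()).append(")\n")
          .append(indent).append("{\n")
          .append(emitAll(i.thenBody(), indent + "  "))
          .append(indent).append("}\n");
        if (!i.elseBody().isEmpty()) {
            sb.append(indent).append("else\n")
              .append(indent).append("{\n")
              .append(emitAll(i.elseBody(), indent + "  "))
              .append(indent).append("}\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A tree a parser might build for the question's nested input.
        List<Node> prog = List.of(
                new If("variable1 == 0",
                        List.of(new If("variable2 > 250", List.of(new Assign("variable3", "0")), List.of())),
                        List.of()));
        System.out.print(emitAll(prog, ""));
    }
}
```

Keeping the output logic in the walker also means the grammar itself stays free of print actions, so backtracking can remain enabled.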

Oddly enough, you're the second person I've seen trying to do this in the last couple of days. I certainly see the appeal of the idea "do it all with the parser!", but I really don't think it's feasible. ANTLR is a great tool, but its output is an AST, not a compiled executable.

Here's a link to another similar question if you're interested:
Java code analysis with ANTLR needs concept


Task Feb 16 '12 at 16:01


Bart Kiers · Accepted Answer · 2012-02-16T20:01:47+0000

If you remove the backtracking (which is easy to do in your case), you can let the parser generate the C code directly.

Note that parser rules can take parameters (the indentation level in my example below) and can return custom objects (a String in this example).

Here is your grammar without backtracking, emitting C code instead (I'm not very good at C#, so the demo is in Java):

    grammar ifTest;

    prog
      :  statements[""] EOF {System.out.println($statements.str);}
      ;

    statements[String indent] returns [String str]
    @init{$str = "";}
      :  (statement[indent] {$str += indent + $statement.str + "\n";})*
      ;

    statement[String indent] returns [String str]
      :  if_statement[indent] {$str = $if_statement.str;}
      |  assignment           {$str = $assignment.str;}
      ;

    if_statement[String indent] returns [String str]
      :  IF expr THEN s1=statements[indent + "  "] {$str = "if (" + $expr.str + ")\n" + indent + "{\n" + $s1.str;}
         (ELSE s2=statements[indent + "  "] {$str += indent + "}\n" + indent + "else\n" + indent + "{\n" + $s2.str;})?
         ENDIF {$str += indent + "}";}
      ;

    assignment returns [String str]
      :  identifier GETS expr {$str = $identifier.str + " = " + $expr.str + ";";}
      ;

    expr returns [String str]
      :  rel_expr {$str = $rel_expr.str;}
      ;

    rel_expr returns [String str]
      :  e1=eq_expr {$str = $e1.str;}
         ( LT e2=eq_expr {$str += " < " + $e2.str;}
         | GT e2=eq_expr {$str += " > " + $e2.str;}
         | LE e2=eq_expr {$str += " <= " + $e2.str;}
         | GE e2=eq_expr {$str += " >= " + $e2.str;}
         )?
      ;

    eq_expr returns [String str]
      :  e1=add_expr {$str = $e1.str;} (EQUALS e2=add_expr {$str += " == " + $e2.str;})?
      ;

    add_expr returns [String str]
      :  e1=mult_expr {$str = $e1.str;}
         ( PLUS  e2=mult_expr {$str += " + " + $e2.str;}
         | MINUS e2=mult_expr {$str += " - " + $e2.str;}
         )*
      ;

    mult_expr returns [String str]
      :  e1=unary_expr {$str = $e1.str;}
         ( MULT e2=unary_expr {$str += " * " + $e2.str;}
         | DIV  e2=unary_expr {$str += " / " + $e2.str;}
         )*
      ;

    unary_expr returns [String str]
      :  MINUS term {$str = "-" + $term.str;}
      |  term       {$str = $term.str;}
      ;

    term returns [String str]
      :  identifier {$str = $identifier.str;}
      |  number     {$str = $number.text;}
      ;

    identifier returns [String str]
      :  LABEL    {$str = $LABEL.text;}
      |  RES_PARM {$str = $RES_PARM.text;}
      |  VARIABLE {$str = $VARIABLE.text.substring(1);}
      ;

    number
      :  INT
      |  FLOAT
      |  HEX
      ;

    WS          : (' '|'\t')+ {$channel=HIDDEN;};
    COMMENT     : '/*' .* '*/' {$channel=HIDDEN;};
    LINECOMMENT : '#' ~('\n'|'\r')* NEWLINE {$channel=HIDDEN;};
    NEWLINE     : '\r'?'\n' {$channel=HIDDEN;};

    IF    : I F;
    THEN  : T H E N;
    ELSE  : E L S E;
    ENDIF : E N D I F;

    PLUS   : '+';
    MINUS  : '-';
    MULT   : '*';
    DIV    : '/';
    EQUALS : '=';
    GT     : '>';
    LT     : '<';
    GE     : '>=';
    LE     : '<=';
    ULINE  : '_';
    DOT    : '.';
    GETS   : ':=';

    LABEL    : (LETTER | ULINE) (LETTER | DIGIT | ULINE)*;
    INT      : DIGIT+;             // no '-' here, unary_expr handles this
    FLOAT    : DIGIT* DOT DIGIT+;  // no '-' here, unary_expr handles this
    HEX      : '0' ('x'|'X') HEXDIGIT+;
    RES_PARM : DIGIT LABEL;
    VARIABLE : '$' LABEL;

    fragment A:'A'|'a';  fragment B:'B'|'b';  fragment C:'C'|'c';  fragment D:'D'|'d';
    fragment E:'E'|'e';  fragment F:'F'|'f';  fragment G:'G'|'g';  fragment H:'H'|'h';
    fragment I:'I'|'i';  fragment J:'J'|'j';  fragment K:'K'|'k';  fragment L:'L'|'l';
    fragment M:'M'|'m';  fragment N:'N'|'n';  fragment O:'O'|'o';  fragment P:'P'|'p';
    fragment Q:'Q'|'q';  fragment R:'R'|'r';  fragment S:'S'|'s';  fragment T:'T'|'t';
    fragment U:'U'|'u';  fragment V:'V'|'v';  fragment W:'W'|'w';  fragment X:'X'|'x';
    fragment Y:'Y'|'y';  fragment Z:'Z'|'z';

    fragment HEXDIGIT : DIGIT | 'a'..'f' | 'A'..'F';
    fragment DIGIT    : '0'..'9';
    fragment LETTER   : A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P
                      | Q | R | S | T | U | V | W | X | Y | Z ;

If you now test the parser with the input:

    if $variable1 = 0 then
      if $variable2 > 250 then
        $variable3 := 0
      else
        $variable3 := 42
      endif
    endif

the following is displayed on the console:

    if (variable1 == 0)
    {
      if (variable2 > 250)
      {
        variable3 = 0;
      }
      else
      {
        variable3 = 42;
      }
    }
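The same technique (rules that take the current indent and return a string instead of printing) can also be sketched without ANTLR. The following hand-written recursive-descent translator is a hypothetical illustration: the class `DslToC` and its methods are made up and are not what ANTLR generates. It only covers a subset of the grammar above (decimal integers, `$variables` and labels, one comparison per expression, `+ - * /` at a single precedence level, no comments, floats or hex), and for the input above it prints the same C code as the ANTLR version.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical hand-written recursive-descent translator (not ANTLR-generated code).
// Each method mirrors one grammar rule: it takes the current indent as a parameter and
// returns the C text for what it matched instead of printing it as a side effect.
class DslToC {
    // Assumed token subset: ':=', comparison/arithmetic operators, $variables,
    // labels/keywords and decimal integers (no floats, hex numbers or comments).
    private static final Pattern TOKEN = Pattern.compile(
            "\\s*(:=|>=|<=|[-+*/=<>]|\\$?[A-Za-z_][A-Za-z0-9_]*|[0-9]+)");
    private static final List<String> COMPARISONS = List.of("=", "<", ">", "<=", ">=");
    private static final List<String> ARITHMETIC = List.of("+", "-", "*", "/");
    private final List<String> toks = new ArrayList<>();
    private int pos = 0;

    DslToC(String src) {
        Matcher m = TOKEN.matcher(src);
        int end = 0;
        while (m.find() && m.start() == end) { toks.add(m.group(1)); end = m.end(); } // contiguous tokens only
        if (!src.substring(end).isBlank()) throw new IllegalArgumentException("cannot tokenize at offset " + end);
    }
    private String peek() { return pos < toks.size() ? toks.get(pos) : ""; }
    private boolean at(String kw) { return peek().equalsIgnoreCase(kw); } // keywords ignore case
    private void expect(String kw) {
        if (!at(kw)) throw new IllegalStateException("expected '" + kw + "' but got '" + peek() + "'");
        pos++;
    }
    // prog : statements[""] EOF
    String prog() {
        String s = statements("");
        if (pos != toks.size()) throw new IllegalStateException("unexpected '" + peek() + "'");
        return s;
    }
    // statements[indent] : (if_statement[indent] | assignment)*, stopping before else/endif/EOF
    String statements(String indent) {
        StringBuilder sb = new StringBuilder();
        while (pos < toks.size() && !at("else") && !at("endif")) {
            sb.append(indent).append(at("if") ? ifStatement(indent) : assignment()).append('\n');
        }
        return sb.toString();
    }
    // if_statement[indent] : IF expr THEN statements (ELSE statements)? ENDIF
    String ifStatement(String indent) {
        expect("if");
        String s = "if (" + expr() + ")\n";
        expect("then");
        s += indent + "{\n" + statements(indent + "  ");
        if (at("else")) {
            pos++;
            s += indent + "}\n" + indent + "else\n" + indent + "{\n" + statements(indent + "  ");
        }
        expect("endif");
        return s + indent + "}";
    }
    // assignment : identifier GETS expr
    String assignment() {
        String target = identifier();
        expect(":=");
        return target + " = " + expr() + ";";
    }
    // expr : sum (comparison sum)?  (the DSL's '=' becomes C's '==')
    String expr() {
        String left = sum();
        if (!COMPARISONS.contains(peek())) return left;
        String op = toks.get(pos++);
        return left + " " + (op.equals("=") ? "==" : op) + " " + sum();
    }
    // sum : term (('+'|'-'|'*'|'/') term)*  (operators are copied in order; C applies its own precedence)
    String sum() {
        StringBuilder sb = new StringBuilder(term());
        while (ARITHMETIC.contains(peek())) {
            String op = toks.get(pos++);
            sb.append(' ').append(op).append(' ').append(term());
        }
        return sb.toString();
    }
    // term : '-' term | INT | identifier
    String term() {
        if (peek().equals("-")) { pos++; return "-" + term(); }
        if (peek().matches("[0-9]+")) return toks.get(pos++);
        return identifier();
    }
    // identifier : VARIABLE (leading '$' dropped, as in the grammar above) | LABEL
    String identifier() {
        String t = peek();
        if (!t.matches("\\$?[A-Za-z_][A-Za-z0-9_]*")) throw new IllegalStateException("expected identifier, got '" + t + "'");
        pos++;
        return t.startsWith("$") ? t.substring(1) : t;
    }
    public static void main(String[] args) {
        String src = "if $variable1 = 0 then if $variable2 > 250 then $variable3 := 0 "
                + "else $variable3 := 42 endif endif";
        System.out.print(new DslToC(src).prog());
    }
}
```

Because every method returns its text to its caller, nested blocks are assembled bottom-up and the indent is threaded down as a parameter, just like `statements[indent + "  "]` in the grammar.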

If other parts of your grammar rely (heavily) on predicates (backtracking), the same strategy can be applied just as easily, but then in a tree grammar (that is, after the backtracking parser has finished its work and produced an AST).
