How to handle variable links in yacc / bison (with ocaml)

Question

How to handle variable links in yacc / bison (with ocaml)

I was wondering how to handle variable references inside statements when writing grammars with ocamlyacc and ocamllex.

The problem is that statements of the form

var x = y + z var b = true | f;

must be correct, but in the first case the variable refers to numbers, and in the second case f is a logical variable.

In the grammar I'm writing, I have this:

 numeric_exp_val: | nint { Syntax.Int $1 } | FLOAT { Syntax.Float $1 } | LPAREN; ne = numeric_exp; RPAREN { ne } | INCR; r = numeric_var_ref { Syntax.VarIncr (r,1) } | DECR; r = numeric_var_ref { Syntax.VarIncr (r,-1) } | var_ref { $1 } ; boolean_exp_val: | BOOL { Syntax.Bool $1 } | LPAREN; be = boolean_exp; RPAREN { be } | var_ref { $1 } ;

which, obviously, cannot work, since both var_ref non-terminal are reduced to the same thing (reduction / reduction conflict). But I would like to have type checking, which is mostly statically performed (regarding variable references) during the analysis phase itself.

This is why I am wondering what is the best way to have variable references and maintain this structure. As additional information, I have functions that compile the syntax tree by translating it into byte code like this:

 let rec compile_numeric_exp exp = match exp with Int i -> [Push (Types.I.Int i)] | Float f -> [Push (Types.I.Float f)] | Bop (BNSum,e1,e2) -> (compile_numeric_exp e1) @ (compile_numeric_exp e2) @ [Types.I.Plus] | Bop (BNSub,e1,e2) -> (compile_numeric_exp e1) @ (compile_numeric_exp e2) @ [Types.I.Minus] | Bop (BNMul,e1,e2) -> (compile_numeric_exp e1) @ (compile_numeric_exp e2) @ [Types.I.Times] | Bop (BNDiv,e1,e2) -> (compile_numeric_exp e1) @ (compile_numeric_exp e2) @ [Types.I.Div] | Bop (BNOr,e1,e2) -> (compile_numeric_exp e1) @ (compile_numeric_exp e2) @ [Types.I.Or] | VarRef n -> [Types.I.MemoryGet (Memory.index_for_name n)] | VarIncr ((VarRef n) as vr,i) -> (compile_numeric_exp vr) @ [Push (Types.I.Int i);Types.I.Plus;Types.I.Dupe] @ (compile_assignment_to n) | _ -> []

+4

variables compilation yacc ocaml grammar

Jack Dec 11 '11 at 5:44

source share

2 answers

If you want to consider numeric expressions and Boolean expressions as different syntactic categories, then consider how you should parse var x = ( ( y + z ) ) . You do not know what type of expression you are analyzing until you press + . So you need to eat some tokens before you know if you see numeric_exp_val or boolean_exp_val : you need unlimited viewing. Yacc does not provide such browsing (Yacc only provides a limited form of lookahead, roughly described as LALR , which sets the boundaries of parsing time and memory requirements). There is even an ambiguous case that makes your grammar context-sensitive: with the type definition var x = y you need to look for type y .

You can solve this last ambiguity by returning the type information to the lexer, and you can solve the problem with a parser generator that supports unlimited viewing. However, both of these methods will push your parser to a point where it cannot easily evolve if you want to expand the language later (for example, to distinguish between integers and floating-point numbers, add lines or lists, etc. .).

If you need a simple, but constraining solution with low technological overhead, I’ll add a gasche suggestion to add a syntax to define numerical and boolean variables, something like bvar b = … and nvar x = … There again, it will be difficult to support other types later.

You will have an easier time if you separate type checking from parsing. After you have built an abstract syntax tree, perform a type check of the check (in which you select the type of variables.

 type numeric_expression = Nconst of float | Nplus of numeric_expression * numeric_expression | … and boolean_expression = Bconst of bool | Bor of boolean_expression * boolean_expression | … type typed_expression = Tnum of numeric_expression | Tbool of boolean_expression type typed_statement = Tvar of string * typed_expression let rec type_expression : Syntax.expression -> typed_expression = function | Syntax.Float x -> Tnum (Nconst x) | Syntax.Plus (e1, e2) -> begin match type_expression e1, type_expression e2 with | Tnum n1, Tnum n2 -> Tnum (Nplus (n1, n2)) | _, (Tbool _ as t2) -> raise (Invalid_argument_type ("+", t2)) | (Tbool _ as t1), _ -> raise (Invalid_argument_type ("+", t1)) end | …

+4

Gilles Dec 11 '11 at 10:39

source share

gasche · Accepted Answer · 2011-12-11T14:51:52+0000

Analysis is simply not suitable for type checking. I do not understand why you insist on this in this walkthrough. You would have a clearer code and more expressive power, doing it in a separate passage.

Is it for efficiency reasons? I am sure that you could develop effective incremental typing procedures elsewhere that will be invoked from the production of grammar (but I'm not sure that you will gain so much). It looks like premature optimization.

Work on record-type systems was written as a grammar attribute (which could be seen as a declarative way of expressing typing derivations), but I don’t think that this is intended to combine parsing and text input in one pass.

If you really want to go further in this direction, I would advise you to use simple lexical differentiation between num-typed and bool-typed variables. It sounds ugly but simple.

How to handle variable links in yacc / bison (with ocaml)

More articles: