How to resolve the ambiguity in determining the grammar of LR (1)?

I am writing a Golang compiler in OCaml, and the argument lists cause me a headache. In Go, you can group consecutive parameter names of the same type as follows:

func f(a, b, c int) === func f(a int, b int, c int) 

You can also have a list of types without parameter names:

 func g(int, string, int) 

Two styles cannot be mixed and juxtaposed; either all parameters are named or not.

My problem is that when the parser sees a comma, it does not know what to do. In the first example: a type name or variable name with a large number of variables? The comma has a double role, and I'm not sure how to fix it.

I am using the Menhir parser generator tool for OCaml.

Edit: At the moment, my Mengira grammar exactly matches the rules specified at http://golang.org/ref/spec#Function_types

+6
source share
2 answers

As written, go grammar is not LALR(1) . Actually this is not LR(k) for any k . This, however, is unambiguous, so you can successfully analyze it using the GLR parser if you find it (I'm sure there are several GLR parser generators for OCAML, but I don’t know enough to recommend one of them).

If you don't want (or can't) use the GLR parser, you can do it the same way as Russ Cox did in the gccgo compiler that uses bison . ( bison can generate GLR parsers, but Cox does not use this function.) Its technique is independent of a scanner that distinguishes between type names and non-type names.

Rather, it simply accepts lists of parameters whose elements are either name_or_type or name name_or_type (in fact, because of the syntax ... there are more options, but it does not change the general principle.) This is simple, unambiguous and LALR(1) , but it overly accepts - it accepts, for example, func foo(a, b int, c) - and does not create the correct abstract syntax tree, because it does not attach this type to the list of declared parameters.

This means that after the argument list has been fully analyzed and is going to be inserted into the AST, attached to a specific function declaration (for example), a semantic scan is performed to correct it and, if necessary, an error message. This scan is performed from right to left above the list of ad elements, so that the specified type can be passed to the left.

It is worth noting that the grammar in the reference manual also over-accepts, because it does not express a restriction that "either all parameters are named or not." This restriction can be expressed in LR (1) grammar - I will leave this as an exercise for readers, but the resulting grammar will be much more difficult to understand.

+4
source

You have no ambiguity. The fact that the Go standard analyzer is LALR (1) proves that.

Is this the name of a type or variable name with a large number of variables?

Thus, basically your grammar and the parser as a whole should be completely disconnected from the character table; not C - your grammar is not ambiguous, so you can check the type name later in AST.

These are relevant rules (from http://golang.org/ref/spec ); they are already correct.

 Parameters = "(" [ ParameterList [ "," ] ] ")" . ParameterList = ParameterDecl { "," ParameterDecl } . ParameterDecl = [ IdentifierList ] [ "..." ] Type . IdentifierList = identifier { "," identifier } . 

I will explain them to you:

 IdentifierList = identifier { "," identifier } . 

Curly braces represent the closure of kleene (in the designation of regular expressions POSIX it is an asterisk). This rule states "identifier name, optionally followed by a literal comma and identifier, optionally followed by a literal comma and identifier, etc. & Hellip; ad infinitum"

 ParameterDecl = [ IdentifierList ] [ "..." ] Type . 

The square brackets are null; this means that this part may be present or absent. (In the notation for regular expressions, POSIX is a question mark). So you have a β€œMaybe List ID,” followed by, possibly, an ellipsis, followed by a type.

 ParameterList = ParameterDecl { "," ParameterDecl } . 

You can have several ParameterDecl in the list, for example, for example. func x(a, b int, c, d string) .

 Parameters = "(" [ ParameterList [ "," ] ] ")" . 

These rules determine that the ParameterList parameter is optional and must be surrounded by a bracket and may include an optional final comma literal, useful when you write something like:

 func x( a, b int, c, d string, // <- note the final comma ) 

Grammar Go is portable and can be analyzed by any analyzer from the bottom up with one lookahead marker.


Change in relation to β€œnot be C”: I said this because C is context-sensitive and the way to solve this problem in many (all?) Compilers is to put the character table in the lexer and tokens differently depending on whether they are defined as type names or variables. This is a hack and should not be done for unambiguous grammars!

+1
source

Source: https://habr.com/ru/post/975653/


All Articles