A compiler compiler that consumes BNF

Is there a reason why there are no parser generators that consume straight BNF?

I am familiar with JavaCC and ANTLR, and recently came across Parse2. Everyone seems to have their own notation. BNF is really easy to read, while the others are not. BNF is unambiguous. Is there any inherent reason why I cannot feed BNF to a compiler compiler and get a parse tree back?

2 answers

Is there a reason why there are no parser generators that consume straight BNF?

No, but most parser generators didn't start out that way. It took the community a long time to figure out that less is sometimes more.

Tools that consume (some variation of) BNF have existed for a long time.

The paper describing Meta II was published back in 1964. The base Meta II BNF is what you would expect (a left-hand side plus a right-hand side), plus a number of now relatively standard EBNF extensions (Kleene star and plus, grouping, alternatives) and non-standard embedded actions (used to generate what is literally a syntax-directed translation). It has built-in lexing. Meta II can meta-compile its own grammar to reproduce itself exactly [the paper shows this, and it is a breathtaking moment when you work out why it works], which lets you bootstrap more complex versions and compile other simple languages. A common simple extension is to define grammar rules for lexical tokens. I built many implementations and tools on top of this back in the 1970s because it was so cool and so useful.

Tree Meta added tree-building actions, plus an auxiliary grammar of tree-matching rules used to generate output.

When you add all the extra EBNF and generation machinery, the resulting "BNF" for Meta II and Tree Meta can be quite hard for the uninitiated to read, mostly because it is dense and you have to be familiar with the context. Otherwise it looks like chain-printer output (for those of you who know what that is) gone wrong.

Most modern compiler compilers are not much different from one another. YACC extends BNF with embedded code (in your favorite language) for actions, commonly used to build trees. JavaCC uses an extended BNF; with JJTree you can build ASTs. ANTLR 1/2/3 also has an extended BNF, with embedded actions for building trees. That makes them all just as ugly as Meta II... no progress in 40 years, IMHO.

Our DMS Software Reengineering Toolkit (see my bio) uses about the simplest BNF you can imagine for really complex grammars such as IBM Enterprise COBOL, Fortran 2005 and C++14. It looks like this:

LHS = RHS1 RHS2 ... RHSn ; 

for various tokens LHS, RHS1 ... RHSn. Lists are easy: you use two rules, one for the base case and one for extending the list. Alternatives are easy: write another rule. One can technically write grammar rules for lexical tokens the same way, as grammar rules whose terminals are individual characters. (DMS offers this, but we usually use a real lexer for efficiency reasons.)
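For instance, the two-rule list idiom just described might look like this in that style (hypothetical rule name `idlist`; `NAME` is assumed to be a lexical token):

```
idlist = NAME ;
idlist = idlist ',' NAME ;
```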

Here is the DMS BNF for high-school algebra (and a bit of calculus, with some liberties in notation):

 equation = sum ;
 sum = product ;
 sum = sum '+' product ;
 sum = sum '-' product ;
 product = term ;
 product = product '*' term ;
 product = product '/' term ;
 term = primary ;
 term = primary '^' term ;
 primary = NUMBER ;
 primary = NAME ;
 primary = '(' sum ')' ;
 primary = '-' primary ;
 primary = NAME '(' sum ')' ;
 primary = 'D' NAME ':' primary ; -- differentiate primary
 primary = 'I' sum 'D' NAME ; -- indefinite integral
 primary = 'I' sum ',' sum ':' sum 'D' NAME ; -- definite integral

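To make the mapping from such a BNF to a parser concrete, here is a minimal hand-written sketch in Python (nothing to do with DMS itself) that handles a subset of the algebra grammar above — the sum/product/term/primary rules, skipping the calculus operators — and builds an AST as nested tuples. All names (`Parser`, `parse_sum`, ...) are my own. Note how the left-recursive rules become loops in recursive descent:

```python
import re

TOKEN = re.compile(r"\s*(\d+|[A-Za-z]\w*|[-+*/^(),:])")

def tokenize(text):
    tokens, pos = [], 0
    text = text.strip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            raise SyntaxError(f"bad character at {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

class Parser:
    def __init__(self, tokens):
        self.toks, self.i = tokens, 0

    def peek(self):
        return self.toks[self.i] if self.i < len(self.toks) else None

    def eat(self, tok=None):
        t = self.peek()
        if t is None or (tok is not None and t != tok):
            raise SyntaxError(f"expected {tok!r}, got {t!r}")
        self.i += 1
        return t

    def parse_sum(self):        # sum = product | sum '+' product | sum '-' product
        node = self.parse_product()
        while self.peek() in ('+', '-'):        # left recursion -> iteration
            op = self.eat()
            node = (op, node, self.parse_product())
        return node

    def parse_product(self):    # product = term | product '*' term | product '/' term
        node = self.parse_term()
        while self.peek() in ('*', '/'):
            op = self.eat()
            node = (op, node, self.parse_term())
        return node

    def parse_term(self):       # term = primary | primary '^' term  (right-associative)
        node = self.parse_primary()
        if self.peek() == '^':
            self.eat('^')
            node = ('^', node, self.parse_term())
        return node

    def parse_primary(self):    # primary = NUMBER | NAME | '(' sum ')' | '-' primary
        t = self.eat()
        if t == '(':
            node = self.parse_sum()
            self.eat(')')
            return node
        if t == '-':
            return ('neg', self.parse_primary())
        if t.isdigit():
            return ('num', int(t))
        return ('name', t)

def parse(text):
    return Parser(tokenize(text)).parse_sum()

print(parse("a + 2 * (b - 3)"))
# -> ('+', ('name', 'a'), ('*', ('num', 2), ('-', ('name', 'b'), ('num', 3))))
```

The point of the answer still stands: here the tree shape is hand-coded in each method, whereas DMS derives it from the grammar alone.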
The real DMS grammar file contains other things describing pretty-printing, etc. Here you can see a larger example, a DMS grammar for the M6809 assembler.

Interestingly, DMS builds the AST using only the grammar as a guide; no extra actions needed. By avoiding the need to specify tree-building actions while parsing, the resulting grammars are pretty easy to read.

DMS has been doing this since 1998; it is the first tool I know of to use this approach. The ANTLR guys finally figured out it was a good idea, and as of 2013 ANTLR4 builds a tree without any explicit embedded actions, although it still has actions for other purposes. DMS trees can be either concrete (following the grammar directly) or "abstract" (many tree nodes are dropped, because DMS can regenerate them on demand from the abstract tree and the grammar). The abstract trees are actually pretty decent. I don't know what ANTLR4 does here.

The really nice thing about this approach is that you can write and revise really complex, large grammars just by revising the rules; the AST design comes "for free" and is always "correct" with respect to the grammar. That lets DMS provide many other related tools (prettyprinting, source-to-source transformation) built on it as a base layer. (DMS is technically a "meta" compiler in the sense that it can parse its own grammars using its own grammar; we use this DMS feature to generate the parsers implied by those grammars.)

You can see a complete example of this in "Algebra as a DMS domain", also via my bio. (The BNF above was taken from there.) I would provide links, but many SO people don't like that.


Marpa::R2, the Perl interface to Marpa, a general BNF parser, takes straight BNF as the grammar description and generates a parser for it in Perl. Here is an example taken almost literally from a textbook BNF grammar.

 <tree> ::= '(' <list> ')'
 <list> ::= <thing> | <list> ',' <thing>
 <thing> ::= <tree> | <name>
 <name> ::= 'ant' | 'bat' | 'cow' | 'dog' | 'cat'

Full code example.
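As an illustration of how literally such BNF transcribes into code, here is a hedged Python sketch (not using Marpa) implementing the same four rules by recursive descent; the left recursion in `<list>` becomes a loop, and the function names are my own:

```python
NAMES = {'ant', 'bat', 'cow', 'dog', 'cat'}

def tokenize(s):
    # split '(', ')' and ',' into separate tokens
    for ch in '(),':
        s = s.replace(ch, f' {ch} ')
    return s.split()

def parse_tree(tokens, i):
    # <tree> ::= '(' <list> ')'
    if tokens[i] != '(':
        raise SyntaxError("expected '('")
    items, i = parse_list(tokens, i + 1)
    if i >= len(tokens) or tokens[i] != ')':
        raise SyntaxError("expected ')'")
    return items, i + 1

def parse_list(tokens, i):
    # <list> ::= <thing> | <list> ',' <thing>   (left recursion -> loop)
    thing, i = parse_thing(tokens, i)
    items = [thing]
    while i < len(tokens) and tokens[i] == ',':
        thing, i = parse_thing(tokens, i + 1)
        items.append(thing)
    return items, i

def parse_thing(tokens, i):
    # <thing> ::= <tree> | <name>
    if tokens[i] == '(':
        return parse_tree(tokens, i)
    if tokens[i] not in NAMES:
        raise SyntaxError(f"unknown name {tokens[i]!r}")
    return tokens[i], i + 1

tree, _ = parse_tree(tokenize("(ant,(bat,cow),dog)"), 0)
print(tree)  # -> ['ant', ['bat', 'cow'], 'dog']
```

Marpa does this translation for you from the BNF itself; the sketch only shows how close the grammar is to the final control structure.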


Source: https://habr.com/ru/post/1205402/

