Graco "code" generation

I'm trying to figure out how to recreate a document parsed by a Grako-generated parser.

After plunging deep into the Grako source code, I believe I finally understand how one goes from the AST back to the generated document. Can someone please confirm that my understanding below is correct, and let me know if there is a more direct method?

  • One creates the PEG grammar for the language to be parsed. From it, Grako generates a parser class and a semantics class.
  • One creates (manually) a Python module containing (more or less) a separate class (a subclass of grako.model.Node) for each rule in one's grammar. Each class must have at least a constructor with a parameter for each named element in the corresponding rule, and it must store the parameter values in instance attributes.
  • One subclasses (manually) the generated semantics class so that, for each rule, the AST is replaced by the corresponding class created in step 2.
  • One creates (manually) a Python module containing a subclass of grako.codegen.ModelRenderer that defines a code-generation template for (more or less) each rule in one's grammar.
  • One feeds the AST consisting of Node subclasses, together with the Python module containing the templates, to grako.codegen.CodeGenerator().render(...) to generate the output. (A hedged sketch of steps 2 and 3 follows this list.)
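For concreteness, here is a minimal sketch of steps 2 and 3 as I understand them. All names are hypothetical: it assumes a grammar rule named module with named elements name and body, and that grako.model.Node stores keyword arguments as attributes.

    from grako.model import Node

    # Step 2: one Node subclass per rule, storing each named element
    # (assumes Node's constructor accepts ctx and keyword attributes)
    class Module(Node):
        def __init__(self, ctx=None, name=None, body=None, **kwargs):
            super(Module, self).__init__(ctx=ctx, name=name, body=body, **kwargs)

    # Step 3: subclass the generated semantics class (here called
    # MyLanguageSemantics, a hypothetical name) so each rule returns its class
    class MySemantics(MyLanguageSemantics):
        def module(self, ast):
            return Module(**ast)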

Can this be right? It does not seem intuitive at all.

  • Why does one need to go through the significant effort of steps 2 and 3 just to store information that is already contained in the AST?
  • What is the advantage of this approach over working directly with the AST?
  • Is there a way to automate or bypass steps 2 and 3 if all one needs is to recreate the document in the original grammar?
  • Given a PEG grammar definition, is it theoretically possible to automatically create a "code generator" in the same way one creates a "parser generator"?
1 answer

If you look at how Grako parses its own grammars, you will notice that the step 2 classes are synthesized on the fly by ModelBuilderSemantics:

    # from grako/semantics.py
    class GrakoSemantics(ModelBuilderSemantics):
        def __init__(self, grammar_name):
            super(GrakoSemantics, self).__init__(
                baseType=grammars.Model,
                types=grammars.Model.classes()
            )
            self.grammar_name = grammar_name
            self.rules = OrderedDict()
        ...

Classes are synthesized if they are not found among those passed in the types= parameter. All that ModelBuilderSemantics requires is that each grammar rule carry a parameter giving the class name for the corresponding Node:

 module::Module = .... ; 

or,

 module(Module) = ... ; 
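With the class names in the grammar, a parser instantiated with ModelBuilderSemantics produces typed nodes without any hand-written classes. A minimal sketch, assuming the import path used by Grako-generated parsers, with MyLanguageParser standing in for whatever parser class Grako generated:

    from grako.model import ModelBuilderSemantics

    text = open('example.src').read()  # the document to parse
    parser = MyLanguageParser(semantics=ModelBuilderSemantics())
    model = parser.parse(text, rule_name='module')
    print(type(model).__name__)  # 'Module', synthesized on the fly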

Step 3 is inevitable because the translation must be specified "somewhere". The Grako way allows for string templates specified inline, with dispatching done by CodeGenerator, which is my preferred way of doing translation. But I use grako.model.DepthFirstNodeWalker when I just need to extract information from a model, for example when building a symbol table or performing calculations.
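A minimal sketch of that dispatching style, modeled on Grako's own Python code generator (grako/codegen/python.py); the renderer and generator names here are hypothetical, and the cgbase API may differ between Grako versions:

    from grako.codegen.cgbase import ModelRenderer, CodeGenerator

    class Module(ModelRenderer):
        # inline string template; fields are filled from the node's attributes
        template = 'module {name} {{ {body} }}'

    class MyCodeGenerator(CodeGenerator):
        def _find_renderer_class(self, node):
            # dispatch on the node's class name: a Module node -> Module renderer
            return globals().get(node.__class__.__name__)

    # output = MyCodeGenerator().render(model)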

Step 3 cannot be automated, because mapping the semantics of the source language onto the semantics of the target language requires brainpower, even when the source and target languages are the same.

One can also get by with the JSON-like Python structure (the AST) that parse() or grako.model.Node.asjson() produces, as you suggest, but the processing code would be full of if-then-elif to distinguish one dict from another, or one list from another. With models, every dict in the hierarchy has a Python class as its type.
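To illustrate the difference, a hedged sketch with hypothetical node and field names:

    # Walking the raw AST means guessing what each dict or list represents:
    def emit(node):
        if isinstance(node, dict):
            if 'name' in node and 'body' in node:  # probably a module?
                return 'module %s { %s }' % (node['name'], emit(node['body']))
        elif isinstance(node, list):
            return '; '.join(emit(n) for n in node)
        return str(node)

    # With a model, the class itself identifies the node:
    def emit_model(node):
        if isinstance(node, Module):
            return 'module %s { %s }' % (node.name, emit_model(node.body))
        ...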

In the end, Grako does not impose a way to create a model of the parsed document, nor a way to translate it into something else. In its basic form, Grako provides just a concrete syntax tree (CST), or an abstract syntax tree (AST) if element naming is used judiciously. Everything else is done by a separate semantics class, which can be anything you want.


Source: https://habr.com/ru/post/1243618/
