Grammar-to-AST parsing tool (or .y + .l => XML)

Given a lexer definition file and a grammar file (say, PostgreSQL's .l and .y files for flex and bison, taken from its source tree), plus a file written in the language they define (say, an SQL query), I want to get the AST in some standard form (e.g., JSON or XML).

The most important feature of this tool is flexibility in its input format. In my example, I could recreate the PostgreSQL SQL grammar in ANTLR, but I don't want to. I would rather use exactly what Postgres uses. So even if the .y file contains more than just the parsing rules, the tool I'm looking for should be able to understand it with few changes.

Is there a common tool that does this?

Here's a command line session with my imaginary ly2xml tool:

 $ git clone git://postgres-git-url pg
 $ find pg -iname *.[yl] -exec cp '{}' ~/ \;
 $ echo 'SELECT * FROM (SELECT 1)'|ly2xml -parser=*.y -lexer=*.l - -O-
 <SELECT>
   <ARGS>*</ARGS>
   <FROM>
     <SELECT><ARGS>1</ARGS></SELECT>
   </FROM>
 </SELECT>

(note that - means that it reads from standard input, and -O- means that it writes to standard output).
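
To make the target format concrete, here is a minimal Python sketch that serializes an already-built tree into the XML shape shown above. The `(name, children)` tuple layout and the `to_xml` helper are assumptions made purely for illustration; the sketch does not parse anything, it only shows the kind of output the imaginary ly2xml would emit.

```python
import xml.etree.ElementTree as ET

# Hypothetical AST shape: (node_name, [children]); a child is either
# another (name, children) tuple or a plain token string.
ast = ("SELECT", [
    ("ARGS", ["*"]),
    ("FROM", [
        ("SELECT", [("ARGS", ["1"])]),
    ]),
])

def to_xml(node):
    """Convert one (name, children) tuple into an ElementTree element."""
    name, children = node
    elem = ET.Element(name)
    for child in children:
        if isinstance(child, str):
            # Token text becomes the element's text content.
            elem.text = (elem.text or "") + child
        else:
            elem.append(to_xml(child))
    return elem

xml_text = ET.tostring(to_xml(ast), encoding="unicode")
print(xml_text)
```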

1 answer

Nice idea. But you are assuming one or more of the following:

  a) that every tool that has a grammar uses one canonical type of parsing engine (e.g., everybody uses bison)
  b) that there is some parsing tool that understands the zillion grammar specification schemes that exist
  c) that whatever the parser is, it will parse language fragments (perhaps well-formed ones)

a) is clearly false. I have never seen b). Virtually no parsing engines do c); they can only parse "complete programs."

Your only hope IMHO is to use a parser generator that has a large number of well-tested language definitions.

ANTLR is possibly one; it certainly has a long list of available language definitions, and they are all in one place. It does not parse language fragments, as far as I know. It is doubtful that it has XML export for all parse trees.

Bison is perhaps one; there are many, many language processors built using Bison. But the definitions are scattered everywhere, and collecting them would be very hard. It does not parse language fragments either. I am pretty sure it has no XML export.

Our DMS Software Reengineering Toolkit is arguably one. It has many language definitions. All of them are collected in one place (our company). It produces an AST for each parse and has built-in XML export. DMS can also parse any nonterminal (that is, a fragment) of any language it knows.

DMS can pretty much mimic your example, given a DMS .lex file, an .atg ("attribute grammar") file, and a compatible source file.

What follows is a DMS lexer/parser build and a run with XML export for the Algebra grammar found in "Algebra as a DMS domain" (the ++XML about halfway down is what asks the parsing step to export XML):

 C:\DMS\Domains\Algebra\Tools\Parser\Source>make
 perl /cygdrive/c/DMS/Executables/MakeDMSTool Algebra -lexer
 MakeDMSTool: Selected domain "Algebra".
 LexerGenerator V2.1a
 Copyright (c) 1999-2010 Semantic Designs, Inc.; All Rights Reserved
 Parsing lexical specification ...
 Processing mode Algebra ...
 Exiting with final status 0
 perl /cygdrive/c/DMS/Executables/MakeDMSTool Algebra -tool %Temporaries
 MakeDMSTool: Selected domain "Algebra".
 Using attribute grammar in "/cygdrive/c/DMS/Domains/Algebra/Tools/Parser/Source/Syntax/Algebra.atg"
 AttributeEvaluatorGenerator V3.0
 Copyright (c) 1999-2010 Semantic Designs, Inc.; All Rights Reserved
 Parsing attribute grammar ...
 Generating attribute evaluator(s) ...
 Exiting with final status 0
 rm -rf /cygdrive/c/DMS/Domains/Algebra/Tools/%Temporaries
 perl /cygdrive/c/DMS/Executables/MakeDMSTool Algebra -prettyprinter
 MakeDMSTool: Selected domain "Algebra".
 PrettyPrinterGenerator V2.0
 Copyright (c) 1999-2010 Semantic Designs, Inc.; All Rights Reserved
 Parsing pretty printer specification ...
 Generating pretty printer ...
 Exiting with final status 0
 AttributeEvaluatorGenerator V3.0
 Copyright (c) 1999-2010 Semantic Designs, Inc.; All Rights Reserved
 Parsing attribute grammar ...
 Generating attribute evaluator(s) ...
 ......................
 Exiting with final status 0
 cd /cygdrive/c/DMS/Domains/Algebra/Tools/Parser/Source/\%Generated; \
 perl /cygdrive/c/DMS/Executables/MakeDMSTool Algebra -weave-preserve-productions %PreserveProductions.*.par
 MakeDMSTool: Selected domain "Algebra".
 perl /cygdrive/c/DMS/Executables/MakeDMSTool Algebra -parser
 MakeDMSTool: Selected domain "Algebra".
 export PARLANSEINCLUDEDIRECTORIES=`perl -e '($_ = $ARGV[0].";/cygdrive/c/DMS/Domains/PARLANSE/Library/Arrays;/cygdrive/c/DMS/Domains/PARLANSE/Library/Bags;/cygdrive/c/DMS/Domains/PARLANSE/Library/HashTables;/cygdrive/c/DMS/Domains/PARLANSE/Library/Pipes;/cygdrive/c/DMS/Domains/PARLANSE/Library/Sequences;/cygdrive/c/DMS/Domains/PARLANSE/Library/Sets;/cygdrive/c/DMS/Domains/PARLANSE/Library/Stacks;/cygdrive/c/DMS/Domains/PARLANSE/Library/Utilities;/cygdrive/c/DMS/Domains/PARLANSE/Library/Algorithms/Source;/cygdrive/c/DMS/Domains/PARLANSE/Library/Booleans/Source;/cygdrive/c/DMS/Domains/PARLANSE/Library/Characters/Source;/cygdrive/c/DMS/Domains/PARLANSE/Library/Graphics/Source;/cygdrive/c/DMS/Domains/PARLANSE/Library/HashTrees/Source;/cygdrive/c/DMS/Domains/PARLANSE/Library/Numbers/Source;/cygdrive/c/DMS/Domains/PARLANSE/Library/References/Source;/cygdrive/c/DMS/Domains/PARLANSE/Library/SQL/Source;/cygdrive/c/DMS/Domains/PARLANSE/Library/Streams/Source;/cygdrive/c/DMS/Domains/PARLANSE/Library/SuffixTrees/Source;/cygdrive/c/DMS/Domains/PARLANSE/Library/System/Source;/cygdrive/c/DMS/Domains/PARLANSE/Library/Search/Source;/cygdrive/c/DMS/Domains/PARLANSE/Library/TestSupport/Source") =~ s!//(.)/!$1:/!g; $_ =~ s!/cygdrive/(.)/!$1:/!g; print $_' "/cygdrive/c/DMS/Domains/Algebra/Tools/Parser/Source;/cygdrive/c/DMS/Domains/Algebra/Tools/Parser/Source/Components;/cygdrive/c/DMS/Domains/Algebra/Tools/Parser/Source/%Generated;/cygdrive/c/DMS/Domains/DMSStringGrammar/Tools/DomainParser/Source;/cygdrive/c/DMS/Domains/Algebra/Tools/Lexer/Source;/cygdrive/c/DMS/Domains/Algebra/Tools/Lexer/Source/%Generated;/cygdrive/c/DMS/Domains/DMSLexical/Tools/DomainLexer/Source;/cygdrive/c/DMS/Infrastructure/HyperGraph/Source;/cygdrive/c/DMS/Domains"`; \
 cd `echo /cygdrive/c/DMS/Domains/Algebra/Tools/Parser/Source`; \
 nice /cygdrive/c/DMS/Domains/PARLANSE/Tools/Compiler/p0c.exe DomainParser.par
 PARLANSE0 Compiler V19.16.40 Semantic Designs, Inc. *** Confidential Information
 128/485/133408 smallest/average/largest activation record/grain stack space required.
 Largest stack space required by function at Line 1533 in file FFIModule.par
 89 grains.
 3775 functions/procedures.
 223447 lines of source code read.
 7160772 bytes of object code.
 No errors detected.
 mv -f /cygdrive/c/DMS/Domains/Algebra/Tools/Parser/Source/DomainParser.P0B /cygdrive/c/DMS/Domains/Algebra/Tools/Parser/DomainParser.P0B

 C:\DMS\Domains\Algebra\Tools\Parser\Source>run ../DomainParser ++XML C:\DMS\Domains\Algebra\Tools\Lexer\TestCase\algebraformula.txt
 Domain Parser for Algebra 2.3.3
 Copyright (C) Semantic Designs 1996-2010; All Rights Reserved
 31 tree nodes in tree.
 <DMSForest>
  <tree node="formula" type="1" domain="1" id="10qx0" parents="0" line="1" column="1" file="1">
   <tree node="product" type="4" domain="1" id="10qwx" line="1" column="1" file="1">
    <tree node="term" type="10" domain="1" id="10qwy" line="1" column="1" file="1">
     <tree node="'D'" type="19" domain="1" id="10qw5" literal="0" line="1" column="1" file="1"/>
     <tree node="'['" type="20" domain="1" id="10qw6" literal="0" line="1" column="2" file="1"/>
     <tree node="formula" type="1" domain="1" id="10qwt" line="1" column="4" file="1">
      <tree node="product" type="4" domain="1" id="10qws" line="1" column="4" file="1">
       <tree node="term" type="9" domain="1" id="10qwr" line="1" column="4" file="1">
        <tree node="'('" type="17" domain="1" id="10qw7" literal="0" line="1" column="4" file="1"/>
        <tree node="formula" type="3" domain="1" id="10qwp" line="1" column="5" file="1">
         <tree node="formula" type="2" domain="1" id="10qwk" line="1" column="5" file="1">
          <tree node="formula" type="1" domain="1" id="10qwf" line="1" column="5" file="1">
           <tree node="product" type="5" domain="1" id="10qwe" line="1" column="5" file="1">
            <tree node="product" type="4" domain="1" id="10qwa" line="1" column="5" file="1">
             <tree node="term" type="7" domain="1" id="10qw9" line="1" column="5" file="1">
              <tree node="VARIABLE" type="15" domain="1" id="10qw8" line="1" column="5" file="1">
               <literal>x</literal>
              </tree>
             </tree>
            </tree>
            <tree node="'*'" type="13" domain="1" id="10qwb" literal="0" line="1" column="7" file="1"/>
            <tree node="term" type="8" domain="1" id="10qwd" line="1" column="8" file="1">
             <tree node="NUMBER" type="16" domain="1" id="10qwc" literal="23" line="1" column="8" file="1"/>
            </tree>
           </tree>
          </tree>
          <tree node="'+'" type="11" domain="1" id="10qwg" literal="0" line="1" column="10" file="1"/>
          <tree node="product" type="4" domain="1" id="10qwj" line="1" column="12" file="1">
           <tree node="term" type="7" domain="1" id="10qwi" line="1" column="12" file="1">
            <tree node="VARIABLE" type="15" domain="1" id="10qwh" line="1" column="12" file="1">
             <literal>y</literal>
            </tree>
           </tree>
          </tree>
         </tree>
         <tree node="'-'" type="12" domain="1" id="10qwl" literal="0" line="1" column="13" file="1"/>
         <tree node="product" type="4" domain="1" id="10qwo" line="1" column="14" file="1">
          <tree node="term" type="7" domain="1" id="10qwn" line="1" column="14" file="1">
           <tree node="VARIABLE" type="15" domain="1" id="10qwm" line="1" column="14" file="1">
            <literal>z</literal>
           </tree>
          </tree>
         </tree>
        </tree>
        <tree node="')'" type="18" domain="1" id="10qwq" literal="0" line="1" column="15" file="1"/>
       </tree>
      </tree>
     </tree>
     <tree node="','" type="21" domain="1" id="10qwu" literal="0" line="1" column="16" file="1"/>
     <tree node="VARIABLE" type="15" domain="1" id="10qwv" line="1" column="18" file="1">
      <literal>x</literal>
     </tree>
     <tree node="']'" type="22" domain="1" id="10qww" literal="0" line="1" column="19" file="1"/>
    </tree>
   </tree>
  </tree>
  <FileIndex>
   <File index="1">C:/DMS/Domains/Algebra/Tools/Lexer/TestCase/algebraformula.txt</File>
  </FileIndex>
  <DomainIndex>
   <Domain index="1">Algebra</Domain>
  </DomainIndex>
 </DMSForest>
 Exiting with final status 0

 C:\DMS\Domains\Algebra\Tools\Parser\Source>
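
Once the tree is in XML, any XML library can consume it. As a rough sketch (in Python, not part of DMS), the following walks an excerpt of the export above, the subtree for `x * 23`, and lists its leaf tokens. The element and attribute names are taken only from that sample output; nothing here assumes knowledge of the full export schema, and the `leaves` helper is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Excerpt copied from the export above: the subtree for "x * 23".
SAMPLE = """
<DMSForest>
 <tree node="product" type="5" domain="1" id="10qwe" line="1" column="5" file="1">
  <tree node="product" type="4" domain="1" id="10qwa" line="1" column="5" file="1">
   <tree node="term" type="7" domain="1" id="10qw9" line="1" column="5" file="1">
    <tree node="VARIABLE" type="15" domain="1" id="10qw8" line="1" column="5" file="1">
     <literal>x</literal>
    </tree>
   </tree>
  </tree>
  <tree node="'*'" type="13" domain="1" id="10qwb" literal="0" line="1" column="7" file="1"/>
  <tree node="term" type="8" domain="1" id="10qwd" line="1" column="8" file="1">
   <tree node="NUMBER" type="16" domain="1" id="10qwc" literal="23" line="1" column="8" file="1"/>
  </tree>
 </tree>
</DMSForest>
"""

def leaves(elem):
    """Yield (node name, literal value) for each leaf <tree> element, in document order."""
    kids = elem.findall("tree")
    if not kids:
        lit = elem.find("literal")
        # In the sample, a value appears either as a child <literal> element
        # or as a literal="..." attribute; it is reported here as-is.
        value = lit.text if lit is not None else elem.get("literal")
        yield elem.get("node"), value
    for kid in kids:
        yield from leaves(kid)

root = ET.fromstring(SAMPLE.strip())
tokens = [leaf for top in root.findall("tree") for leaf in leaves(top)]
node_count = len(list(root.iter("tree")))
print(node_count, tokens)
```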

If you really want an engine that understands many grammar notations, the easiest way to get one is to build it with DMS. Simply define each grammar formalism (e.g., ANTLR or bison) as a DSL for DMS, parse a specific instance of that formalism (e.g., an ANTLR BNF) using DMS, apply DMS rewrite rules to convert it into a DMS grammar, and then build the DMS parser. (You would have to do the same for the lexer.)
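
To give a feel for what translating one grammar notation involves, here is a toy Python sketch of just the first step for bison input: pulling the bare productions out of the rules section by dropping the embedded C actions. This is not how DMS does it, and the helper names `rules_section` and `strip_actions` and the sample grammar are invented for illustration. It ignores braces inside C strings, character literals such as `'{'`, and comments, which is part of why a real translator needs a proper parser for the notation itself.

```python
def rules_section(grammar_text):
    """Return the text between the first and second '%%' separators."""
    parts = grammar_text.split("%%")
    return parts[1] if len(parts) > 1 else ""

def strip_actions(rules_text):
    """Remove brace-delimited action blocks, tracking nesting depth.

    Simplification: braces inside strings, character literals, or comments
    are not recognized, so this is only safe on simple grammars.
    """
    out = []
    depth = 0
    for ch in rules_text:
        if ch == "{":
            depth += 1
        elif ch == "}" and depth > 0:
            depth -= 1
        elif depth == 0:
            out.append(ch)
    return "".join(out)

# Hypothetical miniature grammar file, not taken from any real project.
SAMPLE_Y = """\
%token NUMBER
%%
expr : expr '+' term { $$ = make_node('+', $1, $3); }
     | term          { $$ = $1; }
     ;
term : NUMBER        { if ($1 < 0) { yyerror("neg"); } $$ = $1; }
     ;
%%
int main(void) { return yyparse(); }
"""

bare = strip_actions(rules_section(SAMPLE_Y))
# Collapse runs of whitespace so each alternative sits on one tidy line.
productions = [" ".join(line.split()) for line in bare.splitlines() if line.strip()]
print("\n".join(productions))
```

Even this small step shows why "a few changes" is optimistic: precedence declarations, semantic predicates, and lexer feedback hidden in the actions all carry meaning that the bare productions lose.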


Source: https://habr.com/ru/post/1447421/

