Can I use ANTLR for code that has not been preprocessed?

I am going to write a parser for OpenEdge (a 4GL database language), and I would like to use ANTLR (or something similar).

There are two reasons why I think this could be a problem:

  • OpenEdge is a 4GL database language that allows you to create constructs such as:

    assign customer.name = 'Customer name' customer.age = 20 . 

    Here the . at the end is the statement terminator, and this single statement assigns two database fields. OpenEdge has many more constructs like this.

  • I need to preserve all the details of the source files, so I cannot expand the preprocessor directives before parsing the file. For example:

     // file myinc.i
     7 * 14

     // source.p
     assign customer.age = {myinc.i}.

    In the example above, I need to retain the fact that customer.age was assigned using {myinc.i} rather than 7 * 14 .

Can I use ANTLR for this or do I need to write my own parser?

UPDATE:
I need this parser not to produce an executable, but to analyze the code. That is why the AST has to record the fact that an include was used.

+4
5 answers

Just to clarify: ANTLR is not a parser, but a parser generator.

You either write your own parser for the language by hand, or you write a grammar for it and let ANTLR generate a lexer and a parser for you. You can embed custom code in your grammar to keep track of your assignments.

So the answer is: yes, you can use ANTLR.

Note: I am not familiar with OpenEdge, but writing a parser or grammar for an SQL-like language is usually difficult. Take a look at the ANTLR wiki to see that writing one from scratch is no trivial task. You did not mention it, but I assume you have already looked for existing parsers that can handle your language?

FYI: you may already have this, but here is a link to the documentation, which includes the BNF grammar for the OpenEdge SQL dialect: http://www.progress.com/progress/products/documentation/docs/dmsrf/dmsrf.pdf

+3

The solution lies within OpenEdge Architect itself. Check the OpenEdge Architect jar files (C:\Progress\OpenEdge\oeide\eclipse\plugins\com.openedge.pdt.core_10.2.1.01\lib\progressparser.jar).

There you will find the parser classes. They are tightly coupled to Eclipse, but I have separated them from the Eclipse framework and it works. progressparser uses ANTLR, and the ANTLR file can be found in: C:\Progress\OpenEdge\oeide\eclipse\plugins\com.openedge.pdt.core_10.2.1.01\oe_common_services.jar

Inside this file you will find the ANTLR definition (look for openge.g).

Good luck. If you want the version that is isolated from Eclipse, just let me know.

+2

Did you know that there is already an open source parser for OpenEdge / Progress 4GL? It is called Proparse and is written using ANTLR (it was originally hand-coded in OpenEdge, but was eventually converted to ANTLR). It is written in Java, but I think you can run it from C# using IKVM.

It is released under the Eclipse license, so it is business-friendly.

+1

You can do the same thing the C preprocessor does: extend your grammar with pragmas that specify the source location, and have your preprocessor emit code annotated with these pragmas.
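A minimal Python sketch of that idea, under stated assumptions: the marker syntax `/*#include-begin ...*/` is invented for illustration, the `files` dictionary stands in for the file system, and only argument-less `{file}` references are handled. It is not how any real OpenEdge preprocessor works.

```python
import re

# Matches an argument-less include reference such as {myinc.i}.
INCLUDE = re.compile(r"\{([^{}\s]+)\}")


def expand_with_markers(text, files):
    """Expand each {file} reference and wrap the result in location markers.

    The marker format is hypothetical; a grammar would need matching rules
    (or a lexer channel) to consume it. No cycle detection is attempted.
    """
    def replace(match):
        name = match.group(1)
        # Expand nested includes inside the included text as well.
        body = expand_with_markers(files[name].strip(), files)
        return f"/*#include-begin {name}*/ {body} /*#include-end {name}*/"

    return INCLUDE.sub(replace, text)


files = {"myinc.i": "7 * 14\n"}
source = "assign customer.age = {myinc.i}."
expanded = expand_with_markers(source, files)
print(expanded)
```

For the example from the question this prints `assign customer.age = /*#include-begin myinc.i*/ 7 * 14 /*#include-end myinc.i*/.`, so a later parsing stage can still tell that `7 * 14` came from `myinc.i`.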

0

The multiple-assignment problem is quite easy to handle in the grammar. Just allow multiple assignments:

 assign_stmt = 'assign' assignments '.' ;
 assignments = ;
 assignments = assignments target '=' expression ;
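To show that this rule is not hard to implement, here is a hand-written Python sketch of the same grammar (not ANTLR output). The tokenizer and the function names are assumptions made for the example, and `expression` is simplified to a single string, number, or name.

```python
import re

# One token per match: quoted string, dotted name, integer, or punctuation.
TOKEN = re.compile(
    r"\s*(?:(?P<string>'[^']*')"
    r"|(?P<name>[A-Za-z_]\w*(?:\.[A-Za-z_]\w*)*)"
    r"|(?P<number>\d+)"
    r"|(?P<punct>[=.]))"
)


def tokenize(text):
    """Split the source text into (kind, value) pairs."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append((match.lastgroup, match.group(match.lastgroup)))
        pos = match.end()
    return tokens


def parse_assign(tokens):
    """assign_stmt = 'assign' assignments '.' ; returns [(target, value), ...]."""
    pos = 0

    def expect(kind, value=None):
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        tok_kind, tok_value = tokens[pos]
        if tok_kind != kind or (value is not None and tok_value.lower() != value):
            raise SyntaxError(f"expected {value or kind}, got {tok_value!r}")
        pos += 1
        return tok_value

    expect("name", "assign")
    assignments = []
    # The left-recursive 'assignments' rule becomes a loop that runs
    # until the terminating '.' is reached.
    while pos < len(tokens) and tokens[pos] != ("punct", "."):
        target = expect("name")
        expect("punct", "=")
        if pos >= len(tokens) or tokens[pos][0] == "punct":
            raise SyntaxError("expected an expression")
        assignments.append((target, tokens[pos][1]))  # simplified expression
        pos += 1
    expect("punct", ".")
    if pos != len(tokens):
        raise SyntaxError("unexpected tokens after the terminating '.'")
    return assignments


result = parse_assign(
    tokenize("assign customer.name = 'Customer name' customer.age = 20 .")
)
print(result)
```

The statement from the question yields two `(target, value)` pairs, one per field.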

For the includes, one approach is to extend the grammar so that preprocessor token sequences are allowed wherever a nonterminal can occur, and simply not perform preprocessor expansion. For your example, you have a grammar rule:

 expression = ... ; 

just add a rule:

 expression = '{' include_reference '}' ; 

This works as long as the preprocessor is not abused to produce fragments containing several language elements that span nonterminal boundaries.
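As a rough Python sketch of that extra rule: the parser below keeps `{myinc.i}` as its own node instead of expanding it. The class names `IncludeRef`, `Literal` and `Assign` are hypothetical, and the parser only accepts a single assignment with a one-token expression.

```python
import re
from dataclasses import dataclass


@dataclass
class IncludeRef:
    path: str  # the include file named between the braces


@dataclass
class Literal:
    text: str


@dataclass
class Assign:
    target: str
    value: object


# Tokens: {include}, dotted name, integer, or one of '=' '.' '*'.
# Note: findall silently skips characters that match none of these.
TOKEN = re.compile(r"\s*(\{[^{}]*\}|[A-Za-z_]\w*(?:\.[A-Za-z_]\w*)*|\d+|[=.*])")


def parse_expression(token):
    # expression = '{' include_reference '}' | literal   (simplified)
    if token.startswith("{"):
        return IncludeRef(token[1:-1].strip())
    return Literal(token)


def parse_single_assign(text):
    """Parse 'assign <target> = <expression> .' without expanding includes."""
    keyword, target, equals, value, terminator = TOKEN.findall(text)
    if keyword.lower() != "assign" or equals != "=" or terminator != ".":
        raise SyntaxError("expected: assign <target> = <expression> .")
    return Assign(target, parse_expression(value))


node = parse_single_assign("assign customer.age = {myinc.i}.")
print(node)
```

The resulting tree records the include reference itself, which is what the question asks for. An include whose content is something like `7 * 14 customer.name =` would not fit into any single `expression`, which is the limitation described above.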

What kind of code analysis do you intend to do? For almost anything useful you will need name and type resolution, and that requires expanding the preprocessor directives. In that case you need a more sophisticated scheme, because you need the expanded tree for name resolution while also keeping the include information on the side.

Our DMS Software Reengineering Toolkit has an OpenEdge parser in which we use the "keep the include-file references" trick described above. The DMS C parser adds a "macro node" to the tree (an OpenEdge include is just a funny way of writing a macro definition); its child nodes contain the tree as you would expect it, together with reference information that points back to the macro definition. This requires some careful organization and a lot of special handling of these nodes wherever they occur.
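The general shape of such a macro node can be sketched in Python as follows. This is an illustration only, not how DMS implements it: the `Node` and `MacroNode` classes, the `source_file` field and the `render` helper are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str
    text: str = ""
    children: list = field(default_factory=list)


@dataclass
class MacroNode(Node):
    # Reference back to where the macro/include is defined (hypothetical field).
    source_file: str = ""


def render(node, expand):
    """Rebuild the source text, either with includes expanded or as written."""
    if isinstance(node, MacroNode) and not expand:
        return "{" + node.source_file + "}"
    if not node.children:
        return node.text
    return " ".join(render(child, expand) for child in node.children)


tree = Node("assign_stmt", children=[
    Node("keyword", "assign"),
    Node("name", "customer.age"),
    Node("op", "="),
    # The children hold the expanded tree; source_file keeps the reference.
    MacroNode("macro", source_file="myinc.i", children=[
        Node("number", "7"),
        Node("op", "*"),
        Node("number", "14"),
    ]),
    Node("punct", "."),
])

print(render(tree, expand=True))
print(render(tree, expand=False))
```

The first call prints `assign customer.age = 7 * 14 .` (the view needed for name and type resolution), the second prints `assign customer.age = {myinc.i} .` (the view that preserves the original source).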

0

Source: https://habr.com/ru/post/1337460/

