ANTLR: generating an AST for C and walking the AST

I am doing static analysis on a C program. I looked on the ANTLR website, and there seems to be no grammar file that builds an AST for C. Does this mean I have to write it myself from scratch? Is there a faster way? I also need a tree parser that can traverse the AST created by the parser.

+2
2 answers

You indicated that you want to perform a static analysis to detect buffer overflows.

First, writing a grammar for C is harder than it sounds. There is what is in the standard, and then there is what the compilers really accept. And you must decide what to do about the preprocessor (and that varies from compiler to compiler!). If you do not get the grammar and the preprocessing right, you will not be able to parse real programs. (If you want to play with toy languages, that's fine, but then you don't need a C grammar.)

To do the analysis, you will need far more machinery than an AST. You will need symbol tables, control and data flow analysis, probably local and global points-to analysis, call graph extraction, and some type of range analysis.
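To make the gap concrete, here is a minimal sketch in Python of the very simplest slice of that machinery: a symbol table plus a constant-only bounds check. The node classes `ArrayDecl` and `IndexWrite` and the function `find_constant_overflows` are hypothetical names invented for this illustration, and the "AST" is built by hand rather than produced by any real C parser.

```python
from dataclasses import dataclass

# Hypothetical, hand-built AST nodes; a real C front end would produce these.
@dataclass
class ArrayDecl:
    name: str
    size: int

@dataclass
class IndexWrite:
    name: str
    index: int   # only constant indexes are handled in this sketch
    line: int

def find_constant_overflows(program):
    """Report writes whose constant index falls outside the declared array."""
    symbols = {}      # symbol table: array name -> declared size
    findings = []
    for node in program:
        if isinstance(node, ArrayDecl):
            symbols[node.name] = node.size
        elif isinstance(node, IndexWrite):
            size = symbols.get(node.name)
            if size is not None and not (0 <= node.index < size):
                findings.append((node.line, node.name, node.index, size))
    return findings

# Models:  int buf[8];  buf[3] = 0;  buf[10] = 0;
program = [ArrayDecl("buf", 8), IndexWrite("buf", 3, 2), IndexWrite("buf", 10, 3)]
print(find_constant_overflows(program))  # [(3, 'buf', 10, 8)]
```

This only catches a literal out-of-range index on a directly declared array. As soon as the index is a variable, the array is reached through a pointer, or the write happens in a called function, you need the data flow, points-to, call graph and range analyses listed above.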

People just don't get it.

** GETTING A PARSER IS A LONG WAY FROM DOING ANYTHING USEFUL WITH REAL LANGUAGES **

I scream because I see it again and again and again.

If you want to get on with a specific program analysis or transformation task, and you do not want to die of old age before you start it, you had better find a framework that provides most of what you need. A bare parser generator with a creaky grammar is not a framework. (Don't get me wrong: ANTLR, YACC, and JavaCC are all fine parser generators, and they are great for building a parser for a new language. They are fine for implementing parsers for real languages if you make the investment. But they produce parsers, most people never do the work of making the parser production-quality, and they do not provide the additional machinery by a long shot.)

Our DMS Software Reengineering Toolkit contains all of the above machinery, because it is almost always needed, and it is a royal headache to implement. (My team has invested 15 years in it.)

We have also instantiated that machinery specifically for COBOL, Java, C, and C++ (the last to a somewhat lesser extent, because the language is really hard), in a variety of dialects, so that others do not have to repeat this long process.

GCC and Clang are fairly mature alternatives for C and C++.

+4

The hardest part is writing the grammar. Mixing in rewrite rules to create an AST is not that hard, and creating a tree grammar from a parser grammar that emits an AST is also not too hard (compared to writing the parser grammar).
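For a feel of what the tree-walking half amounts to once an AST exists, here is a rough sketch in plain Python. It is not ANTLR output and does not use the ANTLR runtime; the `Node` class, the node kinds (`ASSIGN`, `INDEX`, `ID`, `INT`) and both functions are assumptions made up for the example, and the tree is built by hand.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    kind: str                 # hypothetical node kinds: "ASSIGN", "INDEX", "ID", "INT"
    text: str = ""
    children: list = field(default_factory=list)

def walk(node, visit, depth=0):
    """Pre-order traversal: call visit on the node, then on each child."""
    visit(node, depth)
    for child in node.children:
        walk(child, visit, depth + 1)

# Hand-built AST for the C statement:  buf[10] = x;
tree = Node("ASSIGN", "=", [
    Node("INDEX", "[]", [Node("ID", "buf"), Node("INT", "10")]),
    Node("ID", "x"),
])

def collect_indexed_arrays(root):
    """Return the names of identifiers that appear as the base of an index."""
    found = []
    def visit(node, depth):
        if node.kind == "INDEX" and node.children[0].kind == "ID":
            found.append(node.children[0].text)
    walk(root, visit)
    return found

print(collect_indexed_arrays(tree))  # ['buf']
```

A tree grammar does roughly the same job declaratively: it describes the shapes the AST can take and attaches actions to them, instead of testing node kinds by hand as `visit` does here.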

Here's a previous Q&A that shows how to create a proper AST: How to derive an AST constructed using ANTLR?

And I could not find a decent SO Q&A that explains how to go about creating a tree grammar, so here is a link to my personal blog that explains it: http://bkiers.blogspot.com/2011/03/6-creating-tree-grammar.html

Good luck.

+2

Source: https://habr.com/ru/post/1380695/

