Static analysis for partial C++ programs

I am thinking about doing a static analysis project on C++ code samples, as opposed to whole programs. In general, static analysis requires some simpler intermediate representation, but such a representation cannot be created accurately without the entire program code.

Still, I know there is such a tool for Java: it basically "guesses" the missing information and thus allows static analysis to take place, even though the analysis is no longer sound or complete.

Is there anything similar that can be used to convert partial C++ code into some intermediate form (e.g. LLVM bytecode)?

4 answers

Generally, if you guess, you guess wrong; complaints from a static analyzer that rest on such guesses are false positives, and tools that produce them tend to have a high rejection rate.

If you insist on guessing, you will need a tool that can parse arbitrary fragments of C++ ("Statically analyze this method, guessing where necessary..."). Most C++ parsers will only parse complete source files, not fragments.

You will also need a way to build partial symbol tables. ("I is used as an argument to FOO, but has no type information, and it is not the same I as the one in the statement following the call to FOO.")

Our DMS Software Reengineering Toolkit, with its C++ Front End, can parse fragments and could be used as a springboard for building partial symbol tables.

DMS provides general parsing/analysis/transformation of code, as directed by an explicit language definition supplied to DMS. The C++ Front End is a complete, robust C++ front end that enables DMS to parse C++, build ASTs, and build symbol tables for those ASTs. It does this using an attribute grammar (AG) that encodes the C++ name lookup rules. An AG is a functional-style computation attached to AST nodes; the C++ symbol table builder is a large functional program whose parts are tied to the BNF grammar rules for C++.

As part of its general parsing machinery, given a language definition (for example, the C++ front end), DMS can parse arbitrary (non)terminals of that language using its built-in pattern language. This way DMS can parse expressions, methods, declarations, and so on, or any other well-formed piece of code, and build an AST. If a malformed fragment is supplied, fragment parsing currently reports a syntax error. It would be possible to extend DMS's error recovery to produce a plausible AST repair and thus parse arbitrary fragments.

Partial symbol tables are harder, since most of the machinery for building symbol tables depends on other parts of the symbol table already having been built. However, since all of this is encoded as an AG, it is possible to run just the part of the AG relevant to the fragment being parsed, for example, the symbol-table-building logic for a method. The AG would probably have to be modified extensively to let it work with "assumptions" about missing symbol definitions; these would effectively become constraints. Of course, a missing symbol could be any of several things, and you may end up with a set of possible symbol table configurations. Consider:

{ int X; T*X; } 

Without knowing what T is, the type of this phrase (and even its syntactic category) cannot be determined uniquely. (DMS will parse T*X and report an ambiguous parse, since there are several possible matching interpretations; see "Why can't C++ be parsed with a LR(1) parser?")

We have already done some partial parsing and partial symbol table construction. In that work we experimentally used DMS to capture code containing preprocessor conditionals whose conditions were left undefined. This forces us to build conditional symbol table entries. Consider:

    #if foo
    int X;
    #else
    void X(int a) {...}
    #endif
    ...
    #if foo
    X++;
    #else
    X(7);
    #endif

With conditional symbols, this code can pass type checking. The symbol table entry for X says something like: "X ==> int if foo else ==> void(int)".

I think the idea of reasoning about large program fragments using constraints is great, but I suspect it is really hard. You will forever be trying to pin down enough information about the constraints to do the static analysis.

Understand 4 C++ from SciTools is a product that parses source code and provides metrics on various things. As a tool, it works like a source code browser, but I personally do not use it for that, since Visual Studio's IntelliSense is just as good.

Its real strength is that it comes with C and Perl APIs, so you can use it to write your own static analysis tools. And yes, it works well when some code files are missing. Also note that Understand 4 C++ runs on Windows and a number of other operating systems.

As for your last question about intermediate code: Understand 4 C++ does not give you an "intermediate" form. Instead, its API provides an abstraction layer over the abstract syntax tree that gives you a lot of power for analyzing source code. I have written many tools at work using this API. I also wrote a C++ managed API that wraps its native C API and published it on CodePlex.

I don't know about LLVM bytecode, but there is an old standby called PcLint:

http://www.gimpel.com/html/index.htm

They even have an online testing page where you can submit code snippets.

You can check this out:

What open source C++ static analysis tools are available?

It deals with the same problem, and some solutions are suggested there. It may be useful!

Source: https://habr.com/ru/post/916357/

