Using clang to parse C++ code

Question

We want to do some fairly simple analysis of user C++ code and then use that information to instrument the code (basically, modify it by inserting instrumentation code) so that users can run a dynamic analysis of their code and get statistics on the values of certain numeric types.
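To make the goal concrete, here is a minimal sketch of what the instrumented code might look like once a numeric type has been swapped for a recording wrapper. The names `ValueStats`, `Tracked`, and `run_instrumented_example` are hypothetical and only for illustration; they are not part of clang or any existing toolkit, and a real instrumentation pass would likely record more than count, minimum, and maximum.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>

// Hypothetical statistics gathered for one tracked numeric type.
struct ValueStats {
    std::size_t count = 0;
    double min = std::numeric_limits<double>::infinity();
    double max = -std::numeric_limits<double>::infinity();

    void record(double v) {
        ++count;
        min = std::min(min, v);
        max = std::max(max, v);
    }
};

// Hypothetical wrapper that an instrumentation pass could substitute for a
// plain numeric type. Every value constructed from or assigned as a T is
// recorded in a per-type statistics object.
template <typename T>
class Tracked {
public:
    Tracked(T v) : value_(v) { stats().record(static_cast<double>(v)); }

    Tracked& operator=(T v) {
        value_ = v;
        stats().record(static_cast<double>(v));
        return *this;
    }

    // Implicit conversion lets the wrapper be used in ordinary arithmetic.
    operator T() const { return value_; }

    static ValueStats& stats() {
        static ValueStats s;
        return s;
    }

private:
    T value_;
};

// Stand-in for user code after instrumentation: `double x` has been
// rewritten to `Tracked<double> x`, and nothing else has changed.
ValueStats run_instrumented_example() {
    Tracked<double> x = 1.5;  // records 1.5
    x = x * 2.0;              // records 3.0
    x = x - 5.0;              // records -2.0
    assert(static_cast<double>(x) == -2.0);
    return Tracked<double>::stats();
}
```

The static analysis step is what would decide which declarations to rewrite; the wrapper only covers the dynamic side.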

clang should already handle enough C++ to process the code our users will throw at it, and since clang's C++ coverage is constantly improving, it will be even better by the time we finish.

So how can clang be used as a standalone parser? We think we can just build an AST and then walk it, looking for objects of the classes we are interested in tracking. It would be interesting to hear from others who use clang without LLVM.

+4

c++ clang

aneccodeal Mar 11 '10 at 4:54


2 answers

What you did not say is what "analysis" you want to do. Most C++ analyses require accurate symbol table data, so that when you encounter a symbol foo, you know what it is. (Technically, you don't even know what + means without such a symbol table!) You also need general type information; if you have the expression "a * b", what is the type of the result? Having "name and type" information is the key to almost everything you would want to do for analysis.
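The point about "a * b" can be illustrated with a short sketch. The same tokens are a multiplication when `a` names a variable and a pointer declaration when `a` names a type, and the result type of a multiplication depends on the operand types, including any user-defined `operator*`. The namespaces, functions, and the `Meters` and `Area` types below are made up for illustration.

```cpp
#include <cassert>
#include <type_traits>

namespace as_value {
    int a = 6;
    int b = 7;
    // Here `a` names a variable, so `a * b` is a multiplication expression.
    int product() { return a * b; }
}

namespace as_type {
    typedef int a;
    // Here `a` names a type, so the tokens `a * b` declare b as int*.
    bool declares_pointer() {
        a * b = nullptr;
        assert(b == nullptr);
        return std::is_pointer<decltype(b)>::value;
    }
}

// For built-in operands, the result type follows the usual arithmetic
// conversions.
static_assert(std::is_same<decltype(1 * 2), int>::value,
              "int * int is int");
static_assert(std::is_same<decltype(1 * 2.0), double>::value,
              "int * double is double");
static_assert(std::is_same<decltype('a' * 'b'), int>::value,
              "char operands are promoted to int");

// With an overloaded operator, the result type is whatever the overload
// returns, which only name and type resolution can determine.
struct Meters { double v; };
struct Area { double v; };
Area operator*(Meters x, Meters y) { return Area{x.v * y.v}; }

static_assert(std::is_same<decltype(Meters{} * Meters{}), Area>::value,
              "Meters * Meters is Area");
```

A tool that only sees the token sequence cannot tell these cases apart, which is why the symbol table matters.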

If you insist on clang, then other answers cover that. I don't know whether it provides name and type resolution.

If you need name and type resolution, then another solution would be the DMS Software Reengineering Toolkit. DMS provides generic compiler-like infrastructure for parsing, analyzing, transforming, and prettyprinting (regenerating source code from the compiler data structures). DMS's industrial-strength C++ front end (it has many other front ends as well) provides full name and type resolution according to the ANSI standard, as well as for the GCC and MS VC++ dialects.

Code transformations can be implemented through the abstract syntax tree interface provided by DMS, or by program transformation rules written as patterns in the surface syntax of your target language (in this case, C++). Here's a simple transformation using the rule language:

  domain Cpp~GCC3; -- says we want patterns for C++ in the GCC3 dialect

  rule optimize_to_increment(lhs:left_hand_side):expression -> expression
    " \lhs = \lhs + 1 "
    -> " \lhs++"
    if no_side_effects(lhs).

This implicitly operates on the ASTs built by DMS, modifying them. The conditional lets you ask about arbitrary properties of the pattern variables (in this case, lhs), including name and type constraints if you wish.

DMS has been used many times for very complex program analysis and transformation of C++ code. We build C++ testing tools by instrumenting C++ code in a fairly obvious way using DMS. The website has a document describing how DMS was used to restructure the architecture of a large line of military aircraft software. That kind of work literally pours C++ code from one architectural form into another, using a large number of pattern-directed transformations such as the one above.

Most likely, it would be fairly easy to implement your instrumentation with it. And you would not have to wait for it to mature.

+1

Ira Baxter Mar 11 '10 at 5:26


Eli Bendersky · Accepted Answer · 2010-03-11T04:59:32+0000

clang is designed to be modular. Quoting from its page:

A major design concept for clang is its use of a library-based architecture. In this design, various parts of the front end can be cleanly divided into separate libraries, which can then be mixed up for different needs and uses.

Take a look at clang libraries such as libast for your needs. More details here.
