Good tools for parsing and analyzing C/C++ code

What are some good tools for getting a quick start on parsing and analyzing C/C++ code?

In particular, I am looking for open source tools that handle both the preprocessor and the C/C++ language. Preferably, these tools would use lex/yacc (or flex/bison) for the grammar and not be too complicated. They should handle the latest ANSI C/C++ definitions.

Here is what I have found so far, but have not looked at them in detail (thoughts?):

  • CScope is an old-school C analyzer. It does not appear to do a full parse, though. It is described as a glorified "grep" for finding C functions.
  • GCC - Everyone's favorite open source compiler. Very complex, but it seems to do it all. There is a related project for creating GCC extensions called GEM, but it has not been updated since GCC 4.1 (2006).
  • PUMA - the PUre MAnipulator. (From the page: “The goal of this project is to provide a class library for analyzing and manipulating C/C++ sources. For this purpose, PUMA provides classes for scanning, parsing and, of course, manipulating C/C++ sources.”) It looks promising, but has not been updated since 2001. Apparently PUMA was incorporated into AspectC++, but even that project hasn't been updated since 2006.
  • Various raw C/C++ grammars. You can get c-c++-grammars-1.2.tar.gz, but it has not changed since 1997. A little Google searching pulls up other basic lex/yacc grammars that could serve as a starting point.
  • Any other?

I hope to use this as a starting point for translating C/C++ source into a new toy language.

Thanks! Matt

(Added 2/9): Just a clarification: I want to extract semantic information from the preprocessor in addition to the C/C++ code itself. I do not want "#define foo 42" to disappear into the integer "42"; it should remain attached to the name "foo". Unfortunately, this excludes several solutions that run the preprocessor first and only deliver the C/C++ parse tree.
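
To make the requirement concrete, here is a minimal Python sketch of keeping macro names attached to their values instead of expanding them away. It is only an illustration under strong assumptions: it handles simple object-like macros on a single line, ignores function-like macros, continuations, comments and string literals, and the helper names `collect_defines` and `find_macro_uses` are made up for this example.

```python
import re

# Matches simple object-like macros such as "#define foo 42".
# The \b plus negative lookahead rejects function-like macros ("#define f(x) ...").
DEFINE_RE = re.compile(r'^\s*#\s*define\s+([A-Za-z_]\w*)\b(?!\()\s*(.*?)\s*$')


def collect_defines(source):
    """Return {name: replacement_text} for object-like #define directives."""
    macros = {}
    for line in source.splitlines():
        match = DEFINE_RE.match(line)
        if match:
            name, body = match.groups()
            macros[name] = body
    return macros


def find_macro_uses(source, macros):
    """Return (line_number, name) for each identifier that names a known macro.

    Directive lines are skipped. Identifiers inside strings or comments are
    not filtered out, so this is deliberately naive.
    """
    uses = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if line.lstrip().startswith('#'):
            continue
        for token in re.findall(r'[A-Za-z_]\w*', line):
            if token in macros:
                uses.append((lineno, token))
    return uses


sample = """
#define foo 42
#define GREETING "hi"
int x = foo;
"""

macros = collect_defines(sample)
print(macros)
print(find_macro_uses(sample, macros))
```

A translator built this way could emit a named constant for `foo` in the target language rather than the bare literal, which is the information a preprocess-first pipeline throws away.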

+50
c++ c parsing lex yacc
Feb 09 '09 at 0:48
14 answers

Parsing C++ is extremely hard because the grammar is undecidable. To quote Yossi Kreinin:

Outstandingly complicated grammar

“Outstandingly” should be interpreted literally, because all popular languages have context-free (or “nearly” context-free) grammars, while C++ has an undecidable grammar. If you like compilers and parsers, you probably know what that means. If you are not into this kind of thing, there is a simple example showing the problem with parsing C++: is AA BB(CC); an object definition or a function declaration? It turns out that the answer depends heavily on the code before the statement, the "context". This shows (on an intuitive level) that the C++ grammar is quite context-sensitive.
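
The AA BB(CC); example can be sketched in a few lines of Python. This is a toy illustration, not how a real C++ front end works: the `classify` function and the `type_names` set (standing in for a symbol table built from earlier declarations) are assumptions made up for the sketch.

```python
import re

# Recognizes only the exact shape "X Y(Z);" used in the quoted example.
STATEMENT_RE = re.compile(r'^\s*(\w+)\s+(\w+)\s*\(\s*(\w+)\s*\)\s*;\s*$')


def classify(statement, type_names):
    """Classify `X Y(Z);` given the set of names already known to be types."""
    match = STATEMENT_RE.match(statement)
    if not match:
        return "unrecognized"
    first, _name, inner = match.groups()
    if first not in type_names:
        return "unrecognized"  # X must name a type for either reading
    if inner in type_names:
        # Z is a type: Y is a function taking a Z and returning an X.
        return "function declaration"
    # Z is a value: Y is an object of type X initialized with Z.
    return "object definition"


# Same token sequence, different answer depending on earlier declarations.
print(classify("AA BB(CC);", {"AA", "CC"}))
print(classify("AA BB(CC);", {"AA"}))
```

The point is that the token stream alone is not enough: the parser has to consult what was declared earlier, which a plain context-free lex/yacc grammar cannot express without feedback from the symbol table.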

+35
Feb 09 '09 at 1:38

You can look at Clang, the LLVM project's C/C++ front end, for parsing.

It now supports C++ fully (link).

+21
Feb 09 '09 at 10:04

The ANTLR parser generator has a grammar for C/C++ as well as for the preprocessor. I have never used it, so I can’t say how complete its C++ parsing is. ANTLR itself has been a useful tool for me on a couple of occasions for parsing much simpler languages.

+17
Feb 09 '09 at 3:05

Depending on your problem, GCCXML might be your answer. It basically parses the source using GCC and then gives you an easily digestible XML dump of the parse tree. With GCCXML you are done once and for all.

+15
Feb 09 '09 at 1:25

pycparser is a complete parser for C (C99) written in Python. It has a fully configurable AST back end, so it can be used as the basis for any kind of language processing you might need.

It doesn't support C++, though. Granted, C++ is much harder to parse than C.

Update (2012): at this time the answer would without a doubt be Clang: it is modular, supports full C++ (with many C++11 features) and has a relatively friendly code base. It also has a C API for bindings to high-level languages (e.g. Python).

+13
Mar 14 '09 at 6:38

See how Doxygen works; the full source code is available, and it is flex-based.

A misleading candidate is GOLD, which is a free Windows parser toolkit intended for creating translators. Their list of supported languages refers to the languages in which one can implement a parser, not to the list of supported parse grammars.

They only have grammars for C and C#, not C++.

+8
Feb 09 '09 at 1:03

Parsing C++ is a very difficult task.

There is the Boost/Spirit framework, and a couple of years ago they played with the idea of implementing a C++ parser, but it is far from complete.

Fully and properly parsing ISO C++ is far from trivial, and there have in fact been many related efforts. It is an inherently complex job that is not easily accomplished without rewriting a full compiler front end that understands all of C++ and the preprocessor. A preprocessor implementation called "Wave" is available from the Spirit folks.

That said, you might want to take a look at pork/oink (based on Elsa), a C++ parser toolkit specifically meant for source code transformation. The Mozilla project uses it for large-scale static source code analysis and automated code rewriting. The most interesting part is that it not only supports most of C++ but also the preprocessor itself.

On the other hand, there is really only one commercial solution available: the EDG front end, which can be used for pretty much all C++-related efforts.

Personally, I would check out the Elsa-based pork toolkit used by Mozilla. In addition, the FSF has now approved work on GCC plugins under the runtime library license, so I would expect things to change soon, once people can easily use the GCC-based C++ parser for such purposes through binary plugins.

So, in a nutshell: if you can afford a commercial license, EDG; if you need something free/open source now, Elsa/oink are fairly promising; and if you have the time, you might want to use GCC for your project.

Another option, for C code only, is CScout.

+7
Mar 13 '09 at 20:02

The grammar for C++ is notoriously hairy. There is a good thread on Lambda about it, but the gist is that the C++ grammar can require arbitrarily much lookahead.

For the kind of thing I imagine you have in mind, I would think about hacking either GNU CC or Splint. GNU CC in particular separates out the language generation part quite thoroughly, so you might be best off building a new g++ back end.

+6
Feb 09 '09 at 0:55

Actually, PUMA and AspectC++ are both still actively maintained and updated. I was looking into using AspectC++ and wondered about the lack of updates myself. I e-mailed the author, who said that both AspectC++ and PUMA are still being developed. You can get to the source code through SVN at https://svn.aspectc.org/repos/ or get regular binary builds at http://akut.aspectc.org . As with a lot of excellent C++ projects these days, the author doesn't have time to keep up with web page maintenance. That makes sense if you have a full-time job and a life.

+4
Aug 30 '10 at 20:46

Elsa beats everything else I know hands down for C++ parsing, even though it is not 100% compliant. I'm a fan. There is a module that prints out C++, so that might be a good starting point for your toy project.

+3
Feb 09 '09 at 10:02

How about something easier to understand, like tiny-C or Small C?

+2
Feb 09 '09 at 0:59

See our C++ Front End for a full-featured C++ parser: it builds ASTs and symbol tables and does name and type resolution. It can even parse and retain the preprocessor directives. The C++ front end is built on top of our DMS Software Reengineering Toolkit, which lets you use that information to carry out arbitrary changes to the source code using source-to-source transformations.

DMS is the ideal engine for implementing such a translator.

Having said that, I don't see much point in the task you have in mind. I don't see much value in trying to replace C++, and you will find that building a complete translator is an enormous amount of work, especially if the target is a "toy" language. There is also probably little point in parsing C++ with a robust parser if the only goal is to produce an isomorphic version of C++ that is easier to parse (wait, we already postulated a robust C++ parser!).

EDIT May 2012: The DMS C++ front end now handles GCC3/GCC4/C++11 and Microsoft Visual C++ 2005/2010. Robustly.

EDIT Feb 2015: Now handles C++14 in GCC and MS dialects.

EDIT August 2015: Now parses and captures both code and preprocessor directives in a single tree.

+2
Jun 17 '09 at 15:06

A while back I tried to write a tool that would automatically generate unit tests for C files.

For preprocessing, I put the files through GCC. The output is ugly, but you can easily trace where in the original code each line of the preprocessed file came from. For your needs, though, you might need something else.

I used Metre as the basis for the C parser. It is open source and uses lex and yacc. This made it easy to get up and running in a short time without fully understanding lex and yacc.

I also wrote a C application, since the lex and yacc solution could not help me trace functionality across functions and parse the structure of a whole function in one pass. It became unmaintainable in a short time and was abandoned.
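
The tracing works because GCC's preprocessor emits linemarkers of the form `# <line> "<file>" <flags>` that say where the following line came from. A rough Python sketch of reading them back is below; the sample text is hand-written in that general shape rather than captured from a real run, the function name `map_to_source` is made up, and file names containing escaped quotes are not handled.

```python
import re

# A linemarker looks like:  # 12 "foo.c" 1 3
# The trailing numeric flags are accepted but ignored here.
LINEMARKER_RE = re.compile(r'^#\s+(\d+)\s+"([^"]*)"(?:\s+\d+)*\s*$')


def map_to_source(preprocessed):
    """Return (file, line, text) for every non-marker line of the input."""
    current_file, current_line = None, 0
    mapped = []
    for text in preprocessed.splitlines():
        marker = LINEMARKER_RE.match(text)
        if marker:
            # The marker gives the origin of the line that follows it.
            current_line = int(marker.group(1))
            current_file = marker.group(2)
            continue
        mapped.append((current_file, current_line, text))
        current_line += 1
    return mapped


# Hand-written sample in the style of `gcc -E` output (an assumption).
sample = '''# 1 "demo.c"
# 1 "demo.h" 1
int helper(void);
# 2 "demo.c" 2
int main(void) { return helper(); }
'''

for origin in map_to_source(sample):
    print(origin)
```

Note that this only recovers positions; macro names are already expanded at this stage, which is why it may not be enough for the original question.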

+1
Feb 09 '09 at 9:55

How about using a tool like GNU cflow, which can analyze the code and produce call graphs? This is what the opengroup (man page) has to say about cflow. The GNU version of cflow is open source and comes with its source code ...

Hope this helps, Regards, Tom.

+1
Jan 24 '10 at 12:00
