What is the most efficient way to parse C-style definition strings?

I have a set of function definitions written in C, with some additional keywords that can appear before certain arguments (in the same way that "unsigned" or "register" can, for example), and I need to parse these lines, as well as some function stubs, and generate actual C code from them.
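For example, an input line might look like this (the extra keyword `myqual` here is hypothetical, just to show the shape):

```
/* standard C prototype plus a custom keyword before an argument */
int frob(myqual unsigned int x, register char *buf);
```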

  • Is Flex/Yacc the right way to do this?

  • Will it be slower going than writing a shell or Python script using regular expressions (which I suspect will become a big pain as the number of additional keywords grows and their effects diverge), given that I have no experience with lexers/parsers (although I do know how LALR parsing works)?

  • Are there any good Lex/Yacc materials that cover problems like this? All the documents I could find use the same primitive toy-calculator example.

Any help would be appreciated.

+5
parsing flex-lexer yacc
Apr 27 '09 at 6:48
5 answers

ANTLR is commonly used (as an alternative to Lex/Yacc).

ANTLR, ANother Tool for Language Recognition, is a language tool that provides a framework for constructing recognizers, interpreters, compilers, and translators from grammatical descriptions containing actions in a variety of target languages.

+3
Apr 27 '09 at 6:51

There is also Lemon Parser, which uses a less strict grammar syntax. On the other hand, you are married to Lemon: you would have to rewrite the parser grammar for some other tool if you ever hit one of its limitations. The upside is that it is very easy to use and self-contained. You can drop it into your source tree and not worry about checking for the presence of external tools.

SQLite3 uses it, as do other popular projects. I'm not saying you should use it just because SQLite does, but it may be worth a try if time permits.

+3
Apr 27 '09 at 8:47

It depends entirely on your definition of "efficient". If you have all the time in the world, a hand-written parser will be the fastest. Hand-written parsers take a long time to develop and debug, but to this day no parser generator outperforms hand-written code in runtime performance.

If you want something that can parse real C within a week or so, use a parser generator. The generated code will be fast enough, and most parser generators come with a grammar for C that you can use as a starting point (avoiding 90% of the common mistakes).

Note that regular expressions are not suitable for parsing recursive structures. That approach will be slower than using a generator and more error-prone than a hand-written parser.

+1
Apr 27 '09 at 8:38

In fact, it depends on how complex your language is and whether it is really close to C or not...

Either way, you can use lex as a first step, even for the regular-expression approach...

I would go for lex + Menhir and OCaml...

But any flex/yacc combination would be fine.

The main problem with plain Bison (the GNU yacc implementation) is that it is typed in C: you have to describe your entire tree (and all the manipulation functions) by hand... Using OCaml would be much simpler...
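To illustrate the boilerplate being alluded to, here is a hedged Bison sketch (the token and node names are made up): every semantic value must be given an explicit C type up front, and the AST `struct`s behind them must be written and managed by hand, whereas OCaml's algebraic data types and pattern matching give you the tree and its traversals almost for free.

```yacc
/* Every semantic value needs a declared C type... */
%union {
    char *ident;
    struct param *plist;   /* hand-written AST node type */
}
%token <ident> IDENT
%type  <plist> param_list
```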

+1
Jun 06 '09 at 1:57

For what you want to do, our DMS Software Reengineering Toolkit is probably a very effective solution.

DMS is specifically designed to support custom analyzers/generators for the kind of code you are discussing. It provides very strong facilities for defining arbitrary language parsers/analyzers (proven on 30+ real languages, including several full dialects of C, C++, Java, C# and COBOL).

DMS automates the construction of ASTs (so you need do nothing beyond getting the grammar right to have usable ASTs), lets you build custom analyses of exactly the kind you indicated, can construct new C-specific ASTs representing the code you want to generate, and spit them out as compilable C source text. DMS's preexisting definitions of C can likely be bent to cover your C-like language.

-1
Jan 24 '10 at 11:30


