Options for parsing / processing C++ files

So, I need to be able to parse some relatively simple C++ files with annotations and generate additional source files from them.

As an example, I might have something like this:

    //@service
    struct MyService {
        int getVal() const;
    };

I will need to find the //@service annotation and get a description of the struct that follows it.

I am considering LLVM / Clang, as it seems to offer good library support for embedding compiler / parsing functionality in third-party applications. But I'm really pretty clueless as far as parsing source code goes, so I'm not sure exactly what I would need to look for or where to start.

I understand that the AST is the core of most language representations, and that Clang has a library for generating an AST from source files. But comments would not really be part of the AST, right? So what would be a good way to find the representation of a struct that follows a particular comment annotation?

I'm not too worried about handling cases where the annotation appears in the wrong place, as this will only be used to parse C++ files written specifically for this application. But of course, the more robust I can make it, the better.

+4
3 answers

One way I do this is by annotating identifiers of:

  • classes
  • base classes
  • class members
  • enumerations
  • enumerators

E.g.:

    class /* @ann-class */ MyClass : /* @ann-base-class */ MyBaseClass {
        int /* @ann-member */ member_;
    };

This annotation style makes it easy to write a Python or Perl script that reads the header line by line and extracts each annotation together with its associated identifier.
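The answer suggests Python or Perl for this script. To stay in the language of the rest of the thread, here is a rough C++ sketch of the same line-by-line scan using std::regex. The function name scan_annotations and the exact pattern are assumptions of this sketch, not something the answer specifies; it only handles an annotation comment that is followed by its identifier on the same line.

```cpp
#include <istream>
#include <regex>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Each hit pairs an annotation name (e.g. "ann-class") with the identifier
// that follows the comment on the same line.
using Annotation = std::pair<std::string, std::string>;

// Hypothetical helper: scans a header line by line for "/* @name */ identifier".
std::vector<Annotation> scan_annotations(std::istream& in) {
    // Assumed pattern: a block comment holding "@name", then a C++ identifier.
    static const std::regex pattern(
        R"(/\*\s*@([A-Za-z0-9_-]+)\s*\*/\s*([A-Za-z_][A-Za-z0-9_]*))");
    std::vector<Annotation> found;
    std::string line;
    while (std::getline(in, line)) {
        // Collect every annotation on this line, left to right.
        for (std::sregex_iterator it(line.begin(), line.end(), pattern), end;
             it != end; ++it) {
            found.emplace_back((*it)[1].str(), (*it)[2].str());
        }
    }
    return found;
}
```

Feeding it the MyClass header above through a std::istringstream (or a std::ifstream) should yield the pairs (ann-class, MyClass), (ann-base-class, MyBaseClass) and (ann-member, member_), which is the input a code generator would need.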

The annotations and their associated identifiers let you generate C++ reflection in the form of function templates that traverse an object, passing its base classes and members to a functor, for example:

    template<class Functor>
    void reflect(MyClass& obj, Functor f) {
        f.on_object_start(obj);
        f.on_base_subobject(static_cast<MyBaseClass&>(obj));
        f.on_member(obj.member_);
        f.on_object_end(obj);
    }

It is also convenient to generate numeric identifiers (an enumeration) for each base class and member and pass them to the functor, for example:

    f.on_base_subobject(static_cast<MyBaseClass&>(obj), BaseClassIndex<MyClass>::MyBaseClass);
    f.on_member(obj.member_, MemberIndex<MyClass>::member_);

This reflection code lets you write functors that serialize and deserialize any object type to and from several different formats. The functors use function overloading and/or type deduction to treat different types appropriately.
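As a rough illustration of the overloading idea, here is a minimal sketch of one such functor. The member name_, the TextSerializer type and the output format are hypothetical, made up for this example; the reflect functions follow the shape of the generated code shown earlier and are written by hand here.

```cpp
#include <ostream>
#include <sstream>
#include <string>

struct MyBaseClass {
    std::string /* @ann-member */ name_;  // hypothetical member for the demo
};

struct MyClass : MyBaseClass {
    int /* @ann-member */ member_ = 0;
};

// Reflection functions in the shape a generator would emit.
template<class Functor>
void reflect(MyBaseClass& obj, Functor f) {
    f.on_object_start(obj);
    f.on_member(obj.name_);
    f.on_object_end(obj);
}

template<class Functor>
void reflect(MyClass& obj, Functor f) {
    f.on_object_start(obj);
    f.on_base_subobject(static_cast<MyBaseClass&>(obj));
    f.on_member(obj.member_);
    f.on_object_end(obj);
}

// Hypothetical functor: writes a simple text form of any reflected object.
// It holds a pointer so that copies passed by value share one stream.
struct TextSerializer {
    std::ostream* out;

    template<class T> void on_object_start(T&) { *out << "{ "; }
    template<class T> void on_object_end(T&) { *out << "} "; }

    // Recurse into the base class through its own reflect overload.
    template<class Base> void on_base_subobject(Base& base) { reflect(base, *this); }

    // Overloads select the treatment for each member type.
    void on_member(int value) { *out << value << ' '; }
    void on_member(const std::string& value) { *out << '"' << value << "\" "; }
};
```

Calling reflect(obj, TextSerializer{&stream}) walks the base subobject first and then the members. A deserializing functor would have the same member functions but take its on_member arguments by non-const reference and assign to them.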

+4

Parsing C++ code is an extremely difficult task. Using a C++ compiler can help, but it may be more useful to restrict yourself to a less powerful domain-specific format, i.e. generate both the source and the additional C++ files from a simpler representation, such as protobuf .proto files or SOAP WSDL, or something even simpler for your specific case.
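A minimal sketch of that direction, assuming a tiny hand-rolled service description rather than protobuf or WSDL: the Method and Service types and the generate_struct function are hypothetical names invented for this example. The description is the single source of truth, and the C++ declaration is generated from it instead of being parsed.

```cpp
#include <string>
#include <vector>

// Hypothetical in-memory form of a tiny service description. In practice it
// would be loaded from a small text file, which is far easier to parse than C++.
struct Method {
    std::string return_type;
    std::string name;
    bool is_const;
};

struct Service {
    std::string name;
    std::vector<Method> methods;
};

// Emits the C++ struct declaration for a service description.
std::string generate_struct(const Service& svc) {
    std::string out = "struct " + svc.name + " {\n";
    for (const Method& m : svc.methods) {
        out += "    " + m.return_type + " " + m.name + "()";
        if (m.is_const) out += " const";
        out += ";\n";
    }
    out += "};\n";
    return out;
}
```

The same Service value could feed a second generator for the additional source files, so nothing ever has to recover the structure from C++ text.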

+2

I did something very similar recently. My research indicated that there were no out-of-the-box solutions available, so I ended up hand-rolling one.

The other answers are dead-on regarding parsing C++ code. I needed something that could parse about 90% of C++ code correctly; I ended up using srcML. This tool takes C++ or Java source code and converts it into an XML document, which makes it easier to analyze. It preserves comments intact. In addition, if you need to do a source code transformation, it comes with a reverse tool that takes the XML document and produces source code.

It works correctly about 90% of the time, but it chokes on complicated template metaprogramming and the darkest corners of C++ parsing. Fortunately, my input source code is fairly consistent in design (not a lot of C++ trickery), so it works for us.

Other items to look at include gcc-xml and Reflex (which actually uses gcc-xml). I'm not sure whether gcc-xml preserves comments, but it does preserve GCC attributes and pragmas.

One last item to look at is a blog on writing GCC plugins, written by the author of the CodeSynthesis ODB tool.

Good luck

+1

Source: https://habr.com/ru/post/1395198/

