Generate a C++ symbol list for bulk refactoring / renaming

Background

I inherited a legacy 60 kLOC g++ project that I would like to refactor so that it follows a consistent naming convention throughout.

Question

Is there a free, open-source static analyzer that can generate a list of:

  • global symbols
  • class names
  • member methods (public / protected / private, if possible)
  • member variables
  • static methods
  • local symbols (these can probably be ignored)
  • any other symbols that I may have missed but that matter to someone reading the code.

An approach

I intend to use vim to edit the generated list of symbols, and then use a Ruby script to do a very crude search and replace on the matching symbols, so that at least the naming conventions are consistent.

The procedure is a little ugly, and I expect the first compilation to fail, but I don't mind going through and fixing problems by hand if I end up with a more readable code base.
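The crude search-and-replace step described above could look roughly like this. It is a minimal sketch in Python rather than Ruby; the function name `rename_identifiers` and the sample rename list are hypothetical, and the list is assumed to be a plain old-name to new-name mapping.

```python
import re

def rename_identifiers(source, renames):
    """Replace whole-word occurrences of each key in `renames` with its value."""
    if not renames:
        return source
    # Longest names first, so the alternation never settles for a shorter prefix.
    names = sorted(renames, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, names)) + r")\b")
    # A single pass over the text, so a swap such as a -> b, b -> a is safe.
    return pattern.sub(lambda m: renames[m.group(0)], source)

# Hypothetical rename list, as it might look after editing in vim.
renames = {"m_cnt": "count_", "GetCnt": "get_count"}
code = "int GetCnt() { return m_cnt; }  int m_cnt2;"
print(rename_identifiers(code, renames))
```

Word boundaries keep `m_cnt2` from being touched when only `m_cnt` is listed. The sketch knows nothing about scopes, string literals, or comments, so it rewrites matching words wherever they appear.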

What tools do developers of large C++ codebases use for this kind of refactoring?

+4
2 answers

Automatic refactoring of C++ is extremely difficult, partly because of the preprocessor (macros and file inclusion), but mainly because of the interdependence between parsing, name lookup, and the rest of semantic analysis (template instantiation, constant expression evaluation, overload resolution, and so on). On the very large C++ codebases I have worked on, automatic refactoring is simply not done, and because of this inherent complexity the quality of refactoring tools is poor.

With the advent of clang, though, whose modular design gives better access to the AST than other tools do, better refactoring tools may be built on top of it, but I would not hold my breath.

Take a look at clang's AST dump; maybe you can script it to produce an XML dump that could serve as a starting point for manual refactoring.
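As a rough illustration of scripting the AST dump, the sketch below walks a dump and collects declaration names. It assumes a clang recent enough to emit the AST as JSON via `-Xclang -ast-dump=json` (not the XML mentioned above); the helper names `collect_names` and `dump_ast` and the hand-written sample tree are hypothetical.

```python
import json
import subprocess

# clang AST node kinds that correspond to the symbol categories in the question.
INTERESTING = {
    "NamespaceDecl", "CXXRecordDecl", "CXXMethodDecl", "FieldDecl",
    "FunctionDecl", "VarDecl", "EnumDecl", "TypedefDecl",
}

def collect_names(node, found=None):
    """Walk a JSON AST node recursively, collecting (kind, name) pairs."""
    if found is None:
        found = set()
    kind, name = node.get("kind"), node.get("name")
    if kind in INTERESTING and name and not node.get("isImplicit"):
        found.add((kind, name))
    for child in node.get("inner", []):
        collect_names(child, found)
    return found

def dump_ast(path):
    """Ask clang for the AST of one file as JSON (needs clang++ on PATH)."""
    result = subprocess.run(
        ["clang++", "-fsyntax-only", "-Xclang", "-ast-dump=json", path],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)

# Tiny hand-written tree in the same shape, so the walker runs without clang.
sample = {"kind": "TranslationUnitDecl", "inner": [
    {"kind": "CXXRecordDecl", "name": "Foo", "inner": [
        {"kind": "FieldDecl", "name": "m_x"},
        {"kind": "CXXMethodDecl", "name": "getX"},
    ]},
    {"kind": "VarDecl", "name": "g_count"},
]}
for kind, name in sorted(collect_names(sample)):
    print(kind, name)
```

For a real file, `collect_names(dump_ast("foo.cpp"))` would be the entry point. The dump also contains every declaration pulled in from included headers, so the resulting list still needs filtering before it is useful as a rename list.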

+1

The OP wants to perform a mass rename: generate a list of names, and then rename many of them across a large source code base.

A refactoring tool that is good at this would be the tool of choice, if he can find one.

An odd but possibly effective alternative: a C++ source code obfuscator.

Our company offers one that does the following (yes, it sounds like exactly the wrong thing for this task!):

  • strips comments
  • discards formatting
  • consistently replaces identifiers with scrambled names (the seed of an answer!)
  • produces an identifier map (a list of "identifier → scrambled_identifier" entries) covering all identifiers.

This process is applied to the source files without preprocessing them.

So, in effect, this is a mass renaming tool. Renaming to bad names is its purpose, but it can be abused to rename to good names.

In fact, it takes an identifier map as input (possibly empty, as on the first run; normally taken from a previous obfuscation run). It renames the identifiers it finds in the map according to the map, and gives new scrambled names to the identifiers it does not find there.

If you give it a complete map, you have full control over the names it assigns.

So, to use it for mass renaming, the following process should work:

  • Run the obfuscator to get an identifier map. Discard the obfuscated source text.
  • Rework the identifier map into "identifier → identifier" form. This is a 30-second task with a decent editor such as Emacs. If you use this modified map unchanged, the obfuscator renames each identifier to itself, i.e., nothing is renamed. Replacing "identifier → foo" with just "identifier" also works; the tool treats it as "identifier → identifier".
  • (Sort, then) review the list of identifiers. Choose new names for some of them and change the list accordingly: "bad_identifier_1 → better_identifier_1".
  • Rerun the obfuscator with the revised map. Your bad identifiers will be replaced.
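The map-editing steps above can also be scripted instead of done by hand in an editor. The following is only a sketch: the real tool's map file format is not shown here, so a plain text file with one `old -> new` entry per line is assumed, and all function names are hypothetical.

```python
def parse_map(text):
    """Parse lines of the assumed form 'old -> new' into a dict."""
    mapping = {}
    for line in text.splitlines():
        if "->" not in line:
            continue  # skip blank or unrelated lines
        old, new = (part.strip() for part in line.split("->", 1))
        mapping[old] = new
    return mapping

def identity_map(mapping):
    """Step 2: map every identifier to itself, so nothing is renamed."""
    return {name: name for name in mapping}

def with_renames(mapping, chosen):
    """Step 3: override selected entries; names missing from the map are rejected."""
    unknown = set(chosen) - set(mapping)
    if unknown:
        raise KeyError(f"not in identifier map: {sorted(unknown)}")
    return {**mapping, **chosen}

def format_map(mapping):
    """Write the map back out, sorted, in the same assumed format."""
    return "\n".join(f"{old} -> {new}" for old, new in sorted(mapping.items()))

# Step 1 output as it might look (scrambled names are made up).
scrambled = parse_map("m_cnt -> l0O1\nGetCnt -> O1l0\nmain -> main")
revised = with_renames(identity_map(scrambled), {"m_cnt": "count_"})
print(format_map(revised))
```

Rejecting names that are not in the map catches typos in the hand-edited rename list before the tool is rerun.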

But what about the comments and formatting? :-?

Well, there is a command-line switch that essentially says "don't strip comments." As for formatting, the obfuscator conveniently includes a source code formatter. Just run it a second time as a formatter. Voila: renamed code with nice formatting.

Caveats:

  • The formatter cannot handle some badly placed preprocessor conditionals; most C++ code does not have these, and where it does, the problem can usually be fixed by editing a single line.
  • The obfuscator does not understand scopes. Given I → J, it will rename all instances of I to J.
  • The obfuscator will not detect silly renames. If you rename I → J and K → J, and that renaming is harmful to your program, the obfuscator will not tell you. (Such a renaming may still work; it depends on your code and on where I and K are used.) This is easy to avoid: do not create a map in which two identifiers end up with the same name. Likewise, you should not rename identifiers that appear in system files; you can rename identifiers that appear in your application's own include files.
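Since the tool will not flag colliding renames, a small check on the map before rerunning it can catch them. This is a sketch with a hypothetical helper name, assuming the map has already been loaded into a dictionary of old name to new name.

```python
from collections import defaultdict

def find_collisions(mapping):
    """Return {target: [sources, ...]} for every target name claimed more than once."""
    by_target = defaultdict(list)
    for old, new in mapping.items():
        by_target[new].append(old)
    return {new: sorted(olds) for new, olds in by_target.items() if len(olds) > 1}

# I and K both become J; n maps to itself and is fine.
print(find_collisions({"I": "J", "K": "J", "n": "n"}))
```

Run against a complete map that still contains the identity entries, the same check also reports a rename onto a name that already exists, because the existing name maps to itself.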

If there were enough interest, minor changes on our side could preserve the formatting and comments directly.

The nice part about this clunky process is that you can experiment until you get the rename list right; you only need to keep the final "obfuscate/format" result. You can, of course, rename things in batches by running this process once per batch. Recompiling after each round is highly recommended :-}

You can use this process to rename one identifier at a time, but I think an ordinary editor will serve you well for that.

If the OP just wanted a list of names, he could obviously stop after the first obfuscation run and walk away with the identifier map.

No, this is not a regexp replace-string hack; it uses a full C++11 lexer, so it is not confused by the contents of string literals or comments. For formatting, it actually uses a full C++11 parser.
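To illustrate why lexing matters, here is a much-simplified token-aware rename in Python. It is not the tool described above and is far from a full C++11 lexer: it only recognizes comments, ordinary string and character literals, numeric literals, and identifiers, and it does not handle raw string literals. The names `TOKEN` and `rename_tokens` are hypothetical.

```python
import re

TOKEN = re.compile(r"""
    (?P<skip>
        //[^\n]*                  # line comment
      | /\*.*?\*/                 # block comment
      | "(?:\\.|[^"\\\n])*"       # string literal
      | '(?:\\.|[^'\\\n])*'       # character literal
      | \.?\d[\w.']*              # numeric literal, roughly
    )
  | (?P<ident>[A-Za-z_]\w*)       # identifier or keyword
""", re.VERBOSE | re.DOTALL)

def rename_tokens(source, renames):
    """Rename identifiers only; comments and literals are copied through unchanged."""
    def swap(match):
        name = match.group("ident")
        if name is None:
            return match.group(0)
        return renames.get(name, name)
    return TOKEN.sub(swap, source)

src = 'int cnt = 0; // cnt here\nconst char* s = "cnt"; /* cnt */ cnt++;'
print(rename_tokens(src, {"cnt": "count"}))
```

Only the two `cnt` tokens in code are renamed; the occurrences inside the comments and the string literal are left alone. Like the tool, this sketch has no notion of scope.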

+1

Source: https://habr.com/ru/post/1486459/

