What tools can help translate source code (as in French → English, not C++ → Java)?

I have code written in French: the variables, classes, and functions have French names, and the comments are in French too. I would like to translate the code into English. This will be a fairly difficult task, since the project is about 18K lines, and I would like to know whether any tool could help me, especially with the names of variables, classes, and functions, since renaming them by hand is error-prone.

Are there any tools that can help me? Advice?

edit: I am not looking for machine translation. I am looking for a tool that helps me translate the code. Say there is a class named C, this class has a method TraverserLaRue, and I rename it to CrossTheRoad. I would like every reference to that TraverserLaRue, in all files, to become CrossTheRoad. However, I do not want the TraverserLaRue method of class B to be renamed.

+4
7 answers

I assume that the language in question is one of the common ones, such as C, C++, C#, Java, ... (You don't have a language with French keywords, do you? I once met a fully Swedish version of Pascal, and I gave up on it.)

So you have two problems:

  • Translation of identifiers in the source code
  • Translation of comments

Since comments contain arbitrary natural-language text, translating them requires general natural-language translation. I do not think you can find an automated tool for this.

Unlike the others, I think you have a good chance of translating the identifiers and changing them en masse.

SD makes a line of source code "obfuscators". These tools do not process the code as raw text; they process the source code according to the rules of the target language, so they accurately distinguish identifiers from operators, numbers, comments, etc. In particular, they can reliably operate on the identifiers alone.

One of the things these tools do is replace one identifier name with another (usually a meaningless name) to make the code very hard to understand. Think of this abstractly as an identifier map I → N. (They do other things too, but those are not interesting here.) Since you often want to re-obfuscate a changed file in the same way as the original, these tools let you reuse the identifier map from the previous run, which is represented as a list of I → N pairs.

I think you can abuse it to do what you want.

Step 1: Run such an obfuscator on the French source code. This will create a text file containing all the identifiers in the code, as a map of the form

  I1 -> N1
  I2 -> N2
  ....

You don't care about the Ns, just the Is.

Step 2: Manually translate each French identifier I into the English name E that you consider best. (I have no specific suggestions on how to do this; some of the other answers do.) Some of the identifiers are likely to be library calls and are therefore already correct. Then modify the obfuscation map text file so that it reads:

  I1 -> E1
  I2 -> E2

Step 3: Run the obfuscation tool and make it use the modified obfuscation map. It can be told to do this.

Voilà, all the identifiers in your code will be changed as you specified.

[You get reformatting of the source text as a freebie, because these tools can also format code nicely. Your name changes are likely to spoil the indentation and spacing in the source code, so this is a nice bonus.]
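The three steps above can be approximated without any particular product. What follows is only a minimal Python sketch of the idea, not the SD tool or its file format: it assumes the map is a text file with one `old -> new` pair per line, and it uses a hand-written regular expression (an assumption that covers typical C-like syntax only) to tell identifiers apart from comments, string literals, and numbers. The names `parse_map` and `rename_identifiers` are made up for this illustration. Like an obfuscator map, it renames an identifier everywhere, so it does not distinguish the TraverserLaRue of class C from that of class B.

```python
import re

# Tokens of a C-like language, tried in this order: comments, string and
# character literals, numbers, identifiers. Only identifiers are renamed.
TOKEN = re.compile(
    r"""
    (?P<comment>/\*.*?\*/|//[^\n]*)
  | (?P<string>"(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*')
  | (?P<number>\d[\w.]*)
  | (?P<ident>[^\W\d]\w*)
    """,
    re.VERBOSE | re.DOTALL,
)


def parse_map(text):
    """Parse lines of the form 'old -> new' into a dict (assumed format)."""
    mapping = {}
    for line in text.splitlines():
        if "->" not in line:
            continue
        old, new = (part.strip() for part in line.split("->", 1))
        if old and new:
            mapping[old] = new
    return mapping


def rename_identifiers(source, mapping):
    """Replace whole identifiers found in mapping; leave everything else."""
    def replace(match):
        if match.lastgroup == "ident":
            return mapping.get(match.group(), match.group())
        return match.group()  # comments, strings, numbers stay untouched

    return TOKEN.sub(replace, source)


if __name__ == "__main__":
    french = 'c.TraverserLaRue(); // TraverserLaRue\nString s = "TraverserLaRue";'
    mapping = parse_map("TraverserLaRue -> CrossTheRoad")
    print(rename_identifiers(french, mapping))
```

Because comments and string literals are matched as single tokens, an identifier that happens to appear inside them is left alone, which is the property that distinguishes this from a plain search and replace.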

+2

Any refactoring tool has a rename function. Many questions on SO address language-specific refactoring tools.

For the comments, you will have to do the work manually.

+2

I did this with German code a while ago, but had mixed results because of abbreviations in names, etc. Using regular expressions, I wrote a parser that removed all language-specific keywords and symbols and then separated the comments from the rest of the code. That left me with a lot of words that didn't necessarily mean anything to me, so I wrote a unique-word finder that added them to a sorted text file. The next stop was Google's language tools, which tried to translate every word in the list. I went through the list to check whether each word had actually been translated, and if so, I replaced every occurrence in the code with the English equivalent. The comments I put through a full translation and kept the result where it worked. I found that I had to talk to someone who understood "germic" to translate the abbreviations, slang terms, and mixed-language elements. In short: regular expressions plus a dictionary. If someone has a real tool for this, I would be interested too.
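The word-list step described above might look roughly like the following Python sketch. It is not the answerer's parser: the keyword set is a small illustrative subset, the regular expressions assume C-like comment and string syntax, and the function names are invented for this example. Identifiers are split on underscores, digits, and lower-to-upper case changes so that each component word can be looked up separately.

```python
import re

# A deliberately partial keyword list; extend it for the real language.
KEYWORDS = {
    "if", "else", "for", "while", "return", "int", "void",
    "class", "public", "private", "static", "new",
}

# Comments and string literals are blanked out before collecting words.
NON_CODE = re.compile(r'/\*.*?\*/|//[^\n]*|"(?:\\.|[^"\\])*"', re.DOTALL)
IDENTIFIER = re.compile(r"[^\W\d]\w*")


def split_identifier(name):
    """Split snake_case and camelCase names into lowercase words."""
    words, current = [], ""
    for ch in name:
        if ch == "_" or ch.isdigit():
            if current:
                words.append(current)
            current = ""
        elif ch.isupper() and current and not current[-1].isupper():
            words.append(current)
            current = ch
        else:
            current += ch
    if current:
        words.append(current)
    return [w.lower() for w in words]


def unique_words(source):
    """Return the sorted set of words used in non-keyword identifiers."""
    code = NON_CODE.sub(" ", source)
    words = set()
    for name in IDENTIFIER.findall(code):
        if name not in KEYWORDS:
            words.update(split_identifier(name))
    return sorted(words)


if __name__ == "__main__":
    sample = "int anzahlKunden = 0; // Zähler\nvoid kundeHinzufuegen(int kunden_nr) { return; }"
    for word in unique_words(sample):
        print(word)
```

The resulting list is what would be handed to a dictionary or a human translator. Abbreviations such as `nr` show up as ordinary entries, which is where the manual work mentioned above begins.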

0

You should definitely check out https://launchpad.net/rosetta

Ubuntu uses this to translate thousands of its packages, written in hundreds of programming languages, into hundreds of human languages, with updates for each new version. A truly formidable task.

edit: ... to clarify how Rosetta is used in Ubuntu: it modifies the natural-language strings that occur in the source code of open source applications, creating per-language source packages that, when compiled, produce binaries in the given language. Of course, it doesn't edit the binary files themselves.

First, maintainers create "template files" that are something like a "patch with wildcards": a set of rules about what needs to be translated and where in the source tree it is, but not what to translate it to. Rosetta then displays the strings to be translated and lets volunteer translators provide a translation into their language for each entry. Each entry can be discussed, modified, proposed, and moderated. Statistics show how much remains to be translated, which translations are uncertain, and which are missing. When the translation is complete, the patch for the given language is applied to the source, creating the version for that language. The distribution is then built from the modified sources.

This allows translations to be made both for sources that use external resources for multilingual support, which lets you switch the language on the fly, and for sources that have native-language string literals directly in the code, mixed with the business logic.

When a new version of a package is released, the template has to be edited to include all new strings, but there is good automation for preserving the existing ones. Of course, translations are needed only for the new strings.

0

IMHO, automated tools will not help here. Merely translating variable and function names is not enough and will make the code worse, because a tool cannot infer what the original programmer intended when choosing a variable name.

Depending on the programming language this code is written in, there are modern IDEs that can facilitate refactoring, but if you want to have good results, manually reviewing the code is mandatory.

0

A good IDE will be able to list the classes, methods, and variables. There are also documentation generators that will do this, such as Javadoc for Java, Doxygen for many languages, etc.

As for the actual translation, there is no tool that will do it well, or even at a satisfactory level. The only way to get something decent is to have it translated by someone who knows both languages. I have been doing freelance translation for many years, and I can say that trying to get a machine to translate is a waste of time. Many examples and word choices will be relevant to your culture and not to another. And this is just the tip of the iceberg.

If you cannot find someone who can do the translation, I suggest you give up the idea. Leave the source code as it is. If a non-French speaker reads it and needs to understand something, let them look it up on Google. If they are native English speakers, they will probably get more out of an automatic translation than out of a translation done by you, being French. When translating, you always want to translate into your native language.

0

To translate only the comments, you can try this simple utility that I wrote (it uses the Microsoft Translator API): transource.
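How transource works internally is not described here, so the following is only a rough Python sketch of the general approach of translating comments while leaving code alone. It assumes C-like `//` and `/* */` comments, and it takes the translator as a caller-supplied function (`translate` is a placeholder, not a real API binding), so any translation service or even a hand-made glossary can be plugged in.

```python
import re

# String literals are matched too, so that comment markers inside them
# are not mistaken for real comments.
COMMENT_OR_STRING = re.compile(
    r'(?P<block>/\*(?P<block_body>.*?)\*/)'
    r'|(?P<line>//(?P<line_body>[^\n]*))'
    r'|(?P<string>"(?:\\.|[^"\\])*")',
    re.DOTALL,
)


def translate_comments(source, translate):
    """Apply translate() to comment bodies only; code stays unchanged."""
    def replace(match):
        if match.group("block") is not None:
            return "/*" + translate(match.group("block_body")) + "*/"
        if match.group("line") is not None:
            return "//" + translate(match.group("line_body"))
        return match.group()  # string literal: keep as-is

    return COMMENT_OR_STRING.sub(replace, source)


if __name__ == "__main__":
    # A toy glossary standing in for a real translation service.
    glossary = {"compteur": "counter", "boucle": "loop"}

    def toy_translate(text):
        for french, english in glossary.items():
            text = text.replace(french, english)
        return text

    code = 'int n = 0; // compteur\nputs("// compteur"); /* boucle */'
    print(translate_comments(code, toy_translate))
```

Passing the translator in as a function keeps the comment handling testable on its own. The quality of the result still depends entirely on the translator that is plugged in.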

0

Source: https://habr.com/ru/post/1300830/

