Search for duplicate source code

I am analyzing some legacy code: about 80,000 lines of old PL/SQL. At first glance there is quite a lot of duplication in the source that needs to be removed. Instead of diffing manually and looking at each file, there should be some tool or command-line utility that detects duplicate lines of source code.

My goal is to give a reasonable estimate of the minimum size of the rewrite and of how much actual knowledge is encoded in this program. I wrote a basic static code analyzer that counts the control statements (IF, ELSE, FOR, etc.) and functions in each file, but duplicate code still needs to be removed from my statistics.

+4
4 answers

Have you looked at Simian, the Similarity Analyser? (I just checked, and it is no longer free, but a 15-day evaluation is available.)

Simian (Similarity Analyser) identifies duplication in Java, C#, C, C++, COBOL, Ruby, JSP, ASP, HTML, XML, Visual Basic, and Groovy source code and even plain text files. In fact, Simian can be used on any human-readable file, such as ini files, deployment descriptors, you name it.

I used it in practice and it works well.
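
For a rough idea of what line-based duplicate detection involves, here is a minimal Python sketch. It is not how Simian works internally (I don't know its algorithm); it simply hashes every run of `window` consecutive normalized lines and reports runs that occur more than once. The function names, the window size, and the normalization rules are all my own assumptions.

```python
import hashlib
from collections import defaultdict


def normalize(line):
    """Collapse whitespace and case so trivially different lines compare equal.

    Assumption: lowercasing is acceptable because PL/SQL is mostly
    case-insensitive; note that this also lowercases string literals.
    """
    return " ".join(line.split()).lower()


def find_duplicate_blocks(files, window=6):
    """Find blocks of `window` consecutive non-blank lines that repeat.

    `files` is a dict of {name: source text}. Returns a dict of
    {digest: [(name, line_no), ...]} for blocks seen more than once;
    line_no is the 1-based line where the block starts.
    """
    seen = defaultdict(list)
    for name, text in files.items():
        # Keep the original line numbers, but skip blank lines.
        lines = [(i, normalize(raw)) for i, raw in enumerate(text.splitlines(), 1)]
        lines = [(i, line) for i, line in lines if line]
        for start in range(len(lines) - window + 1):
            chunk = "\n".join(line for _, line in lines[start:start + window])
            digest = hashlib.sha1(chunk.encode("utf-8")).hexdigest()
            seen[digest].append((name, lines[start][0]))
    return {d: locs for d, locs in seen.items() if len(locs) > 1}
```

Overlapping windows inside one long duplicated region are each reported separately, so for statistics you would probably want to merge adjacent hits before counting duplicated lines.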

+3

Sonar has duplication detection and claims to support PL/SQL, although I have never used it for that.

0

You will need to beg, borrow, steal, or write a PL/SQL parser and compare the resulting abstract syntax trees. With a code base of your size, that may well be worth the effort. Once you have the parser, other analysis tools can build on it.
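
A full parser is a lot of work, so as a much cruder stand-in for comparing syntax trees, here is a Python sketch that compares normalized token streams instead: identifiers and literals are replaced by placeholders, so copies that differ only in variable names or constants produce the same sequence. This is explicitly not an AST comparison. The keyword set is a small illustrative subset I picked, and the tokenizer ignores block comments and quoted identifiers.

```python
import re

# Small, illustrative keyword set (an assumption; real PL/SQL has many more).
KEYWORDS = {"if", "then", "else", "elsif", "end", "for", "loop", "in",
            "while", "begin", "return", "is", "null"}

# Alternatives are tried in order, so literals win over the catch-all.
TOKEN = re.compile("|".join([
    r"'(?:[^']|'')*'",             # string literal; '' is an escaped quote
    r"\d+(?:\.\d+)?",              # numeric literal
    r"[A-Za-z_][A-Za-z0-9_$#]*",   # identifier or keyword
    r":=|<>|!=|<=|>=|\|\|",        # multi-character operators
    r"\S",                         # any other single non-space character
]))


def normalize_tokens(source):
    """Turn source text into a token list with names and literals abstracted."""
    # Drop single-line comments. Limitation: a '--' inside a string
    # literal is also treated as a comment start.
    source = re.sub(r"--[^\n]*", "", source)
    out = []
    for tok in TOKEN.findall(source):
        low = tok.lower()
        if low in KEYWORDS:
            out.append(low)
        elif tok[0] == "'":
            out.append("STR")
        elif tok[0].isdigit():
            out.append("NUM")
        elif tok[0].isalpha() or tok[0] == "_":
            out.append("ID")
        else:
            out.append(tok)
    return out


a = normalize_tokens("IF v_total > 10 THEN v_flag := 'Y'; END IF;")
b = normalize_tokens("if amount > 99 then status := 'N'; end if;  -- copy")
print(a == b)
```

Two fragments whose normalized token lists match are candidates for being renamed copies of each other; you would still need to review them by hand, since this ignores structure that a real syntax tree would capture.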

0

How about this:

http://sourceforge.net/projects/sddforeclipse/

It is open source and is reportedly used in commercial software. Note that it is an Eclipse plugin.

0

Source: https://habr.com/ru/post/1335990/