What you are looking for is called a "clone detector." You can do this on source code or on object code. The basic idea is to decide which points of variability you are willing to accept between two fragments that still count as the same code.
You can read about our CloneDR clone detector, which finds duplicated code, including inexact matches, by comparing the syntax trees of the source files. It does this across many files, not just within a single source file. This is similar to detecting a "common subexpression," but it works on both declarations and executable code. When the match is not exact, it can determine the parameters for the "subprogram" (abstraction).
See my article on Clone Detection Using Abstract Syntax Trees for a description of the algorithms.
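To make the syntax-tree idea concrete, here is a minimal Python sketch. It is not CloneDR's algorithm, only a toy illustration under stated assumptions: it uses Python's standard `ast` module as the parser, treats every identifier and literal as a "parameter," and groups statements whose normalized trees are identical. The names `fingerprint`, `find_clones`, and `min_nodes` are made up for this example.

```python
import ast
from collections import defaultdict


def fingerprint(node):
    """Return a string describing the shape of a syntax tree.

    Assumption for this sketch: identifiers and literals are the only
    accepted points of variability, so they collapse to placeholders.
    """
    if isinstance(node, (ast.Name, ast.arg)):
        return "ID"
    if isinstance(node, ast.Constant):
        return "CONST"
    if isinstance(node, ast.AST):
        fields = ", ".join(
            f"{name}={fingerprint(value)}"
            for name, value in ast.iter_fields(node)
        )
        return f"{type(node).__name__}({fields})"
    if isinstance(node, list):
        return "[" + ", ".join(fingerprint(item) for item in node) + "]"
    # Plain values such as attribute names, function names, or None.
    return repr(node)


def tree_size(node):
    """Count the nodes in a subtree; used to ignore trivial matches."""
    return sum(1 for _ in ast.walk(node))


def find_clones(sources, min_nodes=8):
    """Group statements that have the same normalized tree.

    `sources` maps a file name to its source text. The result is a list
    of groups, each a list of (file name, line number) pairs.
    """
    buckets = defaultdict(list)
    for filename, text in sources.items():
        for node in ast.walk(ast.parse(text)):
            if isinstance(node, ast.stmt) and tree_size(node) >= min_nodes:
                buckets[fingerprint(node)].append((filename, node.lineno))
    return [locations for locations in buckets.values() if len(locations) > 1]
```

Given two files that each contain a loop like `for x in xs: total = total + x * 2` with different variable names and constants, `find_clones` should put both loops in one group. This sketch also reports the loop bodies as a separate group; a real tool would suppress clones nested inside larger ones and would report the differing identifiers and literals as the parameters of the abstraction.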
CloneDR does this for many languages, using language-precise front-end parsers.
The site describes how CloneDR works and compares CloneDR with a number of other clone detection tools.
CloneDR does not handle statement reordering. Less scalable methods that find duplicates by comparing program dependence graphs (PDGs) can do this. They come pretty close to comparing data flow graphs, which can be useful for finding matches in machine code.