Finding dead code in a large Python project

I saw How can you find unused functions in Python code? but it is really old and doesn't really answer my question.

I have a large Python project with several libraries that are shared by multiple entry-point scripts. This project has accreted code from many authors over many years, so there is a lot of dead code. You know the drill.

I know that finding all dead code is impossible. All I need is a tool that will find all the functions that are not called anywhere. We don't do anything with function calls based on function-name strings, so I'm not worried about anything pathological...

I just installed pylint, but it seems to work file by file, and it does not pay much attention to dependencies between files, or even dependencies between functions.

Obviously I could grep for def in all the files, extract the function names from the matches, and then grep for each of those names in turn. I'm just hoping there is already something a little smarter than that.

ETA: Please note that I do not expect or want something perfect. I know the halting problem makes a perfect answer impossible (really, I have taught the theory of computation; I know recursively enumerable when I see it). Anything that tries to approximate an answer by actually running the code will take far too long. I just want something that makes a syntactic pass over the code and says: "This function is definitely used. This function MAY be used. This function is definitely NOT used; nobody else even knows it exists!" The first two categories are not important.

+53
python pylint code-cleanup
Mar 01 '12
7 answers

You might want to try vulture. It cannot catch everything because of the dynamic nature of Python, but it catches quite a bit without needing a full test suite, which coverage.py and the others require in order to work.
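For the command line, pip install vulture followed by vulture myproject/ is the usual route. Vulture also has a small programmatic API; the sketch below reflects its README as I remember it, so treat the method and attribute names as assumptions to verify against the version you install:

    # Hedged sketch of vulture's programmatic API; method and attribute
    # names are from memory of its README -- verify before relying on them.
    import vulture

    v = vulture.Vulture()
    v.scavenge(["myproject/"])        # parse everything under the path
    for item in v.get_unused_code():  # items describing likely-dead code
        print(item.filename, item.first_lineno, item.name)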

+25
Aug 14 '13 at 14:17

Try running Ned Batchelder's coverage.py.

Coverage.py is a tool for measuring code coverage of Python programs. It monitors your program, noting which parts of the code have been executed, then analyzes the source to determine which code could have been executed but was not.
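The usual route is coverage run script.py followed by coverage report -m from the shell, but here is a minimal sketch of driving it from Python; entry_point is a hypothetical module standing in for one of your scripts:

    # Minimal sketch using coverage.py's Python API.
    import coverage

    cov = coverage.Coverage()
    cov.start()

    import entry_point  # hypothetical; exercise the code paths you care about

    cov.stop()
    cov.save()
    cov.report(show_missing=True)  # never-executed lines are dead-code candidates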

+13
Mar 01 '12 at 22:22

It is very difficult to determine which functions and methods are called without executing the code, even when the code does not do anything fancy. Plain function calls are fairly easy to spot, but method calls are really hard. A simple example:

    class A(object):
        def f(self):
            pass

    class B(A):
        def f(self):
            pass

    a = []
    a.append(A())
    a.append(B())
    a[1].f()

Nothing fancy is going on here, but any script that tries to determine whether A.f() or B.f() is called will have a rather hard time without actually executing the code.

While the above code does nothing useful, it certainly uses patterns that show up in real code, namely putting instances into containers. Real code usually does even more complicated things: pickling and unpickling, hierarchical data structures, conditionals.

As said above, just detecting function calls of the simple form

 function(...) 

or

 module.function(...) 

will be pretty easy. You can use the ast module to parse your source files. You will need to record all imports, and the names used to import other modules. You will also need to track top-level function definitions and the calls made inside those definitions. That gives you a dependency graph, and you can use NetworkX to find the connected components of that graph.

While this may sound rather complicated, it could probably be done in fewer than 100 lines of code. Unfortunately, almost all major Python projects use classes and methods, so it would be of little help.
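For what it's worth, here is a minimal sketch of that flat, functions-only analysis. Everything in it is my own illustration, not the answerer's code: it assumes function names are unique across the files you pass in, it counts only plain name(...) calls, and it uses a single "<module>" node to stand in for all top-level code, so functions unreachable from that node are dead-code candidates.

    # Hedged sketch: flat dead-function finder with ast + networkx.
    # Assumes unique top-level function names and plain name(...) calls.
    import ast
    import sys
    from pathlib import Path

    import networkx as nx  # third-party: pip install networkx

    graph = nx.DiGraph()
    graph.add_node("<module>")  # stands in for all top-level code
    defined = set()

    def record_calls(caller, node):
        # Add caller -> callee edges for every simple name(...) call.
        for sub in ast.walk(node):
            if isinstance(sub, ast.Call) and isinstance(sub.func, ast.Name):
                graph.add_edge(caller, sub.func.id)

    for path in sys.argv[1:]:
        tree = ast.parse(Path(path).read_text())
        for node in tree.body:
            if isinstance(node, ast.FunctionDef):
                defined.add(node.name)
                graph.add_node(node.name)
                record_calls(node.name, node)
            else:
                record_calls("<module>", node)

    # Anything not reachable from top-level code is a candidate.
    unreached = defined - ({"<module>"} | nx.descendants(graph, "<module>"))
    for name in sorted(unreached):
        print("possibly dead:", name)

Run it as, say, python dead.py lib/*.py scripts/*.py (file names hypothetical). Using reachability rather than a simple "never called" check also flags clusters of dead functions that only call each other.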

+6
Mar 02 '12

Here's the solution I'm using, at least tentatively:

    grep 'def ' *.py > defs
    # ...
    # edit defs so that it just contains the function names
    # ...
    for f in `cat defs`
    do
        echo -n "$f " >> defCounts            # the function name
        cat *.py | grep -c "$f" >> defCounts  # how many times it appears
    done

Then I look at the individual functions that have very few references (fewer than 3, say).

It's ugly and it only gives me rough answers, but I think it's good enough as a start. What are your thoughts?

+3
Mar 02 '12 at 16:13

The one-liner below lists every function definition that is not explicitly used as an attribute, function call, decorator, or return value, so it is roughly what you are looking for. It is not perfect and it is slow, but I have never had any false positives with it. (On Linux you need to replace ack with ack-grep.)

    for f in $(ack --python --ignore-dir tests -h --noheading \
               "def ([^_][^(]*).*\):\s*$" --output '$1' | sort | uniq); do
        c=$(ack --python -ch \
            "^\s*(|[^#].*)(@|return\s+|\S*\.|.*=\s*|)"'(?<!def\s)'"$f\b")
        [ $c == 0 ] && (echo -n "$f: "; ack --python --noheading "$f\b")
    done
+2
Aug 6 '13

If your code has a lot of tests (which is generally very useful anyway), run them with a coverage plugin and you can see the unused code.

0
Mar 01 '12 at 22:23

IMO this could be achieved pretty quickly with a simple pylint plugin that:

  • remembers each analyzed function / method (/ class?) in a set S1
  • records each function / method (/ class?) that is called in a set S2
  • displays S1 - S2 in a report

Then you would have to run pylint over your whole code base to get something meaningful. Of course, as said, the output would need checking, since the analysis may contain errors that introduce false positives. In any case, it would likely cut down a lot on the amount of grepping to be done. A rough sketch of such a plugin follows.
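Below is an untested skeleton of that idea. The hook and registration names follow pylint's checker API as I recall it, and the class and message names are made up for illustration; emitting messages in close() is how pylint's own duplicate-code checker reports cross-module results, but expect to adjust details for your pylint version.

    # Hedged sketch of the S1/S2 idea as a pylint checker; names are
    # illustrative and API details should be checked against your version.
    from pylint.checkers import BaseChecker

    class DeadFunctionChecker(BaseChecker):
        name = "dead-function"
        msgs = {
            "W9901": (
                "Function %r does not appear to be called anywhere",
                "possibly-dead-function",
                "Reported for functions that are defined but never referenced.",
            ),
        }

        def open(self):
            self.defined = {}    # S1: name -> definition node
            self.called = set()  # S2: names seen at call sites

        def visit_functiondef(self, node):
            self.defined[node.name] = node

        def visit_call(self, node):
            # node.func is a Name for f(...) and an Attribute for obj.m(...)
            name = getattr(node.func, "name", None) or getattr(node.func, "attrname", None)
            if name:
                self.called.add(name)

        def close(self):
            # Report S1 - S2 once the whole code base has been visited.
            for name, node in self.defined.items():
                if name not in self.called:
                    self.add_message("possibly-dead-function", node=node, args=(name,))

    def register(linter):
        linter.register_checker(DeadFunctionChecker(linter))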

I don't have much time to do this myself right now, but anyone trying it will find help on the python-projects@logilab.org mailing list.

0
Mar 02 '12 at 8:14


