Cleaning up a huge Perl codebase

I am currently working on a 15-year-old web application.

It consists mainly of CGI Perl scripts with HTML::Template templates.

It contains over 12,000 files and approximately 260 MB of code in total. I believe that no more than 1,500 Perl scripts are actually needed, and I want to get rid of all the unused code.

There are practically no tests for the code.

My questions:

  • Do you know of any CPAN module that can help me get a list of only the modules that are actually loaded via use and require?
  • What would your approach be if you wanted to get rid of all the extra code?

I thought of the following approaches:

  • override the built-in use and require functions with versions that log the name of each loaded file to a specific location
  • override the import function of the warnings and/or strict modules so that it logs the name of the calling file to a specific location
  • study the Devel::Cover module and take the same approach, collecting the data during manual testing rather than automated tests
  • replace the perl executable with a custom one that logs the name of every file it reads (I don't know how to do this yet)
  • some creative use of lsof (?!?)
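The first idea in the list can be approximated without replacing any built-ins, because Perl calls code references placed in @INC whenever use or require has to search for a file. What follows is only a minimal sketch: the log path and the REQUIRE_LOG environment variable are assumptions made up for illustration, and files loaded by absolute or ./-relative path are expected to bypass @INC and therefore this hook.

```perl
use strict;
use warnings;

# Hypothetical log location (REQUIRE_LOG is an invented variable name).
my $log_path = $ENV{REQUIRE_LOG} || '/tmp/require.log';

# A code reference in @INC is called for each file that use/require
# has to search for (files already listed in %INC are not searched again).
# Returning nothing tells Perl to carry on with the remaining @INC
# entries, so modules are still loaded from their usual locations.
unshift @INC, sub {
    my ($self, $file) = @_;
    if (open my $fh, '>>', $log_path) {
        print {$fh} "$0\t$file\n";    # requesting script, requested file
        close $fh;
    }
    return;
};
```

The hook only sees requests made after it is installed, so it would have to run as early as possible, for example from a module loaded before everything else.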
+6
3 answers

Devel::Modlist may give you what you need, but I have never used it.

The few times I needed to do something like this, I opted for the more brute-force approach of inspecting %INC at the end of the program.

 END { open my $log_fh, ...; print $log_fh "$_\n" for sort keys %INC; } 
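A slightly fuller version of the same idea might look like the sketch below. The log path and the dump_loaded_files helper are hypothetical names chosen for illustration. The log is opened in append mode so that many CGI requests can contribute to one file.

```perl
use strict;
use warnings;

# Hypothetical log location; pick something the web server user can write to.
my $log_path = '/tmp/loaded-modules.log';

# Append one line per entry in %INC: script name, module key, resolved file.
sub dump_loaded_files {
    my ($path) = @_;
    open my $log_fh, '>>', $path or return;    # never break the app over logging
    for my $name (sort keys %INC) {
        # %INC maps e.g. "Foo/Bar.pm" to the file it was loaded from.
        my $file = defined $INC{$name} ? $INC{$name} : '(unknown)';
        print {$log_fh} "$0\t$name\t$file\n";
    }
    close $log_fh;
    return 1;
}

# END blocks run when the program exits, after all loading has happened.
END { dump_loaded_files($log_path) }
```

Since %INC only records what a particular run actually loaded, the log has to be collected across enough real traffic to be representative.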
+5

As a first approximation, I would simply run

 egrep -r '\<(use|require)\>' /path/to/source/* 

Then spend a couple of days cleaning up the output. This gives you a list of every module that is used or required anywhere in the source tree.
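Part of that cleanup could be done by a small script instead of by hand. The following is a rough sketch, not a parser: modules_in_text and scan_tree are hypothetical helper names, the file extensions are an assumption about the codebase, and the regular expression only recognizes use/require at the start of a line.

```perl
use strict;
use warnings;
use File::Find;

# Return the sorted, de-duplicated names that follow use/require at the
# start of a line, either bare (Foo::Bar) or quoted ("lib/util.pl").
sub modules_in_text {
    my ($text) = @_;
    my %seen;
    while ($text =~ /^\s*(?:use|require)\s+(?:(["'])(.+?)\1|([A-Za-z_][\w:]*))/mg) {
        my $name = defined $2 ? $2 : $3;
        next if $name =~ /^v\d+$/;    # skip version requirements such as "use v5.10"
        $seen{$name}++;
    }
    my @names = sort keys %seen;
    return @names;
}

# Walk a directory tree and count how many files mention each name.
sub scan_tree {
    my ($root) = @_;
    my %count;
    find(sub {
        return unless -f $_ && /\.(?:pl|pm|cgi)$/;
        open my $fh, '<', $_ or return;
        my $text = do { local $/; <$fh> };    # slurp the whole file
        close $fh;
        $count{$_}++ for modules_in_text($text);
    }, $root);
    return \%count;
}

# Run as: perl scan.pl /path/to/source
if (@ARGV) {
    my $count = scan_tree($ARGV[0]);
    print "$_\t$count->{$_}\n" for sort keys %$count;
}
```

Like the grep, this is a static scan: it misses modules loaded through a variable or a string eval, and it can pick up matches inside POD or here-documents.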

You can also play with @INC to exclude specific library paths.

If you are trying to determine the execution path, you can run the code through the debugger with tracing enabled (i.e., the "t" command in the debugger), and then redirect the output to a text file for further analysis. I know this is awkward for scripts that run as CGI ...

+2

Assuming that access timestamps are enabled on the filesystem, you can check the access time of the various script files, which should let you rule out any top-level scripts that are never used.
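A minimal sketch of such a check is shown below. The stale_scripts helper, the file extensions, and the 90-day threshold are assumptions for illustration only.

```perl
use strict;
use warnings;
use File::Find;

# Return the scripts under $root that were last accessed more than
# $days days ago, according to the filesystem's access time.
sub stale_scripts {
    my ($root, $days) = @_;
    my $cutoff = time - $days * 24 * 60 * 60;
    my @stale;
    find(sub {
        return unless -f $_ && /\.(?:pl|cgi)$/;
        my $atime = (stat _)[8];    # element 8 of stat() is the access time
        push @stale, $File::Find::name if $atime < $cutoff;
    }, $root);
    my @sorted = sort @stale;
    return @sorted;
}

# Run as: perl stale.pl /path/to/cgi-bin
if (@ARGV) {
    print "$_\n" for stale_scripts($ARGV[0], 90);
}
```

Access times are only as trustworthy as the mount options allow (noatime disables them, relatime updates them lazily), and anything that reads the files, such as a backup or a recursive grep, can reset them. It is probably safest to run this before any scan that opens every file.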

It might be worth adding some instrumentation to CGI.pm that logs the name of the current script ($0), to see which scripts are actually being run.
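Such instrumentation could be as small as the sketch below. The log path and the log_script_name helper are hypothetical, and rather than editing CGI.pm itself the code could equally live in any local module that every script already loads.

```perl
use strict;
use warnings;

# Hypothetical log location; it must be writable by the web server user.
my $usage_log = '/tmp/script-usage.log';

# Append one tab-separated line: epoch time, $0, and the CGI SCRIPT_NAME.
sub log_script_name {
    my ($path) = @_;
    open my $fh, '>>', $path or return;    # logging must never break a request
    # $0 is the running script; SCRIPT_NAME is normally set by the web
    # server for CGI requests and is absent on the command line.
    my $uri = defined $ENV{SCRIPT_NAME} ? $ENV{SCRIPT_NAME} : '-';
    print {$fh} join("\t", time, $0, $uri), "\n";
    close $fh;
    return 1;
}

log_script_name($usage_log);
```

After the log has covered a long enough period, the set of distinct $0 values gives the top-level scripts that are still in use.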

+2

Source: https://habr.com/ru/post/916670/

