Code cleanup: Find unused subs

by Sebastian on 7.01.2019 at 16:46 in English, Perl

Most projects keep growing. New features get added, old ones get deprecated. But as life goes on, ancient parts of the source code stay alive even if they're not being used anymore. My current cleanup challenge has 581k lines of code in 1,500 files that have grown over about 15 years. Part one: Find defined, but unused subs.

The project has generic and flexible parts. Sometimes sub names get generated at run time - both for definition and calling. Analyzing the static code won't help much.
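To illustrate why static analysis falls short, here is a minimal sketch of subs whose names only exist at run time. The names export_csv and export_json and the variable $format are made up for this example and are not taken from the project.

```perl
use strict;
use warnings;

# Define one sub per type; the sub names never appear literally in the source
for my $type (qw(csv json)) {
    no strict 'refs';
    *{"export_$type"} = sub { return "exporting $type" };
}

# The name of the sub to call is also assembled at run time
my $format = 'json';
my $name   = "export_$format";
my $result = main->can($name)->();
print "$result\n";
```

A search for "export_json" in the source finds neither the definition nor the call, although the sub clearly is in use.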

I created a small module called Unused.pm starting with the mandatory package lines:

package MyProject::Unused;
use strict;
use warnings;

The package needs storage for the information collected at run time. Perl's use command tries to call a function called import in the use'd module. That function is typically provided by the Exporter core module, but it can also be used to run arbitrary code at compile time.
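The import mechanism can be tried out in a single file. This is only a sketch: Demo::Loud and Demo::User are hypothetical packages, and the %INC entry merely stops use from searching for a file on disk.

```perl
use strict;
use warnings;

our @LOG;

BEGIN {
    package Demo::Loud;          # hypothetical module, defined inline for the demo
    $INC{'Demo/Loud.pm'} = 1;    # pretend the module file was already loaded

    sub import {
        my ($class, @args) = @_;
        my $caller = caller;     # package containing the 'use' statement
        push @main::LOG, "$class imported into $caller with (@args)";
    }
}

package Demo::User;
use Demo::Loud qw(foo bar);      # runs Demo::Loud->import('foo', 'bar') at compile time

package main;
print "$_\n" for @LOG;
```

The import function learns which package use'd it through caller, which is exactly what Unused.pm relies on.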

my %SUBS_FOUND;
sub import {    # Called on 'use MyProject::Unused'

This function will do nasty things with the importing module's namespace and needs to turn off some strict and warning pragma checks.

my $pkg = (caller)[0] . '::'; # Get importing module's name with '::' suffix
no strict 'refs';
no warnings 'redefine', 'prototype';

The module should find every sub defined by the importing module except for some very special cases. BEGIN, END and import are always being used. In rare cases, modules also declare non-subs within their namespace, but we don't care about those.

# Record all known subs
for my $subname (keys %{$pkg}) {

   # Skip special subs and non-subs
   next
     if $subname eq 'BEGIN'
     or $subname eq 'END'
     or $subname eq 'import'
     or ref($pkg->{$subname});

For finding unused subs, the module needs to remember all existing subs. That's easy.

# Remember all subs seen
$SUBS_FOUND{$pkg}->{$subname} = 0;

Recording the usage of a sub is trickier: The original sub CODE reference is stored and overwritten by a new (anonymous) sub.

# Add sub wrapper: Will record every sub called and replace itself with the original sub afterwards
my $orig = \&{$pkg->{$subname}};
$pkg->{$subname} = sub {

The anonymous sub has three jobs to do:

1. Record the sub as "has been used"
2. Remove itself
3. Call the original sub

   # Sub called for the first time:
   # Mark as "used" and replace wrapper with original sub
   $SUBS_FOUND{$pkg}->{$subname} = 1;
   $pkg->{$subname} = $orig;
   &$orig(@_);
};
}

   return;
}

That's all for the import function.
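For reference, here is the whole idea condensed into one runnable file. It is a sketch under a few assumptions, not the production module: Demo::Unused and Demo::App are hypothetical names, %SUBS_FOUND is a package variable so the demo can inspect it, and the wrapper is installed through a typeglob assignment and left with goto, which is a variation on the direct symbol table assignment shown above. Non-subs are skipped with an explicit defined check.

```perl
use strict;
use warnings;

our %SUBS_FOUND;    # package variable here, so the demo below can inspect it

BEGIN {
    package Demo::Unused;          # hypothetical stand-in for MyProject::Unused
    $INC{'Demo/Unused.pm'} = 1;    # pretend the module file was already loaded

    sub import {
        my $pkg = (caller)[0] . '::';
        no strict 'refs';
        no warnings 'redefine', 'prototype';

        for my $subname (keys %{$pkg}) {
            next if $subname =~ /^(?:BEGIN|END|import)$/;
            my $full = $pkg . $subname;
            next unless defined &{$full};    # skip variables, nested packages etc.

            $main::SUBS_FOUND{$pkg}->{$subname} = 0;
            my $orig = \&{$full};
            *{$full} = sub {
                $main::SUBS_FOUND{$pkg}->{$subname} = 1;
                *{$full} = $orig;    # remove the wrapper again
                goto &$orig;         # continue in the original sub with the same @_
            };
        }
        return;
    }
}

package Demo::App;                 # hypothetical module under observation
our $VERSION = '1.0';              # a non-sub in the same namespace
sub used   { return "used(@_)" }
sub unused { return 'never called' }
use Demo::Unused;                  # last statement of the "module"

package main;
print Demo::App::used(1, 2), "\n";
for my $name (sort keys %{ $SUBS_FOUND{'Demo::App::'} }) {
    print "$name => $SUBS_FOUND{'Demo::App::'}{$name}\n";
}
```

Leaving the wrapper with goto keeps it out of the call stack, so code in the original sub that looks at caller sees the same result as without the wrapper.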

Let's step back for a moment. I'm very concerned about speed. Some subs get called millions of times every hour. An extra delay of a few milliseconds for the anonymous "wrapper" sub would sum up to a huge amount of time in production. That's why the anonymous sub records "used" and then replaces itself with the original one. Every sub takes this detour once during the lifetime of the process, which is OK for me. Every follow-up call goes directly to the original sub without any delay.
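The "only once" claim is easy to check with a counter. The following sketch wraps a hypothetical sub Demo::Hot::work by hand, using a typeglob assignment instead of the import function, and counts how often the wrapper runs across 1000 calls.

```perl
use strict;
use warnings;

package Demo::Hot;    # hypothetical package with a frequently called sub
sub work { return $_[0] * 2 }

package main;

my $wrapper_calls = 0;
{
    no strict 'refs';
    no warnings 'redefine';
    my $name = 'Demo::Hot::work';
    my $orig = \&{$name};
    *{$name} = sub {
        $wrapper_calls++;     # the only extra work, done on the first call
        *{$name} = $orig;     # put the original sub back
        goto &$orig;          # continue in the original sub with the same @_
    };
}

my $sum = 0;
$sum += Demo::Hot::work($_) for 1 .. 1000;
print "sum=$sum wrapper_calls=$wrapper_calls\n";
```

All 1000 calls return the correct result, but only the first one passes through the wrapper.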

Recording results

The results need to be recorded at the end of the process. That's where END steps in.

END {
   # Dispatch all results to be stored in DB
   return unless keys %SUBS_FOUND;

   # TODO: Store stuff from %SUBS_FOUND
}

1;

END is very helpful because this pseudo-function runs at the very end of each process, even in most error cases.

Houston, we've got a problem

END blocks run in reverse order of their definition, right before global destruction, and global destruction itself has no defined order. Our END might be called when everything is still in place, but it might also be called after other END blocks have closed all database connections and cleaned up other things. Never ever rely on anything being intact this late in the life of a process!
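Both properties of END, running despite an error and running in reverse order of definition, can be observed with a small child process. This is a sketch: the helper stdout_of_child is made up for this example, and the list form of a piped open is assumed to be available on the platform.

```perl
use strict;
use warnings;

# Run a snippet in a child perl and return everything it printed to STDOUT
sub stdout_of_child {
    my ($code) = @_;
    open my $child, '-|', $^X, '-e', $code
        or die "Cannot start $^X: $!";
    my $output = do { local $/; <$child> };
    close $child;    # returns false here, because the child dies on purpose
    return $output;
}

my $code = <<'CHILD';
END { print "defined first, runs last\n" }
END { print "defined second, runs first\n" }
die "boom\n";
CHILD

print stdout_of_child($code);
```

The child's "boom" message still shows up on STDERR. The parent only captures STDOUT, where the block defined last reports first.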

This project has a Gearman dispatcher. Our Gearman client has proven to work during global destruction. I just pass the whole %SUBS_FOUND structure as payload to a job for further processing.

Another option - especially with only a few servers - would be a file.

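# Append, so that processes don't overwrite each other's results
open my $dump_fh, '>>', '/tmp/subs.txt'
   or die "Cannot open /tmp/subs.txt: $!";
for my $pkg (keys %SUBS_FOUND) {
   for my $subname (keys %{$SUBS_FOUND{$pkg}}) {
      # One line per sub: full name, tab, 0 (unused) or 1 (used)
      print $dump_fh $pkg.$subname."\t".$SUBS_FOUND{$pkg}->{$subname}."\n";
   }
}
close $dump_fh;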

This snippet would write a list to /tmp/subs.txt which could be easily imported into a database table. File operations are safe during global destruction.

Different jobs

A sub might be called only once in a while. You should collect data until every single script has run at least once. I'd suggest 6 weeks to include monthly jobs (while you might still miss yearly ones).

Finally, merge everything together: Check the logs for every sub name which never got a "1" in the second column of the log file.

In our case, we just add every newly seen sub to a database table. A row is updated when a sub that was "0" (seen, but unused) before gets reported as "1" (used).
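For the file-based variant, the merge could look roughly like this. It is a sketch that assumes one "name, tab, flag" record per line; the function merge_usage and the sample sub names are invented for this example.

```perl
use strict;
use warnings;

# Merge "name<TAB>flag" records from many runs.
# A sub counts as used as soon as any run reported a 1 for it.
sub merge_usage {
    my (@lines) = @_;
    my %used;
    for my $line (@lines) {
        chomp $line;
        my ($name, $flag) = split /\t/, $line;
        next unless defined $flag;    # ignore malformed lines
        $used{$name} = 1 if $flag;
        $used{$name} //= 0;           # seen, but not (yet) used
    }
    return \%used;
}

my $merged = merge_usage(
    "MyProject::Foo::bar\t0\n",
    "MyProject::Foo::baz\t0\n",
    "MyProject::Foo::bar\t1\n",
);
my @unused = sort grep { !$merged->{$_} } keys %$merged;
print "unused: @unused\n";
```

Here bar was unused in one run but used in another, so only baz remains a cleanup candidate.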

Usage

Due to its special nature, this module has to be use'd at the very end of each module whose subs should be recorded.

use MyProject::Unused;
1;

Perl modules typically end with 1; and this use goes right before it, not at the top of the module where use statements typically belong. Otherwise the Unused module won't be able to see all subs defined.

The module also sees all subs imported from other modules. They need to be filtered out before the actual cleanup.
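One possible way to do that filtering, sketched here and not necessarily the one used in this project, is to ask the core B module in which package a code reference was compiled. The helper defining_package and the Demo:: packages are hypothetical.

```perl
use strict;
use warnings;
use B ();

# Name of the package a code reference was originally compiled in
sub defining_package {
    my ($coderef) = @_;
    return B::svref_2object($coderef)->GV->STASH->NAME;
}

BEGIN {
    package Demo::Lib;           # hypothetical exporting module
    use Exporter 'import';
    our @EXPORT_OK = ('helper');
    sub helper { return 'help' }
    $INC{'Demo/Lib.pm'} = 1;     # pretend the module file was already loaded
}

package Demo::App;               # hypothetical importing module
use Demo::Lib 'helper';
sub own { return 'own' }

package main;

for my $name (qw(own helper)) {
    my $code = Demo::App->can($name);
    printf "%s is defined in %s\n", $name, defining_package($code);
}
```

A sub whose defining package differs from the namespace it was found in was imported and can be dropped from the list of cleanup candidates.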