Speed up your script using profiling

This post was imported from my old WordPress installation. If you notice display problems, broken links or missing images, please just leave a comment here. Thanks.

What is profiling?

Every script should care about the resources it uses (time, memory, CPU time), but some don't. Profiling is a detailed analysis of your script's behavior while it's running.

A profiler records every executed line, its callers and called functions, and the time required to process each of them.

How to profile a script?

Profiling is always done in two steps: Data collection and report generation.

Data collection is possible within a webserver, but using the command line is much easier.

First, grab your cookies

Open your favorite web browser, log into your webserver as usual and look up the values for your cookies. You'll need them soon to authorize your profiled script.

They may look like this:

sessionid: 46832468768767
username: foo

Usually not all cookies are required, but you should know which ones matter - it's your script :-).

Second, get the arguments

Navigate to the script and function you'd like to profile and note the URL of the main frame.

Put everything together

Use an SSH client to log into the server and "cd /path/to/your/cgi-bin" to start where the webserver starts. Assemble the information picked up earlier into a command line:
sudo -u www-data NYTPROF="file=/tmp/nytprof.myname" HTTP_COOKIE="sessionid=46832468768767; username=foo" QUERY_STRING="todo=do_delete_user&id=1" perl -d:NYTProf admin/users.pl >/dev/null
In parts:
sudo -u www-data: All web scripts run as www-data; most require this user.
NYTPROF="file=/tmp/nytprof.myname": Specifies the filename where the collected raw profiling data should be saved. Replace /tmp/nytprof.myname with a path and filename of your choice, but ensure that www-data is allowed to write it.
HTTP_COOKIE="sessionid=46832468768767; username=foo" *: Simulated cookies; use the values written down in step 1. You may reuse these numbers as long as your session lasts.
QUERY_STRING="todo=do_delete_user&id=1" *: All arguments required by the script. Most scripts don't make any difference between GET and POST arguments, and it's much easier to add arguments as GET values here than to pass simulated POST data. You may also paste the whole URL in here and remove everything up to and including the question mark (?).
perl -d:NYTProf: Your favorite programming language interpreter with an additional argument to enable profiling via Devel::NYTProf. Any other command line arguments (like -l or -I/path/to/other/modules) may be added as needed.
admin/users.pl: Path and filename of the script being profiled, as seen from within the cgi-bin dir. Use the part between cgi-bin/ and the first ? of the URL if you're unsure.
>/dev/null: Redirects any output (HTML code) to nowhere. Output - especially to a console or over an SSH connection - is much slower than you might think, and we're measuring times: no need to create false timing results because of blocking output.
Press ENTER to start your script. It will run and finally finish (the HTML output went to /dev/null), but it might run a bit slower than over the web because the profiler itself takes some time to write the data. You may re-run the command line as often as you like, but remember to delete the profiling data file before each new run: Devel::NYTProf won't do this for you and will run the script without any profiling.

The parts marked by * are not required for running shell scripts or cronjobs through the profiler.
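Such a simplified invocation could look like this. The script path is a made-up example, and Devel::NYTProf must be installed for the -d:NYTProf switch to work:

```shell
# Profile a cronjob-style script: no cookies and no query string needed.
# Adjust the NYTPROF output file and the script path to your setup.
sudo -u www-data NYTPROF="file=/tmp/nytprof.cron" \
    perl -d:NYTProf scripts/nightly_cleanup.pl >/dev/null
```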

Report generation

The raw profiling data isn't human-readable at all; it has to be converted into a nice, colorful report.

There are four tools available for doing this: nytprofcg, nytprofcsv, nytprofhtml and nytprofmerge. It's unlikely that you'll ever need anything other than nytprofhtml, which requires two arguments:

nytprofhtml -f /tmp/nytprof.myname -o ../htdocs/nytprof_partner2

The first one, -f, tells nytprofhtml where to look for the raw data file; use the same value specified while running the profiler. The second argument is the directory where the report will be stored. It should be somewhere in your html dir and shouldn't exist yet, as nytprofhtml will create it. Remove the directory, or at least its contents, before running nytprofhtml if you'd like to overwrite an old report.

Notice: nytprofhtml must always run on the same computer where the profiling data has been generated. The HTML report is portable, but the raw data file isn't!

Point your web browser to your development webserver, adding the output directory to the URL.

Using the report

The start page lists the top 15 most expensive (in terms of run time) subroutines and functions in a 6 column table:
Calls: The number of times this subroutine was called from anywhere within your script.
P: The number of different places (lines) calling this subroutine.
F: The number of individual files which called this subroutine.
Exclusive Time: How much time was used for processing this subroutine, *excluding* any other subs called from here. This is the time used by the sub itself.
Inclusive Time: Time required for this subroutine (see Exclusive Time) plus any time used by other subroutines called from here. This is the total time between entry (first line) and exit (last line or return statement) of this subroutine.
Subroutine: Module and subroutine name.
These columns appear throughout the other report pages as well. It's important to understand the meaning of each one to draw correct conclusions. A full list, sortable by clicking on the column headers, is linked just below that top 15.
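The difference between the two time columns can be illustrated with a small made-up example (the sub names and timings are invented, not taken from a real report):

```perl
use strict;
use warnings;
use Time::HiRes qw(usleep);

# helper() does all of its work itself, so its exclusive and
# inclusive times are (almost) identical.
sub helper {
    usleep(10_000);    # ~10ms of "work" inside helper
    return 21;
}

# outer() does very little itself (tiny exclusive time), but its
# inclusive time contains the ~20ms spent in the two helper() calls.
sub outer {
    my $sum = 0;
    $sum += helper() for 1 .. 2;
    return $sum;
}

print outer(), "\n";    # prints 42
```

In an NYTProf report, helper would show roughly 20ms both exclusive and inclusive, while outer's inclusive time would contain those 20ms and its exclusive time would stay close to zero.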

A subroutine may be perfectly written and optimized for speed, but it can still become the worst spot if it's called many times. For example, each call to DBI::st::execute is one SQL query executed on the database. If you have a perfectly small and quick query but it's called 100,000 times, even one millisecond per call results in more than one and a half minutes of run time. Reducing the number of calls to a subroutine is sometimes much easier and more efficient than optimizing the sub itself.

I don't use the exclusive time often, but that's my personal preference. Pick a subroutine with a long inclusive time for now and click on its name. You may also jump directly to any file used within your script by scrolling down a little and clicking on the filename.

It may take a moment for the detailed file view to load but you should end up at the first line of the subroutine now. You'll notice some colored columns in front of the source line, they are:

Line: Line number in the original source file.
Statements: How many Perl statements have been executed running this line (sum over all calls).
Time on line: Time spent executing this line itself, without any sub calls.
Calls: Number of calls to other subs (if any).
Time in subs: Time used by called subs (if any).
Code: Original source code and profiler comments.
The subroutine has additional information shown as sourcecode comments within the sub definition line, for example:

# spent 219ms (30.8+188) within Project::User::clicks_per_day which was called 1380 times, avg 159µs/call:
# 1379 times (30.7ms+188ms) by main::show_hitlist_user at line 3463 of admin/users.pl, avg 159µs/call
# once (39µs+69µs) by main::show_hitlist_user at line 3470 of admin/users.pl
sub clicks_per_day {

This sub (called clicks_per_day) used up 219 milliseconds: 30.8ms were spent running the sub itself and another 188ms were used by other subs called from here. The sub has been called 1380 times from two different locations. Most of the calls (1379) came from sub show_hitlist_user at line 3463 of admin/users.pl, and one call came from the same script at line 3470.

The next line should be the first line of that sub, showing all the additional information mentioned above. Look out for a loop and you'll see that its inner statements have been executed much more often than the statements before and after the loop. The same goes for conditions like if or conditional return statements: the following lines show how often the condition was true.
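A sketch of the kind of inference this allows (the counts in the comments are what NYTProf's statement column would report for each line):

```perl
use strict;
use warnings;

my $hits = 0;
for my $i (1 .. 100) {    # the loop head runs 100 times
    next if $i % 2;       # this statement also runs 100 times...
    $hits++;              # ...but this one only 50 times, so the
}                         # condition must have been true half the time
print "$hits\n";          # prints 50
```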

A typical complex source line may look like this:

my $cache_key = $self->dbh->context() . 'USER_' . $id . '_' . ($some_flag ? '' : ($self->foo || '0')) . '_' . $yy . sprintf("%2.2d", $mm) . sprintf("%2.2d", $dd);
# spent 82µs making 4 calls to MyApp::Table::__ANON__[GLOBAL/], avg 21µs/call
# spent 23µs making 8 calls to MyApp::Table::dbh, avg 3µs/call
# spent 19µs making 8 calls to MyApp::DB::context, avg 2µs/call

$self->foo is a "virtual" sub defined at run time and not (really) known to the profiler, which is why it shows up as __ANON__. It has been called 4 times, which took 82 microseconds. The dbh and context subroutines were called 8 times each, using 3 and 2 microseconds per call. There is another piece of information hidden in the source: did you notice that $some_flag ? '' : ($self->foo || '0') part? As the foo sub wasn't called on all 8 executions of the line, $some_flag must have been true for half of the calls.

All subroutine names are clickable, you may jump directly to the calling lines (shown above the sub definition line) or to any sub called from here, like MyApp::Table::dbh.

My script still isn't faster than before

Reading profiling reports isn't as easy as reading code. You have to look at the lines, how often they have been executed and where the time was lost. Start at the heaviest sub, click your way downwards, and you'll finally end up at some statement that is called often but isn't all that bad by itself. Browse through the caller and callee links to find the right place to start your optimization.

If you're walking through your list of 1000 users and see that a sub separating the first and last name is called 5000 times, the sub itself isn't the problem. It's called too often, and you need to find the right place to cache the first run's result. Maybe Memoize is a good solution.
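Memoize (a core module) is a quick way to try this: it transparently caches a sub's return values per argument. A minimal sketch with a made-up split_name() helper:

```perl
use strict;
use warnings;
use Memoize;

my $real_calls = 0;

# Hypothetical helper: splits a "First Last" string into its parts.
sub split_name {
    my ($full) = @_;
    $real_calls++;    # count how often the real work actually runs
    my ($first, $last) = split / /, $full, 2;
    return "$first|$last";
}

memoize('split_name');    # results are now cached per argument

# Simulate 5 report runs over the same 1000 users: 5000 calls total...
for my $run (1 .. 5) {
    split_name("User Number$_") for 1 .. 1000;
}

print "real calls: $real_calls\n";    # prints "real calls: 1000"
```

Only the 1000 distinct arguments reach the real sub; the other 4000 calls are answered from the cache.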

DBI::st::execute is always another good place to start. Do you see 7001 calls from a few places? Maybe 1 from a line doing a "SELECT * FROM users", 1000 calls from "SELECT email FROM users WHERE id=?" and another 6000 calls from a simple sub asking "SELECT country_name FROM countries WHERE iso_code=?" to translate each user's country shortcode into the country name. Sure, the country information could easily be cached, saving around 5990 database calls, but did you notice that the email address is being fetched for each user even though the previous "SELECT *" already returned it?
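A per-run cache for the country lookup can be as simple as a hash in front of the query. This sketch replaces the real DBI call with a stub so it runs stand-alone; the table contents and sub names are made up:

```perl
use strict;
use warnings;

my $db_queries = 0;

# Stand-in for the real $sth->execute + fetch on
# "SELECT country_name FROM countries WHERE iso_code=?".
my %countries_table = (de => 'Germany', fr => 'France', us => 'United States');
sub query_country_name {
    my ($iso) = @_;
    $db_queries++;
    return $countries_table{$iso};
}

# The cache: every ISO code hits the "database" only once.
my %country_cache;
sub country_name {
    my ($iso) = @_;
    $country_cache{$iso} //= query_country_name($iso);
    return $country_cache{$iso};
}

# 6000 lookups for users spread over three countries...
my @codes = ('de', 'fr', 'us');
country_name($codes[$_ % 3]) for 1 .. 6000;

print "queries: $db_queries\n";    # prints "queries: 3" instead of 6000
```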

Try to look around, change your source and re-run the profiling. Compare the results, and even if things got worse - don't worry, try something else.


3 comments

  1. Alexander Hartmaier

    Can you please turn that into pod for the NYTProf dist?

  2. Danial M.

    I like what NYTProf can do. I have a perl script that executes another perl script via Cmd::exec.
    At the point where it calls the new script, NYTProf no longer tracks execution time.
    Is there a way to add parameters anywhere to profile the inner most script, vs the parent script?

    Thanks in Advance.

    • Sebastian

      I think that you have to add "-d:NYTProf" to the call of the 2nd script and also have to switch to another file, maybe
      $ENV{NYTPROF} = 'file=nytprof_2nd.out'
      You could also do
      export NYTPROF=addpid=1
      on your shell before starting the first script. It will attach the PID to the filename.
      But you'll always end up with two profiles (one for each script), I don't think that you could merge them into one.

