Think looping


Software developers need to think in a straight line: line 2 is executed after line 1, never before. But straightforward code isn't always good.

Here is a simple sample taken from a (much) bigger project:

sub write_status {
    my $self = shift;
    my $statustext = shift;

    my $sth = $self->{dbh}->prepare_cached('UPDATE STATUS SET text=? WHERE id=?');
    $sth->execute($statustext, $self->{job_id});
    $sth->finish; # Required in older DBD::mysql versions due to a bug
}

sub write_protocol {
    my $self = shift;
    my $item_id = shift;
    my $action = shift;

    open my $log_fh, '>>', $self->{protocol_file} or return;
    print $log_fh join("\t", time, $action, $item_id)."\n";
    close $log_fh;
}

Looks good, doesn't it? But another part of the same project did this:

for my $no (0..$#items) {
    $self->write_status($no / $#items);
    $self->write_protocol($items[$no]->id, 'something');
    [...]
}

...and @items had 140,000 elements.

The issue came up when a customer couldn't be served because the processing took 24 hours on a busy system. Only a few lines are needed to improve the runtime:

sub write_status {
    my $self = shift;
    my $statustext = shift;

    # Don't flush two status reports within one second
    return if $self->{last_status} and $self->{last_status} == time;

    my $sth = $self->{dbh}->prepare_cached('UPDATE STATUS SET text=? WHERE id=?');
    $sth->execute($statustext, $self->{job_id});
    $sth->finish; # Required in older DBD::mysql versions due to a bug

    $self->{last_status} = time; # Remember the second of the last status write
}

sub write_protocol {
    my $self = shift;
    my $item_id = shift;
    my $action = shift;

    # Cache the filehandle within the object
    my $log_fh = $self->{protocol_fh}; # print $self->{protocol_fh} $content won't work
    if (!$log_fh) {
        open $log_fh, '>>', $self->{protocol_file} or return;
        $self->{protocol_fh} = $log_fh;
    }

    print $log_fh join("\t", time, $action, $item_id)."\n";
    # No close here: the cached handle has to stay open for the next call
}

The first sub now updates the status table at most once per second, which may be only once every 1000 records. The second one opens the protocol file once and reuses the filehandle for every new line. That reduces the stat, open and close syscalls from 130,000 to one each. In addition, the lines are collected in Perl's internal output buffer, which is flushed in bigger chunks, so the number of write calls drops as well. The last change has a small drawback: the most recent unflushed buffer content is lost if the task crashes hard.
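Both changes can be tried without a database. The following is a minimal sketch, not the project's code: the package name Job, the clock callback and the status_writes counter are made up for illustration, and the counter stands in for the real UPDATE. It also uses the print { ... } block form, which lets Perl print to a filehandle stored in a hash element.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

package Job;    # hypothetical class, only for this sketch

sub new {
    my ($class, %args) = @_;
    # clock is an assumed hook so the throttle can be tried without sleeping
    return bless { clock => sub { time }, status_writes => 0, %args }, $class;
}

sub write_status {
    my ($self, $statustext) = @_;
    my $now = $self->{clock}->();

    # Skip the write if one already happened during this second
    return if defined $self->{last_status} and $self->{last_status} == $now;

    $self->{status_writes}++;    # stand-in for the real UPDATE statement
    $self->{last_status} = $now;
    return 1;
}

sub write_protocol {
    my ($self, $item_id, $action) = @_;

    # Open the file on first use only, then keep the handle in the object
    if (!$self->{protocol_fh}) {
        open my $fh, '>>', $self->{protocol_file} or return;
        $self->{protocol_fh} = $fh;
    }

    # The block form accepts an expression as the filehandle
    print { $self->{protocol_fh} } join("\t", time, $action, $item_id), "\n";
    return 1;
}

package main;

my ($tmp_fh, $file) = tempfile(UNLINK => 1);
close $tmp_fh;

my $fake_time = 1000;
my $job = Job->new(protocol_file => $file, clock => sub { $fake_time });

for my $no (1 .. 3000) {
    $fake_time++ if $no % 1000 == 0;    # pretend 1000 items take one second
    $job->write_status($no / 3000);
    $job->write_protocol($no, 'something');
}
close $job->{protocol_fh};              # flush the buffered lines to disk

print "status writes: $job->{status_writes}\n";
```

With the fake clock advancing once per 1000 items, only a handful of the 3000 status calls reach the stand-in for the database, while every protocol line still ends up in the file. One side effect of the throttle is worth keeping in mind: the very last status of a run can be skipped if it falls into the same second as the previous write.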

The results: prepare_cached dropped out of the top 15 most time-consuming function calls in the NYTProf profile, and only one third of all $sth->execute calls survived the changes. Those few changed lines cut the overall processing time in half.
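NYTProf needs the real program to produce a profile. To get a rough feel for the filehandle change alone, the two approaches can be timed side by side with the core Time::HiRes module. This is a sketch under assumptions: the sub names, the temporary files and the count of 2,000 lines are made up, and the timings depend heavily on the filesystem and the load of the machine.

```perl
use strict;
use warnings;
use Time::HiRes ();
use File::Temp qw(tempdir);

my $dir   = tempdir(CLEANUP => 1);
my $lines = 2_000;    # arbitrary; the real job wrote one line per item

# Variant 1: open, print and close for every single line
sub log_reopen {
    my ($file) = @_;
    for my $id (1 .. $lines) {
        open my $fh, '>>', $file or die "Cannot open $file: $!";
        print $fh join("\t", time, 'something', $id), "\n";
        close $fh;
    }
}

# Variant 2: open once, let Perl buffer the output, close once
sub log_cached {
    my ($file) = @_;
    open my $fh, '>>', $file or die "Cannot open $file: $!";
    for my $id (1 .. $lines) {
        print $fh join("\t", time, 'something', $id), "\n";
    }
    close $fh;
}

# Run a code reference once and return the elapsed wall-clock seconds
sub elapsed {
    my ($code) = @_;
    my $start = Time::HiRes::time();
    $code->();
    return Time::HiRes::time() - $start;
}

my $t_reopen = elapsed(sub { log_reopen("$dir/reopen.log") });
my $t_cached = elapsed(sub { log_cached("$dir/cached.log") });

printf "reopen: %.4f s\n", $t_reopen;
printf "cached: %.4f s\n", $t_cached;
```

Both variants write the same number of lines, so any difference in the timings comes from the repeated open and close calls and from the smaller writes.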

Don't just write straightforward source code; also think about how often it is called and whether parts of the work could be reused within the same run.


2 comments

  1. Jakub Narebski

    "print $self->{protocol_fh} $content" won't work, but "print { $self->{protocol_fh} } $content" would

  2. Sebastian

    Thanks! I've been searching for this shortcut for years :-)
