Think looping

This post was imported from my old WordPress installation. If there are display problems, broken links, or missing images, please just leave a comment here. Thanks.


Software developers need to think in straight lines: line 2 is executed after line 1, never before. But straightforward code isn't always good code.

Here is a simple example taken from a (much) bigger project:

sub write_status {
    my $self       = shift;
    my $statustext = shift;

    my $sth = $self->{dbh}->prepare_cached('UPDATE STATUS SET text=? WHERE id=?');
    $sth->execute($statustext, $self->{job_id});
    $sth->finish; # Required in older DBD::mysql versions due to a bug
}

sub write_protocol {
    my $self    = shift;
    my $item_id = shift;
    my $action  = shift;

    open my $log_fh, '>>', $self->{protocol_file} or return;
    print $log_fh join("\t", time, $action, $item_id)."\n";
    close $log_fh;
}

Looks good, doesn't it? But a different part of the same project did this:

for my $no (0..$#items) {
    $self->write_status($no / $#items);
    $self->write_protocol($items[$no]->id, 'something');
    [...]
}

...and @items had 140,000 elements.
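To see the scale of the problem: every iteration issues one UPDATE plus one open/stat/close cycle on the protocol file. A quick back-of-the-envelope count (a hypothetical sketch; the item count is from the post, the variable names are mine):

```python
# Hypothetical cost model for the naive loop above.
items = 140_000

db_updates  = items   # write_status: one UPDATE per iteration
file_opens  = items   # write_protocol: one open (incl. stat) per iteration
file_closes = items   # ...and one close per iteration

print(db_updates, file_opens + file_closes)  # 140000 280000
```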

The issue was raised when a customer couldn't be serviced because the processing took 24 hours on a busy system. Only a few lines are needed to improve the runtime:

sub write_status {
    my $self       = shift;
    my $statustext = shift;

    # Don't flush two status reports within one second
    return if $self->{last_status} and $self->{last_status} == time;

    my $sth = $self->{dbh}->prepare_cached('UPDATE STATUS SET text=? WHERE id=?');
    $sth->execute($statustext, $self->{job_id});
    $sth->finish; # Required in older DBD::mysql versions due to a bug

    $self->{last_status} = time; # Remember when we last wrote a status
}
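The once-per-second throttle is language-independent; here is a minimal sketch of the same idea in Python (the class name, the callback, and the injectable clock are my own illustration, not from the project):

```python
import time

class StatusWriter:
    """Drops status updates that arrive within the same second."""

    def __init__(self, write, clock=time.time):
        self._write = write   # callback that actually persists the status
        self._clock = clock   # injectable for testing; defaults to wall time
        self._last = None     # epoch second of the last persisted status

    def write_status(self, text):
        now = int(self._clock())
        if self._last == now:
            return            # already wrote a status during this second
        self._write(text)
        self._last = now
```

However fast the loop spins, this caps the status traffic at one write per second.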

sub write_protocol {
    my $self    = shift;
    my $item_id = shift;
    my $action  = shift;

    # Cache the filehandle within the object
    my $log_fh = $self->{protocol_fh}; # print $self->{protocol_fh} $content won't work
    if (!$log_fh) {
        open $log_fh, '>>', $self->{protocol_file};
        $self->{protocol_fh} = $log_fh;
    }
    return unless $log_fh;

    print $log_fh join("\t", time, $action, $item_id)."\n";
    # No close here: closing the cached handle would defeat the caching
    # (the next call would find a stale, closed handle in the object).
}

The first sub now updates the status table at most once per second, maybe once every 1,000 records. The second one opens the protocol file once and re-uses that filehandle for every new line, reducing the stat, open, and close syscalls from 130,000 to one each. It also collects the lines in a Perl-internal output buffer that is flushed in bigger chunks, decreasing the number of write syscalls. The last change has a small drawback: the latest unflushed buffer content is lost if the task crashes really badly.

The results: prepare_cached dropped out of the top 15 time-consuming function calls in NYTProf profiling, and only a third of all $sth->execute calls survived the changes. The overall processing time was cut in half by those few changed lines.

Try not only to write straightforward source code, but also to think about how often that code is called and whether parts of its work could be re-used within the same run.

 

2 comments

  1. Jakub Narebski

    "print $self->{protocol_fh} $content" won't work, but "print { $self->{protocol_fh} } $content" would

  2. Sebastian

    Thanks! I've been searching for this shortcut for years :-)
