Benchmarking mapping tables

Software often needs to transform values from A to B. Such transformations (given they're static) might be done using a database table, if/elsif blocks or a mapping table. Such tables are easy to create, maintain and understand. A database is always the slowest solution for a limited number of items, because the overhead for the client, network and database server is very big compared to sourcecode processing. Sourcecode-based solutions are faster, but which one is the best.

race-92193_1280.jpgThe most readable solution is an anonymous hashref within the sub which needs to perform the transformation. The Perl compiler and optimizer auto-handle constant values as constants, but do they discover a constant anonymous hash, too? How does it perform compared to a predefined hash?

I could write a thousand lines arguing for this or that solution, but I prefer a quick benchmark:

use Benchmark qw(cmpthese);
%a = (a => 1, b => 2, c => 3, e => 4, f => 5);
$b = \%a;

Basic benchmark header stuff: Load the modulem, initialize some variables which will be explained shortly and call the cmpthese function.

  if_else => sub {
if ($f eq "a") { $x = 1; }
elsif ($f eq "b") { $x = 2; }
elsif ($f eq "c") { $x = 3; }
elsif ($f eq "e") { $x = 4; }
elsif ($f eq "f") { $x = 5; }

The if/elsif blocks way: Compare each value to the variable in question and set the result variable $x to the result value.

  static     => sub { $x = $a{$f}; },
static_ref => sub { $x = $b->{$f}; },

The same values used in the if/elsif blocks have been initilized in %a as a mapping table. Using it is much shorter than the if/elsif stacking, but is it faster? I'll try both: Calling the mapping table hash directly and using a hash reference (stored in $b). Both got their data defined somewhere at the top of the file and not at the (maybe only) place where it's being used.

  anonymous => sub {
$x = {
a => 1, b => 2, c => 3, e => 4, f => 5

An anonymous mapping table hashref: It should be as fast as the static or static_ref sub if Perl manages to optimize it away to a constant. The anonymous hash is defined, queried for the key which matches $f and destroyed, because the reference to the hash isn't stored anywhere.

Benchmark results

Here are the results:

                 Rate anonymous if_else static_ref static
anonymous 1382840/s -- -70% -88% -90%
if_else 4593352/s 232% -- -60% -66%
static_ref 11593484/s 738% 152% -- -14%
static 13457798/s 873% 193% 16% --

I didn't expect the if/elsif blocks to be that slow. Perl also knows a switch or given/when feature starting from 5.10.1, but I'ld like to compare the mapping tables and thus I ignored it.

The static and static_ref subs don't show any surprise. De-referencing the $b hashref takes slightly longer than accessing the hash directly but 11 million vs. 13 million per second isn't that imporant. The anonymous hashref was much more worse than I expected. It's a very small one using constant single-char-keys and integer values for only 5 items, but still 90% slower than the fastest, direct hash access.


2 Kommentare. Schreib was dazu

  1. perl hashes are actually pretty slow since 5.18, when they changed it over security concerns to a very slow siphash function, instead of fixing the actual problem, the collision strategy.

    Try out external fast hashes dependent on your keys.
    You can e.g. precompute static perfect hash functions in C, which perform ok (see eg gperf or cmph), or tie faster hash tables in C.

    • Sebastian

      I'm using the current Ubuntu Perl v5.14.2 for my tests.

Schreib was dazu

Die folgenden HTML-Tags sind erlaubt:<a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>