
Hash to string race



How do you convert a hash to a string? Perl is TIMTOWTDI ("there's more than one way to do it"), but which way is the fastest? I need a checksum (hash, digest) of the hash, so the string must be identical for the same hash content every time. Hash keys are not returned in sorted order, so even a simple join('', keys(%hash)) may produce a different string for the same content, for example from one program run to the next (if the hash has at least two keys).
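A minimal sketch of the fix for flat hashes: sorting the keys before joining makes the string, and therefore the checksum, independent of the internal key order. Digest::MD5 is only an assumed choice of digest (the post doesn't name one), and flat_checksum is a hypothetical helper name.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);   # assumption: any digest module would do

# Hypothetical helper: checksum of a flat hash (plain string values only).
# Sorting the keys makes the result independent of the internal key order.
sub flat_checksum {
    my ($hash) = @_;
    return md5_hex(join "\x00", map { "$_\x01$hash->{$_}" } sort keys %$hash);
}

# Same content, built in a different order.
my %first  = (apple => 1, banana => 2, cherry => 3);
my %second = (cherry => 3, banana => 2, apple => 1);

print flat_checksum(\%first) eq flat_checksum(\%second)
    ? "same checksum\n"
    : "different checksum\n";
```

The separators "\x00" and "\x01" are the ones used by the self candidate in the benchmark below; they only work as long as keys and values never contain these bytes.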

I started a little race, searching for new ways and benchmarking them. The result was:

perl -MStorable=freeze -MBenchmark -MYAML -MJSON::XS -MJSON=to_json -MData::Dumper=Dumper -le '
$Storable::canonical = 1;
my $jxss = JSON::XS->new->canonical;
timethese(0, {
  Dumper     => sub { Dumper(\%INC); },
  self       => sub { join("\x00", map { $_."\x01".$INC{$_} } sort keys %INC); },
  JSON       => sub { to_json(\%INC); },
  JSONsort   => sub { JSON->new->canonical->encode(\%INC) },
  JSONXS     => sub { JSON::XS::encode_json(\%INC); },
  JSONXSsort => sub { JSON::XS->new->canonical->encode(\%INC) },
  JSONXSpre  => sub { $jxss->encode(\%INC); },
  YAML       => sub { YAML::Dump(\%INC); },
  Storable   => sub { freeze(\%INC); },
});'

Benchmark: running Dumper, JSON, JSONXS, JSONXSsort, JSONsort, Storable, YAML, self for at least 3 CPU seconds...

Dumper: 3 wallclock secs ( 3.17 usr + 0.00 sys = 3.17 CPU) @ 7704.10/s (n=24422)

JSON: 4 wallclock secs ( 3.08 usr + 0.03 sys = 3.11 CPU) @ 62825.72/s (n=195388)

JSONsort: 4 wallclock secs ( 3.11 usr + 0.00 sys = 3.11 CPU) @ 42986.17/s (n=133687)

JSONXS: 3 wallclock secs ( 3.12 usr + 0.00 sys = 3.12 CPU) @ 88461.54/s (n=276000)

JSONXSpre: 4 wallclock secs ( 3.16 usr + 0.00 sys = 3.16 CPU) @ 48809.49/s (n=154238)

JSONXSsort: 3 wallclock secs ( 3.19 usr + 0.00 sys = 3.19 CPU) @ 44238.24/s (n=141120)

Storable: 4 wallclock secs ( 3.13 usr + 0.00 sys = 3.13 CPU) @ 19036.74/s (n=59585)

YAML: 4 wallclock secs ( 3.14 usr + 0.00 sys = 3.14 CPU) @ 175.48/s (n=551)

self: 3 wallclock secs ( 3.11 usr + 0.00 sys = 3.11 CPU) @ 20418.65/s (n=63502)

The Benchmark module shows the elapsed time for each candidate (wallclock seconds), the CPU time used, and the number of iterations each candidate completed (n=XXX at the end). The most important part is the rate, the number of runs per second each candidate achieved: this is the best (only?) value for comparing the results.
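As a rough sketch of where the rate comes from, the Benchmark objects returned by timethese can be inspected directly: the rate is the iteration count divided by the CPU time. The test data and the two candidate names here are made up for illustration and are not part of the race above.

```perl
use strict;
use warnings;
use Benchmark qw(timethese cmpthese);

# Hypothetical test data: a flat hash with 50 string values.
my %data = map { ("key$_" => "value$_") } 1 .. 50;

# A negative count means "run each candidate for at least that many
# CPU seconds"; the style 'none' suppresses the per-candidate lines.
my $results = timethese(-1, {
    sorted   => sub { join "\x00", map { "$_\x01$data{$_}" } sort keys %data },
    unsorted => sub { join "\x00", map { "$_\x01$data{$_}" } keys %data },
}, 'none');

# Rate = iterations / CPU time (user + system) of this process.
for my $name (sort keys %$results) {
    my $t = $results->{$name};
    printf "%-8s %.2f/s (n=%d)\n", $name, $t->iters / $t->cpu_p, $t->iters;
}

# cmpthese prints the same rates as a comparison chart.
cmpthese($results);
```

The chart printed by cmpthese shows each rate together with the percentage difference to every other candidate, which is often easier to read than the raw timethese lines.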

Dumper, JSON, JSONXS and YAML don't work for this purpose because they don't sort the hash keys. self works only for flat hashes because it doesn't support hash trees (references as hash values).
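The limitation of self can be shown with a small sketch. JSON::PP is used here only as an assumed stand-in for JSON::XS, because it ships with Perl and offers the same canonical option; self_string is a hypothetical name for the self candidate.

```perl
use strict;
use warnings;
use JSON::PP;   # core module; assumed stand-in for JSON::XS

# The "self" approach from the benchmark, as a named sub.
sub self_string {
    my ($hash) = @_;
    return join "\x00", map { "$_\x01$hash->{$_}" } sort keys %$hash;
}

# Two separate but equal hash trees.
my %one = (name => 'x', opts => { b => 2, a => 1 });
my %two = (name => 'x', opts => { a => 1, b => 2 });

# A nested reference stringifies as "HASH(0x...)", i.e. a memory address,
# so equal trees yield different strings.
print "self: ", (self_string(\%one) eq self_string(\%two) ? "equal" : "different"), "\n";

# A canonical JSON encoder sorts the keys on every level.
my $json = JSON::PP->new->canonical;
print "json: ", ($json->encode(\%one) eq $json->encode(\%two) ? "equal" : "different"), "\n";
```

This prints "self: different" and "json: equal".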

JSONXSpre is the winner among the candidates that produce a stable string: creating a sorting JSON::XS object once and reusing it every time is slightly faster than creating the object on each run. I didn't expect Storable, Dumper and finally YAML to be that bad, though.
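Put together with the original goal, the winning pattern might look like the following sketch. It assumes JSON::PP as a core stand-in for JSON::XS (the calls used exist in both) and Digest::MD5 as the digest, which the post doesn't specify; hash_checksum is a hypothetical name.

```perl
use strict;
use warnings;
use JSON::PP;                  # assumption: core stand-in for JSON::XS
use Digest::MD5 qw(md5_hex);   # assumption: the digest is not named in the post

# Build the canonical (key-sorting) encoder once ...
# utf8 makes it return bytes, which the digest function expects.
my $encoder = JSON::PP->new->canonical->utf8;

# ... and reuse it for every checksum.
sub hash_checksum {
    my ($hash) = @_;
    return md5_hex($encoder->encode($hash));
}

my %config = (host => 'localhost', ports => [80, 443], tls => { enabled => 1 });
print hash_checksum(\%config), "\n";
```

Swapping JSON::PP for JSON::XS in the use line and the constructor should be enough to get the faster XS encoder where it is installed.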

It's so easy to run benchmarks with Perl, no matter whether you prefer a one-liner (like I did above) or put your source into a small Perl script file.

 

7 comments

  1. ilmari

    Have you tried Data::Pond? It's written in XS for speed and uses a subset of Perl syntax to represent the data (similarly to what JSON is to Javascript).

  2. Max

    And there is also Data::MessagePack, which is pretty fast AND gives small results:
    my $mp = Data::MessagePack->new->canonical;
    then

    MessagePack => sub { $mp->pack(\%INC); },

    gives 53537.36/s while JSONXSpre gives 63099.54/s, on my host.

  3. Joshua Keroes

    It would be worthwhile to add all of the YAML implementations. YAML, if YAML::XS isn't installed, won't be competitive.


    perl -MBenchmark -MYAML -MYAML::XS -MYAML::Tiny -MYAML::Syck -e 'timethese(0,{"YAML::XS" => sub { YAML::XS::Dump(\%INC) }, "YAML" => sub { YAML::Dump(\%INC) }, "YAML::Tiny" => sub { YAML::Tiny::Dump(\%INC) }, "YAML::Syck" => sub { YAML::Syck::Dump(\%INC) } } )'
    Benchmark: running YAML, YAML::Syck, YAML::Tiny, YAML::XS for at least 3 CPU seconds...
    YAML: 3 wallclock secs ( 3.16 usr + 0.00 sys = 3.16 CPU) @ 468.04/s (n=1479)
    YAML::Syck: 3 wallclock secs ( 3.16 usr + 0.00 sys = 3.16 CPU) @ 20095.57/s (n=63502)
    YAML::Tiny: 3 wallclock secs ( 3.16 usr + 0.00 sys = 3.16 CPU) @ 7174.05/s (n=22670)
    YAML::XS: 4 wallclock secs ( 3.21 usr + 0.01 sys = 3.22 CPU) @ 6829.81/s (n=21992)

  4. Reini Urban

    For small hashes Data::MessagePack is usually faster than JSON::XS,
    for bigger hashes JSON::XS is fastest.


    serialize:
                 Rate storable json   mp
    storable  91022/s       -- -33% -51%
    json     136437/s      50%   -- -26%
    mp       185579/s     104%  14%   --

  5. demerphq

    Just wanted to add that this is a pretty crap benchmark.


    Unless you really are serializing a small simple hash of string values I wouldn't put any faith in the numbers posted here.

  6. simple sort, join('', sort keys %hash) ???

  7. Sebastian

    Simple, but doesn't work with multi-level hash trees.
