This post was imported from my old WordPress installation. If you notice display problems, broken links or missing images, please just leave a comment here. Thanks.
How do you convert a hash to a string? Perl is TIMTOWTDI (there is more than one way to do it), but which way is the fastest? I need a checksum (hash, digest) of the hash, so the string must be identical for the same hash contents every time. Hash keys are not returned in sorted order, so even a simple join('', keys(%hash)) is not guaranteed to produce the same string every time (as soon as the hash has at least two keys).
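To make the requirement concrete, here is a minimal sketch of the "same contents, same string, same checksum" idea. It assumes the core modules JSON::PP and Digest::MD5, which are my choice for illustration and not part of the benchmark; hash_checksum is a hypothetical helper name.

```perl
use strict;
use warnings;
use JSON::PP ();
use Digest::MD5 qw(md5_hex);

# One reusable encoder. canonical() emits hash keys in sorted order,
# utf8() makes encode() return bytes, which is what md5_hex expects.
my $encoder = JSON::PP->new->utf8->canonical;

# Hypothetical helper (not from the original post):
# serialize the hash deterministically, then digest the string.
sub hash_checksum {
    my ($hashref) = @_;
    return md5_hex( $encoder->encode($hashref) );
}

# Same contents, written in a different order.
my %first  = ( apple  => 1, banana => 2, cherry => { x => 1, y => 2 } );
my %second = ( cherry => { y => 2, x => 1 }, banana => 2, apple => 1 );

print hash_checksum( \%first ) eq hash_checksum( \%second )
    ? "same checksum\n"
    : "different checksum\n";
```

Any serializer that sorts keys at every level could take the place of JSON::PP here; the digest function is equally interchangeable.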
I started a little race, collecting candidates and benchmarking them. The result was:
perl -MStorable=freeze -MBenchmark -MYAML -MJSON::XS -MJSON=to_json -MData::Dumper=Dumper -le '$Storable::canonical = 1;my $jxss = JSON::XS->new->canonical;timethese(0,{ Dumper => sub { Dumper(\%INC); }, self => sub { join("\x00",map { $_."\x01".$INC{$_} } sort keys %INC); }, JSON => sub { to_json(\%INC); }, JSONsort => sub { JSON->new->canonical->encode(\%INC) }, JSONXS => sub { JSON::XS::encode_json(\%INC); }, JSONXSsort => sub { JSON::XS->new->canonical->encode(\%INC) }, JSONXSpre => sub { $jxss->encode(\%INC); }, YAML => sub { YAML::Dump(\%INC); }, Storable => sub { freeze(\%INC); }, } );'
Benchmark: running Dumper, JSON, JSONXS, JSONXSsort, JSONsort, Storable, YAML, self for at least 3 CPU seconds...
Dumper: 3 wallclock secs ( 3.17 usr + 0.00 sys = 3.17 CPU) @ 7704.10/s (n=24422)
JSON: 4 wallclock secs ( 3.08 usr + 0.03 sys = 3.11 CPU) @ 62825.72/s (n=195388)
JSONsort: 4 wallclock secs ( 3.11 usr + 0.00 sys = 3.11 CPU) @ 42986.17/s (n=133687)
JSONXS: 3 wallclock secs ( 3.12 usr + 0.00 sys = 3.12 CPU) @ 88461.54/s (n=276000)
JSONXSpre: 4 wallclock secs ( 3.16 usr + 0.00 sys = 3.16 CPU) @ 48809.49/s (n=154238)
JSONXSsort: 3 wallclock secs ( 3.19 usr + 0.00 sys = 3.19 CPU) @ 44238.24/s (n=141120)
Storable: 4 wallclock secs ( 3.13 usr + 0.00 sys = 3.13 CPU) @ 19036.74/s (n=59585)
YAML: 4 wallclock secs ( 3.14 usr + 0.00 sys = 3.14 CPU) @ 175.48/s (n=551)
self: 3 wallclock secs ( 3.11 usr + 0.00 sys = 3.11 CPU) @ 20418.65/s (n=63502)
The Benchmark module shows the elapsed time for each candidate (wallclock seconds), the CPU time used and the number of iterations each candidate completed (n=XXX at the end). The most important part is the rate, the number of runs per second each candidate achieved: this is the best (only?) value for comparing the results.
Dumper, JSON, JSONXS and YAML don't work because, as used here, they don't sort the hash keys. self works only sometimes because it doesn't support hash trees (references as the value of a hash item).
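The limitation of self can be lifted by recursing into references. The following is only a sketch: canonical_string is a hypothetical helper, not one of the benchmarked candidates, and it assumes that keys and values never contain the separator bytes \x00, \x01 and \x02.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original benchmark): like "self",
# but it descends into nested hashes and arrays and always visits
# hash keys in sorted order.
sub canonical_string {
    my ($data) = @_;
    my $type = ref $data;

    if ( $type eq 'HASH' ) {
        return '{'
            . join( "\x00",
                map { $_ . "\x01" . canonical_string( $data->{$_} ) }
                sort keys %$data )
            . '}';
    }
    if ( $type eq 'ARRAY' ) {
        # Arrays keep their own order, only hash keys get sorted.
        return '[' . join( "\x00", map { canonical_string($_) } @$data ) . ']';
    }
    die "unsupported reference type: $type\n" if $type;

    # Plain scalar; \x02 stands in for undef.
    return defined $data ? $data : "\x02";
}

my %tree_a = ( b => { d => 1, c => 2 }, a => [ 1, 2 ] );
my %tree_b = ( a => [ 1, 2 ], b => { c => 2, d => 1 } );

print canonical_string( \%tree_a ) eq canonical_string( \%tree_b )
    ? "identical strings\n"
    : "strings differ\n";
```

The recursion costs extra function calls per level, so this variant will be slower than the flat self candidate measured above.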
JSONXSpre is the winner: creating a sorting JSON::XS object once and reusing it every time is slightly faster than creating the object on each run. I didn't expect Storable, Dumper and finally YAML to be that bad, though.
It's so easy to run benchmarks with Perl, no matter whether you prefer a one-liner (like I did above) or put your source into a small Perl script file.
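As a script file, such a race might look like the sketch below. It deliberately sticks to core modules, so JSON::PP stands in for JSON::XS and the rates will not match the numbers above. The test data and candidate names are made up for illustration, and $Data::Dumper::Sortkeys is switched on so that every candidate produces a stable string.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use Storable qw(freeze);
use Data::Dumper ();
use JSON::PP ();

# Make the serializers emit hash keys in sorted order.
$Storable::canonical    = 1;
$Data::Dumper::Sortkeys = 1;

# Illustrative test data: a flat hash of 50 string values.
my %data = map { ( "key$_" => "value$_" ) } 1 .. 50;

# Create the encoder once and reuse it, like JSONXSpre above.
my $json = JSON::PP->new->canonical;

my %candidates = (
    Dumper   => sub { Data::Dumper::Dumper( \%data ) },
    Storable => sub { freeze( \%data ) },
    JSONPP   => sub { $json->encode( \%data ) },
    self     => sub { join "\x00", map { "$_\x01$data{$_}" } sort keys %data },
);

# A negative count means "run each candidate for at least
# that many CPU seconds"; cmpthese prints a comparison table.
cmpthese( -1, \%candidates );
```

cmpthese is used instead of timethese because its table shows the relative differences directly; timethese( -1, \%candidates ) would print the per-candidate lines shown earlier.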



7 comments
ilmari, 1.08.2012 3:54
Max, 4.08.2012 8:58
Joshua Keroes, 6.08.2012 18:29
Reini Urban, 6.08.2012 19:14
demerphq, 8.08.2012 10:31
beefreak@freenet.de, 23.08.2012 15:37
Sebastian, 23.08.2012 21:33
Have you tried Data::Pond? It's written in XS for speed and uses a subset of Perl syntax to represent the data (similar to what JSON is to JavaScript).
There is also Data::MessagePack, which is pretty fast AND gives small results:
my $mp = Data::MessagePack->new->canonical;
then
MessagePack => sub { $mp->pack(\%INC); },
gives 53537.36/s while JSONXSpre gives 63099.54/s, on my host.
It would be worthwhile to add all of the YAML implementations. YAML, if YAML::XS isn't installed, won't be competitive.
perl -MBenchmark -MYAML -MYAML::XS -MYAML::Tiny -MYAML::Syck -e 'timethese(0,{"YAML::XS" => sub { YAML::XS::Dump(\%INC) }, "YAML" => sub { YAML::Dump(\%INC) }, "YAML::Tiny" => sub { YAML::Tiny::Dump(\%INC) }, "YAML::Syck" => sub { YAML::Syck::Dump(\%INC) } } )'
Benchmark: running YAML, YAML::Syck, YAML::Tiny, YAML::XS for at least 3 CPU seconds...
YAML: 3 wallclock secs ( 3.16 usr + 0.00 sys = 3.16 CPU) @ 468.04/s (n=1479)
YAML::Syck: 3 wallclock secs ( 3.16 usr + 0.00 sys = 3.16 CPU) @ 20095.57/s (n=63502)
YAML::Tiny: 3 wallclock secs ( 3.16 usr + 0.00 sys = 3.16 CPU) @ 7174.05/s (n=22670)
YAML::XS: 4 wallclock secs ( 3.21 usr + 0.01 sys = 3.22 CPU) @ 6829.81/s (n=21992)
For small hashes Data::MessagePack is usually faster than JSON::XS; for bigger hashes JSON::XS is fastest.
serialize:
             Rate storable     json       mp
storable  91022/s       --     -33%     -51%
json     136437/s      50%       --     -26%
mp       185579/s     104%      14%       --
Just wanted to add that this is a pretty crap benchmark.
Unless you really are serializing a small simple hash of string values I wouldn't put any faith in the numbers posted here.
simple sort, join('', sort keys %hash) ???
Simple, but doesn't work with multi-level hash trees.