The five faces of a Perl hash item

PHP calls it an associative array, JavaScript calls it an object and, in the eyes of other (older) languages like C, BASIC, Pascal or Perl, all of them are wrong. An array has items that are addressed by their position in the list; only a Perl hash has named keys. A hash is basically an (unordered) list of items where each item has a key and a value, but that value may be in one of several different states.

I love to think of Perl as a purely logical language with very few exceptions, but here I'll switch to a developer's point of view and include some "states" which aren't strictly hash-related but very common.

Hash basics

A hash is basically a list of items:

Key   Value
1     One
foo   bar
baz   bar

Each unique key has exactly one value, but values don't need to be unique: several keys may share the same value. The size of a hash is limited only by the amount of memory available to the Perl process, and the same goes for each key and value. One way of defining a hash with values is:

my %sample = (
    1   => 'One',
    foo => 'bar',
    baz => 'bar',
);

The key doesn't need quotes as long as it matches \w+ (A to Z, a to z, 0 to 9 and the underscore _) and is not a reserved Perl keyword. That's not the official rule, just my personal one. Anything else goes between quotes and, when in doubt, use quotes. The same applies to key names between { and }.

print $sample{1};
print $sample{foo};
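
The quoting rule above can be tried out with a small sketch. The hash %quoting and its keys are made up for this illustration and are not part of the sample hash:

```perl
use strict;
use warnings;

# %quoting is a hypothetical hash used only for this illustration.
my %quoting = (
    plain_key   => 'no quotes needed',   # only \w characters
    'two words' => 'quotes needed',      # contains a space
    'dash-ed'   => 'quotes needed',      # contains a dash
);

print $quoting{plain_key},   "\n";   # a bareword is fine between { and }
print $quoting{'two words'}, "\n";   # quotes are required here
print $quoting{'dash-ed'},   "\n";   # unquoted, dash-ed would be read as an expression
```

Quoting every key would work just as well; leaving the quotes off is only a convenience for simple names.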

1. Exists

From a pure Perl core point of view, exists is the only real state of a hash item. Using the sample hash (table) above, the keys 1, foo and baz exist, but keys like bar or The flying spaghetti monster don't. The exists function is the only way to check this:

exists($sample{foo});                            # returns 1 (=true)
exists($sample{'The flying spaghetti monster'}); # returns an empty string (=false), not undef
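
A minimal runnable sketch of the check, reusing the sample hash from above. The variable names $has_foo and $has_monster are only there for the illustration. It also shows that asking with exists does not create the key:

```perl
use strict;
use warnings;

my %sample = (
    1   => 'One',
    foo => 'bar',
    baz => 'bar',
);

my $has_foo     = exists $sample{foo};
my $has_monster = exists $sample{'The flying spaghetti monster'};

print "foo exists\n"             if $has_foo;
print "monster does not exist\n" unless $has_monster;

# Checking with exists does not add the key to the hash.
my $count = keys %sample;
print "$count\n";   # 3
```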

2. Defined

One may think of the exists function as a check whether one exact key has been created within the hash. The defined function is often used as a replacement for exists, but that is incorrect usage: any existing key may have a defined or an undefined value.

$sample{exists_but_not_defined} = undef;
exists($sample{exists_but_not_defined}); # returns true
defined($sample{exists_but_not_defined}); # returns false

It's perfectly ok to use an undef value within a hash. Hashes are often used to collect a list of unique values and I like to use 1 as a default value, but there is no need to assign any value. The keys function will report all keys, even if (some of) their values are undef.
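
As a sketch of the "collect unique values" use case, with a made-up word list: the values are left undef on purpose, and keys still reports every key.

```perl
use strict;
use warnings;

my @words = qw(apple pear apple plum pear);   # hypothetical input

my %seen;
$seen{$_} = undef for @words;   # only the key matters, the value stays undef

my @unique = sort keys %seen;
print "@unique\n";              # apple pear plum

# exists finds every key; defined finds none, because all values are undef.
my $exists_count  = grep { exists $seen{$_} }  @unique;
my $defined_count = grep { defined $seen{$_} } @unique;
print "$exists_count $defined_count\n";   # 3 0
```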

3. False

The next common mistake: a hash item's value may be defined but still false (like the contents of any scalar variable):

$sample{false_value} = 0;
exists($sample{false_value}); # returns true
defined($sample{false_value}); # returns true
$sample{false_value}; # returns false

The hash key false_value exists and has a defined value, but the value is still false. Using the last check to test for the existence of an item will fail.
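
To tell the states apart, the three checks have to be applied in order. The helper describe and the hash %falsy below are hypothetical names for this sketch, not something from Perl itself:

```perl
use strict;
use warnings;

# Hypothetical helper: reports which state a hash item is in.
sub describe {
    my ($hash, $key) = @_;
    return 'missing'           unless exists $hash->{$key};
    return 'undefined'         unless defined $hash->{$key};
    return 'defined but false' unless $hash->{$key};
    return 'true';
}

# One key per value that is defined but false, plus one undef and one true value.
my %falsy = (
    zero         => 0,
    zero_string  => '0',
    empty_string => '',
    nothing      => undef,
    one          => 1,
);

for my $key (sort(keys %falsy), 'not_there') {
    print "$key: ", describe(\%falsy, $key), "\n";
}
```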

4. True

Testing for a true value is nearly the same as testing for a false value:

$sample{true_value} = 1;
exists($sample{true_value}); # returns true
defined($sample{true_value}); # returns true
$sample{true_value}; # returns true

All tests return true now. You shouldn't mix them up, but it's ok to simply test for a true value if you need only hash elements with true values:

my $sum = 0;
for my $key (@expected_keys) {
    next unless $sample{$key};   # skip missing, undefined and false items
    $sum += $sample{$key};       # Or do anything really useful here
}

The first line within the loop skips all items from the list of @expected_keys which match one of the following conditions:

  • The expected key doesn't exist within the hash %sample.
  • The key exists, but has an undefined value.
  • The value is defined, but empty or zero.

Any positive or negative number and any non-empty text value other than "0" passes the test and is added (adding non-numeric text triggers its own warning under use warnings;). All other items are skipped without a warning, even if use warnings; is in effect.
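
Here is a self-contained version of the loop with made-up data, so the three skip conditions can be watched in action. The keys and the @skipped list are assumptions added for the illustration:

```perl
use strict;
use warnings;

my %sample = (
    ten       => 10,
    minus_two => -2,
    zero      => 0,        # defined, but false
    nothing   => undef,    # exists, but undefined
);
my @expected_keys = qw(ten minus_two zero nothing missing);

my $sum = 0;
my @skipped;
for my $key (@expected_keys) {
    unless ($sample{$key}) {
        push @skipped, $key;   # remember what was skipped, for the demo only
        next;
    }
    $sum += $sample{$key};
}

print "sum: $sum\n";           # sum: 8
print "skipped: @skipped\n";   # skipped: zero nothing missing

# The plain truth test did not create the missing key.
my $created = exists $sample{missing} ? 'yes' : 'no';
print "missing key created: $created\n";   # missing key created: no
```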

5. Hashref

Pretty much the same as true above, but a special case from a developer's point of view: a hash item's value may hold any reference. A reference to a hash is very common, but array references, objects and any other type of reference are allowed as a hash value (but not as a key: a reference used as a key is turned into a plain string and can no longer be used as a reference).
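
A short sketch of references as hash values. The hash %config and its contents are invented for this example. The last lines show what happens to a reference that is used as a key:

```perl
use strict;
use warnings;

my %config = (
    server => { host => 'localhost', port => 8080 },   # hash reference
    paths  => [ '/tmp', '/var/tmp' ],                  # array reference
    greet  => sub { return "hello $_[0]" },            # code reference
);

print $config{server}{port}, "\n";       # 8080
print $config{paths}[1], "\n";           # /var/tmp
print $config{greet}->('world'), "\n";   # hello world
print ref($config{server}), "\n";        # HASH

# Used as a key, the reference is stored as a plain string.
my %by_ref = ( $config{server} => 'value' );
my ($key) = keys %by_ref;
my $key_type = ref($key) eq '' ? 'plain string' : 'reference';
print "key is a $key_type\n";            # key is a plain string
```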

 

3 comments

  1. ...

    s/but that's not true/but that's an incorrect usage/;
    s/even if \(some of\) their values are undef/including an undef key if it exists/;

  2. Andreas

    well, some points are not entirely correct.

    let's start with the hash (or associative array).

    a hash usually consists of 2 "parts": a mathematical function, which calculates a (numeric) position from an arbitrary key, and a usual "array" (a single list, which contains (pointers to) additional lists).

    let's assume the mathematical function just sums up the ASCII values of each character in the key:
    the key "A" would be position 0x41, "B" 0x42, and so on. "AB" would be 0x83, but wait, "BA" would also be 0x83. so we have a "hash collision" (ever heard of that?). so we need a 2nd list here (another hash, or just an array). and of course, each entry needs to store the original key and the original value.

    to dump a hash, you just iterate over the arrays. that's why the list is unordered, and why, even if the disorder seems static, you cannot rely on it: the math behind it might change depending on the data.
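
The toy model from this comment can be sketched in a few lines. This is only an illustration of the "sum of character codes" idea and not how Perl's real hash implementation works. The names toy_hash, lookup and @buckets are made up:

```perl
use strict;
use warnings;

# Toy hash function from the description: sum of the character codes.
sub toy_hash {
    my ($key) = @_;
    my $sum = 0;
    $sum += ord($_) for split //, $key;
    return $sum;
}

# One list per position; each entry stores the original key and value.
my @buckets;
for my $pair ( [ AB => 'first' ], [ BA => 'second' ], [ A => 'third' ] ) {
    my ($key, $value) = @$pair;
    push @{ $buckets[ toy_hash($key) ] }, [ $key, $value ];
}

# Walk the list at the computed position and compare the stored keys.
sub lookup {
    my ($key) = @_;
    for my $entry ( @{ $buckets[ toy_hash($key) ] || [] } ) {
        return $entry->[1] if $entry->[0] eq $key;
    }
    return;
}

printf "%x %x\n", toy_hash('AB'), toy_hash('BA');   # 83 83
print lookup('AB'), "\n";                           # first
print lookup('BA'), "\n";                           # second
```

Both keys land at the same position, so only the stored key tells the two entries apart.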


    Furthermore, in the exists/defined/true/false world, there used to be a problem lurking when testing nested hash structures, which forced programmers to write constructs like:
    if( defined $hash && exists $hash->{$ref} && exists $hash->{$ref}->{$another} && exists $hash->{$ref}->{$another}->{$ref} && defined $hash->{$ref}->{$another}->{$ref} ).
    Not doing that crashed the program with a panic.
    That's why the intermediate "elements" are nowadays "auto-created" (autovivification) as empty hash references (so they do exist, and will be returned by the "keys" command).
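
The auto-creation mentioned here (usually called autovivification) can be observed with a small sketch. The variables $hash and $other and the keys a, b, c are hypothetical. In current Perl the intermediate levels come into existence as empty hash references:

```perl
use strict;
use warnings;

my $hash = {};

# Testing a deep key creates the intermediate levels on the way down.
my $found = exists $hash->{a}{b}{c};

my $a_exists = exists $hash->{a};
my $b_exists = exists $hash->{a}{b};

print $found    ? "c found\n"  : "c not found\n";   # c not found
print $a_exists ? "a exists\n" : "no a\n";          # a exists
print $b_exists ? "b exists\n" : "no b\n";          # b exists
print ref($hash->{a}), "\n";                        # HASH

# Checking level by level stops at the first missing key and creates nothing.
my $other = {};
my $safe  = exists $other->{a} && exists $other->{a}{b} && exists $other->{a}{b}{c};
my $other_keys = keys %$other;
print "$other_keys\n";                              # 0
```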

    "refs are not allowed as keys":
    oh, sure they are! you use the memory address of the data as a key then.
    mostly, this is a mistake.. but not in every case.
    for example, if you have multiple objects and want to store information for each object, you can easily do:
    my %callback = ();
    my $o = My::Object->new( host => 'gooogle.com', port => 80 );
    $callback{$o} = sub { reap_google_host_object() };
    $o->run_asnc();
    [...]
    foreach my $o ( keys %callback ) {
        if( $o->is_finished ) {
            &{ $callback{$o} }; # i know it's discouraged, could also be written $callback{$o}->();
            delete $callback{$o};
        }
    }

    and that's perfectly valid :)

    • Sebastian

      Thank you for the explanation of why the hash is unordered. I think the most important part is to know that it's unordered and that even a static-looking order isn't static.

      I don't agree with your last point. References may be used as keys, this is correct, but they are not usable as references any longer, just plain text. I'm pretty sure that your sample will break in the last foreach loop with "Can't call method "is_finished" without a package or object reference".

      #!/usr/bin/perl -l
      use strict; use warnings;
      my %x = ({foo => "bar"} => 2);
      my $key = (keys %x)[0];
      print $x{$key}; print $key; print ref($key); print $key->{foo};
      2
      HASH(0x23b2998)

      Can't use string ("HASH(0x23b2998)") as a HASH ref while "strict refs" in use at -e line 5.

      The script won't show the error without "use strict", but still won't be able to use the ref as ref.
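
One possible way around this, sketched with made-up names: keep the stringified reference as the key, but store the real reference in the value next to the callback. A plain hash reference stands in for the object here:

```perl
use strict;
use warnings;

my $object = { name => 'job 1' };   # hypothetical stand-in for a real object

my %callback;
$callback{$object} = {
    object => $object,                              # the usable reference lives in the value
    code   => sub { return "done: $_[0]{name}" },   # callback gets the object as its argument
};

my @results;
for my $key (keys %callback) {
    my $entry = $callback{$key};
    push @results, $entry->{code}->( $entry->{object} );
    delete $callback{$key};
}

print "$_\n" for @results;   # done: job 1
```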
