Parallel DNS lookups using AnyEvent

Perl has a great asynchronous library: AnyEvent. (There may be other great asynchronous libraries, but I decided to use AnyEvent.) I recently had to look up a lot of different hostnames and didn't want to do it sequentially, because any single DNS server might be down or slow, and every other lookup would have to wait until its reply arrived or timed out.

I started with a list of hostnames in @hosts:

my @hosts = (qw(www.google.com www.facebook.com www.iana.org));

This is only an example list. My script fetches its hosts from a database, but a fixed list is fine for demonstration.

The script should look up all hostnames and continue once this is done. Let's start by preparing some variables and looping through the list:

my %result;
for my $host (@hosts) {
    AnyEvent::DNS::resolver->resolve($host, "a", sub { ... });
}

This loop walks through the whole list and calls the AnyEvent DNS resolver to fetch the A records for each host. The resolver calls the specified callback sub once it's done, whether the lookup succeeded or failed. The callback receives a list of DNS records, each one an array reference, and should store them in the result hash:

sub {
    for my $record (@_) {
        # Sample record:
        # 'www.google.com', 'a', 'in', 3600, '127.0.0.1'
        # The third field ('in') is the DNS class.
        my ($record_host, $type, $class, $ttl, $ip) = @$record;
        push @{$result{$host}}, $ip;
    }
}

Notice that nothing else is passed to the sub. The result hash %result is declared at file scope and is thus visible to every sub defined after it. The $host variable is also visible to the sub, because the sub is a closure defined inside the loop that declares $host. It would be better to use $record_host, because the nameserver might have answered with a name other than the $host of the original query, but I'm using $host to demonstrate the visibility of variables.
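The visibility rules can be tried without any network access. The following is a minimal sketch in plain Perl: the @callbacks array stands in for the resolver's pending requests, and the hostnames and addresses are made up for illustration.

```perl
use strict;
use warnings;

my %result;      # shared by all callbacks, as in the article
my @callbacks;   # stand-in for the resolver's list of pending requests

for my $host (qw(www.example.com www.example.org)) {
    # Every iteration gets its own $host, and the closure keeps it alive
    # after the loop has finished.
    push @callbacks, sub {
        for my $record (@_) {
            push @{ $result{$host} }, $record->[4];
        }
    };
}

# Call the subs later, outside the loop, with made-up records.
$callbacks[0]->([ 'www.example.com', 'a', 'in', 3600, '192.0.2.1' ]);
$callbacks[1]->(
    [ 'www.example.org', 'a', 'in', 3600, '192.0.2.2' ],
    [ 'www.example.org', 'a', 'in', 3600, '192.0.2.3' ],
);

print "$_: @{ $result{$_} }\n" for sort keys %result;
```

Each sub still stores its addresses under the $host of the iteration that created it, even though the loop is long finished by the time the sub runs.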

It looks like we're done, but only at first glance. The script queues a lot of work for AnyEvent but never starts it. In a standalone script like this one, AnyEvent needs a condition variable to wait on before it actually does the real work.

my $cv = AnyEvent->condvar; # should be defined before the loop

AnyEvent will continue processing events until $cv->send is called, but where to call it? Calling it right after the for loop would mark the condition variable as finished before any event has been processed, because the loop only schedules the events. I tried a few things until I discovered the $cv->begin method. It has to be called once for every item of @hosts, and the callback has to call $cv->end to mark that host as done.

The $cv->begin call also accepts an optional anonymous sub reference, which fires as soon as the last host has made its $cv->end call. Without an explicit sub, the last $cv->end calls $cv->send.
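The counting can be sketched without DNS by letting short timers stand in for the lookups. The job names and the %timer hash are assumptions made for this illustration. The extra outer begin/end pair follows the pattern shown in the AnyEvent documentation and makes sure the group callback runs even if the list is empty.

```perl
use strict;
use warnings;
use AnyEvent;

my $cv = AnyEvent->condvar;
my @done;    # jobs in the order they finished
my %timer;   # keeps the timer watchers alive until they fire

# Group callback: runs when the counter drops back to zero.
$cv->begin(sub { shift->send(scalar @done) });

for my $job (qw(a b c)) {
    $cv->begin;                      # one more job pending
    $timer{$job} = AnyEvent->timer(
        after => 0.01,
        cb    => sub {
            push @done, $job;
            delete $timer{$job};     # the watcher is no longer needed
            $cv->end;                # this job is finished
        },
    );
}

$cv->end;                 # matches the outer begin

my $count = $cv->recv;    # returns what the group callback passed to send
print "finished $count jobs\n";
```

Passing a value to send is a convenient way to hand the final result to whoever called recv, instead of reading a shared variable afterwards.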

After all events have been queued, $cv->recv must be called to actually start the work. It returns after all DNS lookups have finished, and that is the point to do something with %result.
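That callbacks only run once the script waits in $cv->recv can be checked with a single timer. This is a small sketch with a zero-delay timer in place of a DNS request. The variable names are made up for the illustration.

```perl
use strict;
use warnings;
use AnyEvent;

my $cv    = AnyEvent->condvar;
my $fired = 0;

# A timer that is due immediately; its callback still needs the event loop.
my $timer = AnyEvent->timer(
    after => 0,
    cb    => sub { $fired = 1; $cv->send },
);

my $fired_before_recv = $fired;    # the event loop has not run yet

$cv->recv;                         # runs the event loop until send is called

print "before recv: $fired_before_recv, after recv: $fired\n";
```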

Some small final changes: the script needs to load AnyEvent and the other modules. The strict and warnings pragmas should be added, and there is no need to copy @$record into separate variables. It's slow, wastes memory, and I think it's bad style. It's better to use $record->[4], especially in a short script like this one. Here is the final script:

#!/usr/bin/perl

use strict;
use warnings;

use AnyEvent;
use AnyEvent::DNS;
use Data::Dumper;

my @hosts = (qw(www.google.com www.facebook.com www.iana.org));

my $cv = AnyEvent->condvar;
my %result;
for my $host (@hosts) {
    $cv->begin; # Mark host as started

    AnyEvent::DNS::resolver->resolve($host, "a", sub {
        for my $record (@_) {
            # Sample:
            # 'www.google.com', 'a', 'in', 3600, '127.0.0.1'
            push @{$result{$host}}, $record->[4];
        }

        $cv->end; # Mark host as finished
    });
}

$cv->recv;

print Dumper(\%result)."\n";

Each list item is counted in the condition variable $cv and schedules a DNS lookup request, passing an anonymous callback subroutine that is executed once the request is done. I did some further tests and confirmed that nothing is processed until $cv->recv is called. The callback sub collects all results in the %result hash and marks the host as finished.
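One detail worth keeping in mind: when a lookup fails, the callback is still called but receives no records, so the loop body never runs and the host never appears in %result. A possible way around this, sketched in plain Perl with made-up data, is to create the entry before pushing. The store_records helper is hypothetical and not part of AnyEvent.

```perl
use strict;
use warnings;

my %result;

# Hypothetical helper, not part of AnyEvent: stores the addresses from
# one callback invocation.
sub store_records {
    my ($host, @records) = @_;
    $result{$host} ||= [];    # the entry exists even if @records is empty
    push @{ $result{$host} }, map { $_->[4] } @records;
    return;
}

# Made-up data: one lookup that failed, one that returned two A records.
store_records('down.example.com');
store_records('www.example.com',
    [ 'www.example.com', 'a', 'in', 3600, '192.0.2.10' ],
    [ 'www.example.com', 'a', 'in', 3600, '192.0.2.11' ],
);

for my $host (sort keys %result) {
    my @ips = @{ $result{$host} };
    print "$host: ", (@ips ? join(', ', @ips) : 'no A records'), "\n";
}
```

In the script above, the body of the callback would then shrink to store_records($host, @_) followed by $cv->end.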

Here is the output of this script:

$VAR1 = {
    'www.google.com' => [
        '173.194.35.144',
        '173.194.35.148',
        '173.194.35.147',
        '173.194.35.146',
        '173.194.35.145'
    ],
    'www.facebook.com' => [
        '69.171.224.42'
    ],
    'www.iana.org' => [
        '192.0.32.8'
    ]
};

Using AnyEvent is simple and mostly straightforward, but I still recommend that you read the AnyEvent documentation before starting.

2 comments

  1. Paul "LeoNerd" Evans

    And if it should interest anyone, I have just written a very similar article from the perspective of IO::Async instead of AnyEvent.

    http://leonerds-code.blogspot.co.uk/2013/10/parallel-name-resolving-using-ioasync.html

  2. Anonymous

    Except that the IO::Async version doesn't run in parallel (it's merely asynchronous according to its documentation - a slow request holds up the following requests) and invokes undefined behaviour when the program also uses pthreads, silently corrupting data on many operating systems such as FreeBSD.

    However, the AnyEvent solution here doesn't handle IPv6 and other niceties - the equivalent to getaddrinfo would be AnyEvent::Socket::resolve_sockaddr.

    For DNS resolving, of course, using AnyEvent::DNS directly is fine!
