Doing Many Things, Like Pings
Randal L. Schwartz
As a UNIX systems administrator, I'm often faced with those
little mundane tasks that seem so trivial to me but so important
to the community I'm supporting. Little things like "hey,
is that host up and responding to pings?". Such tasks generally
have a very repetitive nature to them, and scripting them seems
to be the only way to have time to concentrate on the tasks that
really need my attention.
Let's look at the specific task of pinging a number of hosts
on a subnet. Now, there are tools to do this quickly (like nmap),
and there are even Perl modules to perform the ping (as in
Net::Ping), but I wanted to focus on something familiar that
can be launched from Perl as an external process, and the system
ping command seems mighty appropriate for that.
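(For the curious, a minimal Net::Ping sketch might look like the
following; note that the icmp protocol requires root privileges, and
the default tcp protocol only probes the echo port, so its answers
won't always agree with the system ping command.)
use Net::Ping;

# A minimal Net::Ping sketch, not the approach used below:
# "icmp" needs root, and results can differ from the system ping.
my $p = Net::Ping->new("icmp", 1);    # 1-second timeout
for my $host (map "10.0.1.$_", 1 .. 10) {
  print "$host is ", ($p->ping($host) ? "alive" : "not answering"), "\n";
}
$p->close;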
First, let's look at how to ping one host, on my BSD-ish
system:
sub ping_a_host {
  my $host = shift;
  `ping -i 1 -c 1 $host 2>/dev/null` =~ /0 packets rec/ ? 0 : 1;
}
Here, I'm firing up a subshell to execute the ping -i 1 -c
1 command, which on my system requests that ping have a
1-second timeout, and selects (as Sean Connery's character so
eloquently said in The Hunt for Red October) "one ping
only". Your ping parameters may vary: check your manpage.
The output is scanned for the string 0 packets rec, which
if absent means we got a good ping. So if the match is found, we
return 0 (the ping was bad); otherwise, we'll return 1. The
ping command spits out some diagnostics on standard error,
which we'll toss using Bourne-shell syntax.
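Just to make the matching concrete, here's the pattern run against a
couple of summary lines of the sort my ping prints (your ping's wording
may differ; Linux's ping, for example, says "0 received" without the
word "packets", so adjust the pattern to taste):
# Sample summary lines, typical of a BSD-ish ping:
my $down = "1 packets transmitted, 0 packets received, 100.0% packet loss";
my $up   = "1 packets transmitted, 1 packets received, 0.0% packet loss";
for my $line ($down, $up) {
  print "score: ", ($line =~ /0 packets rec/ ? 0 : 1), "\n";   # 0, then 1
}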
Note that the value of $host is not checked here for sanity.
We certainly wouldn't want to accept a random command-line
parameter or (gasp) a Web form value here without some serious validation.
However, as we use this in our program, all of the values will be
internally generated, so we've got some degree of safety.
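If a host value ever did come from the outside world, I'd want at
least a check that it looks like a dotted-quad address before the
shell sees it; a minimal sketch (the helper name here is my own, not
part of the program below) might be:
# Hypothetical helper: accept only dotted-quad addresses before
# interpolating the value into a shell command line.
sub looks_like_ipv4 {
  my $addr = shift;
  my @octets = $addr =~ /\A(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})\z/
    or return 0;
  return 0 if grep { $_ > 255 } @octets;
  return 1;
}

my $candidate = "10.0.1.37";
die "suspicious host value: $candidate\n"
  unless looks_like_ipv4($candidate);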
To scan a particular subnet, looking for hosts that are alive,
we can call that subroutine in a loop, something like:
print "ping $_ is ", ping_a_host($_), "\n"
for map "10.0.1.$_", 1..254;
This loop completes very quickly for hosts that are alive, but
is slow as molasses for hosts that aren't present, because ping
waits out its full timeout before concluding that nobody is going
to answer.
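How slow? A rough timing sketch (my own addition, assuming the
ping_a_host above, the Time::HiRes module, and a range of addresses
that is mostly empty) makes the point:
use Time::HiRes qw(gettimeofday tv_interval);

# Time a small sequential scan of addresses that mostly won't answer;
# nearly all of the elapsed time is spent waiting out ping's timeout.
my $start = [gettimeofday];
ping_a_host($_) for map "10.0.1.$_", 200 .. 220;
printf "scanned 21 addresses in %.1f wall-clock seconds\n",
  tv_interval($start);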
So, how can we speed that up? This is not a CPU-intensive loop:
practically the entire time is spent waiting for some remote host
to respond. We'll leave the ping_a_host subroutine alone,
because that's not where we have a problem -- it's
doing its job as fast as it can. So we need to do more of these
at a time.
One approach is to fork a separate process for each host we want
to ping. We'll then sit back in a wait loop. As each child
process completes, we'll note its exit status, and when there
are no more kids, we'll spit out a report.
First, we'll define the host list for the task:
my @hosts = map "10.0.1.$_", "001".."010";
The numbers here are padded to three digits so that they sort as strings
in a numeric sequence -- a cheap but effective trick. Note also
that I'm only selecting the first 10 hosts this time. I'll
explain that shortly.
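If the string-range trick feels a little too cute, sprintf builds the
same zero-padded list explicitly:
my @hosts = map { sprintf "10.0.1.%03d", $_ } 1 .. 10;   # same list as above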
Next, we'll want a hash to keep track of the kids:
my %pid_to_host;
The keys of this hash will be the child process ID (PID), and the
value will be the corresponding host that the child is processing.
Then, we'll want to loop over the host list, firing up a child
for each:
for (@hosts) {
  if (my $pid = fork) {    # parent does...
    $pid_to_host{$pid} = $_;
    warn "$pid is processing $_\n";
  } else {                 # child does...
    exit !ping_a_host($_);
  }
}
As each host is placed into $_, we'll fork. The result
of fork is a child process running in parallel with the parent process.
These processes are distinguished only by the return value of fork,
which is 0 in the child, but the child's PID in the parent. So,
if we get back a non-zero value, we're the parent, and we'll
store the PID into the hash, along with the host that particular child
is processing. If we're the child, then we'll call the ping_a_host
routine, and arrange for our exit status to be good (0) if that routine
gives a thumbs up.
The warn in the loop is merely for diagnostic purposes
so that you can see what's happening. In a production program,
I'd certainly remove that.
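One thing the loop above glosses over is a failed fork, which returns
undef; since undef is false, the parent would wrongly fall into the
child branch and exit. A slightly more defensive sketch of the same
loop:
for (@hosts) {
  my $pid = fork;
  die "fork failed: $!\n" unless defined $pid;   # out of process slots?
  if ($pid) {                  # parent
    $pid_to_host{$pid} = $_;
    warn "$pid is processing $_\n";
  } else {                     # child
    exit !ping_a_host($_);
  }
}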
At the end of this loop, we'll have a number of processes
-- far too many, in fact. For each host to check, we'll
have two processes running: the shell forked by the backquotes,
and the ping process itself. Perl has to fork a shell because we
needed that child to have its standard error output redirected.
If I could have gotten the redirection out of those backquotes somehow,
we'd have only one child process per host, not two.
Launching 20 processes just to check 10 hosts shows why I didn't
do all 254 hosts at once: that would mean over 500 simultaneous
processes, pushing us up against the typical per-user process limit.
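For what it's worth, one way to get rid of the extra shell would be
to have each forked child redirect its own output streams and then
exec the ping command directly, letting the parent judge success by
ping's exit status rather than by scanning its output (on my ping,
exit status 0 means a reply came back; again, check your manpage).
A sketch of just the fork step, reusing the same %pid_to_host
bookkeeping:
if (my $pid = fork) {
  $pid_to_host{$pid} = $_;              # parent: same as before
} else {
  # child: silence ping, then become ping; its exit status
  # comes back to the parent via wait and $?.
  open STDOUT, ">", "/dev/null" or die "reopen STDOUT: $!";
  open STDERR, ">", "/dev/null" or die "reopen STDERR: $!";
  exec "ping", "-i", "1", "-c", "1", $_;
  exit 2;                               # exec failed
}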
Now it's time to wait for the results. A simple "wait"
loop will reap the children as fast as they complete their task,
and we'll declare a hash to hold the results:
my %host_result;
The key will be the host, and the value will be 1 if the child said
it was pingable, otherwise 0.
while (keys %pid_to_host) {
  my $pid = wait;
  last if $pid < 0;
  my $host = delete $pid_to_host{$pid}
    or warn("Why did I see $pid ($?)\n"), next;
  warn "reaping $pid for $host\n";
  $host_result{$host} = $? ? 0 : 1;
}
As long as we've got kids (indicated by the ever-decreasing size
of the %pid_to_host hash), we'll wait for them. The child
process ID comes back from wait, which we'll stick into
$pid. At this point, the exit status of that particular child
is in $?. If the return value of wait is negative, then
we don't have any more kids. This is an unexpected result, which
we could check later by noticing that %pid_to_host is not yet
empty, or we could have simply died here.
We'll use the %pid_to_host hash to map the PID into
the host for which it was processing. Again, we might have accidentally
reaped a completed child that wasn't one of ours, so defensive
programming requires checking for that. This won't happen unless
other parts of this program are also forking children somehow, but
I'm a cautious programmer most of the time.
Finally, we'll take the exit status in $?, and map
it into the appropriate good/bad value for the result hash.
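As an aside, $? is the raw wait status: the child's exit value lives
in the high byte, and any fatal signal in the low seven bits (see
perlvar), which is why a simple truth test is enough here. If you
wanted the pieces separately:
my $exit_value = $? >> 8;        # what the child passed to exit
my $killed_by  = $? & 127;       # signal number, if it died on one
warn "child exited $exit_value (signal $killed_by)\n";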
When this loop completes, we have no more kids performing tasks,
and it's time to show the result:
for (sort keys %host_result) {
  print "$_ is ", ($host_result{$_} ? "good" : "bad"), "\n";
}
For each key of the result table, we'll say whether the result
was good or bad.
Putting this all together makes a nice little demo of forking
20 kids to check 10 hosts, but it won't scale to 254 hosts,
because that would require more process slots than we typically
have (or want to use, actually). We need to perform the forking
gradually, so that we never have more than 20 kids at a time. One
naive approach is to chunk the data into bite-size bits:
my @all_hosts = ...;
my %host_result;
while (my @hosts = splice @all_hosts, 0, 10) {
  ... process @hosts, adding into %host_result ...
}
... show results ...
Here, most of the code above gets wrapped into an outer loop that
hands 10 hosts at a time to be processed, using splice
to peel them off the master list. While this strategy certainly
satisfies the "no more than 10 at a time" constraint, each batch
of 10 has to wait for the slowest of its members to complete
before the next batch can begin.
A better way would be to fork until we hit the limit of active
children, then wait for any one child to finish before we need to
fork again. First, we'll need to factor out "waiting for
a kid" into a subroutine so we can call it in two different
places: while forking a new task, and at the end to reap all the
remaining children:
sub wait_for_a_kid {
  my $pid = wait;
  return 0 if $pid < 0;            # no kids left
  my $host = delete $pid_to_host{$pid}
    or warn("Why did I see $pid ($?)\n"), return 1;
  warn "reaping $pid for $host\n";
  $host_result{$host} = $? ? 0 : 1;
  1;
}
Note that we're accessing %pid_to_host and %host_result
directly here, so those variables must be in scope before the subroutine
definition. The subroutine now returns 1 if a kid was reaped, and
0 otherwise. The final reap loop now becomes:
## final reap:
1 while wait_for_a_kid();
At this point, the program functions identically to the prior one,
except that I've re-factored the kid reaping. The magic happens
next. We'll put wait_for_a_kid in the middle of the forking
loop as well, just before we're about to fork, calling it only
when the number of kids is already at the maximum we chose:
for (@hosts) {
  wait_for_a_kid() if keys %pid_to_host >= 10;
  ...
Ahh. That does it. We can now crank @hosts back up to our 254
items. As we fire off the first 10, this new statement has no effect.
But when it comes time for the 11th, we'll wait for at least
one of the other 10 to complete first. So, at no time do we have more
than 10 hosts active (using 20 child processes for reasons explained
earlier). The entire program is given here in case you want to see
it all in context:
sub ping_a_host {
  my $host = shift;
  `ping -i 1 -c 1 $host 2>/dev/null` =~ /0 packets rec/ ? 0 : 1;
}

my %pid_to_host;
my %host_result;

sub wait_for_a_kid {
  my $pid = wait;
  return 0 if $pid < 0;            # no kids left
  my $host = delete $pid_to_host{$pid}
    or warn("Why did I see $pid ($?)\n"), return 1;
  warn "reaping $pid for $host\n";
  $host_result{$host} = $? ? 0 : 1;
  1;
}

my @hosts = map "10.0.1.$_", "001".."254";

for (@hosts) {
  wait_for_a_kid() if keys %pid_to_host >= 10;
  if (my $pid = fork) {    # parent does...
    $pid_to_host{$pid} = $_;
    warn "$pid is processing $_\n";
  } else {                 # child does...
    exit !ping_a_host($_);
  }
}

## final reap:
1 while wait_for_a_kid();

for (sort keys %host_result) {
  print "$_ is ", ($host_result{$_} ? "good" : "bad"), "\n";
}
As a working program, this does pretty well, although it could be
made a bit more robust, and is very specific to the particular ping
program on my machine. If you don't want to write this pattern
of code into each program that wants to do parallel things, look at
Parallel::ForkManager in the CPAN, which does pretty much the
same thing with a friendly interface.
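To give the flavor, a rough Parallel::ForkManager sketch of the same
scan (reusing the ping_a_host above) might look like this; the module
does the "no more than N kids" bookkeeping for us, and run_on_finish
hands us each child's exit status as it is reaped:
use Parallel::ForkManager;

my %host_result;
my $pm = Parallel::ForkManager->new(10);   # at most 10 kids at once
$pm->run_on_finish(sub {
  my ($pid, $exit_status, $host) = @_;     # the ident comes back as $host
  $host_result{$host} = $exit_status ? 0 : 1;
});
for my $host (map "10.0.1.$_", "001" .. "254") {
  $pm->start($host) and next;              # parent moves on to the next host
  $pm->finish(ping_a_host($host) ? 0 : 1); # child exits with its result
}
$pm->wait_all_children;
print "$_ is ", ($host_result{$_} ? "good" : "bad"), "\n"
  for sort keys %host_result;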
One improvement to this program might be to pre-fork and re-use
the children, using some sort of IPC (pipes or sockets) to communicate
additional tasks to perform as each task completes, but I've
run out of space to talk about that here. Until next time, enjoy!
Randal L. Schwartz is a two-decade veteran of the software
industry -- skilled in software design, system administration,
security, technical writing, and training. He has coauthored the
"must-have" standards: Programming Perl, Learning
Perl, Learning Perl for Win32 Systems, and Effective
Perl Programming, as well as writing regular columns for WebTechniques
and Unix Review magazines. He's also a frequent contributor
to the Perl newsgroups, and has moderated comp.lang.perl.announce
since its inception. Since 1985, Randal has owned and operated Stonehenge
Consulting Services, Inc.