
Doing Many Things, Like Pings

Randal L. Schwartz

As a UNIX systems administrator, I'm often faced with those little mundane tasks that seem so trivial to me but so important to the community I'm supporting. Little things like "hey, is that host up and responding to pings?". Such tasks generally have a very repetitive nature to them, and scripting them seems to be the only way to have time to concentrate on the tasks that really need my attention.

Let's look at the specific task of pinging a number of hosts on a subnet. Now, there are tools to do this quickly (like nmap), and there are even Perl modules to perform the ping (such as Net::Ping), but I wanted to focus on something familiar that can be launched from Perl as an external process, and the system ping command seems mighty appropriate for that.
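
(If you'd rather stay entirely inside Perl, a Net::Ping version of a single-host check looks roughly like the sketch below. The protocol and timeout arguments are my choices for illustration; the "icmp" protocol requires root privileges, so the TCP-based probe is often more practical. Check the module's documentation for the details.)

use Net::Ping;

## a rough sketch of the Net::Ping alternative mentioned above
my $p = Net::Ping->new("tcp", 1);               ## 1-second timeout
print "10.0.1.1 is up\n" if $p->ping("10.0.1.1");
$p->close;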

First, let's look at how to ping one host, on my BSD-ish system:

sub ping_a_host {
  my $host = shift;
  `ping -i 1 -c 1 $host 2>/dev/null` =~ /0 packets rec/ ? 0 : 1;
}
Here, I'm firing up a subshell to execute the ping -i 1 -c 1 command, which on my system requests that ping have a 1-second timeout, and selects (as Sean Connery's character so eloquently said in The Hunt for Red October) "one ping only". Your ping parameters may vary: check your manpage.

The output is scanned for the string 0 packets rec, which if absent means we got a good ping. So if the match is found, we return 0 (the ping was bad); otherwise, we'll return 1. The ping command spits out some diagnostics on standard error, which we'll toss using Bourne-shell syntax.

Note that the value of $host is not checked here for sanity. We certainly wouldn't want to accept a random command-line parameter or (gasp) a Web form value here without some serious validation. However, as we use this in our program, all of the values will be internally generated, so we've got some degree of safety.
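
(If you ever do take the host from the outside world, even a crude dotted-quad check goes a long way before handing the value to a shell. Here's a minimal sketch; the subroutine name is mine and isn't used in the program that follows.)

## minimal sanity check for an externally supplied host address
sub looks_like_dotted_quad {
  my $host = shift;
  return 0 unless $host =~ /\A(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})\z/;
  for ($1, $2, $3, $4) {
    return 0 if $_ > 255;
  }
  return 1;
}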

To scan a particular subnet, looking for hosts that are alive, we can call that subroutine in a loop, something like:

print "ping $_ is ", ping_a_host($_), "\n"
  for map "10.0.1.$_", 1..254;
This loop completes very quickly for hosts that are alive, but is slow as molasses for hosts that aren't present, because ping has to wait out its full timeout before giving up on a host that never answers its ICMP echo request.

So, how can we speed that up? This is not a CPU-intensive loop: practically the entire time is spent waiting for some remote host to respond. We'll leave the ping_a_host subroutine alone, because that's not where we have a problem -- it's doing its job as fast as it can. So we need to do more of these at a time.

One approach is to fork a separate process for each host we want to ping. We'll then sit back in a wait loop. As each child process completes, we'll note its exit status, and when there are no more kids, we'll spit out a report.

First, we'll define the host list for the task:

my @hosts = map "10.0.1.$_", "001".."010";
The numbers here are padded to three digits so that they sort as strings in a numeric sequence -- a cheap but effective trick. Note also that I'm only selecting the first 10 hosts this time. I'll explain that shortly.
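
(The string range leans on Perl's magical string increment to produce the zero-padded octets. If you find that too clever, an sprintf-based spelling of the same idea might look like the line below. One caution: some resolvers and ping programs treat leading-zero octets as octal, so it's worth confirming that your ping is happy with addresses like 10.0.1.009 before relying on the padding trick.)

## an equivalent zero-padded host list via sprintf
my @hosts = map { sprintf "10.0.1.%03d", $_ } 1 .. 10;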

Next, we'll want a hash to keep track of the kids:

my %pid_to_host;
The keys of this hash will be the child process ID (PID), and the value will be the corresponding host that the child is processing. Then, we'll want to loop over the host list, firing up a child for each:

for (@hosts) {
  if (my $pid = fork) {
    ## parent does...
    $pid_to_host{$pid} = $_;
    warn "$pid is processing $_\n";
  } else {
    ## child does...
    exit !ping_a_host($_);
  }
}
As each host is placed into $_, we'll fork. The result of fork is a child process running in parallel with the parent process. These processes are distinguished only by the return value of fork, which is 0 in the child, but the child's PID in the parent. So, if we get back a non-zero value, we're the parent, and we'll store the PID into the hash, along with the host that particular child is processing. If we're the child, then we'll call the ping_a_host routine, and arrange for our exit status to be good (0) if that routine gives a thumbs up.

The warn in the loop is merely for diagnostic purposes so that you can see what's happening. In a production program, I'd certainly remove that.
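
(One thing this demo glosses over: fork can also fail outright, returning undef, if the system runs out of process slots. That's fine for a quick script, but a more defensive spelling of the fork, as a sketch, might be:)

## a more defensive fork: fork returns undef on failure ($! says why)
defined(my $pid = fork) or die "fork failed: $!";
if ($pid) {
  $pid_to_host{$pid} = $_;        ## parent
} else {
  exit !ping_a_host($_);          ## child
}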

At the end of this loop, we'll have a number of processes -- far too many, in fact. For each host to check, we'll have two processes running: the shell forked by the backquotes, and the ping process itself. Perl has to fork a shell because we needed that child to have its standard error output redirected. If I could have gotten the redirection out of those backquotes somehow, we'd have only one child process per host, not two.

Launching 20 processes just to check 10 hosts shows how quickly this multiplies: 254 hosts would mean over 500 simultaneous processes, which would push us right up against the typical per-user process limit. So you can see why I didn't do all 254 hosts at once!
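
(There is a way to grant that earlier wish of getting the redirection out of the backquotes. Perl skips the intermediate shell when the command contains no shell metacharacters, so if the forked child points its own STDERR at /dev/null first, the backquotes can run ping directly. The subroutine name below is mine; treat this as a sketch and verify the no-shell behavior on your own system.)

## run ping without the extra shell: redirect our own STDERR first,
## then keep the command free of shell metacharacters
sub ping_a_host_no_shell {
  my $host = shift;
  open STDERR, ">", "/dev/null" or die "can't redirect STDERR: $!";
  `ping -i 1 -c 1 $host` =~ /0 packets rec/ ? 0 : 1;
}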

Now it's time to wait for the results. A simple "wait" loop will reap the children as fast as they complete their task, and we'll declare a hash to hold the results:

my %host_result;
The key will be the host, and the value will be 1 if the child said it was pingable, otherwise 0.

while (keys %pid_to_host) {
  my $pid = wait;
  last if $pid < 0;
  my $host = delete $pid_to_host{$pid}
    or warn("Why did I see $pid ($?)\n"), next;
  warn "reaping $pid for $host\n";
  $host_result{$host} = $? ? 0 : 1;
}
As long as we've got kids (indicated by the ever-decreasing size of the %pid_to_host hash), we'll wait for them. The child process ID comes back from wait, which we'll stick into $pid. At this point, the exit status of that particular child is in $?. If the return value of wait is negative, then we don't have any more kids. This is an unexpected result, which we could check later by noticing that %pid_to_host is not yet empty, or we could have simply died here.

We'll use the %pid_to_host hash to map the PID back to the host that child was processing. Again, we might have accidentally reaped a completed child that wasn't one of ours, so defensive programming requires checking for that. This won't happen unless other parts of this program are also forking children somehow, but I'm a cautious programmer most of the time.

Finally, we'll take the exit status in $?, and map it into the appropriate good/bad value for the result hash.
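
(For this program, a simple true/false test of $? is all we need, but remember that $? holds the full wait status. If you ever need the individual pieces, they unpack like this:)

## $? packs the wait status: exit value in the high byte,
## terminating signal (if any) and core-dump flag in the low bits
my $exit_value  = $? >> 8;
my $signal_num  = $? & 127;
my $dumped_core = $? & 128;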

When this loop completes, we have no more kids performing tasks, and it's time to show the result:

for (sort keys %host_result) {
  print "$_ is ", ($host_result{$_} ? "good" : "bad"), "\n";
}
For each key of the result table, we'll say whether the result was good or bad.

Putting this all together makes a nice little demo of forking 20 kids to check 10 hosts, but it won't scale to 254 hosts, because that would require more process slots than we typically have (or want to use, actually). We need to perform the forking gradually, so that we never have more than 20 kids at a time. One naive approach is to chunk the data into bite-size bits:

my @all_hosts = ...;
my %host_results;
while (my @hosts = splice @all_hosts, 0, 10) {
  ... process @hosts, adding into %host_results ...
}
... show results ...
Here, most of the code above gets wrapped into an outer loop that hands 10 hosts at a time to be processed, using splice to peel them off the master list. While this strategy certainly solves the "no more than 10 at a time" condition, each batch of 10 has to wait for the slowest of the 10 to complete.

A better way would be to fork until we hit the limit of active children, then wait for any one child to finish before we need to fork again. First, we'll need to factor out "waiting for a kid" into a subroutine so we can call it in two different places: while forking a new task, and at the end to reap all the remaining children:

sub wait_for_a_kid {
  my $pid = wait;
  return 0 if $pid < 0;
  my $host = delete $pid_to_host{$pid}
    or warn("Why did I see $pid ($?)\n"), return 1;  ## not ours; keep reaping
  warn "reaping $pid for $host\n";
  $host_result{$host} = $? ? 0 : 1;
  1;
}
Note that we're accessing %pid_to_host and %host_result directly here, so those variables must be in scope before the subroutine definition. The subroutine now returns 1 if a kid was reaped, and 0 otherwise. The final reap loop now becomes:

## final reap:
1 while wait_for_a_kid();
At this point, the program functions identically to the prior one, except that I've refactored the kid reaping. The magic happens next. We'll call wait_for_a_kid in the middle of the forking loop as well, just before we're about to fork, but only when the number of kids is already at the maximum we chose:

for (@hosts) {
  wait_for_a_kid() if keys %pid_to_host >= 10;
  ...
Ahh. That does it. We can now crank @hosts back up to our 254 items. As we fire off the first 10, this new statement has no effect. But when it comes time for the 11th, we'll wait for at least one of the other 10 to complete first. So, at no time do we have more than 10 hosts active (using 20 child processes for reasons explained earlier). The entire program is given here in case you want to see it all in context:

sub ping_a_host {
  my $host = shift;
  `ping -i 1 -c 1 $host 2>/dev/null` =~ /0 packets rec/ ? 0 : 1;
}

my %pid_to_host;
my %host_result;

sub wait_for_a_kid {
  my $pid = wait;
  return 0 if $pid < 0;
  my $host = delete $pid_to_host{$pid}
    or warn("Why did I see $pid ($?)\n"), return 1;  ## not ours; keep reaping
  warn "reaping $pid for $host\n";
  $host_result{$host} = $? ? 0 : 1;
  1;
}

my @hosts = map "10.0.1.$_", "001".."254";

for (@hosts) {
  wait_for_a_kid() if keys %pid_to_host >= 10;
  if (my $pid = fork) {
    ## parent does...
    $pid_to_host{$pid} = $_;
    warn "$pid is processing $_\n";
  } else {
    ## child does...
    exit !ping_a_host($_);
  }
}

## final reap:
1 while wait_for_a_kid();

for (sort keys %host_result) {
  print "$_ is ", ($host_result{$_} ? "good" : "bad"), "\n";
}
As a working program, this does pretty well, although it could be made a bit more robust, and is very specific to the particular ping program on my machine. If you don't want to write this pattern of code into each program that wants to do parallel things, look at Parallel::ForkManager in the CPAN, which does pretty much the same thing with a friendly interface.
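
(For the curious, a rough sketch of this program on top of Parallel::ForkManager might look like the following, reusing the ping_a_host routine from above. The start, finish, run_on_finish, and wait_all_children calls come from the module's documentation, which you should consult for the fine print.)

use Parallel::ForkManager;

my %host_result;
my $pm = Parallel::ForkManager->new(10);        ## at most 10 kids at once
$pm->run_on_finish(sub {
  my ($pid, $exit_status, $host) = @_;
  $host_result{$host} = $exit_status ? 0 : 1;
});
for my $host (map "10.0.1.$_", "001".."254") {
  $pm->start($host) and next;                   ## parent keeps looping
  $pm->finish(ping_a_host($host) ? 0 : 1);      ## child exits here
}
$pm->wait_all_children;

for (sort keys %host_result) {
  print "$_ is ", ($host_result{$_} ? "good" : "bad"), "\n";
}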

One improvement to this program might be to pre-fork and re-use the children, using some sort of IPC (pipes or sockets) to communicate additional tasks to perform as each task completes, but I've run out of space to talk about that here. Until next time, enjoy!

Randal L. Schwartz is a two-decade veteran of the software industry -- skilled in software design, system administration, security, technical writing, and training. He has coauthored the "must-have" standards: Programming Perl, Learning Perl, Learning Perl for Win32 Systems, and Effective Perl Programming, as well as writing regular columns for WebTechniques and Unix Review magazines. He's also a frequent contributor to the Perl newsgroups, and has moderated comp.lang.perl.announce since its inception. Since 1985, Randal has owned and operated Stonehenge Consulting Services, Inc.