The Masshosts Tool
One of the many challenges facing UNIX systems administrators in large, heterogeneous environments is the management of numerous, dissimilar hosts. In order to distribute software, make configuration changes or perform routine customer support, it is often necessary for machines to be broken down into categories, based on their architecture and OS level, or referenced by subnet. Most useful to the administrator is the ability to execute arbitrary commands on these groups of hosts or execute a series of commands on a single machine with the target hostnames as arguments.
Virtually every systems administration group in a large environment maintains a script or program for accessing hosts en masse. This article describes the masshosts tool, a highly configurable Perl script that fills this role, with an emphasis on performance, efficiency, and flexibility.
Masshosts grew from a shell script that was originally written in our computing environment for the purpose of executing a single command on a large number of machines. This shell script, named allhosts, took as its arguments a list of keywords, where each keyword would correspond to a list of hosts that fell into a common category (e.g., the keyword "ibmws" would correspond to a list of IBM RS/6000 workstations). When allhosts was run, it would either print out the host list that corresponded to your keywords or, via a command-line option, rsh to each host in that list and execute a desired command.
The primary problem with allhosts, however, was its speed. Each rsh command would execute in series, requiring the first rsh to complete before the second could begin. Given the length of time needed to perform each remote execution, using allhosts to execute the desired commands on a large number of hosts would be painfully slow. Also, if any of the hosts in the list happened to be unresponsive, the entire run would be delayed, waiting for that rsh to time out.
Consider, for example, a list of keywords that would correspond to 200 hosts. If we assume each rsh and command would take roughly three seconds to complete, then using allhosts to execute that command on all 200 machines would take at least 10 minutes. If even 2% of those hosts happened to be down, off the net or otherwise unavailable, that would be an additional 4 minutes of execution time due to rsh timeouts. If we extend this list of targets to include 1000 machines, not too unusual in our environment, then we would be looking at a minimum execution time of just under an hour, plus potentially another 20 minutes in rsh timeouts at the same 2% failure rate.
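The estimate above is simple enough to check with a few lines of Perl. This is only a back-of-the-envelope sketch: the function name is made up for illustration, and the 60-second timeout is an assumption implied by the "4 hosts, 4 minutes" figure rather than a measured value.

```perl
use strict;
use warnings;

# Estimate a serial run, in minutes: time spent running commands,
# and extra time spent waiting for unreachable hosts to time out.
sub serial_minutes {
    my ($hosts, $secs_per_cmd, $down_fraction, $timeout_secs) = @_;
    my $command_time = $hosts * $secs_per_cmd;
    my $timeout_time = $hosts * $down_fraction * $timeout_secs;
    return ($command_time / 60, $timeout_time / 60);
}

for my $hosts (200, 1000) {
    # 3 seconds per command, 2% of hosts down, 60-second rsh timeout (assumed)
    my ($commands, $timeouts) = serial_minutes($hosts, 3, 0.02, 60);
    printf "%4d hosts: %2.0f min of commands, %2.0f min of timeouts\n",
        $hosts, $commands, $timeouts;
}
```

Changing the timeout argument to 120 shows how quickly a slower rsh timeout comes to dominate the run.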
Reality, of course, was much worse. It was not uncommon for an allhosts run against 400 machines to take well over an hour, and runs against larger lists of hosts had to be "split up" and run in parallel. It was out of this parallel run technique that masshosts was born. To cut down execution times, a new script was developed in Perl to actively manage its child processes and provide a means for executing these processes in parallel instead of in series. This new approach resulted in a performance boost that was limited only by the CPU of the local machine; and because rsh is cheap in terms of CPU consumption, it was not uncommon to run up to 30 or more processes at a single time, giving over a 30x increase in performance. Command runs that once took an hour or more would complete in minutes, and smaller runs would finish in a few seconds. This performance boost was invaluable to our administrative processes.
Since its first incarnation, masshosts has undergone many modifications designed to increase its usability and flexibility without sacrificing its performance. The masshosts described here was written in Perl and makes use of Perl5 techniques. It was developed and tested on 5.004, but it should run on earlier versions with few or no modifications (though note that the -c option requires Graham Barr's excellent IO::Socket library).
Masshosts can work with three different types of input: a list of hosts, a list of filters, or a list of keywords. The distinction and relationship between keywords and filters is very important, so I'll discuss that in detail before moving on. (All listings mentioned in this article are available from the Sys Admin Web site: www.samag.com or ftp.mfi.com in /pub/sysadmin.)
The default behavior of masshosts is to take a list of keywords, turn those into filters, and use the filters for matching hosts from a predefined list or database. The keyword functions as an easy-to-remember "tag" that corresponds to a potentially complex filter, but it is the filter that is actually used in matching a hostname. Keywords are mapped to filters via the filter configuration file, which is explained in more detail below.
The actual host lookup and matching is performed by a custom subroutine named getHosts, which you (the administrator) provide. You can make the getHosts routine as simple as pattern matching hostnames from /etc/hosts (or the hosts.byname map in NIS), or as complex as looking up hosts in some centralized database or flat text file (based on machine attributes). The getHosts subroutine API, and two samples, are also presented later.
The command syntax for masshosts is:
masshosts [ -f | -F | -l | -L | -K ] [ -ivV ] [ -x | -X file ]
[ -n net_expr ] [ [ -crz ] [ -p N ] [ -t time_limit ]
[ -o prefix ] -e "cmd <arguments>" ] arg1 arg2 ...
By default, arguments are a list of keywords. These keywords are looked up in the filters file and used to fetch a list of filters. These filters are then used to return a list of hostnames. If multiple arguments are specified, they are OR'd together.
-c - Check the connectivity to the remote host before attempting an rsh. Only meaningful when combined with the -r switch.
-e cmd - Execute the command(s) specified by cmd for each hostname in the match list. The command will be executed on the local machine unless the -r option is specified. If the command string or argument list contains the literal pattern %HOST%, it will be replaced with the current hostname before being executed.
-f - Arguments are an explicit list of filters to use for matching hosts. Filters will be OR'd together.
-h - Print usage.
-i - Prepend all command output with the string "hostname:", for each matching hostname. Only useful when combined with the -r switch. Ignored if -o is specified. This option is handy when you are letting the output from masshosts go to stdout/stderr, and you want to see which host said what.
-l - Arguments are an explicit list of hostnames. Useful when you already know the list of machines, and just want to run commands against them quickly.
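-n net_expr - Only hosts whose IP address matches net_expr will be returned. The argument is used as a regular expression exactly as given, so be careful: an unescaped dot matches any character and the pattern is not anchored, so 10.1.2 also matches 10.112.0.5 and 110.1.2.9.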
-o prefix - Send the standard output and standard error from commands run to the files prefix.hostname.out and prefix.hostname.err for each hostname that is matched. Only meaningful when used with the -e switch.
-p N - Run commands in parallel, keeping N jobs active simultaneously. Only meaningful when used with the -e switch.
-r - rsh(1) to each matching host and execute commands on the remote machine. Only meaningful when used with the -e switch.
-t time_limit - Time limit, in seconds, for command execution when making parallel runs. Only meaningful when combined with the -p switch.
-v - Be mildly verbose: display the list of outstanding processes after all processes have been spawned. Very useful if the -r switch is specified. Currently only meaningful when combined with the -p switch.
-x - Exclude any hosts listed in the default exclusion file.
-z - Delete any output files that are zero-length (i.e., empty). Only meaningful when used with the -o switch.
-F - Arguments are files that contain an explicit list of filters to use for matching hosts. Using - as an argument specifies standard input. Filters will be OR'd together.
-K - Arguments are files that contain a list of keywords. These keywords will be looked up in the filters file and used to fetch a list of filters. These filters will then be used to match hostnames.
-L - Arguments are files that contain an explicit list of hosts. Like -l, this is useful for those times when you already have a list of machines, and you need to run commands against them quickly. Using - as an argument specifies standard input.
-V - Be very verbose: show when child processes are spawned, as well as when they are collected. Currently only meaningful when combined with the -p switch.
-X file - Exclude any hosts listed in file.
Examples
- masshosts sparc - Prints a list of all machines corresponding to the keyword "sparc".
- masshosts -r -e date sparc - rsh's to each machine corresponding to the keyword "sparc" and runs the date command.
- masshosts -Licr -p 25 -e 'last -10' /tmp/machines - rsh's to each host listed in /tmp/machines and executes the command last -10. Checks connectivity to each host first, runs 25 rsh's in parallel, and prepends the output to stdout/stderr with the target machine's hostname.
The Configuration File
Masshosts uses a configuration file for customizing its default behavior. The location of the configuration file is hardcoded into masshosts itself as a Perl "require" directive. This is the only line of the masshosts source that you should have to change (see Listings 1 and 2).
The configuration variables and descriptions are as follows:
$CONNECT_TIMEOUT - Used by the -c option. Specifies how long, in seconds, we should wait for a successful connection to a machine before assuming it is down. Set this to something short: if a host doesn't respond in 10 seconds or so, chances are it's not going to, or it has other problems that you may want to look at personally.
$EXCLUDE_FILE - The location of the file containing a list of hosts to exclude from masshosts runs. Host exclusion is explained in detail, below.
$FILTER_FILE - The file that maps keywords to filters. This is explained in detail below.
$GETHOSTS_PL - The location of the Perl library file containing your getHosts subroutine. It is included in masshosts via a "require" directive.
$RSH_CMD - The command name to use when performing an rsh to a remote host. Useful for specifying, for example, ssh instead of rsh.
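As a rough illustration, a configuration file that sets these five variables might look like the following. The variable names come from the list above; every path and value is a placeholder chosen for this sketch, not a default shipped with masshosts.

```perl
# masshosts.pl: example configuration (all paths below are placeholders)

$CONNECT_TIMEOUT = 10;                                      # seconds to wait with -c
$EXCLUDE_FILE    = '/usr/local/etc/masshosts/exclude';      # used by -x
$FILTER_FILE     = '/usr/local/etc/masshosts/filters';      # keyword-to-filter map
$GETHOSTS_PL     = '/usr/local/lib/masshosts/gethosts.pl';  # your getHosts routine
$RSH_CMD         = '/usr/bin/rsh';                          # or the path to ssh

1;  # a file loaded with "require" must return a true value
```

The trailing 1; matters: Perl's require fails if the last statement in the file evaluates to false.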
The Filter Configuration File
The filter configuration file defines the associations between keywords and filters. When a keyword is specified on the command line, masshosts consults the filter file for that keyword and returns each matching filter. The format of the filter file is relatively simple, consisting of two "fields" that are separated by whitespace. Blank lines are ignored, and # signs denote comments (in-line comments are allowed). The first field contains the filter, and the second field contains a pipe-separated list of keywords that will match that filter. This keyword list is actually treated as a regular expression, and every keyword expression in the filters file is matched against each keyword specified on the masshosts command line. Keywords match on whole words only.
For example, if the filter configuration file contained the following lines:
sun\d+ sparc|sunos # Sun sparcs running SunOS
sunsol\d+ sparc|solaris # Sun sparcs running Solaris
x86sol\d+ x86|solarisx86 # Intel PC's running Solaris x86
x86lin\d+ x86|linux # Intel PC's running Linux
then the keyword sparc would match lines one and two, but not lines three and four. The keyword solaris would match line two, but not lines one, three, or four.
Note that the matches from each keyword are combined to form the final list. The masshosts command line masshosts sparc x86 would match all four lines in the filter configuration file and return the list of hosts that matched each of these four filters. Another way of looking at this is that the keywords themselves are OR'd together, so that adding more keywords potentially gives you more matches, and hence more machines in the list.
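The lookup described above can be sketched in a few lines of Perl. This is not the masshosts source: the function names are invented for illustration, and anchoring the keyword expression with ^ and $ is one assumed way of getting the "whole words only" behavior.

```perl
use strict;
use warnings;

# Parse filter-file text into [filter, keyword_expression] pairs.
sub parse_filters {
    my ($text) = @_;
    my @rules;
    for my $line (split /\n/, $text) {
        $line =~ s/#.*$//;            # strip comments, including in-line ones
        next unless $line =~ /\S/;    # skip blank lines
        my ($filter, $keyword_expr) = split ' ', $line;
        next unless defined $keyword_expr;
        push @rules, [$filter, $keyword_expr];
    }
    return @rules;
}

# Return every filter whose keyword expression matches any given keyword.
sub filters_for {
    my ($rules, @keywords) = @_;
    my @filters;
    for my $rule (@$rules) {
        my ($filter, $expr) = @$rule;
        # Anchored, so the keyword "solaris" does not match "solarisx86".
        push @filters, $filter if grep { /^(?:$expr)$/ } @keywords;
    }
    return @filters;
}

my $sample = <<'END';
sun\d+     sparc|sunos      # Sun sparcs running SunOS
sunsol\d+  sparc|solaris    # Sun sparcs running Solaris
x86sol\d+  x86|solarisx86   # Intel PC's running Solaris x86
x86lin\d+  x86|linux        # Intel PC's running Linux
END

my @rules = parse_filters($sample);
print join(' ', filters_for(\@rules, 'solaris')), "\n";
print join(' ', filters_for(\@rules, 'sparc', 'x86')), "\n";
```

Because each rule is tested against the whole keyword list, passing more keywords can only add filters, which is the OR behavior described in the text.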
Masshosts has several special features to improve its overall performance. These features serve to reduce its run time, make it less susceptible to network outages, and exclude certain hosts from the final match list, regardless of the keywords or filters that were specified on the command line.
Executing Commands on Remote Hosts vs. Local Host
For each host that your masshosts query returns, you have the opportunity to run a command, with arguments, via the -e parameter. This command can be executed either locally (the default) or on the remote host through an rsh, if the -r switch is specified.
If the string %HOST% appears in either the command or the argument list, it will be replaced by the current hostname in the masshosts execution queue.
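A minimal sketch of the substitution and of the local-versus-remote choice follows. The function name and argument order are hypothetical, and running local commands through /bin/sh -c is an assumption made for this example, not a statement about how masshosts itself launches them.

```perl
use strict;
use warnings;

# Build the argument list to execute for one host.
# $remote is true when -r was given; $rsh_cmd plays the role of $RSH_CMD.
sub build_command {
    my ($host, $remote, $rsh_cmd, $cmd) = @_;
    (my $expanded = $cmd) =~ s/%HOST%/$host/g;   # replace every %HOST%
    return $remote
        ? ($rsh_cmd, $host, $expanded)           # run on the remote machine
        : ('/bin/sh', '-c', $expanded);          # run on the local machine
}

for my $host (qw(rs006 ss102)) {
    print join(' ', build_command($host, 0, 'rsh', 'ping -c 1 %HOST%')), "\n";
    print join(' ', build_command($host, 1, 'rsh', 'uname -a')), "\n";
}
```

The local form is where %HOST% earns its keep: a command such as ping has to be told which host it is currently working on.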
Parallel Execution
The -p switch is arguably the most powerful masshosts option. Rather than waiting for a single command to complete before starting a new one, masshosts will spawn the desired number of processes (N) to run in parallel. As each child process exits and is collected, masshosts will spawn a new process in its place, always keeping N processes active at any given time.
Processes can run either on the local machine or, via the -r switch, on the remote host through an rsh. Note that you must be careful when specifying the parameter to the -p switch not to overload your local machine. Commands that run locally can easily suck up available CPU cycles to the point where your performance worsens rather than improves. Unless your jobs are going to be spending a significant amount of time waiting for something (e.g., I/O), keep the number of parallel processes small.
When using the -r switch, however, you can jack up the -p parameter to fairly large values (25 and 30 are not unreasonable numbers). In terms of local CPU, rsh processes are relatively cheap.
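The "keep N children active" behavior can be sketched with fork and wait. This is an illustrative reconstruction of the technique, not the masshosts code; run_parallel and its arguments are names invented for this example.

```perl
use strict;
use warnings;

$| = 1;   # unbuffered output, so forked children do not re-print parent data

# Run one job per host, keeping at most $max children alive at a time.
# $job is a code ref run in the child; its return value becomes the
# child's exit status. Returns a hash of host => exit status.
sub run_parallel {
    my ($max, $job, @hosts) = @_;
    my (%pid_host, %status);

    while (@hosts or %pid_host) {
        # Top up the pool until it is full or the queue is empty.
        while (@hosts and keys(%pid_host) < $max) {
            my $host = shift @hosts;
            my $pid  = fork();
            die "fork failed: $!" unless defined $pid;
            if ($pid == 0) {               # child process
                exit($job->($host));
            }
            $pid_host{$pid} = $host;       # parent records the child
        }
        # Block until any child exits, then free its slot.
        my $pid = wait();
        last if $pid == -1;                # no children left
        my $host = delete $pid_host{$pid};
        $status{$host} = $? >> 8 if defined $host;
    }
    return %status;
}

my %result = run_parallel(
    3,
    sub { my ($h) = @_; return system('echo', "hello from $h") >> 8 },
    map { "host$_" } 1 .. 6,
);
print "$_ exited with $result{$_}\n" for sort keys %result;
```

For a remote run, the job would typically exec the rsh command instead of calling system, so that the child process becomes the rsh itself.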
Avoiding rsh Timeouts
Perhaps the biggest source of potential delays in a masshosts run is rsh timeouts. In every environment, there are bound to be machines that are down, not responding to network requests or even off the net entirely, but that are still in the hosts file or the local hosts database. Masshosts, of course, has no way of knowing which machines are up and will blindly attempt to connect to every host in its execution queue. For the machines that it can't reach, however, rsh will hang, waiting for either a connection or a timeout. These timeouts can take anywhere from one to two minutes to occur, depending on your local OS.
With the -c switch, however, you can attempt to avoid rsh timeouts. When specified, masshosts will first check the network connectivity to the remote host by attempting to connect to that host's shell port. If a connection is not established within $CONNECT_TIMEOUT seconds, masshosts assumes that the host is unreachable and will not attempt an rsh. This feature is implemented through Graham Barr's IO::Socket package.
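A connectivity check of this kind can be sketched with IO::Socket::INET and its Timeout option. The function name is hypothetical, and port 514 is the conventional TCP port for the rsh "shell" service; confirm it against /etc/services on your own systems.

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Return true if a TCP connection to $host:$port succeeds
# within $timeout seconds, false otherwise.
sub host_is_reachable {
    my ($host, $port, $timeout) = @_;
    my $sock = IO::Socket::INET->new(
        PeerAddr => $host,
        PeerPort => $port,
        Proto    => 'tcp',
        Timeout  => $timeout,
    );
    return 0 unless $sock;
    close($sock);
    return 1;
}

my $CONNECT_TIMEOUT = 10;    # seconds, in line with the advice in the text
my $SHELL_PORT      = 514;   # rsh "shell" service (assumed standard port)

# Check each hostname given on the command line.
for my $host (@ARGV) {
    my $up = host_is_reachable($host, $SHELL_PORT, $CONNECT_TIMEOUT);
    printf "%s: %s\n", $host, $up ? 'reachable' : 'unreachable';
}
```

If $RSH_CMD points at ssh rather than rsh, the port to probe would need to change accordingly.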
Of course, just because a machine is up, that doesn't mean that it can be rsh'd to successfully. A down fileserver, busy CPU, and a variety of other problems can prevent a machine from actually executing your commands once you have connected to it. The -c switch won't help you in these circumstances, but the -t switch will.
Execution Time Limits
When running commands in parallel, you can specify a time limit on overall command execution to prevent stalled machines from tying up the execution queue. The -t switch specifies, in seconds, the time limit for the command to complete, including the rsh itself. This is implemented via a call to alarm(2). Be careful when using this option, particularly if you are executing on multiple machine types or speeds: set the time limit according to the projected execution time of your slowest host.
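The article does not show how masshosts arranges its alarm call, so the following is only one plausible approach. It relies on the fact that a pending alarm survives exec, so the child can set its own limit before turning into the command. The function name and return strings are invented for this sketch.

```perl
use strict;
use warnings;

# Run @cmd in a child process that is killed by SIGALRM if it is still
# running after $limit seconds. Returns a short description of the outcome.
sub run_with_limit {
    my ($limit, @cmd) = @_;
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        alarm($limit);               # the pending alarm is kept across exec
        exec(@cmd) or exit(127);     # only reached if exec itself fails
    }
    waitpid($pid, 0);
    my $signal = $? & 127;           # non-zero if a signal ended the child
    return $signal ? "killed by signal $signal" : 'exit ' . ($? >> 8);
}

print run_with_limit(5, 'sleep', '1'), "\n";   # finishes inside the limit
print run_with_limit(1, 'sleep', '5'), "\n";   # cut off by the alarm
```

Because each child carries its own timer, this arrangement works unchanged when many children are running at once under -p.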
Host Exclusion
The -x and -X switches tell masshosts to exclude any hostname that appears in a host exclusion file: -x uses the default file named by $EXCLUDE_FILE, and -X takes the file as its argument. The file has a very simple format: one hostname per line, with no added spaces and no comments. Hostnames must match exactly.
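Reading such a file and applying it is a natural fit for a hash lookup. The sketch below uses made-up function names and an in-memory file so that it runs on its own; a real run would open the exclusion file from disk instead.

```perl
use strict;
use warnings;

# Read an exclusion list: one hostname per line, no comments.
sub read_exclusions {
    my ($fh) = @_;
    my %exclude;
    while (my $line = <$fh>) {
        chomp $line;
        $exclude{$line} = 1 if length $line;
    }
    return \%exclude;
}

# Drop every host whose name exactly matches an excluded name.
sub apply_exclusions {
    my ($exclude, @hosts) = @_;
    return grep { !$exclude->{$_} } @hosts;
}

my $text = "ss102\nhp903\n";                 # stands in for the file contents
open(my $fh, '<', \$text) or die "cannot open in-memory file: $!";
my $exclude = read_exclusions($fh);
close($fh);

print join(' ', apply_exclusions($exclude, qw(rs006 ss102 ss1020 hp903))), "\n";
```

Note that ss1020 survives even though ss102 is excluded, because the comparison is an exact string match rather than a pattern match.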
IP Address Matching
Specifying the -n switch and its argument allows you to restrict the masshosts host list to machines whose IP addresses match the given regular expression. All known IP addresses for each machine in the host list are checked against this regular expression for a match. The advantage to this technique is that multi-homed hosts will be included if one of their interfaces matches the regex. The disadvantage is that it slows down masshosts, because a gethostbyname(3) call is made for each hostname in the host list: the longer the host list, the slower the process.
This switch is useful when you only want to hit machines on a specific network or subnet.
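The address check can be sketched with Perl's built-in gethostbyname and the Socket module. The function names here are hypothetical, and the sample addresses are invented to stand in for a multi-homed host.

```perl
use strict;
use warnings;
use Socket qw(inet_ntoa);

# Return every IPv4 address known for $host as dotted-quad strings.
# An unresolvable name yields an empty list.
sub host_addresses {
    my ($host) = @_;
    my ($name, $aliases, $addrtype, $length, @addrs) = gethostbyname($host);
    return map { inet_ntoa($_) } @addrs;
}

# Return the number of addresses that match the regular expression.
sub any_address_matches {
    my ($net_expr, @addresses) = @_;
    return scalar grep { /$net_expr/ } @addresses;
}

# Keep only the hosts with at least one interface on the wanted network.
sub filter_by_network {
    my ($net_expr, @hosts) = @_;
    return grep { any_address_matches($net_expr, host_addresses($_)) } @hosts;
}

my @addresses = ('192.168.7.20', '10.1.2.7');   # hypothetical multi-homed host
print "on the 10.1.2 subnet\n" if any_address_matches('^10\.1\.2\.', @addresses);
```

Escaping the dots and anchoring the pattern, as in '^10\.1\.2\.', avoids accidental matches on unrelated networks.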
Writing the Custom Host-Matching Subroutine
At the heart of masshosts lies the subroutine that actually takes a list of filters and uses it to generate a list of hostnames that meet the conditions specified by one or more of those filters. This subroutine is named getHosts, and it must be supplied by you.
Because every environment names its hosts differently, it is not possible to provide a generic getHosts subroutine that will work for everyone. Some environments, for example, may choose a naming convention for their hosts where the hostname identifies the type of machine. (Our environment takes this approach: all RS6k's are named with "rs" followed by a numerical suffix, all Solaris machines are "ss" followed by a numerical suffix, and so on.) Other administrators may maintain a host information database (whether it be a flat text file or a formal SQL database) where machine configuration information is stored for each host on the network.
The purpose of providing a standard API to the getHosts subroutine, and not including it as a part of masshosts itself, is to allow you, the systems administrator, to easily integrate masshosts into your environment. By writing your own getHosts routine, you choose how to translate filters to hostnames, providing ultimate flexibility.
This section describes the API for the getHosts subroutine and provides two samples that you can either incorporate with few or no changes or use as the basis for building your own.
The API for the getHosts subroutine is quite simple: only two arguments are passed, and both are references to arrays. The getHosts subroutine itself is stored in the file named by $GETHOSTS_PL and is included into the Perl script via a "require" directive.
The first argument, which we'll call $arefHosts, is a reference to the array that getHosts fills with the hosts it finds. The second argument, which we'll call $arefFilters, is a reference to an array containing our filters. Each host that matches one of the filters in @$arefFilters should be pushed onto the array @$arefHosts.
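To make the calling convention concrete, here is a deliberately trivial getHosts that matches filters against a hard-coded list. The host list and the whole-name anchoring are assumptions made for this sketch; the listings supplied with the article are the real samples.

```perl
use strict;
use warnings;

# Stand-in inventory; a real getHosts would consult your own data source.
my @KNOWN_HOSTS = qw(rs006 rs007 ss102 hp903);

# getHosts(\@hosts, \@filters): push each host that matches
# any of the filters onto the caller's @hosts array.
sub getHosts {
    my ($arefHosts, $arefFilters) = @_;
    for my $host (@KNOWN_HOSTS) {
        push @$arefHosts, $host
            if grep { $host =~ /^(?:$_)$/ } @$arefFilters;
    }
    return;
}

# How a caller would use it: pass references to both arrays.
my @hosts;
my @filters = ('rs\d+', 'hp\d+');
getHosts(\@hosts, \@filters);
print "@hosts\n";
```

Because results come back through the first array reference, the subroutine needs no meaningful return value. When the routine lives in its own file and is loaded with require, that file must end with a true value such as 1;.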
The following two examples show two different implementations of the getHosts subroutine and will be discussed in detail.
Example 1: Matching Hostnames in /etc/hosts (or hosts.byname)
Listing 3 shows code for a getHosts subroutine that performs the simplest form of hostname matching. Each filter is a regular expression, and those regular expressions are matched against the hostnames in /etc/hosts or the hosts.byname NIS map.
This function assumes that you have some sort of naming convention identifying machine type according to its hostname. As previously mentioned, our environment uses this technique, where the hostname consists of an alphabetic tag indicating its architecture and OS, followed by a numerical sequence number (rs006, ss102, hp903, etc.). With this arrangement, it is possible to create a regular expression that matches, say, all RS6k's or HP workstations, and the performance is on par with the grep family of commands. Note that you are limited by the granularity of your naming convention. Our hostnames, for example, don't differentiate between HP-UX 9.x and HP-UX 10.x, so when we ask for HP machines, we get them all regardless of their OS level.
Given this naming convention, our filter configuration file might look like Figure 1. Providing the keyword aix on the masshosts command line would correspond to the filter ibm\d+. If we specified solaris as a keyword, we would get two filters: sunsol\d+ and sunafs\d+.
The getHosts function in Listing 3 takes these filters from @$arefFilters and forms a single regular expression of the form:
\b(filter1|filter2|...|filterN)\b
Using the above examples and Figure 1, then, the keyword aix would generate the regular expression \b(ibm\d+)\b, and the keyword solaris would generate \b(sunsol\d+|sunafs\d+)\b. The \b designations help prevent unintended matches. If we, for example, specified nfs as our keyword, we would not want the regular expression fs\d+ to also match our AFS servers, whose naming convention is sunafs\d+.
Each line of the hosts file (or the hosts.byname NIS map if $USE_NIS is set) is then matched against this regular expression. If a match is successful, the matching hostname is pushed onto the @$arefHosts array.
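The matching step can be sketched as follows. This is not Listing 3: the function name is invented, the sample hosts data (including the example.com domain) is made up, and reading from a filehandle is a choice that lets the example run without touching /etc/hosts.

```perl
use strict;
use warnings;

# Join the filters into \b(f1|f2|...)\b and push the first matching
# hostname from each line of hosts-file text onto @$arefHosts.
sub match_hosts_file {
    my ($fh, $arefHosts, $arefFilters) = @_;
    return unless @$arefFilters;
    my $alternation = join '|', @$arefFilters;
    my $regex = qr/\b($alternation)\b/;
    while (my $line = <$fh>) {
        $line =~ s/#.*//;                          # ignore comments
        push @$arefHosts, $1 if $line =~ $regex;   # $1 is the hostname matched
    }
    return;
}

my $hosts_text = <<'END';
127.0.0.1   localhost
10.1.2.6    rs006.example.com rs006   # RS/6000 workstation
10.1.2.7    fs12
10.1.3.9    sunafs3
END

open(my $fh, '<', \$hosts_text) or die "cannot open in-memory file: $!";
my @hosts;
match_hosts_file($fh, \@hosts, ['fs\d+']);
close($fh);
print "@hosts\n";
```

With the filter fs\d+, the line for sunafs3 is skipped because there is no word boundary between the "a" and the "f", which is exactly the protection the \b anchors are meant to provide.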
Example 2: Querying a Host Information Database
Listing 4 shows how masshosts could be used to query an LDAP database, using Graham Barr's Net::LDAP Perl module, which can be obtained from the Perl-LDAP page on the Web at http://www.connect.net/gbarr/perl-ldap/. LDAP was chosen for this example because both the Perl modules and the University of Michigan's LDAP server implementation (see http://www.umich.edu/~dirsvcs/ldap/) are freely available.
The advantages of using a database query to match hostnames are numerous and are limited only by the amount and type of data you choose to store for your machines. For the purposes of this example, let's assume that our records have the following structure (in reality, there would be additional fields required by the LDAP database, but we'll leave them out for simplicity):
dn: hostname=fs5, o=ourcompany
The attributes could be generated using uname -a, then updated into the database. We could then use masshosts to allow queries against this information, creating a filter configuration file similar to Figure 2.
We could even expand the attribute list for a machine's entry, allowing storage of IP address, total RAM, CPU type, Ethernet address, and even which NIS server the host is bound to (provided you update records often, of course). The more data you choose to store, the more granularity you have in selecting the machines for your masshosts run. For truly complex queries, you could skip using keywords altogether and use the -f or -F options to specify search filters directly, giving you the ability to make customized queries on the fly.
To install masshosts:
- Install the masshosts script in the desired location (/usr/local/bin, etc.).
- Change the hardcoded "require" line of masshosts to reflect the location of your config file.
- Create your filters file.
- Create the getHosts subroutine.
- Create your config file, masshosts.pl.
My group has found masshosts to be a powerful and flexible tool for accessing large numbers of machines. The custom getHosts subroutine makes it easy to integrate masshosts into your existing environment, and allows you to define host-matching filters that range from simple pattern matches to complex database queries. The performance boost of parallel process execution can save you hours of execution time, and makes it possible to run commands on a large number of hosts in a reasonable amount of time, with little or no operator babysitting. These, combined with its other features, make masshosts an invaluable tool for administering large computing environments.
About the Author
John Mechalas has a B.S. and M.S. in Aeronautical and Astronautical Engineering from Purdue University. He has worked at Intel Corporation for four years, where he currently manages a UNIX systems administration and security team for a large microprocessor design site. He can be reached at: firstname.lastname@example.org.