Patrick M. Ryan
In a distributed computing environment, it is often
advisable to keep
publicly needed disk resources -- such as manual pages
and local
software -- on a central server and allow other machines
to remotely
mount those disks. This has the advantage of freeing
up valuable disk
space and eliminating redundant data. You can effect
this distribution
of disk space via Sun's Network File System (NFS) and
the automount
program.
Certain types of disk resources can be thought of as
static; for example,
manual pages and third-party software change only at
widely spaced
intervals. By contrast, some public domain packages,
notably many
GNU programs, are updated regularly, and other types
of software change
even more frequently, some on a weekly or even daily
basis. In our
work environment, we use the Interactive Data Language
(IDL) for most
of our data analysis. We maintain a suite of customized
IDL routines
for data visualization, specialized file I/O, and statistical
analysis,
and we store these routines in a globally accessible
directory. Many
of the routines are in a constant state of evolution.
In a small cluster of machines, just one machine suffices
to act as
the server. In a large distributed environment, however,
the server
machine can become loaded down and can cause a performance
hit across
the cluster. This is especially true in an environment
where the logical
cluster covers several different subnets. In such cases,
having two
or more machines act as disk servers helps to distribute
the load more
evenly. With two or more servers, though, the problem
becomes keeping
the disks in sync. If the software on the server is
static (e.g.,
manual pages, which change rarely), then there is no
problem. But
if local software resides on the disks and if that software
changes
often, a strategy for keeping the disks in sync is required:
this
is where disk mirroring comes in.
Disk mirroring is used to keep two or more sets of disk
resources
in sync. A "disk resource" generally refers
to a directory
tree residing on a single physical file system. With
such a mirroring
system in place, the system manager need not worry about
conflicting
information from different machines. I have written
a Perl script
called mirror to implement disk mirroring. The script
is
designed to be run at regular intervals in a cron job.
[Editor's
note: the mirror script is too large for publication
here,
but is available electronically. See "Source Code
Availability"
on the cover to find information on electronic distribution.]
The mirror script exploits a special type of entry which
can
be put in an automount map, as follows:
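    /net        -hosts

(The name of the map file that holds this entry, and any additional
mount options given with it, vary somewhat from system to system.)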
The -hosts keyword indicates to the automounter that
references
to directories in the /net directory refer to machine
names
and then to directories exported by those machines.
For example, assume
that a machine called jupiter is exporting directories
/usr
and /export. Host europa is running automount and
includes an entry in its automount map as described
above. On europa,
one can refer to directory /net/jupiter/export. When
this
reference is made, europa mounts everything that it
can from
jupiter, in this case /export and /usr. It
is this capability to automount from a particular host
which is exploited
in the mirror script.
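As a simple illustration of this capability, a Perl script running on
europa can examine the master's copy of a file just by building a path
under /net. The host and file names here are purely hypothetical:

    #!/usr/bin/perl
    # Illustration only: examining a file on the master through the
    # automounter.  Host and path names are hypothetical.
    $host = "jupiter";
    $file = "/export/local/lib/idl/readfits.pro";

    # The first reference to /net/jupiter/... triggers the mount.
    if (@st = stat("/net/$host$file")) {
        printf("%s: %d bytes, modified %s\n",
               $file, $st[7], scalar(localtime($st[9])));
    }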
The model of the mirror scheme is master-slave. One
host is assumed
to have the definitive copy of the software or data.
Each mirror host
then runs the mirror script to update its own copy of
what
is on the master.
To be useful, any mirroring scheme must perform several different
tasks, including:
Copy new and modified files from the master to the slave.
Remove files and directories from the slave after they have been
deleted from the master.
Keep a log file of changes made to the slave.
Warn the system administrator about any inconsistencies that could
not be resolved or about any unexpected system errors.
An earlier effort at disk mirroring took the form of
a chain of tar
commands. Essentially, a giant tar pipeline would copy everything
on the server over to the slave.
While this accomplished the task of copying new and
changed files
over, it did not clean up outdated (deleted) files.
This method also
had the drawback of moving several hundred megabytes
across the network
every night.
The logic of the mirror script is to create two collections
of data. The first collection is a text file representing
the contents
of the master directory, one file per line. The second
is a database
of the contents of the slave server. This database is
a DBM file indexed
on the full pathnames of the files (an approach inspired
by Perl's
method of attaching an associative array to a DBM file).
mirror
generates the two files by using the output from the
UNIX find(1)
command (see Listing 1). Running find twice and storing
the
temporary files can be a significant drain on resources,
so it's best
to run the script in the middle of the night when temporary
disk space
and CPU cycles are both somewhat more abundant.
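A minimal sketch of this approach is shown below. It is not the
published Listing 1, and the host, directory, and file names are only
examples:

    #!/usr/bin/perl
    # Sketch only -- not the published mirror script.
    $host = "jupiter";        # master server (assumed)
    $dir  = "/usr/local";     # mirrored directory tree (assumed)
    $tmp  = "/tmp";

    # Collection 1: a text file listing the master's contents,
    # one full pathname per line, gathered through the automounter.
    $master_list = "$tmp/mirror.master.$$";
    system("find /net/$host$dir -print > $master_list") == 0
        || die "find on master failed\n";

    # Collection 2: a DBM file indexed on the slave's full pathnames.
    dbmopen(%slave, "$tmp/mirror.slave.$$", 0600)
        || die "dbmopen: $!\n";
    open(FIND, "find $dir -print |") || die "find on slave: $!\n";
    while (<FIND>) {
        chop;                 # strip the trailing newline
        $slave{$_} = 1;
    }
    close(FIND);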
Once the two temporary files are created, the script
traverses the
list of files from the master. As each file is examined,
it is removed
from the associative array containing the slave's files.
The script
performs different types of tests on three different
types of files:
symbolic links, directories, and regular files. (The
script assumes
that no one will be trying to mirror special files such as device
files and sockets.)
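Continuing the sketch above, the traversal might look something like
this; the per-type handling is only hinted at here and is sketched in
the sections that follow:

    # Walk the master's list and cross each name off the slave's DBM.
    open(MASTER, $master_list) || die "cannot read $master_list\n";
    while (<MASTER>) {
        chop;
        ($path = $_) =~ s#^/net/$host##;   # the path as it appears locally
        delete $slave{$path};              # whatever remains later is stale
        $remote = "/net/$host$path";

        # Three cases, handled as described in the following sections.
        if    (-l $remote) { print "symlink:   $path\n"; }
        elsif (-d $remote) { print "directory: $path\n"; }
        elsif (-f $remote) { print "file:      $path\n"; }
        # device files, sockets, and other special files are ignored
    }
    close(MASTER);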
If the file is a symbolic link, one of several conditions
will be
true on the slave: (1) the symbolic link does not yet
exist on the
slave; (2) the symbolic link does exist but does not
point to the
right place; (3) the file exists but is not a symbolic
link; or (4)
the symbolic link exists and points to the same place
as the master.
In case 1, a new symbolic link is created on the slave.
In case 2,
the old symbolic link is deleted and a correct one created
in its
place. Case 3 is a potential error condition; in this
case, a message
is sent to the administrator and no action is taken.
Case 4 obviously
requires no action. (See Listing 2 for the implementation
of these
cases.)
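A standalone sketch of those four cases follows; it is not the
published Listing 2, and the subroutine and argument names are
hypothetical. $remote is the master's copy (reached through /net) and
$local is the corresponding path on the slave:

    # Sketch of the symbolic-link cases.
    sub do_symlink {
        local($remote, $local) = @_;
        local($target) = readlink($remote);

        if (!-l $local && !-e $local) {           # case 1: link is new
            symlink($target, $local)
                || warn "cannot create $local: $!\n";
        } elsif (-l $local) {
            if (readlink($local) ne $target) {    # case 2: wrong target
                unlink($local);
                symlink($target, $local)
                    || warn "cannot recreate $local: $!\n";
            }                                     # case 4: already correct
        } else {                                  # case 3: exists, not a link
            warn "$local exists on the slave but is not a symbolic link\n";
        }
    }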
The script next checks to see if this is a directory.
The find program lists each directory before its contents, which is
important in case a new directory subtree with several levels is
created.
Higher-level
directories must be created before lower ones. If a
directory does
not exist on the slave, it is created. In any other
case (for example, if a file of the same name exists but is not a
directory), the administrator
is notified.
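A corresponding sketch for directories, with the same hypothetical
names and with notification reduced to a simple warn:

    # Sketch of the directory check.  The real script could also copy
    # the master's mode; a fixed mode is used here for simplicity.
    sub do_directory {
        local($remote, $local) = @_;

        if (!-e $local) {              # new directory on the master
            mkdir($local, 0755)
                || warn "cannot create directory $local: $!\n";
        } elsif (!-d $local) {         # the name exists, but not as a dir
            warn "$local exists on the slave but is not a directory\n";
        }
        # otherwise the directory is already there and needs no action
    }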
The third check (see Listing 3) is for regular files.
The script makes
several comparisons here to determine what action is
needed. A file
is considered "unchanged" if its size, user,
group, mode,
and modification time match those of the version on the
master. If
any of these differ, then a new version of the file
is copied over
and the file's various attributes are set using Perl's
built-in calls
to chown, chmod, and utime. If the file does
not exist on the slave (it is new on the master), then
it is copied
over and the appropriate attributes set.
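A sketch of the regular-file case follows. It is not the published
Listing 3, and the copy itself is done here with cp through system(),
which may well differ from the real script:

    # Sketch of the regular-file check.
    sub do_regular {
        local($remote, $local) = @_;
        local(@m) = stat($remote);         # master's attributes
        local(@s) = stat($local);          # slave's (empty if file is new)

        # "Unchanged" means size, user, group, mode, and mtime all match.
        return if @s && $m[7] == $s[7]     # size
                     && $m[4] == $s[4]     # uid
                     && $m[5] == $s[5]     # gid
                     && $m[2] == $s[2]     # mode
                     && $m[9] == $s[9];    # mtime

        # New or changed: copy it over, then reset its attributes.
        if (system("cp", $remote, $local) == 0) {
            chown($m[4], $m[5], $local);
            chmod($m[2] & 07777, $local);
            utime($m[9], $m[9], $local);   # atime and mtime
        } else {
            warn "copy of $remote to $local failed\n";
        }
    }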
As each file name in the master list is considered,
it is removed
from the DBM file. Once the traversal is complete, whatever
file names
remain in the DBM file must refer to files and directories
which have
been deleted from the master. These files and directories
are then
deleted from the slave. Note that the overhead of creating
the DBM
file of the slave's contents is purely for the purpose
of deleting
old files. If not for that consideration, the code could
simply check
for the presence of each master file and proceed accordingly.
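Continuing the earlier sketch, the cleanup pass might look something
like this:

    # Anything still in %slave was never seen on the master, so it has
    # been deleted there and should be deleted here as well.  A reverse
    # sort removes the contents of a directory before the directory.
    foreach $stale (reverse sort keys %slave) {
        if (-d $stale && !-l $stale) {
            rmdir($stale)  || warn "cannot remove directory $stale: $!\n";
        } else {
            unlink($stale) || warn "cannot remove $stale: $!\n";
        }
    }
    dbmclose(%slave);
    unlink($master_list);                  # clean up the temporary list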
Script Options
The script accepts several command-line options. They
are:
-f file -- Log changes to named file. By
default, do not save changes in a log file.
-h host -- Use named host as the master
server.
-l dir -- Local mirror directory. Defaults
to /.
-m user -- Send mail to user about any
changes made on the slave server. By default, send mail
to root.
-n -- Not Really mode. Report what is happening
but don't actually make any changes to the file system.
Useful for
testing.
-t dir -- Use dir as the temporary
directory. Defaults to /tmp.
-v -- Be verbose. Tell the user exactly
what is happening. This option is used primarily for
debugging.
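As an example, a nightly run on a slave could be scheduled with a
crontab entry along the following lines; the host name, paths, and
times are, of course, site-specific:

    # Mirror /usr/local from jupiter at 3:00 a.m., logging changes and
    # mailing a report to sysadmin.
    0 3 * * * /usr/local/bin/mirror -h jupiter -l /usr/local -m sysadmin -f /var/adm/mirror.log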
This system has been in operation on our cluster now
for several months.
All indications are that the file systems have remained
in sync.
Planned Enhancements
Although the script is working properly, it does not
handle certain
special cases, so some enhancements will be necessary
in the future.
The most serious problem now is the case of files with
multiple hard
links. Without keeping track of inode numbers, mirror
has
no way of realizing that two or more file names may
actually refer
to a single physical file. As a result, multiple copies
of the same
physical file may be copied to the slave server. This
problem might
be solved with a little more overhead in the directory
review. Generally,
multiple hard links to a single file all reside in the
same directory.
The mirror script could do a little bit of looking around
in case it is examining a file with a link count greater
than 1. If
any hard links cannot be accounted for, the script can
send a warning
to the system administrator.
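A rough sketch of that check might look like this; it is not part of
the published script, and the names are hypothetical:

    # Sketch of the proposed hard-link check.
    sub check_links {
        local($file) = @_;
        local($dev, $ino, $mode, $nlink) = stat($file);
        return if $nlink < 2;

        local($dir) = $file;
        $dir =~ s#/[^/]+$##;
        $dir = "/" if $dir eq "";
        local(@same, $name, $d, $i);

        # Look for other names in the same directory with the same inode.
        opendir(DIR, $dir) || return;
        foreach $name (readdir(DIR)) {
            next if $name eq "." || $name eq "..";
            ($d, $i) = lstat("$dir/$name");
            push(@same, "$dir/$name") if $d == $dev && $i == $ino;
        }
        closedir(DIR);

        # If some links lie outside this directory, warn the administrator.
        warn "$file has $nlink links but only " . scalar(@same) .
             " of them were found in $dir\n"  if @same < $nlink;
    }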
A second problem is that if large file systems are mirrored,
the two
temporary files can become very large. A possible solution
is to traverse
the file systems on a per-directory basis and cache
only one directory
at a time in temporary space.
Given time, the whole mirroring problem might be solved
with some
kind of clever client-server protocol. A file transfer
mechanism such as FTP or rcp could be used to transfer files across
the network.
To conclude, writing the mirror script was a useful
exercise
in learning about the pitfalls of file management in
a distributed
environment. It was also a chance for me to try my hand
at writing
a large Perl script.
This script may be obtained via anonymous ftp to jaameri.gsfc.nasa.gov
in the directory /pub/sysadmin/mirror.sh. I welcome
bug reports,
enhancement suggestions, and any questions about the
script.
References:
Wall, Larry and Randal Schwartz. Programming
Perl. Sebastopol, CA: O'Reilly & Associates, 1990.
System and Network Administration. Sun Microsystems
Inc.
About the Author
Pat Ryan has been programming on UNIX systems of various
types since
1986. He earned BS and MS degrees at St. Joseph's University,
Philadelphia. He is currently employed by Hughes STX
Corporation and is
working as a programmer and system manager at NASA's
Goddard Space
Flight Center. He can be reached over the Internet at
patrick.m.ryan@gsfc.nasa.gov.