
Disk Mirroring with NFS

Patrick M. Ryan

In a distributed computing environment, it is often advisable to keep publicly needed disk resources -- such as manual pages and local software -- on a central server and allow other machines to remotely mount those disks. This has the advantage of freeing up valuable disk space and eliminating redundant data. You can effect this distribution of disk space via Sun's Network File System (NFS) and the automount program.

Certain types of disk resources can be thought of as static; for example, manual pages and third-party software change only at widely spaced intervals. By contrast, some public domain packages, notably many GNU programs, are updated regularly, and other types of software change even more frequently, some on a weekly or even daily basis. In our work environment, we use the Interactive Data Language (IDL) for most of our data analysis. We maintain a suite of customized IDL routines for data visualization, specialized file I/O, and statistical analysis, and we store these routines in a globally accessible directory. Many of the routines are in a constant state of evolution.

In a small cluster of machines, just one machine suffices to act as the server. In a large distributed environment, however, the server machine can become loaded down and can cause a performance hit across the cluster. This is especially true in an environment where the logical cluster covers several different subnets. In such cases, having two or more machines act as disk servers helps to distribute the load more evenly. With two or more servers, though, the problem becomes keeping the disks in sync. If the software on the server is static (e.g., manual pages, which change rarely), then there is no problem. But if local software resides on the disks and if that software changes often, a strategy for keeping the disks in sync is required: this is where disk mirroring comes in.

Disk mirroring is used to keep two or more sets of disk resources in sync. A "disk resource" generally refers to a directory tree residing on a single physical file system. With such a mirroring system in place, the system manager need not worry about conflicting information from different machines. I have written a Perl script called mirror to implement disk mirroring. The script is designed to be run at regular intervals in a cron job. [Editor's note: the mirror script is too large for publication here, but is available electronically. See "Source Code Availability" on the cover to find information on electronic distribution.]

The Mirror Script

The mirror script exploits a special type of entry which can be put in an automount map, as follows:

/net -hosts

The -hosts keyword indicates to the automounter that references to directories in the /net directory refer to machine names and then to directories exported by those machines. For example, assume that a machine called jupiter is exporting directories /usr and /export. Host europa is running automount and includes an entry in its automount map as described above. On europa, one can refer to directory /net/jupiter/export. When this reference is made, europa mounts everything that it can from jupiter, in this case /export and /usr. It is this capability to automount from a particular host which is exploited in the mirror script.
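As a rough illustration of how a slave might name a directory on the master through the /net map, the path can be assembled as below. This is a sketch in Python rather than the Perl of the actual script, and the function name master_path is hypothetical.

```python
import posixpath

def master_path(host, directory, net_root="/net"):
    """Return the automounted path for `directory` as exported by `host`.

    Assumes the local automount map contains an entry of the form
    "/net -hosts"; the function name is illustrative only.
    """
    # Strip the leading slash so that join() does not discard the prefix.
    return posixpath.join(net_root, host, directory.lstrip("/"))

print(master_path("jupiter", "/export"))
```

On europa, simply referring to the resulting path is what causes the automounter to mount jupiter's exported directories.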

The Mirroring Model

The model of the mirror scheme is master-slave. One host is assumed to have the definitive copy of the software or data. Each mirror host then runs the mirror script to update its own copy of what is on the master.

To be useful, any mirroring scheme must perform several different tasks including:

Copy over any file which has changed on the master.

Create new directories on the slave that have been created on the master.

Delete files and directories on the slave which have been deleted on the master.

Create, delete, or re-create any symbolic links which have changed on the master.

Keep a log file of changes made to the slave.

Warn the system administrator about any inconsistencies which could not be resolved or any unexpected system errors.

An earlier effort at disk mirroring took the form of a chain of tar commands. Essentially, a giant tar pipeline would archive everything on the server and unpack it on the slave. While this accomplished the task of copying new and changed files over, it did not clean up outdated (deleted) files. This method also had the drawback of moving several hundred megabytes across the network every night.

The logic of the mirror script is to create two collections of data. The first collection is a text file representing the contents of the master directory, one file per line. The second is a database of the contents of the slave server. This database is a DBM file indexed on the full pathnames of the files (an approach inspired by Perl's method of attaching an associative array to a DBM file). The script generates the two files by using the output from the UNIX find(1) command (see Listing 1). Running find twice and storing the temporary files can be a significant drain on resources, so it's best to run the script in the middle of the night when temporary disk space and CPU cycles are both somewhat more abundant.
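The two collections can be sketched as follows. This is not the code from Listing 1: it is a Python approximation that uses os.walk in place of find(1) and the standard dbm.dumb module in place of Perl's DBM binding. The names list_tree, write_master_list, and build_slave_db are hypothetical, and the sketch assumes that no path contains a newline.

```python
import dbm.dumb
import os

def list_tree(root):
    """Yield every path below `root`, relative to it, parents before children."""
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # make the order of descent predictable
        for name in dirnames + sorted(filenames):
            yield os.path.relpath(os.path.join(dirpath, name), root)

def write_master_list(master_root, list_file):
    """Write the master's contents to a text file, one path per line."""
    with open(list_file, "w") as out:
        for rel in list_tree(master_root):
            out.write(rel + "\n")

def build_slave_db(slave_root, db_file):
    """Record every path on the slave as a key in a DBM file."""
    with dbm.dumb.open(db_file, "n") as db:
        for rel in list_tree(slave_root):
            db[rel] = "1"  # only the key matters; the value is a placeholder
```

Storing relative paths in both collections lets the same key identify a file on the master and on the slave, even though the two trees are rooted in different places.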

Once the two temporary files are created, the script traverses the list of files from the master. As each file is examined, it is removed from the associative array containing the slave's files. The script performs different types of tests on three different types of files: symbolic links, directories, and regular files. (The script assumes that no one will be trying to mirror special files such as device files and sockets.)

If the file is a symbolic link, one of several conditions will be true on the slave: (1) the symbolic link does not yet exist on the slave; (2) the symbolic link does exist but does not point to the right place; (3) the file exists but is not a symbolic link; or (4) the symbolic link exists and points to the same place as the master. In case 1, a new symbolic link is created on the slave. In case 2, the old symbolic link is deleted and a correct one created in its place. Case 3 is a potential error condition; in this case, a message is sent to the administrator and no action is taken. Case 4 obviously requires no action. (See Listing 2 for the implementation of these cases.)
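The four cases can be expressed compactly. The following Python sketch is an illustration under stated assumptions, not the code of Listing 2; check_symlink and sync_symlink are hypothetical names, and the warn callback stands in for the mail sent to the administrator.

```python
import os

def check_symlink(master_path, slave_path):
    """Return (case, target), where case is 1-4 as numbered in the text."""
    target = os.readlink(master_path)
    if not os.path.lexists(slave_path):
        return 1, target  # link does not yet exist on the slave
    if not os.path.islink(slave_path):
        return 3, target  # name is taken by something that is not a link
    if os.readlink(slave_path) != target:
        return 2, target  # link exists but points to the wrong place
    return 4, target      # link already matches the master

def sync_symlink(master_path, slave_path, warn=print):
    """Bring one symbolic link on the slave into line with the master."""
    case, target = check_symlink(master_path, slave_path)
    if case == 2:
        os.remove(slave_path)  # delete the stale link first
    if case in (1, 2):
        os.symlink(target, slave_path)
    elif case == 3:
        warn("not a symbolic link on slave: " + slave_path)
    return case
```

Note the use of lexists rather than exists: a dangling link on the slave still counts as present.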

The script next checks whether the file is a directory. By default, the find program lists each directory before the files and subdirectories it contains, which is important when a new directory subtree with several levels has been created: higher-level directories must be created before lower ones. If a directory does not exist on the slave, it is created. In any other case, such as when a file of the same name exists but is not a directory, the administrator is notified.
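A minimal Python sketch of the directory check follows. The name sync_directory is hypothetical, and the sketch assumes that a directory which already exists on the slave needs no action.

```python
import os

def sync_directory(slave_path, warn=print):
    """Create `slave_path` if missing; warn if a non-directory is in the way."""
    if not os.path.lexists(slave_path):
        # Safe only because parents appear before children in the master list.
        os.mkdir(slave_path)
        return "created"
    if os.path.isdir(slave_path) and not os.path.islink(slave_path):
        return "ok"
    warn("exists on slave but is not a directory: " + slave_path)
    return "conflict"
```

Because os.mkdir creates a single level only, feeding it paths in any order other than parent-first would fail on a new multi-level subtree, which is exactly why the ordering of the master list matters.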

The third check (see Listing 3) is for regular files. The script makes several comparisons here to determine what action is needed. A file is considered "unchanged" if its size, user, group, mode, and modification time match those of the version on the master. If any of these differ, then a new version of the file is copied over and the file's various attributes are set using Perl's built-in calls to chown, chmod, and utime. If the file does not exist on the slave (it is new on the master), then it is copied over and the appropriate attributes set.
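The comparison and copy can be sketched in Python as below. This is not Listing 3; attributes and sync_file are hypothetical names. The sketch assumes the slave path is either absent or a regular file, compares modification times at one-second resolution, and quietly skips the ownership change when the script is not run as root.

```python
import os
import shutil

def attributes(path):
    """Return the compared attributes: size, user, group, mode, mtime."""
    st = os.lstat(path)
    return (st.st_size, st.st_uid, st.st_gid, st.st_mode, int(st.st_mtime))

def sync_file(master_path, slave_path):
    """Copy the master file unless the slave copy has matching attributes."""
    if os.path.lexists(slave_path) and attributes(master_path) == attributes(slave_path):
        return False  # unchanged, nothing to do
    st = os.stat(master_path)
    shutil.copyfile(master_path, slave_path)
    try:
        os.chown(slave_path, st.st_uid, st.st_gid)
    except PermissionError:
        pass  # assumption: ownership is best effort without root privileges
    os.chmod(slave_path, st.st_mode & 0o7777)
    os.utime(slave_path, (st.st_atime, int(st.st_mtime)))
    return True
```

Setting the modification time explicitly is what allows the next run to recognize the file as unchanged; without it, every copied file would look newer than its master.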

As each file name in the master list is considered, it is removed from the DBM file. Once the traversal is complete, whatever file names remain in the DBM file must refer to files and directories which have been deleted from the master. These files and directories are then deleted from the slave. Note that the overhead of creating the DBM file of the slave's contents is purely for the purpose of deleting old files. If not for that consideration, the code could simply check for the presence of each master file and proceed accordingly.
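The bookkeeping described above amounts to a set difference followed by a careful deletion pass. In this Python sketch a plain set stands in for the DBM file, and the names leftovers_after_traversal and remove_leftovers are hypothetical. The dry_run flag mirrors the idea of reporting without changing anything.

```python
import os

def leftovers_after_traversal(master_paths, slave_paths):
    """Return slave paths never seen while walking the master list."""
    remaining = set(slave_paths)
    for rel in master_paths:
        remaining.discard(rel)  # seen on the master, so it stays
    return remaining

def remove_leftovers(slave_root, leftovers, dry_run=False):
    """Delete stale slave paths; return them in the order handled."""
    removed = []
    # A child path is always longer than its parent, so sorting by
    # length (longest first) empties a directory before removing it.
    for rel in sorted(leftovers, key=len, reverse=True):
        path = os.path.join(slave_root, rel)
        if not dry_run:
            if os.path.isdir(path) and not os.path.islink(path):
                os.rmdir(path)
            else:
                os.remove(path)
        removed.append(rel)
    return removed
```

Using os.rmdir rather than a recursive delete is a deliberate safety choice in this sketch: a directory that unexpectedly still has contents raises an error instead of being wiped.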

Script Options

The script accepts several command-line options. They are:

-f file -- Log changes to named file. By default, do not save changes in a log file.

-h host -- Use named host as the master server.

-l dir -- Local mirror directory. Defaults to /.

-m user -- Send mail to user about any changes made on the slave server. By default, send mail to root.

-n -- Not Really mode. Report what is happening but don't actually make any changes to the file system. Useful for testing.

-t dir -- Use dir as the temporary directory. Defaults to /tmp.

-v -- Be verbose. Tell the user exactly what is happening. This option is used primarily for debugging.
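For readers who want to experiment with the same interface, the options above could be parsed along the following lines. This is a Python sketch, not the script's own option handling; parse_options and the attribute names are hypothetical, and treating -h as mandatory is an assumption, since no default master host is stated.

```python
import argparse

def parse_options(argv):
    """Parse the mirror options listed above from a list of arguments."""
    # add_help=False frees -h, which argparse would otherwise use for help.
    parser = argparse.ArgumentParser(prog="mirror", add_help=False)
    parser.add_argument("-f", dest="log_file", default=None)
    parser.add_argument("-h", dest="host", required=True)  # assumed mandatory
    parser.add_argument("-l", dest="local_dir", default="/")
    parser.add_argument("-m", dest="mail_user", default="root")
    parser.add_argument("-n", dest="dry_run", action="store_true")
    parser.add_argument("-t", dest="tmp_dir", default="/tmp")
    parser.add_argument("-v", dest="verbose", action="store_true")
    return parser.parse_args(argv)

opts = parse_options(["-h", "jupiter", "-n", "-l", "/usr/local"])
print(opts.host, opts.local_dir, opts.dry_run)
```

A trial run with -n and -v together is a sensible first step on any new slave, since it reports every intended change without touching the file system.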

This system has been in operation on our cluster now for several months. All indications are that the file systems have remained in sync.

Planned Enhancements

Although the script is working properly, it does not handle certain special cases, so some enhancements will be necessary in the future.

The most serious problem now is the case of files with multiple hard links. Without keeping track of inode numbers, mirror has no way of realizing that two or more file names may actually refer to a single physical file. As a result, multiple copies of the same physical file may be copied to the slave server. This problem might be solved with a little more overhead in the directory review. Generally, multiple hard links to a single file all reside in the same directory. The mirror script could do a little bit of looking around in case it is examining a file with a link count greater than 1. If any hard links cannot be accounted for, the script can send a warning to the system administrator.
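One way to account for hard links is to group file names by device and inode number while scanning. The Python sketch below is a possible approach rather than part of the existing script, and find_hard_links is a hypothetical name. It tracks inode numbers across all the paths it is given instead of searching only the same directory.

```python
import os
import stat

def find_hard_links(paths):
    """Group regular files that share an inode.

    Returns (complete, partial): groups whose names were all seen, and
    groups with fewer names than the inode's link count.
    """
    names = {}
    counts = {}
    for path in paths:
        st = os.lstat(path)
        if stat.S_ISREG(st.st_mode) and st.st_nlink > 1:
            key = (st.st_dev, st.st_ino)  # identifies one physical file
            names.setdefault(key, []).append(path)
            counts[key] = st.st_nlink
    complete = [group for key, group in names.items() if len(group) == counts[key]]
    partial = [group for key, group in names.items() if len(group) < counts[key]]
    return complete, partial
```

For each complete group, the slave could copy the first name and create the rest as links to it; each partial group is a candidate for a warning to the administrator, since some of its names lie outside the mirrored tree.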

A second problem is that if large file systems are mirrored, the two temporary files can become very large. A possible solution is to peruse the file systems on a per-directory basis and cache only one directory at a time in temporary space.
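The per-directory idea might look like the following Python sketch, in which only the entries of a single directory are held in memory at any time. The name compare_directory and the keys of the returned dictionary are hypothetical.

```python
import os

def compare_directory(master_dir, slave_dir):
    """Compare the entries of one directory on the master and the slave."""
    master = set(os.listdir(master_dir))
    # A directory that is new on the master has no counterpart yet.
    slave = set(os.listdir(slave_dir)) if os.path.isdir(slave_dir) else set()
    return {
        "missing_on_slave": sorted(master - slave),  # to be created or copied
        "stale_on_slave": sorted(slave - master),    # to be deleted
        "in_both": sorted(master & slave),           # to be compared in detail
    }
```

A driver would apply this to the top-level directory and then recurse into each subdirectory found on the master, so temporary storage grows with the largest directory rather than with the whole file system.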

Given time, the whole mirroring problem might be solved with some kind of clever client-server protocol. A file transfer protocol such as FTP or RCP could be used to transfer files across the network.

To conclude, writing the mirror script was a useful exercise in learning about the pitfalls of file management in a distributed environment. It was also a chance for me to try my hand at writing a large Perl script.

This script may be obtained via anonymous ftp to jaameri.gsfc.nasa.gov in the directory /pub/sysadmin/mirror.sh. I welcome bug reports, enhancement suggestions, and any questions about the script.

References:

Wall, Larry and Randal Schwartz. Programming Perl. Sebastopol, CA: O'Reilly & Associates, 1990.

System and Network Administration. Sun Microsystems Inc.

About the Author

Pat Ryan has been programming on UNIX systems of various types since 1986. He earned BS and MS degrees at St. Joseph's University, Philadelphia. He is currently employed by Hughes STX Corporation and is working as a programmer and system manager at NASA's Goddard Space Flight Center. He can be reached over the Internet at patrick.m.ryan@gsfc.nasa.gov.