Cover V10, I03

What to Do When the Server Doesn't Serve -- Duplicating Data

Brett Lymn

In my previous article (Sys Admin, February 2001), I talked about providing file system failover using features of Sun's cachefs implementation to keep serving files transparently when the server is down. In this article, I will take a more traditional approach to server failover and discuss duplicating data on multiple servers and causing client machines to select another server if the one that they are talking to stops responding.

The automatic mounting of file systems from a server is handled by an automounter, a daemon that runs on the client machine and monitors a configured set of directory paths. When an access is made to one of these monitored paths, the daemon mounts the appropriate portion of the file system from a remote server using NFS to make it appear as though the path was always there. After a period of inactivity on the mounted path, the daemon will unmount the path to prevent the system from accumulating mounted file systems. On Sun systems, this is simply called the automounter; for other systems, there is an open source equivalent called amd.

Both Sun's automounter and amd allow for multiple servers to serve the same file system. On versions of Solaris prior to 2.6, the binding of a client to a particular server was resolved only at mount time. So, if the server from which your client automounted a file system went down, too bad: you had to reboot the client to force it to mount the file system from the other server.

For amd and Sun's automounter on Solaris 2.6 and later, the automounter daemon monitors the connection to the server. If the connection to the server fails, the automounter automatically tries to connect to the other listed servers so that the client can continue working.

Sun Automounter

For Sun's automounter, two files are required to start the automounter: one is called the direct map and the other is the indirect map. The direct map may contain automounter configuration lines but, in practice, this is discouraged because any changes to the direct map require the automounter to be restarted before they take effect. Normally, the automounter direct map will contain names of files. These files are called indirect maps and are commonly used to hold the automounter configuration because the indirect maps can be updated and take effect without restarting the automounter.

The direct map is normally a file called /etc/auto_master (Sun's own documentation calls this file the master map) and contains lines like the following:

/file  /usr/automount/file_map

that specify the locations of the indirect maps. In the example above, the automounted directory /file is controlled by the indirect map /usr/automount/file_map. Note that you may store the indirect maps anywhere that is accessible to the automounter daemon. My preference is to put all of the indirect maps into a single directory, which I typically share out to all the clients. As mentioned before, you may put the contents of an indirect map into the direct map, but doing so will create headaches later if you change your indirect maps. For a typical automounted directory, the indirect map contains lines of the form:

tree  server1:/share/file/tree

which tell the automounter that if a process accesses the directory /file/tree, it should mount the file system /share/file/tree from the machine server1 as /file/tree. The above example gives no failover protection. If server1 goes down, then the client will hang. The automounter syntax for specifying failover servers is a space-separated list of file server names and directories in the automounter entry, like this:

tree    server1:/share/file/tree server2:/share/file/tree

In this case (for automounters running on Solaris 2.6 and later), if server1 goes down, then the automounter will automatically change over to server2 for file serving. As noted earlier, automounters running on versions of Solaris prior to 2.6 will need to have the machine rebooted to change file servers.
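
When an entry lists several replicas, it can be handy to pull the server names back out of the map, for example to confirm that the entry names the machines you expect. The following is a minimal sketch and not part of the automounter itself: the function name map_servers is hypothetical, and it assumes the simple whitespace-separated host:path form shown above, skipping any field that starts with a dash (mount options).

```shell
#!/bin/sh
# map_servers KEY MAPFILE   (hypothetical helper, not an automounter tool)
# Print, one per line, the server names listed for KEY in an indirect map.
# Assumes every location is written as host:path, separated by whitespace.
# Needs an awk that accepts -v (use nawk on older Solaris releases).
map_servers() {
    awk -v key="$1" '
        $1 == key {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^-/) continue      # skip mount options such as -ro
                n = split($i, part, ":")
                if (n >= 2) print part[1]    # text before the colon is the host
            }
        }
    ' "$2"
}
```

For the two-server entry above, map_servers tree /usr/automount/file_map would print server1 and server2 on separate lines.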

amd

amd may already be installed on your system. Quite a few vendors bundle amd as part of the operating system or as a package on the distribution CD. Red Hat Linux, for example, has amd as a package on the latest distribution CDs. Check your man pages to see if it exists. If it does not, then check your vendor site for a downloadable package.

For amd, the concepts are similar to those of Sun's automounter, but the syntax for the amd maps is different. amd does not have a direct map in the same manner as Sun does. The mapping of root directories to maps is either done on the command line in simple cases, or by creating an amd.conf file. To perform the same function as the /etc/auto_master file shown in the previous example, we would add the following to the end of the amd command line:

/file  /etc/amd/file_map

As with the Sun automounter, the amd map files should be put into a single directory to ease the task of administering the amd files. The amd syntax for the file_map file would be as follows:

tree      -opts:=ro;type:=nfs;rfs:=/share/file/tree rhost:=server1

This tells amd to mount the shared file system /share/file/tree from server1 as the tree subdirectory of the /file top-level directory. The file system will be mounted read-only.

Again, this does not provide any failover protection. If the machine server1 goes down, then the client will hang until server1 comes back online. As with Sun's automounter, amd can mount a file system from multiple servers and will monitor the servers to detect when one has gone down. To specify redundant file servers in amd, more rhost parameters must be added to the entry. So, to have two servers, server1 and server2, serving a replicated file system to the clients, the amd map entry looks like:

tree      -opts:=ro,soft,intr;type:=nfs;rfs:=/share/file/tree rhost:=server1 rhost:=server2

Now, if server1 goes down when the file system is not mounted, the client will automatically switch to server2 for the serving of the files. Unfortunately, unlike Sun's automounter, if a process is using the mount when the server goes down, that particular process will hang. Subsequent processes that access the file tree will not hang because amd will fail over to the server that is still up, but you will be left with a hung process. By using the intr flag in the mount options, we can allow hung processes to be killed, and the soft option will allow file operations to fail on a server error rather than hang. Because of this behavior, any long-running processes accessing the file system when the server fails will have to be restarted. This can be done with a simple script that monitors the server availability and performs a restart on critical processes.
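
A monitoring script of the kind just described might look like the following sketch. It is only an illustration under several assumptions: the names check_server, probe, and STATE_DIR are hypothetical, the probe uses ping -c 1 (the ping flags differ between systems, and a reply to ping does not prove that NFS itself is healthy), and the restart command is whatever you pass in. A flag file records that a server is already known to be down, so the restart runs once per outage rather than on every check.

```shell
#!/bin/sh
# Hypothetical location for the "server is down" flag files.
STATE_DIR=${STATE_DIR:-/var/tmp/nfswatch}

# probe HOST: succeed if HOST answers. This is an assumption; replace it
# with whatever check suits your site, as ping flags vary between systems.
probe() {
    ping -c 1 "$1" >/dev/null 2>&1
}

# check_server HOST [RESTART-COMMAND...]
# Returns 0 while HOST is up. The first time HOST is seen down, runs the
# restart command once; later checks stay quiet until HOST has recovered.
check_server() {
    host=$1
    shift
    flag="$STATE_DIR/$host.down"
    mkdir -p "$STATE_DIR" || return 2

    if probe "$host"; then
        rm -f "$flag"                 # server is back: arm the check again
        return 0
    fi

    if [ ! -f "$flag" ]; then
        : > "$flag"                   # remember that we already reacted
        echo "$host stopped responding; running restart command" >&2
        "$@"
    fi
    return 1
}
```

Run from cron every few minutes, a call such as check_server server1 /usr/local/bin/restart_app (the restart_app path is made up for this example) would restart the critical process the first time server1 is found to be down.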

I noticed another quirk with amd (using am-utils version 6.0.1s11 on a NetBSD 1.5 machine). When using multiple servers for a mount, once amd has marked a server as down, it will not use that server to mount from, even after the server has come back up. Thus, you should not rely on amd to load balance your NFS traffic, because all the clients will automatically migrate to your most reliable server.

Synchronizing the Servers

To make failover work, multiple servers must contain complete copies of the exported file systems. One server is designated as the "master" repository for the file system, and a periodic job is run to update the files on the slave servers with any changes using something like rdist, rsync, or unison.

The tool rdist normally comes standard on UNIX systems and is intended for updating file trees from a master tree. The command for performing the update:

rdist -R -c /share/file/tree server2

will update the directory tree /share/file/tree on server2 from the server on which the rdist command was run. Though this is quite straightforward, there are some problems using rdist. In the past, rdist has been known for some easily exploitable security holes, and since rdist was, by default, a setuid program, it has been a target for intruders to gain root access. For this reason, rdist may have been removed or disabled. Another problem with rdist is that it requires rsh-style equivalent access to the remote machine, which means that the user running rdist on the master has access to the other machines without requiring a password. This may cause security problems in some environments.

Another method of syncing the file trees is to use some free software called rsync. (See the Resources section for information on obtaining rsync and other tools mentioned here.) The rsync tool was written by Andrew Tridgell (the original author of Samba) and is designed to efficiently copy files from one server to another. rsync is efficient because it only transfers the differences between the files on the master and the files on the slave machine, which dramatically reduces the amount of data pushed over the network. Not only does rsync make efficient use of network bandwidth, but the transport method by which rsync talks to the remote machine can be manually selected. By default, rsync will use rsh to talk to the remote machine, but you can change this to use something like ssh or openssh, which will not only allow stronger verification of the remote machine's credentials but also allow the traffic between the two machines to be encrypted. To perform the same function as shown in the rdist command above, but using ssh as our transport mechanism, the rsync command looks like:

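rsync -va --rsh /usr/local/bin/ssh --delete /share/file/tree/ server2:/share/file/tree/

One note of warning with rsync: the trailing slashes on the directory paths in the above example are important. A trailing slash on the source path tells rsync to copy the contents of the directory rather than the directory itself, so leaving it off changes where the files land on the target and, in combination with --delete, what gets removed there. If the slashes are wrong, rsync may eat your target directory contents. So, it is wise to try out the rsync command on a backed-up sample directory to ensure you do not accidentally destroy something important.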
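
One way to guard against the trailing-slash mistake is to let a small wrapper add the slash and to rehearse the transfer first. The sketch below is only an illustration: the names with_slash, push_tree, RSYNC, REMOTE_SHELL, and DRY_RUN are hypothetical, and it assumes the tree lives at the same path on every slave. It relies on rsync's -n option, which reports what would be transferred or deleted without changing anything, and on -e, the short form of --rsh.

```shell
#!/bin/sh
# with_slash PATH: print PATH with a trailing slash, so that rsync copies
# the directory's contents rather than the directory itself.
with_slash() {
    case $1 in
        */) printf '%s\n' "$1" ;;
        *)  printf '%s/\n' "$1" ;;
    esac
}

# push_tree TREE SLAVE...
# Copy TREE from this machine to the same path on each slave.
# Set DRY_RUN=1 to add -nv: rsync then only lists what it would do.
# RSYNC and REMOTE_SHELL may be set to override the programs used.
push_tree() {
    tree=$(with_slash "$1")
    shift
    for slave in "$@"; do
        ${RSYNC:-rsync} -a --delete ${DRY_RUN:+-nv} -e "${REMOTE_SHELL:-ssh}" \
            "$tree" "$slave:$tree" ||
            echo "sync to $slave failed" >&2
    done
}
```

Running DRY_RUN=1 push_tree /share/file/tree server2 first, and reading the list of files rsync says it would delete, is a cheap rehearsal before running the same command without DRY_RUN.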

Another problem with rdist and rsync is that both assume the master source is on a single machine, which limits where the updates can be performed. If files on the slave servers are updated without updating the master, then updates on the slave will be overwritten the next time the master distributes files to the slave servers. To overcome this, we can either strictly enforce the rule that only the master server files are updated, or we can use a program such as unison, which allows files to be updated on multiple servers and can synchronize the updated files between the servers automatically.

unison is currently under active development and is considered beta software by the authors. Despite the beta status, the capabilities that unison provides sound very promising in cases where there is either the lack of discipline or lack of will to enforce the rule of only updating one master server. unison will detect conflicting changes between two file trees so that the administrator can decide what to do about the conflict. If the file on one of the machines has been modified but is unchanged on the remote machine, then the two files are simply synchronized. By default, unison uses ssh as its transport, so data is not sent over the wire in the clear. The relative newness of unison does make selecting it over rdist or rsync a more risky proposition, but if you are willing to help debug the code or put up with some teething problems, then unison may be a good choice for your site.

Although the approach of having multiple servers is simple, it also has some disadvantages. The method can be used only for file systems that will be mounted read-only on the clients. Also, any updates to the file tree must be carried out only on the master machine -- unless a program like unison is used to update the servers.

Finally, there is a small risk that the master server will fail just when it is performing the update. If you have multiple slave machines, you may encounter a situation in which one slave has a truncated version of a file (because the server went down during the copy) or some of the slave servers have an out-of-date version of some files (because the master went down part way through the update). If the slave servers are updated frequently and the replicated file system is small, the chance of inconsistencies arising is remote. However, these inconsistencies may cause much confusion if they occur.

You must also take into account that you are replicating all the resources required to support the duplicated file systems. This includes all machines and disk space, adding to the cost of providing the failover capability. Also, there are some bursts of network traffic involved in the synchronization process when the servers replicate data between themselves, which may impact the client network. To address this, you may want to set up a private network between the servers rather than causing the replication traffic to go across the client network.
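
The truncated or out-of-date files mentioned above can be detected by comparing a checksum listing of each copy of the tree. The following is a minimal sketch using the standard cksum, find, and sort utilities. The function names tree_manifest and trees_match are hypothetical, and the comparison shown is between two directories visible on one machine; for a real slave, you would run the same listing there and compare the two outputs.

```shell
#!/bin/sh
# tree_manifest DIR
# Print "checksum size path" for every regular file under DIR, sorted by
# path, with paths relative to DIR so that two copies can be compared.
tree_manifest() {
    ( cd "$1" && find . -type f -exec cksum {} \; | sort -k 3 )
}

# trees_match DIR1 DIR2
# Succeed only if both trees hold the same files with the same contents.
# Returns 2 if either directory cannot be read.
trees_match() {
    a=$(tree_manifest "$1") || return 2
    b=$(tree_manifest "$2") || return 2
    [ "$a" = "$b" ]
}
```

Saving the output of tree_manifest /share/file/tree on the master and on each slave, then comparing the saved listings with diff, shows exactly which files differ.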

Conclusion

That's it for this time. In my next article, I will look at a more adventurous way of providing failover using a file system specifically designed to handle network disconnections between the client and the server.

Resources

rsync information can be found at: http://www.samba.org/

ssh information can be found at: http://www.cs.hut.fi/ssh

OpenSSH information can be found at: http://www.openssh.com/

unison information can be found at: http://www.utu.net/unison/unison.html

Brett Lymn works for a global company helping to manage a bunch of UNIX machines. He spends entirely too much time fiddling with computers but cannot help himself.