What to Do When the Server Doesn't Serve -- Duplicating Data
Brett Lymn
In my previous article (Sys Admin, February 2001), I talked
about providing file system failover using features of Sun's
cachefs implementation to serve files transparently when the
server is down. In this article, I will take a more traditional
approach to server failover and discuss duplicating data on multiple
servers and causing client machines to select another server if
the one that they are talking to stops responding.
The automatic mounting of file systems from a server is handled
by an automounter: a daemon that runs on the client machine and
monitors a configured set of directory paths. When an access is
made to one of these monitored
paths, the daemon mounts the appropriate portion of the file system
from a remote server using NFS to make it appear as though the path
was always there. After a period of inactivity on the mounted path,
the daemon will unmount the path to prevent the system from accumulating
mounted file systems. On Sun systems, this is simply called the
automounter; for other systems, there is an open source equivalent
called amd. Both Sun's automounter and amd allow
for multiple servers to serve the same file system. On versions of
Solaris prior to 2.6, the binding of a client to a particular server
was resolved only at mount time. So, if the server from which your
client automounted a file system went down, too bad, you had to
reboot the client to force it to mount the file system from the
other server.
For amd and Sun's automounter on Solaris 2.6 and later,
the automounter daemon monitors the connection to the server. If
the connection to the server fails, the automounter automatically
tries to connect to the other listed servers so that the client
can continue working.
Sun Automounter
For Sun's automounter, two kinds of map files are involved:
the master map and the indirect maps it refers to. The master
map may contain mount entries directly but, in practice, this
is discouraged because any changes to the master map require
the automounter to be restarted before they take effect.
Normally, the master map will contain mount points and the
names of other files. These files are called indirect maps and
are commonly used to hold the automounter configuration because
the indirect maps can be updated and take effect without
restarting the automounter.
The master map is normally a file called /etc/auto_master
and contains lines like the following:
/file /usr/automount/file_map
that specify the locations of the indirect maps. In the example above,
the automounted directory /file is controlled by the indirect
map /usr/automount/file_map. Note that you may store the indirect
maps anywhere that is accessible to the automounter daemon. My preference
is to put all of the indirect maps into a single directory, which
I typically share out to all the clients. As mentioned before, you
may put mount entries directly into the master map, but doing
so will create headaches later when you change them. For
a typical automounted directory, the indirect map contains lines of
the form:
tree server1:/share/file/tree
which tell the automounter that if a process accesses the directory
/file/tree, it should mount the file system /share/file/tree
from the machine server1 as /file/tree. The above example gives
no failover protection. If server1 goes down, then the client will
hang. The automounter syntax for specifying failover servers is a
space-separated list of file server names and directories in the automounter
entry, like this:
tree server1:/share/file/tree server2:/share/file/tree
In this case (for automounters running on Solaris 2.6 and later),
if server1 goes down, then the automounter will automatically change
over to server2 for file serving. As noted earlier, automounters running
on versions of Solaris prior to 2.6 will need to have the machine
rebooted to change file servers.
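Because a forgotten second server silently removes the failover
protection, it can be worth auditing the indirect maps from time
to time. Below is a minimal sketch of such a check; the script and
its name are my own invention, not part of the automounter, and it
assumes the simple key-plus-replicas entry format shown above
(one entry per line, no option fields):

```shell
#!/bin/sh
# map_check.sh -- warn about indirect-map entries that list only
# a single server:/path replica, i.e. entries with no failover.
check_map() {
    awk '/^#/ { next }            # skip comment lines
         NF >= 2 {
             # fields after the key are the replica locations
             if (NF - 1 < 2)
                 print "no failover for key: " $1
         }' "$1"
}
```

Running check_map /usr/automount/file_map would flag the
single-server tree entry shown in the first example but not the
two-server version.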
amd
amd may already be installed on your system. Quite a few
vendors bundle amd as part of the operating system or as a
package on the distribution CDs; Red Hat Linux, for example,
includes amd as a package on its latest CDs. Check your man
pages to see whether it is installed. If it is not, check your
vendor's site for a downloadable package.
For amd, the concepts are similar to Sun's automounter,
but the syntax of the amd maps is different. amd has no
equivalent of Sun's /etc/auto_master file; the mapping of
top-level directories to maps is done either on the command line
in simple cases, or by creating an amd.conf file. To perform
the same function as the /etc/auto_master file shown in the
previous example, we would add the following to the end of the amd
command line:
/file /etc/amd/file_map
As with the Sun automounter, the amd map files should be put
into a single directory to ease the task of administering the amd
files. The amd syntax for the file_map file would be
as follows:
tree -opts:=ro;type:=nfs;rfs:=/share/file/tree rhost:=server1
This tells amd to mount the shared file system /share/file/tree
from server1 as the tree subdirectory of the /file top-level
directory. The file system will be mounted read-only.
Again, this does not provide any failover protection. If the machine
server1 goes down, then the client will hang until server1 comes
back on line. As with Sun's automounter, amd can mount
a file system from multiple servers and will monitor the servers
to detect when one has gone down. To specify redundant file servers
in amd, more rhost parameters must be added to the entry.
So, to have two servers, server1 and server2, serving a replicated
file system to the clients, the amd map entry looks like:
tree -opts:=ro,soft,intr;type:=nfs;rfs:=/share/file/tree rhost:=server1 rhost:=server2
Now, if server1 goes down when the file system is not mounted, the
client will automatically switch to server2 for the serving of the
files. Unfortunately, unlike Sun's automounter, if a process
is using the mount when the server goes down, that particular process
will hang. Subsequent processes that access the file tree will not
hang because amd will failover to the server that is still
up, but you will be left with a hung process. By using the intr
flag in the mount options, we can allow hung processes to be killed,
and the soft option will allow file operations to fail on a
server error rather than hang. Because of this behavior, any long-running
processes accessing the file system when the server fails will have
to be restarted. This can be done with a simple script that monitors
the server availability and performs a restart on critical processes.
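Such a script might be sketched as follows. The server name, the
command, and the probe method are all illustrative assumptions;
in particular, the probe defaults to a single ping, but a real
check might use rpcinfo or a test NFS read instead:

```shell
#!/bin/sh
# watchdog.sh -- sketch of a restart script for critical
# processes that may hang when their NFS server dies.
# SERVER and CMD are placeholders; PROBE can be overridden.
SERVER=${SERVER:-server1}
CMD=${CMD:-/usr/local/bin/longjob}
PROBE=${PROBE:-ping -c 1}

server_up() {
    $PROBE "$1" >/dev/null 2>&1
}

check_once() {
    if server_up "$SERVER"; then
        echo up
    else
        echo down
        # The real script would now kill the hung process and
        # start a fresh copy, which picks up the failed-over
        # mount, e.g.:  pkill -f "$CMD"; $CMD &
    fi
}

# A production version would run check_once in a loop:
#   while :; do check_once; sleep 60; done
```

Overriding PROBE also makes the decision logic easy to exercise
without a real server.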
I noticed another quirk with amd (using am-utils
version 6.0.1s11 on a NetBSD 1.5 machine). When using multiple servers
for a mount, once amd has marked a server as down, it will
not use that server to mount from, even after the server has come
back up. Thus, you should not rely on amd to load balance
your NFS traffic, because all the clients will automatically migrate
to your most reliable server.
Synchronizing the Servers
To make failover work, multiple servers must contain complete
copies of the exported file systems. One server is designated as
the "master" repository for the file system, and a periodic
job is run to update the files on the slave servers with any changes
using something like rdist, rsync, or unison.
The tool rdist normally comes standard on UNIX systems
and is intended for updating file trees from a master tree. The
command for performing the update:
rdist -R -c /share/file/tree server2
will update the directory tree /share/file/tree on server2 from the
server on which the rdist command was run. Though this is quite
straightforward, there are some problems using rdist. In the
past, rdist has been known for some easily exploitable security
holes, and since rdist was, by default, a setuid program, it
has been a target for intruders to gain root access. For this reason,
rdist may have been removed or disabled. Another problem with
rdist is that it relies on rsh-style host equivalence on the
remote machine, which means that the user running rdist on the
master has access to the other machines without being asked for
a password. This may cause security problems in some
environments.
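For more than one slave, the update is usually driven from a
Distfile rather than the command line, which keeps the host and
file lists in one place. A minimal sketch follows; the host names
are illustrative, and you should check the directives against
your own rdist man page, since implementations vary:

```
# Distfile -- push the replicated tree to the slave servers
HOSTS = ( server2 server3 )
FILES = ( /share/file/tree )

${FILES} -> ${HOSTS}
	install -R ;
```

Running rdist in the directory containing this Distfile performs
the same update as the command above for each listed host; -R
removes files on the slaves that no longer exist on the master.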
Another method of syncing the file trees is to use some free software
called rsync. (See the Resources section for information
on obtaining rsync and other tools mentioned here.) The rsync
tool was written by Andrew Tridgell (the original author of Samba)
and is designed to efficiently copy files from one server to another.
rsync is efficient because it only transfers the differences
between the files on the master and the files on the slave machine,
which dramatically reduces the amount of data pushed over the network.
Not only does rsync make efficient use of network bandwidth,
but the transport method by which rsync talks to the remote
machine can be manually selected. By default, rsync will
use rsh to talk to the remote machine, but you can change
this to use something like ssh or openssh, which will
not only allow stronger verification of the remote machine's
credentials but also allow the traffic between the two machines
to be encrypted. To perform the same function as shown in the rdist
command above, but using ssh as our transport mechanism,
the rsync command looks like:
rsync -va --rsh /usr/local/bin/ssh --delete /share/file/tree/ server2:/share/file/tree/
One note of warning with rsync -- the trailing slashes
on the directory paths in the above example are important.
Without them, rsync copies the source directory itself into the
target rather than its contents, and in combination with
--delete it may then remove everything else under the target
directory. So, it is wise to try out the rsync command on a
backed-up sample directory to ensure you do not accidentally
destroy something important.
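In practice, the rsync command is run periodically from cron on
the master to keep the slaves current. A sketch crontab entry
(the paths and the schedule are illustrative) might look like:

```
# push the replicated tree to server2 at half past every hour
30 * * * * /usr/local/bin/rsync -a --delete --rsh=/usr/local/bin/ssh /share/file/tree/ server2:/share/file/tree/
```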
Another problem with rdist and rsync is that both
assume the master source is on a single machine, which limits where
the updates can be performed. If files on the slave servers are
updated without updating the master, then updates on the slave will
be overwritten the next time the master distributes files to the
slave servers. To overcome this, we can either strictly enforce
the rule that only the master server files are updated, or we can
use a program such as unison, which allows files to be updated
on multiple servers and can synchronize the updated files between
the servers automatically.
unison is currently under active development and is considered
beta software by the authors. Despite the beta status, the capabilities
that unison provides sound very promising in cases where
there is not the discipline or the will to enforce the rule
of updating only one master server. unison will
detect conflicting changes between two file trees so that the administrator
can decide what to do about the conflict. If the file on one of
the machines has been modified but is unchanged on the remote machine,
then the two files are simply synchronized. By default, unison
uses ssh as its transport, so data is not sent over the wire
in the clear. The relative newness of unison does make selecting
it over rdist or rsync a more risky proposition, but
if you are willing to help debug the code or put up with some teething
problems, then unison may be a good choice for your site.
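As a sketch of what this looks like in practice, the following
command reconciles the master's copy of the tree used in the
earlier examples with server2 over ssh; check the flags against
your own unison documentation, since the tool is still evolving:

```
unison /share/file/tree ssh://server2//share/file/tree -batch
```

The -batch flag propagates all non-conflicting updates without
prompting; conflicting changes are skipped and reported for the
administrator to resolve by hand.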
Although the approach of having multiple servers is simple, it
also has some disadvantages. The method can be used only for file
systems that will be mounted read-only on the clients. Also, any
updates to the file tree must be carried out only on the master
machine -- unless a program like unison is used to update
the servers. Finally, there is a small risk that the master server
will fail just when it is performing the update. If you have multiple
slave machines, you may encounter a situation in which one slave
has a truncated version of a file (because the server went down
during the copy) or some of the slave servers have an out-of-date
version of some files (because the master went down part way through
the update). If the slave servers are updated frequently and the
replicated file system is small, the chance of inconsistencies arising
is remote. However, these inconsistencies may cause much confusion
if they occur. You must also take into account that you are replicating
all the resources required to support the duplicated file systems.
This includes all machines and disk space, adding to the cost of
providing the failover capability. Also, there are some bursts of
network traffic involved in the synchronization process when the
servers replicate data between themselves, which may impact the
client network. To address this, you may want to set up a private
network between the servers rather than causing the replication
traffic to go across the client network.
Conclusion
That's it for this time. In my next article, I will look
at a more adventurous way of providing failover using a file system
specifically designed to handle network disconnections between the
client and the server.
Resources
rsync information can be found at: http://www.samba.org/
ssh information can be found at: http://www.cs.hut.fi/ssh
OpenSSH information can be found at: http://www.openssh.com/
unison information can be found at: http://www.utu.net/unison/unison.html
Brett Lymn works for a global company helping to manage a bunch
of UNIX machines. He spends entirely too much time fiddling with
computers but cannot help himself.