What To Do When the Server Doesn't Serve
Brett Lymn
Not that long ago when a server stopped serving files, most people
would ask when the machine would be back on the air, smile ruefully,
and wait patiently for their files to reappear on their network
drives. The excuse "sorry, I cannot tell you that because the
computer is down" was accepted, and people would call back
later for the information. Those were the days. Servers are now
expected to reliably serve up files 24 hours a day, 7 days a week.
Downtime is no longer an inconvenience -- it costs money. If
your Web server is down because of a file server failure, then people
will rarely wait patiently for the Web server to come back on line.
They will take their business, and their money, elsewhere. Because
of this, there has been a lot of focus placed on building systems
that do not rely on a single point of failure, so that even if one
component fails, the system as a whole will continue functioning.
Hardware designers have been working at this for some time, and
it shows in the latest machines that have dual this, hot-swap that,
to provide the ability to ride out a hardware failure and repair
the machine without requiring it to be shut down. Of course, the
operating systems that sit on top of this hardware have been modified
to exploit the new hardware features to provide resiliency in the
face of failures. The problem now is that modern machines rarely
live in isolation. They are typically networked to other machines
with the result that all your hardware and software redundancy grinds
to a halt when the file server goes down. To prevent this situation,
we need a system that will detect when the file server has
gone down and take some action to ensure continuity of service
during the outage.
Managing Failover with cachefs
There are quite a few solutions to the problem of file
serving failure. My three-part series of articles will contain a
sampling of what is available. My articles will have a heavy Sun
Solaris slant because that is what I predominantly work with. The
first article in the series describes providing failover using a
little-known feature of Sun's cachefs implementation
that allows cachefs clients to operate disconnected from
the file server.
Both Solaris 2 and amd running on Solaris 2 have a file
system called cachefs. This file system uses the client's
local disk to cache copies of files that originate from a remote
server. The idea behind this file system is to improve performance
by providing frequently used files at local disk access speeds,
while still having the files served from a central location.
Although amd supports the cachefs on Solaris 2, it
does not provide all the features that Sun's implementation
does and, hence, is not as useful in providing a failover service.
In this article, I will concentrate on the features of the Sun cachefs
implementation.
One of the interesting things with the cachefs is that
once a file has been accessed on the client, the file resides on
the client's local disk. So, if the cachefs can be convinced
to serve up a (possibly out-of-date) file from its cache even when
the origin server is down, then the system could mount files from a
central server yet still ride out server outages, provided the
files the client requires are in the client's local cache.
With some judicious use of flags, this is exactly what we can do
with the cachefs. In cachefs parlance, the file system
mounted from the server is called the back file system, and the
local cachefs file system is called the front file system.
To create a cachefs mount, you first need to initialize the
front file system storage on the client using the cachefs
administration tool, cfsadmin, like this:
cfsadmin -c /var/cache
This creates and populates the file system cache in the directory
/var/cache. The location of the cachefs cache directory
is arbitrary; I will use the location /var/cache in the examples
in this article. The contents of this directory must only be manipulated
by the cachefs or cfsadmin. Any attempts to edit or
rename files in the cache directory will probably corrupt the cache
file system. After we have created the front file system, we can mount
the back file system onto the cachefs:
mount -F cachefs -o backfstype=nfs,cachedir=/var/cache server1:/file/tree /file/tree
The options here tell the cachefs that the back file
system is from an NFS server and that the cache directory is located
in /var/cache. The client will now be caching files from server1
onto local disk in /var/cache. The whole process is transparent
to the file system user; the user sees a normal directory and
is unaware that the files are not coming straight from the server.
The example as given will not handle a server outage. If the server
goes down, then the cachefs mount in the above example will
hang when attempting to validate the file on the server.
Fortunately, the cachefs has some options that help the
situation. One of these options is the "local-access"
cachefs mount option. This option tells the cachefs
to use the file attributes of the cached copy of the file, rather
than validate those file attributes on the back file system server.
This is meant to save a round trip to the server when checking the
file attributes, but it also serves to decouple the cachefs
a bit more from the back file system server; we no longer have to
rely on the server's being up to get file attributes for files
that are in cache.
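For example, adding local-access to the earlier mount gives a
command like this (same illustrative server and paths as above):
mount -F cachefs -o backfstype=nfs,cachedir=/var/cache,local-access \
    server1:/file/tree /file/tree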
Another handy pair of options is demandconst and noconst,
which affect the way the cachefs validates the contents of
the cache against the back file system server. Normally, the cachefs
automatically validates the contents of the cache against the back
file system server at regular intervals. By using the demandconst mount
flag, you can indicate to the cachefs that validation will
be done manually using the cfsadmin -s command. The noconst
mount option tells the cachefs that the cache will never
be validated against the back file system server.
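As a sketch, a demandconst mount followed by an on-demand validation
of that mount point looks something like this (again using the
illustrative server and paths from above):
mount -F cachefs -o backfstype=nfs,cachedir=/var/cache,demandconst \
    server1:/file/tree /file/tree
cfsadmin -s /file/tree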
Either of these mount options is good if the files on the back
file system are modified infrequently. With the demandconst
mount option, the clients can be instructed to revalidate their
caches after the changes have been made. With the noconst
mount option, the cached copy of a file must be purged from the
client's cache before any updates will flow through. Note that with both
the automatic validation and the demandconst mount option,
if the back file system server is down when the cache object is
validated, then the cachefs will remove the object from the
cache. Clearly, this is undesirable if the primary reason for running
the cachefs is to provide some resiliency in the server mount.
One approach to this problem would be to use the demandconst mount
option on the client and either NFS ping the server prior
to requesting an update, to ensure the validation will work, or
have the server send out a signal informing the clients to revalidate
their cached files. Fortunately, none of this is necessary.
Sun has built a mount option into the cachefs called disconnectable.
This option is only available when the back file system type is
NFS. The disconnectable option is very poorly documented --
it does not appear in the man pages for mount_cachefs, nor
is there a man page for the daemon called cachefsd that supports
the disconnectable mode. I found out about this mode by chance when
I was searching the Sun support pages looking for patches that I
might need to apply to my system. I found infodoc number
21701, which provided information on how to set up the cachefs
in disconnectable mode. The procedure is quite simple. You create
the directory:
/etc/fs/cachefs
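Creating it requires nothing more than a mkdir (run as root):
mkdir /etc/fs/cachefs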
Then add the mount option "disconnectable" to the
cachefs mount command. The mount command now looks like:
mount -F cachefs -o backfstype=nfs,cachedir=/var/cache,disconnectable \
    server1:/file/tree /file/tree
Or if you want the file system to be mounted automatically
when the machine boots, then add a line like the one below to the
/etc/vfstab file:
server1:/file/tree  /var/cache  /file/tree  cachefs  -  yes  backfstype=nfs,cachedir=/var/cache,disconnectable,local-access
Those who are familiar with the syntax of a vfstab
entry will note that where there is normally a raw device name, we
have the directory /var/cache. This field is used during boot
to fsck the file system, so for a cachefs file system we
must provide the cache directory for fsck_cachefs to operate
on. To properly implement the disconnectable mode, you will need to
make an entry in /etc/vfstab and either run the cachefs
scripts from /etc/init.d or simply reboot the system, because
there is a supporting daemon, cachefsd, that needs to be running.
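If you prefer not to reboot, the init script can be run by hand. A
minimal sketch, assuming the script is named cachefs.daemon (an
assumption worth checking against the contents of your own
/etc/init.d, as the name may vary between Solaris releases):
/etc/init.d/cachefs.daemon start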
Understanding Disconnectable Mode
The disconnectable mode changes the behavior of the cachefs
in some subtle ways. First, the back file system is mounted in what
Sun terms a "semi-soft" mode, which allows the back file
system access to fail without hanging (much like a soft NFS mount).
However, if the accessed file exists in the cache, then the requestor
will receive the cached copy of the file instead of seeing an error
that would be seen if the mount were a usual NFS soft mount. In disconnectable
mode, the cachefs blocks writes to the file system just as
a normal hard NFS mount would. Second, the operating system starts a cachefsd
daemon on the first access to the cachefs file system. This
daemon monitors the link to the back file system server and will
manage the process of keeping the client cache in sync with the
files on the back file system server.
Using the disconnectable mount option changes the behavior of
file attribute fetching as well. If the back file system
server is down, then the cachefs will wait for the RPCs to
the server to time out, but rather than returning an error, the
cachefs detects the failure and offers the file attributes
from the cached object instead. You can speed things up in disconnectable
mode by still using the local-access mount flag to prevent the cachefs
from checking the file attributes on the server. This provides a
noticeable improvement if the server is down. Another advantage
of the cachefs in disconnectable mode is that it supports
writes to the file system, even when the back file system server
is down, but only if you mount the cachefs with the non-shared
mount option. This option indicates that only one client will modify
the files on the server. In this manner, Sun has neatly avoided
the problem of multiple clients modifying a file when they are disconnected
from the server by limiting who can modify the files. If you do
not use the non-shared mount flag, then attempts to write to the
file system when the back file system server is down will result
in the write blocking until the server comes back up.
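Putting the pieces together, a mount that rides out server outages
for both reads and writes from a single client would look something
like this sketch (same illustrative server and paths as before):
mount -F cachefs \
    -o backfstype=nfs,cachedir=/var/cache,disconnectable,non-shared,local-access \
    server1:/file/tree /file/tree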
What to Watch For
As with any approach, the cachefs solution has its problems.
If the files you are interested in are not in the local cache when
the server is down, then you will get a read failure on those files.
To make matters worse, there is no way to tell the cachefs
which files should be held in the client cache except
by reading each file that is deemed critical. You also
must provision sufficient client-side storage to accommodate all
the files that you want available during a possible server outage.
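One workable way to prime the cache is to script a read of every
file you care about. Here is a minimal sketch, assuming the critical
files all live under a hypothetical directory /file/tree/critical;
the output is thrown away because the reads serve only to pull the
files into the cache:
find /file/tree/critical -type f -exec cat {} \; > /dev/null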
Normally, you do not need to match the server's storage capacity
byte for byte on the client. It is expected that the client will only
use a small subset of the server files, and some objects in the
cachefs can be purged to make room for newer objects. If
you use the cachefs as a failover store, then you must ensure
there is enough client-side storage for the cachefs cache
to hold all the files that you need during a server outage. Also,
the limitation on the number of clients that can update the file
system (although it does avoid the problems of conflicting updates)
may be problematic in some circumstances. Keep an eye, too, on the default
size limit for caching files in the cachefs. This limit is
set at 3 MB, but can be increased by using the cfsadmin command
like this:
cfsadmin -u -o maxfilesize=5 /var/cache
The effect of this command is to increase the maximum size
limit of the files cached by the cachefs to 5 MB. This limit
may need to be tweaked to cope with the largest file that must
be cached for availability during a server outage. In addition
to the file size limit, there are a lot of options available in cfsadmin
that control the resource utilization of the cachefs. Details
about these options are outside the scope of this article, but if
you are planning to cache a large number of files, check the man page
of cfsadmin to ensure that all the files you require to be
cached will get cached without exceeding the default resource limits.
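A good starting point is the -l option of cfsadmin, which lists the
current resource parameters for a cache directory:
cfsadmin -l /var/cache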
Finally, when the cachefs is in disconnectable mode, it
behaves as though the demandconst mount flag were in use. That
is, changes to the back file system are not propagated unless you
run:
cfsadmin -s all
This behavior is odd, but it can be worked around by putting
a cron job onto the client system to periodically run the cfsadmin
command. If the server is down, the cfsadmin will quit without
doing anything. Note the difference disconnectable mode makes here
-- if you just use demandconst and run the cfsadmin
-s command when the server is down, the contents of the cachefs
will be flushed, but in disconnectable mode the files stay intact.
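For example, a crontab entry on the client along the lines of the
sketch below would revalidate all cachefs mounts at the top of each
hour (the path to cfsadmin is an assumption; adjust it to wherever
the command lives on your system):
0 * * * * /usr/sbin/cfsadmin -s all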
Conclusion
That about does it for cachefs, which gives you
the option of using some local machine resources to
ride out server outages. This approach is best suited to situations
where you have a small number of files necessary for the local machine
operation. Larger file trees will spend more time validating the
cachefs contents and will generate more network traffic
while doing so, which may make the cachefs approach
undesirable.
Furthermore, if you are planning to use the cachefs, check
that you have the appropriate patches for your version of Solaris.
Even though cachefs has been available since Solaris 2.3,
the disconnectable mode was a later addition, which may require
a patch to your system. In my next article, I will cover a more
traditional way of implementing failover by using replicated servers.
Brett Lymn works for a global company helping to manage a bunch
of UNIX machines. He spends entirely too much time fiddling with
computers but cannot help himself.