
What To Do When the Server Doesn't Serve

Brett Lymn

Not that long ago when a server stopped serving files, most people would ask when the machine would be back on the air, smile ruefully, and wait patiently for their files to reappear on their network drives. The excuse "sorry, I cannot tell you that because the computer is down" was accepted, and people would call back later for the information. Those were the days. Servers are now expected to serve up files reliably 24 hours a day, 7 days a week. Downtime is no longer an inconvenience -- it costs money. If your Web server is down because of a file server failure, people will rarely wait patiently for the Web server to come back online. They will take their business, and their money, elsewhere. Because of this, a lot of focus has been placed on building systems that have no single point of failure, so that even if one component fails, the system as a whole will continue functioning.

Hardware designers have been working at this for some time, and it shows in the latest machines that have dual this and hot-swap that, providing the ability to ride out a hardware failure and repair the machine without requiring it to be shut down. Of course, the operating systems that sit on top of this hardware have been modified to exploit the new hardware features and provide resiliency in the face of failures. The problem now is that modern machines rarely live in isolation. They are typically networked to other machines, with the result that all your hardware and software redundancy grinds to a halt when the file server goes down. To prevent this situation, we need a system that will detect when the file server has gone down and take action to ensure continuity of service.

Managing Failover with cachefs

There are quite a few solutions to the problem of file serving failure. My three-part series of articles will contain a sampling of what is available. The articles will have a heavy Sun Solaris slant because that is what I work with predominantly. This first article in the series describes providing failover using a little-known feature of Sun's cachefs implementation that allows cachefs clients to operate disconnected from the file server.

Both Solaris 2 and amd running on Solaris 2 have a file system called cachefs. This file system uses the client's local disk to cache copies of files that originate from a remote server. The idea behind this file system is to improve performance by serving frequently used files at local disk speeds while still having the files served from a central location. Although amd supports the cachefs on Solaris 2, it does not provide all the features that Sun's implementation does and, hence, is not as useful in providing a failover service. In this article, I will concentrate on the features of the Sun cachefs implementation.

One of the interesting things about the cachefs is that once a file has been accessed on the client, the file resides on the client's local disk. So, if the cachefs can be convinced to serve up a (possibly out-of-date) file from the cache even when the origin server is down, then the system can mount files from a central server yet still ride out server outages, provided the files the client requires are in the client's local cache. With some judicious use of flags, this is exactly what we can do with the cachefs. In cachefs parlance, the file system mounted from the server is called the back file system, and the local cachefs file system is called the front file system. To create a cachefs mount, you first need to initialize the front file system storage on the client using the cachefs administration tool, cfsadmin, like this:

cfsadmin -c /var/cache
This creates and populates the file system cache in the directory /var/cache. The location of the cachefs cache directory is arbitrary; I will use the location /var/cache in the examples in this article. The contents of this directory must only be manipulated by the cachefs or cfsadmin. Any attempts to edit or rename files in the cache directory will probably corrupt the cache file system. After we have created the front file system, we can mount the back file system onto the cachefs:

mount -F cachefs -o backfstype=nfs,cachedir=/var/cache server1:/file/tree /file/tree
The options here tell the cachefs that the back file system is from an NFS server and that the cache directory is located in /var/cache. The client will now be caching files from server1 onto local disk in /var/cache. The whole process is transparent to the file system user; all the user sees is a normal directory, with no indication that the files are not coming straight from the server. The example as given, however, will not handle a server outage. If the server goes down, the cachefs mount in the above example will hang when attempting to validate files on the server.
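If you want to confirm that the cache is actually doing its job, Sun supplies a statistics tool, cachefsstat, that reports the cache hit rate for a cachefs mount point. Using the mount point from the example above:

cachefsstat /file/tree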

Fortunately, the cachefs has some options that help the situation. One of these is the "local-access" cachefs mount option. This option tells the cachefs to use the file attributes of the cached copy of the file, rather than validating those attributes with the back file system server. This is meant to save a round trip to the server when checking file attributes, but it also serves to decouple the cachefs a bit more from the back file system server; we no longer rely on the server being up to get file attributes for files that are in the cache.
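Adding this flag to the earlier example gives a mount command like the following (same paths as before):

mount -F cachefs -o backfstype=nfs,cachedir=/var/cache,local-access \
server1:/file/tree /file/tree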

Another pair of handy options is demandconst and noconst, which affect the way the cachefs validates the contents of the cache against the back file system server. Normally, the cachefs automatically validates the contents of the cache against the back file system server at regular intervals. The demandconst mount flag indicates to the cachefs that validation will be done manually using the cfsadmin -s command. The noconst mount option tells the cachefs that the cache will never be validated against the back file system server.
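For example, to take manual control of validation, you could mount with demandconst and then trigger a consistency check whenever the server's files are known to have changed:

mount -F cachefs -o backfstype=nfs,cachedir=/var/cache,demandconst \
server1:/file/tree /file/tree
cfsadmin -s /file/tree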

Either of these mount options is good if the files on the back file system are modified infrequently. With the demandconst mount option, the clients can be instructed to revalidate their caches after the changes have been made. With the noconst mount option, the affected files must be cleared from the client's cache so that the updates will flow through. Note that with both the automatic validation and the demandconst mount option, if the back file system server is down when a cache object is validated, the cachefs will remove the object from the cache. Clearly, this is undesirable if the primary reason for running the cachefs is to provide some resiliency in the server mount. One approach to this problem is to use the demandconst mount option on the client and either NFS ping the server prior to requesting an update to ensure the validation will work, or have the server send out a signal to inform the clients to revalidate their cached files. Fortunately, as we will see, this is not necessary.
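If you were stuck with a plain demandconst mount, though, such a guard could be as simple as probing the server's NFS service with rpcinfo before requesting the validation. A sketch only, using the example server name:

rpcinfo -u server1 nfs > /dev/null 2>&1 && cfsadmin -s /file/tree

If the probe fails, the validation is simply skipped and the cache is left alone.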

Sun has built a mount option into the cachefs called disconnectable. This option is only available when the back file system type is NFS. The disconnectable option is very poorly documented -- it does not appear in the man pages for mount_cachefs, nor is there a man page for the daemon, cachefsd, that supports the disconnectable mode. I found out about this mode by chance when I was searching the Sun support pages for patches that I might need to apply to my system. I found infodoc 21701, which provides information on how to set up the cachefs in disconnectable mode. The procedure is quite simple. You create the directory:

/etc/fs/cachefs
and add the mount option "disconnectable" to the cachefs mount command. The mount command now looks like:

mount -F cachefs -o backfstype=nfs,cachedir=/var/cache,disconnectable \
server1:/file/tree /file/tree
Or, if you want the file system to be mounted automatically when the machine boots, add a line like the one below to the /etc/vfstab file (a vfstab entry must be a single line; it is wrapped here to fit the page):

server1:/file/tree /var/cache /file/tree cachefs - yes \
backfstype=nfs,cachedir=/var/cache,disconnectable,local-access
Those who are familiar with the syntax of a vfstab entry will note that where there is normally a raw device name, we have the directory /var/cache. This field is used during boot to fsck the file system, so a cachefs entry must supply the cache directory for fsck_cachefs to operate on. To properly implement the disconnectable mode, you will need to make the entry in /etc/vfstab and either run the cachefs scripts from /etc/init.d or simply reboot the system, because there is a supporting daemon, cachefsd, that needs to be running.
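On my Solaris 8 machines, the script in question is /etc/init.d/cachefs.daemon (the name may vary between releases, so check your system), which means the daemon can be started without a reboot:

/etc/init.d/cachefs.daemon start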

Understanding Disconnectable Mode

The disconnectable mode changes the behavior of the cachefs in some subtle ways. First, the back file system is mounted in what Sun terms a "semi-soft" mode, which allows back file system accesses to fail without hanging (much like a soft NFS mount). However, if the accessed file exists in the cache, the requestor receives the cached copy of the file instead of the error that a normal NFS soft mount would return. In disconnectable mode, the cachefs will block writes to the file system, just as a normal hard NFS mount does. Second, the operating system starts a cachefsd daemon on the first access to the cachefs file system. This daemon monitors the link to the back file system server and manages the process of keeping the client cache in sync with the files on the back file system server.
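You can verify that the daemon has started after touching the file system for the first time:

ps -ef | grep cachefsd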

The disconnectable mount option changes the behavior of file attribute fetching as well. If the back file system server is down, the cachefs will wait for the RPCs to the server to time out, but rather than returning an error, the cachefs detects the failure and offers the file attributes from the cached object instead. You can speed things up in disconnectable mode by also using the local-access mount flag to prevent the cachefs from checking the file attributes on the server, which provides a noticeable improvement when the server is down. Another advantage of the cachefs in disconnectable mode is that it supports writes to the file system even when the back file system server is down, but only if you mount the cachefs with the non-shared mount option. This option indicates that only one client will modify the files on the server. In this manner, Sun has neatly avoided the problem of multiple clients modifying a file while they are disconnected from the server by limiting who can modify the files. Without the non-shared mount flag, attempts to write to the file system while the back file system server is down will block until the server comes back up.
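Putting the pieces together, a disconnectable mount that can also accept writes during an outage would look something like this (using the same example paths):

mount -F cachefs -o backfstype=nfs,cachedir=/var/cache,disconnectable,non-shared \
server1:/file/tree /file/tree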

What to Watch For

As with any solution, the cachefs solution has problems. If the files you are interested in are not in the local cache when the server goes down, you will get a read failure on those files. To make matters worse, there is no way to tell the cachefs which files should be held in the client cache other than performing a read of each file that is deemed critical (a scripted approach to this is sketched at the end of this section). You also must provision sufficient client-side storage to accommodate all the files that you want available during a server outage. Normally, you do not need to match the server's storage capacity byte for byte on the client; the expectation is that the client will use only a small subset of the server's files, and some objects in the cachefs can be purged to make room for newer objects. If you use the cachefs as a failover store, however, you must ensure there is enough client-side storage for the cachefs cache to hold all the files that you need during a server outage. The limit on the number of clients that can update the file system (although it avoids the problem of conflicting updates) may also be problematic in some circumstances. Watch, too, the default size limit for caching files in the cachefs. This limit is set at 3 MB, but it can be increased by using the cfsadmin command like this:

cfsadmin -u -o maxfilesize=5 /var/cache
The effect of this command is to increase the maximum size of files cached by the cachefs to 5 MB (note that -u updates the parameters of an existing cache, and the file systems using the cache must be unmounted while you change them). This limit may need to be tweaked to cope with the largest file that must be cached so that it is available during a server outage. In addition to the file size limit, there are many options available in cfsadmin that control the resource utilization of the cachefs. Details of these options are outside the scope of this article, but if you are planning to cache a large number of files, check the cfsadmin man page to ensure that all the files you require will be cached without exceeding the default resource limits. Finally, when the cachefs is in disconnectable mode, it behaves as though the demandconst mount flag were in use. That is, changes to the back file system are not propagated unless you run:

cfsadmin -s all
This behavior is odd, but it can be worked around by putting a cron job on the client system to run the cfsadmin command periodically. If the server is down, cfsadmin will quit without doing anything. Note the difference disconnectable mode makes here -- if you just use demandconst and run the cfsadmin -s command while the server is down, the contents of the cachefs will be flushed, but in disconnectable mode the files stay intact.
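As a sketch of this housekeeping -- the periodic revalidation plus the re-reading of critical files mentioned earlier -- the client could run something like the following from cron. The script name and file list here are purely illustrative; substitute your own critical files:

#!/bin/sh
# revalidate all cachefs caches; in disconnectable mode this is
# safe even if the server is down -- the cache is left intact
/usr/sbin/cfsadmin -s all

# re-read the critical files so they stay resident in the cache
for f in /file/tree/app.conf /file/tree/data/lookup.tbl
do
    cat $f > /dev/null 2>&1
done

Installed as, say, /usr/local/bin/cachefs-refresh, it could be driven hourly with a crontab entry such as:

0 * * * * /usr/local/bin/cachefs-refresh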

Conclusion

That about does it for the cachefs, which gives you the option of spending some local machine resources to ride out server outages. This approach is best suited to situations where a small number of files is necessary for the local machine's operation. Larger file trees will spend more time validating the cachefs contents and will generate more network traffic while doing so, which may make the cachefs approach undesirable.

Furthermore, if you are planning to use the cachefs, check that you have the appropriate patches for your version of Solaris. Even though cachefs has been available since Solaris 2.3, the disconnectable mode was a later addition, which may require a patch to your system. In my next article, I will cover a more traditional way of implementing failover by using replicated servers.

Brett Lymn works for a global company helping to manage a bunch of UNIX machines. He spends entirely too much time fiddling with computers but cannot help himself.