Coda -- The Disconnectable File System
Brett Lymn
In a previous series of articles in Sys Admin, I mentioned
Coda as a possible method to provide services to clients in the
event of the server going down. However, I haven't covered
the installation of Coda because it was too involved for the topics
I previously covered. In this article, I will cover the installation
and configuration of a small-scale Coda server and client. Some
of the setup of Coda presented here is only suitable for a small
Coda cell, but this article should give you enough to get started
experimenting with Coda and give you some experience that can be
applied to larger Coda networks.
Coda is a file system descended from the AFS-2 sources that have
been modified to provide features that were not available in the
original implementation. The primary new feature is the ability
of the client to be disconnected from the server and still use the
mounted file system. This feature can be very handy if you have
mobile laptops or an unreliable network. The users will have access
to their work, within limits, regardless of whether the network
connection is present. Unfortunately, Coda is still an experimental
file system. The problem that Coda tries to solve involves some
difficult situations that must be handled gracefully; currently,
this simply does not happen in some cases, but improvements are
constantly being made. The Coda server and client do work, but there
is the risk that you will experience problems that may result in
loss of data. Thus, having a good backup regimen, as always, is
very smart.
Installing Coda for most systems is a matter of downloading the
correct packages and loading them. In this article, I will cover
the process for loading Coda onto a NetBSD machine; other operating
systems will have other methods, but once the server and client
are installed, the process of setting up Coda is the same.
There are two parts to installing Coda -- a kernel module
and a set of user-level tools. There are kernel modules available
for NetBSD, FreeBSD, OpenBSD, Linux, and Solaris. There is also
an experimental MS Windows-based client that is recommended only
to the very brave and experienced with Coda. The kernel module must
be installed to provide the kernel-level services required by the
user-level Coda daemons. As an example, I will show how to install
Coda on a NetBSD machine. Instructions for installing Coda on other
systems is available on the Coda Web pages.
To install Coda on a NetBSD system, you must first configure the
kernel to enable the kernel Coda code by adding the following lines
to your kernel config file:
file-system coda
pseudo-device vcoda 4
Then build, install, and reboot with the new kernel. For the Coda
file server, you need to install the Coda server package by doing
the following with the latest pkgsrc, which can be downloaded
from the NetBSD ftp repository:
cd /usr/pkg/pkgsrc/coda5_server
make
make install
This will set up the binaries for supporting a Coda server, but the
server is not yet configured. You may also install the Coda client
package on the server as the server may also be a client itself. Naturally,
any Coda client needs to have the Coda client package installed by
doing the following:
cd /usr/pkg/pkgsrc/coda5_client
make
make install
Again, this will install the Coda client binaries but will not set
up Coda for use. If you have the room, you may wish to install the
Coda documentation package, which will give you all the manual pages
for the Coda binaries.
Once the server binaries are installed and the support for Coda
is in the kernel, we can start setting up the Coda server. The server
setup is started by typing:
vice-setup
This will prompt you for the parameters for setting up the server.
Unless you have installed a Coda server before, you should probably
accept the defaults for most of the questions. You will need to select
a non-root user that is going to be the administrator of the Coda
file system, and must know the userid and user name. Also, if this
is the first Coda server in the cell, then it must be the SCM; if
you intend to experiment with server replication then subsequent servers
must be installed as non-SCM machines. The SCM machine is the one
that holds the master copy of the user details, which are automatically
replicated to the other servers by Coda, so it is important there
is only one SCM per Coda cell. As I said previously, most of the setup
options can be set to the recommended or default values. Two questions
that may cause some grief are the locations of the RVM data and log
files.
The data file is used to store a memory image of the RVM (Recoverable
Virtual Memory) system so that the internal state of the Coda file
system is retained across reboots of the machine. For performance
reasons, the Coda team recommends that this be on a raw partition.
However, for the purposes of evaluating Coda, a normal file on the
file system will be adequate. Setting this to /usr/local/coda/data
will use the file named "data" in the directory /usr/local/coda.
The log file referred to here is not a log file in the sense that
you may be used to (i.e., a file to which a daemon writes status/debug
messages). The log file in this situation is a log of what has changed
on the Coda file system. For performance reasons, it is recommended
that you place this log onto a raw partition. For the purposes of
experimentation where performance is not a big issue, it is acceptable
to place this log file onto a normal file system since most people
do not have a spare raw partition on their systems. Setting this
to /usr/local/coda/log will use the file named "log"
in the directory /usr/local/coda.
With both the above mentioned log and data files, if you are making
these plain files, ensure that the directory to hold the files exists.
The script does not create the directory for you, and the setup
script will fail if the directory does not already exist.
At the end of the vice-setup script, there will be some instructions
for starting up the various daemons associated with a Coda server.
Start the processes in the order indicated by the output of the
script and ensure all the daemons start up and are running before
proceeding. To start the Coda server when the server machine boots,
you can add the line:
/usr/pkg/etc/rc.d/rc.vice start
to the /etc/rc.local (on a NetBSD system) to start the Coda
server on boot.
Once the Coda server is up and running, we can set up the client.
There is a simple script that sets up venus (the Coda client daemon):
venus-setup server_name size
where server_name is the name of your Coda server, which can
be a comma-separated list of servers (but because we are doing a simple
example, we only have one server). Remember, a Coda server can also
have a client running on it, so it is valid to put the machine's
own name as the server, if it is actually the Coda server. The size
is the size of the client cache in kilobytes; it is recommended that
the client cache be at least 22 MB, so putting 22000 here will work.
Note that the Coda client creates a directory tree under /usr/Coda
so you must have more than 22 MB of free space on your /usr
partition to accommodate the Coda client files.
Machines that are going to be Coda clients must run the Coda client
daemon, venus. For convenience, this can be started when the machine
boots. On a NetBSD system, this can be achieved by adding the line:
/usr/pkg/sbin/venus &
to the /etc/rc.local file. Once the venus daemon is running,
the Coda filesystem will be accessible under /coda.
Now that the Coda client and server are running, we are ready
to start using the file system. The first task is to create a volume
under the root volume. You can just use the root volume to store
data in, but this can lead to problems during a recovery, so it
is best to create a volume under the root to hold your data. To
create the volume, type:
createvol_rep testvol E0000100 /vicepa
which requests that Coda make a replicated volume; this is a volume
that will be replicated to servers in the Coda cell. In this case,
the new volume name is testvol. The next parameter is the volume storage
group (VSG) address, which not only uniquely identifies the volume
in the Coda cell but also is used to define which servers the volume
will be replicated to. You can find this number by looking for an
entry for your server in /vice/db/VSGDB. If the volume is replicated
to multiple servers, then they will all be listed in the entry. Finally,
you have the location of the Coda server file store and if you have
accepted the defaults during vice-setup, then this will be /vicepa.
After the volume has been created, you need to make a mount point
for it in the Coda file system by typing:
cfs mkmount /coda/testvol testvol
which will make a mountpoint of /coda/testvol for the volume
testvol. When this is done, the replicated volume will appear in your
Coda file system and will be ready for use. You can check on the status
of your newly created volume by using:
cfs listvol /coda/testvol
which will give you information about the state of the volume, how
much space it is using, the volume id, and various other useful information.
When you set up the Coda server, a single administrative user
account is set up with a default password of "changeme".
As the password indicates, this should be changed at the first opportunity
by using the cpasswd command. This password is the one you
use in conjunction with the clog command (discussed later)
to authenticate to the Coda server. If you want to have more than
one person experimenting with Coda, you will need to set up login
id's for them. To add a new user, use the au command
with the nu (new user) option like this:
au nu
You will need to authenticate using the Coda administrator username
and password to add a new user. After you have authenticated, you
will be prompted for the new user details. One thing to watch for
is that au echos the new user's password information to
the screen as it is typed in, so you may want to be careful who is
watching when you set up the new users.
Permissions on the Coda file system are handled by using Access
Control Lists (ACL), which provide a finer grained control over
permissions than the traditional UNIX access permissions. The ACL
can not only control read and write but can also control who can
insert, delete, lookup, or lock files on the Coda volume. The ACLs
are set and listed using the cfs command, which is described
later in this article. In addition to the ACL, Coda emulates the
standard UNIX mode bits for the user because there are some applications
that encode information into these mode bits. However, note that
the ACL for the filesystem object is evaluated first and only if
the ACL allows access will the mode bits be used. Note that irrespective
of any ACL or mode bits, the volume will be read-only until the
user has authenticated to the server.
On a machine running the Coda client, a user will be able to see
a read-only copy of the Coda file system. To modify files, or to
integrate changes after having been disconnected, the user must
authenticate to the Coda server using either the administrator login
details set up on install, or a login set up using the au
command. The authentication is done using the clog command,
which passes the authentication details to the server. If the authentication
details are correct (i.e., the username and password are correct),
then the user is issued a Coda token. This Coda token authorizes
the user to write to the Coda file system and has a life of 25 hours,
after which the user must again authenticate. The status of the
user's token can be viewed by using the ctokens command.
When the server and client processes are running and the user
has authenticated using clog, the Coda file system looks
like any other file system mounted on the machine. It shows up in
the mount list like this:
coda on /coda type coda
The files in the file system can be manipulated using the standard
tools that operate on normal files. One difference that may be noticeable
is if the client is disconnected from the server and you request a
file that is not in the local cache. In this case, you will get an
error telling you the resource is unavailable. To help avoid this
situation, there is a tool called hoard, which I will discuss
later, that can be used to hint to the Coda client which files you
wish to be kept in the local client's cache.
After venus is running on a client machine, you should monitor
the status of the Coda file system by using the Codacon tool. This
will provide status of the Coda client's connection to the
server and also will inform you of any conflicts between files in
the local client cache and the server. Conflicts can happen when
a client is operating in disconnected mode (which I will discuss
later) and modifies a file that is modified by another client. (I
will also discuss what to do about a conflict later in this article.)
The cmon program is another handy tool, which allows you
to monitor the performance of the Coda servers in your Coda cell.
The cmon tool can only be used when the client running it
can contact the Coda server.
The major tool for controlling the behavior of the Coda file system
is a program called cfs. The cfs tool has many command
options, most of which are never needed in normal operation. A subset
of these options can be handy to keep a check on the status of Coda
and make some operations happen more quickly. The cfs commands
that I use regularly are:
checkservers -- This can be shortened to just cs,
and checks the status of the Coda servers. It will report whether
or not the Coda server is reachable and can also be used to make
the Coda client aware that the server is back after a network outage.
Normally, the Coda client polls the server in disconnected mode
so the client would eventually realize the server is back, but the
checkservers command will make this process happen immediately.
strong -- By default, the communications between the
Coda client and server adapt to the bandwidth available between
them. In some circumstances, this can lead to problems, particularly
if you are feeding a large amount of data into the Coda file system
all at once (e.g., extracting a source tree from a tar archive).
By telling Coda you have a strong connection, the server connection
bandwidth is not monitored, which can make some operations more
reliable. If you can, you should make your Coda connections strong.
This option only needs to be set once. The strong mode is available
as a work around to some problems with adaptive mode (mentioned
below). A later version of Coda may deprecate the strong mode.
adaptive -- This is the opposite of strong,
mentioned previously. The adaptive mode greatly reduces the amount
of network traffic between the server and the client, so it's
useful when there is a low-bandwidth link between the server and
the client. When venus detects a bandwidth below a certain threshold,
it throttles the rate of updates going to the server by trickling
the updates to the server, which prevents the client performance
from being throttled by a slow link to the server.
listacl -- This lists the ACLs that are currently applied
to the given path.
setacl -- This sets or cleans the ACL on a given path.
Check the cfs man page and the Coda administration documentation
for the available ACL modes that can be applied.
listvol -- This can be shortened to lv, and
provides a status of the Coda volume associated with the path passed
as a parameter. It will state whether or not the volume is connected
to the server and also state how many changes are waiting to be
written back to the server if you are running in disconnected mode.
disconnect -- This will disconnect the Coda client
from the server, just as if the network connection between client
and server had been disconnected.
reconnect -- This is the opposite of disconnect and
will reconnect the client to the Coda server.
help -- This prints a list of the available cfs
commands and a brief description of what the command does.
There are many more cfs commands not listed here. Some
of the commands not covered here may result in loss of data from
your Coda cache, which is not something you want to do often. Use
the help command or the man page for cfs to look at
the other available commands.
When a Coda client loses contact with the server (or a cfs
disconnect command is issued on the client), the client will change
into what is called "disconnected mode". In this mode,
instead of the changes to files being written directly to the server,
they are cached on the client machine. Any file that is in the local
client cache is available to the user. Attempting to access a file
that is not in the client cache when the client is in disconnected
mode will fail with an error because the data is, quite simply,
unavailable at that time. Normally, the cache is automatically filled
from the server by simply accessing files. The hoard tool
can be used to tell venus on the client machine that you wish a
certain set of files to be put into the local cache so that, if
the client becomes disconnected from the server, then the set of
files that you wish to work on will be available from the local
cache. If the disconnection of the client from the server is a planned
one (e.g., taking your laptop off the network), then you can run:
hoard walk
which will ensure that all the files that are specified in the hoard
configuration will be placed into the local client's cache. This
will ensure they are available when the client is disconnected from
the server. The hoard tool is not only used for filling the
cache but also for setting up the database that hoard uses
to decide what should be put into the local cache. At its simplest,
you can type:
hoard add /coda/testvol 100:d+
which will tell hoard that all the files and directories under
/coda/testvol, including any new directories created after
the hoard command was run, will be put into the local cache
with a priority of 100. The higher the priority number, the higher
the chance that the given file will be placed into the local cache.
You can more finely control the hoard database by giving various
directories or files differing priorities, depending on what is important
to you. The hoard database is persistent across reboots, so
anything you insert into the hoard database will persist until
you change them using the hoard tool again. You can view the
status of the hoard database by using the list command
like this:
hoard list
which will print out the paths that will be hoarded in the local cache
and their associated priorities.
Apart from the client-server relationship already discussed, Coda
also supports replicating data between servers, which allows you
to provide server failover or a local server for clients that may
be separated from the rest of the network over a slow network connection.
The process for setting up a server-to-server replication requires
a bit of manual editing of the Coda control files. To set up a server-to-server
replication, you need to install the slave server as a non-SCM machine
during the install of Coda. It is then a matter of following the
instructions in the Coda admin manual to perform the steps that
will start the replication of volumes between servers.
If the Coda client detects a file that has been modified on the
server while the client was disconnected, then the client Coda file
system goes into a conflict state. All file system operations are
denied until the conflict is resolved. When the client is in conflict
mode, the object in conflict will change from looking like a file
system to something that looks like a symbolic link. The destination
of this link is the Coda volume and file id of the object in conflict,
and it looks something like:
0 lr--r--r-- 1 root 4294967294 27 Feb 25 22:40 /coda/working \
-> @7f000000.ffffffff.000810dd
To fix the conflict, use the repair tool and start repairing
the object that is in conflict (in this case, /coda/working):
$ repair
repair > beginrepair
Pathname of object in conflict? []: /coda/working
After we are in repair mode, the Coda file system changes into what
is called a mixed view where you can see the server and the local
client versions of the files when you perform an ls on /Coda/working
in another window:
4 drwxr-xr-x 6 blymn 65534 2048 Feb 25 22:42 global
4 drwxr-xr-x 6 blymn 65534 2048 Feb 21 18:04 local
The global directory contains the server version of the object that
is in conflict, and local contains the local version. By looking in
both these directories, you can determine what has changed between
the client and server versions of the object and decide which change
should be kept. The first thing to do is to look at what changes have
been made on the client that have not been integrated onto the server.
To do this, we use the repair command listlocal:
repair > listlocal
local mutations are:
Store /coda/working/sample.c (length = 15047)
In this case, we only have one conflict -- the file sample.c
has been modified both on the client and the server. By checking the
local and global directories, we determine that the server version
needs to be kept and, possibly, the local changes manually merged
back into it. To do this, we first take a copy of the local version
of sample.c from /coda/working/local by issuing a cp
command in another window and then issue the repair command:
repair > discardalllocal
discard all local mutations
which discards the local modifications made on the client in favor
of the server versions of all the files. We could have gone through
the local mutations one by one and decided whether or not to keep
them on an individual basis, or kept all the local mutations thus
overwriting the server versions. In this case, it was simplest just
to dump the local changes. After performing the repair, we quit the
repair tool:
repair > quit
commit the local/global repair session? [Y]: y
which will perform the actions that were decided in the repair session
and end the repair state of the Coda file system. When all the conflicts
are repaired, the Coda file system resumes normal operation and once
again looks like a file system.
When you are replicating data between servers, it is possible
for the data on the servers to get out of step if the network link
between the servers has problems. In this situation, a server-server
conflict occurs, which is slightly different to the server-client
conflict described above. With a server-server conflict things are
complicated somewhat by the client's attempts to update files
on the servers. The full process of resolving a server-server conflict
is beyond the scope of this article, but the general process is
the clients are prevented from writing to the affected volume while
the server-server conflict is resolved using the repair tool.
During the repair, the clients will be able to perform read operations
on the volume and will cache writes locally until the repair is
completed. After the server-server conflict is repaired, the clients
are allowed to write to the repaired volume and any server-client
conflicts can then be resolved as per the previous section.
One of the problems of running a busy file server is finding a
quiet time to perform a backup of the system. To help with this,
Coda has a method that is used on some enterprise class systems,
which allows you to checkpoint the file system at a point in time
and back up that checkpoint thereby getting a consistent snapshot
of the file system. Coda does this by creating a read-only replica
of the file system, which can then be written to tape. During this
period, the clients can still write to the Coda volume without compromising
the backup. The procedure of restoring a Coda volume is a bit different
from what you may be used to. To restore a Coda volume, you create
a new volume into which the Coda file system is restored. Once the
volume is restored, the required files can be copied out of the
read-only restore volume back into the production volume.
Due to the ability of a Coda client to operate disconnected from
the server, there is a higher risk that any work done on the client
will be lost due to the client machine rebooting or crashing. In
normal circumstances, the Coda client can easily cope with the local
machine being restarted without any adverse effect but, in some
cases where the Coda client is unable to start cleanly, there are
some safety nets that can help prevent a total disaster. Every 10
minutes the Coda client daemon (venus) makes a tar snapshot
of the files that have been modified while in disconnected mode
and also copies the modification log file. These files are stored
under /usr/coda/spool. In this directory will be a directory
for each UID that has used the Coda file system while disconnected,
and in that directory there will be a tar file that, as mentioned
before, contains a copy of any modified files and the Coda modification
log file. If you are in the unfortunate situation where Coda is
not working correctly after a system crash or reboot, then it may
be wise to take a quick copy of these files and put them in a safe
place, because you should be able to recover a version of your modified
files (that is no more than 10 minutes old) from the tar
file.
There are many more features that I have not covered here, so
there is still a lot to explore and learn. As long as you treat
Coda with care, it should serve you well. Personally, I use Coda
for the ability of working on my files while disconnected from my
server. Most of my work is done on a laptop that I carry around
with me, and the updates made to my files are automatically propagated
back to my server when I am at home, which makes me feel safer because
I do not have all my data just on a laptop hard drive. This is not
to undersell the dire warnings on the Coda home page -- they
are very true. There have been a couple of times that I have had
near misses with my data and one instance where I did lose a couple
of hours worth of work but, considering the mess I used to get into
trying to remember which way I should be copying data, Coda has
been a real boon to my mobile computing.
[1] More information about the NetBSD project can be found at:
http://www.netbsd.org/
[2] More information on Coda can be found at: http://coda.cs.cmu.edu/
Brett Lymn works for a global company helping to manage a bunch
of UNIX machines. He spends entirely too much time fiddling with
computers but cannot help himself.
|