Configuring
Amanda
David T. Smith
Amanda is the Advanced Maryland Automatic Network Disk Archiver,
developed at the University of Maryland in the 1990s. While it is
now maintained at SourceForge and support is provided only through
mailing lists and a FAQ-O-MATIC, it is still a highly useful, stable
network backup utility with a wide range of features. Amanda is
tailored for networks that have a central server with a high-capacity
tape drive and multiple backup clients. Although Amanda was built
for UNIX systems, it has been extended to provide backup services
to Windows clients (via Samba, although a separate project is underway
to develop a native Windows client) to allow deployment in heterogeneous
environments.
In this article, I will review the installation and configuration
process for Amanda and show how to tailor it to some of the potential
environments it may be used in. The environment for this implementation
consists of a Pentium II 350-MHz system running Slackware 8.0 and
an Exabyte 8505 tape drive as the server, and clients including
OpenBSD 2.8 and Red Hat Linux 7.1.
Architecture
Amanda is a client-server application where the server pulls backups
from individual clients according to specifications defined in a
named configuration. Amanda can pull multiple streams from different
clients at the same time, writing each stream to a file on a designated
"holding disk" on the tape server, copying the files to
the tapes one at a time. This provides an efficient use of network
resources and limits the backup window requirements for the clients
while reducing the complexity of the backup database. Amanda comes
with support for two different archiving utilities, gnutar
and dump, but support for additional archiving utilities
can be added by modifying the client source code. Amanda originally
was designed with the use of dump as its primary choice of
archiving utility, but because of synchronization issues with running
dump on live filesystems (see the sidebar "gnutar vs.
dump"), most users of Amanda recommend using gnutar
as the archiving software.
Amanda's unique advantage over other backup systems is that
it manages a schedule of full and incremental backups according
to a strategy that attempts to keep the amount of tape required
for an individual run to a minimum while ensuring that restoration
can be done with the fewest possible tapes. Where other backup systems,
including most commercial systems, require an explicit incremental/differential/full
backup, Amanda develops its own schedule and modifies it according
to the conditions of the most recent backups. This capability makes
attempts to explicitly force Amanda into following a preset schedule
difficult and fraught with pitfalls.
Installation
The current production version of Amanda is provided as source
code at http://www.amanda.org/download.html and is version
2.4.2p2, released on April 2, 2001. The current development version
is hosted on SourceForge at http://sourceforge.net/project/?group_id=120.
Before building the Amanda software, it will be necessary to install
the prerequisite software, determine the Amanda username and groupname,
and name the initial Amanda configuration. If the installation is
for a network with some nodes simply running the client software
(i.e., without local tape drives), it is useful to provide a separate
build process for them. The installation process described below
assumes a central server with multiple clients built from the same
code base into different object directories.
Prerequisites
The following applications are prerequisites for all or some Amanda
implementations. Some of these applications may be installed as
part of the system image, and some operating systems include a version
of Amanda. If Amanda is bundled with the system, it is important
to verify the version level supplied: Amanda 2.4.0 and above is
incompatible with pre-2.4 versions.
For command-line processing on all implementations:
readline:
ftp://ftp.gnu.org/pub/gnu/readline/readline-2.2.1.tar.gz
termcap: ftp://ftp.gnu.org/pub/gnu/termcap/termcap-1.3.tar.gz
For the use of gnutar as a backup program:
gnutar 1.13.19: ftp://alpha.gnu.org/pub/gnu/tar
gnutar 1.12: ftp://ftp.gnu.org/pub/gnu/tar/tar-1.12.tar.gz
Patches 1.12: ftp://ftp.gnu.org/pub/gnu/tar/patches/tar-1.12.patch
For backing up Microsoft Windows 95/NT clients using Samba:
SAMBA 2.0.4: ftp://ftp.samba.org/pub/samba/samba-2.0.4b.tar.gz
For use of certain supplied Perl scripts:
PERL 5.005: ftp://ftp.gnu.org/pub/gnu/perl/perl-5.005_01.tar.gz
For a graphical display of dump utilization (amplot):
gawk: ftp://ftp.gnu.org/pub/gnu/gawk/gawk-3.0.3.tar.gz
gnuplot: ftp://ftp.dartmouth.edu/pub/gnuplot
Once the selected prerequisites are installed, it is necessary to
decide where Amanda will live, the username and group under which
it will run, and the default configuration name. All these options
will be saved in a config.site file located in the etc
directory under the Amanda prefix.
The default root for Amanda is /usr/local and all the files
are distributed under that tree. The root prefix can be changed
by specifying the --prefix== switch to "configure".
Although Amanda defaults to using amanda:bin as the username:group
combination, configure requires that both parameters be explicitly
specified as --with-user=USER and --with-group=GROUP.
These entries do not have to exist before configure is run,
but are required before the compiled files are installed into their
respective directories.
Installation Process
Amanda provides a portable installation process useful for building
multiple systems from the same source. If the Amanda source is NFS-mounted
and a local obj directory defined, a localized version can
be created by running $AMANDASRC/configure from the obj
directory. In this case, the additional configure switches, --without-server
(for nodes that are not using a local tape drive) and --with-index-server=HOST
(to define the host that is writing to tape), are quite useful.
The process for a multi-host system with boca as the tape
server, canopus and bacchus as the clients, then would
be as follows, assuming that the Amanda software is unpacked on
boca:/usr/local/src/amanda and the object directories are
/usr/local/obj/amanda:
<on boca>
boca:/usr/local/src/amanda# make distclean
boca:/usr/local/src/amanda# cd /usr/local/obj/amanda
boca:/usr/local/obj/amanda# configure --with-user=amanda \
--with-group=backup
boca:/usr/local/obj/amanda# make
boca:/usr/local/obj/amanda# make install
< then on canopus >
canopus:/# mount boca:/usr/local/src/amanda \
/usr/local/src/amanda
canopus:/# cd /usr/local/src/amanda
canopus:/usr/local/src/amanda# make distclean
canopus:/usr/local/src/amanda# cd /usr/local/obj/amanda
canopus:/usr/local/obj/amanda# configure --with-user=amanda \
--with-group=backup --with-index-server=boca --without-server
canopus:/usr/local/obj/amanda# make
canopus:/usr/local/obj/amanda# make install
< then on canopus >
bacchus:/# mount boca:/usr/local/src/amanda \
/usr/local/src/amanda
bacchus:/# cd /usr/local/src/amanda
bacchus:/usr/local/src/amanda# make distclean
bacchus:/usr/local/src/amanda# cd /usr/local/obj/amanda
bacchus:/usr/local/obj/amanda# configure --with-user=amanda \
--with-group=backup --with-index-server=boca --without-server
bacchus:/usr/local/obj/amanda# make
bacchus:/usr/local/obj/amanda# make install
The "make distclean" at the start of each section clears
out the interpolated parameters based on the current system so that
each build is tailored to the specific hostname and operating system.
Configuration
Amanda runs with multiple configurations, each defined by a directory
under $prefix/etc/amanda. The directory is named with the
configuration name (e.g., DailySet1), and contains the following
files:
amanda.conf -- Editable, contains the configuration
parameters
disklist -- Editable, contains the host/disk combinations
for backup
tapelist -- Not editable, contains the name, status
and last used date for all tapes in the configuration
tapelist.amlabel -- Not editable, contains the original
status of the tapes when they were labeled
tapelist.yesterday -- Not editable, contains the name,
status, and last used date (as of the previous Amanda run) for all
tapes defined in the configuration
The tapelist.* files are managed by amdump and amlabel,
and it is not advisable to try to edit them manually. Although they
can be edited as text files, the effect of modifying them may not
be what is expected. Disklist contains a set of lines with the format:
hostname diskdev dumptype [spindle [interface]]
The diskdev value can be a block device, a mount point, or
a specific directory (when working with tar-based dumps). The dumptype
is defined in amanda.conf, along with the tape label group,
tape device, holding disk, and overall strategy. This directory is
only found on the tape server and maintains all the dump clients for
that particular configuration. The spindle and interface values, while
seldom used, provide a way to balance backups across different disks
and network interfaces. Amanda will try to avoid backing up multiple
disks on the same spindle in parallel and will try to send backups
to a server over a specifically requested interface.
Example configuration files are found in the $AMANDASRC/examples
directory and are heavily annotated and documented in the Amanda
manpage, but the interaction between different parameters is not
as clear as it might be. In this section, therefore, I will highlight
some different sections of the amanda.conf file that impact
the dump configuration and review them in detail. Most of these
have reasonable defaults, or clear meaning in the man page, and
I am only going to supplement what is defined for them in the existing
documentation.
General Parameters
A common thread through the default configuration file is the
use of a descriptive name for the configuration, not only in the
directory name where the configuration resides, but also in the
org parameter and the labelstr parameter. These parameters
can all be distinct, using normal for the configuration and
DailySet for the org and the labelstr parameters.
The only requirement is that they be unique across all configurations
so that there is no confusion when running multiple Amanda commands.
The dumpuser parameter, by default amanda, is the
name that owns all configuration, logging, and backup files. It
is also the username used to connect to backup clients and would
be listed in the user column of the .amandahosts or .rhosts
files on client systems. Typical values are amanda or operator.
Amanda should not be run with root privileges for backups, even
though only root can initiate recoveries. However, the dumpgroup
parameter should have read access to the block devices for the backup
disks or to any directories being backed up. Typical values are
bin or adm.
Dump Cycle Parameters
The parameters in this category interact to define the number
of tapes required for each cycle and the number of tapes in the
overall pool. The dumpcycle, runspercycle, and tapecycle
parameters interact so that each filesystem is given a full dump
at least once per dumpcycle and that the number of tapes
in tapecycle must be greater than runspercycle * runtapes
(the number of tapes per run). Since runtapes defaults to
1 and is not used unless a tape changer is defined, the usual value
of tapecycle will only depend on runspercycle.
Additionally, the bumpdays parameter interacts with the
dumpcycle parameter to determine how many levels of incremental
dumps are taken between full dumps; in order to get behavior similar
to a differential backup pattern, a long dumpcycle can be
combined with a long bumpdays parameter. For an incremental
backup strategy, the bumpdays, bumpsize, and bumpmult
parameters can be kept small to encourage automatic bumping to higher
backup levels. See the sidebar on backup levels for further information
on how the different backup levels interact.
Tape Device and Changer Parameters
The tape device can be defined as an actual device or as a program
running a tape changer. This latter function is still experimental
with Amanda and support for multiple tapes is not completely available.
Since Amanda takes the backups as individual files and copies them
to a tape, it does not yet have a facility to restart a tape copy
in the middle of a file, but rather will start the copy on the new
tape drive at the beginning of the unsuccessful file.
Amanda uses a non-rewindable tape devices, for example, /dev/nst0.
If a tape changer is used, the tapedev parameter is not used
and replaced with the tpchanger, changerfile, and
changerdev parameters; the latter two values are provided
as parameters to the script defined in tpchanger. The default
script, chg-manual, can only be run manually as it queries
on stdout for confirmation when a tape is changed and therefore
will hang if run in a batch job when a tape needs changing.
There are alternate scripts that are more useful for jukebox tape
changes with multiple drives, automatically switching from one drive
to another. However, if multiple tape backups are needed on a system
with a single drive, the only supplied solution is to use a script
that periodically checks the tape changer to see if a new tape has
been inserted. Additional scripts may be available by contacting
the amanda-users mailing list (see References).
Tapetypes are defined within the amanda.conf file in a
series of definitions grouped together as follows:
define tapetype <name> {
tapetype-option tapetype-value
...
}
Most of the options specified here define the tape device, but this
is where the length option would be placed so that Amanda can
plan the size of a backup destined for a particular drive. The actual
writing to tape ignores the length option so it is only used as an
estimate. Another option, lbl-templ, is used to generate printable
labels for each tape from a postscript template file and the amreport
program.
Although most common tape drives are located here, it is always
possible to come up with a system that does not have a supported
tapetype. In that case, an Amanda utility named tapetype
can provide the necessary parameters. It is not normally installed,
but is located in the $AMANDASRC/tape-src directory and can
be compiled and installed if necessary.
Network Parameters
The network parameters include the timeouts, netusage,
inparallel, and maxdumps values. While the timeout
parameters (etimeout, ctimeout, and dtimeout)
can remain at the default values for most networks, it is quite
useful to increase the netusage parameter, which is by default,
300 Kbps. If Amanda is running during periods of quiescent network
utilization and the network is 100 Mbps, there is no reason not
to give it 40 or 50 Mbps. This bandwidth will help reduce the time
to run backups and, coupled with large inparallel (the default
of 10 should be sufficient for most purposes) and maxdumps
(the default of 1 should be increased), can quickly bring many filesystems
over to the holding disks where they will queue up for access to
one or more tape drives.
Logging and Reporting parameters
The logging and reporting information is stored by default in
/usr/adm/amanda on the server. The directories underneath
this tree include the datafiles that are used to identify the tapes
needed to restore files and directories and to maintain logs of
old backups. These files are used by amplot, amreport,
and amrecover/restore.
Holding Disk
The holding disks are used by Amanda to place backups while waiting
for tape drives to be available. It is not necessary to use a holding
disk; Amanda will back up directly to tape if insufficient space
is available. However, large holding disks will improve backup performance
significantly and enable a single tape server to manage a larger
number of clients.
The holding disks should not themselves be backed up, and there
are special parameters to specify in the dump strategy parameters
to identify holding disks during filesystem backups. However, it
is probably best to dedicate a partition as a holding disk and thus
be able to keep it separate from the active backup disks.
Dumptype Strategies
The dumptype section defines many different backup strategies,
which can be used by the different clients of a single Amanda server.
These strategies inherit parameters specified by other enclosed
strategies so that a particular dumptype can succinctly incorporate
strategic elements from other higher-level strategies.
The default global dumptype does not include the index parameter,
and it must be added to enable interactive recoveries. The more
specific dumptypes include specifications as to:
- Program (dump or gnutar)
- Compression (none, client, or server)
- Files to exclude (gnutar only)
- Backup strategy (full only, differential, or increment only)
- Special holding disk treatment (for backing filesystems with
holding disks directly to tape)
After changing index to "yes" in the global dumptype
group, it is probably most useful to pick the appropriate dumptype
for the clients participating in Amanda. It should be noted that
dumptypes typically differentiate between root and user partitions:
the root partitions are backed up with a low priority while the
user and active partitions are backed up with a high priority. This
is based on the assumption that a root partition will not change
frequently. If a client system uses a single large partition (such
as the default with HP-UX), that disk should use a user dumptype,
not a root dumptype, even though it is a root partition.
Backup Operations
Preparation
Now that the configuration file has been defined, it is time to
create a pool of labeled tapes for use by that configuration. Tapes
are labeled with the command: amlabel <configuration> <label>.
This command should be performed by the Amanda user, and the label
must conform to the regular expression defined by the configuration
parameter labelstr. All the tapes required by the tapecycle
parameter should be labeled at once so they are available during
the testing phase. Amlabel records each labeled tape in tapelist
so that the other Amanda programs can identify it.
The client systems need to have an Amanda user created on them,
with a .amandahosts file containing the server host and username,
separated by whitespace. The parser for .amandahosts is very
strict, so it is important that both the server and username be
specified. On the client systems, the program amandad needs
to be added to /etc/inetd.conf and the Amanda ports added
to /etc/services. These additions are not performed by the
standard installation process, so they must be performed manually.
Testing
Each configuration can be checked for access and environment by
running the amcheck command. While this command should be
used to verify that the network is ready to run amdump, it
can also be used to verify that client<->server connections
are functioning before starting the application.
Once amcheck runs correctly, amdump can be run with
one parameter to identify the configuration that is being dumped.
The dump goes into background and can be monitored by the amreport
program (which must be run by the Amanda user). Once a dump is complete,
the report is available from the logging directory tree.
Scheduling
Once the individual dumps work correctly, it will be time to automate
the dump process. This can simply be done by adding amdump
to the Amanda user's crontab file on the tape server. It is
then only necessary to monitor the dump reports automatically emailed
to the user of record and ensure that the correct tape is in the
tape drive for the next backup.
The amcheck program will review a specific configuration
and ensure that the correct tape is loaded and that the hosts for
the backup are accessible. It is therefore a good choice to run
as a test before running amdump. Another program useful to
test the dump environment is amverify. amverify ensures
that the tape in the drive is readable and that there are no tape
errors. Both of these programs can be run as part of the backup
process in crontab:
'amcheck normal && amdump normal && amverify normal'
Each will mail a separate report to the Amanda username (forwarded
if necessary) so that problems and dump status can be easily reviewed
by systems administrators.
Restore Operations
All backup systems must be periodically tested for restoring;
while backups are started from the tape server and pulled from the
clients, restores are started by the clients and pulled from the
tape server. Therefore, it requires access to both client and server
systems to restore files from multiple tapes. This can easily be
done with network terminal applications allowing access to client
applications while physically close to the server. The amrecover
command provides an interactive session where, by specifying a date,
a host, and a disk, you can walk through the available files for
restoration and, once a file is selected, identify the tapes from
which to retrieve it. It must be run by root as it can access any
file from the system, and the .amandahosts file on the server
must have an entry for root from the client node:
.amandahosts in $AMANDAHOME on tape server boca.smithnet:
bacchus.smithnet root # Amanda client
The backup configuration is named "normal". Then, on bacchus.smithnet:
[root@bacchus /tmp/bacchus]# amrecover normal -s boca.smithnet
AMRECOVER Version 2.4.2p2. Contacting server on boca.smithnet ...
220 boca AMANDA index server (2.4.2p2) ready.
200 Access OK
Setting restore date to today (2001-12-18)
200 Working date set to 2001-12-18.
200 Config set to normal.
200 Dump host set to bacchus.smithnet
amrecover> setdisk /
200 Disk set to /
amrecover> history
200- Dump history for config "normal" host "bacchus.smithnet" disk "/"
201- 2001-12-15 0 DailySet1-2 6
201- 2001-12-14 1 DailySet1-1 3
201- 2001-12-13 0 DailySet1-5 6
201- 2001-12-12 1 DailySet1-6 4
201- 2001-12-11 1 DailySet1-4 5
201- 2001-12-08 0 DailySet1-3 6
200 Dump history for config "normal" host "bacchus.smithnet" disk "/"
amrecover> cd /etc
/etc
amrecover> add inetd.conf # default date is most recent dump
Added /etc/inetd.conf
amrecover>setdate 2001-12-14 # Change to get an old date
200 Working date set to 2001-12-14.
amrecover> add dumpdates # this file is different on 12/14 and 12/15
Added /etc/dumpdates
amrecover> list # Get the extraction list
TAPE DailySet1-1 LEVEL 1 DATE 2001-12-14 # Note that we need two tapes
/etc/dumpdates
TAPE DailySet1-2 LEVEL 0 DATE 2001-12-15
/etc/inetd.conf
amrecover> settape boca.smithnet:/dev/nst0
Using tape /dev/nst0 from server boca.smithnet.
amrecover> extract
Extracting files using tape drive /dev/nst0 on host boca.smithnet.
The following tapes are needed: DailySet1-1
DailySet1-2.
Restoring files into directory /tmp/bacchus
Continue? [Y/n]: y
Load tape DailySet1-1 now
Continue? [Y/n]: y
set owner/mode for '.'? [yn] n
Load tape DailySet1-2 now
Continue? [Y/n]: y
restore: ./etc: File exists
set owner/mode for '.'? [yn] n
amrecover>
This restores the two selected files under the local root (/tmp/bacchus).
The files and the directory etc, created under the local root,
will have the original ownerships, timestamps, and masks.
A good practice to verify backups is to periodically select a
partition or large binary file and compare the restored copy with
the original. Since amverify only checks that the files on
tape are internally correct, this is the only way to ensure that
the entire process is functioning and that the stored files are
recoverable.
Conclusion
Amanda is a highly useful backup system with some advantages over
commercial systems, and has the flexibility to manage backups in
a large variety of situations. The holding disks, if large enough,
can also sustain a series of backups where new tapes might not be
available, a feature not available under most commercial systems.
While commercial systems stream multiple backups onto tapes and
can provide efficient network-to-tape processing, Amanda keeps the
downtime required for backups to a minimum, and with proper tapetype
configuration, can ensure that streaming drives are used to their
capacity.
The mailing lists that form the primary support vehicle for Amanda
are informative with a low signal-to-noise ratio, and the FAQ-O-MATIC
provides answers to a great many of the standard questions. However,
Amanda could be improved by providing native ports to other operating
systems. While work is going on with the port to NT, it could be
useful to provide support for other systems required by a specific
installation.
The extensive use of configuration files, while not unfamiliar
to UNIX systems administrators, might be considered a drawback for
users familiar with the graphical interfaces used by commercial
backup products. This too could be an arena for an interested individual
to extend the basic product.
References
Amanda's basic Web site is http://www.amanda.org and
it includes pointers to the current production version for download,
the SourceForge project for the latest development version, the
mailing lists and other resources.
The lists include amanda-announce for general broadcast announcements,
amanda-users for user questions (announcements are mirrored to this
list so only one subscription is needed), and amanda-hackers for
questions about development and new projects. These lists can be
subscribed to by sending mail to <listname>-request@amanda.org
with a body containing the line: "subscribe <your e-mail
address>". As usual, one should never send subscription
or administrative requests to the list itself.
Amanda is additionally documented in the book UNIX Backup and
Recovery by W. Curtis Preston. W. Curtis Preston has kindly
placed the entire Amanda chapter (written by John R. Jackson) online
at: http://www.backupcentral.com/amanda.html.
David Smith has been programming for more than 30 years and
has worked as a consultant for the last 10. He has designed access
control systems and data transfer protocols for several applications
on multiple platforms. He is currently an independent consultant
and can be contacted at: David.Smith@acm.org.
|