Cover V11, I04

Configuring Amanda

David T. Smith

Amanda is the Advanced Maryland Automatic Network Disk Archiver, developed at the University of Maryland in the 1990s. While it is now maintained at SourceForge and support is provided only through mailing lists and a FAQ-O-MATIC, it is still a highly useful, stable network backup utility with a wide range of features. Amanda is tailored for networks that have a central server with a high-capacity tape drive and multiple backup clients. Although Amanda was built for UNIX systems, it has been extended to provide backup services to Windows clients (via Samba, although a separate project is underway to develop a native Windows client) to allow deployment in heterogeneous environments.

In this article, I will review the installation and configuration process for Amanda and show how to tailor it to some of the potential environments it may be used in. The environment for this implementation consists of a Pentium II 350-MHz system running Slackware 8.0 and an Exabyte 8505 tape drive as the server, and clients including OpenBSD 2.8 and Red Hat Linux 7.1.

Architecture

Amanda is a client-server application where the server pulls backups from individual clients according to specifications defined in a named configuration. Amanda can pull multiple streams from different clients at the same time, writing each stream to a file on a designated "holding disk" on the tape server, copying the files to the tapes one at a time. This provides an efficient use of network resources and limits the backup window requirements for the clients while reducing the complexity of the backup database. Amanda comes with support for two different archiving utilities, gnutar and dump, but support for additional archiving utilities can be added by modifying the client source code. Amanda originally was designed with the use of dump as its primary choice of archiving utility, but because of synchronization issues with running dump on live filesystems (see the sidebar "gnutar vs. dump"), most users of Amanda recommend using gnutar as the archiving software.

Amanda's unique advantage over other backup systems is that it manages a schedule of full and incremental backups according to a strategy that attempts to keep the amount of tape required for an individual run to a minimum while ensuring that restoration can be done with the fewest possible tapes. Where other backup systems, including most commercial systems, require an explicit incremental/differential/full backup, Amanda develops its own schedule and modifies it according to the conditions of the most recent backups. This capability makes attempts to explicitly force Amanda into following a preset schedule difficult and fraught with pitfalls.

Installation

The current production version of Amanda is provided as source code at http://www.amanda.org/download.html and is version 2.4.2p2, released on April 2, 2001. The current development version is hosted on SourceForge at http://sourceforge.net/project/?group_id=120. Before building the Amanda software, it will be necessary to install the prerequisite software, determine the Amanda username and groupname, and name the initial Amanda configuration. If the installation is for a network with some nodes simply running the client software (i.e., without local tape drives), it is useful to provide a separate build process for them. The installation process described below assumes a central server with multiple clients built from the same code base into different object directories.

Prerequisites

The following applications are prerequisites for all or some Amanda implementations. Some of these applications may be installed as part of the system image, and some operating systems include a version of Amanda. If Amanda is bundled with the system, it is important to verify the version level supplied: Amanda 2.4.0 and above is incompatible with pre-2.4 versions.

For command-line processing on all implementations:

readline: ftp://ftp.gnu.org/pub/gnu/readline/readline-2.2.1.tar.gz
termcap: ftp://ftp.gnu.org/pub/gnu/termcap/termcap-1.3.tar.gz

For the use of gnutar as a backup program:

gnutar 1.13.19: ftp://alpha.gnu.org/pub/gnu/tar
gnutar 1.12: ftp://ftp.gnu.org/pub/gnu/tar/tar-1.12.tar.gz
Patches 1.12: ftp://ftp.gnu.org/pub/gnu/tar/patches/tar-1.12.patch

For backing up Microsoft Windows 95/NT clients using Samba:

SAMBA 2.0.4: ftp://ftp.samba.org/pub/samba/samba-2.0.4b.tar.gz

For use of certain supplied Perl scripts:

PERL 5.005: ftp://ftp.gnu.org/pub/gnu/perl/perl-5.005_01.tar.gz

For a graphical display of dump utilization (amplot):

gawk: ftp://ftp.gnu.org/pub/gnu/gawk/gawk-3.0.3.tar.gz
gnuplot: ftp://ftp.dartmouth.edu/pub/gnuplot

Once the selected prerequisites are installed, it is necessary to decide where Amanda will live, the username and group under which it will run, and the default configuration name. All these options will be saved in a config.site file located in the etc directory under the Amanda prefix.

The default root for Amanda is /usr/local and all the files are distributed under that tree. The root prefix can be changed by passing the --prefix=DIR switch to "configure".

Although Amanda defaults to using amanda:bin as the username:group combination, configure requires that both parameters be explicitly specified as --with-user=USER and --with-group=GROUP. These entries do not have to exist before configure is run, but are required before the compiled files are installed into their respective directories.

Installation Process

Amanda provides a portable installation process useful for building multiple systems from the same source. If the Amanda source is NFS-mounted and a local obj directory defined, a localized version can be created by running $AMANDASRC/configure from the obj directory. In this case, the additional configure switches, --without-server (for nodes that are not using a local tape drive) and --with-index-server=HOST (to define the host that is writing to tape), are quite useful. The process for a multi-host system with boca as the tape server, canopus and bacchus as the clients, then would be as follows, assuming that the Amanda software is unpacked on boca:/usr/local/src/amanda and the object directories are /usr/local/obj/amanda:

<on boca>

boca:/usr/local/src/amanda# make distclean
boca:/usr/local/src/amanda# cd /usr/local/obj/amanda
boca:/usr/local/obj/amanda# /usr/local/src/amanda/configure \
 --with-user=amanda --with-group=backup
boca:/usr/local/obj/amanda# make
boca:/usr/local/obj/amanda# make install

< then on canopus >

canopus:/# mount boca:/usr/local/src/amanda \
 /usr/local/src/amanda
canopus:/# cd /usr/local/src/amanda
canopus:/usr/local/src/amanda# make distclean
canopus:/usr/local/src/amanda# cd /usr/local/obj/amanda
canopus:/usr/local/obj/amanda# /usr/local/src/amanda/configure \
 --with-user=amanda --with-group=backup \
 --with-index-server=boca --without-server
canopus:/usr/local/obj/amanda# make
canopus:/usr/local/obj/amanda# make install

< then on bacchus >

bacchus:/# mount boca:/usr/local/src/amanda \
 /usr/local/src/amanda
bacchus:/# cd /usr/local/src/amanda
bacchus:/usr/local/src/amanda# make distclean
bacchus:/usr/local/src/amanda# cd /usr/local/obj/amanda
bacchus:/usr/local/obj/amanda# /usr/local/src/amanda/configure \
 --with-user=amanda --with-group=backup \
 --with-index-server=boca --without-server
bacchus:/usr/local/obj/amanda# make
bacchus:/usr/local/obj/amanda# make install

The "make distclean" at the start of each section clears out the interpolated parameters based on the current system so that each build is tailored to the specific hostname and operating system.

Configuration

Amanda runs with multiple configurations, each defined by a directory under $prefix/etc/amanda. The directory is named with the configuration name (e.g., DailySet1), and contains the following files:

amanda.conf -- Editable, contains the configuration parameters

disklist -- Editable, contains the host/disk combinations for backup

tapelist -- Not editable, contains the name, status and last used date for all tapes in the configuration

tapelist.amlabel -- Not editable, contains the original status of the tapes when they were labeled

tapelist.yesterday -- Not editable, contains the name, status, and last used date (as of the previous Amanda run) for all tapes defined in the configuration

The tapelist.* files are managed by amdump and amlabel, and it is not advisable to try to edit them manually. Although they can be edited as text files, the effect of modifying them may not be what is expected. Disklist contains a set of lines with the format:

hostname diskdev dumptype [spindle [interface]]

The diskdev value can be a block device, a mount point, or a specific directory (when working with tar-based dumps). The dumptype is defined in amanda.conf, along with the tape label group, tape device, holding disk, and overall strategy. The configuration directory is found only on the tape server and lists all the dump clients for that particular configuration. The spindle and interface values, while seldom used, provide a way to balance backups across different disks and network interfaces. Amanda will try to avoid backing up multiple disks on the same spindle in parallel and will try to send backups to a server over a specifically requested interface.
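To make the disklist format concrete, here is a minimal sketch that reads a sample disklist and prints each entry with its optional fields. The hosts are the clients used in this article, but the disks and the dumptype names (comp-root-tar, comp-user-tar) are assumptions for illustration and must match dumptypes actually defined in your amanda.conf.

```shell
#!/bin/sh
# Sketch: list the entries of a disklist-style file.
# The sample entries are illustrative, not taken from a real configuration.

list_disklist() {
    # Skip comments and blank lines, then print host, disk, and dumptype.
    # Optional fourth and fifth fields are the spindle and the interface.
    awk '
        /^[ \t]*#/ || NF == 0 { next }
        {
            spindle = (NF >= 4) ? $4 : "-"
            iface   = (NF >= 5) ? $5 : "-"
            printf "%s %s %s spindle=%s interface=%s\n", $1, $2, $3, spindle, iface
        }
    ' "$1"
}

sample=$(mktemp)
cat > "$sample" <<'EOF'
# hostname  diskdev  dumptype  [spindle [interface]]
canopus     /        comp-root-tar
bacchus     /home    comp-user-tar  1
EOF

list_disklist "$sample"
rm -f "$sample"
```

Running the function against the real disklist is a quick way to spot a line with a missing dumptype before amcheck does.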

Example configuration files are found in the $AMANDASRC/examples directory and are heavily annotated and documented in the Amanda manpage, but the interaction between different parameters is not as clear as it might be. In this section, therefore, I will highlight some different sections of the amanda.conf file that impact the dump configuration and review them in detail. Most of these have reasonable defaults, or clear meaning in the man page, and I am only going to supplement what is defined for them in the existing documentation.

General Parameters

A common thread through the default configuration file is the use of a descriptive name for the configuration, not only in the directory name where the configuration resides, but also in the org parameter and the labelstr parameter. These names do not have to match; for example, the configuration can be named normal while the org and labelstr parameters use DailySet. The only requirement is that each be unique across all configurations so that there is no confusion when running multiple Amanda commands.

The dumpuser parameter, by default amanda, is the name that owns all configuration, logging, and backup files. It is also the username used to connect to backup clients and would be listed in the user column of the .amandahosts or .rhosts files on client systems. Typical values are amanda or operator. Amanda should not be run with root privileges for backups, even though only root can initiate recoveries. However, the dumpgroup parameter should have read access to the block devices for the backup disks or to any directories being backed up. Typical values are bin or adm.

Dump Cycle Parameters

The parameters in this category interact to define the number of tapes required for each cycle and the number of tapes in the overall pool. The dumpcycle, runspercycle, and tapecycle parameters work together: each filesystem is given a full dump at least once per dumpcycle, and the number of tapes in tapecycle must be greater than runspercycle * runtapes (where runtapes is the number of tapes per run). Since runtapes defaults to 1 and is not used unless a tape changer is defined, the usual value of tapecycle depends only on runspercycle.
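The tapecycle constraint is simple arithmetic, and a small sketch like the following can catch a pool that is too small before any tapes are labeled. The function name and the sample values are my own; only the rule that tapecycle must exceed runspercycle * runtapes comes from the discussion above.

```shell
#!/bin/sh
# Sketch: check that tapecycle is greater than runspercycle * runtapes.

check_tapecycle() {
    tapecycle=$1
    runspercycle=$2
    runtapes=${3:-1}            # runtapes defaults to 1 when not given
    needed=$((runspercycle * runtapes))
    if [ "$tapecycle" -gt "$needed" ]; then
        echo "ok: tapecycle $tapecycle > $needed tapes used per dumpcycle"
    else
        echo "too small: tapecycle $tapecycle must exceed $needed"
        return 1
    fi
}

check_tapecycle 6 5             # five runs per cycle, one tape per run
check_tapecycle 5 5 || true     # exactly equal is not enough
```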

Additionally, the bumpdays parameter interacts with the dumpcycle parameter to determine how many levels of incremental dumps are taken between full dumps; in order to get behavior similar to a differential backup pattern, a long dumpcycle can be combined with a long bumpdays parameter. For an incremental backup strategy, the bumpdays, bumpsize, and bumpmult parameters can be kept small to encourage automatic bumping to higher backup levels. See the sidebar on backup levels for further information on how the different backup levels interact.

Tape Device and Changer Parameters

The tape device can be defined as an actual device or as a program running a tape changer. This latter function is still experimental with Amanda and support for multiple tapes is not completely available. Since Amanda takes the backups as individual files and copies them to a tape, it does not yet have a facility to restart a tape copy in the middle of a file, but rather will start the copy on the new tape at the beginning of the unsuccessful file.

Amanda uses a non-rewinding tape device, for example, /dev/nst0. If a tape changer is used, the tapedev parameter is not used; it is replaced by the tpchanger, changerfile, and changerdev parameters, and the latter two values are provided as parameters to the script defined in tpchanger. The default script, chg-manual, can only be run manually because it prompts on stdout for confirmation when a tape is changed and therefore will hang if run in a batch job when a tape needs changing.

There are alternate scripts that are more useful for jukebox tape changes with multiple drives, automatically switching from one drive to another. However, if multiple tape backups are needed on a system with a single drive, the only supplied solution is to use a script that periodically checks the tape changer to see if a new tape has been inserted. Additional scripts may be available by contacting the amanda-users mailing list (see References).

Tapetypes are defined within the amanda.conf file in a series of definitions grouped together as follows:

define tapetype <name> {
           tapetype-option tapetype-value
           ...
           }

Most of the options specified here define the tape device, but this is where the length option would be placed so that Amanda can plan the size of a backup destined for a particular drive. The actual writing to tape ignores the length option so it is only used as an estimate. Another option, lbl-templ, is used to generate printable labels for each tape from a PostScript template file and the amreport program.

Although most common tape drives are already defined in the example amanda.conf, it is always possible to come up with a system that does not have a supported tapetype. In that case, an Amanda utility named tapetype can provide the necessary parameters. It is not normally installed, but is located in the $AMANDASRC/tape-src directory and can be compiled and installed if necessary.

Network Parameters

The network parameters include the timeouts, netusage, inparallel, and maxdumps values. While the timeout parameters (etimeout, ctimeout, and dtimeout) can remain at the default values for most networks, it is quite useful to increase the netusage parameter, which defaults to 300 Kbps. If Amanda runs during periods of low network utilization and the network is 100 Mbps, there is no reason not to give it 40 or 50 Mbps. This bandwidth will help reduce the time to run backups and, coupled with a large inparallel value (the default of 10 should be sufficient for most purposes) and a larger maxdumps value (the default of 1 should be increased), can quickly bring many filesystems over to the holding disks, where they will queue up for access to one or more tape drives.
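A rough calculation shows why the default netusage is worth raising. The sketch below estimates how long a dump takes to cross the network at a given cap. It assumes the figure is in kilobits per second, as quoted above, uses decimal units, and ignores protocol overhead, compression, and disk speed, so treat the results as order-of-magnitude only.

```shell
#!/bin/sh
# Sketch: rough transfer time for a dump at a given bandwidth cap.
# Assumes 1 MB = 8000 kilobits; integer arithmetic truncates the result.

transfer_seconds() {
    size_mb=$1
    rate_kbps=$2
    echo $(( size_mb * 8000 / rate_kbps ))
}

for rate in 300 40000; do
    echo "1000 MB at $rate Kbps: $(transfer_seconds 1000 "$rate") seconds"
done
```

Under these assumptions, a 1000-MB dump needs more than seven hours at 300 Kbps but only a few minutes at 40 Mbps.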

Logging and Reporting Parameters

The logging and reporting information is stored by default in /usr/adm/amanda on the server. The directories underneath this tree include the datafiles that are used to identify the tapes needed to restore files and directories and to maintain logs of old backups. These files are used by amplot, amreport, and amrecover/restore.

Holding Disk

The holding disks are used by Amanda to place backups while waiting for tape drives to be available. It is not necessary to use a holding disk; Amanda will back up directly to tape if insufficient space is available. However, large holding disks will improve backup performance significantly and enable a single tape server to manage a larger number of clients.

The holding disks should not themselves be backed up, and there are special parameters to specify in the dump strategy parameters to identify holding disks during filesystem backups. However, it is probably best to dedicate a partition as a holding disk and thus be able to keep it separate from the active backup disks.

Dumptype Strategies

The dumptype section defines many different backup strategies, which can be used by the different clients of a single Amanda server. These strategies inherit parameters specified by other enclosed strategies so that a particular dumptype can succinctly incorporate strategic elements from other higher-level strategies.

The default global dumptype does not include the index parameter, and it must be added to enable interactive recoveries. The more specific dumptypes include specifications as to:

  • Program (dump or gnutar)
  • Compression (none, client, or server)
  • Files to exclude (gnutar only)
  • Backup strategy (full only, differential, or incremental only)
  • Special holding disk treatment (for backing filesystems with holding disks directly to tape)

After changing index to "yes" in the global dumptype group, it is probably most useful to pick the appropriate dumptype for the clients participating in Amanda. It should be noted that dumptypes typically differentiate between root and user partitions: the root partitions are backed up with a low priority while the user and active partitions are backed up with a high priority. This is based on the assumption that a root partition will not change frequently. If a client system uses a single large partition (such as the default with HP-UX), that disk should use a user dumptype, not a root dumptype, even though it is a root partition.

Backup Operations

Preparation

Now that the configuration file has been defined, it is time to create a pool of labeled tapes for use by that configuration. Tapes are labeled with the command: amlabel <configuration> <label>. This command should be performed by the Amanda user, and the label must conform to the regular expression defined by the configuration parameter labelstr. All the tapes required by the tapecycle parameter should be labeled at once so they are available during the testing phase. Amlabel records each labeled tape in tapelist so that the other Amanda programs can identify it.
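Since amlabel rejects labels that do not match labelstr, it can help to check a planned set of labels first. The following sketch tests each label against a pattern and prints the amlabel commands without running them. The pattern, the DailySet1-N label scheme, and the tape count of six are assumptions chosen to match the examples in this article; substitute the labelstr from your own amanda.conf.

```shell
#!/bin/sh
# Sketch: check tape labels against a labelstr-style pattern and print
# the amlabel commands. LABELSTR and the tape count are assumed values.

LABELSTR='^DailySet1-[0-9][0-9]*$'

label_ok() {
    printf '%s\n' "$1" | grep -Eq "$LABELSTR"
}

print_amlabel_commands() {
    config=$1
    count=$2
    n=1
    while [ "$n" -le "$count" ]; do
        label="DailySet1-$n"
        if label_ok "$label"; then
            echo "amlabel $config $label"
        else
            echo "skipping $label: does not match labelstr" >&2
        fi
        n=$((n + 1))
    done
}

print_amlabel_commands normal 6
```

The commands are printed rather than executed because each tape has to be loaded into the drive before its amlabel command is run.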

The client systems need to have an Amanda user created on them, with a .amandahosts file containing the server host and username, separated by whitespace. The parser for .amandahosts is very strict, so it is important that both the server and username be specified. On the client systems, the program amandad needs to be added to /etc/inetd.conf and the Amanda ports added to /etc/services. These additions are not performed by the standard installation process, so they must be performed manually.
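As a starting point for these manual steps, the sketch below prints the three client-side entries. The server name and user follow the build example earlier in this article. The amandad path assumes the default /usr/local prefix, and 10080/udp is the customary port for the amanda service; both should be checked against the local build before use.

```shell
#!/bin/sh
# Sketch: print the client-side entries described above.
# SERVER, AMANDA_USER, and AMANDAD are assumptions; adjust to the local build.

SERVER=boca
AMANDA_USER=amanda
AMANDAD=/usr/local/libexec/amandad

amandahosts_line() { echo "$SERVER $AMANDA_USER"; }
services_line()    { echo "amanda 10080/udp"; }
inetd_line()       { echo "amanda dgram udp wait $AMANDA_USER $AMANDAD amandad"; }

echo "# ~$AMANDA_USER/.amandahosts"
amandahosts_line
echo "# /etc/services"
services_line
echo "# /etc/inetd.conf"
inetd_line
```

The script only prints the lines; they still have to be added to the files by hand, and inetd must be told to reread its configuration afterward.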

Testing

Each configuration can be checked for access and environment by running the amcheck command. While this command should be used to verify that the network is ready to run amdump, it can also be used to verify that client<->server connections are functioning before starting the application.

Once amcheck runs correctly, amdump can be run with one parameter to identify the configuration that is being dumped. The dump goes into the background and can be monitored with the amstatus program (which must be run by the Amanda user). Once a dump is complete, the report generated by amreport is available from the logging directory tree.

Scheduling

Once the individual dumps work correctly, it will be time to automate the dump process. This can simply be done by adding amdump to the Amanda user's crontab file on the tape server. It is then only necessary to monitor the dump reports automatically emailed to the user of record and ensure that the correct tape is in the tape drive for the next backup.

The amcheck program will review a specific configuration and ensure that the correct tape is loaded and that the hosts for the backup are accessible. It is therefore a good choice to run as a test before running amdump. Another program useful to test the dump environment is amverify. amverify ensures that the tape in the drive is readable and that there are no tape errors. Both of these programs can be run as part of the backup process in crontab:

'amcheck normal && amdump normal && amverify normal'

Each will mail a separate report to the Amanda username (forwarded if necessary) so that problems and dump status can be easily reviewed by systems administrators.
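For reference, a complete crontab line built around that command chain might look like the output of the sketch below. The start time of 00:45 and the Monday-to-Friday schedule are assumptions for illustration, and the Amanda programs may need full paths if their directory is not in cron's PATH.

```shell
#!/bin/sh
# Sketch: build a crontab line that chains amcheck, amdump, and amverify.
# The schedule (weekdays only) is an assumed example, not a recommendation.

crontab_line() {
    minute=$1
    hour=$2
    config=$3
    echo "$minute $hour * * 1-5 amcheck $config && amdump $config && amverify $config"
}

crontab_line 45 0 normal
```

Because the commands are joined with &&, amdump runs only if amcheck succeeds, and amverify runs only after amdump completes successfully.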

Restore Operations

All backup systems must be periodically tested by performing restores. While backups are started from the tape server and pulled from the clients, restores are started by the clients and pulled from the tape server. Restoring files from multiple tapes therefore requires access to both the client and the server. This can easily be done with a network terminal application that allows access to the client while you are physically close to the server. The amrecover command provides an interactive session where, by specifying a date, a host, and a disk, you can walk through the available files for restoration and, once a file is selected, identify the tapes from which to retrieve it. It must be run by root because it can access any file on the system, and the .amandahosts file on the server must have an entry for root from the client node:

.amandahosts in $AMANDAHOME on tape server boca.smithnet:

bacchus.smithnet  root # Amanda client

The backup configuration is named "normal". Then, on bacchus.smithnet:

[root@bacchus /tmp/bacchus]# amrecover normal -s boca.smithnet
AMRECOVER Version 2.4.2p2. Contacting server on boca.smithnet ...
220 boca AMANDA index server (2.4.2p2) ready.
200 Access OK
Setting restore date to today (2001-12-18)
200 Working date set to 2001-12-18.
200 Config set to normal.
200 Dump host set to bacchus.smithnet
amrecover> setdisk /
200 Disk set to /
amrecover> history
200- Dump history for config "normal" host "bacchus.smithnet" disk "/"
201- 2001-12-15 0 DailySet1-2 6
201- 2001-12-14 1 DailySet1-1 3
201- 2001-12-13 0 DailySet1-5 6
201- 2001-12-12 1 DailySet1-6 4
201- 2001-12-11 1 DailySet1-4 5
201- 2001-12-08 0 DailySet1-3 6
200 Dump history for config "normal" host "bacchus.smithnet" disk "/"
amrecover> cd /etc
/etc
amrecover> add inetd.conf # default date is most recent dump
Added /etc/inetd.conf
amrecover> setdate 2001-12-14 # Change to get an old date
200 Working date set to 2001-12-14.
amrecover> add dumpdates # this file is different on 12/14 and 12/15
Added /etc/dumpdates
amrecover> list         # Get the extraction list
TAPE DailySet1-1 LEVEL 1 DATE 2001-12-14 # Note that we need two tapes
     /etc/dumpdates
TAPE DailySet1-2 LEVEL 0 DATE 2001-12-15
     /etc/inetd.conf

amrecover> settape boca.smithnet:/dev/nst0
Using tape /dev/nst0 from server boca.smithnet.
amrecover> extract

Extracting files using tape drive /dev/nst0 on host boca.smithnet.
The following tapes are needed: DailySet1-1
                          DailySet1-2.

Restoring files into directory /tmp/bacchus
Continue? [Y/n]: y

Load tape DailySet1-1 now
Continue? [Y/n]: y
set owner/mode for '.'? [yn] n  
Load tape DailySet1-2 now
Continue? [Y/n]: y
restore: ./etc: File exists
set owner/mode for '.'? [yn] n
amrecover>

This restores the two selected files under the local root (/tmp/bacchus). The files and the directory etc, created under the local root, will have the original ownerships, timestamps, and masks.

A good practice to verify backups is to periodically select a partition or large binary file and compare the restored copy with the original. Since amverify only checks that the files on tape are internally correct, this is the only way to ensure that the entire process is functioning and that the stored files are recoverable.
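The comparison itself can be as simple as a byte-for-byte check with cmp. The sketch below wraps that check in a function (the name is my own) and demonstrates it with temporary files so that it runs without a real restore.

```shell
#!/bin/sh
# Sketch: compare a restored file with the live original, byte for byte.

verify_restore() {
    original=$1
    restored=$2
    if cmp -s "$original" "$restored"; then
        echo "match: $restored"
    else
        echo "MISMATCH: $restored differs from $original"
        return 1
    fi
}

# Self-contained demonstration using temporary files in place of real data.
dir=$(mktemp -d)
printf 'example contents\n' > "$dir/original"
cp "$dir/original" "$dir/restored"
verify_restore "$dir/original" "$dir/restored"
printf 'changed since the backup\n' >> "$dir/original"
verify_restore "$dir/original" "$dir/restored" || true
rm -rf "$dir"
```

After the session shown earlier, the equivalent check would be verify_restore /etc/inetd.conf /tmp/bacchus/etc/inetd.conf. A mismatch is not necessarily a backup failure, since the live file may have changed after the dump was taken, which is why a file that rarely changes makes a better test subject.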

Conclusion

Amanda is a highly useful backup system with some advantages over commercial systems, and has the flexibility to manage backups in a large variety of situations. The holding disks, if large enough, can also sustain a series of backups where new tapes might not be available, a feature not available under most commercial systems. While commercial systems stream multiple backups onto tapes and can provide efficient network-to-tape processing, Amanda keeps the downtime required for backups to a minimum, and with proper tapetype configuration, can ensure that streaming drives are used to their capacity.

The mailing lists that form the primary support vehicle for Amanda are informative with a high signal-to-noise ratio, and the FAQ-O-MATIC provides answers to a great many of the standard questions. However, Amanda could be improved by providing native ports to other operating systems. While work is going on with the port to NT, it could be useful to provide support for other systems required by a specific installation.

The extensive use of configuration files, while not unfamiliar to UNIX systems administrators, might be considered a drawback for users familiar with the graphical interfaces used by commercial backup products. This too could be an arena for an interested individual to extend the basic product.

References

Amanda's basic Web site is http://www.amanda.org and it includes pointers to the current production version for download, the SourceForge project for the latest development version, the mailing lists and other resources.

The lists include amanda-announce for general broadcast announcements, amanda-users for user questions (announcements are mirrored to this list so only one subscription is needed), and amanda-hackers for questions about development and new projects. These lists can be subscribed to by sending mail to <listname>-request@amanda.org with a body containing the line: "subscribe <your e-mail address>". As usual, one should never send subscription or administrative requests to the list itself.

Amanda is additionally documented in the book UNIX Backup and Recovery by W. Curtis Preston, who has kindly placed the entire Amanda chapter (written by John R. Jackson) online at: http://www.backupcentral.com/amanda.html.

David Smith has been programming for more than 30 years and has worked as a consultant for the last 10. He has designed access control systems and data transfer protocols for several applications on multiple platforms. He is currently an independent consultant and can be contacted at: David.Smith@acm.org.