Simplifying Site-Wide Backups
Benjamin J. Anello
Backing up UNIX workstations is one of the most important
jobs that a
system administrator must do. Besides restoring the
contents of crashed
hard drives, you might be called upon to restore the
files that a user
inadvertently removed from his/her workstation. When
you have many users
on a network of workstations, it is more efficient to
centralize the
backup of those systems. It is also cheaper to have
one tape drive with
multiple tape capability (a jukebox) on a server than
to have multiple
single-tape drives located on individual workstations.
A few years ago, a new customer had a network of about
20 workstations
and a server that were not being backed up. To solve
this problem, we
purchased an 8 mm tape drive jukebox and wrote some
Bourne shell and
nawk scripts to incrementally back up the server and
workstations on a
daily basis. A weekly incremental ran each Saturday,
with a full backup
every month. The advantage of using scripts to run the
backups and a
jukebox to handle the tapes was that the network backup
could be done at
night when the users were not on the network.
This network backup scheme uses one master Bourne shell
script that
calls two nawk scripts. I wrote this backup system to
run on Sun
workstations running SunOS 4.1.x.
Overview
This script will back up all filesystems listed in a
control file. It
supports full backups (which I normally perform monthly)
and incremental
backups on a daily and weekly basis.
Normally, the script infers what type of backup to perform
from the date
and the name of the control file. When called to process
the filesystems
in .../server_backup_0, the system will perform a level
0, or total,
backup. When called with .../server_backup, the system
will perform an
incremental backup. Since for most systems the complete
backup and
incremental backups should process the same set of filesystems,
these
names can be links to the same control file.
Incremental backups are relative to the most recent
"weekly" backup (or
monthly if there have been no intervening weeklies).
A "weekly" backup
is simply an incremental backup performed on a Saturday.
These
relationships among the backups are enforced by manipulating
the "dump
level" passed to the underlying dump command. The
script records the
current "weekly" dump level in the file $SCRATCHDIR/.cur_dump_level
and
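increments it each time a weekly is performed. Daily
dumps are always
level 8 backups. Because dump saves the files changed since the most
recent dump at a lower level, each daily is relative to the latest
weekly (or to the monthly if no weekly has run yet), since
the weekly dump levels should range from 2 to 6.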
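The level rules above can be sketched as a small Bourne shell function. This is an illustration under stated assumptions, not the code from Listing 1: the function name pick_level, its two arguments, and the exact handling of the state file (store the level for the next weekly, reset it to 2 after a full backup) are hypothetical.

```sh
#!/bin/sh
# Sketch only: choose a dump level from the invocation name and the day.
# pick_level and the state-file handling are assumptions, not Listing 1.

SCRATCHDIR=${SCRATCHDIR:-/tmp}
LEVELFILE=$SCRATCHDIR/.cur_dump_level

# pick_level SCRIPTNAME DAYNAME  -> prints the dump level to use
pick_level() {
    case "$1" in
    *_0)
        # invoked as server_backup_0: full backup
        echo 2 > "$LEVELFILE"   # assumed: the next weekly starts at level 2
        echo 0
        ;;
    *)
        if [ "$2" = "Sat" ]; then
            # weekly: use the recorded level, then record the next one
            level=`cat "$LEVELFILE" 2>/dev/null`
            if [ -z "$level" ]; then
                level=2
            fi
            expr $level + 1 > "$LEVELFILE"
            echo $level
        else
            echo 8              # daily incremental
        fi
        ;;
    esac
}

# Typical call from the main routine:
#   LEVEL=`pick_level "\`basename $0\`" "\`date +%a\`"`
```

With at most five Saturdays between full backups, the weekly levels produced this way stay within 2 to 6, so a level 8 daily always finds a lower-level dump to be relative to.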
Scheduling
I run backups at midnight so that all the backups have
the same date.
The incremental backups usually are completed in 3 hours,
well before
users arrive at work. The backup script is called by
cron, but you must
also allow access to each workstation from the server
workstation by
using .rhosts or hosts.equiv files. The cron entries are
as follows:
1 0 * * 2-6 /server/admin/backups/server_backup > /dev/console 2>&1
#1 0 * * 6 /server/admin/backups/server_backup_0 > /dev/console 2>&1
Comment out the second line for normal backups. Comment
out the first
line and uncomment the second line for the Saturday
that you run the
monthly backups. The server_backup_0 file is just a
symbolic link to the
server_backup file. Conditional statements in the Bourne shell script
test the name under which it was invoked.
Many backup tapes are required for incremental backups.
I have three
sets of 10 tapes that are rotated throughout the month.
The sets are
labelled "A," "B," and "C."
Normally, A1-A5 run the first week, with
A1-A4 being level 8 daily backups. A5 is the first weekly
backup (level
2). A6-A10 run the second week; A6-A9 are level 8 daily
backups, and A10
is the next weekly backup (level 3). I swap out the
"A" set for the "B"
set the next week and begin again. The "C"
set is used for months with 5
weeks and for overruns. If it takes two tapes to get
a weekly done, the
backups will move to the next tape, but this will be
reflected in the
tapelog report.
Log Files
This backup system creates two files: the tapelog
file and the
log file. The tapelog file lists, in a report format,
what was backed up
to which tape. A backup system is only useful if you can
find and restore the files that you back up, and the tapelog file
serves this purpose. The tapelog file specifies tape
name, section name,
hostname, partition, filesystem, and size that was backed
up. The tape
name and section name can be used to tell you which
tape you need and
which section to specify (the s flag) in your restore
command. The other
file created by this script is the log file, which lets
the operator see
the dump logs to be sure that the backup was completed
correctly. The
operator should review the log file as part of the daily
maintenance
check of the network.
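As a rough illustration of how a tapelog entry turns into a restore command, the helper below prints (but does not run) an interactive restore command line. It assumes that the section recorded in the tapelog is the number of the dump file on the tape, counted from 1, and that the tape is read through a no-rewind device; the function name restore_cmd and the device /dev/nrst0 are placeholders for this sketch.

```sh
#!/bin/sh
# Sketch only: build the restore command for one tapelog entry.
# restore_cmd and the device name are assumptions for illustration.

# restore_cmd DEVICE SECTION -> prints an interactive restore command.
# With the key letters "ifs", the arguments follow in key order:
# the device for f, then the dump file number for s.
restore_cmd() {
    echo "restore ifs $1 $2"
}

# Example: the tapelog says the filesystem is in section 3 of the tape.
restore_cmd /dev/nrst0 3
```

Load the tape named in the tapelog, run the printed command from the directory where the files should be restored, and use the interactive prompts to pick the files.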
server_backup: The Main Script
The main Bourne shell script (Listing 1) calls the various
files listed
in Table 1. This script uses environment variables to
hide the actual
name and path of various UNIX commands. This practice
makes porting of
the code to other systems much easier. These definitions
appear at the
beginning of the script and are followed by functions
that are called by
the main routine. These functions are listed in Table
2.
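The idea of hiding command names behind variables might look like the sketch below. The variable names and the default paths are assumptions chosen for a SunOS-style layout, and remote_df is a hypothetical helper that only prints the command line it would run.

```sh
#!/bin/sh
# Sketch only: keep every external command in one place so that a port
# to another system means editing these lines and nothing else.
# Names and default paths are assumptions, not taken from Listing 1.

DF=${DF:-/bin/df}
DUMP=${DUMP:-/usr/etc/dump}
RSH=${RSH:-/usr/ucb/rsh}
AWK=${AWK:-/usr/bin/nawk}

# remote_df HOST -> prints the command used to list a host's local partitions
remote_df() {
    echo "$RSH $1 $DF -t 4.2"
}
```

Because each variable keeps a value already set in the environment, the same script can be tried on another system by exporting, for example, RSH and DF before it starts.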
The main routine begins by installing the cleanup function
as a trap that runs on
program exit. The cleanup function will mail log files
to the operator
and print out the tapelog file. The main routine then
creates a log
file, checks that it is executing as root, and tests
to ensure that
another backup process is not running. If these conditions
are met, then
the script creates a lock file (.dumping) to prevent
any other instance
of the backup process from starting.
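A minimal sketch of the trap and lock-file steps follows. The function names take_lock and cleanup, the HAVE_LOCK flag, and the removal of the lock during cleanup are assumptions; the article states only that a .dumping lock file is created and that cleanup mails the logs and prints the tapelog.

```sh
#!/bin/sh
# Sketch only: lock file plus exit trap. Not the code from Listing 1.

SCRATCHDIR=${SCRATCHDIR:-/tmp}
LOCK=$SCRATCHDIR/.dumping
HAVE_LOCK=no

# cleanup: run on exit. The real script also mails the log files and
# prints the tapelog here; this sketch only releases the lock it owns.
cleanup() {
    if [ "$HAVE_LOCK" = yes ]; then
        rm -f "$LOCK"
        HAVE_LOCK=no
    fi
}

# take_lock: fail if another backup holds the lock, otherwise create it
take_lock() {
    if [ -f "$LOCK" ]; then
        echo "backup already running (found $LOCK)" >&2
        return 1
    fi
    echo $$ > "$LOCK" && HAVE_LOCK=yes
}

# The main routine would then start along these lines:
#   trap cleanup 0 1 2 15
#   take_lock || exit 1
```

The test-then-create step is not atomic, which is tolerable when cron is the only thing that starts the job. Tracking ownership in HAVE_LOCK keeps a second instance from deleting the first instance's lock when it exits.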
With the lock set, the script invokes init_juke to initialize
the
jukebox. This device-dependent function checks the jukebox
to be sure
that it is online and available. The script then invokes
the function
get_diskinfo() to determine which disks will be backed
up. This
function reads a file of hosts to be backed up and uses
the df command
to obtain the partitions that are local to each host
(df -t 4.2).
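The df step can be pictured with the filter below, which turns df output into one line per partition. It is a sketch, not the contents of diskinfo.nawk: the function name parse_df, the output column order, and the six-column df layout (filesystem, kbytes, used, avail, capacity, mount point) are assumptions.

```sh
#!/bin/sh
# Sketch only: reduce df output to "host partition filesystem used_kb".
# parse_df and the column layout are assumptions, not Listing 2.

AWK=${AWK:-awk}     # the article uses nawk; any awk with -v will do here

# parse_df HOST: read df output on stdin, print one line per partition.
# The header line is skipped, and lines with fewer than six fields
# (for example wrapped entries) are ignored in this sketch.
parse_df() {
    $AWK -v host="$1" 'NR > 1 && NF >= 6 { print host, $1, $6, $3 }'
}

# Typical use, once per host named in the hosts file:
#   rsh $host df -t 4.2 | parse_df $host
```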
The script then determines what level backup to perform
and builds a
schedule listing the partitions involved and the tapes
they will each be
written to. To construct this schedule, the script must
estimate the
size of each backup image. For a full (level 0) backup,
the script uses
the size reported by the df command. For incremental
backups, the image
size is ascertained by performing a "dummy"
backup to /dev/null. Once
the schedule is built, the script simply loops through
each tape and
partition.
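One simple way to build such a schedule is to fill tapes in order and start a new tape whenever the next image would not fit. The sketch below does that; the function name build_schedule, the input and output column layouts, and the capacity argument are assumptions, and the real scripts may use a different placement rule.

```sh
#!/bin/sh
# Sketch only: assign backup images to tapes in input order.
# build_schedule and its column layout are assumptions for illustration.

AWK=${AWK:-awk}

# build_schedule CAPACITY_KB
#   stdin:  "host partition filesystem size_kb"
#   stdout: "tape section host partition filesystem size_kb"
build_schedule() {
    $AWK -v cap="$1" '
    BEGIN { tape = 1; section = 0; used = 0 }
    {
        # start a new tape when this image would overflow the current one
        if (section > 0 && used + $4 > cap) {
            tape++; section = 0; used = 0
        }
        section++           # sections on a tape are counted from 1
        used += $4
        print tape, section, $1, $2, $3, $4
    }'
}
```

Given images of 600, 500, and 300 KB and a capacity of 1000 KB, the first image lands on tape 1 and the other two become sections 1 and 2 of tape 2. The tape and section columns are the two values an operator later needs from the tapelog.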
The nawk scripts diskinfo.nawk (Listing 2) and dumpsize.nawk
(Listing 3)
do most of the schedule building. The diskinfo script
builds the
schedule for a level 0 dump. It takes its input from
a df command and
creates a table with entries for each tape, showing
the host, partition
name, filesystem name, and disk-used size. The dumpsize
script builds a
similar table, but for incremental dumps, by processing
the dummy dump
output.
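For the incremental case, the size has to be pulled out of the log that the dummy dump writes. The sketch below assumes that the log contains a line of the form "DUMP: estimated N blocks ...", which is how BSD-derived dump programs commonly report their estimate; check the output on your own system before relying on it. The function name dump_estimate is hypothetical, and this is not the contents of dumpsize.nawk.

```sh
#!/bin/sh
# Sketch only: extract the size estimate from a dump log.
# The log line format is an assumption; verify it against real output.

AWK=${AWK:-awk}

# dump_estimate: read a dump log on stdin, print the estimated block count
dump_estimate() {
    $AWK '/estimated [0-9]+ blocks/ {
        # the number is the field right after the word "estimated"
        for (i = 1; i < NF; i++)
            if ($i == "estimated") { print $(i + 1); exit }
    }'
}

# Typical use: capture the log of a dump written to /dev/null (dump
# normally reports on standard error) and pipe it through dump_estimate.
```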
Conclusion
Being able to recover the data from a corrupt or crashed
hard drive or
accidentally deleted files can have a major impact on
the bottom line
for your company, and having a centralized backup system
can make the
maintenance of backup tapes much easier. I hope these
scripts will help
you set up such a centralized system for yourself.
About the Author
Benjamin J. Anello has a B.S. in Computer Graphic Communications
from
California Polytechnic State University in San Luis
Obispo, CA. He
administers both VAX/VMS systems and UNIX workstations,
programs in
FORTRAN, C, and C++ on Macintosh, VAX, and UNIX workstations,
administers INGRES databases, and maintains the company-wide
network. He
currently works for RE/SPEC Inc. in Albuquerque, New
Mexico (yes it IS
in the U.S.). He can be reached via email at ben.anello@respec.com.