
Simplifying Site-Wide Backups

Benjamin J. Anello

Backing up UNIX workstations is one of the most important jobs that a system administrator must do. Besides restoring the contents of crashed hard drives, you might be called upon to restore the files that a user inadvertently removed from his/her workstation. When you have many users on a network of workstations, it is more efficient to centralize the backup of those systems. It is also cheaper to have one tape drive with multiple tape capability (a jukebox) on a server than to have multiple single-tape drives located on individual workstations.

A few years ago, a new customer had a network of about 20 workstations and a server that were not being backed up. To solve this problem, we purchased an 8 mm tape drive jukebox and wrote some Bourne shell and nawk scripts to back up the server and workstations incrementally every day. A weekly incremental ran once a week, and a full backup ran every month. The advantage of using scripts to run the backups and a jukebox to handle the tapes was that the network backup could be done at night when the users were not on the network.

This network backup scheme uses one master Bourne shell script that calls two nawk scripts. I wrote this backup system to run on Sun workstations running SunOS 4.1.x.

Overview

This script will back up all filesystems listed in a control file. It supports full backups (which I normally perform monthly) and incremental backups on a daily and weekly basis.

Normally, the script infers what type of backup to perform from the date and the name of the control file. When called to process the filesystems in .../server_backup_0, the system will perform a level 0, or total, backup. When called with .../server_backup, the system will perform an incremental backup. Since for most systems the complete backup and incremental backups should process the same set of filesystems, these names can be links to the same control file.
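
The name-based selection can be sketched as follows. This is a minimal illustration, not the code from Listing 1: the function name backup_type is hypothetical, and it assumes only that a name ending in _0 means a level 0 backup, as described above.

```shell
# backup_type NAME
# Hypothetical helper: decide the kind of backup from the name that
# was used to invoke the script (or to name the control file).
backup_type() {
    case `basename "$1"` in
        *_0) echo full ;;          # e.g. server_backup_0
        *)   echo incremental ;;   # e.g. server_backup
    esac
}

# Inside the real script the argument would typically be "$0".
backup_type /server/admin/backups/server_backup_0
backup_type /server/admin/backups/server_backup
```

Because the decision depends only on the name, one file reached through two links is enough, which is why the two names can point at the same control file.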

Incremental backups are relative to the most recent "weekly" backup (or monthly if there have been no intervening weeklies). A "weekly" backup is simply an incremental backup performed on a Saturday. These relationships among the backups are enforced by manipulating the "dump level" passed to the underlying dump command. The script records the current "weekly" dump level in the file $SCRATCHDIR/.cur_dump_level and increments it each time a weekly is performed. Daily dumps are always level 8 backups. Thus daily dumps are always relative to a weekly, since the weekly dump levels should range from 2 to 6.
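
The level selection described above might look like the sketch below. It is an illustration under stated assumptions rather than the logic of Listing 1: the function name choose_level is hypothetical, the level file is assumed to hold the level of the most recent weekly, and a full backup is assumed to reset that file to 1 so that the next weekly runs at level 2. The sketch does not cap the weekly level at 6.

```shell
# choose_level KIND DAY LEVELFILE
#   KIND      "full" or "incremental"
#   DAY       abbreviated day name, e.g. the output of: date +%a
#   LEVELFILE file recording the most recent weekly dump level
# Prints the dump level to use and updates LEVELFILE when needed.
choose_level() {
    kind=$1
    day=$2
    levelfile=$3
    if [ "$kind" = full ]; then
        echo 1 > "$levelfile"      # assumption: next weekly becomes level 2
        echo 0
    elif [ "$day" = Sat ]; then
        last=`cat "$levelfile" 2>/dev/null`
        level=`expr "${last:-1}" + 1`
        echo "$level" > "$levelfile"
        echo "$level"              # weekly: one level above the last weekly
    else
        echo 8                     # daily: always relative to the last weekly
    fi
}

# Typical call (assumes the C locale, where date +%a prints "Sat"):
#   LEVEL=`choose_level incremental "\`date +%a\`" "$SCRATCHDIR/.cur_dump_level"`
```

Since dump saves everything changed since the most recent dump at a lower level, a level 8 daily always picks up changes since the last weekly (levels 2 through 6), and each weekly picks up changes since the weekly before it.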

Scheduling

I run backups at midnight so that all the backups have the same date. The incremental backups usually complete within 3 hours, well before users arrive at work. The backup script is called by cron, but you must also allow access to each workstation from the server by using .rhosts or hosts.equiv files. The cron entries are as follows:

1 0 * * 2-6 /server/admin/backups/server_backup > /dev/console 2>&1
#1 0 * * 6 /server/admin/backups/server_backup_0 > /dev/console 2>&1

Comment out the second line for normal backups. Comment out the first line and uncomment the second line for the Saturday that you run the monthly backups. The server_backup_0 file is just a symbolic link to the server_backup file. Conditional statements in the Bourne shell script test which of the two names it was invoked under.

Many backup tapes are required for incremental backups. I have three sets of 10 tapes that are rotated throughout the month. The sets are labelled "A," "B," and "C." Normally, A1-A5 run the first week, with A1-A4 being level 8 daily backups. A5 is the first weekly backup (level 2). A6-A10 run the second week; A6-A9 are level 8 daily backups, and A10 is the next weekly backup (level 3). I swap out the "A" set for the "B" set the next week and begin again. The "C" set is used for months with 5 weeks and for overruns: if a weekly needs two tapes, the following backups move to the next tape, and the tapelog report reflects the shift.

Log Files

This backup system creates two files: the tapelog file and the log file. The tapelog file lists, in a report format, what was backed up to which tape. A backup system is only useful if you can find and restore the files that you back up, and the tapelog file exists for this purpose. It records the tape name, section name, hostname, partition, filesystem, and size of each backup. The tape name and section name tell you which tape you need and which section to specify (the s flag) in your restore command. The log file lets the operator see the dump logs to be sure that the backup completed correctly. The operator should review the log file as part of the daily maintenance check of the network.

server_backup: The Main Script

The main Bourne shell script (Listing 1) calls the various files listed in Table 1. This script uses environment variables to hide the actual name and path of various UNIX commands. This practice makes porting of the code to other systems much easier. These definitions appear at the beginning of the script and are followed by functions that are called by the main routine. These functions are listed in Table 2.

The main routine begins by setting the cleanup function to trap upon program exit. The cleanup function will mail log files to the operator and print out the tapelog file. The main routine then creates a log file, checks that it is executing as root, and tests to ensure that another backup process is not running. If these conditions are met, then the script creates a lock file (.dumping) to prevent any other instance of the backup process from starting.
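
The startup checks can be sketched as shown here. This is a simplified illustration, not the code from Listing 1. The function names acquire_lock, is_root, and start_backup are hypothetical, placing the lock file under $SCRATCHDIR is an assumption, and id -u is a POSIX call that may not be available on SunOS 4.1.x. The cleanup shown only removes the lock; mailing the logs and printing the tapelog are omitted.

```shell
# Assumption: the lock file lives in the scratch directory.
LOCKFILE=${SCRATCHDIR:-/tmp}/.dumping

# Release the lock (the real cleanup also mails logs and prints the tapelog).
cleanup() {
    rm -f "$LOCKFILE"
}

# Succeed only if no other backup holds the lock, then take it.
acquire_lock() {
    if [ -f "$LOCKFILE" ]; then
        echo "backup already running ($LOCKFILE exists)" >&2
        return 1
    fi
    echo $$ > "$LOCKFILE"      # record our process id in the lock file
}

# True when the effective user id is 0.
is_root() {
    [ "`id -u`" = 0 ]
}

# Hypothetical entry point tying the checks together.
start_backup() {
    is_root      || { echo "must be run as root" >&2; return 1; }
    acquire_lock || return 1
    trap cleanup 0             # remove the lock whenever the script exits
    trap 'exit 1' 1 2 15       # turn HUP, INT, TERM into a normal exit
}
```

In this sketch the exit trap is installed only after the lock has been taken, so an instance that finds an existing lock cannot remove a lock that belongs to another run. The test-then-create step is not atomic, which is acceptable when cron is the only caller.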

With the lock set, the script invokes init_juke to initialize the jukebox. This device-dependent function checks the jukebox to be sure that it is online and available. The script then invokes the function get_diskinfo() to determine which disks will be backed up. This function reads a file of hosts to be backed up and uses the df command to obtain the partitions that are local to each host (df -t 4.2).
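
As a rough illustration of the disk discovery step, the sketch below turns df output into one record per local partition. It is not the get_diskinfo() function from Listing 1. The name df_to_records is hypothetical, and the column layout (Filesystem, kbytes, used, avail, capacity, Mounted on) is an assumption about the df output on the target system.

```shell
# df_to_records HOST
# Reads df output on standard input and prints, for each /dev partition:
#   host  partition  mount-point  used-kbytes
df_to_records() {
    awk -v host="$1" '
    # Skip the header line and anything that is not a complete /dev entry.
    NR > 1 && NF >= 6 && $1 ~ /^\/dev\// { print host, $1, $6, $3 }'
}

# Hypothetical use for one host from the host list:
#   rsh "$host" df -t 4.2 | df_to_records "$host"
```

Limiting df to type 4.2 filesystems keeps NFS mounts out of the list, so each partition is backed up only from the host that owns it.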

The script then determines what level backup to perform and builds a schedule listing the partitions involved and the tapes they will each be written to. To construct this schedule, the script must estimate the size of each backup image. For a full (level 0) backup, the script uses the size reported by the df command. For incremental backups, the image size is ascertained by performing a "dummy" backup to /dev/null. Once the schedule is built, the script simply loops through each tape and partition.
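
The "dummy" backup used for sizing can be sketched like this. The helper name parse_estimate is hypothetical, and the exact wording of the estimate line printed by dump varies between systems; the sketch assumes only that the line contains the word "estimated" (in either case) followed by the block count.

```shell
# parse_estimate
# Reads dump's messages on standard input and prints the estimated
# block count from the first line that contains one.
parse_estimate() {
    awk '/[Ee]stimated/ {
        for (i = 1; i < NF; i++)
            if ($i ~ /^[Ee]stimated$/) { print $(i + 1); exit }
    }'
}

# Hypothetical use; the u key is left out so /etc/dumpdates is not
# updated by the trial run:
#   rsh "$host" dump "${level}f" /dev/null "$partition" 2>&1 | parse_estimate
```

Writing the trial dump to /dev/null costs a pass over the filesystem but gives the size of the actual incremental image, which df cannot provide.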

The nawk scripts diskinfo.nawk (Listing 2) and dumpsize.nawk (Listing 3) do most of the schedule building. The diskinfo script builds the schedule for a level 0 dump. It takes its input from a df command and creates a table with entries for each tape, showing the host, partition name, filesystem name, and disk-used size. The dumpsize script builds a similar table, but for incremental dumps, by processing the dummy dump output.
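
The tape assignment at the heart of the schedule can be illustrated with a short awk sketch. It is not a copy of Listing 2 or Listing 3. The function name build_schedule and the input format (host, partition, filesystem, size in KB, one record per line) are assumptions made for the example, and awk -v requires nawk or a POSIX awk.

```shell
# build_schedule CAPACITY_KB
# Reads "host partition filesystem size_kb" records on standard input
# and prints "tape host partition filesystem size_kb", starting a new
# tape whenever the next image would not fit on the current one.
build_schedule() {
    awk -v cap="$1" '
    BEGIN { tape = 1; used = 0 }
    {
        size = $4
        # Move to the next tape only if this one already holds something;
        # an image larger than a whole tape still gets its own entry.
        if (used > 0 && used + size > cap) { tape++; used = 0 }
        used += size
        printf "%d %s %s %s %d\n", tape, $1, $2, $3, size
    }'
}

# Example with a made-up capacity of 6000 KB:
printf 'ws1 /dev/sd0a / 3000\nws1 /dev/sd0g /usr 4000\nws2 /dev/sd0a / 2000\n' |
    build_schedule 6000
```

The example places the first partition on tape 1 and the other two on tape 2, because the second image would push tape 1 past its capacity while the third still fits beside it.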

Conclusion

Being able to recover the data from a corrupt or crashed hard drive or accidentally deleted files can have a major impact on the bottom line for your company, and having a centralized backup system can make the maintenance of backup tapes much easier. I hope these scripts will help you set up such a centralized system for yourself.

About the Author

Benjamin J. Anello has a B.S. in Computer Graphic Communications from California Polytechnic State University in San Luis Obispo, CA. He administers both VAX/VMS systems and UNIX workstations, programs in FORTRAN, C, and C++ on Macintosh, VAX, and UNIX workstations, administers INGRES databases, and maintains the company-wide network. He currently works for RE/SPEC Inc. in Albuquerque, New Mexico (yes it IS in the U.S.). He can be reached via email at ben.anello@respec.com.