Cover V09, I04

Automating Single-User Backups with Tape Verification

Alan C. Davis

The benefits of a complete, verified system and application backup have been acknowledged by systems administrators for many years. Backing up is one of the fundamental jobs that every sysadmin learns, and it is often both one of the most important and one of the most neglected items in the sysadmin's job description. In almost four years working on the UNIX support hotline for a major UNIX OS and hardware manufacturer, I heard an abashed “No” far too many times in answer to “Do you have a current backup?” This article describes one method of performing unattended backups while the system is in single-user mode, and includes a procedure for verifying the readability of the backup. The scripts described use only commands found in the base operating system distribution and don't depend on third-party, shareware, or freeware packages. This method is useful for systems that can afford a period offline in which to perform the backup, and where the data to be backed up will fit on the number of tapes available online.

The Problem

Backups are a time-consuming, tedious, and necessary part of UNIX systems administration. Depending upon the system hardware used and the amount of data, a backup may take anywhere from minutes to several hours. Offline backups must generally be run during times of low use, typically nights and weekends. The manual tasks of shutting the system down to single-user mode, beginning the backup, and then waiting for the backup to finish are usually done by less skilled operators or junior administrators. In many businesses, the cost of this additional staff is prohibitive, and it is just one of the reasons given for inadequate backup procedures.

Automation is the Answer

The solution described here combines several scripts to automate the numerous details of robustly executing single-user backups and verifying the readability of the backups when they are completed (Figure 1). The most important hardware requirement is that adequate tape capacity be available to store the complete backup. The scripts presented assume that a single tape is sufficient. The additional procedures to manage a multi-tape loader would be relatively easy to add, but the choice of loader and controlling software would determine the exact commands used.

There are a number of steps necessary to ensure a robust backup solution:

  • Verify that a tape is available and can be written to
  • Ensure that data already on the tape can be overwritten
  • Shut down all applications
  • Create a flag file to indicate that a backup is required
  • Bring the system to single-user mode
  • Execute the backup commands for filesystems in a known sequence
  • Document the contents of the backup
  • Bring the system back online
  • Notify the appropriate sysadmin if any steps fail
  • Log successes and failures
  • Verify the data written to tape immediately
  • Unload the tape when verification is complete
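
The logging and notification items in the list above apply to every other step, so it can help to route each step through one small wrapper. The following is only a sketch of that idea, not the article's code: run_step, LOGFILE, and NOTIFY_CMD are hypothetical names, and NOTIFY_CMD stands in for whatever mail or pager command a site uses.

```sh
#!/bin/sh
# Hypothetical names throughout: LOGFILE, NOTIFY_CMD, and run_step are
# illustrative and do not come from the article's scripts.
LOGFILE=${LOGFILE:-/tmp/backup_steps.log}
# Command that receives the failure text on standard input.
# A site would substitute its own mail or pager command here.
NOTIFY_CMD=${NOTIFY_CMD:-cat}

# run_step NAME COMMAND [ARGS...]
# Runs one step, logs the outcome, and sends a notification on failure.
run_step() {
    step_name=$1
    shift
    if "$@" >>"$LOGFILE" 2>&1; then
        echo "$(date) OK   $step_name" >>"$LOGFILE"
        return 0
    fi
    echo "$(date) FAIL $step_name" >>"$LOGFILE"
    echo "backup step failed: $step_name" | $NOTIFY_CMD
    return 1
}
```

A driver script could then chain the steps so that the first failure stops the sequence, for example by calling run_step once per item and joining the calls with &&.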

The scripts described here were written for Compaq's Tru64 UNIX v4.0 and above. File and command locations may vary with other UNIX variants, but the scripts should be easily modified to work correctly. In many UNIX variants, the mt command is the primary method of controlling a tape drive. The return code from the mt online command is used to verify that a tape is loaded and is writable. In the scripts, the commands for loading a tape, unloading a tape, and setting tapes online and offline are all defined as environment variables. These are then called as needed, making it easy to adapt to different UNIX variants or to add loader capabilities. Not all tape drives implement the full set of control commands necessary to load and unload the tape.
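
As an illustration of keeping the tape commands in variables, a sketch along these lines may be useful. The variable names and the device name are placeholders rather than the article's actual settings, and the mt subcommands other than online are assumptions that should be checked against mt(1) on the target system.

```sh
#!/bin/sh
# Assumptions: the device name and all variable names are placeholders.
# Only "mt online" is taken from the article; verify the other
# subcommands against mt(1) for the local tape driver.
TAPE=${TAPE:-/dev/nrmt0h}
TAPE_ONLINE=${TAPE_ONLINE:-"mt -f $TAPE online"}
TAPE_REWIND=${TAPE_REWIND:-"mt -f $TAPE rewind"}
TAPE_UNLOAD=${TAPE_UNLOAD:-"mt -f $TAPE offline"}

# tape_ready: succeeds only if the configured "online" command exits 0.
tape_ready() {
    $TAPE_ONLINE >/dev/null 2>&1
}
```

Because the functions only expand the variables, supporting a loader or a different UNIX variant means changing the assignments, not the logic that calls them.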

Check the Tape

Once the tape is online, it must be read to determine whether any data already on it can be overwritten. The tape_check script calls the dump command, or whatever backup command is defined, to read the first dump file on the tape and determine whether it may be overwritten. The check used in this version is a simple age test. If there is a backup file already on the tape and it's more than 72 hours old, it may be overwritten. If the file is younger than the defined age, the backup is aborted and a message is sent via pager to the sysadmin. A more elaborate scheme using ANSI labels, rotating schedules, or staggered incremental backups could be devised and implemented as the tape_check script. Command-line switches will override the 72-hour check.
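
The age test itself reduces to a small piece of arithmetic. The sketch below assumes the date of the backup on tape has already been converted to seconds since the epoch; how that date is read from the tape header depends on the backup command and is not shown. The function name may_overwrite is hypothetical.

```sh
#!/bin/sh
# may_overwrite LAST_BACKUP_EPOCH NOW_EPOCH [MIN_AGE_HOURS]
# Succeeds if the backup on tape is older than the minimum age
# (72 hours unless a third argument is given).
# Extracting the timestamp from the tape is left to the caller.
may_overwrite() {
    last=$1
    now=$2
    min_hours=${3:-72}
    age_seconds=$((now - last))
    [ "$age_seconds" -gt $((min_hours * 3600)) ]
}
```

Passing the current time as an argument keeps the function easy to test; a command-line override like the one the article mentions could simply skip the call.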

Stop Applications

The process of shutting down any applications running on the system is beyond the scope of this article; however, there are a few issues that should be considered when choosing a shutdown method. The default shutdown command of Tru64 UNIX prior to version 5.0 doesn't service any application's special requirements for a graceful exit. It will kill processes without executing the scripts used to cleanly shut down the individual system services. The init command does execute the requisite kill scripts, but it doesn't have a reboot option. As a workaround, the script that starts the backup process, which is called via cron, runs any specific kill scripts needed to ensure an orderly application shutdown. This also allows any failure during the shutdown to be logged and proper notification made, while leaving the system up so that the sysadmin can take any necessary recovery steps and cleanly shut down the system.
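
One way to run the kill scripts so that a failure stops the sequence and leaves the system up is sketched below. The function name stop_applications and the convention of passing a "stop" argument are assumptions for illustration; the article does not show how its scripts are invoked.

```sh
#!/bin/sh
# stop_applications LOGFILE SCRIPT...
# Runs each kill script with a "stop" argument, in the order given.
# Returns non-zero at the first failure so the caller can skip the
# reboot and notify the sysadmin instead.
stop_applications() {
    log=$1
    shift
    for script in "$@"; do
        if "$script" stop >>"$log" 2>&1; then
            echo "stopped: $script" >>"$log"
        else
            echo "FAILED to stop: $script" >>"$log"
            return 1
        fi
    done
    return 0
}
```

The cron-driven script would call this with its list of kill scripts and proceed to the reboot only when the function succeeds.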

Create Flag File

The run_cold_backup script creates a file on the root partition, /local/.autobackup, which serves as a flag indicating that the backup script is to be executed after the system boots into single-user mode.

The last step of this phase is to run the shutdown -r now command, which reboots the server once all the applications have been stopped.
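
The flag-and-reboot step might look something like the following sketch. The flag path and the shutdown command come from the text; the function name request_backup and the REBOOT_CMD variable are hypothetical, the latter added so the sketch can be tried without restarting anything.

```sh
#!/bin/sh
# The flag path is from the article. REBOOT_CMD is a hypothetical
# variable that lets the sketch be exercised with a harmless command.
FLAG=${FLAG:-/local/.autobackup}
REBOOT_CMD=${REBOOT_CMD:-"shutdown -r now"}

# request_backup: writes the flag file, then reboots only if the flag
# was actually created.
request_backup() {
    if date >"$FLAG"; then
        $REBOOT_CMD
    else
        echo "could not create $FLAG; not rebooting" >&2
        return 1
    fi
}
```

Writing the date into the flag is a design choice of this sketch: it costs nothing and records when the backup was requested.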

System Boot

As the system boots, one of the first scripts executed is /sbin/bcheckrc. This script checks and mounts all the local filesystems. A line is added to bcheckrc to also execute the script /local/autobackup. The autobackup script logs the start of the backup, executes the backup script, and, when the backup completes, logs the end of the backup. The /local/.autobackup file is then copied to /local/.verify_backup, which indicates that the tape in the drive should be verified against the last backup log file. The last command removes the /local/.autobackup file.
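
The sequence performed by the autobackup script can be sketched as follows. The two flag paths are those named in the text; BACKUP_SCRIPT and LOG are hypothetical names, and the check for the flag file is an assumption about how an ordinary boot is told apart from a backup boot.

```sh
#!/bin/sh
# Flag paths follow the article; BACKUP_SCRIPT and LOG are hypothetical.
FLAG=${FLAG:-/local/.autobackup}
VERIFY_FLAG=${VERIFY_FLAG:-/local/.verify_backup}
BACKUP_SCRIPT=${BACKUP_SCRIPT:-/local/backup}
LOG=${LOG:-/local/autobackup.log}

autobackup() {
    # No flag means this is an ordinary boot: do nothing.
    [ -f "$FLAG" ] || return 0
    echo "backup started: $(date)" >>"$LOG"
    $BACKUP_SCRIPT >>"$LOG" 2>&1
    status=$?
    echo "backup ended (status $status): $(date)" >>"$LOG"
    # Hand over to the verification stage, then clear the request so
    # the next boot does not repeat the backup.
    cp "$FLAG" "$VERIFY_FLAG"
    rm -f "$FLAG"
    return $status
}
```

Removing the flag even when the backup fails matters here: leaving it in place would cause every subsequent boot to attempt another backup.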

Backup

The backup script does the actual work of the backup. It again verifies that a tape is available. It also creates a log file using a list of filesystems and other information specified in a site.rc file or derived from commands such as date. The vdump command is then executed on each filesystem in the list in turn, with a full list of all files in the backup redirected to the log file (Figure 2).
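
The per-filesystem loop might be sketched as below. FILESYSTEMS, BACKUP_CMD, and BACKUP_LOG are hypothetical names for values that would come from site.rc, and the filesystem list is only an example. The vdump invocation shows just a level and a device; the option that lists each file, and any others a site needs, should be taken from vdump(8).

```sh
#!/bin/sh
# Hypothetical names: FILESYSTEMS, BACKUP_CMD, and BACKUP_LOG would
# normally be set in site.rc. BACKUP_CMD is called with one filesystem
# as its last argument. A no-rewind tape device is assumed so that the
# savesets follow one another on the tape.
TAPE=${TAPE:-/dev/nrmt0h}
FILESYSTEMS=${FILESYSTEMS:-"/ /usr /var"}
BACKUP_CMD=${BACKUP_CMD:-"vdump -0 -f $TAPE"}
BACKUP_LOG=${BACKUP_LOG:-/tmp/backup.log}

backup_all() {
    echo "backup of [$FILESYSTEMS] started $(date)" >"$BACKUP_LOG"
    # Unquoted on purpose: the list is split on whitespace, in order.
    for fs in $FILESYSTEMS; do
        echo "== $fs" >>"$BACKUP_LOG"
        if ! $BACKUP_CMD "$fs" >>"$BACKUP_LOG" 2>&1; then
            echo "backup of $fs failed" >>"$BACKUP_LOG"
            return 1
        fi
    done
    echo "backup finished $(date)" >>"$BACKUP_LOG"
}
```

Stopping at the first failure keeps the order of savesets on the tape identical to the order recorded in the log, which the later verification depends on.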

Verify Backup Integrity

Once the backup is complete, the system continues to boot to multi-user mode and initializes the network. A script placed in the rc3.d directory checks for the /local/.verify_backup file and, if it's found, begins verifying the backup. The last real file from each fileset is selected from the backup log file listing, and the /tmp directory is checked to ensure that enough space is available to restore that file. The selected file is restored to the /tmp directory, and a checksum is calculated for both the original file and the restored version. The checksums are compared, and any difference in checksum or a failure to restore the file generates a notification to the sysadmin.

In order to restore the last file from each fileset, the vrestore command must read each file in the fileset to find the one specified. This method ensures that the full tape is readable.
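
The comparison at the end of the verification can be sketched with the standard cksum utility. The function name files_match is hypothetical, and the vrestore step that places the restored copy in /tmp is not shown.

```sh
#!/bin/sh
# files_match ORIGINAL RESTORED
# Succeeds when both files are readable and have the same cksum CRC
# and byte count. Returns 1 on a mismatch and 2 if either file cannot
# be read, so a failed restore is reported separately from bad data.
files_match() {
    sum_a=$(cksum <"$1") || return 2
    sum_b=$(cksum <"$2") || return 2
    [ "$sum_a" = "$sum_b" ]
}
```

Reading each file on standard input keeps the file name out of the cksum output, so the two results can be compared as plain strings.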

Conclusion

This set of scripts enables a sysadmin to implement an unattended single-user backup system in a robust, easily maintained manner. The scripts as described have been in daily use for over 12 months on several systems with no failures. The only operator intervention required is to change the tapes daily and to monitor the email and log files.

About the Author

Alan Davis is a UNIX systems administrator with over 12 years of experience in many different flavors of UNIX. His interests lie in the areas of process automation and administrative tools. He may be reached via email at: Davis_Consultants@att.net.