Cover V02, I05

Fast Backup and Restore Scripts for DAT Drives

Jon Alder and Ed Schaefer

With the advent of 8mm and digital audio tape (DAT) cartridge drives, the volume of data that can be stored on a single tape has grown into the gigabyte range. The time a system administrator needs to search a tape for specific file(s) has increased accordingly. If the traditional sequential tape search could be replaced by a direct access method, the time needed to restore specific file(s) would be significantly reduced.

Creating a tape with related directories in their own partition or volume could emulate a direct access method; the speed advantage comes from skipping unwanted volumes and placing the tape head at the beginning of the volume that contains the desired file(s). With DAT tape drives, you can create multi-volume tapes with ordinary UNIX commands. This article presents backup, restore, and report shell scripts for controlling DAT drives in such a manner. The heart of the model is creating multiple backups on the same tape. Each backup is defined as a Tape File System (TFS), with each TFS residing in its own volume.

The Tape File Mark

When you write to a DAT drive without rewinding using a command such as cpio or tar, a tape file mark (TFM) is written on the tape between each backup. Think of the TFM as dividing the tape into separate volumes. Figure 1 shows a TFM between two different tape file systems, TFS1 and TFS2.

Each TFM has a beginning side (BOT) and an ending side (EOT). To move between BOT and EOT of a Tape File Mark, and to move to other Tape File Systems on the tape, a device driver called a tape controller program is required. Because of the differences in these tape controller programs, fast restore will always go to EOT of the mark before the TFS requested. The tape is rewound to the physical beginning of the tape between tape requests.

The Design

The backup script, fast.bu (Listing 4), reads a file containing a list of directories or file systems to be backed up. First, fast.bu writes the list file itself to tape as the first backup, referred to as volume 0. Then each file system entry in the list is written to tape in succession.

The restore script, restore.bu (Listing 5), performs maintenance on a tape created by fast.bu. restore.bu restores the volume 0 backup_list to disk and then displays a menu which allows an administrator to choose a Tape File System to view or restore.

The report script, report.bu (Listing 6), reports statistics on the last fast backup.

Configuration and Portability

The three scripts use a configuration file which parameterizes the differences in mini-cartridge tape drives, tape controller programs, and UNIX versions. These scripts have been tested with the IBM RS/6000 (AIX), AT&T 3B2/600 (UNIX), and the HP 9000 (HP-UX).

Below is an example configuration entry for positioning the cursor on the screen:

cursor:cursor :Set the cursor position

The entry is divided into three colon-delimited fields. The first field is the shell script parameter name, the second is the value, and the third is a comment. Listing 1 and Listing 2 show configuration system variables for AIX and HP-UX, respectively.

Typically, the Terminal Independent Operation Command, tput with the "cup" argument, would be used to position the cursor, but since the AIX tput doesn't support the "cup" argument, a substitute is used. Listing 3 is a C program, cursor.c, which accepts the screen coordinates as arguments, with the upper left-hand corner being 0,0.

Backup List

Before creating a tape with fast.bu, you must create a file that lists each file system to be backed up, one per line. Below are three directories, /usr, /etc, and /jet, in the backup list:

usr
etc
jet

These directories are backed up relative to root (/) so that in a restore you could restore one or more of them to another disk location if you chose.
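A minimal sketch of how a relative file list can be produced for cpio. The function name list_relative is hypothetical and does not come from the listings; the only point is that running find from the root directory yields path names without a leading slash, so they can later be restored under any directory.

```shell
#!/bin/sh
# list_relative is a hypothetical helper, not part of fast.bu.
# $1 = directory to treat as root, $2 = one entry from the backup list
list_relative() {
    # The subshell keeps the cd from changing the caller's directory.
    ( cd "$1" && find "$2" -print )
}

# Possible use, assuming TAPE holds the no-rewind device name:
#   list_relative / usr | cpio -oc > "$TAPE"
```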

The fast.bu Script

Executing fast.bu

fast.bu requires the configuration file "-c" and the backup list of directories "-l" as arguments:

fast.bu -c /usr/bu/backup/config -l /usr/bu/backup/backup_list

Command-line Arguments

fast.bu checks for the existence of the arguments and defaults the configuration file to config and the backup list to backup_list in the present working directory (lines 23 to 37). To check whether two command-line arguments exist, the second command-line argument is appended to a value, x; if "$2" is not defined, then x will equal x (line 23).
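The x-prefix test can be sketched as follows. The function name arg_state is hypothetical; only the comparison itself reflects the technique described above.

```shell
#!/bin/sh
# arg_state is a hypothetical helper that reports whether a second
# argument was supplied.
arg_state() {
    # When $2 is unset or empty, x"$2" collapses to the bare string x.
    if [ x"$2" = x ]; then
        echo "missing"
    else
        echo "present"
    fi
}

arg_state -c             # prints "missing"
arg_state -c ./config    # prints "present"
```

Prefixing both sides with x keeps the test command from seeing an empty operand, which older shells handled poorly.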

fast.bu supports a variable number of command-line arguments since the space between the switch and the file location may or may not exist:

fast.bu -c/usr/bu/backup/config -l/usr/bu/backup/backup_list

The while loop/case statements (lines 40 - 72) support a variable number of arguments. The shell shift command shifts arguments to the left ($2 to $1, $3 to $2, etc) and decrements the number of variables, "$#", by one.
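A loop of this kind might look like the sketch below. The function and variable names are hypothetical, and the ${1#-c} expansion is a POSIX shell feature used here for brevity; the original listing may extract the attached value another way.

```shell
#!/bin/sh
# parse_args is a hypothetical helper; CONFIG and LIST are illustrative names.
parse_args() {
    CONFIG=config            # defaults, relative to the working directory
    LIST=backup_list
    while [ $# -gt 0 ]; do
        case "$1" in
            -c)  [ $# -ge 2 ] || return 1
                 CONFIG=$2; shift ;;      # "-c file": value is the next argument
            -c*) CONFIG=${1#-c} ;;        # "-cfile": strip the switch from the word
            -l)  [ $# -ge 2 ] || return 1
                 LIST=$2; shift ;;
            -l*) LIST=${1#-l} ;;
            *)   return 1 ;;              # unknown argument
        esac
        shift                             # drop the argument just handled
    done
}

parse_args -c /usr/bu/backup/config -l/usr/bu/backup/backup_list
echo "$CONFIG $LIST"
```

The extra shift in the two-word cases is what lets the same loop accept both spellings of each switch.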

Initialization

The shell variables are set by searching the configuration file for a particular pattern and cutting out the second field delimited by a colon (lines 94 - 102).
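The lookup can be sketched with grep and cut. The function name get_param is hypothetical; the entry format is the three-field, colon-delimited one shown earlier.

```shell
#!/bin/sh
# get_param is a hypothetical helper name.
# $1 = parameter name, $2 = configuration file
get_param() {
    # Select the line whose first field is the name, then keep field two.
    grep "^$1:" "$2" | cut -d: -f2
}

# Possible use with the example entry "cursor:cursor :Set the cursor position":
#   CURSOR=`get_param cursor /usr/bu/backup/config`
```

Any padding before the second colon stays in the value, and a value that itself contains a colon would be truncated, so entries need to avoid colons in the second field.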

The temporary files located in the working directory are nulled out or initialized with the system date (lines 112 - 119). The temporary files are used so the report.bu script can report on the status of the last backup.

If the preprocess.bu file exists in the program directory, each line in the file is executed by the shell (lines 124 - 130). If these commands are related to performing some function, they should be in a shell script of their own that is called by preprocess.bu.
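One way to run such a hook file line by line is sketched below. The function name run_hook is hypothetical, and the use of eval is an assumption about how each line is handed to the shell.

```shell
#!/bin/sh
# run_hook is a hypothetical helper.
# $1 = hook file, for example preprocess.bu in the program directory
run_hook() {
    [ -f "$1" ] || return 0          # no hook file: nothing to do
    while IFS= read -r line; do
        eval "$line"                 # run the line in the current shell
    done < "$1"
}
```

Because the loop reads the hook file on standard input, a hook command that also reads standard input would consume the remaining lines, which is one more reason to keep each line a simple call to a separate script.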

Create Volume 0

fast.bu writes the backup list file, named tape.directory, to tape, using cpio and the no-rewind device to create volume 0 (lines 135 - 139). The HP-UX version of cpio uses the "R" (Re-synchronization) argument (see the configuration file in Listing 2). HP-UX seems to have trouble reading some tape headers, and if cpio is not told to (R)e-synchronize, it fails.

Create the Main Backup

Each of the directories in the backup list is then written to tape. The tty command determines whether the backup is running in the foreground or the background (line 147). If the command does not fail (exit status 0), the backup is running in the foreground.
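The foreground test can be sketched as follows. The function name run_mode is hypothetical; it relies only on the exit status of tty, which is 0 when standard input is a terminal.

```shell
#!/bin/sh
# run_mode is a hypothetical helper that reports how the script was started.
run_mode() {
    if tty > /dev/null 2>&1; then
        echo foreground              # standard input is a terminal
    else
        echo background              # cron, or input redirected from a file
    fi
}
```

Under cron there is no controlling terminal on standard input, so tty fails and the progress display is skipped.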

Foreground Backup

In a foreground backup, the number of files to be backed up and the estimated amount of time are displayed for each file system. As each file system backup progresses, the percentage completed and the total time are displayed (see Figure 2).

For each file system, the number of files to back up is determined (line 154); the cpio command is executed in the background with the no rewind device. The while loop (lines 161 - 214) continues until the background cpio command completes (line 165) and the next file system is backed up.
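The polling pattern can be sketched as below. The function name watch_job and the use of kill -0 to detect completion are assumptions; the listing may test for the background cpio in another way.

```shell
#!/bin/sh
# watch_job is a hypothetical helper.
# $1 = PID of the background job, $2 = file collecting its output
watch_job() {
    while kill -0 "$1" 2>/dev/null; do     # signal 0 only tests that the PID exists
        count=`wc -l < "$2"`
        echo "files written so far:" $count
        sleep 1
    done
    wait "$1"                              # collect the job's exit status
}

# Possible use, assuming TAPE is the no-rewind device and cpio.out is a
# scratch file that already exists:
#   find usr -print | cpio -ocv > "$TAPE" 2> cpio.out &
#   watch_job $! cpio.out
```

With the v option cpio writes one file name per line to standard error, which is why the line count of the output file tracks the number of files already on tape.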

Within the while loop, the following variables are calculated and displayed:

HOW_MUCH -- This is how many files have actually been backed up. The number is obtained by checking the line count in cpio's output file (line 170). For users of HP-UX: the cpio command creates an extra output line stating the reel number being used. This bumps the number of files done by 1.

PERCENT -- This is just percentage completed as determined by an awk script (line 176).

TIME_SO_FAR -- This is the number of minutes between the time this file system started backing up and the current system time. The starting and present times are determined by the date command (lines 157 and 182) as the number of days, hours, and minutes since the first of the month. The awk script (line 185) accepts the start and now strings, splits them into two arrays of three elements each using awk's built-in split function, and then does the calculation. Be aware that awk array subscripts are strings. For more information, see The AWK Programming Language. Notice that the two arguments to the awk script are in quotes. If they weren't, the shell would pass six arguments instead of two.

ETTC -- This is the estimated time to completion, determined by another awk script (line 199). This is a gross approximation, since the awk script assumes that each file takes the same amount of time to back up. Surprisingly, when backing up hundreds of megabytes, this approximation is close.
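The three calculations can be sketched with small awk programs. The function names are hypothetical, and passing values with -v is an assumption made for brevity; it requires a POSIX awk. The time strings are assumed to come from date '+%d %H %M'.

```shell
#!/bin/sh
# percent, elapsed_minutes, and ettc are hypothetical helper names.

# $1 = files done, $2 = total files
percent() {
    awk -v d="$1" -v t="$2" 'BEGIN {
        if (t == 0) { print 0; exit }
        printf "%d\n", (d * 100) / t
    }'
}

# $1 = start time, $2 = current time, each a quoted "DD HH MM" string
elapsed_minutes() {
    awk -v start="$1" -v now="$2" 'BEGIN {
        split(start, s, " ")           # s[1]=day, s[2]=hour, s[3]=minute
        split(now, n, " ")
        print (n[1]*1440 + n[2]*60 + n[3]) - (s[1]*1440 + s[2]*60 + s[3])
    }'
}

# $1 = files done, $2 = total files, $3 = minutes elapsed
ettc() {
    awk -v d="$1" -v t="$2" -v e="$3" 'BEGIN {
        if (d == 0) { print "unknown"; exit }
        printf "%d\n", e * (t - d) / d  # assumes every file takes the same time
    }'
}

percent 250 1000                        # prints 25
elapsed_minutes "03 22 50" "04 01 05"   # prints 135
ettc 250 1000 10                        # prints 30
```

Counting from the first of the month keeps the arithmetic simple, but this sketch gives a wrong answer for a backup that runs across a month boundary.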

Background Backup

If the backup is running in the background or from cron (lines 226 - 237), each file system is backed up without displaying file and time information.

Report Section

After the backup is complete, each file system is displayed with its approximate size; the total approximate size in megabytes, the total time in hours, and the average rate in megabytes/hour are also displayed (lines 242 - 282). If the backup had been run from cron, the output of the report would have been mailed to the cron user (usually root). Figure 3 shows a sample report.

If the kernel has been defined in the configuration file, the script then backs up the kernel and rewinds the tape. If not, it simply rewinds the tape (lines 287 - 292). For HP-UX users: the cpio command refuses to back up the kernel, so we recommend eliminating the kernel definition from the configuration file.

If the postprocess.bu file exists in the program directory, each line in the file is executed by the shell (lines 297 - 303). Once again, if these commands are related to performing some function, they should be in a shell script of their own.

The restore.bu Script

Executing restore.bu

Once a fast backup tape has been created, you need a method of accessing the tape. restore.bu provides that method. When executing restore.bu, you'll use the same configuration file as for fast.bu:

restore.bu -c /usr/bu/backup/config 

Once configuration is completed, restore.bu rewinds the tape to the physical beginning and restores the directory file, tape.directory, from volume 0 (lines 94 - 96). The UNIX print format command, pr, creates two temporary files that are used to maintain and display the tape file systems (lines 98 - 99).

When the preliminary setup is complete, the script defines all the functions needed (lines 101 - 316). Then only three top-level function calls control the script (lines 320 - 322).

The Initial Display

The call to function DIS_dir() clears the screen, displays a heading with the creation day and time, and displays the file systems available in columns of three (lines 101 - 110).

The call to function FTS_dir() prompts for the number of a file system to access, verifies the choice, and places the tape head at the TFM at the beginning of the volume chosen (lines 112 - 175). Figure 4 shows an example for 5 tape file systems after DIS_dir and FTS_dir have been executed.

From the example in Figure 4, if the administrator chooses TFS 4, var, the tape head will fast forward ahead three tape file marks, since the head is already at the start of the first TFS (lines 166 - 174).
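The number of marks to skip follows directly from the chosen TFS number, as in this sketch. The function name fsf_command is hypothetical and it only prints the command it would run. The "mt -f DEVICE fsf N" form is an assumption; the tape controller program and its arguments differ between systems (AIX, for example, provides tctl) and would come from the configuration file.

```shell
#!/bin/sh
# fsf_command is a hypothetical, dry-run helper.
# $1 = TFS number chosen from the menu, $2 = no-rewind tape device
fsf_command() {
    marks=$(( $1 - 1 ))        # the head already sits at the start of TFS 1
    if [ "$marks" -gt 0 ]; then
        echo "mt -f $2 fsf $marks"
    fi
}

fsf_command 4 /dev/example     # prints "mt -f /dev/example fsf 3"
```

Choosing TFS 1 prints nothing, since no positioning is needed after volume 0 has been read.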

The Tape Management Menu

After the user chooses a TFS, the function nrestore_menu (lines 177 - 234) displays the main menu (see Figure 5). The menu options available, along with a description of what each does, are as follows.

View file system files

The view file system option provides a table of contents using the cpio command (line 219). The tee command redirects the output of the cpio command to the file tfs_list located in the working directory, and to the pg command.

If pg is terminated before the end of the cpio command, only those files displayed to that point exist in the tfs_list file. By default, pg displays 23 lines of text; to avoid repeatedly striking the Enter key, change the window size at the pg prompt by entering iw, where i is the new window size (e.g., 5000w).
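The tee arrangement can be sketched as follows. The function name view_listing and the PAGER_CMD variable are hypothetical; PAGER_CMD exists only so the sketch can be tried on a system without pg.

```shell
#!/bin/sh
# view_listing is a hypothetical helper.
# stdin = table of contents, $1 = file that keeps a copy of it
view_listing() {
    # tee saves each line to the file and passes it on to the pager.
    tee "$1" | ${PAGER_CMD:-pg}
}

# Possible use, assuming TAPE is the no-rewind device:
#   cpio -it < "$TAPE" | view_listing tfs_list
```

When the pager exits early, tee receives a broken pipe and stops, which is why the saved file can end up shorter than the full table of contents.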

When the view completes or terminates, the restart function (lines 298 - 308) rewinds the tape, positions the tape head at the beginning of the first volume, and then executes functions DIS_dir and FTS_dir, redisplaying the Tape Management Menu.

Restore a File

Function RF_menu (lines 236 - 275) restores file(s) to the system. The administrator must answer the following prompts:

1) Please enter the complete restore path. Wildcards are also supported. For example, to restore /usr2 and all child directories, enter: usr2/*.

2) Do you want to overwrite existing files? If the answer is (Y)es, build the cpio argument variable with the unconditional copy switch.

3) Is this a high priority restore? If the answer is (Y)es, build the cpio variable preceded by the NICE parameter as set by the configuration file (line 263). In order to raise the restore priority with the nice command, log in as root.

Now the files are restored using the variables previously built (line 271). When the restore completes or aborts, the restart function executes as previously described.
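How the answers to the three prompts might be combined into one command is sketched below. The function name build_restore_cmd and the value given to NICE are hypothetical, and the base cpio switches are an assumption; in the scripts these come from the configuration file.

```shell
#!/bin/sh
# build_restore_cmd is a hypothetical helper that only prints the command.
# $1 = overwrite existing files (y/n), $2 = high priority (y/n),
# $3 = restore path or pattern, relative to root
NICE="nice -n -10"                     # illustrative value only

build_restore_cmd() {
    args="-idmv"                       # extract, make directories, keep times, verbose
    case "$1" in
        [Yy]*) args="${args}u" ;;      # u = unconditional copy over existing files
    esac
    cmd="cpio $args '$3'"              # quote the pattern so the shell leaves it alone
    case "$2" in
        [Yy]*) cmd="$NICE $cmd" ;;     # raising priority requires root
    esac
    echo "$cmd"
}

build_restore_cmd y n 'usr2/*'         # prints: cpio -idmvu 'usr2/*'
```

Keeping the pattern quoted matters because cpio, not the shell, has to match it against the names stored on tape.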

Rewind Tape Drive and Exit

Choosing this option executes the rewind function, which rewinds the tape to its physical beginning and terminates the script (line 224).

Choose another Tape File System

Choosing this option executes the previously described restart function, which redisplays the Tape Management Menu.

Exit (Leaving tape at current position)

Choosing this option terminates the script without rewinding the tape. An administrator familiar with the tape commands and volume structure can then act on a tape volume from the command line. Before re-using the tape with fast backup, be sure to rewind the tape, since fast backup does not rewind before creating volume 0.

The report.bu Script

Listing 6 is the report.bu script, which reports on the last fast backup performed using the temporary files left in the working directory. It provides the same backup statistics as for the fast backup script.

Conclusion

This article has provided a method for creating fast backup and restore procedures using DAT tape drives and UNIX shell programming. With some prior planning, you can divide the system directories into Tape File Systems and thus further increase the speed of restores from DAT tape drives.

References

Anderson, Craig. "FAQ: Info About Tape Drives." comp.unix.aix, April 1993.

Strang, John, Tim O'Reilly, Linda Mui. Termcap and Terminfo. O'Reilly & Associates, 1989.

Aho, Alfred, Brian Kernighan, Peter Weinberger. The AWK Programming Language. Addison-Wesley, 1988.

Kochan, Stephen, Patrick Wood. Unix Shell Programming. Hayden Books, 1987.

Prata, Stephen, Donald Martin. Unix System V Bible, Commands and Utilities. The Waite Group, 1987.

About the Authors

Jon Alder has been working on open systems for 10 years, most of the time in systems design and management. He was the Director of MIS for Marie Callenders Restaurants and now works for Columbia Sportswear in the Development division of the MIS department.

Ed Schaefer is an Informix software developer and UNIX system administrator at jeTECH Data Systems of Moorpark, CA, where he develops Time and Attendance Software. He has been involved with UNIX and Informix since 1987 and was previously head of software development at Marie Callendar Pie Shops of Orange, CA.