Cover V06, I11
Article
Listing 1
Listing 2
Listing 3

nov97.tar


Forget Me Not

Donald Bryson

About a year ago I took the call we all hate to take - a dead hard drive without a decent backup. The system was configured for nightly automatic backups, but the operator stopped changing tapes two weeks before the failing hard drive corrupted the file system. It's a dichotomy. An easy backup doesn't seem important to the user. If it's important, then it should take more of their time. If it takes too much time, then it's too much trouble. Users don't understand the importance of backups - until it's too late. Regardless of who lost the data, however, both the innocent and the guilty lose. On the way back from the trip to that site, I decided to write a backup nagging system. The goals of the system are simple - make backups easy to perform and very hard to forget. My solution was two script files: CheckNBack (Listing 1) and ForgetMeNot (Listing 2). CheckNBack automatically creates an archive from a root crontab entry and ForgetMeNot nags the users when things are not right.

Together the two scripts guard against operator negligence. They inform both the operator and his/her supervisor of potential catastrophes caused by those easily forgotten tape drives. CheckNBack verifies that a tape is in the drive. It emails the supervisor if the tape is not present. It verifies that the tape has been changed, and emails the supervisor if it hasn't. It also verifies that the tape is readable after the backup is complete. It emails the supervisor if it isn't. ForgetMeNot keeps the entire site informed. When a user logs into the system, they know the status of the previous backup. If there is a problem with the backup, you can even disable user login until the problem is resolved. ForgetMeNot also displays the system date and time. CheckNBack is executed from cron at night, and the correct date and time are very important - especially if the system contains a database management system that does not have transaction logging. A backup performed while the database is in flux is almost as bad as no backup at all.

Types of Backups

CheckNBack and ForgetMeNot focus on user data. Backups can be classified as system backups or data backups. System backups are byte-by-byte copies of a file system, partition, or complete hard drive. A system backup, at the least, must archive the special character block devices from the system. Also, a good system backup should include some type of recovery disk with the archive. Recovery disks allow booting the system from diskette and restoring the archive in the event of hard drive failure.

cpio is useful for system backups. UNIX V Release 4 also has a utility, volcopy, that can perform system backups. Most commercial backup programs can also perform system backups. System backups should be done regularly, but not necessarily nightly.

Data backups archive the user data files. Every multiuser system should have a daily data backup. A data backup can be as simple as copying a few data files to a diskette or as involved as copying multiple partitions of data to a multigigabyte tape. Again, the main distinction between a data backup and a system backup is that the system backup creates recovery disks, includes the system configuration files, and includes the character block device files. CheckNBack only performs data backups.

You can use any number of backup schemes to perform either a system or data backup. Backup schemes are generally grouped as full, incremental, or differential. A full backup archives all files from the source directory. Incremental and differential backups archive only modified or new files. The difference between a differential and incremental backup is how the backup determines which files have changed. An incremental backup compares the hard drive with the archived files on any type of backup. Its starting point is any backup: full or incremental. A differential backup compares the hard drive with the last full backup. Differential backups archive all changed files since the last full backup.

Although they are not as safe as a full backup, differential backups are safer than incremental backups. You can restore a system with your last full backup and any good differential backup even if you have a few bad tapes. However, one lost incremental tape is a catastrophe. The lost incremental tape is the only source for the changes between the previous tape and next tape. Worse than the lost data, the hard drive is restored to an unknown state. The restored image does not reflect the system on any given day if you restore with a missing incremental tape. That is not the case with a differential archive. It may be dated, but it is an image of the system at a given time. Unfortunately, it is very easy to lose an archive: a bad tape, a tape lounging in the drive for weeks, or the wrong tape at the wrong time. CheckNBack only creates full backups. You can, however, modify the script to perform other types of backups.

tar and cpio

CheckNBack uses tar to perform the backup. It is a good choice for data backups because some versions of tar support automatic compression. While tar does not completely verify the archive, it does perform minimal checksums when the archive is read back. If the archive is valid, then the sum of the tar header variable, chksum, for each file entry in the directory, always equals 0.

While tar is my personal tool of choice, it is important to remember its limitations. Many versions of tar cannot archive symbolic links. For example, prior to the release of Open Server 5.0, the SCO version of tar did not archive symbolic links. Now it stores the source file name when it archives a linked file. However, it does not store the actual data. Make sure you are archiving actual data and not merely file names. Many versions of tar cannot archive empty directories. Also, tar can only store 17 levels of nesting in the archive. Even with less than 17 levels of nesting, you can still get in trouble with file names (the maximum length of a file name is only 99 characters).

While CheckNBack does not use cpio, cpio does have several advantages over tar. These advantages make it the tool of choice for system backups. It can backup the device files. Path names can be longer than tar; cpio allows 1024 characters compared to the 99 character limit of some implementations of tar. You can calculate the checksum value for a cpio archive and then compare the result with sum -r. This is better validation than the directory checksum of tar.

Commercial Backup Programs

System backups can also be performed with various commercial programs. Commercial products have many advanced features over and above a simple script like CheckNBack. The advanced features of commercial backup applications may be a disadvantage for simple system configurations, however. The advanced features of such products add to the complexity of the backup process, and may thus require the attention of an experienced system administrator, not the situation for which CheckNBack was designed.

Compression

CheckNBack uses compression when it is safe. Some tar implementations can automatically compress the archive. For example, when the latest version of SCO tar creates an archive with the C switch (instead of the c switch), it flags an archive as compressed. It does not automatically do the compression, it just flags the archive as compressed; you must manually compress the files beforehand. The C switch places a compressed flag in the tar header. The flag indicates tar should pipe the archive through uncompress when extracting. compress creates a major problem for general distribution scripts like CheckNBack; compress silently breaks the file linkage. If CheckNBack used the compress utility, it would break the linkage on any linked file that it touched - an unneeded surprise for any sys admin trying to restore a system on a Saturday night.

The GNU version of tar can also compress its archive. It pipes the archive through gzip. This does not break symbolic links. It is simple to use, add the z switch to the tar command. The GNU version of tar uses gzip for legal reasons. The GNU version of tar could actually use compress because compress can pipe stdin and stdout like gzip. However, several compression algorithms contained in the UNIX compress command are patented by IBM and UNISYS. These algorithms preclude compress being distributed under the GNU license; therefore, GNU uses gzip.

Compression saves tape, but loses reliability. A compressed archive with a bad block is usually useless. Data loss is compounded by compression. CheckNBack uses compression on a GNU system, but does not use compression on others. You can disable the compression feature of CheckNBack even on a GNU system. Set GNU=NO in the configuration file, CheckNBack.cfg. Doing so just turns compression off, even on a GNU system.

cron

CheckNBack is designed to execute from cron. You should always submit the CheckNBack job as root. This prevents any problem with file permissions. It also guarantees the job will execute on any system. Only root can submit a cron job if cron.deny or cron.allow do not exist in the crontab directory. Also, do not submit the cron job by editing the crontab file manually. Manual changes to the crontab do not take effect until reboot. Many UNIX systems are not rebooted often. Submit your backup with crontab -e. This submits it immediately. Add "15 0 * * * CheckNBack" to the root crontab. This will run the backup every morning at 12:15 A.M. Note that ForgetMeNot requires the nightly backup to always finish the next day, so on most systems, you should start the backup right after midnight .

Installation

You should create a directory to store the CheckNBack and ForgetMeNot files. Any directory will do. There are three stock email templates that are mailed to the supervisor in the event of a problem: noread.txt, numdiff.txt, and tchange.txt. noread.txt is sent if CheckNBack cannot read the archive it created. numdiff.txt is sent if the number of archived files are different from the current number of files in the data directory. tchange.txt is sent when the operator has not changed the tape from the day before. Customize these messages to suite your needs.

The configuration file is CheckNBack.cfg (see Listing 3). It can be stored in /etc/default, the CheckNBack work directory, or any other directory you select. Set the variable, configdir, to your choice in both scripts. CheckNBack.cfg holds the configuration information for both CheckNBack and ForgetMeNot. The variable, GNU, indicates whether the system is a GNU system that can generate compressed tar archives. The option, NOCHECK, is used to prohibit ForgetMeNot from checking on a particular weekday. If the backup fails, the NOLOGIN parameter prevents the user from logging into the system. Naturally, this feature is disabled for the root account. BACKUPDIR is the top data directory. The destination device is stored in BACKUPDEVICE. You may disable both CheckNBack and ForgetMeNot by setting the DISABLE variable to YES. The work directory for both scripts is contained in CBDIR. If you wish to display the system date and time when your user logs into the system, set VERIFYDATE to YES. Email is sent to the designated backup supervisor indicated with BACKMGR. This should not be the operator, but the operator's supervisor.

Every user's .profile should be modified to execute ForgetMeNot. The lines that must be added to the user's .profile are:

ForgetMeNot
if [ ! ?# -eq 4 ]
then echo "Login not allowed."
exit
fi

The script files, CheckNBack and ForgetMeNot, can be located in the work directory or placed in /bin or /usr/bin. CheckNBack.man and ForgetMeNot.man should be loaded into the local man page directory.

Conclusion

CheckNBack and ForgetMeNot have limitations: it does not perform system backups; the archive is not byte-by-byte verified; device files are not archived; and CheckNBack only supports compression on a GNU system. However, used together they can be useful tools. Let's not take anymore of the dreaded gone-because-it-was-forgotten phone calls from users.

About the Author

Don is president of Quality Software Solutions, Inc. He is the author of TimeClock, TimeClock Lyte, DbDelta, and the Property Presentation System. He can be reached via email at dbryson@tclock.com.

About the Author

Don is president of Quality Software Solutions, Inc. He is the author of TimeClock, TimeClock Lyte, DbDelta, and the Property Presentation System. He can be reached via email at dbryson@tclock.com.