
Backups with Standard UNIX Commands

R. King Ables

One of the most important duties of any system administrator is backing up system data. If any disaster strikes the computer and files are lost, they can be restored from the backup media. Your backup copy is like an insurance policy for the data your organization depends upon.

The original designers of UNIX did not need a particularly sophisticated backup utility, since the original UNIX machines were mostly used for research and experimentation, not for production as they are today. The Computer Systems Research Group at the University of California at Berkeley made the most significant contribution to UNIX backup utilities during that time when they included the dump(1) command in their distribution. Because it came with the operating system and provided most of the capabilities people needed, dump became very popular. It has been included in many other versions of UNIX, although not always under the name dump. In USL's System V Release 4, for instance, it is called ufsdump, because it works with the Berkeley filesystem, known as ufs in SVR4. In IBM's AIX operating system, the dump program is called backup.

If you have a version of dump, you should read the article entitled "Dump and Restore" elsewhere in this issue. If your version of UNIX does not come with some version of dump, or you use a type of filesystem that it cannot back up, you must find another method of doing backups. Several software companies offer products to fill this need, but if you have a small shop run on a shoestring budget, you may not be able to afford such software. About the only option left to you is to use one of the utilities included in just about every version of UNIX, tar(1) or cpio(1). They may not be as fancy as commercial products or have as many options, but you will be able to back up your system, which, after all, is the major goal.

Backing up Files with tar

tar (standing for Tape ARchive) is really meant for putting a directory structure on a tape for transport to another site. This makes it good for doing a full backup, but not very good for doing incremental backups. While it is possible to use tar to do incremental dumps, the method is fairly convoluted and can fail depending on the number of files to be backed up and the size of the shell's command buffer. In general, don't use tar for incremental dumps. Use it only when you simply want to make a full copy of a directory hierarchy.

One advantage some versions of tar have is they can sense end of media on certain devices. For example, tar in AIX recognizes when the floppy device is full and prompts you for another floppy. Only a very small site or a home machine is a likely candidate for backing up onto floppy disks in the first place. But if that describes your system, tar may be more suitable.

One big disadvantage tar has when doing tape backup is that it exits when it encounters a bad spot on the tape, when writing, or worse, when reading. The data on the tape following the bad spot will be inaccessible. Programs have been written to skip past this spot and get other data from the tape, but it is better not to ask for the problem in the first place.

If you do select tar as your backup method, be sure to specify relative pathnames (such as ./etc) rather than absolute pathnames (such as /etc). The pathnames you give are recorded in the tape headers, so absolute pathnames prevent you from restoring the files anywhere other than their original location. In the case of a single-file restore, you may wish to read the file off the tape and compare it with the existing version before replacing it. Some versions of tar now allow you to override an absolute pathname with a relative one, but not all do, and if your version does not, you'll be stuck restoring the file wherever tar wants to put it. [For example, SCO UNIX has a -A option that strips leading slashes, making absolute paths into relative paths, but SVR4 doesn't offer such an option. --lsr]

To back up a filesystem, such as the /usr filesystem, with tar, you would become root and use commands such as:

% cd /
% tar -cvf /dev/rst0 ./usr

First, cd to the top of the directory tree so that everything on the tape will be relative to the root. The three arguments to the tar command are:

c	create a tar file
v	verbose (print out each filename as it is written to the tape)
f	specify the name of the tape device in the next argument

The f argument is always followed by the name of the tape device. After all the arguments, give the name of the directory to write onto the tape relative to the current directory rather than as an absolute pathname.
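The effect of relative pathnames can be tried without a tape drive. The following is a minimal sketch, assuming a scratch directory created with mktemp and an ordinary file (backup.tar, a hypothetical name) standing in for /dev/rst0:

```shell
# Scratch tree standing in for / (assumption: no real tape drive is used).
work=$(mktemp -d)
mkdir -p "$work/root/usr/lib"
echo "sample config" > "$work/root/usr/lib/sendmail.cf"

# cd to the top of the tree so every name in the archive is relative.
# backup.tar is a hypothetical stand-in for the tape device.
(cd "$work/root" && tar -cf "$work/backup.tar" ./usr)

# List the archive; no name begins with a slash.
listing=$(tar -tf "$work/backup.tar")
echo "$listing"

# Remove "$work" when finished experimenting.
```

Because none of the recorded names starts with /, the archive can later be unpacked under any directory.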

To later see what is contained on the tape, type:

% tar -tvf /dev/rst0

Rather than the c argument, use t to print the tape's table of contents.

Restoring Files with tar

To restore the entire contents of the tape just written, use the following command:

% tar -xvf /dev/rst0

The x argument means extract the files from the tape. In many cases you will not want to extract the entire tape, but only a single file from the tape. For example, let's say someone (not you, but another administrator!) accidentally deleted the sendmail.cf file from your system. This file can be found in different places in different versions of UNIX, but for this example, say yours is in /usr/lib/sendmail.cf. Since you wrote the tape with relative pathnames, the file on the tape will be called ./usr/lib/sendmail.cf. This is important because when you specify a filename, it must match the name of the file on the tape exactly. To restore only this file, use the command:

% tar -xvf /dev/rst0 ./usr/lib/sendmail.cf

tar will search the tape for the specified filename or filenames if more than one is listed on the command line.

Perhaps you suspect sendmail.cf was changed, but you cannot be sure simply by looking at the date on the file. To restore the sendmail.cf file in a different directory so that it can be compared to the existing /usr/lib/sendmail.cf file, you would do this:

% cd /tmp
% tar -xvf /dev/rst0 ./usr/lib/sendmail.cf

This extracts the file from the tape and creates it as /tmp/usr/lib/sendmail.cf. You could then compare it to /usr/lib/sendmail.cf with diff(1) and replace it or delete it as you wish.
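The restore-and-compare procedure can be rehearsed on scratch files. This is only a sketch under stated assumptions: an archive file (backup.tar, a hypothetical name) replaces the tape, and a directory named restore plays the part of /tmp:

```shell
# Build a small tree and archive it with relative names.
work=$(mktemp -d)
mkdir -p "$work/root/usr/lib" "$work/restore"
echo "original line" > "$work/root/usr/lib/sendmail.cf"
(cd "$work/root" && tar -cf "$work/backup.tar" ./usr)

# Simulate a later change to the live file.
echo "edited line" > "$work/root/usr/lib/sendmail.cf"

# Extract only the one member, using the exact name stored in the
# archive, into a separate directory so the live file is untouched.
(cd "$work/restore" && tar -xf "$work/backup.tar" ./usr/lib/sendmail.cf)

# diff exits 0 when the files are identical, non-zero when they differ.
if diff "$work/restore/usr/lib/sendmail.cf" \
        "$work/root/usr/lib/sendmail.cf" >/dev/null
then status=unchanged
else status=changed
fi
echo "sendmail.cf is $status since the backup"
```

Here the live copy was edited after the archive was written, so the comparison reports a change and the restored copy still holds the original contents.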

tar is adequate for making complete copies of directories and full backups of filesystems, but if you want to get any more clever than this by doing incremental backups, you probably will want to use cpio instead.

Backing up Files with cpio

When combined with the find(1) command, cpio (which stands for CoPy Input and Output) provides a good mechanism for doing full and incremental backups of files. cpio has several advantages over tar:

cpio recovers from bad spots on the tape

cpio can back up "special" files in /dev

cpio can accept a list of files to back up rather than working by directory hierarchy

cpio produces more portable files than tar for transport across different versions of UNIX

cpio accepts a list of files on its standard input stream and dutifully writes them to the tape. The problem becomes, how do you generate a list of files? This is where the find(1) command comes in.

The find command searches a filesystem hierarchy starting at a specified directory and performs certain actions on the files it finds depending on command-line arguments. The syntax of the find command is unlike most other UNIX commands and may be a bit difficult to understand at first. Studying these examples as well as the examples in the man page, plus playing with the command on your system, should help.

In our case, we simply want to print a list of files that fit certain criteria and pipe that list into cpio.

% cd /
% find ./usr -xdev -print

will print a list of files contained in the /usr filesystem. The -xdev argument keeps the search from crossing onto another physical device (i.e., another filesystem mounted underneath the /usr mount point, like /usr/local, which might be on another disk). [Editor's note: find -mount is the SVR4 syntax to do the same thing as find -xdev.] Notice the use of the relative pathname for /usr since cpio, like tar, will write the filename on the tape as it is given. The backup tape should contain relative pathnames rather than absolute pathnames.

By combining this find command with the cpio command, you can write each file in the /usr partition onto a tape with:

% find ./usr -xdev -print | cpio -oB >/dev/rst0

The o argument to cpio puts cpio in "output mode," which causes other options to behave in specific ways. The B argument instructs cpio to use a blocksize of 5120 bytes rather than the default 512. The larger size is more efficient for a tape.

To look at the contents of the tape, use the following command:

% cpio -itvB </dev/rst0

The i argument puts cpio in "input mode," the t argument (similar to tar's t argument) prints out a table of contents of the tape, and the v argument makes it verbose. Without the v argument, only the names would be displayed. With the v, the listing appears in much the same format of an ls -l listing of a directory, with ownership, file modes, and dates.

To back up only files that have been modified recently, tell find to list only files that have been modified in the last N days:

% find ./usr -xdev -mtime -3 -print | cpio -oB >/dev/rst0

will back up all files modified in the last 3 days (72 hours). The -mtime -3 argument means "less than 3 days." An -mtime 3 argument means "exactly 3 days" within about 24 hours of the time find is executed. An -mtime +3 argument means "more than 3 days."
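The difference between the -3 and +3 forms is easy to check on two scratch files. In this sketch the file names and the January 2000 timestamp are arbitrary choices for illustration:

```shell
work=$(mktemp -d)
touch "$work/new.txt"                    # modified just now
touch -t 200001010000 "$work/old.txt"    # mtime forced to 1 Jan 2000

# -mtime -3 selects files modified less than 3 days ago,
# -mtime +3 selects files modified more than 3 days ago.
recent=$(cd "$work" && find . -type f -mtime -3 -print)
stale=$(cd "$work" && find . -type f -mtime +3 -print)

echo "modified in the last 3 days:   $recent"
echo "modified more than 3 days ago: $stale"
```

Only new.txt appears in the first list and only old.txt in the second; the first list is the one that would be piped into cpio for the backup.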

On most SVR4 systems, you can tell cpio to recognize the end of tape and request another tape by using the -O option. The -O option can be used only with the -o option. After the -O option comes a filename, usually the raw tape drive device name. When cpio reaches the end of the tape, you'll be given a chance to swap tapes. If you also use the -M option, you can specify a message for cpio to use when it prompts you to swap tapes. You can even embed a %d in the message to make cpio show a tape cartridge sequence number when it prompts you.

If you can fit a single partition or filesystem on a single tape, tape swapping won't be an issue. If your backup medium is 8mm or 4mm tape, swapping will rarely present a problem. If you use cartridge or 9-track tape, you may be limited to a few hundred megabytes or less on a single tape. The -O and -M options help you surmount this problem. However, if you are using cron to run cpio, you won't be around to answer the prompt and swap the tape, so consider your tape's capacity when planning your backup. In particular, if the cpio provided with your system does not include the -O option, you must plan your backup strategy carefully with the tape capacity in mind.

Restoring Files with cpio

Restoring files from a cpio tape is similar to restoring files from a tar tape. To restore the entire backup tape, use the following command:

% cpio -ivdB </dev/rst0

To restore a specific file from the tape:

% cpio -ivdB ./usr/lib/sendmail.cf \
</dev/rst0

As with tar, the name specified on the cpio command line must exactly match the name as it was recorded during the backup. Some versions of cpio are smart enough to know that ./filename is the same as filename and do not include the ./ as part of the name on the tape. You can always list the contents of the tape to see just how cpio wrote the filenames.

Incremental Backups with cpio

Incremental backups allow you to back up the files that have changed since the previous backup without having to write the unchanged files onto the tape again. If you are unfamiliar with the idea of an incremental backup, you should read the sidebar "Backups for the Beginner," which accompanies the "Dump and Restore" article elsewhere in this issue, before continuing.

To select files that have been updated since the last backup, you must maintain a "bookmark" file whose modification time records when you last did a backup. Then use the -newer argument of the find command to find files that have been modified since that time. The commands:

% find ./usr -xdev -newer \
/etc/backup-time -print | \
cpio -oB >/dev/rst0
% touch /etc/backup-time

will write all files in the /usr partition that have been modified since the last time the /etc/backup-time file was modified (probably the last time the touch command was used to update its modification time). When these same two commands are executed the following day, the result is that all files modified in that 24-hour period are listed by the find command and written to tape by cpio.
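The selection step can be checked in isolation before any tape is involved. This sketch assumes a scratch directory in place of the real /usr and /etc, and uses fixed, made-up timestamps so that no waiting between steps is needed:

```shell
work=$(mktemp -d)
mkdir -p "$work/usr"
stamp=$work/backup-time      # stand-in for /etc/backup-time

# Fixed timestamps (assumed values for illustration only).
touch -t 202001010000 "$work/usr/unchanged.txt"   # before the last backup
touch -t 202001020000 "$stamp"                    # time of the last backup
touch -t 202001030000 "$work/usr/changed.txt"     # after the last backup

# Only files newer than the bookmark are selected. This list is what
# would be piped into "cpio -oB >/dev/rst0". -type f is added here
# only to keep directories out of the sample output.
changed=$(cd "$work" && find ./usr -type f -newer "$stamp" -print)
echo "$changed"

# After a successful backup, move the bookmark forward.
touch "$stamp"
```

Note that a file modified while the backup is still running, but before the touch, would be newer than the old bookmark yet older than the new one. One possible variation, not part of the commands above, is to touch a temporary file before starting find and rename it to the bookmark name once the backup succeeds.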

Remote Backups with cpio

Often you must back up a machine that has no local tape drive attached. Rarely will you have a tape drive on every server. To perform a remote backup across the network, you can divert the output of cpio into another pipe and send it into a command on a remote machine that will write the tape.

The dd(1) command is used to write data directly to a device, such as a tape drive. The syntax of dd is about as painful as the syntax of the find command, although if you've been around long enough to have used CDC's NOS operating system, it might feel somewhat familiar. Like find, however, dd can be a valuable command to know and it is usually worth suffering with the syntax in order to master its use.

To modify the previous backup commands to write the backup to a remote tape drive, change the commands to be:

% find ./usr -xdev -newer /etc/backup-time -print | \
cpio -o | rsh tapehost dd of=/dev/rst0 obs=5120
% touch /etc/backup-time

The find command is the same as before. The cpio command has been changed so that the output is not written directly to a tape drive. The B argument also has been removed because cpio doesn't need to worry about blocking the data on the tape since the dd command will take care of this. Then you pipe the output of cpio into an rsh(1) command. rsh (remote shell) executes the command provided on the specified remote host, assuming you have the proper authorization to execute commands remotely. (See "UNIX Security in a Networked Environment," by Laurie Sefton in the January/February issue of Sys Admin, Volume 2, Number 1.)

With the dd command, the of= is used to specify the output file for the command and obs= is used to specify the output block size. This is the way to control the blocksize on the tape.
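The reblocking that dd performs can be observed locally. In this sketch a 12000-byte file of zeros (an arbitrary size chosen for illustration) stands in for the stream coming from cpio, and an ordinary file named tape stands in for the tape device:

```shell
LC_ALL=C; export LC_ALL      # keep dd's messages in English
work=$(mktemp -d)

# 12000 bytes of test data standing in for cpio's output.
dd if=/dev/zero of="$work/data" bs=12000 count=1 2>/dev/null

# Read in dd's default 512-byte blocks, write in 5120-byte blocks.
# dd reports its record counts on standard error.
report=$(dd if="$work/data" of="$work/tape" obs=5120 2>&1)
echo "$report"
```

The report includes "23+1 records in" and "2+1 records out". Each count is the number of full blocks plus the number of partial blocks: 12000 bytes is 23 full 512-byte input blocks and one short one, written as two full 5120-byte output blocks and one short one.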

To list the contents of this tape, reverse the whole process:

% rsh tapehost dd if=/dev/rst0 ibs=5120 | cpio -ivt

Note that the B is left off of the cpio command here. The dd command uses if= and ibs= for input file and input block size, respectively.

To restore the entire contents of the tape:

% rsh tapehost dd if=/dev/rst0 ibs=5120 | cpio -ivd

To restore the single sendmail.cf file from the tape:

% rsh tapehost dd if=/dev/rst0 ibs=5120 | cpio -ivd \
./usr/lib/sendmail.cf

Putting It All Together

You can take all of these different ideas and combine them into a script that simulates Berkeley-style dump levels using find and cpio. The script also allows you to write the backup onto a remote tape drive. The cbackup (Listing 1), crestore (Listing 2), and clist (Listing 3) scripts provide a starting point for developing your own customized backup/restore procedures using commands that are available on just about every UNIX system.
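The central idea behind dump levels is that a backup at level N includes the files modified since the most recent backup at any lower level. The sketch below shows one way that selection could be written with stamp files and find -newer. It is not taken from the listings: the function names, the level.N stamp naming, and the timestamps are all assumptions made for illustration.

```shell
# newest_lower_stamp LEVEL DIR
# Print the most recently modified DIR/level.N stamp with N < LEVEL,
# or an empty line if there is none (meaning: do a full backup).
newest_lower_stamp() {
    best=
    i=0
    while [ "$i" -lt "$1" ]; do
        s=$2/level.$i
        if [ -f "$s" ]; then
            if [ -z "$best" ] ||
               [ -n "$(find "$s" -newer "$best" -print)" ]; then
                best=$s
            fi
        fi
        i=$((i + 1))
    done
    printf '%s\n' "$best"
}

# files_for_level LEVEL DIR TREE
# Print the files under TREE that a backup at LEVEL would include.
files_for_level() {
    ref=$(newest_lower_stamp "$1" "$2")
    if [ -n "$ref" ]; then
        find "$3" -type f -newer "$ref" -print
    else
        find "$3" -type f -print
    fi
}

# Demonstration with made-up timestamps.
work=$(mktemp -d)
mkdir -p "$work/stamps" "$work/data"
touch -t 202001010000 "$work/data/a.txt"
touch -t 202001020000 "$work/stamps/level.0"   # level 0 backup taken
touch -t 202001030000 "$work/data/b.txt"
touch -t 202001040000 "$work/stamps/level.1"   # level 1 backup taken
touch -t 202001050000 "$work/data/c.txt"

level0=$(cd "$work" && files_for_level 0 stamps ./data | sort)
level1=$(cd "$work" && files_for_level 1 stamps ./data | sort)
level2=$(cd "$work" && files_for_level 2 stamps ./data | sort)
echo "level 1 would write: $level1"
echo "level 2 would write: $level2"
```

With these timestamps a level 0 run selects all three files, a level 1 run selects b.txt and c.txt, and a level 2 run selects only c.txt. A complete script would pipe the selected names into cpio and touch the stamp for its own level after a successful run.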

For example, we can back up the files in the /tmp filesystem and write the backup to a tape drive on another machine. This is probably not a real-world case, but it is short enough to make the point.

% cbackup 0 /tmp batman:/dev/rst0
Remote backup to /dev/rst0 on batman
tmp
tmp/file1.txt
tmp/level.1
tmp/mail.txt
tmp/level.2
1 blocks
1+0 records in
0+1 records out
%

The records in and records out messages come from the dd command, which is used because this is a remote backup. These messages will not be present when the backup is to a local tape drive.

This command created a level 0 backup. If a later backup is run with a level of 1 (probably the following day), then only files modified since this backup would be written to the tape.

To list the contents of the backup tape:

% clist batman:/dev/rst0
0+1 records in
40777  root        0  Jun 26 16:53:28 1993  tmp
100644 ables       0  Jun 26 16:51:06 1993  tmp/file1.txt
100644 ables       0  Jun 26 16:56:40 1993  tmp/level.1
100644 ables       0  Jun 26 16:53:08 1993  tmp/mail.txt
100644 ables       0  Jun 26 16:53:28 1993  tmp/level.2
1 blocks
1+0 records out
%

To restore a single file from the tape:

% crestore batman:/dev/rst0 . tmp/mail.txt
0+1 records in
1+0 records out
tmp/mail.txt
1 blocks
%

This creates a directory called tmp in the current directory, since we specified . as the destination directory, with the mail.txt file in it. To restore the entire backup tape into the real /tmp directory, you would use:

% crestore batman:/dev/rst0 /
1+0 records in
0+1 records out
tmp
tmp/file1.txt
tmp/level.1
tmp/mail.txt
tmp/level.2
1 blocks
%

The files are named with relative path names, but the crestore script has been told to use / as the restore directory, so the names are correct.

Reference

Thomas, Rebecca, and Rik Farrow. UNIX Administration Guide for System V. Englewood Cliffs, NJ: Prentice Hall, 1989. ISBN 0-13-942889-5.

There are many more options and variations to the commands used in this article. If you must do your backups with standard UNIX utilities other than dump and restore, Chapter 3 of this book explores these and other issues in greater detail and is an excellent reference on the subject of backups.

About the Author

R. King Ables has been a UNIX user since 1980 and has been managing systems or developing system management and networking tools since 1983. He is currently doing system and network management development for HaL Computer Systems in Austin, TX.