Cover V07, I11

Managing Multi-File System Backup Using dump/restore

Yufan Hu

As UNIX system administrators, many of us are still using our good old friends, dump and restore, to do backups and restores for our UNIX file systems. dump and restore are some of the oldest and most popular backup tools used on many UNIX systems, especially those derived from BSD.

Over the years, the combination of dump and restore has proven its reliability and simplicity. "Small is beautiful" has been the philosophy of programming since the very beginning of the UNIX system. dump and restore are small, reliable, and flexible. They can do backup jobs on their own, and they can also be integrated with other programs to complete more sophisticated jobs.

Root is Relative

On UNIX, the file system refers to the entire directory tree starting from the root, /. From a user's point of view, there is no concept of a partition or a disk. Everything is located within this giant directory.

For administrators, the view is a bit different. We still have the concept of disk and disk partitions. We put a disk or a partition into the overall file system by mounting the disk or partition onto a directory. After mounting, the contents of the root directory of the disk become the contents of the mounting directory. The overall file system is thus expanded without the users noticing the difference, except for the added directory tree branch. From the administrator's view, we call the directory tree in a single disk or disk partition a file system. We call the directory on which we mount the file system the mounting point of that file system.

The dump command is made to back up one file system at a time. To back up a file system, we simply use dump as in:

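dump 0bdsf 126 54000 13000 /dev/rst8 /home

Here, /home is the mounting point of the file system. The leading digit is the dump level, and the key letters b, d, s, and f take, in order, the blocking factor (126), the tape density (54000), the tape length in feet (13000), and the output device (/dev/rst8). It is also possible to specify the disk or partition device name for the file system instead of the mounting point, as in:

dump 0bdsf 126 54000 13000 /dev/rst8 /dev/sd2f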

I prefer using the mounting point to the device name, because the mounting point is more meaningful than the device name as far as the backup/restore process is concerned.

dump can also do incremental backups by specifying different dump levels, as in:

dump 5bdsf 126 54000 13000 /dev/rst8 /dev/sd2f

A simple daily incremental backup plan can be set up as follows:

Day        Backs up                 Command
Sunday     Full backup              dump 0bdsf 126 54000 13000 /dev/rst8 /home
Monday     Changes since Sunday     dump 9bdsf 126 54000 13000 /dev/rst8 /home
Tuesday    Changes since Sunday     dump 8bdsf 126 54000 13000 /dev/rst8 /home
Wednesday  Changes since Sunday     dump 7bdsf 126 54000 13000 /dev/rst8 /home
Thursday   Changes since Sunday     dump 6bdsf 126 54000 13000 /dev/rst8 /home
Friday     Changes since Sunday     dump 5bdsf 126 54000 13000 /dev/rst8 /home

A dump at a given level backs up every file changed since the most recent dump at a lower level. Because each weekday here uses a lower level than the day before, every weekday dump captures all changes since Sunday's level 0 dump, not only that day's changes.
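The schedule above can be driven by one script that picks the dump level from the day of the week. The following is only a minimal sketch: the helper name level_for_day is hypothetical, and the script prints the dump command instead of running it.

```shell
#!/bin/sh
# Map a day-of-week number (0 = Sunday ... 6 = Saturday, as printed by
# `date +%w`) to the dump level used in the schedule above.
# level_for_day is a hypothetical helper name, not part of dump.
level_for_day() {
    case "$1" in
        0) echo 0 ;;   # Sunday: full backup
        1) echo 9 ;;
        2) echo 8 ;;
        3) echo 7 ;;
        4) echo 6 ;;
        5) echo 5 ;;
        *) return 1 ;; # Saturday: nothing scheduled
    esac
}

# Dry run: print the dump command for today instead of executing it.
if level=$(level_for_day "$(date +%w)"); then
    echo "dump ${level}bdsf 126 54000 13000 /dev/rst8 /home"
else
    echo "no backup scheduled today"
fi
```

Keeping the day-to-level mapping in one function means the schedule can be changed in a single place.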

The companion command rdump can dump a file system onto a tape drive on a remote system. We specify a remote device name for the tape drive, as in:

rdump 5bdsf 126 54000 13000 back:/dev/rst8 /home

When dump finishes and closes a rewinding device such as /dev/rst8, the tape rewinds automatically.

Backing Up Multiple File Systems

Since dump handles only a single file system, we can back up only one file system, residing on one disk partition, each time we invoke dump. As a simple backup tool, dump always assumes that it is dumping one disk partition to one tape. In many cases, it is not practical to devote a whole tape to one file system. For example, we may have many small partitions, such as /, /usr, and /usr/local, each holding only a couple of hundred megabytes, but a relatively large tape of, say, 5 GB. When performing an incremental backup, we almost always have many file systems with only a few megabytes of data to be backed up. In these situations, we certainly want to dump all these file systems onto one tape. This saves not only tape media but also the effort of changing tapes.

By using a so-called "non-rewinding" device name for the same tape drive, we can fool dump into using only a segment of the tape for each file system. The following simple shell script dumps three file systems onto one tape, one after another.

#!/bin/sh

mt -f /dev/nrst8 rewind
dump 0bdsf 126 54000 13000 /dev/nrst8 /
dump 0bdsf 126 54000 13000 /dev/nrst8 /usr
dump 0bdsf 126 54000 13000 /dev/nrst8 /usr/local
mt -f /dev/nrst8 rewind

Since we are using the non-rewinding device, the tape is not automatically rewound after dump finishes the backup. When dump is started for the next file system, it simply writes to the media following the previous dump. This is what makes dumping multiple file systems onto one tape possible.
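The same sequence can be written as a loop that stops at the first failed dump, so that a failure does not leave later segments at unexpected positions. This is a sketch under assumptions: the function name dump_all and the RUN dry-run variable are hypothetical, and by default the script only prints the commands.

```shell
#!/bin/sh
# Dump several file systems back to back on one tape.
# RUN is a hypothetical dry-run hook: it defaults to "echo", so the script
# only prints the commands. Set RUN="" to execute them for real.
RUN=${RUN-echo}
TAPE=/dev/nrst8

dump_all() {
    $RUN mt -f "$TAPE" rewind || return 1
    for fs in "$@"; do
        # Stop at the first failure so later segments are not miscounted.
        $RUN dump 0bdsf 126 54000 13000 "$TAPE" "$fs" || return 1
    done
    $RUN mt -f "$TAPE" rewind
}

dump_all / /usr /usr/local
```

Run as shown, it prints one mt command, three dump commands, and a final mt command, in the order they would be executed.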

We can use the mt command to force the tape to rewind before the whole backup process starts and after it finishes. It is also possible to dump multiple file systems on different hosts onto one tape. For example:

#!/bin/sh

# Rewind the tape on the tape host before starting.
rsh back mt -f /dev/nrst8 rewind
# File systems on the local host.
rdump 0bdsf 126 54000 13000 back:/dev/nrst8 /
rdump 0bdsf 126 54000 13000 back:/dev/nrst8 /usr
rdump 0bdsf 126 54000 13000 back:/dev/nrst8 /usr/local
# File systems on other hosts, dumped by running rdump there via rsh.
rsh walnut rdump 0bdsf 126 54000 13000 back:/dev/nrst8 /
rsh peanut rdump 0bdsf 126 54000 13000 back:/dev/nrst8 /usr
rsh chesnut rdump 0bdsf 126 54000 13000 back:/dev/nrst8 /local
# Rewind the tape when all dumps are done.
rsh back mt -f /dev/nrst8 rewind

The above shell script backs up file systems located on different hosts to the tape drive on the host named "back".

Restoring from a dump Tape

Restoring files from a dump tape is also straightforward. Suppose /home has been dumped to a tape, and we want to restore the files in /home/yufan. We can do this with the restore command in interactive mode:

shell> restore if /dev/rst8
restore> add yufan
restore> extract

The above interaction will extract the directory yufan into the current directory. From here, we can copy the files to their destination.

The restore command is also simple. It has no idea of multiple file systems on one tape. It takes the given device as a tape and tries to read from it as one file system.

restore can also be fooled using the non-rewinding device, but it is a bit more difficult than with dump. Before we can use restore on a tape segment, we must make sure the beginning of the segment is under the tape head, because restore cannot read a dumped file system starting from the middle of a segment. We can do this by using the mt command to position the tape at the beginning of the segment. For example, if we want to restore files from /usr/local and the tape was created using the three-file-system shell script shown earlier, we can use the following commands to restore /usr/local/lib:

shell> mt -f /dev/nrst8 rewind
shell> mt -f /dev/nrst8 fsf 2
shell> restore if /dev/nrst8
restore> add lib
restore> extract

We used mt -f /dev/nrst8 fsf 2 to move the head forward past two EOF marks, thus skipping the segments for / and /usr. This leaves the head at the first block of the /usr/local segment.

Pitfalls

It is straightforward to dump multiple file systems onto a tape. We can simply write a shell script to call multiple dump commands, one after another, until all file systems are backed up or the tape is exhausted. We can also write the script so that it logs in to other machines on the local network and fires the dump command there to back up their file systems to the tape remotely. This way we are able to put more than a dozen file systems onto a tape. There is really no rule as to the maximum you can put onto a tape. It is certainly possible to put 10, 20, or even 30 file systems onto a big tape, especially when we are doing incremental backups.

But then how are we going to restore from any one of these segments? Is the file system we want on the tape? If so, in which segment? How many EOF marks do we need to skip before we reach the segment?

If we keep the shell script, we can always check it, count the number of dump commands before the target segment, and issue an mt command with the proper fsf count. This seems perfect until the day we need to change the script to add or delete file systems. The script then no longer matches the old tapes. Keeping backup copies of these scripts may be one solution, but we then need to know which script was used for which tape.

Even if a good matching record can be found for each tape, counting the number of segments each time we need to restore something is certainly not a pleasant job. This is a job meant to be done by the computer instead of a human. After years of suffering, I finally decided to give our old friends a little help so they can be a bit more friendly. The results are two Perl scripts, mdump and mrestore, which make the dump/restore job a bit easier.

mdump - Keep a Record of Multiple Dumps

As discussed above, it is vital to keep a record of what has been dumped to the tape and its exact location (in which segment) on the tape. The best place to store this record is on the tape itself. This way the record always correctly represents the tape, and the two will not be separated.

Since we can use the non-rewinding device to divide the tape into multiple segments for dump, we might as well allocate one such segment for the purpose of housekeeping. The first segment is the one to use, because it can be read before we move to any other segments.

mdump simply stores the list of all file systems, in the form of the mounting point directory names, in the order in which these file systems are stored on the tape. It prints the names as a single string, using ; as the delimiter, and pipes the result to the dd command to store the string in the first segment of the tape.

When we dump multiple file systems from multiple hosts, one problem we may encounter is name clashing. Mounting point /usr can exist on more than one host. And, if /usr on all these hosts is to be dumped onto the tape, how can we distinguish among them? Even if we only dump multiple file systems on the same host to a tape, we may still have the problem of determining which host these file systems belong to if we forgot to properly label and mark the tapes.

To prevent this lost identity problem, mdump does some name normalization before storing the names onto the tape. It uses the form of host:/directory to distinguish each dumped file system. This form of naming is also expected as arguments passed to mdump for any remote file systems that need to be dumped. If a local file system is given to mdump without the leading host part, the local host name is automatically added. This way we will not lose track of which segment belongs to which file system on which host, because all the information has been recorded on the tape itself.
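The naming and record format just described can be sketched in a few lines of shell. This is an illustration of the idea, not the author's Perl code: the helper names normalize_fs and build_index are hypothetical, and the dd invocation in the final comment is an assumption about how the record might be written.

```shell
#!/bin/sh
# Sketch of the housekeeping record (not the author's Perl implementation).
# normalize_fs and build_index are hypothetical helper names.

# Prefix a bare mounting point with the local host name.
# Names already in host:/directory form are left alone.
normalize_fs() {
    case "$1" in
        *:/*) echo "$1" ;;
        *)    echo "$(uname -n):$1" ;;
    esac
}

# Join the normalized names with ';' in command-line order.
build_index() {
    index=""
    for fs in "$@"; do
        name=$(normalize_fs "$fs")
        index="${index:+$index;}$name"
    done
    echo "$index"
}

build_index walnut:/ walnut:/usr peanut:/local
# prints: walnut:/;walnut:/usr;peanut:/local

# Writing the record as the first tape segment might then look like
# (assumed, not run here):
#   build_index "$@" | dd of=/dev/nrst8
```

Because the record is written through the non-rewinding device, the first real dump would start right after it, in the second segment.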

Apart from this simple housekeeping job, mdump also tries to automate some scripting work. It always rewinds the tape when started, to ensure that the housekeeping information is recorded at the beginning of the tape. mdump can eject the tape when finished if instructed to do so via a command-line option. It also decides whether to use a local dump or a remote dump via rsh, based on the host on which the file system resides.
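The local-versus-remote decision might look like the following sketch. It is not taken from mdump itself: the function name dump_command is hypothetical, it only prints the command it would run, and it assumes that a tape used from another host is named in host:/device form.

```shell
#!/bin/sh
# dump_command LEVEL TAPE HOST:/DIR
# Print the command that would dump one file system (dry run only).
# dump_command is a hypothetical helper, not part of mdump.
dump_command() {
    level=$1 tape=$2
    host=${3%%:*}    # part before the first ':'
    dir=${3#*:}      # part after the first ':'
    case "$tape" in
        *:*) prog=rdump ;;   # remote tape drive
        *)   prog=dump ;;    # local tape drive
    esac
    if [ "$host" = "$(uname -n)" ]; then
        # File system is on this host: run dump or rdump directly.
        echo "$prog ${level}bdsf 126 54000 13000 $tape $dir"
    else
        # File system is elsewhere: run rdump there via rsh.
        # Assumes the tape is named host:/device so the other host can reach it.
        echo "rsh $host rdump ${level}bdsf 126 54000 13000 $tape $dir"
    fi
}

dump_command 0 back:/dev/nrst8 walnut:/usr
```

On any host not itself named walnut, the last line prints an rsh command that runs rdump on walnut against the tape drive on back.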

The basic command-line syntax of mdump is as follows:

mdump [options] filesystem filesystem....

The following minimum options are available:

-l num - Indicates the dump level. num can be from 0 to 9, where 0 represents a full dump.

-f dev - Indicates the device name for the tape. It can take the form host:/dev/nrst8 to indicate a remote tape drive.

-u - Indicates that we want /etc/dumpdates to be updated for incremental backups.

-e - Instructs mdump to eject the tape when finished.

We can pass as many file systems to mdump as the tape is capable of holding. Each file system name can be a local directory name that represents the local mounting point of the file system, or can be in the form of host:/directory if it is a remote one. We can also put all these file system names into a plain text file (one line for each file system) and pass the file to mdump. mdump will read the file for the file system names to be dumped.

With mdump, the previous shell script used to dump multiple partitions can be reduced to a one-line command.

mdump -l 0 -f back:/dev/nrst8 peanut:/ peanut:/usr peanut:/local walnut:/ \
walnut:/usr walnut:/local

This one-liner fits nicely into a crontab for scheduled system backups. A sample crontab for backups using mdump follows. The entries are wrapped here for readability; most cron implementations require each entry to be on a single line.

50 23 * * 0 mdump -e -u -l 0 -f back:/dev/nrst8 peanut:/ peanut:/usr \
peanut:/local walnut:/ walnut:/usr walnut:/local
50 23 * * 1 mdump -e -u -l 9 -f back:/dev/nrst8 peanut:/ peanut:/usr \
peanut:/local walnut:/ walnut:/usr walnut:/local
50 23 * * 2 mdump -e -u -l 8 -f back:/dev/nrst8 peanut:/ peanut:/usr \
peanut:/local walnut:/ walnut:/usr walnut:/local
50 23 * * 3 mdump -e -u -l 7 -f back:/dev/nrst8 peanut:/ peanut:/usr \
peanut:/local walnut:/ walnut:/usr walnut:/local
50 23 * * 4 mdump -e -u -l 6 -f back:/dev/nrst8 peanut:/ peanut:/usr \
peanut:/local walnut:/ walnut:/usr walnut:/local
50 23 * * 5 mdump -e -u -l 5 -f back:/dev/nrst8 peanut:/ peanut:/usr \
peanut:/local walnut:/ walnut:/usr walnut:/local

Note that we use the -e option to automatically eject the tape, so that it will not be accidentally rewritten.

mdump will preserve the order in which these file systems appear on the command line or in the text file. Although this order is less important, as mrestore is capable of locating the segment automatically, we might want to put frequently accessed file systems toward the beginning of the tape to speed up the restoring process.

mrestore - Restoring from a Multi-File System dump Tape

After all the hard work done by mdump, restoring from a multi-file system dump tape becomes a straightforward job. For example, if we want to restore files or directories for /usr on walnut, the following command will bring us straight to the segment:

mrestore -f back:/dev/nrst8 walnut:/usr

From there, we can use the usual restore commands to interactively restore files or directories from the segment.

mrestore expects a file system name convention similar to mdump. When such a file system name is given, mrestore automatically consults the housekeeping information stored at the first segment on the tape and calculates the segment number of the file system. It then uses the mt command to position the tape at the beginning of that segment.
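The segment calculation can be illustrated with a short shell function. This is a sketch, not mrestore's actual code: the name segment_of is hypothetical, and the index string is given literally rather than read from a tape.

```shell
#!/bin/sh
# segment_of INDEX NAME
# Print how many file marks to skip (the fsf count) to reach NAME, given the
# ';'-separated index kept in the first tape segment. The index occupies the
# first segment, so the first file system is reached with fsf 1.
# The body runs in a subshell, so IFS and set -f do not leak to the caller.
segment_of() (
    set -f          # no glob expansion while splitting
    IFS=';'
    n=1
    for entry in $1; do
        if [ "$entry" = "$2" ]; then
            echo "$n"
            exit 0
        fi
        n=$((n + 1))
    done
    exit 1          # not on this tape
)

index='walnut:/;walnut:/usr;peanut:/local'
echo "mt -f /dev/nrst8 fsf $(segment_of "$index" walnut:/usr)"
# prints: mt -f /dev/nrst8 fsf 2
```

A non-zero exit status tells the caller that the requested file system is not on the tape, so no mt command should be issued.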

Unlike the native restore command, mrestore accepts more than one file system name. It sorts the names according to the order in which the corresponding file systems are stored on the tape, and then positions the tape at each segment in turn, so that we can restore from one file system after another without rewinding the tape for each one.
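Sorting the requested names into tape order amounts to walking the index once and reporting each requested entry as it is met. The sketch below shows only that ordering step. The function name order_by_tape is hypothetical, and it prints absolute segment numbers rather than moving a tape.

```shell
#!/bin/sh
# order_by_tape INDEX NAME...
# Print "segment name" for each requested NAME, in the order in which the
# file systems appear in the ';'-separated index.
# Runs in a subshell so IFS and set -f stay local to the function.
order_by_tape() (
    set -f
    index=$1
    shift
    IFS=';'
    n=1
    for entry in $index; do
        for want in "$@"; do
            [ "$entry" = "$want" ] && echo "$n $entry"
        done
        n=$((n + 1))
    done
    exit 0
)

order_by_tape 'walnut:/;walnut:/usr;peanut:/local' peanut:/local walnut:/
# prints:
# 1 walnut:/
# 3 peanut:/local
```

Visiting segments in ascending order means the tape only ever moves forward between restores.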

To identify the contents of a tape created by mdump, we can use mrestore to list the contents of the tape:

mrestore -l -f back:/dev/nrst8

Conclusion and Issues

mdump and mrestore are simple Perl scripts that wrap the native system dump and restore commands. They are intended to bring some convenience when performing multiple file system backups to a single tape cartridge.

Even though mdump does some housekeeping for the contents it stores, it cannot replace the organization of our backup archives. We still need to clearly mark and label tapes. mdump knows nothing beyond the contents of the tape it created. We need our own approach to organizing tape archives if we ever want to restore anything from them.

The current versions of mdump and mrestore do not deal with multi-volume dump archives very well. They are not intended for such multi-tape backups.

To do remote backups using dump, we need to use the rmt protocol and the rsh command. These require a properly configured .rhosts file, which may be a security concern.

About the Author

Dr. Yufan Hu started his system administration and software development career using UNIX in 1983 on a PDP-11/23. Since then, UNIX system and network administration have been part of his research and development career. He is currently in charge of networking and system-related activities for Regent Electronics Corp. He can be reached at: yufan@recmail.com.