Backups with Standard UNIX Commands
R. King Ables
One of the most important duties of any system administrator
is backing
up system data. If any disaster strikes the computer
and files are
lost, they can be restored from the backup media. Your
backup copy
is like an insurance policy for the data your organization
depends
upon.
The original designers of UNIX did not need a particularly
sophisticated
backup utility since the original UNIX machines were
mostly used for
research and experimentation, not for production as
they are today.
The Computer Research Group at the University of California
at Berkeley
made the most significant contribution to UNIX backup
utilities during
that time when they included the dump(1) command in
their distribution.
Because it came with the operating system and provided
most of the
capabilities people needed, dump became very popular.
It has
been included in many other versions of UNIX, although
sometimes not
named dump. In USL's System V Release 4, for instance,
it is
called ufsdump, because it works with the Berkeley filesystem,
known as ufs in SVR4. In IBM's AIX operating system,
the dump
program is called backup.
If you have a version of dump, you should read the article
entitled "Dump and Restore" elsewhere in this
issue. If your
version of UNIX does not come with some version of dump,
or
you use a type of filesystem that it cannot backup,
you must find
another method of doing backups. Several software companies
offer
products to fill this need, but if you have a small
shop run on a
shoestring budget, you may not be able to afford such
software. About
the only option left to you is to use one of the utilities
included
in just about every version of UNIX, tar(1) or cpio(1).
They may not be as fancy as commercial products or have
as many options,
but you will be able to backup your system, which, after
all, is the
major goal.
Backing up Files with tar
tar (standing for Tape ARchive) is really meant for
putting
a directory structure on a tape for transport to another
site. This
makes it good for doing a full backup, but not very
good for doing
incremental backups. While it is possible to use tar
to do
incremental dumps, the method is fairly convoluted and
can fail depending
on the number of files to be backed up and the size
of the shell's
command buffer. In general, don't use tar for incremental
dumps.
Use it only when you simply want to make a full copy
of a directory
hierarchy.
One advantage some versions of tar have is they can
sense end
of media on certain devices. For example, tar in AIX
recognizes
when the floppy device is full and prompts you for another
floppy.
Only a very small site or a home machine is a likely
candidate for
backing up onto floppy disks in the first place. But
if that describes
your system, tar may be more suitable.
One big disadvantage tar has when doing tape backup
is that
it exits when it encounters a bad spot on the tape,
when writing,
or worse, when reading. The data on the tape following
the bad spot
will be inaccessible. Programs have been written to
skip past this
spot and get other data from the tape, but it is better
not to ask
for the problem in the first place.
If you do select tar as your backup method, be sure
to specify
relative path names (such as ./etc) rather than absolute
pathnames
(like /etc). These pathnames are included in the tape
headers
and prevent you from restoring the files anywhere other
than the same
spot. In the case of a single file restore, you may
wish to read the
file off of the tape and compare it with the existing
version before
replacing it. Some versions of tar now allow you to
override
the absolute pathname with a relative pathname, but
not all do, and
if your version does not, you'll be stuck restoring
the file wherever
tar wants you to. [For example, SCO UNIX has a -A option
that strips leading slashes, making absolute paths into
relative paths,
but SVR4 doesn't offer such an option. --lsr]
To backup a filesystem, such as the /usr filesystem,
with tar,
you would become root and use a command such as:
% cd /
% tar -cvf /dev/rst0 ./usr
First, cd to the top of the directory tree so that everything
on the tape will be relative to the root. The three
arguments to the
tar command are:
c |
create a tar file |
v |
verbose (print out each filename as it is written to the tape) |
f |
specify the name of the tape device in the next argument |
The f argument is always followed by the name
of the tape device. After all the arguments, give the
name of the
directory to write onto the tape relative to the current
directory
rather than as an absolute pathname.
To later see what is contained on the tape, type:
% tar -tvf /dev/rst0
Rather than the c argument, use t to print
the tape's table of contents.
Restoring Files with tar
To restore the entire contents of the tape just written,
use
the following command:
% tar -xvf /dev/rst0
The x argument means extract the files from the
tape. In many cases you will not want to extract the
entire tape,
but only a single file from the tape. For example, let's
say someone
(not you, but another administrator!) accidentally deleted
the sendmail.cf
file from your system. This file can be found in different
places
in different versions of UNIX, but for this example,
say yours is
in /usr/lib/sendmail.cf. Since you wrote the tape with
relative pathnames, the file on the tape will be called
./usr/lib/sendmail.cf.
This is important because when you specify a filename,
it must match
the name of the file on the tape exactly. To restore
only this file,
use the command:
% tar -xvf /dev/rst0
./usr/lib/sendmail.cf
tar will search the tape for the specified filename
or filenames if more than one is listed on the command
line.
Perhaps you suspect sendmail.cf was changed, but you
cannot
be sure simply by looking at the date on the file. To
restore the
sendmail.cf file in a different directory so that it
can be
compared to the existing /usr/lib/sendmail.cf file,
you would
do this:
% cd /tmp
% tar -xvf /dev/rst0 ./usr/lib/sendmail.cf
This extracts the file from the tape and creates it
as
/tmp/usr/lib/sendmail.cf. You could then compare it
to /usr/lib/sendmail.cf
with diff(1) and replace it or delete it as you wish.
tar is adequate for making complete copies of directories
and
full backups of filesystems, but if you want to get
any more clever
than this by doing incremental backups, you probably
will want to
use cpio instead.
Backing up Files with cpio
When combined with the find(1) command, cpio (which
stands for CoPy Input and Output) provides a good mechanism
for doing
full and incremental backups of files. cpio has several
advantages
over tar:
cpio recovers from bad spots on the tape
cpio can backup "special" files in /dev
cpio can accept a list of files to backup rather
than working by directory hierarchy
cpio produces more portable files than tar for
transport across different versions of UNIX
cpio accepts a list of files on its standard input stream
and
dutifully writes them to the tape. The problem becomes,
how do you
generate a list of files? This is where the find(1)
command
comes in.
The find command searches a filesystem hierarchy starting
at
a specified directory and performs certain actions on
the files it
finds depending on command-line arguments. The syntax
of the find
command is unlike most other UNIX commands and may be
a bit difficult
to understand at first. Studying these examples as well
as the examples
in the man page, plus playing with the command on your
system, should
help.
In our case, we simply want to print a list of files
that fit a certain
criteria and pipe that list into cpio.
% cd /
% find ./usr -xdev -print
will print a list of files contained in the /usr
filesystem. The -xdev argument keeps the search list
from
spanning onto another physical device (i.e., another
filesystem mounted
underneath the /usr mount point, like /usr/local, which
might be on another disk). [Editors note: find -mount
is the
SVR4 syntax to do the same thing as find -xdef.] Notice
the
use of the relative pathname for /usr since cpio, like
tar, will write the filename on the tape as it is given.
The
backup tape should contain relative pathnames rather
than absolute
pathnames.
By combining this find command with the cpio command,
you can write each file in the /usr partition onto a
tape with:
% find ./usr -xdev -print | cpio -oB >/dev/rst0
The o argument to cpio puts cpio in "output
mode," which causes other options to behave in
specific ways.
The B argument instructs cpio to use a blocksize of
5120 bytes rather than the default 512. The larger size
is more efficient
for a tape.
To look at the contents of the tape, use the following
command:
% cpio -itvB </dev/rst0
The i argument puts cpio in "input
mode," the t argument (similar to tar's t argument)
prints out a table of contents of the tape, and the
v argument
makes it verbose. Without the v argument, only the names
would be
displayed. With the v, the listing appears in much the
same
format of an ls -l listing of a directory, with ownership,
file modes, and dates.
To backup only files that have been modified recently,
just tell find
to only list files that have been modified in the last
N days:
% find ./usr -xdev -mtime -3 -print | cpio -oB >/dev/rst0
will backup all files modified in the last 3 days (72
hours). The -mtime -3 argument means "less than
3 days."
An -mtime 3 argument means "exactly 3 days"
within
about 24 hours of the time find is executed. An -mtime
+3 argument
means "more than 3 days."
On most SVR-4 systems, you can tell cpio to recognize
the end
of tape and request another tape by using the -O option.
The
-O option can only be used with the -o option. After
the -O option comes a filename, usually the raw tape
drive
device name. When it reaches the end of tape, you'll
be given a chance
to swap tapes. If you also use the -M option, you can
specify
a message for cpio to use when it prompts you to swap
tapes.
You can even embed a %d in the message to make cpio
show you a tape cartridge sequence number when it prompts
you. If
you can fit a single partition or filesystem on a single
tape, tape
swapping won't be an issue. If your backup medium is
8mm or 4mm tape,
swapping will rarely present a problem. If you use cartridge
or 9-track
tape, you may be limited to a few hundred megabytes
or less for a
single tape. The -O and -M options help you surmount
this problem. However, if you are using cron to run
cpio,
you won't be around to answer the prompt and swap the
tape. Consider
your tape's capacity when planning your backup. In particular,
if
the cpio provided with your system does not include
the -O
option, you must plan your backup strategy carefully
with the tape
capacity in mind.
Restoring Files with cpio
Restoring files from a cpio tape is similar to restoring
files
from a tar tape. To restore the entire backup tape,
use the
following command:
% cpio -ivdB </dev/rst0
To restore a specific file from the tape:
% cpio -ivdB ./usr/lib/sendmail.cf \
</dev/rst0
Like tar, the name specified on the cpio command
must match the name as it was recorded during the backup
exactly.
Some versions of cpio are smart enough to know that
./filename
is the same as filename and do not include the ./ as
part of
the name on the tape. You can always list the contents
of the tape
in order to see just how cpio wrote the filenames.
Incremental Backups with cpio
Incremental backups allow you to backup the files that
have changed
since the previous backup without having to write the
unchanged files
onto the tape again. If you are unfamiliar with the
idea of an incremental
backup, you should read the sidebar Backups for the
Beginner,
which accompanies the "Dump and Restore" article
elsewhere
in this issue before continuing.
To select files that have been updated since the last
backup, you
must maintain a "bookmark" file which has
a creation date
of the last time you did a backup. Then use the -newer
argument
of the find command to find files that have been modified
since
that date. The commands:
% find ./usr -xdev -newer \
/etc/backup-time -print | \
cpio -oB >/dev/rst0
% touch /etc/backup-time
will write all files in the /usr partition that
have been modified since the last time the /etc/backup-time
file
was modified (probably the last time the touch command
was
used to update its modification time). When these same
two commands
are executed the following day, the result is that all
files modified
in that 24-hour period are listed by the find command
and written
to tape by cpio.
Remote Backups with cpio
Sometimes (often) you must backup a machine that has
no local tape
drive attached. Rarely will you have a tape drive on
every server.
To perform a remote backup across the network, you can
divert the
output of cpio into another pipe and send it into a
command
on a remote machine that will write the tape.
The dd(1) command is used to write data directly to
a device,
such as a tape drive. The syntax of dd is about as painful
as the syntax of the find command, although if you've
been
around long enough to have used CDC's NOS operating
system, it might
feel somewhat familiar. Like find, however, dd can be
a valuable command to know and it is usually worth suffering
with
the syntax in order to master its use.
To modify the previous backup commands to write the
backup to a remote
tape drive, change the commands to be:
% find ./usr -xdev -newer /etc/backup- time -print | \
cpio -o | rsh tapehost dd of=/dev/rst0 obs=5120
% touch /etc/backup-time
The find command is the same as before. The cpio
command has been changed so that the output is not written
directly
to a tape drive. The B argument also has been removed
because
cpio doesn't need to worry about blocking the data on
the tape
since the dd command will take care of this. Then you
pipe
the output of cpio into an rsh(1) command. rsh
(remote shell) executes the command provided on the
specified remote
host, assuming you have the proper authorization to
execute commands
remotely. (See "UNIX Security in a Networked Environment,"
by Laurie Sefton in the January/February issue of Sys
Admin,
Volume 2, Number 1.)
With the dd command, the of= is used to specify the
output file for the command and obs= is used to specify
the
output block size. This is the way to control the blocksize
on the
tape.
To list the contents of this tape, reverse the whole
process:
% rsh tapehost dd if=/dev/rst0 ibs=5120 | cpio -ivt
Note that the B is left off of the cpio
command here. The dd command uses if= and ibs=
for input file and input block size, respectively.
To restore the entire contents of the tape:
% rsh tapehost dd if=/dev/rst0 ibs=5120 | cpio -ivd
To restore the single sendmail.cf file from the
tape:
% rsh tapehost dd if=/dev/rst0 ibs=5120 | cpio -ivd \
./usr/lib/sendmail.cf
Putting It All Together
You can take all of these different ideas and combine
them into a
script that simulates Berkeley-style dump levels using
find
and cpio. The script also allows you to write the backup
onto
a remote tape drive. The cbackup (Listing 1), crestore
(Listing 2), and clist (Listing 3) scripts
provide a
starting
point for developing your own customized backup/restore
procedures
using commands that are available on just about every
UNIX system.
For example, we can backup files in the /tmp filesystem,
which
is probably not a real-world example, but it is short
enough to make
the point, and write the backup to a tape drive on another
machine.
% cbackup 0 /tmp batman:/dev/rst0
Remote backup to /dev/rst0 on batman
tmp
tmp/file1.txt
tmp/level.1
tmp/mail.txt
tmp/level.2
1 blocks
1+0 records in
0+1 records out
%
The records in and records out messages
come from the dd command, which is used because this
is a remote
backup. These messages will not be present when the
backup is to a
local tape drive.
This command created a level 0 backup. If a later backup
is run with
a level of 1 (probably the following day), then only
files modified
since this backup would be written to the tape.
To list the contents of the backup tape:
% clist batman:/dev/rst0
0+1 records in
40777 root 0 Jun 26 16:53:28 1993 tmp
100644 ables 0 Jun 26 16:51:06 1993 tmp/file1.txt
100644 ables 0 Jun 26 16:56:40 1993 tmp/level.1
100644 ables 0 Jun 26 16:53:08 1993 tmp/mail.txt
100644 ables 0 Jun 26 16:53:28 1993 tmp/level.2
1 blocks
1+0 records out
%
To restore a single file from the tape:
% crestore batman:/dev/rst0 . tmp/mail.txt
0+1 records in
1+0 records out
tmp/mail.txt
1 blocks
%
This creates a directory called tmp in the current
directory, since we specified . as the destination directory,
with the mail.txt file in it. To restore the entire
backup tape into
the real /tmp directory, you would use:
% crestore batman:/dev/rst0 /
1+0 records in
0+1 records out
tmp
tmp/file1.txt
tmp/level.1
tmp/mail.txt
tmp/level.2
1 blocks
%
The files are named with relative path names, but the
crestore script has been told to use / as the restore
directory,
so the names are correct.
Reference
Thomas, Rebecca, and Rik Farrow. X Administration
Guide for System V. Englewood Cliffs, NJ: Prentice Hall,
1989.
ISBN 0-13-942889-5.
There are many more options and variations to the commands
used in this article. If you must do your backups with
standard UNIX
utilities other than dump and restore, Chapter 3 of
this book explores
these and other issues in greater detail and is an excellent
reference
on the subject of backups.
About the Author
R. King Ables has been a UNIX user since 1980 and has
been managing systems
or developing system management and networking tools
since 1983. He is
currently doing system and network management development
for HaL Computer
Systems in Austin, TX.
|