A Flexible
System for Centralized Backup
Ed L. Cashin
Storage capacity is growing, and centralizing that capacity is
a current trend. Companies are buying storage
area networks (SANs) and tape robots. A plethora of software products
are available to help us back up diverse clients to centralized
storage, but you may already have all the software you need.
The Strategy
What do we need from backup systems? We need reliability, manageability,
and performance.
For our systems to be reliable, we must ensure that we will be
able to restore from the backups, even if we lose the backup software.
One way is to make sure that the backup data is in a "universal"
format. In the UNIX world, tar and dump are ubiquitous
tools and are excellent choices for storage formats.
If we have a lot of machines, all these tar and dump
backups might be cumbersome. To simplify backup management, we will
control all of the backups from a central location. The backups
will go to central storage over a TCP/IP network.
By centralizing control of the backups, using universal backup
formats, and sending data over a TCP/IP network, we eliminate the dependency between
data and the hardware on which it resides. The ability to restore
any host's backed-up data to any host with a network card is
a boon for the systems administrator, who may take advantage of
the extra flexibility to reallocate hardware resources.
This article describes a conceptual framework for building open
backup systems with readily available software. It also presents
a concrete example of a trivial, high-performance implementation
of the framework -- one client host sending backups to a server
host connected to a SAN.
The Framework
The backup system comprises four roles -- a scheduler, triggers,
senders, and receivers (a rough sketch of the resulting data flow
follows the list):
- Scheduler -- The schedule of which backups occur, and when,
is maintained in a central location. The scheduler
may be anything, even a real human being initiating backups. More
likely, it will be a crontab containing entries for scripts. The
scripts will contain a sequence of triggers.
- Triggers -- Triggers are found on the host where the scheduler
resides. When the scheduler calls a trigger, the trigger connects
to the backup client and starts the client's sender. See
Listing 1.
- Senders -- Senders run on the backup clients. When started
by a trigger on the scheduler host, a sender begins backing up
the client, thus sending all the data to a receiver on the storage
host. See Listing 2.
- Receivers -- A receiver script directs incoming backups
to the storage and records information about the incoming backup
on the host where the central storage resides. See Listing 3.
The Example
In this example, "meili" is a host with a lot of storage
capacity, perhaps through a SAN, mounted onto the /usr/local/backup
directory. Meili will also host the scheduler and triggers.
The example client is "nilda", although many similar
clients could exist. Assume that nilda is a Solaris machine with
mature ufsdump and ufsrestore utilities [1].
Both hosts have OpenSSH version 2.9p1 installed, which supports
version 2 of the ssh protocol. If you are using a different
ssh implementation or version, you may have to use the protocol
version it supports and the equivalents of the methods shown in
this example.
The secure shell gives us a way to start jobs on remote hosts and connect them over the network.
In this example, the ssh 2 protocol provides a level of security
that would not be possible with the more traditional (and naive)
rsh tools. The ssh protocol provides sophisticated
protection against eavesdropping, password sniffing, and connection
hijacking. For more information on the ssh protocols, refer
to the RFC by Tatu Ylonen in the ssh source distribution.
Normally with ssh, you must enter a password. Even when
using public key authentication instead of regular password-based
authentication, you still usually enter a passphrase at some point
to decrypt your private key. But if we want to use ssh for
unattended jobs (such as nightly backups), we have to do something
new -- task-based authentication, where each public/private
key pair is associated with a specific inter-host task. That's
shown in the example below:
Root's crontab on meili implements the scheduler:
[root@meili ecashin]# crontab -l
# -------- start the triggers at 6PM every night
0 18 * * 0 /usr/local/sbin/triggers-sun | /usr/lib/sendmail -t
0 18 * * 1 /usr/local/sbin/triggers-mon | /usr/lib/sendmail -t
0 18 * * 2 /usr/local/sbin/triggers-tue | /usr/lib/sendmail -t
0 18 * * 3 /usr/local/sbin/triggers-wed | /usr/lib/sendmail -t
0 18 * * 4 /usr/local/sbin/triggers-thu | /usr/lib/sendmail -t
0 18 * * 5 /usr/local/sbin/triggers-fri | /usr/lib/sendmail -t
0 18 * * 6 /usr/local/sbin/triggers-sat | /usr/lib/sendmail -t
Each trigger script runs the full backups first and then any incrementals
that have to be done (see Listing 1).
Notice that the ssh command uses a special identity file
called trigger-full. This is a key generated on the host
where the scheduler lives. We then configure each client's
authorized_keys2 files so that the clients recognize this
key as the trigger for a full backup.
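Listing 1 contains the real trigger scripts; as a rough sketch of
the idea (using the client and key names from this example), a
trigger for one client amounts to little more than:

#!/bin/sh
# sketch of a full-backup trigger for one client -- see Listing 1 for the real script
# The command named here is informational only; the forced command in
# nilda's authorized_keys2 file runs /usr/local/sbin/sender-full regardless,
# and the sender's output comes back over this connection to be mailed
# by the crontab entry.
ssh -i ~/.ssh/trigger-full root@nilda /usr/local/sbin/sender-full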
Here is how the trigger key is generated on the scheduler host:
[root@meili /root]# ssh-keygen -t dsa -C trigger-full -f ~/.ssh/trigger-full
Generating public/private dsa key pair.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/trigger-full.
Your public key has been saved in /root/.ssh/trigger-full.pub.
The key fingerprint is:
d4:6c:56:82:68:86:3b:3b:d1:ef:5f:ff:90:d2:16:0e trigger-full
We hit enter when asked for a passphrase. Normally, this would be
a big mistake, since any user with the ability to read the key files
could execute arbitrary commands on any remote host that recognized
the keys. However, it is important that a passphrase is not required
to use the key, because we want these jobs to run while we relax on
the beach. The difference is that we make sure that each passphrase-free
key can run only one specific command on the remote host, and that
association is configured on the remote host, in the authorized_keys2
file. We'll do that configuration shortly.
This method is more secure than keeping a password in a local
file, like an expect script. If the password in the local
file is discovered by an unauthorized person, then that person gains
full access to the remote host. With task-based authentication,
however, all the person gains is the ability to run the remote half
of a backup job -- that's not fantastic, but it's not
disastrous either, and it's not really what most attackers
are looking for.
The following command installs the new key in the client's
authorized_keys2 file, adding the from and command
options before the key:
[root@meili /root]# printf \
'from="meili.beulah.net",command="/usr/local/sbin/sender-full" %s\n' \
"`cat ~/.ssh/trigger-full.pub`" | ssh nilda 'cat >> ~/.ssh/authorized_keys2'
root@nilda.beulah.net's password:
Then the authorized_keys2 file on the client, nilda, looks
like this:
[root@nilda /root]# cat ~/.ssh/authorized_keys2
from="meili.beulah.net",command="/usr/local/sbin/sender-full" ssh-dss
AAAAB3NzaC1kc3MAAACBAJfVa80YDiqBC1ZQtuaXVajnx0eYdQEmhR5Y1FidIVqE7QtNsJ/
8wAcogWOLXf5cJdPZbOHfpAjcEwDV8LS+T408Oa+F1eoe8hYBNCLBAYmXCqoPZYT5Jvj/
N9i6HW+QawE8x+rcydbZNuyRszvFqEWP2IRHMF89hFuLyZF5PlUzAAAAFQDSmq2rOMSZQNv
VwcuNzvbDF4SDewAAAIBxplH60BqfytambTUuzFESjtxuift5h31zdZcLoMxdQiG3VQnI
RbixpOSoTrzYadScxjmIfiRTxFgrd1px40XA1hrCodJvddI8wRD2QnuqamDys9xv3zAYl
+VlazlMgFy6KLIaHcFlTCxOuTauFbZayqgR38NB5CmVyrxDasgJYAAAAIAiSbH2VLtWh
+LE8arD0cq1DG6eJwsv9vvAwErgqK5GGiUO5l7rTRsphbS0vW4We8rk620zSFX3Aolds
QkUoatuJzsd6jryV3o11Ps8hHWWBhe/+2ryzTm2polCioyTZC4X/jRPcoTqz5i+Dz5
JuZOH63c4Yqv+Q5SUG9xEZLnlAw== trigger-full
Make sure that the authorized_keys2 file is writable only by its owner,
or ssh will not trust it. Nilda is now ready to recognize full
backup triggers from meili (the host on which the scheduler resides).
The sshd manpage describes the options that are available
for the authorized_keys files. We use from to limit
the hosts that can use the key. We then use the command option
to limit the use of this key to a specific command. There are other
options available to further limit functionality.
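For example, the same line from nilda's authorized_keys2 file could
be locked down further with options such as no-pty, no-port-forwarding,
no-X11-forwarding, and no-agent-forwarding, all described in the sshd
manpage (the key material is elided here):

from="meili.beulah.net",command="/usr/local/sbin/sender-full",no-pty,no-port-forwarding,no-X11-forwarding,no-agent-forwarding ssh-dss AAAA... trigger-full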
By using these special options in the client's authorized_keys2
file, we are setting up task-specific authentication. This kind
of authentication is much more secure than the old .rhosts
file way of getting unattended machines to trust one another, but
it has much of the same convenience. Make sure you keep the keys
in places where only root can read them so that no one can trigger
spurious backups on the clients or cause other problems.
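On the scheduler host, for example, something like the following keeps
the trigger key and its directory out of reach of other users:

[root@meili /root]# chmod 700 ~/.ssh
[root@meili /root]# chmod 600 ~/.ssh/trigger-full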
The storage host, to which the client will send the backup, must
recognize the clients when they have backups to send, so it's
necessary to generate a key for "senders". The method
is similar to the trigger-key generation performed earlier on the
scheduler host, again with an empty passphrase:
[root@nilda ecashin]# ssh-keygen -t dsa -C sender -f ~/.ssh/sender
Generating public/private dsa key pair.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/sender.
Your public key has been saved in /root/.ssh/sender.pub.
The key fingerprint is:
5b:85:4d:77:5d:2c:68:d1:fb:68:86:25:a6:9b:e4:6f sender
Installing the public key on the storage server should look familiar:
[root@nilda ecashin]# printf \
'from="nilda.beulah.net",command="/usr/local/sbin/sshbkp-receiver" %s\n' \
"`cat ~/.ssh/sender.pub`" | ssh meili 'cat >> ~/.ssh/authorized_keys2'
root@meili's password:
As long as the permissions on the authorized_keys2 file are
0640 or something similarly restrictive, the storage host, meili,
will now run the receiver script every time it recognizes this sender
key.
See Listing 2 for the sender script on nilda, the client.
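Listing 2 contains the actual script; the following is only a sketch
of its general shape. The device, mount point, and naming scheme come
from this example, but the detail of passing the archive name on the
ssh command line is an assumption of mine, not necessarily how
Listing 2 does it:

#!/bin/sh
# rough sketch of a full-backup sender -- see Listing 2 for the real script
exec 2>&1          # fold diagnostics (e.g., the DUMP: lines) into the mailed output

STORAGE=meili
DEV=/dev/sda1                      # device to dump (from this example)
MNT=boot                           # short name of its mount point
STAMP=`date +%Y%m%d-%H%M%S`
NAME="`hostname`dump-$MNT-$STAMP.gz"

echo "To: ecashin@users.sf.net"
echo "Subject: `hostname` sender-full"
echo "------------ starting backup on `date`"
echo "------------ storage host: $STORAGE"

# level 0 dump to stdout, compressed, shipped to the receiver on the
# storage host; the archive name rides along as the ssh command line
ufsdump 0f - $DEV | gzip | ssh -i ~/.ssh/sender $STORAGE "$NAME"

echo "------------ finishing backup on `date`"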
The sender on nilda sends the backup to the receiver on meili
(see Listing 3).
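The receiver's job is mostly to catch standard input and drop it into
the backup directory under the right name. Here is a minimal sketch,
assuming (as in the sender sketch above) that the name arrives as the
original ssh command line, which sshd exposes in the
SSH_ORIGINAL_COMMAND environment variable when a forced command is
configured:

#!/bin/sh
# rough sketch of /usr/local/sbin/sshbkp-receiver -- see Listing 3 for the real script
BACKUPDIR=/usr/local/backup

# basename strips any path components a hostile client might sneak
# into the requested archive name
NAME=`basename "$SSH_ORIGINAL_COMMAND"`

cat > "$BACKUPDIR/$NAME"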
Running the scripts manually helps to show whether things are
ready to go:
[root@nilda ecashin]# /usr/local/sbin/sender-full
To: ecashin@users.sf.net
Subject: nilda sender-full
------------ /usr/local/sbin/sender-full
------------ starting backup on Sat May 26 18:38:17 EDT 2001
------------ storage host: meili
------------ backing up device: /dev/sda1 mount point: /boot
DUMP: Date of this level 0 dump: Sat May 26 18:38:17 2001
DUMP: Dumping /dev/sda1 (/boot) to standard output
DUMP: Label: none
DUMP: mapping (Pass I) [regular files]
DUMP: mapping (Pass II) [directories]
DUMP: estimated 6711 tape blocks.
DUMP: Volume 1 started with block 1 at: Sat May 26 18:38:17 2001
DUMP: dumping (Pass III) [directories]
DUMP: dumping (Pass IV) [regular files]
DUMP: Volume 1 completed at: Sat May 26 18:38:25 2001
DUMP: Volume 1 6700 tape blocks (6.54MB)
DUMP: Volume 1 took 0:00:08
DUMP: Volume 1 transfer rate: 837 kB/s
DUMP: 6700 tape blocks (6.54MB)
DUMP: finished in 8 seconds, throughput 837 kBytes/sec
DUMP: Date of this level 0 dump: Sat May 26 18:38:17 2001
DUMP: Date this dump completed: Sat May 26 18:38:25 2001
DUMP: Average transfer rate: 837 kB/s
DUMP: DUMP IS DONE
------------ finishing backup on Sat May 26 18:38:25 EDT 2001
Running /usr/local/sbin/triggers-sat from the host on which
the scheduler resides should produce the same output. You can use
ssh's -v option and your system logs to help diagnose
problems if it doesn't work the first time.
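For instance, running a trigger's ssh command by hand with the verbose
flag prints debugging output about each stage of the connection and
authentication:

[root@meili /root]# ssh -v -i ~/.ssh/trigger-full nilda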
Check the storage host to verify that the dumps are there:
[root@meili /root]# ls /usr/local/backup/
nildadump-boot-20010526-183817.gz
If the commands work without prompting you for a passphrase, then
your cronjobs will have no trouble running the backups without you.
The rest is tuning and filling out the system -- the hard parts
are over.
Restoring is quick and should be easy [2]:
[root@nilda test-restore]# ssh meili 'cat \
/usr/local/backup/nildadump-boot-20010526-183817.gz' | zcat | \
ufsrestore fx - kernel.h
set owner/mode for '.'? [yn] n
[root@nilda test-restore]# diff kernel.h /boot/kernel.h
(no differences encountered)
Once you understand how the system works, it is easy to change it
to optimize performance and suit your needs. For example, your storage
server may have a tape autoloader. In that case, your sshbkp-receiver
script would do well to use logger to record each backup received
(as well as tape ids and tape positions, if available) in the system
logs. A tape is a serial device without filenames, and this information
will help you locate backups quickly when it's time to perform
a restore.
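As a hedged example, such a log line inside sshbkp-receiver might look
like the following; NAME, TAPE_ID, and TAPE_POS are placeholders for
values the script would have to work out itself:

# record the backup in the system logs (variables here are placeholders)
logger -t sshbkp-receiver "stored $NAME on tape $TAPE_ID at position $TAPE_POS"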
Another easy modification would be to use tar instead of
dump. This system is composed of "social software",
tools that have no interest in making life difficult for competing
software. Because the tools are standard UNIX tools that get along
well together, you will find the system is highly adaptable to your
specific needs.
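For instance, the dump pipeline in a sender could be swapped for a tar
pipeline along these lines (a sketch only; the path and naming scheme
are illustrative):

# tar-based variant of the sender's pipeline (sketch)
tar cf - /boot | gzip | \
    ssh -i ~/.ssh/sender meili "`hostname`tar-boot-`date +%Y%m%d-%H%M%S`.tar.gz"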
[1] My test machine, nilda, is really a Linux machine. dump
and restore work on Linux, but I haven't found them
to be as reliable as their Solaris counterparts, ufsdump
and ufsrestore. In the past, the dump on Linux systems
could not properly handle multi-volume dumps. I generally use tar
on Linux systems; YMMV.
If you decide to go with dump on Linux, use the most current
version. I had better luck with restores after upgrading to the
latest version. See the Web site for the project at Sourceforge:
http://dump.sourceforge.net/
[2] OpenSSH version 2.9p1 (and possibly ssh2) has a problem when used
as a filter: it fails to act on the PIPE signal. If your ssh
has this problem, you will notice that restores look like they are
hanging. That's because ufsrestore or restore
will exit and yet ssh will ignore the subsequent PIPE signal
and keep running.
The OpenSSH maintainers have been notified of the problem and
are discussing the issue. Meanwhile, you can use ps to check
whether the restore process has exited.
Ed Cashin is a UNIX systems administrator at the University
of Georgia, working mostly with Linux and Solaris. Formerly a professional
programmer, Ed still enjoys programming in C, Perl, and even Objective-C.
His favorite things include TeX (Don Knuth's typesetting program)
and FreeBSD's amazing "ports" system for source-based
software maintenance.