Cover V10, I09


A Flexible System for Centralized Backup

Ed L. Cashin

Storage capacity keeps growing, and it is increasingly being centralized. Companies are buying storage area networks (SANs) and tape robots. A plethora of software products is available to help us back up diverse clients to centralized storage, but you may already have all the software you need.

The Strategy

What do we need from backup systems? We need reliability, manageability, and performance.

For our systems to be reliable, we must ensure that we will be able to restore from the backups, even if we lose the backup software. One way is to make sure that the backup data is in a "universal" format. In the UNIX world, tar and dump are ubiquitous tools and are excellent choices for storage formats.

If we have a lot of machines, all these tar and dump backups might be cumbersome. To simplify backup management, we will control all of the backups from a central location. The backups will go to central storage over a TCP/IP network.

By centralizing control of the backups and by using universal backup formats and a TCP/IP network, we eliminate the dependency between data and the hardware on which it resides. The ability to restore any host's backed-up data to any host with a network card is a boon for the systems administrator, who may take advantage of the extra flexibility to reallocate hardware resources.

This article describes a conceptual framework for building open backup systems with readily available software. It also presents a concrete example of a trivial, high-performance implementation of the framework -- one client host sending backups to a server host connected to a SAN.

The Framework

The backup system comprises four roles: a scheduler, triggers, senders, and receivers.

  • Scheduler -- The schedule of what backups occur and when they occur is maintained in a central location. The scheduler may be anything, even a real human being initiating backups. More likely, it will be a crontab containing entries for scripts. The scripts will contain a sequence of triggers.
  • Triggers -- Triggers are found on the host where the scheduler resides. When the scheduler calls a trigger, the trigger connects to the backup client and starts the client's sender. See Listing 1.
  • Senders -- Senders run on the backup clients. When started by a trigger on the scheduler host, a sender begins backing up the client, thus sending all the data to a receiver on the storage host. See Listing 2.
  • Receivers -- A receiver script directs incoming backups to the storage and records information about the incoming backup on the host where the central storage resides. See Listing 3.

The Example

In this example, "meili" is a host with a lot of storage capacity, perhaps through a SAN, mounted onto the /usr/local/backup directory. Meili will also host the scheduler and triggers.

The example client is "nilda", although many similar clients could exist. Assume that nilda is a Solaris machine with mature ufsdump and ufsrestore utilities [1].

Both hosts have OpenSSH version 2.9p1 installed, which supports version 2 of the ssh protocol. If you are using a different version of ssh, you may have to use that version's ssh protocol or one of the equivalents of the methods in this example.

The secure shell is a way of connecting jobs over the network. In this example, the ssh 2 protocol provides a level of security that would not be possible with the more traditional (and naive) rsh tools. The ssh protocol provides sophisticated protection against eavesdropping, password sniffing, and connection hijacking. For more information on the ssh protocols, refer to the RFC by Tatu Ylonen in the ssh source distribution.

Normally with ssh, you must enter a password. Even when using public key authentication instead of regular password-based authentication, you still usually enter a passphrase at some point to decrypt your private key. But if we want to use ssh for unattended jobs (such as nightly backups), we have to do something new -- task-based authentication, where each public/private key pair is associated with a specific inter-host task. The example below shows how.

Root's crontab on meili implements the scheduler:

[root@meili ecashin]# crontab -l
# -------- start the triggers at 6PM every night
0 18 * * 0 /usr/local/sbin/triggers-sun | /usr/lib/sendmail -t
0 18 * * 1 /usr/local/sbin/triggers-mon | /usr/lib/sendmail -t
0 18 * * 2 /usr/local/sbin/triggers-tue | /usr/lib/sendmail -t
0 18 * * 3 /usr/local/sbin/triggers-wed | /usr/lib/sendmail -t
0 18 * * 4 /usr/local/sbin/triggers-thu | /usr/lib/sendmail -t
0 18 * * 5 /usr/local/sbin/triggers-fri | /usr/lib/sendmail -t
0 18 * * 6 /usr/local/sbin/triggers-sat | /usr/lib/sendmail -t

Each trigger script runs the full backups first and then any incrementals that have to be done (see Listing 1).

Notice that the ssh command uses a special identity file called trigger-full. This is a key generated on the host where the scheduler lives. We then configure each client's authorized_keys2 file so that the client recognizes this key as the trigger for a full backup.
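As a rough illustration of what a trigger does (this is a sketch, not the author's Listing 1), the loop below contacts each client with the trigger-full identity. The function name run_full_triggers, the SSH and KEY variables, and the client list are assumptions introduced here. No remote command is named on the ssh line because the client's authorized_keys2 entry decides what runs.

```shell
#!/bin/sh
# Hypothetical trigger sketch; not the article's Listing 1.
# SSH can be pointed at a stub command to preview what would run.
SSH=${SSH:-ssh}
KEY=${KEY:-$HOME/.ssh/trigger-full}

run_full_triggers() {
    # Each argument is a client host name (for example, nilda).
    for client in "$@"; do
        # BatchMode makes ssh fail instead of prompting, which suits cron.
        # </dev/null keeps ssh from consuming the caller's standard input.
        "$SSH" -o BatchMode=yes -i "$KEY" "$client" </dev/null ||
            echo "trigger failed for $client" >&2
    done
}
```

A script such as triggers-sat could then end with a call like run_full_triggers nilda, with one argument per client.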

Here is how the trigger key is generated on the scheduler host:

[root@meili /root]# ssh-keygen -t dsa -C trigger-full -f ~/.ssh/trigger-full
Generating public/private dsa key pair.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/trigger-full.
Your public key has been saved in /root/.ssh/
The key fingerprint is:
d4:6c:56:82:68:86:3b:3b:d1:ef:5f:ff:90:d2:16:0e trigger-full

We hit enter when asked for a passphrase. Normally, this would be a big mistake, since any user able to read the key files could execute arbitrary commands on any remote host that recognized the keys. However, it is important that no passphrase is required to use the key, because we want these jobs to run while we relax on the beach. The difference is that we make sure each passphrase-free key can run only one specific command on the remote host. That association is configured on the remote host, in the authorized_keys2 file. We'll do that configuration shortly.

This method is more secure than keeping a password in a local file, as an expect script would. If the password in the local file is discovered by an unauthorized person, then that person gains full access to the remote host. With task-based authentication, however, all the person gains is the ability to run the remote half of a backup job -- that's not fantastic, but it's not disastrous either, and it's not really what most attackers are looking for.

The following command installs the new key in the client's authorized_keys2 file, adding the from and command options before the key:

[root@meili /root]# printf \
  'from="",command="/usr/local/sbin/sender-full" \
  %s\n' "'cat ~/.ssh/'" | ssh nilda 'cat >> \
  ~/.ssh/authorized_keys2'
's password:

Then the authorized_keys2 file on the client, nilda, looks like this:

[root@nilda /root]# cat ~/.ssh/authorized_keys2
from="",command="/usr/local/sbin/sender-full" ssh-dss 
JuZOH63c4Yqv+Q5SUG9xEZLnlAw== trigger-full

Make sure that the authorized keys file is only writable by the owner or ssh will not trust it. Nilda is now ready to recognize full backup triggers from meili (the host on which the scheduler resides).
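The permission requirement can be checked mechanically. The sketch below uses POSIX find to test whether a path is writable by group or others; the function name key_file_is_safe is an assumption made for illustration.

```shell
#!/bin/sh
# Succeeds (exit 0) when the path is NOT writable by group or others.
key_file_is_safe() {
    path=$1
    [ -e "$path" ] || return 2
    # find prints the path only if the group- or other-write bit is set;
    # -prune stops it from descending when the path is a directory.
    loose=$(find "$path" -prune \( -perm -020 -o -perm -002 \) -print)
    [ -z "$loose" ]
}
```

For example, key_file_is_safe ~/.ssh/authorized_keys2 || chmod go-w ~/.ssh/authorized_keys2 tightens the file only when needed. The same test can be applied to the ~/.ssh directory itself.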

The sshd manpage describes the options that are available for the authorized_keys files. We use from to limit the hosts that can use the key. We then use the command option to limit the use of this key to a specific command. There are other options available to further limit functionality.

By using these special options in the client's authorized_keys2 file, we are setting up task-specific authentication. This kind of authentication is much more secure than the old .rhosts file way of getting unattended machines to trust one another, but it has much of the same convenience. Make sure you keep the keys in places where only root can read them so that no one can trigger spurious backups on the clients or cause other problems.
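To make the shape of such an entry concrete, here is a small sketch that assembles one authorized_keys line from its three parts. The function name restricted_entry and the host name in the usage note are hypothetical; only the from and command options themselves come from the sshd manpage.

```shell
#!/bin/sh
# Print one authorized_keys entry that ties a public key to a single
# forced command and a single source host pattern.
restricted_entry() {
    from_host=$1     # host pattern allowed to use the key
    forced_cmd=$2    # the only command the key may run
    pubkey_file=$3   # path to the public key file
    printf 'from="%s",command="%s" %s\n' \
        "$from_host" "$forced_cmd" "$(cat "$pubkey_file")"
}
```

The output can be appended on the client, for instance by piping restricted_entry scheduler.example.com /usr/local/sbin/sender-full followed by the public key path into ssh nilda 'cat >> ~/.ssh/authorized_keys2'. Building the line in one place keeps the options and the key from drifting apart when there are many clients.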

The storage host, to which the client will send the backup, must recognize the clients when they have backups to send, so it's necessary to generate a key for "senders" on each client. The method is similar to the key generation performed earlier for the trigger, again using an empty passphrase:

[root@nilda ecashin]# ssh-keygen -t dsa -C sender -f ~/.ssh/sender
Generating public/private dsa key pair.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/sender.
Your public key has been saved in /root/.ssh/
The key fingerprint is:
5b:85:4d:77:5d:2c:68:d1:fb:68:86:25:a6:9b:e4:6f sender

Installing the public key on the storage server should look familiar:

[root@nilda ecashin]# printf 'from="",command="/usr/local/sbin/sshbkp-receiver" \
  %s\n' "'cat ~/.ssh/'" | ssh meili 'cat >> \
root@meili's password:

As long as the permissions on the authorized_keys2 file are 0640 or something similarly restrictive, the storage host, meili, will now run the receiver script every time it recognizes this sender key.

See Listing 2 for the sender script on nilda, the client.

The sender on nilda sends the backup to the receiver on meili (see Listing 3).
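For readers without the listings at hand, the following is a minimal sketch of what a receiver might look like; it is not the author's Listing 3. It assumes the stream arrives already compressed, and that the sender passes a short label as its ssh command, which current OpenSSH exposes to a forced command in the SSH_ORIGINAL_COMMAND environment variable. The function name receive_backup and the file naming scheme are assumptions.

```shell
#!/bin/sh
# Hypothetical receiver sketch; not the article's Listing 3.
# Reads a backup stream on stdin and stores it under BACKUP_DIR.
BACKUP_DIR=${BACKUP_DIR:-/usr/local/backup}

receive_backup() {
    # Treat the command requested by the client as a label only, and
    # strip every character that is not a letter, digit, dot, _ or -.
    label=$(printf '%s' "${SSH_ORIGINAL_COMMAND:-unlabeled}" |
        tr -cd 'A-Za-z0-9._-')
    [ -n "$label" ] || label=unlabeled
    out="$BACKUP_DIR/$label-$(date +%Y%m%d-%H%M%S).gz"
    umask 077                  # backups should not be world-readable
    cat > "$out" || return 1   # copy the incoming stream to storage
    echo "$out"                # report where the backup landed
}
```

Because the label is never executed and cannot contain a slash, a stolen sender key can add files under the backup directory but cannot choose an arbitrary path or command.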

Running the scripts manually helps to show whether things are ready to go:

[root@nilda ecashin]# /usr/local/sbin/sender-full
Subject: nilda sender-full

------------ /usr/local/sbin/sender-full
------------ starting backup on Sat May 26 18:38:17 EDT 2001
------------ storage host: meili
------------ backing up device: /dev/sda1 mount point: /boot
DUMP: Date of this level 0 dump: Sat May 26 18:38:17 2001
DUMP: Dumping /dev/sda1 (/boot) to standard output
DUMP: Label: none
DUMP: mapping (Pass I) [regular files]
DUMP: mapping (Pass II) [directories]
DUMP: estimated 6711 tape blocks.
DUMP: Volume 1 started with block 1 at: Sat May 26 18:38:17 2001
DUMP: dumping (Pass III) [directories]
DUMP: dumping (Pass IV) [regular files]
DUMP: Volume 1 completed at: Sat May 26 18:38:25 2001
DUMP: Volume 1 6700 tape blocks (6.54MB)
DUMP: Volume 1 took 0:00:08
DUMP: Volume 1 transfer rate: 837 kB/s
DUMP: 6700 tape blocks (6.54MB)
DUMP: finished in 8 seconds, throughput 837 kBytes/sec
DUMP: Date of this level 0 dump: Sat May 26 18:38:17 2001
DUMP: Date this dump completed:  Sat May 26 18:38:25 2001
DUMP: Average transfer rate: 837 kB/s
------------ finishing backup on  Sat May 26 18:38:25 EDT 2001

Running /usr/local/sbin/triggers-sat from the host on which the scheduler resides should produce the same output. You can use ssh's -v option and your system logs to help diagnose problems if it doesn't work the first time.

Check the storage host to verify that the dumps are there:

[root@meili /root]# ls /usr/local/backup/

If the commands work without prompting you for a passphrase, then your cron jobs will have no trouble running the backups without you. The rest is tuning and filling out the system -- the hard parts are over.
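Once cron is running the backups, the manual ls check can be automated. This sketch lists files under a directory that were modified within the last day and fails when there are none; the function name recent_backups is an assumption, and the one-day window matches the nightly schedule used in this example.

```shell
#!/bin/sh
# Print backup files modified within the last 24 hours.
# Returns non-zero when no such file exists.
recent_backups() {
    dir=$1
    found=$(find "$dir" -type f -mtime -1 -print)
    [ -n "$found" ] || return 1
    printf '%s\n' "$found"
}
```

A line such as recent_backups /usr/local/backup >/dev/null || echo "no fresh dumps on meili" could be added to a morning cron job so that a silent failure does not go unnoticed.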

Restoring is quick and should be easy [2]:

[root@nilda test-restore]# ssh meili 'cat \
  /usr/local/backup/nildadump-boot-20010526-183817.gz' | zcat | \
  ufsrestore fx - kernel.h
set owner/mode for '.'? [yn] n
[root@nilda test-restore]# diff kernel.h /boot/kernel.h
(no differences encountered)

Once you understand how the system works, it is easy to change it to optimize performance and suit your needs. For example, your storage server may have a tape autoloader. In that case, your sshbkp-receiver script would do well to use logger to record each backup received (as well as tape IDs and tape positions, if available) in the system logs. A tape is a serial device without filenames, and this information will help you locate backups quickly when it's time to perform a restore.
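A sketch of that logging step follows. The message layout, the local0.info priority, and the function names are assumptions; the tape ID and position are plain arguments here, since how to obtain them depends on the tape hardware.

```shell
#!/bin/sh
# Build one log line per received backup.
# Arguments: client, label, tape ID, tape position (placeholders for
# whatever the autoloader reports; no real device is queried here).
backup_log_line() {
    printf 'received client=%s label=%s tape=%s position=%s' \
        "$1" "$2" "$3" "$4"
}

log_backup() {
    # -t tags the entry with the script name; -p sets facility.level.
    logger -t sshbkp-receiver -p local0.info "$(backup_log_line "$@")"
}
```

Keeping the fields in key=value form makes it straightforward to grep the system logs for a client or a tape when a restore is needed.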

Another easy modification would be to use tar instead of dump. This system is composed of "social software", tools that have no interest in making life difficult for competing software. Because the tools are standard UNIX tools that get along well together, you will find the system is highly adaptable to your specific needs.
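As a sketch of the tar variation (not the author's Listing 2), the sender below archives one directory, compresses it, and hands the stream to a transport function. The names send_tar_backup and transport, and the STORAGE_HOST and KEY variables, are assumptions. The ssh call names no remote command because the storage host's authorized_keys2 entry for the sender key selects the receiver.

```shell
#!/bin/sh
# Hypothetical tar-based sender sketch; not the article's Listing 2.
STORAGE_HOST=${STORAGE_HOST:-meili}
KEY=${KEY:-$HOME/.ssh/sender}

transport() {
    # Carry stdin to the storage host; BatchMode prevents prompts.
    ssh -o BatchMode=yes -i "$KEY" "$STORAGE_HOST"
}

send_tar_backup() {
    # $1 is the directory to back up; paths are stored relative to it.
    [ -d "$1" ] || return 1
    (cd "$1" && tar cf - .) | gzip -c | transport
}
```

A restore then swaps ufsrestore for tar, for example by piping the stored file through gzip -dc into tar xf - in an empty directory.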

[1] My test machine, nilda, is really a Linux machine. dump and restore work on Linux, but I haven't found them to be as reliable as their Solaris counterparts, ufsdump and ufsrestore. In the past, the dump on Linux systems could not properly handle multi-volume dumps. I generally use tar on Linux systems; YMMV.

If you decide to go with dump on Linux, use the most current version. I had better luck with restores after upgrading to the latest version. See the Web site for the project at Sourceforge:

[2] OpenSSH version 2.9p1, and possibly ssh2, have a problem when used as filters in a pipeline: they fail to respond to the PIPE signal. If your ssh has this problem, restores will appear to hang. That's because ufsrestore or restore will exit, and yet ssh will ignore the subsequent PIPE signal and keep running.

The OpenSSH maintainers have been notified of the problem and are discussing the issue. Meanwhile, you can use ps to check whether the restore process has exited.

Ed Cashin is a UNIX systems administrator at the University of Georgia, working mostly with Linux and Solaris. Formerly a professional programmer, Ed still enjoys programming in C, Perl, and even Objective-C. His favorite things include TeX (Don Knuth's typesetting program) and FreeBSD's amazing "ports" system for source-based software maintenance.