Backups for Large UNIX Installations
Victor Hazlewood and Chris Daharsh
In today's rapidly changing environments of large enterprise intranets and World Wide Web, news, email, DNS, file, and NFS servers, computer system backups may be the single most important, yet most often neglected, job of system administrators. For the system administrator working at a large UNIX installation, providing a comprehensive backup plan can be daunting. With services provided by hundreds of desktop systems, tens of mid-range servers, and a handful of large data and computational servers, developing a comprehensive, robust, and reliable backup plan in a large heterogeneous UNIX environment is quite a challenge.
The challenges involved in developing a large computing center's backup plan include (but are not limited to):
a large heterogeneous UNIX environment
large amounts of data (10-100 GBs per day and more)
large number of backup files per month
file system sizes up to 200 GBs
network and system backup scheduling issues
offsite backup repository
In this article, we present the design and implementation of backups for the UNIX systems at the San Diego Supercomputer Center (SDSC). Figure 1 shows an overview of the resources at SDSC. The backup system implemented at SDSC is called rbackup. rbackup uses standard UNIX utilities to schedule, monitor, and create file system backups for subsequent transfer to SDSC's archival system, which is based on IBM's High Performance Storage System (HPSS). For more information on HPSS see http://www.sdsc.edu/hpss.
Overview of the UNIX Environment
SDSC's computing environment includes more than 200 production UNIX systems. Approximately 190 of these systems are desktop systems running SunOS, Sun Solaris, Digital UNIX, or SGI Irix. These systems are managed centrally using a reference system and cfengine. A reference system allows a single copy of each UNIX operating system to be maintained and installed on new systems. cfengine is a language-based tool specifically designed for configuring and maintaining BSD and System V-like operating systems attached to a TCP/IP network. The combination of these two administration tools allows a small group of administrators to maintain a large installation of desktop systems.
Additionally, a large Auspex file server delivers NFS service of home directories, third-party and application software, and scratch space to these desktops. An added benefit of this model (reference system, cfengine, and centralized file server) is that it eliminates the need to back up local disks on the desktop systems. The reference system and cfengine can restore a damaged desktop system by reinstalling it as new. This reduces the number of systems requiring backup to a more manageable number.
Ten of the 200 UNIX systems provide key services or are independent systems requiring backup of specific local disks. The Cray systems are examples of independent systems requiring local disk backups. These systems provide the following services: CRAY parallel compute server, CRAY vector compute server, Digital compute server, Sun Enterprise compute server, Auspex File server, reference system server, administration (DB) server, news server, mail server, and DNS server.
Due to the dynamic nature of the computing environment at SDSC, the backup strategy was designed to be flexible enough to handle a diverse set of heterogeneous systems as they arrive. As systems arrive, their backup requirements are determined by the number of file systems delivered and the size of those file systems. The mission of the new system, its networking capabilities, and the organization and sizes of its file systems strongly influence the new system's placement in the networking topology. SDSC already provides users with an archival system for long-term storage of datasets, and it made sense to leverage those resources for use as the backup repository. Other sites would have to develop a similar capability from existing or new systems.
System backups at a large UNIX installation like SDSC are often one of the single largest consumers of network bandwidth. With large file and space requirements, the design of the network between the servers (file, data, and compute) and the backup repository plays a critical part in the performance and reliability of the implementation.
As SDSC has grown, the local networking infrastructure has evolved to include 10 and 100 Mbit/s Ethernet segments, FDDI rings, FDDI-based Gigaring (switch-based FDDI), private data network segments, and High-Performance Parallel Interface (HIPPI) networks. As it relates to the High Performance Storage System (HPSS), each of the 10 data and compute servers employs the networking technologies that best suit the data transfer requirements expected from the individual server and the capabilities of the system. See Figure 2 for a graphical overview of the server networking topology. The Cray compute servers use HIPPI networks (with FDDI as fail-over) for the expected large datasets. The other compute servers and the large Auspex file server use FDDI networks for the large number of files with their small to mid-size requirements. The remaining servers have low volume and low space requirements and use 10 or 100 Mbit/s Ethernet networks, which are routed to the FDDI data network for delivery to HPSS. See Table 1 for amounts of data transferred from the server systems to HPSS via the rbackup system. (The Auspex could benefit from HIPPI capability to HPSS, but it is not available.)
Local requirement (HPSS)
SDSC has made a large investment in an archival system for the storage of user data for more than five years. In June 1997, SDSC installed HPSS as its production archival system. HPSS has delivered high degrees of scalability, flexibility, and data transfer capability. In testing, HPSS was able to accept and manage over 4 TB of data in 72 hours from several systems employing Ethernet (routed), FDDI, and HIPPI networking technologies. In production today, HPSS averages over 350 GB of data transferred per day, with peaks of greater than 1 TB per day.
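To put these figures in terms of sustained bandwidth, the following short calculation converts them to average transfer rates. It is only a back-of-the-envelope sketch: it assumes decimal units (1 TB = 10^12 bytes), which the text does not specify, and the function and variable names are illustrative.

```python
# Back-of-the-envelope sustained rates implied by the figures in the text.
# Assumption: decimal units (1 TB = 10**12 bytes, 1 MB = 10**6 bytes).

TB = 10**12
MB = 10**6


def sustained_mb_per_s(total_bytes: float, hours: float) -> float:
    """Average transfer rate in MB/s for total_bytes moved over `hours`."""
    return total_bytes / (hours * 3600) / MB


test_rate = sustained_mb_per_s(4 * TB, 72)      # 4 TB accepted in 72 hours
peak_day_rate = sustained_mb_per_s(1 * TB, 24)  # a 1 TB production day

print(f"4 TB in 72 h ~ {test_rate:.1f} MB/s")      # ~ 15.4 MB/s
print(f"1 TB in 24 h ~ {peak_day_rate:.1f} MB/s")  # ~ 11.6 MB/s
```

Both rates are averages over the whole period; actual transfers are bursty, so the network segments feeding HPSS must handle considerably more than these values at peak.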
SDSC's implementation of HPSS makes extensive use of class of service, a feature designed to categorize devices and files based on differing characteristics. These characteristics include device speed and file size. For example, the automatic class of service feature directs large files to high-speed devices and small files to slower devices. At SDSC, the class of service is chosen automatically, but can be overridden by the user. rbackup sets the class of service (setcos 20) for file transfers to HPSS.
HPSS uses client/server technologies to transfer datasets from client systems to the HPSS server. The interface most commonly used by users and the method employed by the rbackup system is the pftp, or parallel ftp, program. Users and the rbackup scripts use pftp in the very same way that ftp is used to transfer files between UNIX systems.
Any backup plan developed for SDSC must be implemented around HPSS. Other options exist, but would require significant investment in additional hardware and software to implement. No commercially available backup software currently exists that would automate the backup process to HPSS from the heterogeneous systems at SDSC. Therefore, SDSC developed the tools to implement a backup system around HPSS. The tools include:
a Perl script called rbackup.all, installed on each system and run by cron
a mail filter to process log messages delivered by mail
HTML and cgi-bin files used to monitor the backups
It was highly desirable to develop the backup implementation using standard UNIX utilities for portability. With the dynamic nature of SDSC, new systems arrive and are expected to be placed in production as soon as possible. Because it uses the standard UNIX utilities dump and restore (and their vendor-specific relatives) within a Perl script run from cron, the rbackup system can be easily ported to any UNIX system. Restores from backups made with the rbackup system can be performed with a single command line, such as pftp get backupfile | restore -rf, or with automated scripts run by the operators.
Efficient use of the individual computer systems, the network resources available to those systems, and HPSS resources can be obtained with rbackup through careful scheduling. Scheduling with rbackup has two parts: the time of backup initiation for each system and the number of concurrent processes on each system. Each can be tuned to deliver the shortest overall backup time for each system.
An overall backup schedule was developed and is monitored regularly to reduce, as much as possible, the contention for network and HPSS resources between systems. Table 1 shows the rbackup start times for each system and the average length of backup time for the period July 20-26. These times were summarized from the detailed information provided by the rbackup Web pages (described later).
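The kind of contention check described here can be sketched in a few lines. The example below is not part of rbackup; it is a minimal illustration that takes a start time and an average duration per system and reports which backup windows overlap. The system names, times, and durations are hypothetical and are not the values from Table 1.

```python
from datetime import datetime, timedelta
from itertools import combinations


def window(start_hhmm: str, minutes: int):
    """Return (start, end) datetimes for a backup on an arbitrary reference day."""
    start = datetime.strptime(start_hhmm, "%H:%M")
    return start, start + timedelta(minutes=minutes)


def overlaps(schedule: dict) -> list:
    """List pairs of systems whose backup windows overlap, ordered by name.

    Limitation: all start times are placed on the same reference day, so a
    window that runs past midnight is not compared against early-morning
    starts on the following day.
    """
    windows = {name: window(*spec) for name, spec in schedule.items()}
    conflicts = []
    for a, b in combinations(sorted(windows), 2):
        (a_start, a_end), (b_start, b_end) = windows[a], windows[b]
        # Two intervals overlap when each starts before the other ends.
        if a_start < b_end and b_start < a_end:
            conflicts.append((a, b))
    return conflicts


# Hypothetical start times and durations in minutes.
schedule = {
    "news": ("00:30", 45),
    "mail": ("01:00", 30),
    "dns": ("02:00", 15),
}
conflicts = overlaps(schedule)
print(conflicts)  # [('mail', 'news')]
```

An overlap is not necessarily a problem; two small servers on separate network segments may run concurrently without harm. The list is a starting point for deciding which start times are worth moving.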
To demonstrate the volume of data transferred from each server, statistics were compiled for backups transferred to HPSS for the period July 20-26. The statistics are presented in Table 1 and give a good indication of the volume of data delivered by rbackup to HPSS. This information can provide clues to network infrastructure weaknesses. For example, the statistics show a large amount of data originating from the Auspex NFS file server. The volume of data from this server is greater than the backup volume from the Cray vector compute server, so the Auspex could benefit from a HIPPI network connection to the HPSS system. Unfortunately, at this time Auspex does not have a HIPPI adapter available for its large file server line. As another example, Table 1 showed that the Cray Parallel Compute Server often conflicted with four of the 10 backups that start and end between 0:00 and 5:00. We found that the Cray Parallel Compute Server was backing up nearly all of its file systems twice. First, the installation of the rbackup system had not been completed properly, which led to the unnecessary backup of all file systems (scratch areas are normally not backed up). Second, the Cray Parallel Compute Server was used as an HPSS testbed, and all critical file systems were sent daily to HPSS by another script (not rbackup.all) to place a load on HPSS before it became a production system. Once these two problems were addressed, the conflicts were significantly reduced.
For the individual systems, rbackup has a unique feature that allows the administrator to select the number of concurrent dump processes to run (see the variable max_children in the rbackup.all script). By adjusting this value, the administrator can match the number of simultaneous dump processes to what the individual system can handle efficiently. With the help of network performance tools such as ax_perfmon and ax_perfhist, which are available on Auspex file servers, one can tune max_children to maximize the data throughput from the target system to HPSS. See Figure 3 for example charts from ax_perfmon from July 20-26.
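The idea behind max_children can be illustrated with a small sketch. This is not the rbackup.all code, which is a Perl script; it is a Python illustration that uses a thread pool to cap the number of dumps in flight. The function run_dumps, the stand-in fake_dump, and the file system list are all hypothetical; a real version would start the platform's dump command and the transfer to HPSS in place of the sleep.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_dumps(file_systems, max_children, dump_one):
    """Run dump_one(fs) for each file system, at most max_children at a time.

    Returns a dict mapping each file system to dump_one's return value.
    """
    with ThreadPoolExecutor(max_workers=max_children) as pool:
        results = pool.map(dump_one, file_systems)
        return dict(zip(file_systems, results))


# Stand-in for a real dump-and-transfer step: it only sleeps and records
# how many "dumps" were running at the same moment.
_lock = threading.Lock()
_running = 0
peak_running = 0


def fake_dump(fs):
    global _running, peak_running
    with _lock:
        _running += 1
        peak_running = max(peak_running, _running)
    time.sleep(0.05)
    with _lock:
        _running -= 1
    return 0  # pretend exit status


file_systems = ["/", "/usr", "/var", "/home", "/export"]  # hypothetical
statuses = run_dumps(file_systems, max_children=2, dump_one=fake_dump)
print(statuses, "peak concurrency:", peak_running)
```

Raising max_children helps only until the disks, CPU, or network link of the system being backed up saturate, which is why the value is tuned per system against measured throughput.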
The rbackup.all script sends status information to a mail alias that includes a list of recipients and a pipe to a mail filter. The Web-based pages shown in Figures 4, 5, and 6 are generated by the combination of the mail filter and cgi-bin scripts, and show start times, end times, bytes transferred, and number of failed processes for backups initiated on individual systems. This information can be used to monitor backup failures and backup cycle time, and to guide the adjustment of start times for overlapping or conflicting backups.
Information available from the rbackup Web pages includes: the top-level form (Figure 4), an overall summary (not shown), the most recent level 0, 3, and 6 backups for individual systems (Figure 5), and individual system reports for all backups initiated by rbackup since inception (Figure 6).
The rbackup implementation comprises the rbackup.all perl5 script and a local configuration file (localinfo.pl) run by cron, a mail filter, an HTML script, and a cgi-bin script (see Listings 1-3 on the Sys Admin Web site: http://www.samag.com). If the rbackup.all script fails on any file system, an operator can restart the backups by using the rbackup.rerun script (not shown). Future enhancements of the rbackup scripts include: the combination of the rbackup.all and rbackup.rerun scripts, reporting via the Web to the file system level (currently reporting at the system level), and the conversion to a centralized scheduling and execution implementation.
The mail filter is implemented by having rbackup.all send email to the backupcheckr mail alias. The alias should point to the mail filter from the aliases file as follows:
backupcheckr: "|/backups/bin/rbackup.filter", backupmgrs
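With an alias like this, the mail system hands each status message to the filter on standard input. The sketch below shows the general shape of such a filter. It is not the rbackup.filter script, which is written in Perl, and the body format (one "key: value" pair per line) and field names are assumptions made for illustration; the actual message format produced by rbackup.all is not shown here.

```python
import email


def parse_status(raw_message: str) -> dict:
    """Extract key/value status fields from the body of a mail message.

    Assumes a hypothetical body format of one 'key: value' pair per line.
    Lines without a colon are ignored.
    """
    msg = email.message_from_string(raw_message)
    body = msg.get_payload()  # plain string for a non-multipart message
    fields = {}
    for line in body.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip().lower()] = value.strip()
    return fields


# Hypothetical status message as it might arrive on standard input.
sample = """\
From: root@host.example
To: backupcheckr
Subject: rbackup status

host: mailhost
level: 0
start: 00:30
bytes: 123456789
failed: 0
"""
record = parse_status(sample)
print(record)
```

In a real filter the message would come from sys.stdin.read() rather than a sample string, and the parsed record would be appended to whatever store the cgi-bin scripts read when building the Web pages.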
The rbackup package is reproduced below.
rbackup.all (rbackup.all Perl script)
Mail filter for backupcheckr alias (rbackup mail filter Perl script)
rbackup.html (rbackup.html source code)
filter.cgi (cgi-bin source)
rbackup provides SDSC with solutions to all but one of the backup challenges listed at the start of this article. Every challenge except the offsite backup repository is being addressed by the rbackup system. The offsite backup repository will be addressed by the upcoming development and implementation of tape families in the HPSS system. Offsite backups notwithstanding, rbackup provides a portable backup solution with scheduling and monitoring features that make performing backups on a diverse set of UNIX systems an easy task.
This work was funded in part by the National Science Foundation Cooperative Agreement ASC-8902825. All brand and product names are trademarks or registered trademarks of their respective holders.
About the Authors
Victor Hazlewood is the lead of the HPC Systems group at the San Diego Supercomputer Center (SDSC) and has over 10 years of experience administering UNIX systems.
Chris Daharsh is the lead of the Server Systems group at SDSC and has been working with UNIX for over 5 years.