Designing for Disaster
Chad Barker and Mark Frisbey
In today's competitive world, the time to recover from a disaster is of paramount importance. Downtime converts directly to lost revenue. Too much lost revenue, and the recovery process may be futile if the organization's financial resiliency is exceeded by the loss. Thus, it is essential that computing environments be designed with disaster recovery in mind. The extra effort invested in solid system design can make the difference between the system administration group's having a bad day, and the entire company's employees looking for new jobs. This article explores such a disaster-minded design effort. Although our specifics relate to an HP-UX environment, the basic issues apply to any OS.
Downtime can involve three components: 1) replacement of hardware, 2) reconstruction of the disk layout, and 3) restoration of the data. The speed of data recovery is a function of the speed of the recovery system and the size of the files. Filesystems with many small files take longer to recover than filesystems with large files. The time to recover also depends on whether the recovery system is local or remote. If it is remote, the amount of network traffic will affect restoration time significantly. The time to get from the hardware replacement to the restoration of data is determined by the complexity of the I/O subsystem layout. The old adage "time is money" becomes a reality.
Cheaper and larger disk drives coupled with logical volumes and multiple I/O channels have increased the intricacy of machines dramatically. Decisions are made to optimize the I/O performance with the possible disadvantage of increasing the time to restore the data. Administrators have typically used either SAM or the command-line interface. SAM on earlier versions of the OS did not provide the flexibility some machines needed. It is also very clumsy on an ASCII terminal. The command-line syntax could be lengthy, complex, and extremely susceptible to typos (some of us were not hired for our typing skills).
The machine on which our disaster recovery programs were developed has 52 disks, 7 volume groups, 31 logical volumes, and 6 I/O interfaces. We built the machine using the command-line syntax, which took about four hours. When we tested the programs, that time was reduced to about 20 minutes. One factor that affected the time was whether we used hfs or vxfs filesystems. vxfs filesystems build much more quickly than hfs.
The driving force behind developing a simpler approach was the increased complexity of rebuilding such machines. Myriad features could have been incorporated into such a project, but we focused on a small subset to get started. We originally planned to include volume groups that contained parts of the OS, but allowing the user to remove a volume group that contained a portion of the OS could create a new set of problems, so we decided not to pursue that avenue. We thought that tying into a backup system would be a great addition. If we could automate both the rebuilding of the volume groups and the restore, it would reduce the recovery time and ease the burden on the administrator, who would not have to babysit the entire process. With any tape media, there is always the possibility of a tape failure. When that happens, there is not much you can do except go back to another tape and possibly lose some data.
We refer here to the person using the program as the user. Note that if you are upgrading from HP-UX 10.X to 11.00 and use this program, you will have to edit the volumegroup.devs files, because the device addresses will be different on HP-UX 11.00. Some of the underlying factors that we considered and addressed in the design phase are listed below:
1. Simple-to-use interface so that an individual unfamiliar with the machine could rebuild it
2. Program must be able to run on an ASCII terminal
3. Allow the administrator to create configurations from a template
4. Allow the program to generate the configuration files on an existing machine
5. Support building and destruction at the volume group level
6. Allow the building and destruction of all volume groups (non-OS)
7. Allow for extra swap spaces
8. Allow for extra crash dump spaces (11.00 only)
9. Tie into a backup system (local and remote)
The directory structures associated with the program are:
data - This directory contains the configuration files for the machine.
temp - This directory holds temporary files.
files - This directory contains the base files and setup files.
logs - This is where the logs are stored.
There are many files associated with the program, and they are listed in the README file in the files directory. The main files associated with the program and their functions are:
AutoConf - Auto-generates the configuration files
bld.dump - Builds the extra crash dump spaces
bld.sys - Builds the logical volumes
build - Builds the volume groups
confVG - Used to create a new volume group or edit an existing one
get_dump.info - Gets the dump info from an existing machine
restmon - Monitors the restore of data returning the percent complete and the approximate time to complete the restore
restoreVG - Restores the volume group
rmlv - Removes logical volumes
rmvg - Removes the volume group
restmenu - The menu system for the restore
setup - The main program
swaprm - Removes extra swap spaces
The best way to describe what the program does is to walk through the options on the main menu. The submenus are not covered here because they are self-explanatory.
1. Configure the Volume Groups Information.
2. Auto configure information for volume groups already on the machine.
3. Build all the Volume Groups.
4. Build a selection of the Volume Groups.
5. Remove all the Volume Groups.
6. Remove a selection of the Volume Groups.
7. Restore data for logical volumes.
Option 1
This option allows the user to create a new volume group or modify an existing volume group's information. The program will allow the user to edit the basic templates for the volume group device file and the logical volume file (volumegroup.devs and volumegroup.info, respectively). If extra crash dump space is needed, the process is manual at this time. The user will have to copy the dump.info file from the files directory to the data directory and edit it. Note that if the crash dump space is in an existing volume group, care should be taken because the crash dump space must be contiguous. For this reason, if a crash dump space is configured, it is built first so that access to the defined disk can be turned off. Once the dump space has been added, access is turned back on. Listing 1 shows the format of the configuration files.
Option 2
This option allows the user to build the configuration files on an existing machine. All volume groups will be recorded. The configuration files for the OS volume groups are stored in the data/OS directory, and the rest are stored in the data directory. If there is a crash dump space, the program creates all the correct entries in the dump.info file (11.00 only). RAID systems using alternate links are handled by separating the primary devices into the volumegroup.devs file. The alternate links are recorded in the volumegroupname.alt file. This prevents the program from doing a pvcreate on the alternate link and causing the controller to fail over. If there are logical volumes defined that are not in the /etc/fstab file, they are stored in a file called lv.ex.
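The check that decides which logical volumes end up in lv.ex can be illustrated with a short sketch. This is not the authors' code; the function name, the sample device paths, and the fstab contents below are assumptions made for illustration only. It simply compares a list of logical volume device paths against the first field of each non-comment line in an fstab-style file.

```python
def volumes_missing_from_fstab(logical_volumes, fstab_text):
    """Return the logical volumes that have no entry in the fstab text."""
    listed = set()
    for line in fstab_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if line:
            listed.add(line.split()[0])       # first field is the device path
    return [lv for lv in logical_volumes if lv not in listed]


# Hypothetical sample data, not taken from a real machine.
fstab = """\
/dev/vg00/lvol3 / hfs defaults 0 1
/dev/vg01/lvdata /data vxfs delaylog 0 2
# /dev/vg01/lvold /old vxfs delaylog 0 2
"""
lvs = ["/dev/vg01/lvdata", "/dev/vg01/lvraw", "/dev/vg01/lvold"]

missing = volumes_missing_from_fstab(lvs, fstab)
print(missing)
```

In this sample, the raw volume and the commented-out volume are the ones reported, which matches the idea of recording logical volumes that exist but are never mounted through /etc/fstab.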
Options 3 and 4
These options allow the user to build all of the volume groups or a subset of volume groups. The rebuild is based on the configuration files for each of the volume groups. The entries for the /etc/fstab file are generated as each logical volume is built. If the user has edited the logical volume configuration file and has defined more space than exists, the program will not make the logical volumes that are beyond the disk boundaries of the volume group. If the info file has been edited, and the number of drives to stripe on is less than the number of disks in the volume group, the program informs the user of the error and brings up a vi session to correct the problem. The user would then remove the volume group and remake it.
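The space check described above can be sketched as follows. This is a minimal illustration, not the program's actual logic: the function name, the sizes in megabytes, and the rule that each request is tried in file order against the space still available are all assumptions.

```python
def plan_logical_volumes(vg_capacity_mb, requested):
    """Split (name, size_mb) requests into those that fit and those that do not.

    Assumption: requests are tried in order, and a request is skipped
    when it is larger than the space still free in the volume group.
    """
    remaining = vg_capacity_mb
    to_build, skipped = [], []
    for name, size_mb in requested:
        if size_mb <= remaining:
            to_build.append(name)
            remaining -= size_mb
        else:
            skipped.append(name)
    return to_build, skipped


# Hypothetical volume group of 4000 MB with four requested logical volumes.
requests = [("lvol1", 2000), ("lvol2", 1500), ("lvol3", 1000), ("lvol4", 500)]
to_build, skipped = plan_logical_volumes(4000, requests)
print("build:", to_build)
print("skip:", skipped)
```

Running a check like this before any logical volume is created would let the skipped volumes be reported up front rather than discovered partway through the rebuild.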
Options 5 and 6
These options allow the user to remove all of the volume groups or a subset of volume groups. If the volume group contains a secondary swap space, the user will be asked whether the machine can be rebooted at this time. If the reboot is chosen, the entry in the /etc/fstab file will be removed and the machine rebooted. If the reboot cannot happen, the program does not perform the task and returns to the menu. After the reboot, the program will have to be restarted, and it will continue with the removal process.
Option 7
This option allows the user to restore one or more volume groups from tape. At this point, the only backup systems supported are standard HP-UX dump/restore, vxdump/vxrestore, and IBM's ADSM. In 11.00, the -s option for vxrestore does not work. As a result, extra time is needed to rewind and fast-forward the tape. The limitations are that the dump must fit on one tape and that the output of bdf must be stored in /df.out. This file must be in place before starting the restore. If the user has edited the volumegroup.info files and moved filesystems from one volume group to another, the program will notify the user that the filesystem does not exist in the /df.out file. It will then bring up a vi session, and the user can make the necessary changes. If no changes are made to the /df.out file, the logical volume in question will not be restored, and an entry will be made in the error log. As with any restore, there is no way to tell in advance how long it will take. We designed a program to show the percent complete compared to the entry in the /df.out file, the current transfer rate, and the approximate time to complete the restore. The time to complete the restore of each logical volume is stored in the log file. Multiple tapes could be used with some modification.
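The arithmetic behind such a progress display is simple, and a minimal sketch is shown here. It is not the restmon program itself; the function name and the sample figures are assumptions. The expected size stands in for the used-space figure recorded in /df.out, and the restored size stands in for the space currently used on the filesystem being restored.

```python
def restore_progress(expected_kb, restored_kb, elapsed_s):
    """Return (percent complete, rate in KB/s, estimated seconds remaining).

    Assumption: expected_kb is the used space recorded before the disaster
    and restored_kb is the used space on the filesystem right now.
    """
    if expected_kb <= 0:
        raise ValueError("expected_kb must be positive")
    percent = 100.0 * restored_kb / expected_kb
    rate = restored_kb / elapsed_s if elapsed_s > 0 else 0.0
    # No estimate is possible until some data has arrived.
    remaining_s = (expected_kb - restored_kb) / rate if rate > 0 else None
    return percent, rate, remaining_s


# Hypothetical figures: 2,000,000 KB expected, 500,000 KB restored in 250 s.
percent, rate, remaining = restore_progress(2_000_000, 500_000, 250)
print(f"{percent:.0f}% complete at {rate:.0f} KB/s, about {remaining:.0f} s left")
```

The estimate assumes that the average rate so far will hold for the rest of the restore, so it will drift on filesystems that mix large files with many small ones.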
Our program was originally designed and built on a K450 with 128 GB of disks and six I/O channels. Included in the disk space are two RAID systems. The disk sizes varied from 4.2 GB to 4.55 GB, and the disks ran at 7200 and 10,000 rpm. The program was tested extensively on this system. For further testing, it was moved to a C160 with 14 GB of disk space. The initial setup had parts of the OS in both volume groups. The auto-configure was run, and the configuration files were modified to reflect all of the OS being contained in the vg00 volume group. We then installed 11.00, restored the configuration directory, and built the system. The machine had one non-OS volume group with six logical volumes striped on four disk drives. The time to recover, from the building of the volume group and logical volumes to the completion of the restoration of 5.2 GB of data, was 33 minutes using a local tape drive.
We then removed the volume group and restarted the rebuild using a remote 20-GB, 8-mm tape drive. The time to recover was about two hours and fifty-nine minutes. Another test run was performed in which we changed the volume group to non-striped drives. The striped drive configuration restored about 11% faster. It is apparent that a local subsystem will recover faster than a remote one. The local restore ran five times faster than the remote restore and three times faster than the ADSM restore.
This program may require some tailoring to work in every custom setup, but it will provide the means to recover most system configurations. The programs can be easily modified to fit your environment. The future enhancements we are considering include the ability to drill down to the single logical volume level for removal and building, and support for additional backup systems in the restore. A priority system will be devised to handle the order in which logical volumes are made when they are mount points within another filesystem.
If a filesystem has a mount point in another filesystem, the order in which the logical volumes are made is important; otherwise, there will be an error during the mounting of the filesystem. This is especially true if the logical volumes are in different volume groups.
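One simple way to obtain a safe order is to sort the mount points by their depth in the directory tree, so that every parent is handled before its children. The sketch below illustrates that idea only; it is an assumption about how such a priority system might work, not a description of the planned enhancement, and the function name and sample paths are hypothetical.

```python
def mount_order(mount_points):
    """Sort mount points so that every parent directory precedes its children."""
    # Depth is the number of path separators; ties are broken alphabetically.
    return sorted(mount_points, key=lambda p: (p.rstrip("/").count("/"), p))


# Hypothetical mount points, possibly spread across several volume groups.
order = mount_order(["/data/db/logs", "/data", "/apps", "/data/db"])
print(order)
```

Because the sort looks only at the mount point paths, it works the same way whether the logical volumes live in one volume group or in several.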
Warnings and Notes
If you have filesystems that mount inside other filesystems, make sure they are built in the correct order. The program will produce an error if they are built in the wrong order.
If you have parts of the OS in different volume groups, make sure that you include the volume groups in the OS pattern in the setup.env file.
This program may not work with some specific configurations. It has been tested on several different machines with different configurations, and it seems to work well. Before using this program in production, however, the user should test it in the target environment to ensure its reliability.