Automated Boot Disk Fail Over
George Winter
It's a simple truth -- if the boot disk fails, your system fails. Boot disk mirroring is a way of avoiding this dilemma. Mirroring the boot disk creates, as far as the operating system is concerned, an exact copy of the original boot disk. If the primary boot disk fails, the mirrored disk automatically takes over and system operation is uninterrupted. With this configuration, you might think boot disk failures are a thing of the past. As I shall explain, this is not necessarily the case. In this article, I discuss what can go wrong, how to protect against problems, how to recover when things do go wrong, and more. I cover two types of boot disk configurations, the relative merits of each, and an alternative to mirroring.
This article relates specifically to the Sun Solaris environment, although these principles are applicable to other UNIX variants. This article also assumes that you have a working knowledge of Veritas Volume Manager or Solstice DiskSuite. For detailed configuraton procedures, contact the manufacturer.
I've observed, during my experiences as a systems administrator and system support engineer that elaborate, supposedly resilient configurations often become needlessly complex in the effort to create a bulletproof environment. When properly configured, a boot-disk recovery should be easy and quick. I've also found that simple configurations are the best. They are easier to create, work well, and because of their simplicity, are easier to recover from, regardless of the type of failure.
Configuration Considerations
The first item to consider when configuring your boot disk is the partition scheme. I suggest that the boot disk be configured into two partitions. Slice 0 for the root partition and slice 1 for swap. (The only exception I see to this is if your /var partition tends to increase in size dramatically. In this case, create a third partition for /var.) If the boot disk fails, the partition scheme is easy to figure out -- two partitions. Many times I've had to bring down systems and reconfigure the boot disk because of a full partition. The chances of this happening with a simple partition scheme are greatly reduced. With two partitions, the free space normally available in other partitions is consolidated into the single root partition, thereby reducing the need for reconfigurations. As I will explain later, a simple boot disk partition scheme also simplifies mirroring, unmirroring, copying, backing up, and restoring boot disks.
Nothing should reside on the boot disk except OS related software and packages. This may seem obvious, but I have seen boot disks contain a variety of items, including databases. This means that users other than the systems administrator (root) have been given permission to write to the boot disk. This opens up the possibility that the boot disk can be accidentally filled or damaged and then bring down the system. No type of mirroring scheme will protect you from this. Currently, a 9 Gb disk is considered small. It may be tempting to utilize part of the free space available on boot disk for other tasks -- do not do it!
Most enterprise-class systems are designed with space for two internal disks. This is an ideal situation for mirrored boot disks. With the advent of storage arrays, I see an increasingly large number of boot disks placed on them. It seems like a convenient place to put the boot disk, and no additional hardware need be purchased. In fact, I believe this is the last place the boot disk(s) should be placed. Generally, a tremendous amount of I/O occurs to storage array disks. A boot disk swapping to an array is competing with other applications for storage array resources. This can take an expensive and fast system and bring it to its knees waiting for swap. Placing the boot disk on a storage array also creates additional points of failure (SCSI cards, fiber connections, fiber/SCSI cables, etc.). Thus, my suggestion is to avoid placing boot disks on storage arrays.
The boot disks should instead be located on a separate SCSI bus with as few single points of failure as possible associated with this bus. Little, if any, extra hardware should be located on the boot-disk SCSI bus. Some systems have an internal SCSI bus that is ideal for this situation. Configuring your system this way may cost a bit more, immediately, but will pay off later in system performance and resiliency.
I also suggest, if possible, create the same or similar boot disk configuration for every system that you administer. If the boot disk fails or is destroyed, you do not have to guess or research how the boot disk was configured. This can save time when trying to recover a system -- a point at which time is often most critical.
Mirroring with Veritas
For this discussion, I assume that mirroring the boot disk involves two disks -- the original boot disk, and a second disk to be used as a mirror of the first. For clarity, I will call the original boot disk the primary boot disk and the second disk, the mirrored disk.
Veritas Volume Manager (VM) software is a popular alternative to the conventional file system features of the OS. VM is frequently used to create various types of large (RAID 0, 0+1 or 5) file systems in conjunction with disk farms or storage arrays. During VM installation, if you place all disks under Veritas' control, you will have what VM calls an encapsulated boot disk'. VM assumes that you want the disk configured this way. This means that the VM software has complete control of the boot disk and is ready to mirror it, if possible. Once the boot disk is configured like this, the VM software must be running or you will not be able to boot.
If there is another disk on the same controller as the boot disk, the second disk is considered by VM to be the most likely candidate for mirroring. If you decide to mirror your boot disk, VM will suggest this disk. You do not have to accept this suggestion. You may mirror to another disk or not mirror the boot disk at all.
One of the strict requirements of VM is that there be a disk group called rootdg. Name any disk group rootdg, and this requirement will be satisfied. Rootdg does not have to be defined as the boot disks. However, if you encapsulate the boot disk, VM automatically places it in the disk group rootdg. Because the encapsulated boot disk must be in a disk group, and VM requires that the disk group rootdg exist, it is convenient to place the encapsulated boot disk in rootdg.
I've visited sites that have encapsulated boot disks but don't realize it. Be careful of this. If you decide to encapsulate and mirror the boot disk, make sure you know all of the repercussions of this action. When using VM, consider the following questions:
Do you know exactly what is happening to each disk (primary and mirror) when it is encapsulated/mirrored? With VM, the primary and mirrored disks are treated quite differently. The primary disk will generally retain its partition scheme except that two additional partitions will be added: one for the public region and one for the private region. The mirrored disk is treated very differently. It only has two partitions -- the public and private regions.
To illustrate this point look at the partition scheme for a typical boot disk as it might appear before encapsulation:
Pre-encapsulated boot disk:
Slice Partition name
0 / (root)
1 swap
2 (entire disk)
3 /usr
4 /var
5 /opt
Once the disk is encapsulated, VM creates two additional partitions called the private region and the public region. On an unused disk, VM uses the third and fourth partitions. When an existing disk such as a boot disk is encapsulated, VM attempts to use any available partitions. In our example above, VM would use partitions 6 and 7 because they are the only ones not used. (Note that VM will make its best guess as to which partition it can use. You have no control over which partitions VM will use. If it does not find a free partition, the encapsulation will fail.)
Next, the mirrored boot disk can be created. Assuming the disk was previously unused its partition scheme will be as follows:
Mirrored disk:
Slice Partition name
2 (entire disk)
3 (private region)
4 (public region)
As you can see, on the mirrored disk, VM uses partitions 3 and 4. This is the default behavior of VM. Notice that there are only two partitions on the mirrored boot disk. How can you boot off a disk with just a private region and a public region? VM does this with several tricks including modifying /etc/system so that VM modules are force loaded early in the boot process. In this way, you can boot off the mirrored boot disk even though there are no normal partitions in the traditional sense (/, /usr, /var, etc.). Once this procedure is completed, the system boots fine and can boot off either disk. Keep in mind that this adds a new level of complexity.
What is the procedure for upgrading or patching the VM software? VM can't be stopped once the boot disk is encapsulated (if it's not running, the machine won't boot). However, you can't upgrade the VM software unless you stop it. Catch-22! There are several techniques that can be used to accomplish this, but they take time. My point here is that performing this procedure is not straightforward. Before attempting to upgrade or patch the VM software on a production system, I suggest testing the process on a test machine so that you feel comfortable with the process and can estimate how long it will take. Then, perform this task on your production machine.
What is the procedure for unencapsulating the primary boot disk? Under VM, you can only unencapsulate /, /usr, and /var. For example, if you originally had a separate /opt partition, it cannot be unencapsulated using the VM utility vxunroot. This is, in my opinion, a major drawback with VM. There are procedures for recovering the partitions that existed previous to installing VM, but they are tricky and don't always work. This is not really made clear when first encapsulating the boot disk. If you don't understand VM well, and encapsulate the boot disk, you can be setting yourself up for problems at upgrade time.
What is the procedure for upgrading the OS? Again, this procedure is complicated by the fact that the boot disk is encapsulated. The boot disk must first be unencapsulated. (See previous question.)
What is the procedure for replacing a failed boot disk, and what are the consequences of this replacement? This depends on which disk failed. If the primary boot disk fails, there are several techniques for replacing the disk. The most commonly used technique involves replacing the boot disk as if it were a simple mirror. However, the new primary boot disk does not have the same partition scheme as the disk it replaces. The following illustrates how a sample partition scheme might be changed before and after primary boot disk replacement:
Primary Boot Disk Before Replacement (Notice that the /, swap, /usr, and /var partitions still exist):
Slice Partition Name
0 / (root)
1 swap
2 (entire disk)
3 /usr
4 /var
6 private region
7 public region
One day, the primary boot disk fails and is replaced according to documentated procedures. The partition scheme now looks like this:
Primary Boot Disk After Replacement:
Slice Partition Name
2 (entire disk)
3 private region
4 public region
Yikes! What happened to the rest of the partitions? They were never created by the replacement procedure and are gone! The system now must be backed up and recovered to a different disk, and critical system files must be recovered to their pre-VM state if you want to recreate these partitions. Look back on some of the previous questions. You must unencapsulate the disk to upgrade the OS and VM software. Encapsulation has turned mundane tasks into a lot of work. And, keep in mind that your system must be halted for much of this work.
The replacement procedure for the mirrored boot disk is generally more straightforward. The mirrored boot disk has two partitions and the replacement disk will also have the same two partitions.
What happens if somebody with root permissions accidentally types rm -rf *? Mirroring the primary boot disk does not protect against accidental removal of part or all critical system files. If you modify one of the boot disks, the other is mirrored instantly. If critical system files are accidentally deleted, the system is down and the mirrored boot disk was of no help.
How do you recover an encapsulated boot disk from catastrophic failure using backup tapes? This procedure is complicated by the fact that the disk was originally encapsulated. There are procedures for performing this type of recovery. Before you can boot off the recovered disk, you must boot off an alternative source (CD-ROM), modify certain system files, install a boot block, and reboot -- time-consuming and tricky operations.
Mirroring with Solstice DiskSuite
Solstice DiskSuite (SDS), formerly known as Online DiskSuite, is Sun's version of volume management software. It works on the same basic principles as VM, but the underlying configuration is markedly different. SDS requires the user to manually partition disks, where VM does this automatically. For those of us who are more inclined to keep control over system configurations, this a good thing. I prefer to know and have control over all modifications that occur to my system ,including boot disk partitions.
Preparing the primary boot disk for mirroring under SDS is simple. The disk does not first have to be encapsulated as under VM. There is no direct corollary to encapsulation using SDS. However, there is another manual configuration consideration under SDS. State databases must be created. The state database is used by SDS to keep track of where data is stored related to the SDS configuration. It is easy to configure and once configured, needs little or no maintenance. Note that a quorum of state databases must be maintained at all times. In this way, SDS is always able to keep track of its configuration. I suggest that these databases be kept in a separate partition. In later versions of SDS, state databases can be placed inside SDS-controlled partitions. You will need to keep a number of state databases on disks other than the boot disks. If all of the state databases reside on the boot disks, you will not have a quorum if one disk fails.
Unlike VM, SDS does not need any special considerations related to the pre-mirrored boot disk configuration. If the initial disk has many or uncommon partitions (/usr/sbin, for example), it makes no difference to SDS. Whatever partitions exist on the primary boot disk are created on the mirrored boot disk, partition for partition. This eliminates any restrictions that could be placed on the primary boot-disk partition scheme. Since the primary and mirrored boot disks have the same partition scheme, it's a simple matter to replace a failed boot disk or break the mirror and return the boot disk to its original configuration.
How do you replace a failed boot disk with SDS? Replacement of a failed boot disk is a simple procedure. Essentially the failed disk is physically replaced, partitioned to match the existing boot disk and re-synchronized to the existing boot disk. Because one of the original boot disks exists, there is no guesswork required determining the partition scheme. Simply copy the partition map of the original boot disk.
What is the procedure for upgrading the operating system, SDS, and installing patches? Upgrading the operating system, SDS, or installing patches is a simple procedure and is essentially the same for each task. Unmirror the boot disk, bring it back to its original unmirrored state, and perform the upgrade. After the upgrade is complete, it is again simple to remirror the boot disk. This is a significant advantage when using SDS. Upgrades are an important part of system maintenance, and SDS makes them easy.
Under what conditions can SDS cause system failures? Like VM, SDS does not protect you from accidental deletion of critical system files or missteps.
Can SDS and VM be used together? Yes! Using both of these software packages can create a very powerful yet simple configuration. SDS has a superior method of mirroring the boot disk. VM, because of its popular GUI and simple configuration process for RAID volumes, is commonly used where large file systems need to be created.
If you use VM, you will still need to create a rootdg. There are various methods of doing this. The easiest is to put one disk in rootdg. It is not advisable to use the rootdg disk group for important data as this disk group cannot be exported to another host. Another method is to create a rootdg in a small slice of any disk, usually the boot disk. This fools VM into thinking a rootdg, in the normal sense, actually exists when all that exists is a slice. Check with your vendor regarding this procedure and supportability.
The Physical Copy Alternative
There is a third way to protect your boot disk. This method requires no additional third-party software. Instead, it uses built-in UNIX utilities to make a physical copy of the boot disk. Using the UNIX utility dd(1m), an exact copy of the boot disk is created. If the original boot disk fails, it is removed, and the copied boot disk replaces the failed disk. This operation is simple and takes just a few moments. It does not provide for automatic fail over but it can protect against accidental damage of the boot disk due to human error. Although this method does not replace backing up the boot disk, it does offer boot disk protection in that there is immediately an available copy of the original boot disk, in the event the original boot disk is unusable.
A sample script that utilizes dd is shown in Listing 1. The idea is that slice 2 (the slice that represents the entire disk) is copied. In this way, whatever was on the original disk, including the boot block and partition scheme, is copied to the secondary boot disk. Using this method, it doesn't matter how the disk is partitioned. An exact copy of the boot disk is created. If the original boot disk fails, remove the copied boot disk, place it in the original boot disk location and boot the machine. All software, patches, and modifications that were installed on the original boot disk are on the copied boot disk. Your downtime is limited to the amount of time that it takes to replace the disk. An added feature is that people who are unfamiliar with the system can perform this simple disk switch easily with minimum direction. This script is commonly run as a cron job. The boot disk copy is then automatically synchronized with the original boot disk on a regular basis. Test and modify this script before using it in your environment.
One caveat to using this method is that both the original and copied disks must have the exact same geometry. In most cases, this means that both disks must share the same manufacturer, model number, capacity, and geometry.
Final Thoughts
Encapsulation and mirroring the boot disk are not a replacement for backing it up. The boot disk should always be backed up no matter how it was configured. Before any operation is performed on a boot disk you should make sure that you have a valid backup. Always test any configuration before placing it into a production environment.
The procedures outlined here might differ depending on which version of the OS or software that you are using. Check the man pages and vendor documentation before attempting these procedures. Regardless of which boot disk mirroring procedure you choose, you should have the replacement procedure clearly written and readily available to anyone who might need to perform it on a moment's notice.
There is no question that mirroring the boot disk will create a huge reliability improvement of the boot environment. If a single disk has a MTBF of 200K hours, mirroring this disk will exponentially increase the MTBF of the boot system as a whole. THis type of reliability increase is very attractive. In high availability or Fault Tolerant environments, this type of configuration is essential, but it also comes with a price. Be sure that you know the mirroring software well and understand its capabilities and limitations.
In situations where you absolutely must have a mirrored disk, I suggest that you use SDS. It takes a bit more time to configure, but creates a robust boot disk mirroring solution. If having a copy of the boot disk that is immediately available covers your needs, then I suggest that you make a copy of the boot disk using dd (see Listing 1 for a sample script). This is the simple method, and simple methods have fewer things to go wrong when you need to get your system back up in a hurry.
About the Author
George Winter has worked in the San Francisco Bay area for ten years as a UNIX Systems Administrator, Systems Support Engineer, and a Systems Engineer for several companies. He currently works for Sun Microsystems as a Systems Engineer in the developer.com region. He can be reached at: george.winter@sun.com. |