[Figure 1: A theoretical SAN with three servers, three switches, a router, two disk arrays, a high-end disk array, and a tape library.]

Building and Using a SAN (Part II)

W. Curtis Preston

In Part I of Building and Using a SAN (Sys Admin, May 2001), I talked about the three main areas of functionality that SANs make possible:

  • Offline storage consolidation -- This is simply another way of saying that you can share a tape library via the SAN. Tape drives are dynamically allocated to the servers that need them.
  • Online storage consolidation -- When you move from discrete disks behind individual servers to putting all of your storage on the SAN, you have consolidated your online storage. This adds quite a bit of manageability, as well as backup and recovery functionality. In the last issue, I covered only the administrative manageability issues. In this issue, I will cover the ways that online storage consolidation makes backups easier.
  • Truly highly available systems -- Although highly available systems are an entirely different discipline, I also wanted to cover how SANs make it easier to build truly highly available systems.
I will continue the discussion about building and using a SAN by examining the ways that online storage consolidation makes backups easier. I will then examine high-availability systems, followed by an overview of what's involved in building a SAN. See Figure 1 for an example of a SAN.

Online Storage Consolidation

Now that we've consolidated all of our online storage onto the SAN, what about backups? There are two very interesting possibilities that online storage consolidation facilitates. Consider the storage array in Figure 1. With proper zoning, it is visible to all three servers. What if you could make a copy of Server-A's data, and make it visible to Server-B? This would allow Server-B to back up Server-A's data without creating any load on Server-A at all. What if you had dozens of servers that you wanted to back up this way? If you could make a copy of each of their data sets available on the SAN, you could have one server dedicated to backing them up. That would allow you to back up terabytes of data without impacting the servers using that data. This is being done today in data centers across the country.

You may be wondering how this is accomplished. Whether you are talking about an enterprise volume manager or a feature built into a storage array, there are two main ways that these copies can be created. The most common is to create a "third mirror" of the data. Via the SAN, an entirely different set of disks is attached as an additional mirror to the primary disks containing the data you wish to back up. Once this mirror is established and fully synchronized, the application using the data is put into backup mode, the mirror is split, and the application is taken out of backup mode. You are then left with a completely independent set of disks that contains a copy of your production data, and you can take as long as you want to back it up.
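To make this concrete, here is a minimal sketch of the split-mirror sequence using Veritas Volume Manager's third-mirror break-off commands. The disk group (datadg), volume (datavol), snapshot volume (snapvol), and wrapper script names are hypothetical:

    #!/bin/sh
    # Attach a snapshot mirror on a separate set of SAN disks
    # and wait for it to fully synchronize.
    vxassist -g datadg snapstart datavol
    vxassist -g datadg snapwait datavol

    # Put the application into backup mode (for Oracle, think
    # ALTER TABLESPACE ... BEGIN BACKUP), break off the mirror,
    # and resume normal operation.
    put_app_in_backup_mode
    vxassist -g datadg snapshot datavol snapvol
    take_app_out_of_backup_mode

    # snapvol is now an independent volume that another server
    # on the SAN can mount and back up at its leisure.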

Another way to create the copy is to use a "snapshot". Instead of making an entirely separate mirror, you use snapshot software to create a "symbolic" mirror. A snapshot is not a physical copy of the data like the split mirror is. To create the snapshot, you put the application that is using the primary disk in backup mode, create the snapshot, and take the application out of backup mode. The snapshot then appears to any server on the SAN as a valid, mountable disk, when it is really a symbolic representation only.

When you create a snapshot, the software or hardware keeps track of which blocks have changed on that device since you created the snapshot, and stores a copy of the "before" image of each such block just before it is overwritten. This is referred to as "copy-on-write" technology, and is just like the snapshots that you can create with Network Appliance filers and the Veritas File System. When your backup application attempts to back up the snapshot, the snapshot application watches which blocks of data the backup asks for. If the backup asks for a block that hasn't changed since the snapshot was taken, the block is retrieved from the original disk. If it asks for a block that has changed, the "before" image is retrieved from the cache disk. This is all invisible to the backup application.
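For example, with the Veritas File System mentioned above, a copy-on-write snapshot can be created with a single mount. The mount points, volume names, and wrapper scripts here are hypothetical:

    #!/bin/sh
    # Quiesce the application only for the moment of snapshot creation.
    put_app_in_backup_mode

    # Mount a snapshot of /data; the small "snapcache" volume holds
    # the before images of any blocks that change later.
    mount -F vxfs -o snapof=/data /dev/vx/dsk/datadg/snapcache /snap/data

    take_app_out_of_backup_mode

    # /snap/data now looks like a frozen copy of /data and can be
    # backed up with ordinary tools:
    tar cf /dev/rmt/0cbn /snap/data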

While both of these technologies are available on local servers and disks, it is the SAN that truly brings them to life. Only by putting your storage on the SAN are you able to make the snapshots or split mirrors available for another server to back them up. There is even an emerging technology within the SAN that will allow you to send the data directly from the split mirror or snapshot to a tape drive on the SAN without passing through any server's CPU. This is referred to as serverless backup1, and the technology that makes it possible is called third-party copy.

Truly Highly Available Systems

Another area that has benefited from the advent of SANs has been highly available systems. Because SANs allow for multiple paths from multiple servers to multiple physical devices, it is easier to build a truly highly available system. Consider Server-B and Server-C (Figure 1) -- both have an independent path to each of the two switches on the right, and each of those switches has an independent path to the two disk arrays. If you used an enterprise volume manager and a high-availability application, you could mirror the data across the two disk arrays and configure the application to run on either server. There would be no single point of failure in that system. This model can be expanded to clusters of dozens of systems, all sharing common functions.
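To make the storage half of this concrete, here is what creating such a cross-array mirror might look like with an enterprise volume manager such as Veritas Volume Manager. The disk group and disk media names are hypothetical; the point is simply that one plex lives on each array:

    # Build a volume with one mirror on each disk array, plus a log,
    # so that either array can fail without losing the volume.
    vxassist -g appdg make appvol 50g layout=mirror,log array1_d01 array2_d01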

Clustered systems present a unique challenge to backup and recovery systems, and different people take different tactics. The first tactic is to back up both nodes as clients. The upside is that everything will be backed up; the downside is that you will back up the shared data twice. (Actually, you will back it up as many times as you have nodes in the cluster.) The other tactic is to back up only the cluster entity. The downside is that you might not back up some of the operating system configuration files on the individual nodes; the upside, of course, is that you back up shared data only once.

The real difficulty with any method of backing up a cluster arises when the cluster allows individual applications to fail over independently. The best example of this is Compaq's Tru64 TruCluster product. It allows each application to fail over from one node to another without affecting other applications. The upside is that you can load balance applications across multiple nodes in the cluster; the difficulty lies in knowing which node of the cluster currently has the Oracle database. Here's what happens:

1. You attempt to back up the entity known as cluster-a.

2. The backup application logs into cluster-a, but it is actually logged into node-a or node-b.

3. Either node-a or node-b is actually running Oracle. The other node can see the Oracle database files, but cannot put the database in backup mode.

4. Your job is to figure out which node the application is actually running on, and then rsh/ssh to that system to put the database into backup mode. We did this with a wrapper around Compaq's Cluster Application Availability (CAA) facility, as sketched below.
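Here is a rough sketch of that kind of wrapper. The caa_stat output parsing and the helper script names are assumptions for illustration; check your own caa_stat output format before borrowing this:

    #!/bin/sh
    # Ask CAA which cluster member is currently hosting the Oracle
    # resource, then quiesce the database on that node.
    ORACLE_NODE=`caa_stat -t oracle | awk '/oracle/ {print $NF}'`

    if [ -z "$ORACLE_NODE" ]; then
        echo "cannot determine which node is running Oracle" >&2
        exit 1
    fi

    rsh "$ORACLE_NODE" /usr/local/bin/put_db_in_backup_mode

    # ... back up the cluster entity here ...

    rsh "$ORACLE_NODE" /usr/local/bin/take_db_out_of_backup_mode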

As you can see, clusters present a unique challenge to the backup and recovery folks. Although the topic of high availability is a very important one, it is well beyond the scope of this article. I merely wanted to point out that it is the third major area that can benefit from SAN technology, and that it can bring challenges as well.

Building the SAN

Figure 1 depicts a theoretical SAN that contains three servers, three switches, a router, two disk arrays (i.e., "RAID in a box"), a high-end disk array, and a tape library. How do you go about building such a beast? The first question is: what do you want to accomplish with the SAN? That is why I covered how to use a SAN before explaining how to build one. Once you have answered that question, you can ask who is going to provide the components you will need to build your SAN.

You may wonder which vendors the theoretical SAN in Figure 1 uses. Its components could be from many different vendors, within reason. Any of the functionality discussed earlier in this article (or in Part I) is available from at least three vendors, but not necessarily from all of them2. For example, earlier in this article, I described the concept of a split-mirror backup. Such functionality is available from Compaq, EMC, Hitachi, and IBM, and I'm sure there are other vendors that offer it. The only differences will be how they implement it and how much it costs.

Before considering a SAN, you should have a pretty good idea what you would like it to accomplish. (I hope this article will give you some ideas.) Once you have turned this idea into a plan, you should talk to multiple vendors to see how they can help you to complete your plan and accomplish your goals. For example, several vendors can sell you OEM versions of every component in Figure 1, including the servers. (That is, of course, if you are willing to use their brand of operating system.) Even if you don't want to use their operating system, you could use your preferred operating system and buy everything from the Host Bus Adapters (HBA) to the high-end disk array from them. This usually makes the job easier because the different components will have been tested together.

It's amazing what can happen when you don't have tested components. I can think of one project where we saved backplane space by plugging a SAN router directly into the HBA of a large UNIX server. (I will let the router and server manufacturers remain nameless.) Although there shouldn't have been a problem with this configuration, there definitely was: the router was expecting one thing, and the HBA was expecting another. The result was that the UNIX server panicked as soon as it came up. We made it work by hard-coding the AL_PA on the router, and this information was added to the router's support matrix. I hope you will not be doing what we were doing; your solution will have already been tested in a lab somewhere.

Zoning

You will probably want to use zoning in a SAN like the one in Figure 1, so that all devices are not visible to all servers. For example, you may want to make the high-end disk array visible to Server-B and Server-C, but not to Server-A, while the tape library remains visible to all three servers. Zones let you accomplish this, so you will want to ask the vendors how they make zoning possible.
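As an illustration, here is what that zoning might look like on a Brocade-style switch. The alias names and domain,port numbers are made up, and other switch vendors use entirely different syntax:

    # Alias each port, then zone the high-end array to Server-B and
    # Server-C only, while the tape library is zoned to all three.
    aliCreate "server_a",  "1,1"
    aliCreate "server_b",  "1,2"
    aliCreate "server_c",  "1,3"
    aliCreate "big_array", "2,5"
    aliCreate "library",   "2,8"

    zoneCreate "array_zone", "server_b; server_c; big_array"
    zoneCreate "tape_zone",  "server_a; server_b; server_c; library"

    cfgCreate "san_cfg", "array_zone; tape_zone"
    cfgEnable "san_cfg"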

Once connected to the SAN, with the zones configured the way you want them, each server can then build the device files necessary to access the disk and tape drives on the SAN. This can be done live with many versions of UNIX and NT. For example, the drvconfig, tapes, and disks commands on Solaris will build the proper device files without having to reboot. HP-UX uses ioscan and insf, and NT can be told to search for new devices from the Control Panel. However, many operating systems try to put a label on every disk they can see. Therefore, you may wish to use zoning to protect disks from other operating systems. For example, don't let your NT box see the disks that you intend to use for Solaris!
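For example, the live-discovery sequence looks roughly like this on Solaris and HP-UX (the flags shown are typical, but check your own man pages):

    # Solaris: rebuild the device tree, then create disk and tape
    # device files for the newly zoned SAN devices.
    drvconfig
    disks
    tapes

    # HP-UX: scan for new devices, then install their device files.
    ioscan -fnC disk
    ioscan -fnC tape
    insf -e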

A typical installation procedure would then include using native backup commands to test connectivity to each tape and disk drive. For UNIX, test the tape drives with dump, tar, and cpio, and use NTBACKUP for NT. To test connectivity to the disk drives, use dd on UNIX and the disk manager on NT. You may find that not all devices show up on the first try, and you may need to modify the SAN or HBA configurations before every device can be seen. This step is probably the hardest; once all devices are visible in all the right places, the rest is downhill. A good VAR or consultant can be of great help during this phase. After this phase is complete, you can install whatever SAN-management software you purchased. These products will help you manage the SAN and allocate the resources it has available. A number of products in this arena are now emerging, and I may revisit them later with a review.
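On the UNIX side, those connectivity tests might look like this. The device names are examples; substitute the ones your server actually built:

    # Test a SAN-attached tape drive: write a small archive,
    # rewind, and read it back.
    tar cf /dev/rmt/0cbn /etc/hosts
    mt -f /dev/rmt/0cbn rewind
    tar tf /dev/rmt/0cbn

    # Test read connectivity to a SAN-attached disk.
    dd if=/dev/rdsk/c2t1d0s2 of=/dev/null bs=128k count=100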

This brings my series on storage area networks to a close. It's difficult to fit this much information into a few thousand words, but I hope you have found it helpful. My upcoming O'Reilly & Associates book, SAN & NAS Backup & Recovery, focuses on the topics that I have covered in the last two articles, and explores the concept of network-attached storage (NAS) and the unique challenges that it brings to the backup and recovery table. Remember -- back it up or give it up!

1 I'm well aware that some refer to what I describe above as serverless backup. I call it clientless backup, because the data is not going through the backup client that is using the data; it is still going through the backup server. It's a fine line, but I'll walk it for now.

2 The exception is third-party copy, an emerging technology that is not yet provided by all vendors.

W. Curtis Preston has specialized in storage for over eight years, and has designed and implemented storage systems for several Fortune 100 companies. He is the owner of Storage Designs, the Webmaster of Backup Central (http://www.backupcentral.com), and the author of two books on storage. He may be reached at curtis@backupcentral.com. (Portions of some articles may be excerpted from Curtis's books.)