Building and Using a SAN (Part II)
W. Curtis Preston
In Part I of Building and Using a SAN (Sys Admin, May 2001),
I talked about the three main areas of functionality that SANs make
possible:
- Offline storage consolidation -- Offline storage consolidation
is simply another way to say that you can share a tape library
via the SAN. Tape drives are dynamically allocated to the servers
that need them.
- Online storage consolidation -- When you move from discrete
disks behind individual servers to putting all of your storage
on the SAN, you have consolidated your online storage. This adds
quite a bit of manageability and backup and recovery functionality.
In the last issue, I only covered the administrative manageability
issues. In this issue, I will cover the ways that online storage
consolidation makes backups easier.
- Truly highly available systems -- Although highly available
systems are an entirely different discipline, I also wanted to
cover how SANs make it easier to build truly highly available
systems.
I will continue the discussion about building and using a SAN by examining
the ways that online storage consolidation makes backups easier. I
will then examine high-availability systems, followed by an overview
of what's involved in building a SAN. See Figure 1 for an example
of a SAN.
Online Storage Consolidation
Now that we've consolidated all of our online storage onto
the SAN, what about backups? There are two very interesting possibilities
that online storage consolidation facilitates. Consider the storage
array in Figure 1. With proper zoning, it is visible to all three
servers. What if you could make a copy of Server-A's data,
and make it visible to Server-B? This would allow Server-B to back
up Server-A's data without creating any load on Server-A at
all. What if you had dozens of servers that you wanted to back up
this way? If you could make a copy of each of their data sets available
on the SAN, you could have one server dedicated to backing them
up. That would allow you to back up terabytes of data without impacting
the servers using that data. This is being done today in data centers
across the country.
You may be wondering how this is accomplished. Whether you are
talking about an enterprise volume manager or a feature built into
a storage array, there are two main ways that these copies can be
created. The most common way of making these copies is to create
a "third mirror" of the data. Via the SAN, an entirely different
set of disks is attached as an additional mirror to the primary
disks containing the data you wish to back up. Once this mirror is
established and synchronized, the application using the data is put
into backup mode, the mirror is split, and the application is taken
out of backup mode. You are then left with a completely independent
set of disks containing a copy of your production data set, and you
can take as long as you want to back it up.
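To make the sequence concrete, here is a minimal shell sketch of a
split-mirror backup driven from the backup server. Every command name
beginning with vm_ or app_ is a hypothetical placeholder for whatever
your volume manager, storage array, or application actually provides,
and the device paths are examples only.

#!/bin/sh
# Split-mirror backup sketch. The vm_* and app_* commands below are
# hypothetical placeholders for your volume-manager, array, and
# application tools; device paths are examples only.

PRIMARY_HOST=server-a                    # server that owns the production data
VOL=/dev/vx/dsk/datadg/datavol           # production volume (example path)
MIRROR=backupvol                         # name for the third mirror / split copy
MIRRORDEV=/dev/vx/dsk/datadg/backupvol   # split copy as seen by the backup host
MOUNT=/backup/datavol                    # where the backup host mounts the copy

# 1. Attach a third mirror over the SAN and wait for it to synchronize.
ssh $PRIMARY_HOST "vm_attach_mirror $VOL $MIRROR && vm_wait_sync $MIRROR"

# 2. Quiesce the application, split the mirror, then resume the application.
ssh $PRIMARY_HOST "app_begin_backup && vm_split_mirror $VOL $MIRROR; app_end_backup"

# 3. Mount the now-independent copy on this (backup) server and back it up.
mkdir -p $MOUNT
mount $MIRRORDEV $MOUNT
cd $MOUNT && tar cf /dev/rmt/0n .        # or invoke your backup product's client
cd / && umount $MOUNT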
Another way to create the copy is to use a "snapshot".
Instead of making an entirely separate mirror, you use snapshot
software to create a "symbolic" mirror. A snapshot is
not a physical copy of the data like the split mirror is. To create
the snapshot, you put the application that is using the primary
disk in backup mode, create the snapshot, and take the application
out of backup mode. The snapshot then appears to any server on the
SAN as a valid, mountable disk, when it is really a symbolic representation
only.
When you create a snapshot, the software or hardware keeps track
of which blocks have changed on that device since the snapshot was
created, and stores a copy of the "before" image of each of those
blocks just before it is overwritten. This is referred to as
"copy-on-write" technology, and it is just like the snapshots that
you can create with Network Appliance filers and the Veritas File
System. When your backup application attempts to back up the snapshot,
the snapshot software watches which blocks of data the application
asks for. If it asks for a block that hasn't changed since the
snapshot was taken, the data is retrieved from the original disk.
If it asks for a block that has changed, the data is retrieved from
the cache disk. This is all invisible to the backup application.
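For comparison, here is a similarly hedged sketch of the snapshot
approach, again using hypothetical command names (snap_create and
snap_delete) that stand in for whatever your snapshot software actually
provides. Notice that there is no mirror to synchronize, so the
application stays in backup mode only for the few seconds it takes to
create the snapshot.

#!/bin/sh
# Snapshot backup sketch. snap_create and snap_delete are hypothetical
# placeholders for your array or volume-manager snapshot commands;
# device paths are examples only.

PRIMARY_HOST=server-a
VOL=/dev/vx/dsk/datadg/datavol       # production volume (example path)
SNAPDEV=/dev/vx/dsk/datadg/snapvol   # snapshot device as seen by the backup host
MOUNT=/backup/snap

# 1. Quiesce the application just long enough to create the snapshot.
ssh $PRIMARY_HOST "app_begin_backup && snap_create $VOL snapvol; app_end_backup"

# 2. Mount the snapshot read-only on this (backup) server and back it up.
#    Unchanged blocks are read from the original disks; changed blocks come
#    from the copy-on-write cache -- invisibly to the backup application.
mkdir -p $MOUNT
mount -r $SNAPDEV $MOUNT
cd $MOUNT && tar cf /dev/rmt/0n .
cd / && umount $MOUNT

# 3. Remove the snapshot to stop the copy-on-write overhead.
ssh $PRIMARY_HOST "snap_delete $VOL snapvol"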
While both of these technologies are available on local servers
and disks, it is the SAN that truly brings them to life. Only by
putting your storage on the SAN are you able to make the snapshots
or split mirrors available for another server to back them up. There
is even an emerging technology within the SAN that will allow you
to send the data directly from the split mirror or snapshot to a
tape drive on the SAN without going through any server's CPU. This is
referred to as serverless backups [1], and the technology that makes
it possible is called third-party copy.
Truly Highly Available Systems
Another area that has benefited from the advent of SANs has been
highly available systems. Because SANs allow for multiple paths
from multiple servers to multiple physical devices, it is easier
to build a truly highly available system. Consider Server-B and
Server-C (Figure 1) -- both of them have an independent path
to the right two switches. Each of the right two switches also has
an independent path to the two disk arrays. If you used an enterprise
volume manager and a high-availability application, you could mirror
the data across the two disk arrays and run the application on either server.
There would be no single point of failure in that system. This model
can be expanded to clusters of dozens of systems, all sharing common
functions.
Clustered systems present a unique challenge to backup and recovery
systems. Different people adopt different tactics. The first tactic
is to back up both nodes as clients. The upside to this is that
everything will be backed up. The downside is that you will back
up the shared data twice. (Actually, you will back it up as many
times as you have nodes in the cluster.) The other tactic is to
back up only the cluster entity. The downside to this is that you
might not back up some of the operating system configuration files
on the individual nodes. The upside, of course, is that you only
back up shared data once. The difficulty with any method of backing
up a cluster arises when the cluster allows individual applications
to fail over. The best example of this is Compaq's Tru64 TruCluster
product. It allows each application to fail over from one node to
another without affecting other applications. The upside to this is that
you can load balance applications across multiple nodes in the cluster.
The difficulty lies in knowing which node of the cluster has the
Oracle database. Here's what happens:
1. You attempt to back up the entity known as cluster-a.
2. The backup application logs into cluster-a, but it is actually
logged into node-a or node-b.
3. Either node-a or node-b is actually running Oracle. The other
node can see the Oracle database files, but cannot put the database
in backup mode.
4. Your job is to figure out which node the application is actually
running on, and then rsh/ssh to that system to put
the database into backup mode. We did this with a wrapper around
Compaq's Cluster Application Availability (CAA) facility.
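For illustration only, here is a rough sketch of the kind of wrapper
this requires. It assumes that the TruCluster caa_stat command reports
the hosting member in a line of the form "STATE=ONLINE on <member>"
(verify the exact output against your version), and the begin/end
hot-backup SQL scripts are placeholders for your own Oracle
backup-mode scripts.

#!/bin/sh
# Sketch: find the node currently running the "oracle" CAA resource and
# put the database into backup mode there before backing up the cluster.
# Assumes "caa_stat <resource>" prints a line like "STATE=ONLINE on <member>";
# verify the exact output format against your TruCluster version.

RESOURCE=oracle
NODE=`caa_stat $RESOURCE | awk '/STATE=ONLINE on/ {print $NF}'`

if [ -z "$NODE" ]; then
    echo "Cannot determine which node is running $RESOURCE" >&2
    exit 1
fi

# begin_hot_backup.sql and end_hot_backup.sql are placeholders for your
# own scripts that put each Oracle tablespace into and out of backup mode.
ssh $NODE "su - oracle -c 'sqlplus /nolog @begin_hot_backup.sql'"

# ... run the backup of the cluster alias (cluster-a) here ...

ssh $NODE "su - oracle -c 'sqlplus /nolog @end_hot_backup.sql'"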
As you can see, clusters present a unique challenge to the backup
and recovery folks. Although the topic of high availability is a
very important one, it is well beyond the scope of this article.
I merely wanted to point out that it is the third major area that
can benefit from SAN technology, and that it can bring challenges
as well.
Building the SAN
Figure 1 depicts a theoretical SAN that contains three servers,
three switches, a router, two disk arrays (i.e., "RAID in a box"),
a high-end disk array, and a tape library. How do you go about
building such a beast? The first question is: what do you want to
accomplish with the SAN? That is why I covered how to use a SAN
before explaining how to build one. Once you have answered that
question, you can ask who is going to provide the components you
will need to build your SAN.
You may wonder which vendors the theoretical SAN in Figure 1 uses.
Its components could be from many different vendors, within reason.
Any of the functionality discussed earlier in this article (or in
Part I) is available from at least three vendors, but not necessarily
from all of them [2]. For example, earlier in this article, I described
the concept of a split-mirror backup. Such functionality is available
from Compaq, EMC, Hitachi, and IBM, and I'm sure other vendors offer
it as well. The only differences will be how they implement it and
how much they cost.
Before considering a SAN, you should have a pretty good idea what
you would like it to accomplish. (I hope this article will give
you some ideas.) Once you have turned this idea into a plan, you
should talk to multiple vendors to see how they can help you to
complete your plan and accomplish your goals. For example, several
vendors can sell you OEM versions of every component in Figure 1,
including the servers. (That is, of course, if you are willing to
use their brand of operating system.) Even if you don't want
to use their operating system, you could use your preferred operating
system and buy everything from the host bus adapters (HBAs) to the
high-end disk array from them. This usually makes the job easier
because the different components will have been tested together.
It's amazing what can happen when you don't have tested
components. I can think of one project where we saved backplane space
by plugging a SAN router directly into the HBA of a large UNIX server.
(I will let the router and server manufacturers remain nameless.)
Although there shouldn't have been a problem with this configuration,
there definitely was. The router was expecting one thing, and the
HBA was expecting another. The result was that the UNIX server panicked
as soon as it came up. We made it work by hard-coding the AL_PA
on the router, and this information was added to the support matrix
for the router. I hope you will not be doing what we were doing;
your solution will have already been tested in a lab somewhere.
Zoning
In a SAN like the one in Figure 1, you will probably want to use zoning
so that not all devices are visible to all servers. For example,
you may want to make the high-end disk array visible to Server-B
and Server-C, but not to Server-A. However, you probably will want
the tape library to be visible to all three servers. Zones can help
you accomplish this. You will want to talk to the vendors about
how they make zoning possible.
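To give a feel for what zone configuration looks like, here is a
sketch in the style of a Brocade-class switch CLI. The zone names and
domain,port members are invented for this example, and the exact
command syntax and persistence behavior vary by switch vendor and
firmware, so check your switch documentation rather than typing this
verbatim.

# Sketch of switch-based zoning in the style of a Brocade-class CLI.
# Zone names and domain,port members are invented for this example;
# exact syntax varies by vendor and firmware revision.

zonecreate "highend_disk_zone", "1,2; 1,3; 1,6"       # Server-B, Server-C, high-end array
zonecreate "tape_zone", "1,1; 1,2; 1,3; 1,7"          # all three servers plus the tape router

cfgcreate "san_cfg", "highend_disk_zone; tape_zone"   # collect zones into a configuration
cfgenable "san_cfg"                                   # activate the configuration
cfgsave                                               # make it persistent (where required)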
Once connected to the SAN, with the zones configured the way you
want them, each server can then build the device files necessary
to access the disk and tape drives on the SAN. This can be done
live with many versions of UNIX and NT. For example, the drvconfig,
tapes, and disks commands on Solaris will build the
proper device files without having to reboot. HP-UX uses ioscan
and insf, and NT can be told to search for new devices from
the Control Panel. However, many operating systems try to put a
label on every disk they can see. Therefore, you may wish to use
zoning to protect disks from other operating systems. For example,
don't let your NT box see the disks that you intend to use
for Solaris!
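For example, on a Solaris host the following commands (run as root)
should make newly zoned disk and tape devices appear without a reboot,
and ioscan and insf play the same role on HP-UX. The device paths
mentioned in the comments are typical locations, not guaranteed names.

# Solaris: discover new SAN devices and build their device files (run as root).
drvconfig            # rebuild the kernel device tree
disks                # create /dev/dsk and /dev/rdsk entries for new disks
tapes                # create /dev/rmt entries for new tape drives

# HP-UX equivalents:
ioscan -fnC disk     # probe for new disk devices
ioscan -fnC tape     # probe for new tape devices
insf -e              # install device files for anything newly found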
A typical installation procedure would then include using native
backup commands to test connectivity to each tape and disk drive.
For UNIX, test the tape drives with dump, tar, and
cpio, and use NTBACKUP for NT. To test connectivity
to the disk drives, use dd on UNIX, and disk manager on NT.
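A minimal sketch of such a test from a UNIX host might look like the
following. The device names are typical Solaris examples and will
differ on your system, and the dd read is limited to a few megabytes
so it is quick and non-destructive.

# Test a SAN-attached tape drive: write a small tar archive and read it back.
# /dev/rmt/0n is a typical Solaris no-rewind tape device; adjust as needed.
tar cvf /dev/rmt/0n /etc/hosts
mt -f /dev/rmt/0n rewind
tar tvf /dev/rmt/0n

# Test read connectivity to a SAN-attached disk (read-only, first 10 MB).
# /dev/rdsk/c2t1d0s2 is an example device name; use format(1M) to list yours.
dd if=/dev/rdsk/c2t1d0s2 of=/dev/null bs=1024k count=10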
You may find that all devices do not show up on the first try, and
you may need to make modifications to the SAN or HBA configurations
to allow all devices to be seen. This step is probably the hardest.
Once all devices are visible in all the right places, the rest is
downhill. A good VAR or consultant can be of great help during
this phase. After this phase is complete, you can install whatever
SAN-management software products that you purchased. These products
will help you manage the SAN and allocate the resources that it
has available. There are a number of products in this arena that
are now emerging, and I may revisit these products later with a
review.
This brings my series on storage area networks to a close. It's
difficult to fit this much information into a few thousand words,
but I hope you have found it helpful. My upcoming O'Reilly
& Associates book, SAN & NAS Backup & Recovery,
focuses on the topics that I have covered in the last two articles,
and explores the concept of network-attached storage (NAS), and
the unique challenges that it brings to the backup and recovery
table. Remember -- back it up or give it up!
[1] I'm well aware that some refer to what I describe above as
serverless backup. I refer to that as clientless backup, because the
data is not going through the backup client that is using the data;
it is still going through the backup server. It's a fine line, but
I'll walk it for now.
[2] That is, with the exception of third-party copy. This is an
emerging technology that is not yet provided by all vendors.
W. Curtis Preston has specialized in storage for over eight
years, and has designed and implemented storage systems for several
Fortune 100 companies. He is the owner of Storage Designs, the Webmaster
of Backup Central (http://www.backupcentral.com), and the
author of two books on storage. He may be reached at curtis@backupcentral.com.
(Portions of some articles may be excerpted from Curtis's books.)