Building and Using a SAN: Part I
W. Curtis Preston
In the past few months, I've discussed many things about
storage area networks. I first talked about what life has become
like for those who do not have a SAN, and how a SAN could make things
a whole lot better. Then I talked about Fibre Channel, the underlying
technology behind SANs (see Figure 1). I followed that with a summary
of the different building blocks used to build a SAN, which
include the following elements:
Servers -- The servers are what give a SAN its purpose. Each
server will connect to the SAN by one or more Fibre Channel connections.
Host Bus Adaptors (HBAs) -- An HBA is the proper term for
what is commonly called a "Fibre Channel card." The Fibre
Channel cable connects to the server via the HBA, which resides
in the server.
Hubs -- By connecting a server to a hub, you are connecting
it to an arbitrated loop. An arbitrated loop can contain only 126
devices, but a public loop can be connected to a switched fabric,
just as a shared Ethernet network can be connected to a switched
Ethernet backbone by connecting the hub into a port on a switch.
Switches -- In a pure switched fabric topology, each server
will connect to the SAN via its own port on a Fibre Channel switch.
Multiple switches can be cascaded together to create larger SANs,
up to 16 million nodes.
Routers -- A router (sometimes referred to as a bridge) bridges
the gap between serial SCSI (i.e., Fibre Channel) and parallel SCSI,
allowing you to connect legacy SCSI devices, such as tape drives
and older disk drives, to your SAN.
Disk Systems -- Disk systems come in all shapes and sizes,
including single Fibre Channel disks, JBOD arrays, RAID in a box,
and enterprise-level storage arrays. With the proper hardware and
software, disk systems connected to a SAN can be much easier to
manage, allocate, and repair.
Tape Systems -- Connecting tape systems to a SAN (usually
via a SAN router) allows you to share these tape systems with multiple
servers, significantly increasing the usability of large tape libraries.
Cabling -- Although it may seem obvious, the cabling is an
essential part of the SAN. Most Fibre Channel cabling uses fiber
optic cables, although copper cables can be used where distances
are shorter and cost is an issue. One interesting difference between
fiber optic cables and legacy SCSI cables is that, although fiber
optic cables may be a bit more fragile, they either work or they
don't. Think about the number of times you held a SCSI cable that
looked fine but performed poorly, and you should be able to
appreciate this difference!
Software -- Without software, the connectivity made possible
by SANs would quickly result in chaos. Software does a number of
things, including controlling access to the hundreds or thousands
of devices connected to the SAN.
Using the SAN
Why are we talking about how to use the SAN before talking about
how to build the SAN? The answer to that question is simple. Until
you know what you can do with a SAN, you shouldn't be building
one in the first place! Therefore, I'm going to talk about
the things that a SAN makes possible to give you some ideas of how
you might use one to make your life easier. Then, I'll explain
how to make these things a reality by actually buying and building
the SAN.
There are essentially three things that the SAN in Figure 1 makes
possible:
1. Offline storage consolidation
2. Online storage consolidation
3. Truly highly available systems
(This is speaking very broadly, of course.) Offline storage consolidation
is just a different way of saying that you can share a tape library
between multiple servers. Online storage consolidation allows you
to create a pool of disk systems that can be allocated to the servers
that need them. Because a SAN allows for multiple servers to access
multiple physical disks via multiple physical paths, it also allows
for truly highly available systems. Because this lost+found
column is dedicated to backup and recovery, I will cover the topic
of offline storage consolidation first. Also, when covering the
other two topics, I will concentrate on the ways in which they benefit
or affect backup and recovery.
Offline Storage Consolidation
In the upper right-hand corner of Figure 1, you can see a tape
library connected to the SAN via a SAN router. Assuming that there
is a zone that contains the tape library and the three servers,
Server-A, Server-B, and Server-C will all see each of the tape drives
in that tape library as if they were physically attached. In this
configuration, the robotic device is also connected to one of the
SCSI buses on the back of the SAN router. This makes the robotic
device also visible to all three servers. Each server can then
build device files for the drives and the robot, just as it would
for a parallel SCSI-attached tape drive or robotic device.
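To make this concrete, here is a minimal Python sketch of how zoning
controls visibility. The zone and port names are hypothetical, and
real zoning is configured on the switch rather than in application
code; the point is simply that a device is visible to a server only
if some zone contains both of them.

# Minimal sketch of SAN zoning; all port names are hypothetical.
# A zone is a named set of ports; a target is visible to an
# initiator only if some zone contains both.

zones = {
    "tape-zone": {
        "server-a-hba", "server-b-hba", "server-c-hba",  # server HBAs
        "router-port",          # SAN router fronting the tape library
    },
}

def can_see(initiator: str, target: str) -> bool:
    """True if any zone contains both the initiator and the target."""
    return any(initiator in z and target in z for z in zones.values())

# All three servers see the router (and the drives and robot behind it):
for hba in ("server-a-hba", "server-b-hba", "server-c-hba"):
    print(hba, "->", can_see(hba, "router-port"))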
Now what do we do? Each server can now attempt to take control of
both the robotics and the tape drives, and if each server actually
did so independently, chaos would result.
This is where the robot and drive-sharing software from your backup
software vendor comes into play. It will act as a "traffic
cop," allocating tape drives to the servers that need them.
How this actually works varies greatly from vendor to vendor, but
the description below is valid for several of the large vendors.
One server is configured to be the main server (e.g., NetWorker
server, or NetBackup Master Server). It will hold the backup database
and scheduling information. (The backup database will keep track
of what has been backed up to what tape.) This server is also typically
dedicated as the robotic control server for the robot in the tape
library. (This will cause it to act as the "traffic cop"
discussed above.) Each of the other two servers is then configured
as a media server (e.g., NetWorker Storage Node or NetBackup Media
Server). Each drive within the tape library is then configured as
a "virtual" drive that each server must request to use.
When one of the servers needs to back up or recover, it requests
a drive from the main server. If the main server has an available
tape drive, it may immediately assign that drive for use by the
requesting server. Once that has been done, the drive will no longer
be available to other servers until it has been released by the
server that is using it. If a server requests a drive when none
are available, it will be told to wait until a drive becomes
available. When a server finishes with its drive, it tells the
main server that it no longer needs the drive, and the main server
can then allocate that drive to another server that is waiting for
one.
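The following Python sketch illustrates this request/allocate/release
protocol. The class and method names here are hypothetical; products
such as NetWorker and NetBackup implement this logic internally, but
the bookkeeping looks roughly like this:

from collections import deque

# Minimal sketch of the "traffic cop" drive-allocation logic
# described above; all names are hypothetical.

class MainServer:
    def __init__(self, drives):
        self.free = list(drives)    # drives available for allocation
        self.in_use = {}            # drive -> media server using it
        self.waiting = deque()      # media servers waiting for a drive

    def request_drive(self, server):
        """A media server asks for a drive; allocate one or queue it."""
        if self.free:
            drive = self.free.pop()
            self.in_use[drive] = server
            return drive
        self.waiting.append(server)  # no drive free: server must wait
        return None

    def release_drive(self, drive):
        """A media server is done; pass the drive to the next waiter."""
        if self.waiting:
            self.in_use[drive] = self.waiting.popleft()
        else:
            del self.in_use[drive]
            self.free.append(drive)

main = MainServer(["drive-1", "drive-2"])
print(main.request_drive("server-b"))  # gets a drive immediately
print(main.request_drive("server-c"))  # gets the other drive
print(main.request_drive("server-a"))  # None: must wait for a release
main.release_drive("drive-1")          # server-c done; server-a gets it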
As you can see, this setup results in dynamic allocation of tape
drives to the servers that need them. It also allows each server
connected to the SAN to back up its entire data set directly to
its own tape drive. It can do this without transferring the data
via the LAN, and without forcing the administrator to allocate one
or more tape drives to its exclusive use.
This setup allows you to really maximize the return on investment
for tape libraries. Instead of buying many smaller libraries and
connecting a few drives to each server, you can buy one or two really
large tape libraries, and dynamically allocate the tape drives to
the servers that need them. What a savings! By buying two libraries
and connecting them both to the SAN, you even increase the availability
of tape drives to all servers. In the pre-SAN days, a given media
server could only rely on the tape library that was physically attached
to it. Now, every time you buy a new library, you can add a new
level of availability to your backup system, because every server
can use every library.
Online Storage Consolidation
Many large data centers have thousands of discrete disks spread
among all of their servers, a situation with many disadvantages
that a SAN can solve. When one disk fails, the server must be
taken down to repair it. Although this can be solved by buying RAID
arrays with hot-swappable disks and at least one hot spare for each
array, allocating a hot spare for each array can be expensive. Also,
if the hot spare has already been used and has not been replaced,
you cannot automatically use the spare from one server to fix a
bad drive in another server. You wouldn't even want to do this
manually, because doing so would degrade the integrity of the server
from which you borrowed the drive.
Consider aging or decrepit disk drives. I remember one client
that had an entire set of servers that could not afford down time.
Behind these servers were disk drives with a known problem that
wasn't discovered until the servers had been in production
for a long time. The drives would occasionally leak oil onto the
disk, rendering it useless. The vendor was offering free replacements,
but the swap took over a year due to the downtime and hassle it
required. Although this is similar to the last problem,
it's slightly different. Using discrete disks does not easily
allow for preventive replacement of drives that you know are old
or potentially faulty.
Another manageability issue is space allocation. It is common
to have one server starving for disk space while another has disks
to spare. I know of one very large data center where they had individual
servers that were out of space, but the entire data center was actually
only using 10% of its total capacity. They had over 30 TB of available
spinning disk, but couldn't allocate a single disk to the servers
that needed one. Doing so would have required physically moving
a disk from one server to another, with downtime for both servers.
Putting your disks behind a SAN can solve all three of these problems
as well as others. Consider the first two problems. One way to solve
them would be to put all of your storage on a rather large storage
array behind the SAN. This array could have redundant power supplies,
paths, and disks, and it could also contain a small pool of hot
spare disks, which requires far fewer disks than a dedicated hot
spare for each system. When any disk in the array fails, the array
can replace it automatically with a spare and notify you that the
bad drive needs to be replaced. The bad drive could then be replaced
online. This is the "hardware" solution, and it is the most
tested method of solving these problems.
However, you don't have to have an expensive array. You could
simply place your JBOD disk behind your SAN, and use enterprise
volume management software to manage your disks and hot spares.
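Whether this pooling lives in array firmware or in volume management
software, the essential logic is a single pool of spares that any
RAID group can draw from. Here is a minimal Python sketch, with
hypothetical disk and volume names:

# Shared hot-spare pool sketch; all disk/volume names are hypothetical.
spares = ["spare-1", "spare-2"]    # one small pool for the whole array

raid_groups = {
    "server-a-vol": ["disk-1", "disk-2", "disk-3"],
    "server-b-vol": ["disk-4", "disk-5", "disk-6"],
}

def disk_failed(group: str, disk: str) -> None:
    """Swap a failed member for a pooled spare; flag it for replacement."""
    if not spares:
        raise RuntimeError("spare pool exhausted: replace failed disks!")
    spare = spares.pop()
    members = raid_groups[group]
    members[members.index(disk)] = spare  # a real array would now rebuild
    print(f"{disk} in {group} replaced by {spare}; swap out {disk} online")

disk_failed("server-a-vol", "disk-2")  # any group can draw from the pool
disk_failed("server-b-vol", "disk-6")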
Either of these solutions also makes proactive maintenance much
easier. You simply buy the new disk, connect it to the array (or
the SAN), and use the appropriate volume manager to make the new
disks active and the old disks inactive. The old or failing disks
can then be moved out of service.
Perhaps the greatest benefit of consolidating your storage behind
a SAN is dynamic space allocation.
Just as tape drives can be allocated to the servers that need them,
disk drives can be allocated to just the servers that need more
storage. If you find that you've allocated too much storage
for a given server, simply deallocate that storage from that server,
and return it to the pool of storage that is available for servers
that need it. If a given server is out of storage, and you don't
have any more disk available in the spare storage pool, you can
just buy more disk, and attach it to the SAN! Buying disks this
way would also allow you to buy bigger arrays that are cheaper per
GB. If you were using discrete disks and only needed 10 GB, you
might be able to justify a 20-GB array, but not a 200-GB array,
even though the larger array would be much cheaper per GB; someone
would ask you to prove that the server will soon need 200 GB.
However, if you were putting that 200 GB behind the SAN, allowing
any server to use it when needed, you could do just that! It is
much easier to justify that the entire data center will soon need
200 new GB of disk.
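To illustrate the economics with purely hypothetical prices (real
numbers will vary), the per-GB arithmetic looks like this:

# Hypothetical prices, for illustration only.
small_array = {"capacity_gb": 20, "price_usd": 4000}    # per-server array
large_array = {"capacity_gb": 200, "price_usd": 20000}  # shared via SAN

for name, a in (("20-GB array", small_array), ("200-GB array", large_array)):
    print(f"{name}: ${a['price_usd'] / a['capacity_gb']:.0f}/GB")

# The 20-GB array works out to $200/GB; the 200-GB array to $100/GB.
# Only the SAN lets the whole data center, rather than a single
# server, absorb (and justify) the larger purchase.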
In Part II of "Building and Using a SAN," I will cover:
- How online storage consolidation simplifies backups.
- How SANs make creating highly available systems easier.
- What's involved in building a SAN.
W. Curtis Preston has specialized in storage for over eight years,
and has designed and implemented storage systems for several Fortune
100 companies. He is the owner of Storage Designs, the Webmaster of
Backup Central (http://www.backupcentral.com), and the
author of two books on storage. He may be reached at curtis@backupcentral.com.
(Portions of some articles may be excerpted from Curtis's books.)