W. Curtis Preston
Before I get into this month's topic, I'll review what
I've covered so far. In the first article of this series, I
discussed the reasons that SANs exist. Because this column is dedicated
mainly to backup and recovery, I covered the ways that SANs make
backup and recovery easier. The second and third articles in this
series explained the basics of Fibre Channel, starting with Fibre
Channel's advantages over parallel SCSI.
Although I did not use the term parallel SCSI in previous articles,
I'd like to introduce it now. Since SCSI refers to both the
physical medium and the protocol, we need a term that refers to
"traditional" SCSI. In traditional (i.e., bus-attached)
SCSI, SCSI data travels over several conductors in parallel. (SCSI
cables range from 50 to 80 conductors in a single cable.) Therefore,
I will refer to traditional SCSI as parallel SCSI.
In contrast, you will remember that Fibre Channel has only two
conductors, one for transmitting and one for receiving. If SCSI
traffic was designed to travel across several conductors in parallel,
how can Fibre Channel carry SCSI data? The answer is the new SCSI-3
specification for the SCSI architecture, which is very different
from its predecessors. One of the main differences is that the SCSI-1
and SCSI-2 specifications were each laid out in a single document,
whereas the SCSI-3 specification consists of more than 30 documents
describing a multi-layered architecture.
This layered approach allowed other layers to be specified, as long
as each new layer followed the communication specifications of the
existing layers above and below it. The Fibre Channel protocol
specifications (FCP and FCP-2) were then added to the SCSI-3
specification. (Read more
about this at http://www.t10.org. T10 is the ANSI committee
that develops the SCSI standard.) Since Fibre Channel SCSI traffic
travels across only two wires (transmit and receive), we could refer
to it as serial SCSI. (To avoid confusion, I will only use this
term when necessary, and only when comparing Fibre Channel to parallel
SCSI.) Fibre Channel solves a number of problems with parallel SCSI:
Address limitations -- Parallel SCSI is limited to
15 devices per HBA. Fibre Channel can address 16 million devices
per HBA in a switched fabric configuration.
Logistical limitations -- You cannot easily share storage
resources via parallel SCSI. Although you can connect multiple hosts
to a single SCSI bus, it is quite complicated to do so, and many
operating systems do not support it. Fibre Channel, on the other
hand, can connect hundreds or thousands of hosts to the same bus,
allowing all of them to share the same storage resource. Fibre Channel
devices can also be located up to 10 km apart, whereas parallel
SCSI is limited to only 25 meters.
Speed limitations -- With 640-MB/s parallel SCSI in
development, this limitation is less severe than it once was. However,
faster Fibre Channel speeds are in development as well, and Fibre
Channel can also be aggregated, whereas parallel SCSI cannot. In other
words, you can "trunk" several 100-MB/s connections together, giving
much more throughput than is possible with one connection.
Backup/restore limitations -- Being able to back up
large amounts of data without using the LAN is one of the main advantages
to Fibre Channel and SANs. This is because of what can be done when
a single device is accessible by more than one computer, which is
really only possible with Fibre Channel.
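Two of the numbers above can be sketched with a quick back-of-the-envelope calculation. The 16-million-device figure comes from the 24-bit address used in a switched fabric, and the benefit of trunking is simple multiplication (the four-link configuration below is a hypothetical example, not a specific product's limit):

```python
# A switched-fabric Fibre Channel address is 24 bits wide,
# which is where the "16 million devices" figure comes from.
fabric_addresses = 2 ** 24
print(fabric_addresses)                 # 16777216

# Trunking: aggregating four 100-MB/s links (a hypothetical
# configuration) multiplies the available throughput.
link_speed_mb_s = 100
trunked_links = 4
print(link_speed_mb_s * trunked_links)  # 400 (MB/s)
```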
I also covered the three different topologies of Fibre Channel
(point-to-point, arbitrated loop, and switched fabric), and explained
that switched fabric is the most expensive topology, but also the
fastest. Switched fabric is also becoming more cost-competitive
with arbitrated loop. Therefore,
one should seriously consider installing a new SAN using switched
fabric, if at all possible.
The Building Blocks
Now that I've reviewed why we are here, and how the different
SAN elements communicate, I will discuss the elements of a SAN,
and how they work together. The main elements of a SAN are servers,
HBAs, switches, hubs, routers, disk systems, tape systems, cabling,
and software. These are all illustrated in Figure 1.
Servers
No storage area network would have any reason for being if there
weren't servers connected to it. The servers will use the SAN
to share storage resources.
Host Bus Adapters (HBAs)
Servers connect to the SAN via their Host Bus Adapter, or HBA.
This is often referred to as a "Fibre Channel card," or
"Fibre Channel NIC." It is simply the SAN equivalent of
a SCSI card (i.e., a SCSI HBA). Some HBAs may use fiber, and some
HBAs may use copper. Regardless of the physical layer, the HBA is
what is used to connect the servers to the SAN.
In Figure 1, you will see that two of the servers actually have
two connections to the SAN via two HBAs. Although this configuration
is quite common, I should explain that using Fibre Channel with
multiple connections does not, by itself, give you redundant paths
to a given device. There are several reasons for this. Notice in
Figure 1 that, since the storage resources
are not connected to multiple switches, each path coming out of
each server can only access one device. Another reason that you
may not have redundant paths is because of the limitations of the
drivers. Please realize that Fibre Channel disks are simply running
the SCSI-3 protocol that has been adapted to work in a serial architecture
(see above). Since SCSI was historically written to understand one
device that plugs into one bus, it is understandable that SCSI-3
does not understand the concept of a storage device that could appear
on more than one HBA. Therefore, if you would like to have redundant
paths, you will also need some sort of redundant pathing software.
Its job is to stand between the kernel and the SAN, so that requests
for storage resources are monitored, and directed to the appropriate
path. (These software products are covered again later in this article.)
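The job of redundant-pathing software can be sketched in a few lines. This is an illustrative model only, assuming made-up names (`Path`, `send_io`) rather than any real driver API: each I/O request is intercepted and directed down a path that is still alive.

```python
# Hypothetical sketch of redundant-pathing software: it sits between
# the kernel's I/O requests and the HBAs, and redirects each request
# to a path that is still healthy.

class Path:
    def __init__(self, hba, healthy=True):
        self.hba = hba
        self.healthy = healthy

def send_io(paths, request):
    """Send one I/O request down the first healthy path."""
    for path in paths:
        if path.healthy:
            return f"{request} via {path.hba}"
    raise IOError("no healthy path to device")

paths = [Path("hba0"), Path("hba1")]
print(send_io(paths, "read block 42"))   # read block 42 via hba0

paths[0].healthy = False                 # primary path fails
print(send_io(paths, "read block 42"))   # read block 42 via hba1
```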
Note that there are two main types of lasers in today's HBAs:
OFC and Non-OFC. OFC (optical fiber control) devices use a hand-shaking
method to ensure that they do not transmit a laser pulse if there
is nothing connecting to the HBA. (This is for safety reasons, since
a high-powered laser can cause permanent damage to your eyesight.)
Non-OFC devices employ no such handshaking, and will transmit even
if a device is not connected. Believe it or not, Non-OFC devices
are actually quite common, due to the cost associated with making
an OFC device. Therefore, please do not look directly into an HBA.
You may regret it!
Switches
Figure 1 shows two servers connected to two switches. Remember
that when you are connecting to a switch, you are using the switched
fabric topology -- not the arbitrated loop topology. (That is,
unless the device that you are connecting to the switch does not
support fabric login. If the switch supports arbitrated loop, it
will create a private arbitrated loop on the port to which you connect
that device.)
Switches are "intelligent," and have many possible configurations.
Using software provided by the switch vendor, you could create zones
that allow only certain servers to see certain resources. This configuration
is usually done via a serial or Ethernet interface.
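Conceptually, a zone is just a named set of ports, and a server can only "see" storage that shares a zone with it. The sketch below is an illustrative model of that idea; the zone and device names are invented for the example and do not reflect any particular switch vendor's syntax.

```python
# Illustrative model of switch zoning: membership in a common zone
# determines which storage a server is allowed to see.

zones = {
    "backup_zone": {"server_a", "tape_router"},
    "db_zone":     {"server_b", "disk_array"},
}

def can_see(initiator, target):
    """True if the initiator and target share at least one zone."""
    return any(initiator in members and target in members
               for members in zones.values())

print(can_see("server_a", "tape_router"))  # True
print(can_see("server_a", "disk_array"))   # False
```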
The configuration interface brings up an interesting and important topic -- security.
One major difference between parallel SCSI and Fibre Channel is
that most Fibre Channel devices have an RJ-45 port, allowing you
to connect your SAN devices to a LAN. This setup allows for much
easier configuration than what is possible through the serial interface.
It also allows your SAN devices to be monitored via SNMP-capable
monitoring software. However, it also opens a major security hole.
If you simply connect the RJ-45 port of your SAN devices directly
to your corporate LAN (or even worse, the Internet), then you have
a new way that "black hats" can take down your enterprise.
I suggest placing all LAN connections for SAN devices on a separate,
well-protected LAN. To do otherwise is to invite disaster. Also,
remember to change the default administrator passwords on these
devices.
Another interesting security ramification of SANs is the configuration
software that runs on servers connected to the SANs. Depending on
the product and its capabilities, a black hat breaking into the
wrong box can also wreak havoc on your SAN. Ask your configuration
software vendor how you can protect yourself from such a disaster.
The vendor will probably tell you to limit the number of boxes that
run the configuration software and to isolate them on a separate
LAN and secure them as much as possible. It's tempting to put
your management and configuration software in multiple places, because
it makes management of the SAN much easier; however, think about
the security implications before doing so!
Hubs
Hubs only understand the arbitrated loop topology. When you connect
a device to a hub, it will cause the arbitrated loop that the hub
is managing to re-initialize. The device will be assigned an AL_PA
(arbitrated loop physical address), and it will begin arbitration
when it needs to communicate with another device on the loop.
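Arbitration on a loop works on priority: when more than one device arbitrates at once, the device with the numerically lowest AL_PA (the highest priority) wins. The sketch below illustrates that rule; the AL_PA values shown are arbitrary examples, not the specific reserved values defined in the FC-AL specification.

```python
# Sketch of loop arbitration: every device on an arbitrated loop
# holds an AL_PA, and the lowest AL_PA among the devices currently
# arbitrating wins access to the loop.

def arbitrate(requesters):
    """Return the AL_PA that wins arbitration."""
    return min(requesters)

loop_members = [0x01, 0x23, 0xE8]    # example AL_PAs on the loop

print(hex(arbitrate([0x23, 0xE8])))  # 0x23 beats 0xE8
print(hex(arbitrate(loop_members)))  # 0x1 has the highest priority
```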
There are managed (i.e., "smart") hubs and unmanaged
(i.e., "dumb") hubs. An unmanaged hub is unable to close
the loop when a device on the loop is malfunctioning; therefore,
a single bad device can disable the entire loop. A managed hub could
detect the bad device and remove it from the loop, allowing the
rest of the loop to function normally. Although there are plenty
of unmanaged hubs available, the cost difference between managed
and unmanaged hubs is minimal, and the functionality difference
between them is quite great. When considering a new SAN, one should
also consider whether a hub is even appropriate. The cost difference
between hubs and switches gets smaller and smaller every day, and
the functionality difference is even greater than the difference
between managed and unmanaged hubs. Since arbitrated loop is cheaper
than fabric, I have seen a number of sites build SANs based on hubs
and arbitrated loop -- only to rip out the hubs and replace
them with switches a year or two later. Consider purchasing a switch
if at all possible.
Routers and Bridges
There are two different types of routers. The first is one that
is sometimes referred to as a bridge (1) and is what is depicted
in Figure 1. This type of router converts the serial data stream
into a parallel data stream, and vice versa. It allows you to plug
parallel SCSI devices, such as tape and optical drives, into your
SAN. Once you have done so, you can share them just as you would
share a device that speaks serial SCSI natively. That is why, in
Figure 1, you see a tape library connected to the SAN via a router.
The second type of router (not pictured) goes between the HBAs
and the switches. This type of router can actually route traffic
based on load, and finds alternate paths when necessary. This is
a relatively new type of router.
Disk Systems
As shown in Figure 1, disk systems come in many shapes and sizes.
While many people think it is necessary to buy a high-end disk array
to enter the SAN space, there are two other types of disk systems
on the SAN in Figure 1. The first is a "disk array," sometimes
referred to as "RAID in a box." These types of arrays
can typically be configured as a RAID 0+1, RAID 1+0, or RAID 5 array,
and can present the disks to the SAN in a number of ways. They will
often automatically pick a hot spare and perform other tasks that
JBOD just can't do.
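One concrete thing a "RAID in a box" array does that JBOD cannot is maintain parity, so that a failed disk's data can be rebuilt from the survivors. RAID 5 parity is a byte-wise XOR across the data stripes; this minimal sketch (with made-up two-byte stripes) reconstructs one lost stripe:

```python
# RAID 5 parity in miniature: parity is the XOR of the data stripes,
# so XORing the surviving stripes with the parity recovers the
# stripe on the failed disk.

disks = [b"\x10\x20", b"\x0f\x0f", b"\x33\x44"]  # three data stripes

def xor_stripes(stripes):
    out = bytes(len(stripes[0]))
    for s in stripes:
        out = bytes(a ^ b for a, b in zip(out, s))
    return out

parity = xor_stripes(disks)            # what the array stores

lost = disks[1]                        # pretend disk 1 fails
survivors = [disks[0], disks[2], parity]
print(xor_stripes(survivors) == lost)  # True
```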
The second main type of disk system is JBOD, which stands for
Just a Bunch Of Disks. These disks would either be parallel SCSI
disks plugged into a SAN router, or Fibre Channel disks plugged
directly into the switch. You can also plug several JBOD disks into
a hub, and then plug the hub into the switch. This is a more cost-effective
way to plug several smaller disks into the SAN. However, as discussed
above, you should perform a cost-benefit analysis when deciding
whether to just plug the disks into the switch, or to plug them
into a hub that gets plugged into the switch.
The final type of disk system is the high-end disk array. These
typically offer significant advantages over JBOD or "RAID in
a box" systems, but they do cost quite a bit more than the
other systems. Features that may be available in such systems are:
- Creation of additional mirrors that can be split for backup
- Proactive monitoring and notification of failed (or failing) disks
- Multiple server connections (32, 64, or more servers connected
to a single array)
- Internal zoning capabilities
- Multi-pathing and failover software
Although some of these features may be available in the "RAID
in a box" products, a high-end array will probably offer all
of them in one box.
Cabling
Although cabling is often overlooked in discussions about SAN
architecture, it's obviously a very important part of the system.
These cables are typically fiber optic cables with SC connectors.
(This is the same type of cable used for Gigabit Ethernet.) As
discussed in previous articles, there are also DB9-style connectors,
which are less expensive, and may be more appropriate for some environments.
Please remember that fiber optic cables are very fragile, and should
be treated as such. One characteristic I have heard described more
than once as an advantage over SCSI is that fiber optic cables either
work, or they don't: either no data gets through, or all the data
gets through. In contrast, a SCSI cable may work fine under some
conditions, but not under others.
SAN Software
There are many products in this category, and this is one of the
fastest growing areas of SAN products. Among other things, these
products offer the following features:
Protocol Conversion
Suppose you'd like to address SSA disks (2) and Fibre Channel
disks from a single host. A product offering protocol conversion
could make this happen.
Zoning
Zoning is a very important aspect of SANs. Without zoning, every
host connected to the SAN can read and write to every disk in the
SAN. By separating the servers and disks into zones, you solve this
problem.
Failover
Suppose you do have multiple Fibre Channel paths to the same device.
By default, Fibre Channel will not use one of those links as a failover
link if the other one fails. Software can make this happen.
Load Balancing
This is very similar to the failover feature. If you have multiple
paths to a single storage resource, wouldn't it be nice to
distribute the load between those paths? Often this is combined
with the failover feature, where traffic will be load balanced during
normal operations, but will failover in case of device failure.
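The combination described above can be sketched as a round-robin picker that skips failed paths: requests rotate across the healthy paths during normal operation, and a failed path simply drops out of the rotation. The path names and the `make_balancer` helper below are illustrative only, not a real product's API.

```python
# Load balancing plus failover: rotate I/O round-robin across
# whichever paths are currently healthy.

from itertools import count

def make_balancer(paths):
    turn = count()
    def pick(healthy):
        live = [p for p in paths if p in healthy]
        if not live:
            raise IOError("all paths down")
        return live[next(turn) % len(live)]
    return pick

pick = make_balancer(["hba0", "hba1"])
print(pick({"hba0", "hba1"}))  # hba0
print(pick({"hba0", "hba1"}))  # hba1
print(pick({"hba1"}))          # hba1 (hba0 has dropped out)
```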
There's So Much More
There is so much more to cover in an article about SAN building
blocks. For one thing, you will notice that I have hardly mentioned
any vendors' names. The reason for that is that the SAN industry
is moving very fast right now. Given that this article is actually
written several months before you see it, it would be out of date
before it gets to you. Therefore, please go to:
http://www.backupcentral.com/hardware-san.html for an updated list
of SAN vendors, separated by which element(s) of the SAN they can
provide. (Please let
me know if I am missing anyone!)
In the coming months, I will explore what you can do with a SAN,
including backup and recovery, storage consolidation, and high-availability
applications. I'll see you soon!
1 In my opinion, router is a more appropriate name. A bridge communicates
on Layer 2 and a router on Layer 3. When mapped to the OSI model,
the Fibre Channel specification puts SCSI at Layer 3. The vendor
that owns the lion's share of the market (Crossroads) calls its
products routers as well.
2 Serial Storage Architecture. This is a competing architecture
to Fibre Channel, but it has been around for a while and has not
gained much acceptance. That is not to say that there aren't
SSA devices out there, though!
W. Curtis Preston is a principal consultant at Collective Technologies
(http://www.colltech.com), and has specialized in designing
and implementing enterprise backup systems for Fortune 500 companies
for more than 7 years. Portions of this article are excerpted from
his O'Reilly book UNIX Backup & Recovery. Curtis
may be reached at: email@example.com.