Storage Consolidation -- Part 2 -- Product Selection
Peter Baer Galvin
Last month, the Solaris Companion covered the reasons to consider
storage consolidation. A storage evaluation can reveal short- and
long-term savings in cost as well as management time. The needs-analysis
phase leads to a design for a consolidated storage facility. This
month, I'll discuss product selection. There is quite a bit
to consider about storage arrays, SAN switches, and tape drives
and libraries. The wrong choices among those options could cause
serious pain in your SAN down the road.
Product Selection
Our theoretical design from last month calls for a storage SAN
with network attached storage (NAS) interfaces. This design allows
for servers needing maximum performance and dedicated storage to
attach to the storage via SAN switches. Hosts needing to share file
systems, or that are not capable of SAN attachment, can take advantage
of the SAN through the NAS heads. All of the storage is thus consolidated
into a set of RAID arrays that can be allocated to SAN-attached
or NAS-using servers. Further, backups use the SAN switching infrastructure
to allow hosts to copy their data from their allocated storage to
the tape libraries without using the TCP/IP network to carry the
data. Finally, replication of data is done from the production site
to the disaster recovery site via storage array-based or host-based
replication.
As it turns out, the design is one of the easier steps of a storage
consolidation project. It is based on existing storage and future
storage needs. Product selection, on the other hand, is all about
details that demand attention.
Array Selection
The design provides the requirements for storage arrays. Selecting
an array that meets those requirements may seem easy, but when intangibles
such as future maintenance costs and disk drive evolution are weighed,
the choice gets more complicated. Add to that the tangibles of cost
and scalability, and the choices narrow further.
Performance analysis of array choices leads to another quandary.
There are no industry-standard storage benchmarks. Some companies
choose to believe vendor performance information, which of course
conflicts with claims by competing vendors. However, quite a bit
can be learned from the details of the storage array architecture.
Table 1 shows some technical array criteria, with choices sorted
in order of preference.
The choice of spindle size and RAID level is a tricky one, but
the more choices provided by the array, the more you can tune the
array to suit your needs. For example, to maximize performance of
a database, 36-GB drives with RAID 1+0 or 0+1 would be best. For
a data warehouse, 144-GB drives with RAID 5 might be the best choice.
An array that provides a choice of disk sizes and a choice of RAID
levels and allows for them to be intermixed within the array will
provide the best solution. Note that disks can vary in more than
just size. Seek times, latencies, transfer rates, interface (e.g.,
dual-active fibre vs. dual fibre with one active), cost, and even
MTBF can vary along with drive size, even within a single array. All of
those aspects must be considered when comparing arrays.
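To make that trade-off concrete, here is a back-of-the-envelope sketch in Python. The spindle counts, drive sizes, and the classic write-penalty rule of thumb are illustrative assumptions, not figures from any particular array:

def raid10_usable(drives, size_gb):
    # RAID 1+0: half of the spindles hold mirror copies
    return drives // 2 * size_gb

def raid5_usable(drives, size_gb):
    # RAID 5: one spindle's worth of capacity holds parity
    return (drives - 1) * size_gb

def write_penalty(raid_level):
    # back-end I/Os generated per host write (classic rule of thumb)
    return {"1+0": 2, "5": 4}[raid_level]

# Database-style group: eight 36-GB spindles, mirrored and striped.
print(raid10_usable(8, 36), "GB usable,", write_penalty("1+0"), "back-end I/Os per write")

# Data-warehouse group: eight 144-GB spindles, parity protected.
print(raid5_usable(8, 144), "GB usable,", write_penalty("5"), "back-end I/Os per write")

With these assumed numbers, the mirrored group yields 144 GB of write-friendly space while the parity group yields 1008 GB at a higher per-write cost, which is exactly the database versus data-warehouse trade-off described above.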
The "paper-based" comparison is a useful exercise, but
it can only tell so much. How manageable is the array? How well
does the software work? Does the actual performance bear out what
is on paper? These questions can only be answered via hands-on experience.
If the choice is down to two or three competing technologies, and
time, effort, and facilities are available, consider performing a
"bake-off" between them. A bake-off can be an effective
means of judging not only performance, but also manageability and
general livability.
Unfortunately, performing such a bake-off is never as easy as
it seems. Benchmarking requires load-generation machines, loaned
or rented SAN infrastructure to test, and space and effort to set
up the test environment. It also requires a controlled environment
and knowledge of systems and storage. For example, if a host is
not rebooted and storage re-initialized between tests, false readings
can result from the use of already-cached data or fragmented file
systems. So, while a bake-off is the most effective tool for judging
performance, consider the time, effort, and cost it can incur.
On the financial side, array (and other technology) costs go beyond
the purchase price. How much is maintenance? How much is software
that you may want to add down the road? How much is additional cache?
Consider expandability as well. If the array can "only"
go to 72-GB drives, how long will it be useful in your environment?
Of course, the cost of downtime should be factored in as well. What
are the single points of failure? What claims does the manufacturer
make for the average reliability of the device? Planned downtime
also has a cost in sys admin time and effort, if not also in money
lost to the company from lost business or lost worker productivity. When
is planned downtime needed (e.g., for LUN creation, configuration
changes, firmware upgrades)? How often will it be needed, and at
what impact?
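A rough multi-year tally helps keep those questions honest. Every figure in the sketch below is a placeholder to be replaced with real quotes and your own downtime estimates:

purchase        = 250_000   # array purchase price (placeholder)
annual_maint    = 30_000    # yearly hardware/software maintenance (placeholder)
later_software  = 40_000    # replication/snapshot license added later (placeholder)
cache_upgrade   = 15_000    # additional cache down the road (placeholder)
planned_outages = 4         # planned downtime events per year (placeholder)
outage_cost     = 5_000     # estimated cost per planned outage (placeholder)

years = 3
total = (purchase
         + annual_maint * years
         + later_software
         + cache_upgrade
         + planned_outages * outage_cost * years)
print(f"{years}-year cost of ownership: ${total:,}")   # $455,000 with these placeholders

Even with placeholder numbers, the recurring items quickly rival the purchase price, which is why they belong in the comparison from the start.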
NAS Selection
Most of the above points pertain to all technology selection,
not just arrays. But for selecting network-attached storage, there
are some other factors to consider.
Currently, most NAS solutions include the compute component and
the disk component. These all-in-one appliances excel in functionality
and ease of management, but are contrary to storage consolidation
efforts. If separate NAS servers are included in a facility, then
they bring another set of spindles to design, configure, manage,
capacity plan, and maintain. And what happens if the disk capacity
of an all-in-one NAS is reached? Another NAS appliance will need
to be added, further fragmenting storage. A recent trend is toward
NAS heads that attach to a SAN rather than including their own spindles.
These devices present network interfaces on the front end and fibre
connections to the SAN on the back end.
Here are some questions to consider when choosing a NAS solution:
- Is it highly available (including at the disk tray and cache
level)?
- How does it perform? The SPEC SFS benchmark results at http://www.spec.org
are a good place to start.
- How many network interfaces are there, and of what type?
- How many fibre interfaces are there?
- What network file systems are supported (SMB and NFS are the
minimum)?
- What naming services are supported (LDAP, NIS, NIS+, DNS, Active
Directory, etc.)?
Switch Selection
Switches are literally at the heart of a SAN. While some of their
features can be quantified, others remain intangible.
Obvious points to consider include how many ports are available
in a given model. 2 Gb is becoming the norm for SAN infrastructure
components, so be sure the switch has 2-Gb ports, auto-senses between
1 Gb and 2 Gb, or can be easily upgraded. Supportability (and support
from the vendors of the other products that are being evaluated)
is an important factor, but more on that next month.
Unfortunately, it is hard to get a grasp on performance differences
between switches. Again, there are vendor claims and counter claims,
but the truth remains elusive, and there are no benchmarks. Looking
into the technical details of the switch and its backplane, or testing
it out under controlled benchmarking conditions is the only recourse.
Tape Drive and Tape Library Selection
Tape libraries can be used for backup, data interchange, and hierarchical
storage management (HSM). Tape drives and tape libraries come in many variations.
Some vendors have created worksheets into which backup parameters
are entered and out of which tape and library suggestions are generated.
Tape libraries can be evaluated based on the number of drives
they can hold and the number of tape cartridge slots they can control.
Also, some number of cleaning cartridges can be loaded into a library
for drive maintenance. Library robotic methods vary, and vendors
will be glad to debate which is best. Floor space should be a consideration,
as some libraries can be huge. "Pass through" is a method
used to automatically move cartridges between multiple robots. "Mailslots"
are externally accessible slots that can be used to import and export
cartridges to and from the library. For instance, a 10-mailslot library allows
10 tapes to be added or removed at once. Large libraries include
some method (scanner or camera) of recognizing cartridge identifiers,
so the library knows exactly which tape is in a particular slot
or drive.
A cautionary note about sharing libraries: since a library has
but one robot control port, only one program is allowed to control
it. If you want to share a library between two programs (say for
backup and HSM), software is needed to play the intermediary. Be
sure to factor this into your SAN design plans.
The choice of drives includes consideration of speeds-and-feeds,
tape capacity, compression availability and effectiveness, and size
of the drive and tapes (how many can fit into a library). Current
common tape-drive choices include LTO, SDLT, and AIT. Some drives
have special features that could drive the decision their way (e.g.,
9840 drives have a WORM mode). Most libraries allow use of multiple
drives and cartridges, so a hybrid approach is quite feasible.
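To see how speeds-and-feeds translate into drive and cartridge counts, a rough sizing calculation is useful. The data volume, backup window, drive rate, compression ratio, and cartridge capacity below are illustrative assumptions:

import math

data_tb        = 6.0     # data to back up per night (assumed)
window_hours   = 8.0     # backup window (assumed)
drive_mb_s     = 30.0    # native drive throughput, MB/s (assumed)
compression    = 1.5     # average compression ratio (assumed)
tape_native_gb = 100.0   # native cartridge capacity, GB (assumed)

effective_mb_s = drive_mb_s * compression
data_mb        = data_tb * 1024 * 1024
per_drive_mb   = window_hours * 3600 * effective_mb_s

drives = math.ceil(data_mb / per_drive_mb)
tapes  = math.ceil(data_tb * 1024 / (tape_native_gb * compression))

print(f"{drives} drives and {tapes} cartridges for a full backup in the window")

With these assumptions, five drives and roughly 41 cartridges are needed; the library's drive bays and slot count then follow from numbers like these rather than from guesswork.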
Another important aspect is the drive's interface. Some libraries
include bridges, so they may have fibre externally to hosts (or
a SAN switch) and SCSI internally to the drives. Bridging is a workable
solution, but native fibre externally and internally reduces failure
points. Check for vendor recommendations on how to attach drives
to hosts. For example, three drives may be allowed per host-SCSI
connection, while only one fibre drive per host connection is supported.
With the advent of SANs, there are now new ways to architect backup/HSM
solutions. Just as a SAN switch can allow hosts to connect to arrays,
the same (or dedicated) switches can allow hosts to attach to tape
drives within libraries. Backup/HSM data can move from the SAN storage,
through the host, to the library, and never hit the network. Dedicated
backup servers are now optional. Software such as Veritas NetBackup
can manage not only the backups but also tape drive allocation.
For example, two hosts could attach to a library via a SAN switch.
Backup software can allocate some number of drives to one server,
perform the backup, release the drives, and allocate them to the
other server. Or both servers could run backups concurrently, each
using half of the drives. For restores, all of the drives could be
allocated to one host to minimize restoration time.
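A quick timing sketch shows why dynamic drive allocation matters. The drive throughput and per-host data volume are illustrative assumptions, and the calculation assumes a host can stream to all of its allocated drives in parallel:

drive_mb_s   = 45.0              # effective (compressed) throughput per drive, MB/s (assumed)
host_data_mb = 2 * 1024 * 1024   # 2 TB of data per host (assumed)

def hours(data_mb, drives):
    # time to move data_mb through the given number of drives in parallel
    return data_mb / (drives * drive_mb_s * 3600)

# Backups: two hosts run concurrently, each allocated two of the four drives.
print(f"concurrent backup: {hours(host_data_mb, 2):.1f} hours per host")

# Restore: all four drives allocated to the single host being restored.
print(f"restore with all four drives: {hours(host_data_mb, 4):.1f} hours")

Under these assumptions, each host finishes its backup in about six and a half hours, and a restore that borrows every drive completes in roughly half that time.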
This "Backup SAN" involves choosing tapes and tape libraries,
and devising methods for backups, restores, and HSM storage. It
can be as complex, and as rewarding, as the storage SAN itself. If you are contemplating
or executing a storage consolidation, be sure to consider a tape
consolidation at the same time.
Replication
Replication of data between sites has the added complication that
it can be accomplished via hardware or software. The choice of product
here is very needs-dependent. It also depends greatly on the choices
made in the areas above (or can drive the choices above). For example,
if an array vendor has an array-to-array replication product that
meets your needs, it could override other areas where the array
is a weaker choice than its competitors.
With replication, there are several factors to consider:
- What distance does the data need to travel? Some solutions
are limited to a small radius, or require expense and complexity
to go beyond that radius.
- How much data needs to move at peak time? The required bandwidth has
a huge impact on replication design (see the sketch following this list).
- Does replication have to occur over dedicated connections (storage
or WAN), or can it use the Internet?
- Is the replication synchronous or asynchronous? Synchronous
replication assures that the data reaches the remote site before
the application gets its transaction acknowledged. Asynchronous
replication can span greater distances, but some data can be lost
if the primary site fails, with the impact depending on the applications involved.
- Does the replication implement "write ordering"?
If write order is preserved, and asynchronous mode is used, then
data may be lost should the primary site fail, but the data will
be consistent. For example, a database would receive its transactions
in order, but some transactions may be lost. This is better than
out-of-order writes in which the database may be very unhappy
with the state of its replicated data.
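As a starting point for the peak-data question above, here is a rough feasibility check. The change rate, link speed, and efficiency factor are illustrative assumptions:

peak_change_gb_hr = 40.0    # data modified at the primary site per hour (assumed)
link_mbit_s       = 100.0   # bandwidth of the replication link (assumed)
link_efficiency   = 0.7     # protocol and replication overhead factor (assumed)

usable_mb_s = link_mbit_s / 8 * link_efficiency
needed_mb_s = peak_change_gb_hr * 1024 / 3600

print(f"need {needed_mb_s:.1f} MB/s at peak, link delivers {usable_mb_s:.1f} MB/s usable")
if needed_mb_s > usable_mb_s:
    print("the replica will fall behind at peak; size the link up or accept the exposure")

With these assumed numbers, the peak change rate outruns the link, so either the link must be upsized or the business must accept a larger window of potential data loss.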
Conclusions and Next Month
Design and product selection are important, but the rubber meets
the road in implementation. Next month, I'll conclude the storage
consolidation project by describing the implementation of the facility.
Implementation is full of details, and that is where the devil lies.
Several of those demons will be described, along with maps of the routes
that avoid them.
Peter Baer Galvin (http://www.petergalvin.org) is the
Chief Technologist for Corporate Technologies (www.cptech.com),
a premier systems integrator and VAR. Before that, Peter was the
systems manager for Brown University's Computer Science Department.
He has written articles for Byte and other magazines, and
previously wrote Pete's Wicked World, the security column,
and Pete's Super Systems, the systems management column for
Unix Insider (http://www.unixinsider.com). Peter is
coauthor of the Operating System Concepts and Applied
Operating System Concepts textbooks. As a consultant and trainer,
Peter has taught tutorials and given talks on security and systems
administration worldwide.