Storage Consolidation -- Part 1 -- Design
Peter Baer Galvin
Regardless of the new technologies that promise decreasing storage
needs, those needs keep increasing. Whether it is user files, databases,
human resources information, or research results, disks keep getting
filled. The trend toward consolidated storage is strong and getting
stronger. This month in the Solaris Corner, I'll discuss the
business needs for consolidated storage, provide a process for designing
a storage solution, and look at the two leading solutions: storage area
networks (SAN) and network-attached storage (NAS). Next month, in part 2,
I will cover the implementation of consolidated
storage, including some complexities you may not expect.
Unfortunately for facilities managers, but fortunately for technology
providers, storage solutions keep advancing and increasing in complexity.
Not long ago, a site would run a combination of direct-attached
storage (DAS) and network file services (NFS or CIFS). Occasionally
there was dual-attached storage to enable high-availability clustering.
Anyone requiring more connections generally installed a proprietary
central-storage unit (e.g., an EMC Symmetrix). Disk spindles were
spread across these devices, leading to the need for separate management
and capacity planning. Spindles were wasted because extra capacity
was needed in each set of storage.
The Storage Evaluation
With the advent of storage switches, there is a new solution where
DAS would have been used: the SAN. A SAN consists of one or more storage
switches connecting hosts and storage. The benefits of this new
model are many, as are the complications. The first step in a storage
consolidation project is a storage evaluation:
- Tools such as Sun HighGround can be installed on all storage-centric
systems to evaluate how much space is used and free throughout
a facility. They can also report how much data is "old" and "new"
to determine whether HSM would be useful.
- ROI determination comes from the space information garnered,
plus a thorough look at maintenance costs of existing storage,
floor space, power and cooling savings from consolidating, and
management costs of running disparate storage vs. consolidated
storage.
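The ROI arithmetic described above can be sketched in a few lines of Python. This is only an illustration: the AnnualCosts record, the payback_years function, and every dollar figure are hypothetical, and a real evaluation would use the numbers gathered from your own facility.

```python
from dataclasses import dataclass


@dataclass
class AnnualCosts:
    """Yearly cost of running a storage environment (hypothetical figures)."""
    maintenance: float
    floor_space: float
    power_cooling: float
    management: float

    def total(self) -> float:
        return (self.maintenance + self.floor_space
                + self.power_cooling + self.management)


def payback_years(current: AnnualCosts, consolidated: AnnualCosts,
                  project_cost: float) -> float:
    """Years until the annual savings repay the one-time project cost."""
    savings = current.total() - consolidated.total()
    if savings <= 0:
        return float("inf")  # consolidation never pays for itself
    return project_cost / savings


if __name__ == "__main__":
    # Made-up example numbers, in dollars per year.
    current = AnnualCosts(120_000, 30_000, 25_000, 200_000)
    consolidated = AnnualCosts(80_000, 15_000, 15_000, 140_000)
    print(payback_years(current, consolidated, project_cost=250_000))
```

A simple payback period ignores the time value of money, but it is usually enough to tell whether the project deserves a closer look.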
The Needs Analysis
Based on the evaluation, you will have more complete knowledge
of your current environment and its costs. Next, each of these key
points needs careful consideration:
- A SAN solution for systems needing "dedicated" storage.
You'll need a separate SAN for each distant physical location.
Some sites also implement separate SANs for production and QA/Development.
- NAS is still a good solution where multiple machines need access
to the same LUNs (logical units, or mount points). For example,
multiple Web servers could mount their HTML contents from a single
NAS mount, so changes need only be made at one place.
- DAS is still useful for spot solutions, where low cost and
low capacity are the driving factors. Care must be taken, or your
facility will end up needing consolidation again!
For each server in the facility, also consider these needs:
- How many LUNs are needed per server?
- Capacity need of each LUN?
- RAID level of the LUN (RAID 1, 0+1, 1+0, or 5?). Keep in mind
the performance and failure recovery impact of the choice. Certainly,
most sites run with a variety of RAID levels.
- Caching need, especially if the LUN is performance-oriented
and will receive random writes. Also consider whether it is a
small LUN that would greatly benefit performance if it were entirely
in memory (either locked into cache or on a RAM disk).
- The need for further data protection via "snapshots",
either full mirrors or copy-on-write block copies. Snapshots can
be used for backups, or to allow a recent history of the LUN to
be kept for fast restoration or testing.
- The need for single or redundant paths from the host to the
LUN. Generally, production LUNs need redundant access paths (for
SAN or DAS), or highly available networking (for NAS).
- The need for dynamic multipathing, in which redundant paths
are used until one fails, and then all the data is seamlessly
routed via the remaining paths. Note that some multipath solutions
will dynamically reuse the failed path when it again becomes available,
while others require a reboot before full multipathing can be
restored.
- The throughput needed between the host and the LUN, both at
peak use and on average. If the use is "peaky", when
are the use peaks per LUN?
- Replication need (to mirror data off-site) for disaster recovery.
Replication has its own complexities. Chiefly, how much data needs
to be replicated per second, both peak and on average?
- Backup requirements can involve their own analysis project.
Interesting data points include the downtime allowed, the type
of data to be backed up, the restoration time limits, the volume
of data to back up during the backup window, and the communications
capacity needed to do so. SANs can be leveraged to provide LAN-free
backups. This can be thought of as a tape-drive consolidation
project as well.
- Should HSM (hierarchical storage management) be implemented
for the LUN or server-wide? HSM holds promise but, again, adds
its own complexity. If large amounts of data are going to be write-once
read-seldom, then HSM is worth considering.
- How secure do the contents need to be? Because NAS is network-based,
and SANs can have network-based management, security must be considered
when doing storage consolidation planning.
- For NAS use, also consider the NAS protocol needed and the
file access controls needed. For example, if a volume will be
accessed via NFS and CIFS, do you need file access control lists
to work under both protocols to the same file?
The end result of this analysis is the raw data needed for storage
architecture design.
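One way to keep that raw data usable is to record the answers per LUN in a consistent structure and total them up. The sketch below is an assumption about how such a worksheet might look, not a prescribed format: the LunRequirement fields, the summarize function, and the default RAID 5 group size of five disks are all hypothetical. It captures only a few of the questions above (capacity, RAID level, path redundancy, and peak throughput).

```python
from dataclasses import dataclass

# Mirrored RAID levels need twice the usable capacity in raw disk space.
MIRRORED_LEVELS = {"1", "0+1", "1+0"}


@dataclass
class LunRequirement:
    server: str
    name: str
    usable_gb: float
    raid_level: str            # "1", "0+1", "1+0", or "5"
    redundant_paths: bool
    peak_mb_per_s: float
    raid5_group_size: int = 5  # disks per RAID 5 group (assumed default)

    def raw_gb(self) -> float:
        """Raw disk space needed to deliver usable_gb at this RAID level."""
        if self.raid_level in MIRRORED_LEVELS:
            return self.usable_gb * 2
        if self.raid_level == "5":
            # One disk's worth of each group holds parity.
            n = self.raid5_group_size
            return self.usable_gb * n / (n - 1)
        raise ValueError(f"unsupported RAID level: {self.raid_level}")


def summarize(luns):
    """Facility-wide totals; peak throughput assumes all peaks coincide."""
    return {
        "total_usable_gb": sum(lun.usable_gb for lun in luns),
        "total_raw_gb": sum(lun.raw_gb() for lun in luns),
        "peak_mb_per_s": sum(lun.peak_mb_per_s for lun in luns),
        "luns_needing_redundant_paths":
            sum(1 for lun in luns if lun.redundant_paths),
    }


if __name__ == "__main__":
    luns = [
        LunRequirement("db1", "oradata", 100, "1+0", True, 80),
        LunRequirement("web1", "html", 40, "5", False, 10),
    ]
    print(summarize(luns))
```

Summing the peaks gives a worst-case figure. If the per-LUN peaks fall at different times of day, the real aggregate requirement will be lower.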
The Design
Based on the above analysis, a core storage configuration can
be determined. However, there are still other aspects to consider
before settling on a design. Although individual scalability needs
are not important in a consolidated environment, overall scalability
is still of interest. How much growth will there be in the total
storage needed? Until recently, NAS and SAN storage were two separate
islands, needing their own capacity management. The latest NAS appliance
technologies allow a NAS head to be connected to the SAN infrastructure.
Spindles can be shared, and technologies such as storage-based replication
can be unified. True high-availability NAS heads are not quite ready
yet, but should be in the near future. When that occurs, consolidation
of NAS and SAN will begin in earnest, with the ensuing management
and planning benefits.
Taken together, the information determined above can generate
a complete storage design. Of course, product function and limitations
can cause a redesign, or at least a modification to the design,
if the best solution is not attainable.
Figure 1 depicts a sample storage consolidation design. At the
production site, redundant switches connect the servers that used
to have DAS storage to the disk array pool. That pool
consists of two storage arrays (which need not be from the same manufacturer).
By having servers with redundant connections to switches,
and switches with redundant connections to the storage arrays, the
entire SAN is fault-tolerant. A NAS head gives network access to
files stored on any LUNs mounted to the NAS device. For example,
user files could be made accessible via the net to all clients.
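The fault-tolerance claim can be checked mechanically once the cabling is written down. The following sketch treats the SAN as a graph and asks whether losing any one switch cuts a server off from an array. The node names and helper functions are hypothetical, and the model is deliberately simple: only switches forward traffic, and failures of cables or host adapters are not considered.

```python
from collections import deque


def make_links(pairs):
    """Build an undirected adjacency map from (node, node) cable pairs."""
    links = {}
    for a, b in pairs:
        links.setdefault(a, set()).add(b)
        links.setdefault(b, set()).add(a)
    return links


def reachable(links, start, forwarders, removed):
    """Nodes reachable from start when `removed` has failed."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node != start and node not in forwarders:
            continue  # servers and arrays are endpoints; only switches forward
        for neighbor in links.get(node, ()):
            if neighbor != removed and neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


def single_points_of_failure(links, servers, switches, arrays):
    """Switches whose loss cuts at least one server off from some array."""
    weak = []
    for switch in switches:
        for server in servers:
            found = reachable(links, server, set(switches), removed=switch)
            if not set(arrays) <= found:
                weak.append(switch)
                break
    return weak


if __name__ == "__main__":
    production = make_links([
        ("srv1", "swA"), ("srv1", "swB"), ("srv2", "swA"), ("srv2", "swB"),
        ("swA", "arr1"), ("swA", "arr2"), ("swB", "arr1"), ("swB", "arr2"),
    ])
    print(single_points_of_failure(
        production, ["srv1", "srv2"], ["swA", "swB"], ["arr1", "arr2"]))
```

An empty result means no single switch failure isolates a server. A site with one switch, like the disaster recovery site in this example, would report that switch as a single point of failure.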
Also in Figure 1 is a tape library attached to the SAN switch.
HSM and backup software could transfer files between the storage,
the servers, and the tape drives, without ever having the data traverse
any network. Some software, such as Veritas NetBackup, provides
for tape sharing among servers. For example, rather than having
each server connect to a specific tape drive via the SAN, the tape
drives are a pool, and the software dynamically allocates tape drives
to a server, performs the backup, and de-allocates the drives so
they can be used for backup of other servers. This kind of tape
consolidation is very helpful for restores, because all tape
drives can be dynamically attached to a single server to drive
restores at maximum throughput.
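The pooling idea is easy to picture as a small allocator. The class below is a toy model written for this column, not the interface of Veritas NetBackup or any other product: the TapeDrivePool name, its methods, and the drive and server names are all hypothetical.

```python
class TapeDrivePool:
    """Shared pool of tape drives handed out to servers on demand."""

    def __init__(self, drives):
        self.free = list(drives)
        self.in_use = {}  # drive -> server currently holding it

    def allocate(self, server, count):
        """Give `server` up to `count` free drives; return those granted."""
        granted = self.free[:count]
        self.free = self.free[count:]
        for drive in granted:
            self.in_use[drive] = server
        return granted

    def release(self, server):
        """Return every drive held by `server` to the pool."""
        released = [d for d, s in self.in_use.items() if s == server]
        for drive in released:
            del self.in_use[drive]
        self.free.extend(released)
        return released


if __name__ == "__main__":
    pool = TapeDrivePool(["tape0", "tape1", "tape2", "tape3"])
    pool.allocate("db1", 2)      # nightly backup of one server
    pool.release("db1")          # drives go back to the pool
    print(pool.allocate("restore-host", 4))  # a restore can take them all
```

The point of the model is the last line: once no drive is tied to a particular server, a restore can claim the whole pool.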
The disaster recovery site, in this example, does not have redundancy
of servers or switches. It could still have RAID-protected spindles,
of course. Replication between the two sites can be done from array
to array (via proprietary protocols over storage links such as ESCON
or Fibre Channel), or via host software. Host-based replication software
can be at the block level or application level. For instance, Sun's
SNDR copies any changed block across a WAN to a mirror disk. In
this manner, the disks at the remote site look exactly like those
at the source, but with a time delay dependent on the width of the
WAN pipe and the latency caused by the distance the bits have to
travel. Application-level replication, such as Oracle Advanced Data
Replication, understands transactions within the application and
duplicates them to a similar application running at the remote site.
All of these forms of replication have their pros and cons, depending
on your needs.
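A quick back-of-the-envelope calculation helps when sizing the WAN link for block-level replication. The functions below are a rough sketch under stated assumptions: the names and sample figures are hypothetical, protocol overhead and compression are ignored, and propagation delay assumes roughly 200 km per millisecond in fiber.

```python
def wan_utilization(change_mb_per_s, wan_mbit_per_s):
    """Fraction of the WAN link consumed by replication traffic."""
    wan_mb_per_s = wan_mbit_per_s / 8  # megabits -> megabytes
    return change_mb_per_s / wan_mb_per_s


def backlog_after_peak_mb(peak_mb_per_s, wan_mbit_per_s, peak_seconds):
    """Unsent data (MB) left at the end of a peak that exceeds the link."""
    wan_mb_per_s = wan_mbit_per_s / 8
    return max(0.0, (peak_mb_per_s - wan_mb_per_s) * peak_seconds)


def one_way_latency_ms(distance_km, km_per_ms=200.0):
    """Propagation delay, assuming about 200 km per millisecond in fiber."""
    return distance_km / km_per_ms


if __name__ == "__main__":
    # Made-up example: 10 MB/s of changed blocks for 10 minutes over a
    # 40 Mbit/s link to a site 1000 km away.
    print(backlog_after_peak_mb(10, 40, 600))  # MB still waiting to be sent
    print(one_way_latency_ms(1000))            # ms before a block can arrive
```

If the backlog after a typical peak is larger than you can tolerate losing in a disaster, either the link is too small or the replication method needs to change.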
Next Month
Next month in the Solaris Corner, the storage consolidation discussion
concludes with information on product selection, implementation planning,
implementation methodologies, and some final notes designed to help
you ensure a successful consolidation project.
Peter Baer Galvin (http://www.petergalvin.org) is the
Chief Technologist for Corporate Technologies (www.cptech.com),
a premier systems integrator and VAR. Before that, Peter was the
systems manager for Brown University's Computer Science Department.
He has written articles for Byte and other magazines, and
previously wrote Pete's Wicked World, the security column,
and Pete's Super Systems, the systems management column for
Unix Insider (http://www.unixinsider.com). Peter is
coauthor of the Operating System Concepts and Applied
Operating System Concepts textbooks. As a consultant and trainer,
Peter has taught tutorials and given talks on security and systems
administration worldwide.