What I Did Instead of Buying a SAN
Adam Anderson
The key concept of a SAN is that storage hardware (i.e., disk
and tape devices) is abstracted away from the server and, more importantly,
the network operating system (NOS). For instance, it is not unusual
to have ample aggregate disk space in an enterprise, but have free
blocks in all of the wrong partitions. A SAN solution could remedy
this problem by aggregating all storage independently of server
and partition conditions. Further complicating matters are the issues
of sparing, testing, and purchasing additional disks and arrays
for the typical mesh of different hardware platforms. The promise
of a SAN is to allow a centrally managed, fault-tolerant, and standardized
set of arrays to fulfill all the storage needs of an enterprise.
So, instead of learning various RAID management interfaces, sparing
many types of drives, and having all your free disk space attached
to the wrong server/partition/NOS, SAN technology promises a fully
independent and standardized deployment of storage.
Actual SAN solutions, at the price points that fit budgets in
typical environments, have not delivered on these promises. Often
requiring forklift upgrades and extensive re-engineering of the
existing storage model, SAN technology has always been well out
of reach of most environments. In situations where SAN technology
is not cost-effective, I have used two alternative storage technologies
that deliver much of the value of the "ultimate SAN" at
reasonable cost and fit well into typical server-centric storage
models.
These solutions fall into two categories, each of which I will
examine by describing the actual design and deployment process
of a representative product. The first type is a multi-port disk
array that uses a SCSI-to-SCSI interface to achieve NOS independence
and has excellent centralization and standardization benefits. The
second type is a network-attached server appliance that uses a stripped-down
NOS and speaks one or more standard file-service protocols. This
article will focus on the design and deployment considerations for each
type of device and some aspects of its operation, rather than on an
in-depth discussion of each product's merits or shortcomings.
Winchester Flashdisk: Design Considerations
One approach to storage is to consolidate several disks and SCSI
interfaces in a single device. The Flashdisk from Winchester Systems
integrates multiple SCSI interfaces into a single disk array, allowing
several servers to attach simultaneously. Since the interface to
the server is SCSI-based, a vast array of NOS flavors and versions
is supported without the need for specific RAID drivers. The configuration
I have used is the Flashdisk OpenRAID Rackmount, configured with
twelve 36.4-GB disks and two power supplies. The RAID partitions
on the unit are managed independently from the SCSI interfaces,
so the flexibility in configuration is very high. Figure 1 shows
a typical deployment of the device.
Because the interface to the remainder of the environment is SCSI,
the target NOS needs only to support an approved SCSI interface
card (such as the Adaptec 2940U2W or 29160). The first environment
in which I deployed this solution had Windows NT 4.0, NetWare 4.2,
and various UNIX flavors in active use. In almost every instance,
the servers attached to each Flashdisk were of dissimilar NOS types.
We installed the boot and data partitions of each server
on the Flashdisk, so we could recover from a failed non-disk component
by simply swapping out the server itself. The entire installation and
configuration of each server resided on the array. Additionally,
for high-availability situations, it is simple to deploy the backup
or cluster partner of a primary server to a second, separate Flashdisk
unit.
Servers can be added to and removed from the Flashdisk while it is
operational, provided you are careful not to disturb the SCSI cabling.
Essentially, the unit provides a centrally managed "virtual"
disk to each server and its installed NOS. At boot time, the BIOS
of the SCSI adapter identifies a disk of type "Flashdisk"
as attached, and the geometry of this pseudo-disk is simulated by
the array. The server, NOS, and SCSI interface card have no indication
that an array, rather than a disk drive, is actually attached. Sparing
and maintenance are significantly simplified, because there
is only one RAID interface to learn, one type of disk drive and
chassis to keep on hand, and no tight dependency of the set of servers
on multiple hardware drive and controller types. It is not unusual
to discard or fail to reuse proprietary disks and array components
when a server's processor or memory expandability has been
exhausted. Because the consolidated approach decouples the storage
from the server itself, both the ease of reuse and the total usable
lifetime of the array are significantly increased.
Winchester Flashdisk: Deployment Experience
The rack-mountable version of the array ships in a 5U enclosure
with 8 full- or 12 half-height slots, 2 power supplies, and a variety
of hardware cache size and controller redundancy options. Ours were
configured for RAID 1+0 and yielded usable space of ~180 GB with
two drives configured as in-chassis hot spares. In a RAID 5 configuration
with no spares, the usable space would be ~400 GB. Our applications
utilized both a file-based database and a client-server RDBMS; the
file-based DB was our legacy application. RAID 1+0 has several advantages
and only one drawback: a 50% ratio of usable space to installed disks.
Performance on long reads is superior to that of RAID 5, and overall
fault tolerance is improved because of the
fully redundant mirror/stripe nature of RAID 1+0. Another advantage
is that the replacement of a failed disk requires I/O only from
its mirrored partner for rebuild under RAID 1+0, and not all of
the other disks as in RAID 5.
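
As a quick illustration of the capacity math above, the following Python
sketch reproduces the usable-space figures for the 12 x 36.4-GB chassis
under both layouts. The disk count, disk size, and spare counts come from
our configuration; the function names are simply illustrative.

# Back-of-the-envelope capacity math for the 12 x 36.4-GB Flashdisk chassis
# described above. The layouts and spare counts mirror the two configurations
# discussed in the text; the function names are illustrative only.

DISK_GB = 36.4
TOTAL_DISKS = 12

def usable_raid10(disks, disk_gb, hot_spares=0):
    """RAID 1+0: half of the non-spare disks hold mirror copies."""
    data_disks = disks - hot_spares
    return (data_disks / 2) * disk_gb

def usable_raid5(disks, disk_gb, hot_spares=0):
    """RAID 5: one disk's worth of capacity goes to parity."""
    data_disks = disks - hot_spares
    return (data_disks - 1) * disk_gb

print(f"RAID 1+0, 2 hot spares: ~{usable_raid10(TOTAL_DISKS, DISK_GB, 2):.0f} GB")  # ~182 GB
print(f"RAID 5,   no spares:    ~{usable_raid5(TOTAL_DISKS, DISK_GB):.0f} GB")      # ~400 GB
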
After we physically racked the array, and configured and initialized
it, we subdivided the usable, raw space into partitions and assigned
them to the SCSI interfaces for server connections. Each partition
can be assigned to one or more interfaces. Thus, this unit can support
shared-disk clustering schemes, although I have not used it this
way. As noted previously, this assignment can occur dynamically,
so, in use, it is possible to assign some of the raw disk space
to a partition and interface and leave the remainder unassigned.
As servers are added to the environment, each can be added and space
apportioned without disturbing the others. It is not, however, possible
to extend an existing partition without dropping it at the server/NOS
level and reformatting or otherwise remaking the file system.
In one case, we did add a new partition to an existing and in-use
SCSI interface. After a reboot and quick format, the space was available.
Since we needed it only temporarily, the partition was returned
to the unused pool a few weeks later. If we had purchased a new
drive cage, RAID card, and drives and then had to configure and
deploy them, more staff time would have been consumed, and later
there would have been excess capacity on a platform that could not
easily be leveraged. This solution is essentially a SAN with limited
expandability and simple configuration options. The lack of exotic
Fibre Channel or SCSI switching and absence of a high-priced software
management license are a welcome change from "high-end"
SAN solutions. All of the technologies used should be very familiar
to administrators and engineers, since the interconnection is SCSI,
and the RAID controller presents each partition to the BIOS and
NOS as a single SCSI disk.
Winchester Flashdisk: Performance
Along with data growth considerations, disk I/O performance is
an acutely pressing issue for administrators. As data stores grow
and applications become more complex, solutions to I/O-bound applications
become more critical. By pooling the disk spindles and creating
partitions across them, the consolidated approach of the Flashdisk
and similar devices seeks to improve performance/cost tradeoffs.
If budgets were not a constraint, each server could have a dedicated
array of many, many disks and a massive read-ahead cache to itself.
Before the deployment of the Flashdisk arrays, most servers had
three or four disks each and very little cache on the RAID controller
(due to cost considerations). The belief was that the consolidated
array would better leverage a large cache and 12 disks at RAID 1+0
than 4 separate smaller arrays at RAID 5. We carefully balanced
the expected load from each server and application against the projected
deployment to the arrays, to prevent overloading any one of them.
We also found that our I/O load patterns were bursty enough to allow
very high-performance sharing of the array between our servers.
The performance we observed after consolidating 12 large disks in
the array and sharing it across servers was a dramatic improvement
over having a few disks attached to each individual server. A strong
advantage of the Flashdisk is that the cost of 12 disks and 1 high-end
chassis and controller is roughly comparable to the cost of 12 disks
and 4 low-end controllers and drive cages for individual servers.
In many cases, the cost/GB was actually lower for the Flashdisk
than for the array option for the servers we were using.
In actual testing, via the simple tactic of copying a 4.5-GB directory
from the server to five other servers simultaneously, steady state
performance was just above 12 MB/s and saturated a 100-Mb/s Switched
Fast Ethernet link. The files were copied repeatedly from the server
to drive the data out of the 256-MB cache and fully exercise the
array itself. More basic testing yielded 15-16 MB/s copying data
from one partition to another on the same unit, between two servers
with gigabit network interfaces. The largest improvement, however, was
in backup speed, where we cut the total time nearly in half. The most
dramatic case was a server that went from an average of 50 MB/minute
to more than 200 MB/minute after the upgrade.
By no means are these numbers based on any disciplined methodology
or testing suite, but they do give a sense of the improvement in
applications and especially backup times that can be gained by going
to a dedicated high-performance storage platform.
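
For readers who want to redo the arithmetic behind these informal numbers,
the short Python sketch below checks the link-saturation claim and shows
what the backup-rate change means in wall-clock terms. The 12-MB/s,
100-Mb/s, 50-MB/minute, and 200-MB/minute figures are the ones quoted
above; the 100-GB backup size is purely hypothetical.

# Rough arithmetic behind the informal numbers above. The throughput and
# backup-rate figures come from the text; the 100-GB data set is hypothetical.

def link_utilization(throughput_mb_s, link_mbit_s):
    """Fraction of a network link consumed by a given copy rate."""
    return (throughput_mb_s * 8) / link_mbit_s

def backup_hours(data_gb, rate_mb_per_min):
    """Wall-clock hours to back up data_gb at a given MB/minute rate."""
    return (data_gb * 1024) / rate_mb_per_min / 60

# ~12 MB/s is ~96 Mb/s, effectively saturating Switched Fast Ethernet.
print(f"Link utilization at 12 MB/s: {link_utilization(12, 100):.0%}")

# A hypothetical 100-GB backup at the before/after rates quoted above.
print(f"At  50 MB/min: {backup_hours(100, 50):.1f} hours")
print(f"At 200 MB/min: {backup_hours(100, 200):.1f} hours")
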
Network Appliance Filer: Design Considerations
Another approach to consolidated storage is to eliminate the server
altogether. As an alternative to SCSI-based disk arrays, this class of
appliances, known as network-attached storage (NAS) or filer devices,
uses a stripped-down NOS to provide the protocols over which file service
occurs. Many of these devices also allow a partition or directory on the
appliance to be mounted and used to host the database devices
of an RDBMS platform. The key consideration here is that the appliance
you choose must support all of the file-service protocols you need,
such as CIFS using NetBIOS over TCP, NFS, and HTTP. The Network
Appliance Filer is an example of this approach and speaks all of
the protocols above. CIFS is used for integration into a Windows
NT/2000 domain and allows the Filer to appear to the other clients
and servers as if it were a server itself. NFS support is, of course,
for UNIX environments, and the HTTP functionality allows integration
of the device into large-scale Web hosting and e-commerce environments.
Since the Filer integrates as a sort of doppelganger into the
existing NOS environment, the deployment is not as simple as the
SCSI interface approach with consolidated disk storage. On the other hand,
the Filer offers SnapMirror, a real-time replication product for use
between two filers, and Snapshots, a backup technique that keeps several
read-only copies of the data online each day for rapid file recovery.
Also, the available space can be grown easily by adding disks, without
any system interruption.
Because the customized NOS of the Filer provides a layer of abstraction
between the underlying disks and the CIFS, NFS, and HTTP interfaces, many
operations that would otherwise require reformatting a partition or
rebooting a server can be performed transparently.
All of these features come at a cost per GB roughly twice (or more) that
of the server-attached solution described above. Other critical considerations
are the existence and quality of support for the NOS file-service
protocols you require. If NetWare is your platform, this type of
storage may well be useless, because I am not aware of any vendor
that provides network-attached storage via the NCP protocol. The
deployment of a very large data store on a Filer also all but mandates
the use of a very high-speed (gigabit or ATM) interface to allow
the clients and servers to make use of its I/O prowess. An
example deployment is shown in Figure 2.
Network Appliance Filer: Deployment Experience
The Filer is simple to deploy and configure, and the management
tools and design of the device are geared heavily toward this goal.
Setup is accomplished via the command line or a Web-based setup
wizard. The experience is very similar to turning up a server, although
care must be taken during the initial IP setup since the Filer will
use DHCP to get an address. The wrinkle is that the Filer will not
renew the DHCP lease, so it is necessary either to assign a static IP
address to the Filer or to use the DHCP management tools to reserve the
initially assigned IP for the Filer permanently.
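
One simple safeguard against a stale lease eventually handing the Filer's
original address to another host is to verify periodically that the Filer's
name still resolves to the address reserved for it. The Python sketch below
shows one way to do that check; the hostname and address are placeholders,
not values from our environment.

# Sanity check for the DHCP wrinkle described above: confirm that the Filer's
# hostname still resolves to the address that was reserved for it.
# The hostname and address below are placeholders.

import socket
import sys

FILER_HOSTNAME = "filer1.example.com"   # hypothetical name
RESERVED_IP = "192.168.10.50"           # hypothetical reserved/static address

def check_filer_address(hostname, expected_ip):
    """Return True if hostname currently resolves to expected_ip."""
    try:
        resolved = socket.gethostbyname(hostname)
    except socket.gaierror as err:
        print(f"DNS lookup for {hostname} failed: {err}")
        return False
    if resolved != expected_ip:
        print(f"WARNING: {hostname} resolves to {resolved}, expected {expected_ip}")
        return False
    print(f"OK: {hostname} -> {resolved}")
    return True

if __name__ == "__main__":
    sys.exit(0 if check_filer_address(FILER_HOSTNAME, RESERVED_IP) else 1)
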
For CIFS/Windows environments, the Filer receives a domain account
to allow integration and management of security via the domain tools.
Under the CIFS/SMB file-service model, the server authenticates
users and enforces permissions, so the domain account is necessary
to allow access to the domain-level authentication and authorization
information. For NFS environments, the Filer supports NIS as a client
to allow centralized administration of the files that control access
permissions. With an NFS environment, unlike CIFS, the Filer exports
NFS mounts and relies on the client to perform authentication of
users accessing the exported directories from that machine. Additionally,
if multiple protocols are in use, the differences in each protocol's
treatment of file system dates, case sensitivity, file access, and
other issues must be dealt with.
In our environment, only the CIFS protocol is in use, and the
predominant function of the Filer is to provide the storage platform
for the applications hosted for our clients. Each Web server uses
directories on the Filer for its Web root, allowing synchronization
of content and applications across the load-balanced Web servers.
Additionally, the SnapMirror data replication feature is used to
provide a backup of all Web content and data at a second data center.
Figure 3 shows this configuration.
Network Appliance Filer: Performance
Both the command-line and SNMP performance management functions
are excellent. The Filer command-line tool systat provides
CPU, disk I/O, network interface I/O, and cache aging information
via the console interface at a user-configurable interval. All of
this information at a greater level of detail and granularity is
also available via custom SNMP MIBs for use with a network management
framework.
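
As a sketch of how such counters can be polled outside the Filer's own
tools, the following Python script shells out to the net-snmp snmpget
utility and derives an inbound throughput figure from the standard MIB-II
ifInOctets counter. I have not reproduced the Filer's product-specific MIB
objects here; the host, community string, and interface index are
placeholders.

# Minimal sketch of deriving a throughput figure via SNMP, in the spirit of
# the counters discussed above. It polls the standard MIB-II ifInOctets
# counter (1.3.6.1.2.1.2.2.1.10) twice with the net-snmp snmpget CLI; the
# Filer's product-specific MIB objects are not shown. Host, community
# string, and interface index are placeholders.

import subprocess
import time

HOST = "filer1.example.com"   # hypothetical
COMMUNITY = "public"          # placeholder community string
IF_INDEX = 1                  # interface index to watch
IF_IN_OCTETS = f"1.3.6.1.2.1.2.2.1.10.{IF_INDEX}"

def get_counter(oid):
    """Fetch a single integer counter value (-Oqv prints the bare value)."""
    result = subprocess.run(
        ["snmpget", "-v2c", "-c", COMMUNITY, "-Oqv", HOST, oid],
        capture_output=True, text=True, check=True,
    )
    return int(result.stdout.split()[-1])

def inbound_rate(interval=10):
    """Approximate inbound bytes/second over one polling interval.
    (Counter32 values wrap; a production script would handle that.)"""
    first = get_counter(IF_IN_OCTETS)
    time.sleep(interval)
    second = get_counter(IF_IN_OCTETS)
    return (second - first) / interval

if __name__ == "__main__":
    print(f"Inbound: {inbound_rate() / 2**20:.2f} MB/s")
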
Our Filer provides Web content and data storage for our online
banking applications in a hosted data center environment. At the
time of this writing, there were 186 customers hosted in the primary
data center with their data on the Filer, all of which is mirrored
via SnapMirror to the Filer in the redundant data center. An analysis
of the performance data from the 1 p.m.-5 p.m. CST period on a typical
weekday showed 30-50% CPU utilization to provide an average of 2526
CIFS operations per second. The average disk channel and network
interface utilization were 1.13 MB/s and 0.62 MB/s. This puts the
average CIFS operation at roughly 256 bytes, which is indicative
of the nature of our application. The CPU utilization is high given
the relatively low total disk throughput, and assessment of the
performance data shows an extremely strong correlation between CPU%
and CIFS Ops/s.
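
The 256-byte estimate is simply throughput divided by operation rate. The
short sketch below redoes that division with the figures quoted above;
since both a disk-channel and a network-interface number are given, the
ratio is computed for each.

# The "roughly 256 bytes per CIFS operation" estimate is throughput divided
# by operation rate, using the figures quoted above. Which throughput number
# best represents the CIFS payload path is a judgment call, so both ratios
# are shown.

MB = 2 ** 20                 # bytes per MB (binary convention assumed)
CIFS_OPS_PER_SEC = 2526

for label, mb_per_sec in (("disk channel", 1.13), ("network interface", 0.62)):
    bytes_per_op = mb_per_sec * MB / CIFS_OPS_PER_SEC
    print(f"{label:18s}: ~{bytes_per_op:.0f} bytes per operation")
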
As the peak load period passed, CPU load fell with CIFS Ops/s
and the average cache sitting time increased dramatically, nearly
doubling as the overall load on the Filer decreased roughly 33%.
One last interesting observation is that while network utilization
and disk read/write counters are almost perfectly correlated (not
a surprise), the throughput across the network interface is almost
twice that of the disk I/O. I expected to see some encapsulation
and protocol inefficiency with CIFS, but a 2:1 ratio of network
traffic to disk I/O was much more than I expected. Again, the small
average read/write size is almost certainly the culprit here. As
always, since each environment is unique, this performance discussion
is geared more toward demonstrating the quality of the Filer's
toolset than providing any useful benchmarking information to a
prospective user. Both Network Appliance and Winchester Systems
have excellent Web sites with extensive information about the performance
of their products; I encourage interested readers to review that material
and draw their own conclusions.
Conclusion
Storage infrastructure choices are complicated by both the large
fixed costs associated with high-availability/performance offerings
and the seemingly boundless appetite of a typical enterprise for
disk space and throughput. A bewildering array of incompatible
solutions further challenges IT planners and engineers alike. The
goal of this discussion has been to present two alternatives as
representative of typical vendor approaches to the set of storage
problems enterprises face. I hope the information presented here
will assist those facing this difficult set of decisions.
Adam Anderson is an IT Manager who knows the pain of slow backups
and shrinking free partition space. He can be contacted at: adam_d_anderson@hotmail.com.