What I Did Instead of Buying a SAN

Adam Anderson

The key concept of a SAN is that storage hardware (i.e., disk and tape devices) is abstracted away from the server and, more importantly, the network operating system (NOS). For instance, it is not unusual for an enterprise to have ample aggregate disk space but to have the free blocks in all of the wrong partitions. A SAN could remedy this problem by aggregating all storage independently of server and partition boundaries. Further complicating matters are the issues of sparing, testing, and purchasing additional disks and arrays for the typical mesh of different hardware platforms. The promise of a SAN is a centrally managed, fault-tolerant, and standardized set of arrays that fulfills all the storage needs of an enterprise -- instead of learning various RAID management interfaces, sparing many types of drives, and finding all your free disk space attached to the wrong server/partition/NOS, you get a fully independent and standardized deployment of storage.

Actual SAN solutions, at the price points that fit budgets in typical environments, have not delivered on these promises. Often requiring forklift upgrades and extensive re-engineering of the existing storage model, SAN technology has always been well out of reach of most environments. In situations where SAN technology is not cost-effective, I have used two alternative storage technologies that deliver much of the value of the "ultimate SAN" at reasonable cost and fit well into typical server-centric storage models.

These solutions fall into two categories, each of which I will examine via description of the actual design and deployment process of a representative product. The first type is a multi-port disk array that uses a SCSI-to-SCSI interface to achieve NOS independence and has excellent centralization and standardization benefits. The second type is a network-attached server appliance that uses a stripped-down NOS and speaks one or more standard file-service protocols. This article will focus on the design and deployment considerations of each type of device and some aspects of its operation, rather than an in-depth discussion of each product's merits or shortcomings.

Winchester Flashdisk: Design Considerations

One approach to storage is to consolidate several disks and SCSI interfaces in a single device. The Flashdisk from Winchester Systems integrates multiple SCSI interfaces into a single disk array, allowing several servers to attach simultaneously. Since the interface to the server is SCSI based, a vast array of NOS flavors and versions are supported without the need for specific RAID drivers. The configuration I have used is the Flashdisk OpenRAID Rackmount, configured with twelve 36.4-GB disks and two power supplies. The RAID partitions on the unit are managed independently from the SCSI interfaces, so the flexibility in configuration is very high. Figure 1 shows a typical deployment of the device.

Because the interface to the remainder of the environment is SCSI, the target NOS needs only to support an approved SCSI interface card (such as the Adaptec 2940U2W or 29160). The first environment in which I deployed this solution had Windows NT 4.0, Netware 4.2, and various UNIX flavors in active use. In almost every instance, the servers attached to each Flashdisk were of dissimilar NOS types. We installed the boot partition and data partitions of each server to the Flashdisk, so we could recover from a failed non-disk component by simply swapping out the server itself. The entire install and configuration of the server itself resided on the array. Additionally, for high-availability situations, it is simple to deploy the backup or cluster partner of a primary server to a second, separate Flashdisk unit.

Servers can be added to and removed from the Flashdisk while it is operational, provided you're careful not to disturb the SCSI cabling. Essentially, the unit provides a centrally managed "virtual" disk to each server and its installed NOS. At boot time, the BIOS of the SCSI adapter identifies an attached disk of type "Flashdisk", and the geometry of this pseudo-disk is simulated by the array. The server, NOS, and SCSI interface card have no indication that an array, rather than a single disk drive, is actually attached. Sparing and maintenance are significantly simplified, because there is only one RAID interface to learn, one type of disk drive and chassis to keep on hand, and no tight dependency of the set of servers on multiple hardware drive and controller types. It is not unusual for proprietary disks and array components to be discarded, or never reused, once a server's processor or memory expandability has been exhausted. Because the consolidated approach decouples the storage from the server itself, both the ease of reuse and the total usable lifetime of the array are significantly increased.

Winchester Flashdisk: Deployment Experience

The rack-mountable version of the array ships in a 5U enclosure with 8 full-height or 12 half-height slots, 2 power supplies, and a variety of hardware cache size and controller redundancy options. Ours were configured for RAID 1+0 and yielded ~180 GB of usable space with two drives configured as in-chassis hot spares. In a RAID 5 configuration with no spares, the usable space would be ~400 GB. Our applications included both a file-based database (our legacy application) and a client-server RDBMS. RAID 1+0 has several advantages and only one drawback -- a 50% ratio of usable space to installed disk. Performance on long reads is superior to RAID 5, and overall fault tolerance is improved because of the fully redundant mirror/stripe nature of RAID 1+0. Another advantage is that rebuilding a failed disk under RAID 1+0 requires I/O only from its mirrored partner, not from all of the other disks as in RAID 5.
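
To make the capacity arithmetic above concrete, here is a minimal Python sketch. The drive size, drive count, and spare counts come from the configuration described above; the function names are mine, not anything from the vendor's tools:

DRIVE_GB = 36.4      # capacity of each drive in the chassis
TOTAL_DRIVES = 12

def usable_raid10_gb(drives, spares, drive_gb=DRIVE_GB):
    # RAID 1+0: after setting aside hot spares, half of the remaining
    # drives hold mirror copies, so only half contribute usable space.
    return (drives - spares) // 2 * drive_gb

def usable_raid5_gb(drives, spares, drive_gb=DRIVE_GB):
    # RAID 5: one drive's worth of capacity in the set goes to parity.
    return (drives - spares - 1) * drive_gb

print(usable_raid10_gb(TOTAL_DRIVES, spares=2))  # 182.0 -- the ~180 GB cited above
print(usable_raid5_gb(TOTAL_DRIVES, spares=0))   # 400.4 -- the ~400 GB cited above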

After we physically racked, configured, and initialized the array, we subdivided the usable raw space into partitions and assigned them to the SCSI interfaces for server connections. Each partition can be assigned to one or more interfaces, so the unit can support shared-disk clustering schemes, although I have not used it this way. As noted previously, this assignment can occur dynamically, so it is possible to assign some of the raw disk space to a partition and interface and leave the remainder unassigned. As servers are added to the environment, each can be attached and space apportioned without disturbing the others. It is not, however, possible to extend an existing partition without dropping it at the server/NOS level and reformatting or otherwise remaking the file system.

In one case, we did add a new partition to an existing and in-use SCSI interface. After a reboot and quick format, the space was available. Since we needed it only temporarily, the partition was returned to the unused pool a few weeks later. If we had purchased a new drive cage, RAID card, and drives and then had to configure and deploy them, more staff time would have been consumed, and later there would have been excess capacity on a platform that could not easily be leveraged. This solution is essentially a SAN with limited expandability and simple configuration options. The lack of exotic Fibre Channel or SCSI switching and absence of a high-priced software management license are a welcome change from "high-end" SAN solutions. All of the technologies used should be very familiar to administrators and engineers, since the interconnection is SCSI, and the RAID controller presents each partition to the BIOS and NOS as a single SCSI disk.

Winchester Flashdisk: Performance

Along with data growth considerations, disk I/O performance is an acutely pressing issue for administrators. As data stores grow and applications become more complex, solutions to I/O-bound applications become more critical. By pooling the disk spindles and creating partitions across them, the consolidated approach of the Flashdisk and similar devices seeks to improve performance/cost tradeoffs. If budgets were not a constraint, each server could have a dedicated array of many, many disks and a massive read-ahead cache to itself.

Before the deployment of the Flashdisk arrays, most servers had three or four disks each and very little cache on the RAID controller (due to cost considerations). The belief was that a single consolidated array with a large cache and 12 disks at RAID 1+0 would outperform four separate, smaller arrays at RAID 5. We carefully balanced the expected load from each server and application against the projected deployment to the arrays, to prevent overloading any one of them. We also found that our I/O load patterns were bursty enough to allow very high-performance sharing of the array among our servers. The performance we observed after consolidating 12 large disks in the array and sharing it across servers was a dramatic improvement over having a few disks attached to each individual server. A strong advantage of the Flashdisk is that the cost of 12 disks plus one high-end chassis and controller is roughly comparable to the cost of 12 disks plus 4 low-end controllers and drive cages for individual servers. In many cases, the cost/GB was actually lower for the Flashdisk than for the per-server array option we were using.
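
The cost comparison in the preceding paragraph is easy to reproduce for your own environment. The sketch below shows only the calculation; the argument names are mine, and all of the prices must come from your own vendor quotes rather than anything in the original deployment:

def cost_per_usable_gb(chassis_and_controller, disk_price, disk_count, usable_gb):
    # Total hardware cost divided by usable capacity after RAID overhead.
    return (chassis_and_controller + disk_price * disk_count) / usable_gb

# Consolidated option: one high-end chassis/controller plus 12 disks,
# ~180 GB usable at RAID 1+0 with two hot spares.
# Per-server option: one low-end controller and drive cage plus 3 disks,
# ~72.8 GB usable at RAID 5 (all four per-server arrays are identical,
# so the cost/GB of one of them represents the whole option).
# Plug your own quotes into each call and compare the two results.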

In actual testing, via the simple tactic of copying a 4.5-GB directory from the server to five other servers simultaneously, steady-state performance was just above 12 MB/s, which saturated a 100-Mb/s switched Fast Ethernet link. The files were copied repeatedly from the server to drive the data out of the 256-MB cache and fully exercise the array itself. More basic testing yielded 15-16 MB/s copying data from one partition to another on the same unit, between two servers with gigabit network interfaces. The largest gain, however, was in backup speed, where we cut the total time nearly in half. The most dramatic case was a server that went from an average of 50 MB/minute to more than 200 MB/minute after the upgrade. By no means are these numbers based on any disciplined methodology or testing suite, but they do give a sense of the improvement in application and especially backup times that can be gained by going to a dedicated high-performance storage platform.
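
A crude test of this kind is easy to reproduce. The following Python sketch simply times a directory copy and reports average throughput; the paths are placeholders, and for a meaningful number the copy should be large enough (or repeated enough) to defeat the controller cache, as described above:

import os
import shutil
import time

def copy_throughput_mb_s(src, dst):
    # Time a recursive directory copy and return average MB/s.
    start = time.time()
    shutil.copytree(src, dst)
    elapsed = time.time() - start
    total_bytes = sum(
        os.path.getsize(os.path.join(root, name))
        for root, _, names in os.walk(dst)
        for name in names
    )
    return total_bytes / (1024 * 1024) / elapsed

# Placeholder paths -- point these at a large directory on the array
# and a destination on another server's partition or share.
# print(copy_throughput_mb_s("/data/testdir", "/mnt/other_server/testdir"))

# For scale: 12 MB/s is roughly 96 Mb/s, which is why the copy above
# saturated a 100-Mb/s Fast Ethernet link, and 50 -> 200 MB/minute is
# roughly 0.8 -> 3.3 MB/s, a 4x backup improvement.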

Network Appliance Filer: Design Considerations

Another approach to consolidated storage is to eliminate the server altogether. As an alternative to SCSI-attached disk arrays, this class of appliances -- network-attached storage (NAS) or "filer" devices -- uses a stripped-down NOS to provide the protocols over which file service occurs. Many of these devices also support mounting a partition or directory on the device to host the database devices of an RDBMS platform. The key consideration here is that the appliance you choose must support all of the file-service protocols you need, such as CIFS (using NetBIOS over TCP), NFS, and HTTP. The Network Appliance Filer is an example of this approach and speaks all of the protocols above. CIFS is used for integration into a Windows NT/2000 domain and allows the Filer to appear to clients and other servers as if it were a server itself. NFS support is, of course, for UNIX environments, and the HTTP functionality allows integration of the device into large-scale Web hosting and e-commerce environments.

Since the Filer integrates as a sort of doppelganger into the existing NOS environment, deployment is not as simple as with the SCSI-attached consolidated disk array. On the other hand, the Filer offers SnapMirror, a real-time replication product for use between two Filers, and Snapshots, a backup technique that keeps several read-only copies of the data online each day for rapid file recovery. Also, the available space can be grown easily via the addition of disks, and the new space is usable without any system interruption. Because the customized NOS of the Filer provides a layer of abstraction between the CIFS, NFS, and HTTP interfaces and the underlying storage, many operations that would otherwise require reformatting a partition or rebooting a server can be performed transparently.

All of these features come at a cost per GB roughly twice (or more) that of the server-attached solution described above. Other critical considerations are the existence and quality of support for the NOS file-service protocols you require. If Netware is your platform, this type of storage may well be useless, because I am not aware of any vendor that provides network-attached storage via the NCP protocol. Deploying a very large data store on a Filer also all but mandates a very high-speed (gigabit or ATM) interface to allow clients and servers to make use of its I/O prowess. An example deployment is shown in Figure 2.

Network Appliance Filer: Deployment Experience

The Filer is simple to deploy and configure, and the management tools and design of the device are geared heavily toward this goal. Setup is accomplished via the command line or a Web-based setup wizard. The experience is very similar to turning up a server, although care must be taken during the initial IP setup since the Filer will use DHCP to get an address. The wrinkle is that the Filer will not renew the DHCP lease, so it is necessary to either assign another IP address statically to the Filer, or use the DHCP management tools to dedicate the initial IP from the DHCP assignment to the Filer permanently.

For CIFS/Windows environments, the Filer receives a domain account to allow integration and management of security via the domain tools. Under the CIFS/SMB file-service model, the server authenticates users and enforces permissions, so the domain account is necessary to allow access to the domain-level authentication and authorization information. For NFS environments, the Filer supports NIS as a client to allow centralized administration of the files that control access permissions. In an NFS environment, unlike CIFS, the Filer exports NFS mounts and relies on each client machine to authenticate the users accessing the exported directories. Additionally, if multiple protocols are in use, the differences in the protocols' treatment of file system dates, case sensitivity, file access, and other issues must be dealt with.

In our environment, only the CIFS protocol is in use, and the predominant function of the Filer is to provide the storage platform for the applications hosted for our clients. Each Web server uses directories on the Filer for its Web root, allowing synchronization of content and applications across the load-balanced Web servers. Additionally, the SnapMirror data replication feature is used to provide a backup of all Web content and data at a second data center. Figure 3 shows this configuration.

Network Appliance Filer: Performance

Both the command-line and SNMP performance management functions are excellent. The Filer command-line tool systat provides CPU, disk I/O, network interface I/O, and cache aging information via the console interface at a user-configurable interval. All of this information at a greater level of detail and granularity is also available via custom SNMP MIBs for use with a network management framework.
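
As a simple illustration of folding this kind of data into scripts or a management framework, the sketch below samples a throughput counter over SNMP using the net-snmp snmpget utility. It polls the standard MIB-II ifInOctets counter; the hostname, community string, and interface index are placeholders, and the Filer-specific statistics mentioned above live in Network Appliance's custom MIBs, whose OIDs you would take from the vendor documentation:

import subprocess
import time

# MIB-II ifInOctets for interface index 1 -- adjust the final index for
# the interface you care about. Counter wrap is ignored in this sketch.
IF_IN_OCTETS = "1.3.6.1.2.1.2.2.1.10.1"

def snmp_counter(host, community, oid):
    # -Oqv prints just the value, which keeps the parsing trivial.
    result = subprocess.run(
        ["snmpget", "-v", "2c", "-c", community, "-Oqv", host, oid],
        capture_output=True, text=True, check=True,
    )
    return int(result.stdout.strip())

def inbound_bytes_per_second(host, community, interval=10):
    # Sample the counter twice, "interval" seconds apart, and average.
    first = snmp_counter(host, community, IF_IN_OCTETS)
    time.sleep(interval)
    second = snmp_counter(host, community, IF_IN_OCTETS)
    return (second - first) / interval

# Example (placeholder hostname and community string):
# print(inbound_bytes_per_second("filer1.example.com", "public"))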

Our Filer provides Web content and data storage for our online banking applications in a hosted data center environment. At the time of this writing, there were 186 customers hosted in the primary data center with their data on the Filer, all of which is mirrored via SnapMirror to the Filer in the redundant data center. An analysis of the performance data from the 1 p.m.-5 p.m. CST period on a typical weekday showed 30-50% CPU utilization while providing an average of 2526 CIFS operations per second. The average disk channel and network interface utilization were 1.13 MB/s and 0.62 MB/s, respectively. This puts the average CIFS operation at roughly 256 bytes, which is indicative of the nature of our application. The CPU utilization is high given the relatively low total disk throughput, and assessment of the performance data shows an extremely strong correlation between CPU% and CIFS Ops/s.
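
The "roughly 256 bytes" figure is simply the quoted throughput divided by the quoted operation rate. A quick back-of-the-envelope check, using the numbers from the paragraph above:

def bytes_per_op(throughput_mb_per_s, ops_per_s):
    # Average bytes moved per operation, treating 1 MB as 2**20 bytes.
    return throughput_mb_per_s * 1024 * 1024 / ops_per_s

print(round(bytes_per_op(0.62, 2526)))  # ~257 bytes/op on the network interface
print(round(bytes_per_op(1.13, 2526)))  # ~469 bytes/op on the disk channel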

As the peak load period passed, CPU load fell along with CIFS Ops/s, and the average cache age increased dramatically, nearly doubling as the overall load on the Filer decreased by roughly 33%. One last interesting observation is that while the network utilization and disk read/write counters are almost perfectly correlated (not a surprise), the throughput across the network interface is almost twice that of the disk I/O. I expected to see some encapsulation and protocol inefficiency with CIFS, but a 2:1 ratio of network traffic to disk I/O was much more than I expected. Again, the small average read/write size is almost certainly the culprit here. As always, since each environment is unique, this performance discussion is geared more toward demonstrating the quality of the Filer's toolset than toward providing any useful benchmarking information to a prospective buyer. Both Network Appliance and Winchester Systems have excellent Web sites with extensive information about the performance of their products, and I encourage interested readers to review them and draw their own conclusions.
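
Correlation observations like these are easy to check once the counters are captured at a fixed interval from systat or SNMP. A minimal sketch, with hypothetical sample values standing in for real captures:

from math import sqrt

def pearson(xs, ys):
    # Pearson correlation coefficient between two equal-length sample series.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical interval samples -- substitute real systat/SNMP captures.
cpu_pct  = [32, 38, 41, 47, 50, 44]
cifs_ops = [1800, 2100, 2300, 2700, 2900, 2500]
print(pearson(cpu_pct, cifs_ops))  # values near 1.0 indicate a strong correlation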

Conclusion

Storage infrastructure choices are complicated by both the large fixed costs associated with high-availability/performance offerings and the seemingly boundless appetite of a typical enterprise for disk space and throughput. A bewildering array of incompatible solutions further challenges IT planners and engineers alike. The goal of this discussion has been to present two alternatives that are representative of typical vendor approaches to the set of storage problems enterprises face. I hope the information presented here will assist those facing this difficult set of decisions.

Adam Anderson is an IT Manager who knows the pain of slow backups and shrinking free partition space. He can be contacted at: adam_d_anderson@hotmail.com.