Storage Consolidation -- Part 1 -- Design

Peter Baer Galvin

No matter how many new technologies promise to reduce storage needs, those needs keep increasing. Whether it is user files, databases, human resources information, or research results, disks keep getting filled. The trend toward consolidated storage is strong and getting stronger. This month in the Solaris Corner, I'll discuss the business needs for consolidated storage, provide a process for designing a storage solution, and look at the two leading solutions: SAN and NAS. Next month, in Part 2, I will cover the implementation of consolidated storage, including some complexities you may not expect.

Unfortunately for facilities managers, but fortunately for technology providers, storage solutions keep advancing and increasing in complexity. Not long ago, a site would run a combination of direct-attached storage (DAS) and network file services (NFS or CIFS). Occasionally there was dual-attached storage to enable high-availability clustering. Anyone requiring more connections generally installed a proprietary central-storage unit (e.g., an EMC Symmetrix). Disk spindles were spread across these devices, leading to the need for separate management and capacity planning. Spindles were wasted because extra capacity was needed in each set of storage.

The Storage Evaluation

With the advent of storage switches, there is now an alternative where DAS would previously have been used: the storage area network (SAN). A SAN consists of one or more storage switches connecting hosts and storage. The benefits of this new model are many, as are the complications. The first step in a storage consolidation project is a storage evaluation:

  • Tools such as Sun HighGround can be installed on all storage-centric systems to evaluate how much space is used and how much is free throughout a facility. Such tools can also report how much data is "old" and how much is "new" to determine whether hierarchical storage management (HSM) would be useful.
  • ROI (return on investment) determination comes from the space information garnered, plus a thorough look at the maintenance costs of existing storage, the floor space, power, and cooling savings from consolidating, and the management costs of running disparate storage vs. consolidated storage. (A rough example of this arithmetic is sketched below.)
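
To make that arithmetic concrete, here is a minimal back-of-the-envelope sketch in Python. Every figure in it is an assumed placeholder, not data from any real evaluation; substitute the numbers from your own space audit and maintenance contracts.

    # Hypothetical figures for illustration only -- substitute numbers from
    # your own space audit and maintenance contracts.

    # Current storage islands: (used_gb, raw_gb, annual_cost_usd)
    islands = {
        "db-server-das": (310, 500, 18000),
        "web-farm-das":  (120, 400, 12000),
        "old-array":     (900, 1200, 60000),
    }

    used_gb = sum(u for u, r, c in islands.values())
    raw_gb = sum(r for u, r, c in islands.values())
    current_cost = sum(c for u, r, c in islands.values())
    print("Current utilization: %.0f%% of %d GB" % (100.0 * used_gb / raw_gb, raw_gb))

    # Consolidated target: assume 70% usable utilization in a single pool,
    # plus an assumed annual cost for the consolidated array and its management.
    target_utilization = 0.70
    consolidated_gb = used_gb / target_utilization
    consolidated_cost = 45000
    print("Consolidated capacity needed: %d GB" % consolidated_gb)
    print("Estimated annual savings: $%d" % (current_cost - consolidated_cost))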

The Needs Analysis

Based on the evaluation, you will have more complete knowledge of your current environment and its costs. Next, each of these key points needs careful consideration:

  • Consider a SAN for systems needing "dedicated" storage. You'll need a separate SAN for each distant physical location, and some sites also implement separate SANs for production and QA/development.
  • NAS is still a good solution where multiple machines need shared access to the same files. For example, multiple Web servers could mount their HTML content from a single NAS share, so changes need only be made in one place.
  • DAS is still useful for spot solutions, where low cost and low capacity are the driving factors. Care must be taken, or your facility will end up needing consolidation again!

For each server in the facility, also consider these needs:

  • How many LUNs are needed per server?
  • Capacity need of each LUN?
  • RAID level of the LUN (RAID 1, 0+1, 1+0, or 5?). Keep in mind the performance and failure recovery impact of the choice. Certainly, most sites run with a variety of RAID levels.
  • Caching need, especially if the LUN is performance-oriented and will receive random writes. Also consider whether it is a small LUN that would greatly benefit performance if it were entirely in memory (either locked into cache or on a RAM disk).
  • The need for further data protection via "snapshots", either full mirrors or copy-on-write block copies. Snapshots can be used for backups, or to allow a recent history of the LUN to be kept for fast restoration or testing.
  • The need for single or redundant paths from the host to the LUN. Generally, production LUNs need redundant access paths (for SAN or DAS) or highly available networking (for NAS).
  • The need for dynamic multipathing, in which redundant paths are used until one fails, and then all the data is seamlessly routed via the remaining paths. Note that some multipath solutions will dynamically reuse the failed path when it again becomes available, while others require a reboot before full multipathing can be restored.
  • The throughput needed between the host and the LUN, both at peak use and on average. If the use is "peaky," when do the peaks occur for each LUN?
  • Replication need (to mirror data off-site) for disaster recovery. Replication has its own complexities. Chiefly, how much data needs to be replicated per second, both peak and on average?
  • Backup requirements can involve their own analysis project. Interesting data points include the downtime allowed, the type of data to be backed up, the restoration time limits, the volume of data to back up during the backup window, and the communications capacity needed to do so (a quick throughput check is sketched after this list). SANs can be leveraged to provide LAN-free backups. This can be thought of as a tape-drive consolidation project as well.
  • Should HSM be implemented for the LUN or server-wide? HSM holds promise but, again, adds its own complexity. If large amounts of data will be write-once, read-seldom, then HSM is worth considering.
  • How secure do the contents need to be? Because NAS is network-based, and SANs can have network-based management, security must be considered when doing storage consolidation planning.
  • For NAS use, also consider the NAS protocol needed and the file access controls needed. For example, if a volume will be accessed via NFS and CIFS, do you need file access control lists to work under both protocols on the same file?
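
As a quick illustration of the backup-window arithmetic mentioned above, the Python sketch below checks whether an assumed data volume fits through an assumed window; the figures and the tape-drive speed are placeholders, not recommendations.

    import math

    # Is the backup window feasible? All figures are illustrative assumptions.
    data_to_back_up_gb = 800      # data that must be written during the window
    backup_window_hours = 6       # quiet period available each night

    required_mb_per_sec = data_to_back_up_gb * 1024.0 / (backup_window_hours * 3600)
    print("Sustained throughput required: %.1f MB/sec" % required_mb_per_sec)

    # Compare against an assumed native tape-drive rate to size the drive count.
    tape_drive_mb_per_sec = 15.0
    drives_needed = int(math.ceil(required_mb_per_sec / tape_drive_mb_per_sec))
    print("Tape drives needed in parallel: %d" % drives_needed)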

The end result of this analysis is the raw data needed for storage architecture design.
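
One way to capture that raw data in a form that feeds directly into the design is a simple per-server, per-LUN record. The Python sketch below is one possibility; the field names are my own assumptions, not any standard schema, and a spreadsheet works equally well.

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class LunRequirement:
        name: str
        capacity_gb: int
        raid_level: str           # "1", "0+1", "1+0", or "5"
        cached: bool = False
        snapshots: bool = False
        redundant_paths: bool = True
        multipathing: bool = True
        peak_mb_per_sec: float = 0.0
        avg_mb_per_sec: float = 0.0
        replicated: bool = False
        hsm_candidate: bool = False

    @dataclass
    class ServerRequirement:
        hostname: str
        luns: List[LunRequirement] = field(default_factory=list)

    # Example: one production database server with two LUNs
    db01 = ServerRequirement("db01", luns=[
        LunRequirement("oradata", 200, "1+0", cached=True, snapshots=True,
                       peak_mb_per_sec=60, avg_mb_per_sec=15, replicated=True),
        LunRequirement("archivelogs", 100, "5", snapshots=True, hsm_candidate=True),
    ])
    print("%s needs %d GB total" % (db01.hostname, sum(l.capacity_gb for l in db01.luns)))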

The Design

Based on the above analysis, a core storage configuration can be determined. However, there are still other aspects to consider before settling on a design. Although the scalability of any individual server's storage matters less in a consolidated environment, overall scalability is still of interest. How much growth will there be in the total storage needed? Until recently, NAS and SAN storage were two separate islands, needing their own capacity management. The latest NAS appliance technologies allow a NAS head to be connected to the SAN infrastructure. Spindles can be shared, and technologies such as storage-based replication can be unified. True high-availability NAS heads are not quite ready yet, but should be in the near future. When that occurs, consolidation of NAS and SAN will begin in earnest, with the ensuing management and planning benefits.

Taken together, the information determined above can generate a complete storage design. Of course, product function and limitations can cause a redesign, or at least a modification to the design, if the best solution is not attainable.

Figure 1 depicts a sample storage consolidation design. At the production site, redundant switches connect the servers (which previously would have used DAS) to the disk array pool. That pool consists of two storage arrays (which need not be from the same manufacturer). By having servers with redundant connections to switches, and switches with redundant connections to the storage arrays, the entire SAN is fault-tolerant. A NAS head gives network access to files stored on any LUNs mounted to the NAS device. For example, user files could be made accessible via the network to all clients.

Also in Figure 1 is a tape library attached to the SAN switch. HSM and backup software could transfer files between the storage, the servers, and the tape drives, without ever having the data traverse any network. Some software, such as Veritas NetBackup, provides for tape sharing among servers. For example, rather than having each server connect to a specific tape drive via the SAN, the tape drives form a pool, and the software dynamically allocates tape drives to a server, performs the backup, and de-allocates the drives so they can be used for backups of other servers. This kind of tape consolidation is very helpful for restores, in which case all tape drives can be dynamically attached to a single server to drive restores at maximum throughput.
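
To illustrate the idea of dynamic drive pooling (not how NetBackup itself is implemented), here is a toy Python sketch; the class and method names are invented for illustration only.

    # A toy model of tape drives shared on the SAN. It only illustrates dynamic
    # allocation; real products also handle media management, robotics, and
    # failure cases.
    class TapeDrivePool:
        def __init__(self, drives):
            self.free = list(drives)
            self.in_use = {}                     # drive name -> server name

        def allocate(self, server, count=1):
            if len(self.free) < count:
                raise RuntimeError("not enough free tape drives")
            grabbed = [self.free.pop() for _ in range(count)]
            for d in grabbed:
                self.in_use[d] = server
            return grabbed

        def release(self, drives):
            for d in drives:
                del self.in_use[d]
                self.free.append(d)

    pool = TapeDrivePool(["drive1", "drive2", "drive3", "drive4"])
    d = pool.allocate("web01", count=2)          # nightly backup of web01
    pool.release(d)                              # drives return to the pool
    d = pool.allocate("db01", count=4)           # a restore can use every drive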

The disaster recovery site, in this example, does not have redundancy of servers or switches. It could still have RAID-protected spindles, of course. Replication between the two sites can be done from array to array (via proprietary protocols over storage links such as ESCON or Fibre Channel), or via host software. Host-based replication software can work at the block level or the application level. For instance, Sun's SNDR (StorEdge Network Data Replicator) copies any changed block across a WAN to a mirror disk. In this manner, the disks at the remote site look exactly like those at the source, but with a time delay dependent on the width of the WAN pipe and the latency caused by the distance the bits have to travel. Application-level replication, such as Oracle Advanced Data Replication, understands transactions within the application and duplicates them to a similar application running at the remote site. All of these forms of replication have their pros and cons, depending on your needs.
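
A rough way to sanity-check a replication design is to compare the data change rate against the WAN capacity and to account for propagation delay. The Python sketch below does exactly that with assumed figures; measure your own change rate before sizing a link.

    # Rough sanity check of a replication link; every figure is an assumption.
    avg_change_mb_per_sec = 1.5       # average rate of changed blocks at the source
    peak_change_mb_per_sec = 8.0      # worst-case burst (e.g., nightly batch load)
    wan_megabits_per_sec = 45.0       # e.g., a DS-3-class link
    distance_km = 1200.0              # distance between the two sites

    wan_mb_per_sec = wan_megabits_per_sec / 8
    print("Link sustains about %.1f MB/sec of replication traffic" % wan_mb_per_sec)
    print("Keeps up with average change rate: %s" % (avg_change_mb_per_sec <= wan_mb_per_sec))
    print("Keeps up with peak change rate:    %s" % (peak_change_mb_per_sec <= wan_mb_per_sec))

    # Propagation delay (light in fiber travels roughly 200,000 km/sec) sets a
    # floor on how far the mirror lags the source, regardless of bandwidth.
    one_way_latency_ms = distance_km / 200000.0 * 1000
    print("Minimum one-way latency: %.1f ms" % one_way_latency_ms)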

Next Month

Next month in the Solaris Corner, the storage consolidation discussion concludes with information on product selection, implementation planning, implementation methodologies, and some final notes designed to help you ensure a successful consolidation project.

Peter Baer Galvin (http://www.petergalvin.org) is the Chief Technologist for Corporate Technologies (www.cptech.com), a premier systems integrator and VAR. Before that, Peter was the systems manager for Brown University's Computer Science Department. He has written articles for Byte and other magazines, and previously wrote Pete's Wicked World, the security column, and Pete's Super Systems, the systems management column for Unix Insider (http://www.unixinsider.com). Peter is coauthor of the Operating System Concepts and Applied Operating System Concepts textbooks. As a consultant and trainer, Peter has taught tutorials and given talks on security and systems administration worldwide.