Building and Using a SAN: Part I
W. Curtis Preston
In the past few months, I've discussed many things about
storage area networks. I first talked about what life has become
like for those who do not have a SAN, and how a SAN could make things
a whole lot better. Then I talked about Fibre Channel, the underlying
technology behind SANs (see Figure 1). I followed that with a summary
of the different building blocks used to build a SAN, which
include the following elements:
Servers -- The servers are what give a SAN its purpose. Each
server will connect to the SAN by one or more Fibre Channel connections.
Host Bus Adaptors (HBAs) -- An HBA is the proper term for
what is commonly called a "Fibre Channel card." The Fibre
Channel cable connects to the server via the HBA, which resides
in the server.
Hubs -- By connecting a server to a hub, you are connecting
it to an arbitrated loop. An arbitrated loop can contain only 126
devices, but a public loop can be connected to a switched fabric,
just as a shared Ethernet network can be connected to a switched
Ethernet backbone by connecting the hub into a port on a switch.
Switches -- In a pure switched fabric topology, each server
will connect to the SAN via its own port on a Fibre Channel switch.
Multiple switches can be cascaded together to create larger SANs,
up to 16 million nodes.
Routers -- A router (sometimes referred to as a bridge) bridges
the gap between serial SCSI (i.e., Fibre Channel) and parallel SCSI,
allowing you to connect legacy SCSI devices, such as tape drives
and older disk drives, to your SAN.
Disk Systems -- Disk systems come in all shapes and sizes,
including single Fibre Channel disks, JBOD arrays, RAID in a box,
and enterprise-level storage arrays. With the proper hardware and
software, disk systems connected to a SAN can be much easier to
manage, allocate, and repair.
Tape Systems -- Connecting tape systems to a SAN (usually
via a SAN router) allows you to share these tape systems with multiple
servers, significantly increasing the usability of large tape libraries.
Cabling -- Although it may seem obvious, the cabling is an
essential part of the SAN. Most Fibre Channel cabling uses fiber
optic cables, although copper cables can be used where distances
are shorter and cost is an issue. One interesting difference between
fiber optic cables and legacy SCSI cables is that, although fiber
optic cables may be a bit more fragile, they either work or they
don't. Think about the number of times you held a SCSI cable that
looked fine but performed poorly, and you should be able to
appreciate this difference!
Software -- Without software, the connectivity made possible
by SANs would quickly result in chaos. Software does a number of
things, including controlling access to the hundreds or thousands
of devices connected to the SAN.
Using the SAN
Why are we talking about how to use the SAN before talking about
how to build the SAN? The answer to that question is simple. Until
you know what you can do with a SAN, you shouldn't be building
one in the first place! Therefore, I'm going to talk about
the things that a SAN makes possible to give you some ideas of how
you might use one to make your life easier. Then, I'll explain
how to make these things a reality by actually buying and building
the SAN.
There are essentially three things that the SAN in Figure 1 makes
possible:
1. Offline storage consolidation
2. Online storage consolidation
3. Truly highly available systems
(This is speaking very broadly, of course.) Offline storage consolidation
is just a different way of saying that you can share a tape library
between multiple servers. Online storage consolidation allows you
to create a pool of disk systems that can be allocated to the servers
that need them. Because a SAN allows for multiple servers to access
multiple physical disks via multiple physical paths, it also allows
for truly highly available systems. Because this lost+found
column is dedicated to backup and recovery, I will cover the topic
of offline storage consolidation first. Also, when covering the
other two topics, I will concentrate on the ways in which they benefit
or affect backup and recovery.
Offline Storage Consolidation
In the upper right-hand corner of Figure 1, you can see a tape
library connected to the SAN via a SAN router. Assuming that there
is a zone that contains the tape library and the three servers,
Server-A, Server-B, and Server-C will all see each of the tape drives
in that tape library as if they were physically attached. In this
configuration, the robotic device is also connected to one of the
SCSI buses on the back of the SAN router. This makes the robotic
device also visible to all three servers. Each server can then
build device files for the drives and the robot, just as it would
for a parallel SCSI-attached tape drive or robotic device.
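To make this concrete, here is a minimal Python sketch of how zoning
controls visibility. The zone and port names are hypothetical, and
real zoning is configured on the switch rather than in application
code; the point is simply that a device is visible to a server only
if some zone contains both of them.

# Minimal sketch of SAN zoning; all port names are hypothetical.
# A zone is a named set of ports; a target is visible to an
# initiator only if some zone contains both.

zones = {
    "tape-zone": {
        "server-a-hba", "server-b-hba", "server-c-hba",  # server HBAs
        "router-port",          # SAN router fronting the tape library
    },
}

def can_see(initiator: str, target: str) -> bool:
    """True if any zone contains both the initiator and the target."""
    return any(initiator in z and target in z for z in zones.values())

# All three servers see the router (and the drives and robot behind it):
for hba in ("server-a-hba", "server-b-hba", "server-c-hba"):
    print(hba, "->", can_see(hba, "router-port"))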
Now what do we do? Each server can now attempt to take control of
both the robotics and the tape drives, and if each server actually
did so independently, chaos would result.
This is where the robot and drive-sharing software from your backup
software vendor comes into play. It will act as a "traffic
cop," allocating tape drives to the servers that need them.
How this actually works varies greatly from vendor to vendor, but
the description below is valid for several of the large vendors.
One server is configured to be the main server (e.g., NetWorker
server, or NetBackup Master Server). It will hold the backup database
and scheduling information. (The backup database will keep track
of what has been backed up to what tape.) This server is also typically
dedicated as the robotic control server for the robot in the tape
library. (This will cause it to act as the "traffic cop"
discussed above.) Each of the other two servers is then configured
as a media server (e.g., NetWorker Storage Node or NetBackup Media
Server). Each drive within the tape library is then configured as
a "virtual" drive that each server must request to use.
When one of the servers needs to back up or recover, it requests
a drive from the main server. If the main server has an available
tape drive, it may immediately assign that drive for use by the
requesting server. Once that has been done, the drive will no longer
be available to other servers until it has been released by the
server that is using it. If a server requests a drive when none
are available, it will be told to wait until a drive becomes
available. When a server finishes with its drive, it tells the
main server that it no longer needs the drive, and the main server
can then allocate that drive to another server that is waiting for
one.
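The following Python sketch illustrates this request/allocate/release
protocol. The class and method names here are hypothetical; products
such as NetWorker and NetBackup implement this logic internally, but
the bookkeeping looks roughly like this:

from collections import deque

# Minimal sketch of the "traffic cop" drive-allocation logic
# described above; all names are hypothetical.

class MainServer:
    def __init__(self, drives):
        self.free = list(drives)    # drives available for allocation
        self.in_use = {}            # drive -> media server using it
        self.waiting = deque()      # media servers waiting for a drive

    def request_drive(self, server):
        """A media server asks for a drive; allocate one or queue it."""
        if self.free:
            drive = self.free.pop()
            self.in_use[drive] = server
            return drive
        self.waiting.append(server)  # no drive free: server must wait
        return None

    def release_drive(self, drive):
        """A media server is done; pass the drive to the next waiter."""
        if self.waiting:
            self.in_use[drive] = self.waiting.popleft()
        else:
            del self.in_use[drive]
            self.free.append(drive)

main = MainServer(["drive-1", "drive-2"])
print(main.request_drive("server-b"))  # gets a drive immediately
print(main.request_drive("server-c"))  # gets the other drive
print(main.request_drive("server-a"))  # None: must wait for a release
main.release_drive("drive-1")          # server-c done; server-a gets it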
As you can see, this setup results in dynamic allocation of tape
drives to the servers that need them. It also allows each server
connected to the SAN to back up its entire data set directly to
its own tape drive. It can do this without transferring the data
via the LAN, and without forcing the administrator to allocate one
or more tape drives to its exclusive use.
This setup allows you to really maximize the return on investment
for tape libraries. Instead of buying many smaller libraries and
connecting a few drives to each server, you can buy one or two really
large tape libraries, and dynamically allocate the tape drives to
the servers that need them. What a savings! By buying two libraries
and connecting them both to the SAN, you even increase the availability
of tape drives to all servers. In the pre-SAN days, a given media
server could only rely on the tape library that was physically attached
to it. Now, every time you buy a new library, you can add a new
level of availability to your backup system, because every server
can use every library.
Online Storage Consolidation
Many large data centers have thousands of discrete disks spread
among all of their servers, a situation with many disadvantages
that a SAN can solve. When one disk fails, the server must be
taken down to repair it. Although this can be solved by buying RAID
arrays with hot-swappable disks and at least one hot spare for each
array, allocating a hot spare for each array can be expensive. Also,
if the hot spare has already been used and has not been replaced,
you cannot automatically use the spare from one server to fix a
bad drive in another server. You wouldn't even want to do this
manually, because doing so would degrade the integrity of the server
from which you borrowed the drive.
Consider aging or decrepit disk drives. I remember one client
that had an entire set of servers that could not afford down time.
Behind these servers were disk drives with a known problem that
wasn't discovered until the servers had been in production
for a long time. The drives would occasionally leak oil onto the
disk, rendering it useless. The vendor was offering free replacements,
but the swap took over a year due to the downtime and hassle it
required. Although this is similar to the last problem,
it's slightly different. Using discrete disks does not easily
allow for preventive replacement of drives that you know are old
or potentially faulty.
Another manageability issue is space allocation. It is common
to have one server starving for disk space while another has disks
to spare. I know of one very large data center where they had individual
servers that were out of space, but the entire data center was actually
only using 10% of its total capacity. They had over 30 TB of available
spinning disk, but couldn't allocate a single disk to the servers
that needed one. Doing so would have required physically moving
a disk from one server to another, with downtime for both servers.
Putting your disks behind a SAN can solve all three of these problems
as well as others. Consider the first two problems. One way to solve
them would be to put all of your storage on a rather large storage
array behind the SAN. This array could have redundant power supplies,
paths, and disks, and it could also contain a small pool of hot
spare disks, which requires far fewer disks than a dedicated hot
spare for each system. When any disk in the array fails, the array
can replace it automatically with a spare and notify you that the
bad drive needs to be replaced. The bad drive could then be replaced
online. This is the "hardware" solution, and it is the most
tested method of solving these problems.
However, you don't have to have an expensive array. You could
simply place your JBOD disk behind your SAN, and use enterprise
volume management software to manage your disks and hot spares.
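Whether this pooling lives in array firmware or in volume management
software, the essential logic is a single pool of spares that any
RAID group can draw from. Here is a minimal Python sketch, with
hypothetical disk and volume names:

# Shared hot-spare pool sketch; all disk/volume names are hypothetical.
spares = ["spare-1", "spare-2"]    # one small pool for the whole array

raid_groups = {
    "server-a-vol": ["disk-1", "disk-2", "disk-3"],
    "server-b-vol": ["disk-4", "disk-5", "disk-6"],
}

def disk_failed(group: str, disk: str) -> None:
    """Swap a failed member for a pooled spare; flag it for replacement."""
    if not spares:
        raise RuntimeError("spare pool exhausted: replace failed disks!")
    spare = spares.pop()
    members = raid_groups[group]
    members[members.index(disk)] = spare  # a real array would now rebuild
    print(f"{disk} in {group} replaced by {spare}; swap out {disk} online")

disk_failed("server-a-vol", "disk-2")  # any group can draw from the pool
disk_failed("server-b-vol", "disk-6")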
Either of these solutions also makes proactive maintenance much
easier. You simply buy the new disk, connect it to the array (or
the SAN), and use the appropriate volume manager to make the new
disks active and the old disks inactive. The old or failing disks
can then be moved out of service.
Perhaps the greatest benefit of consolidating your storage behind
a SAN is dynamic space allocation.
Just as tape drives can be allocated to the servers that need them,
disk drives can be allocated to just the servers that need more
storage. If you find that you've allocated too much storage
for a given server, simply deallocate that storage from that server,
and return it to the pool of storage that is available for servers
that need it. If a given server is out of storage, and you don't
have any more disk available in the spare storage pool, you can
just buy more disk, and attach it to the SAN! Buying disks this
way would also allow you to buy bigger arrays that are cheaper per
GB. If you were using discrete disks and only needed 10 GB, you
might be able to justify a 20-GB array, but not a 200-GB array,
even though the larger array would be much cheaper per GB; someone
would ask you to prove that the server will soon need 200 GB.
However, if you were putting that 200 GB behind the SAN, allowing
any server to use it when needed, you could do just that! It is
much easier to justify that the entire data center will soon need
200 new GB of disk.
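To illustrate the economics with purely hypothetical prices (real
numbers will vary), the per-GB arithmetic looks like this:

# Hypothetical prices, for illustration only.
small_array = {"capacity_gb": 20, "price_usd": 4000}    # per-server array
large_array = {"capacity_gb": 200, "price_usd": 20000}  # shared via SAN

for name, a in (("20-GB array", small_array), ("200-GB array", large_array)):
    print(f"{name}: ${a['price_usd'] / a['capacity_gb']:.0f}/GB")

# The 20-GB array works out to $200/GB; the 200-GB array to $100/GB.
# Only the SAN lets the whole data center, rather than a single
# server, absorb (and justify) the larger purchase.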
In Part II of "Building and Using a SAN," I will cover:
- How online storage consolidation simplifies backups.
- How SANs make creating highly available systems easier.
- What's involved in building a SAN.
W. Curtis Preston has specialized in storage for over eight years,
and has designed and implemented storage systems for several Fortune
100 companies. He is the owner of Storage Designs, the Webmaster of
Backup Central (http://www.backupcentral.com), and the
author of two books on storage. He may be reached at curtis@backupcentral.com.
(Portions of some articles may be excerpted from Curtis's books.)