Migrating Solaris to a Storage Area Network
Greg Schuweiler
I work on a team of three UNIX systems administrators in a premier medical clinic, hospital, and research institute located in the Midwest. We were looking for a better way to manage our growing storage requirements than continually requesting money from management for drives for yet another machine. Our environment consists of Solaris, HP, AIX, Linux, and a SCO box or two. For this reason, we wanted a general solution to our storage situation. We don't manage workstations, and most of the servers are used as database servers, mainly Sybase, with a few Oracle installations and an unsubstantiated rumor of a big Informix database moving over from VMS.
We decided to start with a subset of our Solaris boxes for two reasons. First, the HP/9000s were running on a proprietary Fibre Channel arbitrated loop and were solid as a rock. Second, it was much easier to get down time on the smaller Solaris servers.
Before I discuss this any further, I will define what we think a Storage Area Network (SAN) is and what it should provide. The best definition for a SAN is a pool of storage that many servers can access. Figure 1 depicts a simple SAN. I left out the switches, hubs, redundancy, and multiple paths for clarity. Ideally, any vendor's server could be connected into an existing or new SAN and play nice with the others. The server being added would not scribble on LUNs that it did not own. With this type of configuration, I can put all newly purchased storage into this pool, allocating it to each server as needed.
The SAN offers some other advantages too:
The storage pool is centrally managed. Thus, space can be allocated, deallocated, and moved between various servers.
Disks can be placed in various RAID configurations, which provide reliability without placing RAID boxes all over the place.
Different LUNs can use different RAID levels, depending on each application's requirements.
The disk arrays and servers can be separated by up to 10 kilometers. This can provide some disaster preparedness when installed correctly.
High availability is easier to configure.
LAN-free backups (See Sys Admin April 2000).
So, the end goal of a SAN, as we see it, is to provide storage for the business that can be accessed by Sun, HP, IBM, SCO, Linux, and others, including NT. We do not expect to share file systems between different operating systems (although that would be nice), but to share the same storage arrays, slicing off chunks as needed with little or no hardware configuration and no reconfiguration of the systems already using the shared storage. We thought that if we relied on the Fibre Channel standards in designing our SAN, and then found one or more vendors who could help put it together, we would have a working solution. This working solution should then meet our needs well into the future with few changes -- just firmware upgrades and larger drive capacities.
After speaking at length with engineers at various vendors, we made our first pass at a SAN, shown in Figure 2. This SAN consisted of two E-3500s, one Ultra-2, two MTI 2700 JBODs, and two Gadzoox Gibraltar hubs configured into an arbitrated loop. We used Veritas Volume Manager (VxVM) to provide software RAID on the JBODs. The distance between the two hubs was approximately 1.2 kilometers. When it worked (which was most of the time), the results were quite good. When it failed (which was only after the on-call person had fallen into a deep sleep), it was difficult to troubleshoot.
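For the curious, carving a software RAID-5 volume out of those JBOD disks looked roughly like the sketch below. The disk group, volume, device, and mount point names are placeholders, and the exact steps (bringing the disks under VxVM control in particular) depend on the VxVM version, so treat it as an outline rather than a recipe.

    # The JBOD disks are assumed to already be under VxVM control (vxdiskadm).
    # Group them, build a RAID-5 volume across them, and put a file system on it.
    vxdg init sandg sandg01=c2t0d0s2 sandg02=c2t1d0s2 \
               sandg03=c2t2d0s2 sandg04=c2t3d0s2
    vxassist -g sandg make vol01 20g layout=raid5
    newfs /dev/vx/rdsk/sandg/vol01
    mount /dev/vx/dsk/sandg/vol01 /export/sybase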
Because of the nature of an arbitrated loop, a failure at any point in the loop opens that loop, preventing all devices on it from communicating. However, finding that opening can be difficult. It often entailed shutting down all three servers and starting them up one by one, or unplugging servers from the hub and waiting for things to clear up. At times like that, I wonder why I didn't just become a fry cook. There are other concerns with an arbitrated loop configuration: the bandwidth is shared by all of the devices on the loop (not entirely true, but the Fibre Channel protocol and SAN topologies cannot be covered in detail here), and an arbitrated loop is limited to 126 devices (NL_Ports) plus one FL_Port, which can limit growth. For those in a small environment looking to provide a SAN, an arbitrated loop with managed hubs should provide a satisfactory solution. The managed hub should prevent the open-loop problem.
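When the loop did open, the first clue was usually a complaint from the HBA driver in each host's system log. A sweep like the one below at least tells you which hosts noticed the break; the grep pattern is only a guess at the message text (it varies by driver), and the host names are made up.

    #!/bin/sh
    # Check each host on the loop for HBA "offline" complaints in its system
    # log; the pattern and host names are assumptions -- adjust for your site.
    for host in db1 db2 db3
    do
        echo "--- $host ---"
        rsh $host "grep -i offline /var/adm/messages | tail -5"
    done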
Because our end goal was a fabric topology, we started to actively investigate Fibre Channel switches. (Note that in its original meaning, fabric meant a true switched Fibre Channel topology, but somewhere along the line, marketing and sales people decided fabric meant anything with the words Fibre Channel in it.) Sun had partnered with Ancor, and an MTI drive array that we were investigating was shipping with Ancor switches (in a crippled mode). This made picking a switch vendor a little easier. MTI ships its newer drive array with Ancor switches whose firmware configures each port as an SL_Port (Segmented Loop Port). Thus, each port can support from one to 126 devices and is considered a private loop.
Nodes on one private loop can communicate with nodes on another private loop by address translation, which the Ancor switch handles. Problems, such as an open loop on port 6, will not have any effect on the other ports. Ideally, this would prevent the whole SAN from grinding to a halt because of a failure on one SL_Port. The crippled Ancor switch from MTI would still not provide full bandwidth between devices, but since this was our second attempt at a working SAN, we decided to give it a try. Figure 3 shows our second approach to a SAN. I have added dual paths from each system, one to each switch. Notice that each switch has dual paths to the disk array. There is actually some other hardware associated with the MTI V20, but it is sufficient to say that we could lose one Ancor switch and still maintain full operations through the remaining path.
Once the hardware was set up as in Figure 3, and because we were using VxVM, we were able to mirror the internal drives or SCSI-attached multi-packs to the LUNs we defined on the MTI V20. We actually built two separate SANs, in two different data centers, that looked like Figure 3. The number of machines is different, but everything else is the same. Once this was finished, we completed the SAN by connecting the two data centers with two longwave (9-micron fibre) lines. Systems that do not need high availability use a RAID 5 LUN in the local V20, whereas systems that require high availability have two RAID 0 LUNs mirrored between the data centers. Figure 4 shows an abbreviated diagram of our current SAN for our low-end Sun servers.
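As a rough illustration of the high-availability case, mirroring a volume across one LUN in each data center with VxVM looks something like the following; the disk group, volume, and disk media names are placeholders for this sketch.

    # Build a volume with one plex on a LUN in each data center; hadg01 is
    # the local V20 LUN and hadg02 the remote one (all names are placeholders).
    vxassist -g hadg make dbvol 50g layout=mirror nmirror=2 hadg01 hadg02
    # Alternatively, if the volume already exists on the local LUN, attach
    # the remote LUN as a second mirror:
    #   vxassist -g hadg mirror dbvol hadg02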
The Sun systems in this SAN are E250s, E450s, E420Rs, and E3500s, running Solaris 2.6 and Solaris 7 with kernel patches 105181-17 and 106541-09, respectively. All the systems use JNI host bus adapters (HBAs) with driver version 2.3. The disk array we selected is an MTI V20 with 24 18-GB drives. As a note, MTI does not support the JNI PCI card, but we have had excellent results with both the JNI Sbus and PCI cards, and our engineering contact at JNI assured us that the drivers are the same, aside from being compiled for the different buses. The crippled Ancor switches from MTI are working and will prevent the problems discussed above. However, to bring them to a fabric topology, newer firmware will need to be downloaded into the switches -- a fairly simple task with the interface provided by Ancor. And each Solaris box will need to have its HBA driver's .conf file changed, followed by a reboot.
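The change itself is small. The sketch below shows the general idea; the file name and parameter are assumptions (check the .conf file that ships with your particular JNI driver for its loop/fabric setting), and the reboot must be a reconfiguration reboot.

    # Edit the JNI driver's configuration file under /kernel/drv.  The file
    # name (fca-pci.conf) and the exact topology parameter are assumptions --
    # use whatever loop/fabric setting your driver's own .conf documents.
    vi /kernel/drv/fca-pci.conf       # switch the port topology to fabric
    # Then do a reconfiguration reboot so the new topology is picked up.
    reboot -- -r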
At this point, a SAN is something you definitely do not want to stick right into production, no matter how convincing the salesperson is. I have built a mini-SAN with an 8-port Ancor switch and one of our original Fibre Channel JBODs for running tests. It is a great place to check out the latest firmware levels for the switches, HBAs, and Solaris. If you are considering building a SAN, I suspect that by late summer or early fall of 2000 there will be a greater selection of disk arrays, newer HBAs, and perhaps lower prices across the board.
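Before and after any of those upgrades on the test rig, it is worth recording exactly what each host is running. Something like the following works; the grep patterns are only guesses at how the JNI driver identifies itself, so adjust them to match your driver.

    # Record driver, patch, and device-binding levels on a test host.
    modinfo | grep -i fca                 # loaded HBA driver module and version
    showrev -p | egrep '105181|106541'    # kernel patch levels (Solaris 2.6 / 7)
    prtconf -D | grep -i fca              # which driver is bound to the HBA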
About the Author
Greg (shoe) Schuweiler has worked in the friendly Midwest for the last ten years as a consultant, an embedded software designer, an Oracle DBA, and under a host of other strange titles. He has been in the UNIX systems administration area for the past four years. He can be reached at: gshoe@xadd.org.