Fibre Channel Unraveled

Steve Nelson

Fibre Channel (FC) is not really an emerging technology, since it is based on the mainframe fiber directors developed for ESCON more than ten years ago. It's the speed, implementation, and "openness" that make FC new. This article provides a brief overview of the technology: what FC is all about, the protocol that makes it work, the devices from which you build FC-based networks, and some sample topology philosophies that can serve as starting points for designing flexible layouts.

Most vendors tout FC's speed (100 MB/sec, or roughly 1 Gb/sec on the wire) and distance capability (anywhere from 500 m to 10 km, depending on implementation). Based on this, the inclination is to simply substitute fiber for SCSI cables and build point solutions to ever-increasing amounts of storage. However, FC also represents a new way to look at your data storage in the "open" environment. FC allows a system/storage administrator to build networks of storage nodes, each accessible by any other node attached to the network. These networks of storage go by various names depending on the vendor; the most common are the Storage Area Network (SAN) and the Enterprise Storage Network (ESN). The difference between the two lies in the scope of the implementation: a SAN is a point, or small-scale, solution to an immediate problem, while an ESN is a broader solution covering most, if not all, CPUs within a given environment.

Looking at the name of the technology, it would be easy to conclude that it runs only on fiber optic cabling. In fact, the FC protocol runs over single-mode and multi-mode fiber, as well as various copper implementations, including both Twinax and twisted pair; the spelling difference (Fibre vs. fiber) is intended to reflect this variety of media. The most common implementations are multi-mode fiber for local (under 2 km) runs and single-mode fiber for campus-area (2 km to 10 km) runs. For each media type there are also several interface types. The most common for fiber are the GLM (Gigabit Link Module) and the GBIC (Gigabit Interface Converter). The difference lies in the interface card: the GLM has the fiber interface permanently attached to the card, fixing the type of media the card can use, while the GBIC is a module inserted into the card, allowing different media types on the same interface card.

FC has two basic topologies: loop and fabric. The FC loop topology, or FC Arbitrated Loop (FC/AL), works much like a token-passing ring such as FDDI, although strictly speaking nodes arbitrate for control of the loop rather than passing a token. A node may transmit only while it holds control of the loop; once it relinquishes control, another node can win arbitration and transmit. If a node in a loop fails, or is removed, the entire loop fails and must be re-established. Each loop can have a maximum of 127 devices. Additionally, since control passes from node to node, the total bandwidth is shared among all nodes on the loop. Because a single node failure breaks the loop, true FC/AL loops are rarely constructed. Instead, hubs are used to build loop-topology SAN/ESNs. A hub is basically a self-sealing loop in a box: nodes can come and go, and the hub handles their dynamic removal and addition.

The second topology is the FC fabric topology, or FC Switched Fabric (FC/SW). This topology is based on the connection of nodes to switching devices, much like switched Ethernet. The switch maintains an internal table of the nodes attached to each port and delivers traffic only to those nodes. While bandwidth is shared among nodes attached to the same switch port (e.g., via a loop), each port on the switch is guaranteed its full bandwidth; on an FC/SW switch, every port is a 1 Gbps port. Additionally, many switch vendors are integrating "zoning" features into their switches. With zoning, the switch limits which ports are visible from any single port. For instance, when a CPU is connected to a port on such a switch, the switch's control software can limit the ports visible to that CPU, providing an element of security and network traffic control.
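To make the zoning idea concrete, here is a minimal sketch in C, assuming a hypothetical switch whose zone table is a set of port bitmasks; two ports may exchange traffic only if some zone contains both. This is purely illustrative -- real switches manage zones through vendor control software, and none of these names come from an actual product.

#include <stdio.h>
#include <stdint.h>

/* Hypothetical zone table: one bitmask per zone; bit i set means
 * port i belongs to that zone. */
static const uint16_t zones[] = {
    0x0003,  /* zone 0: ports 0 and 1 (say, a CPU and its array)       */
    0x000C,  /* zone 1: ports 2 and 3                                  */
    0x0101,  /* zone 2: ports 0 and 8 (port 0 also sees a shared array) */
};

/* Two ports are mutually visible only if they share a zone. */
static int can_see(int a, int b)
{
    for (size_t i = 0; i < sizeof zones / sizeof zones[0]; i++)
        if ((zones[i] >> a & 1) && (zones[i] >> b & 1))
            return 1;
    return 0;
}

int main(void)
{
    printf("port 0 -> port 1: %s\n", can_see(0, 1) ? "visible" : "blocked");
    printf("port 0 -> port 2: %s\n", can_see(0, 2) ? "visible" : "blocked");
    return 0;
}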

Before implementing either type of topology, it is important to have a basic understanding of the underlying protocol. The most atomic part of an FC transmission is a frame. A frame is much like a packet in network terminology, with a 28-byte header, up to 2112 bytes of "payload", and an 8-byte trailer. The payload section can carry either 64 bytes of additional routing/network information followed by 2048 bytes of actual data, or all 2112 bytes of "useful" data. Each node is identified by a World Wide Name, or WWN. The WWN is a "unique" identifier for each node on the SAN. Because it identifies a node, the WWN is used in establishing connections and routing for sending and receiving frames (think ARP), as well as by various devices in establishing secure connections between source/destination pairs. However, the "World Wide" uniqueness of this number is not strictly enforced. The number is generated by a variety of means, depending on the vendor and topology, and at this point is only truly guaranteed to be unique within a given SAN.
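A rough sketch of that layout in C may help fix the sizes in mind. The field names here are invented for illustration; the standard actually subdivides the 28 bytes into a 4-byte start-of-frame delimiter plus a 24-byte header, and the 8-byte trailer into a 4-byte CRC plus a 4-byte end-of-frame delimiter.

#include <stdint.h>

/* Illustrative layout of an FC frame, using the sizes given above. */
struct fc_frame {
    uint8_t header[28];     /* SOF delimiter + frame header             */
    uint8_t payload[2112];  /* 64 bytes of optional header + 2048 of    */
                            /* data, or all 2112 bytes of "useful" data */
    uint8_t trailer[8];     /* CRC + EOF delimiter                      */
};

/* A WWN is a 64-bit identifier -- think of it as FC's answer to an
 * Ethernet MAC address. */
typedef uint64_t fc_wwn_t;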

The protocol model is similar to the ISO-OSI 7-layer model that systems/network administrators all know and love. The FC model is a 5-layer model defining layers FC-0 through FC-4. The two models are compared in Figure 1. Note that the OSI model has no layer analogous to FC-3, and FC has no analog of the OSI Application or Session layers.

FC-0 defines the physical layer, describing the physical characteristics of the transmission media, including transmission speed. These speeds range from 133 Mbps to 4.25 Gbps. The most common FC-0 implementation is 1.0625 Gbps (100 MB/sec) multi-mode fiber optic. However, rumors abound regarding the wide availability of 4.25 Gbps fiber within the year.

FC-1 is known as the Encode/Decode layer. This layer handles low-level frame identification and serialization, similar to layer 2 (link) in the OSI model. Data at this layer is encoded with the 8b/10b scheme, in which each 8-bit byte is transmitted as a 10-bit character, allowing the link to detect (and in some cases recover from) transmission errors as they arise.
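The 8b/10b overhead also explains the relationship between the 1.0625 Gbps line rate and the quoted 100 MB/sec: only 8 of every 10 transmitted bits carry data. A quick back-of-the-envelope check (plain arithmetic, nothing FC-specific):

#include <stdio.h>

int main(void)
{
    double line_rate_bps = 1.0625e9;              /* full-speed FC line rate */
    double data_bps = line_rate_bps * 8.0 / 10.0; /* remove 8b/10b overhead  */
    double mbytes_per_sec = data_bps / 8.0 / 1e6; /* bits -> megabytes       */

    /* Prints 106.25 -- the "100 MB/sec" figure once framing overhead
     * is also accounted for. */
    printf("%.2f MB/sec\n", mbytes_per_sec);
    return 0;
}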

FC-2 is where the interesting stuff occurs. At this layer, frames are constructed and frame sequences are managed. Also at this layer, the "service classes" are determined. FC-2 is roughly analogous to pieces of layers 3 (network) and 4 (transport) of the OSI model. Frames are constructed from the data presented by the FC-3 and FC-4 layers: the upper-level (FC-4) protocol to use, the actual data itself, and any network information from the FC-4 layer.

FC-2 breaks the data into chunks according to a specific hierarchy of objects: Frames, Sequences, and Exchanges. Frames, as described above, are the basic building blocks of data movement in FC; they are the objects that actually contain the data being transported. The next level, the Sequence, is a collection of related, sequential frames sent or received in one direction, depending on the intended operation. An Exchange is the collection of sequences that completes an entire operation. For instance, to place data on a storage area somewhere on the SAN/ESN, there would be a sequence to initiate the connection, one to transfer the data, and one to close the connection; all of these together would be termed an Exchange. Within the Exchange, each sequence contains however many frames are required to transfer the data, as determined by an algorithm in the FC-2 layer driver for the particular interface. The Exchange contains frames originating from both the "source" and "destination" nodes, and represents a complete record of the transaction.
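This hierarchy maps naturally onto nested data structures. The following sketch is illustrative only; a real driver tracks these objects by IDs carried in the frame header (the Exchange IDs, OX_ID and RX_ID, are genuine header fields) rather than by literal containment.

#include <stddef.h>

struct fc_frame;                  /* as sketched earlier */

/* A Sequence: an ordered run of related frames flowing in one
 * direction, e.g., the data-transfer portion of a write. */
struct fc_sequence {
    unsigned seq_id;
    struct fc_frame *frames;
    size_t nframes;
};

/* An Exchange: all the sequences, from both originator and responder,
 * that make up one complete operation such as a SCSI write
 * (open, transfer, close). */
struct fc_exchange {
    unsigned ox_id;               /* originator exchange ID */
    unsigned rx_id;               /* responder exchange ID  */
    struct fc_sequence *sequences;
    size_t nsequences;
};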

As stated before, FC-2 is also where the "service class" is determined. FC includes the concept of classes of service between nodes. There are five classes: 1, 2, 3, 4, and 6, of which Classes 1, 2, and 3 are the most commonly used. Class 1 is a direct-connect, or point-to-point, service with guaranteed frame delivery; additionally, each frame is delivered in the order sent. Many vendors use this class for clients who want to be "absolutely, positively" sure that all frames are received in the exact order sent. The direct connection may be a physical point-to-point link, or it may be established through an intermediary device. Once established, the connection uses the same path between the nodes, regardless of cost or bandwidth considerations. Class 1 works like a phone call: dial a number, establish a connection, and keep that connection and route until complete.

Class 2 service provides delivery on the order of TCP. In other words, each frame sent receives an "ACK" frame in return. If an ACK is not received, the frame is retransmitted until the transfer completes (assuming no break in the connection, in which case the entire Exchange can be discarded). The advantage of Class 2 over Class 1 is that a node can talk to many different nodes over the same SAN connection. However, since Class 2 recovers lost frames by retransmission, frame order is not always guaranteed. This is not usually a problem for most applications, as FC-2 takes care of re-assembly prior to presentation to the upper layers. Remember, TCP works very similarly, and no one complains about their data being in the wrong order when moved over an NFS-mounted file system!
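A minimal sketch of the acknowledge-and-retransmit idea behind Class 2 follows. The send and ACK-wait primitives are hypothetical placeholders, not a real FC driver API, and the retry limit is arbitrary.

#include <stdbool.h>

struct fc_frame;   /* as sketched earlier */

/* Hypothetical link primitives -- stand-ins, not a real FC API. */
bool send_frame(const struct fc_frame *f);
bool wait_for_ack(unsigned seq_id, unsigned frame_no, int timeout_ms);

/* Class 2 style delivery: resend a frame until the far end
 * acknowledges it, up to a retry limit.  If retries are exhausted,
 * the whole Exchange is abandoned, as described above. */
static bool deliver_class2(const struct fc_frame *f,
                           unsigned seq_id, unsigned frame_no)
{
    for (int tries = 0; tries < 8; tries++) {  /* retry limit: illustrative */
        if (send_frame(f) && wait_for_ack(seq_id, frame_no, 100))
            return true;                       /* ACK received */
    }
    return false;                              /* give up; discard Exchange */
}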

Where Class 2 provides an analog to TCP, Class 3 is analogous to UDP. With Class 3, delivery of each frame is simply assumed. Class 3, like UDP, depends on reliable connections and upper-level handshaking/error checking to ensure accurate delivery of data. On the surface, this may seem to be the class of service to avoid; in reality it is the most common transport for storage applications. This is because a) FC connections are assumed to be very reliable, and b) the upper-level protocols (SCSI and IP) can, and do, handle a large amount of the error checking. Additionally, Class 3 tends to be faster than Class 2 because it avoids the acknowledgement overhead of the reliable class.

Classes 4 and 6 are not widely used, but they deserve mention. Class 4 provides fractional allocation of connection bandwidth: if you must be positive that a particular connection between nodes receives a particular portion of the available transmission time, you would establish a Class 4 connection between the nodes in question. Class 6 is a uni-directional, one-to-many connection with guaranteed frame and frame-order delivery. This is the class used for applications such as multi-node video streaming and others that are frame-order sensitive. It just wouldn't do to have frames 2, 3, and 4 of your favorite Star Wars movie appear on your monitor after frame 5.
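For quick reference, the classes can be tabulated as comments on an enum. This is an illustrative summary of the descriptions above, not something from a vendor header file.

/* FC service classes as described above (illustrative summary). */
enum fc_service_class {
    FC_CLASS_1 = 1,  /* dedicated connection; guaranteed, in-order delivery */
    FC_CLASS_2 = 2,  /* connectionless, per-frame ACK (TCP-like); order
                        not guaranteed, recovered on reassembly */
    FC_CLASS_3 = 3,  /* connectionless, no ACKs (UDP-like); the common
                        choice for storage -- upper layers catch errors */
    FC_CLASS_4 = 4,  /* fractional: reserves a share of link bandwidth */
    FC_CLASS_6 = 6,  /* uni-directional one-to-many with guaranteed,
                        in-order delivery (e.g., video streaming) */
};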

FC-3 is not yet widely implemented. Its intention is to provide "common services" accessible between FC-2 and FC-4, such as a naming service that maps WWNs to human-readable names, name aliasing, and other services. Some switch vendors currently use FC-3 for local name aliasing, as well as for embedding security features within their switches. FC-4 is where the upper-level protocols fit. This is where the hooks to your standard SCSI driver live, as well as connections to things like IP (see Figure 2). FC-4 is what makes FC special. Since FC allows already established standard protocols to ride on top, the OS does not need to understand yet another method of accessing storage or networks; it simply uses the existing protocols it already understands, and is (generally) unaware that a different connection method is being used. The OS simply thinks it has a really fast disk attached.

Nodes on FC are connected via ports. A port is simply the connection type of the node in question; the port type can change depending on the type of device to which the node is attached. There are six port types: N, NL, F, FL, E, and G. An N port is a plain (N)ode port on a device that does not advertise other FC-connected devices. A simple example: connect two nodes directly together, and each port will configure as an N port. NL ports are node ports connected via a loop topology; all nodes plugged into a hub, or connected together to form a loop, will configure as NL ports. F ports are switch ports that accept connections from N ports only (e.g., a CPU or drive array plugged directly into a switch). FL ports, on the other hand, connect only to NL ports; in this case, the switch is either a "node" on a loop or connected to a hub. E and G ports are specific to switches. An E, or (E)xpansion, port connects two switches together, while a G, or (G)eneric, port can become either an E port or an F port, depending on the equipment attached.
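Summarized as an enum (again, purely illustrative; these identifiers are not from any driver header):

/* The six FC port types described above. */
enum fc_port_type {
    FC_PORT_N,   /* node port: point-to-point, or plugged into an F port   */
    FC_PORT_NL,  /* node port participating in an arbitrated loop          */
    FC_PORT_F,   /* switch port accepting a single N port                  */
    FC_PORT_FL,  /* switch port attaching the switch to a loop of NL ports */
    FC_PORT_E,   /* expansion port: switch-to-switch link                  */
    FC_PORT_G,   /* generic port: becomes E or F per attached equipment    */
};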

Implementation

So, how is this all implemented? Is there a really complex piece of software with which you must manually establish all of these classes of service and connection types for every node? The answer is that most of the protocol negotiation is automatic, based largely on what the node has, what it is plugged into, and what is on the other end, much as in networking. The class of service is really set in the driver for the node's interface card, and generally available support for the various classes is limited. While there are Class 6 implementations for selected applications, the vast majority of card drivers support only Class 3 service. Most hubs and switches "support" Classes 2 and 3, and while this may sound impressive (according to the marketing people), Classes 2 and 3 on a hub or switch are really just pass-throughs: Class 2 error correction is based on ACK/NAK of received frames at the end nodes, and Class 3 provides no frame-level acknowledgement at all! Port type assignment is completely automatic, as described above: Nx ports for point-to-point, hub, and node-to-switch connections, and Fx ports for switching applications.

While the standards described above are set by ANSI, each vendor has its own interpretation. The result is that, at this time, switches from different vendors often cannot interoperate. Be wary of mixing products from different vendors, particularly switch vendors. Port standards are also not fully implemented: many vendors do not have true E or G ports, as the interface protocols have not been fully developed in switch software, and most vendors seem to be concentrating on implementing E ports only between their own switches. Host adapter cards for Sun systems, both SBus- and PCI-based, come primarily from Jaycor and Q-Logic. HP systems use only the HSC and PCI cards offered by HP; third-party FC cards are not currently available for HP systems. To add to all of this, there are two standards for the laser: OFC and non-OFC. OFC (Open Fibre Control) is a mechanism that controls the state of the laser when it detects a broken or powered-off connection; an OFC-compliant laser shuts down when there is no returned signal, while a non-OFC laser runs continuously. The two are not compatible and, if mixed inadvertently, can damage equipment!

OS vendors such as Sun and HP are also marketing their own solutions. While systems from both vendors can be connected to the same switch (depending on the switch vendor), they cannot coexist in an FC/AL environment. Sun in particular has its own proprietary FC/AL offering, in addition to its support for third-party ANSI FC/AL solutions, using the A5x00 drive array series and the native interfaces on the I/O boards of the Enterprise EX000 series. HP, on the other hand, while complying with the standard, does not have "Fabric Login" built into its driver at the time of this publication; this is a necessary component for FC/SW support and will be included in a future version of their FC driver. Vendors on both sides of the storage equation, OS and array vendors alike, are taking the position that only products certified by both sides will be supported in their SAN/ESN configurations. This means that while an OS vendor may certify a product (switch, hub, etc.), the storage vendor may not support the configuration because it has not certified that product. This may lead to some confusion when talking to third-party switch, hub, and card vendors, as they may actually be supplying the OS or storage vendors and feel that their product is certified by both sides.

When implementing a SAN or ESN, it is important to select or develop a model of implementation prior to installing hardware. I have come up with five basic models: Point, Functional, Departmental, Mixed, and Organic. The Point model treats FC as traditional SCSI, establishing direct connections between CPU nodes and storage nodes. It is the simplest to design and implement, but it provides the least flexibility, does not use the capabilities of FC, and does not scale well. It creates "islands" of storage, which are difficult to share using the inherent properties of FC.

One of the major advantages of FC over traditional SCSI is the ability to manage storage across multiple platforms and to "arbitrarily" locate storage when required. For instance, say a Point model is used to connect CPU A to Storage A and CPU B to Storage B. If extra storage is available on Storage A that could fulfill a need on CPU B, it cannot be used, because CPU B is not on the "network" with Storage A. To use Storage A in the Point model, a separate direct connection would have to be made, adding to the number of required interfaces on Storage A. Consider another scenario: both Storage A and B are full and additional storage is needed, but the amount of storage required for CPU A and B combined does not fill a hypothetical disk array for either one.

In the traditional SCSI or Point model, either a single array with interfaces for both CPUs would be added, or two arrays would be purchased; additional interfaces on each CPU would also be required to make the connections. Before long, as more arrays are added, the cabling becomes complex, the number of physical arrays grows, and managing the storage is no better than with traditional SCSI. This is with only two CPUs; as CPUs are added, the complexity of the topology increases dramatically. This setup runs completely against the philosophy of FC: the sharing and management of storage "arbitrarily" located on a network. In short, the Point model should not be used except where it is well known that the number of nodes will be very small, that interaction between nodes is minimal or non-existent, and that the configuration is static.

Flexible Topologies

The remaining models are more interesting and are related. All of the models assume an Enterprise focus, but do not assume an immediate Enterprise implementation. In other words, the model does not have to be implemented all at once in order to work. The key is consistent implementation - once a model is chosen, stick to it as much as possible unless there is a compelling reason to change. All of the models also require a good working knowledge of the Enterprise in which they are implemented. Each model also assumes that the complete ANSI specifications with regard to Class 2 and 3 services and FC/SW are implemented and available for use, including the availability of E ports on the switches. However, in order to implement pieces, only currently available capabilities are required. With that said, here are some definitions that will clarify the models:

Function - A common set of tasks grouped together, such as the Accounting, Sales, Marketing, or Design functions within a given Enterprise.

Department - A Department represents a grouping of functions that produces a given product or group of products within a given Enterprise. This would be similar to a company that competes in several areas and has different business units that handle each area. For example, a tire company may be organized into Truck, Auto, Bicycle, and Airplane business units, along with a central organization. Each unit has its own design, manufacturing, marketing, and other functions.

The Functional model analyzes the storage requirements for each Function within the Enterprise, groups those requirements together, and manages them as a complete unit. For instance, all storage for Accounting would be contained in a physically or logically defined SAN/ESN generally available only to Accounting servers. Interactions with other Functions would go through a "master" or "hub" switching unit that limits access to the "subnet" of Accounting storage. This provides a number of advantages. First, the Function bears sole responsibility for running out of storage, or for having huge storage requirements that other Functions would not necessarily share. The Functional model also assists in troubleshooting problems or finding storage hogs: it is much easier to identify which application is consuming storage when all of the applications are directly related to a common task. In the Accounting example, suppose Accounts Payable is hogging storage by generating large logs, preventing Accounts Receivable from completing their work. It is relatively simple for the system/storage administrator to identify the recalcitrant application, bring the necessary technical people to the table, and work on the common problem.

The Departmental model is very similar to the Functional model, with Departments substituted for Functions. In this model, each Department has a physically/logically defined SAN/ESN containing the data for each function within the Department. For instance, all storage for the Auto tire department (design, marketing, etc.) would have its own group of storage. Other Departments could again reach that storage via a "master" switch, but primary access would be for the Auto group only. This model has the advantage of concentrating all the functions of a Department in a single area and managing the storage between functions within the Department. Under the Functional model, many different Departments may share a Function's storage, creating confusion about who owns what storage and where problems really lie. The Departmental model alleviates this, but raises a new problem: functions common to several Departments must now share data across SAN/ESNs. It also becomes more difficult to identify a storage problem, because disparate applications now utilize the same SAN/ESN.

The Mixed model (see Figure 3) is an amalgam of the Functional and Departmental models. It starts with either model and then adds direct connections (shown in blue in Figure 3) between the various nodes, bypassing the "master" switch. The purpose of the direct connections is to allow direct access by selected other areas without competing for bandwidth through the "master" switch; this is accomplished using the zoning features found on more advanced switches. This model, however, requires as many E ports on each switch as there are connections between the various SAN/ESNs, and E ports are usually limited in quantity, as they tend to consume switch resources.

The Organic model (Figure 4) takes a different view of growth from the three previous models. It starts with a single application, or group of applications, that establishes the SAN/ESN. From this initial group, additional CPU or storage nodes are added to the switch until the switch's capacity is reached; then additional switches are added as necessary. Storage is assigned to CPUs on an as-needed basis from whatever is available at the time. The eventual result is a SAN/ESN that has grown in proportion, and in reaction, to the needs of the Enterprise, carries almost no excess capacity at any point in time, but can add capacity to meet needs. This model is attractive in that it requires almost no analysis and planning of Functional or Departmental groups, and it is very flexible and responsive to major changes in the Enterprise's business model. However, it is difficult to plan ahead for storage, and managing the storage for a particular application or CPU can be complex. Storage efficiencies may also go unrealized: multiple CPUs, Functions, and Departments may share the same storage node, and an attempt to consolidate the storage of any one of them would most likely affect them all. The Organic model is good for smaller organizations, matrix organizations, organizations that shift their structure frequently to meet business needs, or situations of rapid growth. (Credit for this model goes to a former boss, Dave Thomson, who pointed out the basis of this model while we were discussing the others as planning models for a SAN/ESN within our organization. Thanks, Dave!)

FC is an exciting "new" technology for the open systems world. When fully implemented, it will give system administrators more flexibility in managing how and where their end users store data. One of the exciting possibilities is TCP/IP over FC. While vendors such as HP are using TCP/IP over FC for clustering applications, the protocol has not yet been extended to storage. If a modified TCP/IP stack is extended to cover storage as well as data networks, it will give systems administrators true flexibility in building highly available, fault-tolerant networks of information storage and processing. Ideas such as Wide Area Storage Networking, which could allow distributed, directly accessible storage across a state, a continent, or the world, become possible. For now, it is important that systems administrators begin implementing the basic levels of SANs and ESNs, if for no other reason than to improve the speed and reliability of locally attached storage. Implemented correctly, these small starts will lead to major improvements in availability in almost any environment.

References and Resources

The Fibre Channel Association, Fibre Channel - Connection to the Future, 2nd Ed., 1998, LLH Technology Publishing.

Craig Hunt, TCP/IP Network Administration, 2nd Ed., 1998, O'Reilly and Associates.

Fibre Channel Association, FC Reference Card, 1997. http://www.fibrechannel.com/showcase/index_showcase.htm (refcard.pdf)

Storage Networking Industry Association - http://www.snia.org/ (Fibre Channel standards efforts)

About the Author

Steven B. Nelson has been a System Administrator and Engineer, Network Engineer, data center designer and manager, and application developer for a number of companies over the last 12 years. He is currently a Sr. Consultant for Sysix Technologies. He has worked with a number of different operating systems and platforms, primarily Solaris, HP-UX and NT. He can be reached at steve.nelson@sysix.com.