Cover V11, I09

A ping Utility for Fibre Channel SANs

Bill Pierce

I love ping. I use ping when working with IP networks more than any other networking utility. After working for a few years in the storage-networking field, I began to ask myself why such a basic networking utility was not available for Fibre Channel (FC) Storage Area Networks (SANs). Was the technology really so different that the concept of a ping was not meaningful? Did the underlying protocols make it impossible? Was there no uniform API to the Host Bus Adapters (HBAs) used to connect a host to storage over the SAN? When the answer to all these questions became "No," I started to write fcping, a ping utility for Fibre Channel SANs, so systems administrators around the world could use an old friend in a new environment.

How fcping Works

Once you understand a few of the differences between IP and Fibre Channel networking, fcping will be as easy to use as ping. The idea behind a ping command, regardless of the underlying protocol, is pretty simple: you send a message to a remote node and see how long it takes for the node to answer back. If the node never answers, ping will time out and give up. Nodes on an IP network tend to be peers, where any node can try to ping any other node. In the case of a SAN, there are SCSI initiators (the host HBAs) and SCSI targets (disks, RAIDs, and tape libraries) that are driven by the initiators. fcping allows hosts to ping storage targets.

One important difference between IP and Fibre Channel networking is the addressing model. In IP networks, a single-homed node has an IP address that is assigned directly or indirectly (through DHCP) by an administrator. Correctly or incorrectly, we view the IP address as a surrogate name for the host, rather than simply the address of one of its interfaces. Only rarely do we pay attention to the MAC address of the network card in the host because Address Resolution Protocol handles the IP-to-MAC conversion for us.

In a Fabric (a Fibre Channel network composed of one or more switches and edge nodes, e.g., hosts and storage, that use Fabric protocols), nearly the opposite is true. The addresses you interact with are the MAC-like port addresses assigned to the device ports at the factory. Communication in Fibre Channel is port-to-port rather than node-to-node, so when you want to communicate with another entity on the SAN, you specify the Worldwide Port Name (WWPN) with which you want to communicate. There is also the concept of a Worldwide Node Name (WWNN) that can be used to identify a Fibre Channel node that has multiple ports. The WWNN used is often the one assigned at the factory, and it often bears a striking similarity to the WWPNs, but this is not a requirement. For example, WWNNs on HBAs within the same host are often soft-assigned to be equal, so the host can appear as a single Fibre Channel node with multiple ports. Worldwide node and port names are usually 8-byte numbers written as pairs of hex digits separated by colons (e.g., 10:00:00:E0:02:21:AE:68). See Figure 1.
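The worldwide name notation above can be made concrete with a small sketch. It is written in Python for brevity (fcping itself is C), and the function names parse_wwn and format_wwn are illustrative, not part of fcping or the HBA API.

```python
import re

# Eight pairs of hex digits separated by colons, e.g. 10:00:00:E0:02:21:AE:68
WWN_PATTERN = re.compile(r"[0-9A-Fa-f]{2}(:[0-9A-Fa-f]{2}){7}")


def parse_wwn(text):
    """Convert 'XX:XX:XX:XX:XX:XX:XX:XX' into the 8 raw bytes it denotes."""
    if not WWN_PATTERN.fullmatch(text):
        raise ValueError(f"not a valid worldwide name: {text!r}")
    return bytes(int(pair, 16) for pair in text.split(":"))


def format_wwn(raw):
    """Convert 8 raw bytes back into upper-case colon-separated hex."""
    if len(raw) != 8:
        raise ValueError("a worldwide name is exactly 8 bytes")
    return ":".join(f"{byte:02X}" for byte in raw)
```

Validating the string before use is worthwhile because a mistyped WWPN would otherwise look exactly like an unreachable port.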

Under the covers, the fabric routes Fibre Channel frames across the fabric using a 3-byte address called the Destination ID (D_ID). This routing address is assigned dynamically to the edge node when it connects to the fabric and performs a process called Fabric Login. A fabric service called Simple Name Server (SNS) maintains the mappings between worldwide names and D_IDs across the fabric. Login with SNS also allows targets to advertise their presence on the fabric and initiators to discover these targets. Because the D_ID of a device can change depending on when and where it logs into the fabric, users almost always refer to Fibre Channel nodes and ports by their Worldwide Names.
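As a rough illustration of the relationship between worldwide names and fabric addresses, the sketch below models a name-server table as a dictionary and splits a 3-byte address into the domain, area, and port bytes by which fabric addresses are conventionally read. The D_ID value 0x010200 is invented for the example, and resolve and split_d_id are hypothetical helpers, not a real name-server interface.

```python
# Toy name-server table: WWPN -> 24-bit fabric address.
# The address below is made up for illustration only.
NAME_SERVER = {
    "10:00:00:E0:02:21:AE:68": 0x010200,
}


def resolve(wwpn):
    """Return the fabric address registered for a WWPN, or None if unknown."""
    return NAME_SERVER.get(wwpn.upper())


def split_d_id(d_id):
    """Split a 24-bit fabric address into (domain, area, port) bytes."""
    if not 0 <= d_id <= 0xFFFFFF:
        raise ValueError("a fabric address is a 3-byte value")
    return (d_id >> 16) & 0xFF, (d_id >> 8) & 0xFF, d_id & 0xFF
```

Because the table entry can change whenever the device logs in again, scripts and people alike are better off keying on the worldwide name.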

The next difference is in the protocol used. ping uses ICMP ECHO_REQUEST and ECHO_REPLY messages. Similar protocols have been developed for Fibre Channel but were not widely implemented at the time of writing. Instead, fcping works by issuing the SCSI-3 primary command INQUIRY across the fabric (SCSI-3 is the generation of SCSI developed for serial transports). Like its IP cousin, fcping times how long it takes to receive a response from the SCSI target device. If there is no response within the specified time interval, it times out and sends the next request. Because no timestamp is encoded in or decoded from the INQUIRY packet, fcping measures response time using system time, which is inherently less accurate than ping's timing. The payload of the INQUIRY reply contains some basic information about the vendor and model of the target device. As the sample output in the troubleshooting section shows, fcping also displays this information when a reply is received.
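The INQUIRY-based ping can be sketched as follows. This is an illustration in Python, not fcping's C source: send_inquiry is an assumed callable standing in for whatever transport actually delivers the command, and SAMPLE_REPLY is a hand-built 36-byte reply (its revision field "0001" is invented). The opcode 0x12 and the vendor and product field offsets follow the standard INQUIRY data layout.

```python
import time

INQUIRY_OPCODE = 0x12          # SCSI INQUIRY operation code
STANDARD_INQUIRY_LEN = 36      # length of the standard INQUIRY data


def build_inquiry_cdb(allocation_length=STANDARD_INQUIRY_LEN):
    """Build a 6-byte INQUIRY command block requesting standard data."""
    if not 0 <= allocation_length <= 0xFF:
        raise ValueError("allocation length must fit in one byte here")
    return bytes([INQUIRY_OPCODE, 0, 0, 0, allocation_length, 0])


def parse_inquiry(data):
    """Extract (vendor, product) from standard INQUIRY data."""
    if len(data) < STANDARD_INQUIRY_LEN:
        raise ValueError("short INQUIRY reply")
    vendor = data[8:16].decode("ascii", "replace").strip()    # bytes 8-15
    product = data[16:32].decode("ascii", "replace").strip()  # bytes 16-31
    return vendor, product


def timed_ping(send_inquiry):
    """Time one INQUIRY round trip using the local clock.

    send_inquiry is a hypothetical transport: it takes the command
    bytes and returns the reply bytes.
    """
    start = time.monotonic()
    reply = send_inquiry(build_inquiry_cdb())
    elapsed = time.monotonic() - start
    return elapsed, parse_inquiry(reply)


# Hand-built reply for experimentation: 8 header bytes, 8-byte vendor,
# 16-byte product, 4-byte revision (revision value is invented).
SAMPLE_REPLY = bytes(8) + b"ADIC    " + b"FCR-1 Module    " + b"0001"
```

Note that the elapsed time is measured entirely on the host, which is why it includes driver and scheduling overhead and is coarser than a timestamp carried in the packet would be.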

Building and Installing fcping

The fcping utility is written in C and can be compiled on Solaris and Win32 systems. Porting it to other flavors of UNIX should not be difficult, and ports can be expected in future versions. You can download the source and binary kits from:

http://www.sysadminmag.com/code/

The code in this kit is distributed under the GNU General Public License and the GNU Lesser General Public License, so fcping is guaranteed to remain a public, open source tool.

Included in this kit are source and binary forms of the Common HBA API library V1.0, which can also be obtained from:

http://www.sourceforge.net/projects/hbaapi

This library serves as an adaptor between fcping and the HBA driver, providing a uniform HBA interface across vendors' hardware. This API is the result of an unusually productive collaboration among various HBA and Management Software vendors. Version 1.0 was released and implemented last year. Version 2.0 is currently being finalized, and HBA vendors should release implementations of it by the end of 2002. The Common HBA API Library is distributed under the Storage Networking Industry Association (SNIA) Public License, which is included along with its source code in the kit. fcping also uses another Open Source library called timeout_action, which allows arbitrary functions to be executed in a thread that will time out if the action takes longer than expected. This library uses the pthreads library on UNIX, and Win32 threads on Windows.
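The idea of running an action in a thread and giving up on it after a deadline can be sketched in a few lines. This Python version only illustrates the technique; it is not the timeout_action API, and run_with_timeout and slow_action are names chosen for the example.

```python
import threading
import time


def run_with_timeout(action, timeout_s, *args):
    """Run action(*args) in a worker thread; stop waiting after timeout_s.

    Returns ("ok", value), ("error", exception), or ("timeout", None).
    """
    outcome = {}

    def worker():
        try:
            outcome["value"] = action(*args)
        except Exception as exc:  # report failures to the caller
            outcome["error"] = exc

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout_s)        # wait at most timeout_s seconds
    if thread.is_alive():
        return ("timeout", None)  # the worker is abandoned, not killed
    if "error" in outcome:
        return ("error", outcome["error"])
    return ("ok", outcome["value"])


def slow_action(seconds):
    """Stand-in for a request to a target that answers late."""
    time.sleep(seconds)
    return "late reply"
```

The caller regains control at the deadline even though the abandoned worker may still be blocked, which is the behavior a ping utility needs when a target never answers.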

To build the kit, simply unpack the zip file, cd to the fcping directory, uncomment the section of the makefile that matches your environment, and type make (nmake on Win32). The makefile is composed of four platform-dependent sections that specify the build environment (you uncomment the one that applies) and one platform-independent section at the bottom that contains the make rules. Because make can be persnickety, you may need to run a unix2dos conversion on the makefile if you intend to build on Windows. I have built fcping on Linux and Solaris using gcc V2.9x and the Forte compiler V6, and on Windows using MS Visual C++ 6.0. The build should take less than half a minute.

Once you have compiled a binary image of fcping, become superuser. Then either copy the binary to the directory of your choice and place the Common HBA Library, libHBAAPI, in your shared library path, or type make install (nmake install on Win32) to execute the copy commands in the install.sh (install.bat) script. These simple install scripts are provided for convenience, but they are not currently very sophisticated and may only apply to some environments. If in doubt, look at the script source.

Next, make sure you have the vendor-specific HBA library installed; it is provided by your HBA vendor. This library is sometimes installed along with your HBA device driver and sometimes must be installed separately. To tell whether the vendor library has been installed, check the /etc/hba.conf file on UNIX or the HKEY_LOCAL_MACHINE\SOFTWARE\SNIA\HBA Registry key on Win32. There you will find the mappings the SNIA common library uses to find and load the vendor-specific libraries. The most common reason for fcping to fail is that the vendor library has not been installed or has been incorrectly registered.

If you do not find what you're looking for, a trip to your HBA vendor's Web site should do the trick. The HBA API FAQ at:

http://hbaapi.sourceforge.net/faq.html
contains a table showing which HBAs are supported and has links to HBA vendor sites where the vendor-specific libraries may be downloaded.
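The /etc/hba.conf check described above can be automated with a short script. The sketch below assumes the file holds one registration per line, a library name followed by the path to the vendor library, with # starting a comment; verify that against the file on your own system. The entry in SAMPLE_CONF is invented.

```python
# Invented example content; real entries come from your HBA vendor.
SAMPLE_CONF = """\
# name                 library path
com.example.hba        /usr/lib/libexamplehba.so
"""


def parse_hba_conf(text):
    """Return a list of (name, library_path) pairs found in hba.conf text."""
    entries = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments and blanks
        if not line:
            continue
        fields = line.split()
        if len(fields) >= 2:
            entries.append((fields[0], fields[1]))
    return entries


def vendor_library_registered(text):
    """True when at least one vendor library is registered."""
    return len(parse_hba_conf(text)) > 0
```

On a real host you would pass in the contents of /etc/hba.conf and then confirm that each listed path actually exists, since a stale registration fails the same way as a missing one.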

Using fcping to Troubleshoot SANs

The fcping utility has been written to work as much like ping as possible. To ping a port on a storage device from the host, simply issue the fcping command followed by the WWPN of the port. You can get the WWPN of the target device from the device's management application, from the SNS table of the fabric switch to which the device is connected, or from a commercial SAN-management application that has discovered the storage. For example:

%>fcping 10:00:00:E0:02:21:AE:68

Pinging port 10:00:00:E0:02:21:AE:68, LUN 0 with SCSI Inquiry:

Port 10:00:00:E0:02:21:AE:68 replies in 0.010 s as ADIC  FCR-1 Module.
Port 10:00:00:E0:02:21:AE:68 replies in 0.000 s as ADIC  FCR-1 Module.
Port 10:00:00:E0:02:21:AE:68 replies in 0.000 s as ADIC  FCR-1 Module.

3 successful and 0 unsuccessful pings.
Average ping time: 0.003 s.

fcping will respond by trying to send a SCSI Inquiry command to LUN (Logical Unit Number) 0 of the specified port 3 times in a row. It will report whether a response was received, how long it took to receive that response, and the Vendor ID (ADIC in this case) and Product ID (FCR-1 Module) the storage device responded with. If the port does not respond within the specified interval (1 sec by default), you will get the familiar "Request timed out" message for each ping that fails. At the end, a tally of the successful and unsuccessful pings is given along with the average response time of those pings that were answered within the specified interval. If all pings were successful, fcping will return an exit code of EXIT_SUCCESS, as defined in stdlib.h.
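For scripts that need the numbers rather than the text, the report format shown above is easy to parse. The following Python sketch works only from the sample output in this article; the regular expressions are assumptions about that format and may need adjusting for other versions.

```python
import re

SAMPLE_OUTPUT = """\
Pinging port 10:00:00:E0:02:21:AE:68, LUN 0 with SCSI Inquiry:

Port 10:00:00:E0:02:21:AE:68 replies in 0.010 s as ADIC  FCR-1 Module.
Port 10:00:00:E0:02:21:AE:68 replies in 0.000 s as ADIC  FCR-1 Module.
Port 10:00:00:E0:02:21:AE:68 replies in 0.000 s as ADIC  FCR-1 Module.

3 successful and 0 unsuccessful pings.
Average ping time: 0.003 s.
"""

REPLY_RE = re.compile(r"^Port (\S+) replies in ([\d.]+) s as (.+)\.$", re.M)
TALLY_RE = re.compile(r"^(\d+) successful and (\d+) unsuccessful pings\.", re.M)
AVERAGE_RE = re.compile(r"^Average ping time: ([\d.]+) s\.", re.M)


def parse_fcping_output(text):
    """Pull reply times, the tally, and the average out of fcping's report."""
    tally = TALLY_RE.search(text)
    average = AVERAGE_RE.search(text)
    return {
        "times": [float(m.group(2)) for m in REPLY_RE.finditer(text)],
        "successful": int(tally.group(1)) if tally else None,
        "unsuccessful": int(tally.group(2)) if tally else None,
        "average": float(average.group(1)) if average else None,
    }
```

When only a yes-or-no answer is needed, checking the exit code is simpler and less fragile than parsing text.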

The command options available from fcping are also consistent with ping's, allowing you to specify the number of pings to send, the FCP-LUN to ping (if other than LUN 0 is desired) and the time interval between pings. The -h option will remind you of the usage syntax and these options:

%>fcping -h
Usage: fcping [OPTION]... WORLD_WIDE_PORT_NAME
WORLD_WIDE_PORT_NAME in upper case hex XX:XX:XX:XX:XX:XX:XX:XX

Options:
 -v           print the version and exit
 -h           print usage and exit
 -c count     number of pings before exiting
 -i interval  interval between pings in seconds
 -l lun       FCP LUN to ping in decimal (default is LUN 0)
 -q           quiet operation, no output to stdout

You have the option to specify which Fibre Channel LUN you want to ping at the specified port address. To better understand this, consider how Fibre Channel LUNs are mapped to SCSI disk devices. Each port on an FC storage device exposes a set of FCP LUNs (typically in the range of 0-10). Your HBA and its device driver will map these WWPN/FCP-LUN tuple addresses into more standard SCSI bus-target-lun addresses (X-Y-Z) that are exposed as device files such as /dev/dsk/cXtYdZs0 to the operating system or Logical Volume Manager.
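The mapping from a WWPN/FCP-LUN tuple to a device file can be pictured as a lookup table maintained by the HBA driver. The sketch below is purely illustrative: the binding values are invented, and real drivers keep this table in their own configuration rather than in anything resembling this dictionary.

```python
# Hypothetical binding table: (WWPN, FCP LUN) -> (bus, target, lun).
BINDINGS = {
    ("10:00:00:E0:02:21:AE:68", 0): (2, 5, 0),
    ("10:00:00:E0:02:21:AE:68", 1): (2, 5, 1),
}


def device_path(wwpn, fcp_lun, slice_number=0):
    """Return the /dev/dsk/cXtYdZsN path bound to a WWPN/FCP-LUN, or None."""
    binding = BINDINGS.get((wwpn.upper(), fcp_lun))
    if binding is None:
        return None
    bus, target, lun = binding
    return f"/dev/dsk/c{bus}t{target}d{lun}s{slice_number}"
```

Keeping the two address forms side by side helps when a device file stops responding and you need to know which port and LUN to ping.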

You can use fcping to check for physical and logical connectivity between a host initiator and a target device across the SAN. Physical connectivity is a combination of physical connections between SAN components and the operational state of devices and links along the possible physical connection paths. Logical connectivity is a combination of physical connectivity and the Host Affinity, Zoning, and LUN Masking rules that have been applied to your storage, fabric, and HBA, respectively. These zoning and masking facilities are used on a SAN when you want to restrict communication between certain hosts and target LUNs. This might be done to prevent different hosts from interfering with each other's LUNs or for security reasons. fcping is a great tool for verifying the visibility, or lack thereof, that a given HBA port has of a particular WWPN/FCP-LUN once zoning/masking rules have been applied.

You can also use fcping to periodically check connectivity between a host and a target across a fabric and the operational state of the storage device. Because fcping acts just like ping, it should be easy to turn your IP network and host monitoring scripts into SAN monitoring scripts (see, for example, Randal Schwartz's "Doing Many Things, Like pings," Sys Admin, May 2002: http://www.samag.com/documents/s=7121/sam0205g/sam0205g.htm).
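A monitoring script along these lines might wrap fcping as sketched below. The options used (-q, -c, -i, -l) are the ones listed in the usage text earlier; the sketch assumes fcping is on the PATH and that any exit code other than zero means at least one ping failed, which the article implies but does not spell out. The function names are illustrative.

```python
import subprocess


def build_fcping_command(wwpn, count=3, interval=None, lun=None, quiet=True):
    """Assemble an fcping command line; options precede the WWPN."""
    command = ["fcping"]
    if quiet:
        command.append("-q")                 # no output to stdout
    if count is not None:
        command += ["-c", str(count)]        # number of pings
    if interval is not None:
        command += ["-i", str(interval)]     # seconds between pings
    if lun is not None:
        command += ["-l", str(lun)]          # FCP LUN in decimal
    command.append(wwpn)
    return command


def target_is_reachable(wwpn, run=subprocess.run, **options):
    """Run fcping and report whether it exited with success (code 0)."""
    completed = run(build_fcping_command(wwpn, **options))
    return completed.returncode == 0
```

The run parameter exists so the check can be exercised without an HBA, for example by passing a function that returns a canned subprocess.CompletedProcess.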

fcping can be used as a crude monitor of SAN performance. You can check how significantly your ping times increase when your SAN is heavily loaded, such as during the weekly backup. Pinging different storage devices from different hosts could give you an idea of which routes are most heavily loaded during these times. Another use for fcping is to measure how long it takes your SAN to recover from failure of a critical but redundant link. Such a failure should cause a wave of Registered State Change Notifications (RSCNs) that propagate across your fabric, causing new routes to be established. By consecutively pinging at some interval and watching the response times, you should be able to determine how long it takes your fabric to re-establish the connection and for ping latency to return to normal. The many uses of such a simple tool convinced me that ping had to be brought to storage networking.
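The recovery measurement described above reduces to finding the span between the first degraded ping and the first normal one that follows it. Here is a minimal Python sketch, assuming you have already collected (timestamp, latency) samples, with None recorded for a timed-out ping; the threshold for "normal" latency is yours to choose.

```python
def recovery_time(samples, normal_latency_s):
    """Seconds from the first degraded sample to the next normal one.

    samples: time-ordered list of (timestamp_s, latency_s), where
    latency_s is None for a ping that timed out. Returns None when no
    disruption is seen or when latency never returns to normal.
    """
    disruption_start = None
    for timestamp, latency in samples:
        degraded = latency is None or latency > normal_latency_s
        if degraded and disruption_start is None:
            disruption_start = timestamp          # disruption begins
        elif not degraded and disruption_start is not None:
            return timestamp - disruption_start   # back to normal
    return None
```

The resolution of the result is limited by the interval between pings, so a shorter interval gives a tighter estimate at the cost of more traffic to the target.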

Conclusion and Dedication

This article describes V1.0 of fcping. My goal was to create a simple tool that would be immediately useful. Features to look for in future versions include:

1. More statistics on pings collected (MIN, MAX, STDEV).

2. Support for other UNIX flavors.

3. Support for V2.0 of the HBA API (currently only V1.0 is supported).

4. Ability to specify which HBA/Port the ping should go out through (currently the first adaptor and port found are used).

5. More standard ping features.

In the spirit of many of the IP network troubleshooting tools we know and love (e.g., traceroute, netstat), fcping has come into, and will remain in, the public domain. With time, Fibre Channel equivalents of these other old friends may come too. fcping is dedicated to the memory of Mike Muuss, the author of ping. Shipping with most operating systems, ping is one of the most widely distributed programs in the world.

Bill Pierce is another physicist turned software engineer. He pioneered the use of the Web for geophysical science applications and cut his teeth as a systems administrator at Northwest Research Associates in Microsoft's backyard, when Internet Explorer was a gleam in another Bill's eye. He started developing SAN management software in 1998 at Vixel Corp. and is currently an architect on the SANView team at Fujitsu Softek. Bill can be contacted at: systems_r_up@yahoo.com.