A
ping Utility for Fibre Channel SANs
Bill Pierce
I love ping. I use ping when working with IP networks
more than any other networking utility. After working for a few
years in the storage-networking field, I began to ask myself why
such a basic networking utility was not available for Fibre Channel
(FC) Storage Area Networks (SANs). Was the technology really so
different that the concept of a ping was not meaningful?
Did the underlying protocols make it impossible? Was there no uniform
API to the Host Bus Adapters (HBAs) used to connect a host to storage
over the SAN? When the answer to all these questions became "No,"
I started to write fcping, a ping utility for Fibre
Channel SANs, so systems administrators around the world could use
an old friend in a new environment.
How fcping Works
Once you understand a few of the differences between IP and Fibre
Channel networking, fcping will be as easy to use as ping.
The idea behind a ping command, regardless of the underlying
protocol, is pretty simple -- you can send a message to a remote
node and see how long it takes for the node to answer back. If the
node never answers, ping will timeout and give up. Nodes
on an IP network tend to be peers, where any node can try to ping
any other node. In the case of a SAN, there are SCSI initiators
(the host HBAs) and SCSI targets (disks, RAIDs, and tape libraries)
that are driven by the initiators. fcping allows hosts to
ping storage targets.
One important difference between IP and Fibre Channel networking
is the addressing model. In IP networks, a single-homed node has
an IP address that is assigned directly or indirectly (through DHCP)
by an administrator. Correctly or incorrectly, we view the IP address
as a surrogate name for the host, rather than simply the address
of one of its interfaces. Only rarely do we pay attention to the
MAC address of the network card in the host because Address Resolution
Protocol handles the IP-to-MAC conversion for us.
In a Fabric (Fibre Channel network composed of one or more switches
and edge nodes, e.g., hosts and storage, that use Fabric protocols),
nearly the opposite is true. The addresses you interact with are
the port MAC addresses assigned to the device ports at the factory.
Communication in Fibre Channel is port-to-port rather than node-to-node,
so when you want to communicate with another entity on the SAN,
you specify the Worldwide Port Name (WWPN) with which you want to
communicate. There is also the concept of a Worldwide Node Name
(WWNN) that can be used to identify a Fibre Channel node that has
multiple ports. The WWNN used is often the one assigned at the factory,
and it often bears a striking similarity to the WWPNs, but this
is not a requirement. For example, WWNNs on HBAs within the same
host are often soft-assigned to be equal, so the host can appear
as a single Fibre Channel node with multiple ports. Worldwide node
and port names are usually 8-byte numbers written as hex digits
separated by colons (e.g., 10:00:00:E0:02:21:AE:68). See Figure
1.
Under the covers, the fabric routes Fibre Channel frames across
the fabric using a 3-byte address called the Destination ID (D_ID).
This routing address is assigned dynamically to the edge node when
it connects to the fabric and performs a process called Fabric Login.
A fabric service called Simple Name Server (SNS) maintains the mappings
between worldwide names and D_IDs across the fabric. Login with
SNS also allows targets to advertise their presence on the fabric
and initiators to discover these targets. Because the D_ID of a
device can change depending on when and where it logs into the fabric,
users almost always refer to Fibre Channel nodes and ports by their
Worldwide Names.
The next difference is in the protocol used. Ping uses
the ECHO_REQUEST/REPLY messages over the ICMP protocol. Similar
protocols have been developed for Fibre Channel but were not widely
implemented at the time of writing. Instead, fcping works
by issuing the SCSI-3 (the SCSI developed for serial transports)
primary command INQUIRY across the fabric. Like its IP cousin,
fcping times how long it takes for a response from the SCSI
target device to be received. If there is no response, it will timeout
after the specified time interval before sending the next request.
Because a timestamp is not encoded in or decoded from the INQUIRY
packet, fcping measures response time using system time,
which is inherently less accurate than ping. The payload
of the INQUIRY reply command contains some basic information
about the vendor and model of the target device. As shown below,
fcping will also display this information when a reply is
received.
Building and Installing fcping
The fcping utility is written in C and can be compiled
on Solaris and Win32 systems. Porting it to other flavors of UNIX
should not be difficult, and ports can be expected in future versions.
You can download the source and binary kits from:
http://www.sysadminmag.com/code/
The code in this kit is distributed under the GNU General Public License
and the GNU Lesser General Public License, so fcping is guaranteed
to remain a public, open source tool.
Included in this kit are source and binary forms of the Common
HBA API library V1.0, which can also be obtained from:
http://www.sourceforge.net/projects/hbaapi
This library serves as an adaptor between fcping and the HBA
driver, providing a uniform HBA interface across vendor's hardware.
This API is the result of an unusually productive collaboration among
various HBA and Management Software vendors. Version 1.0 was released
and implemented last year. Version 2.0 is currently being finalized
and HBA vendors should release implementations of it by the end of
2002. The Common HBA API Library is distributed under the Storage
Networking Industry Association (SNIA) Public License, which is included
along with its source code in the kit. fcping also uses another
Open Source library called timeout_action, which allows arbitrary
functions to be executed in a thread that will timeout if the action
takes longer than expected. This library uses the pthreads library
on UNIX, and Win32 threads on Windows.
To build the kit, simply unpack the zip file, cd to the
fcping directory, uncomment the makefile to suit your environment,
and type make (nmake on Win32). The makefile is composed
of four platform-dependent sections that specify the build environment
(you uncomment the one that applies) and one platform-independent
section at the bottom that contains the make rules. Since
make can be persnickety, you may need to run a unix2dos conversion
on the makefile if you intend to build on Windows. I have built
fcping on Linux and Solaris using gcc V2.9x and the Forte
compiler V6, and on Windows using MS Visual C++ 6.0. The build should
take less than half a minute.
Once you have compiled a binary image of fcping, become
a superuser and copy it to the directory of your choice and place
the Common HBA Library, libHBAAPI, in your path to shared libraries
or type make install (nmake install on Win32) to execute
the copy commands in the install.sh (install.bat)
scripts. These simple install scripts are provided for convenience,
but are not currently very sophisticated and may only apply to some
environments. If in doubt, look at the script source.
Next, you must be sure you have a vendor-specific HBA library
installed, which is provided by your HBA vendor. These sometimes
install when you install your HBA device driver and sometimes must
be installed separately. To tell whether the vendor library has
been installed, check the /etc/hba.conf file on UNIX or the
HKEY_LOCAL_MACHINE\SOFTWARE\SNIA\HBA Registry key on Win32.
There you will find the mappings the SNIA common library uses to
find and load the vendor-specific libraries. The most common reason
for fcping to fail is that the vendor library has not been
installed or has been incorrectly registered.
If you do not find what you're looking for, a trip to your
HBA vendor's Web site should do the trick. The HBA API FAQ
at:
http://hbaapi.sourceforge.net/faq.html
contains a table showing which HBAs are supported and has links to
HBA vendor sites where the vendor-specific libraries may be downloaded.
Using fcping to Troubleshoot SANs
The fcping utility has been written to work as much like
ping as possible. To ping a port on a storage device
from the host, simply issue the fcping command followed by
the WWPN of the port. You can get the WWPN of the target device
from the device's management application, from the SNS table
of the fabric switch to which the device is connected, or from a
commercial SAN-management application that has discovered the storage.
For example:
%>fcping 10:00:00:E0:02:21:AE:68
Pinging port 10:00:00:E0:02:21:AE:68, LUN 0 with SCSI Inquiry:
Port 10:00:00:E0:02:21:AE:68 replies in 0.010 s as ADIC FCR-1 Module.
Port 10:00:00:E0:02:21:AE:68 replies in 0.000 s as ADIC FCR-1 Module.
Port 10:00:00:E0:02:21:AE:68 replies in 0.000 s as ADIC FCR-1 Module.
3 successful and 0 unsuccessful pings.
Average ping time: 0.003 s.
fcping will respond by trying to send a SCSI Inquiry command
to LUN (Logical Unit Number) 0 of the specified port 3 times in a
row. It will report whether a response was received, how long it took
to receive that response, and the Vendor ID (ADIC in this case) and
Product ID (FCR-1 Module) the storage device responded with. If the
port does not respond within the specified interval (1 sec by default),
you will get the familiar "Request timed out" message for
each ping that fails. At the end, a tally of the successful
and unsuccessful pings is given along with the average response time
of those pings that were answered within the specified interval. If
all pings were successful, fcping will return an exit code
of EXIT_SUCCESS, as defined in the stdlib.h.
The command options available from fcping are also consistent
with ping's, allowing you to specify the number of pings
to send, the FCP-LUN to ping (if other than LUN 0 is desired) and
the time interval between pings. The -h option will remind
you of the usage syntax and these options:
%>fcping -h
Usage: fcping [OPTION]... WORLD_WIDE_PORT_NAME
WORLD_WIDE_PORT_NAME in upper case hex XX:XX:XX:XX:XX:XX:XX:XX
Options:
-v print the version and exit
-h print usage and exit
-c count number of pings before exiting
-i interval interval between pings in seconds
-l lun FCP LUN to ping in decimal (default is LUN 0)
-q quiet operation, no output to stdout
You have the option to specify which Fibre Channel LUN you want to
ping at the specified port address. To better understand this,
consider how Fibre Channel LUNs are mapped to SCSI disk devices. Each
port on an FC storage device exposes a set of FCP LUNS (typically
in the range of 0-10). Your HBA and its device driver will map these
WWPN/FCP-LUN tuple addresses into more standard SCSI bus-target-lun
addresses (X-Y-Z) that are exposed as device files such as /dev/dsk/cXtYdZs0
to the operating system or Logical Volume Manager.
You can use fcping to check for physical and logical connectivity
between a host initiator and a target device across the SAN. Physical
connectivity is a combination of physical connections between SAN
components and the operational state of devices and links along
the possible physical connection paths. Logical connectivity is
a combination of physical connectivity and the Host Affinity, Zoning,
and LUN Masking rules that have been applied to your storage, fabric,
and HBA, respectively. These zoning and masking facilities are used
on a SAN when you want to restrict communication between certain
hosts and target LUNs. This might be done to prevent different hosts
from interfering with each other's LUNs or for security reasons.
fcping is a great tool for verifying the visibility, or lack
thereof, that a given HBA port has of a particular WWPN/FCP-LUN
once zoning/masking rules have been applied.
You can also use fcping to periodically check connectivity
between a host and a target across a fabric and the operational
state of the storage device. Because fcping acts just like
ping, it should be easy to turn your IP network and host
monitoring scripts (e.g., see Randal Schwartz's "Doing
Many Things, Like pings" Sys Admin, May 2002: http://www.samag.com/documents/s=7121/sam0205g/sam0205g.htm)
into SAN monitoring scripts.
fcping can be used as a crude monitor of SAN performance.
You can check how significantly your ping times increase when your
SAN is heavily loaded, such as during the weekly backup. Pinging
different storage devices from different hosts could give you an
idea of which routes are most heavily loaded during these times.
Another use for fcping is to measure how long it takes your
SAN to recover from failure of a critical but redundant link. Such
a failure should cause a wave of Registered State Change Notifications
(RSCNs) that propagate across your fabric, causing new routes to
be established. By consecutively pinging at some interval and watching
the response times, you should be able to determine how long it
takes your fabric to re-establish the connection and for ping
latency to return to normal. The many uses of such a simple tool
convinced me that ping had to be brought to storage networking.
Conclusion and Dedication
This article describes V1.0 of fcping. My goal was to create
a simple tool that would be immediately useful. Features to look
for in future versions include:
1. More statistics on pings collected (MIN, MAX, STDEV).
2. Support for other UNIX flavors.
3. Support for V2.0 of the HBA API (currently only V1.0 is supported).
4. Ability to specify which HBA/Port the ping should go out through
(currently the first adaptor and port found are used).
5. More standard ping features.
In the spirit of many of the IP network troubleshooting tools
we know and love (e.g., traceroute, netstat), fcping
has come into, and will remain in, the public domain. With time,
Fibre Channel equivalents of these other old friends may come too.
fcping is dedicated to the memory of Mike Muuss, the author
of ping. Shipping with most operating systems, ping
is one of the most widely distributed programs in the world.
Bill Pierce is another physicist turned software engineer.
He pioneered the use of the Web for geophysical science applications
and cut his teeth as a systems administrator at Northwest Research
Associates in Microsoft's backyard, when Internet Explorer
was a gleam in another Bill's eye. He started developing SAN
management software in 1998 at Vixel Corp. and is currently an architect
on the SANView team at Fujitsu Softek. Bill can be contacted at:
systems_r_up@yahoo.com.
|