Application Level ping: alp
The ability of users to access systems and the applications that reside on those systems is a fundamental concern in any organization, and ensuring such access is the most basic task of all systems administrators. The concern is great enough, in fact, that many of the more sophisticated organizations have established service level agreements (SLAs) under which a specified level of access is guaranteed by the IT staff as part of the departmental charge-back arrangement.
What can you, as a systems administrator, do to address user concerns about system uptime? One approach is to utilize SNMP to log which hosts are up. This requires SNMP monitoring to be installed, which is expensive, complex, and can be inefficient. A less expensive solution is to set up a computer as a logging unit that uses the ping command to query each significant host and network device at a regular interval (e.g., every 10 minutes). This will allow you to demonstrate the uptime of your hardware and linkages.
Such a system-response arrangement is simplistic, however, in that it only shows half of the picture. What about the user who cannot access email because the IMAP server crashed? To that user, the system is still down, and your demonstrated uptime appears suspect.
One approach to monitoring applications is to run a script on the server to check the results of the ps command every 10 minutes or so. But what if you do not have control over that host? The answer is a single-sided socket client that tries to connect with the server application on the host. This amounts to an application level ping, or alp. While ping works at the lower layers of the OSI Reference Model, testing basic network connectivity, alp works at the upper layers, testing the ability to connect with applications.
The November issue of Sys Admin contained an article entitled "The Art and Detection of Port Scanning" by Arthur Donkers. In it, the nmap program is described as a tool with which hackers can determine which ports are active and thus concentrate attacks on known bugs. That is essentially the theory here, but this is a legitimate use of that information. alp was written as a straightforward Perl program to demonstrate real-world socket programming in Perl and to monitor which services are up. Either tool can be useful to either hackers or system administrators. Use of these types of programs involves the classic trade-off between security and information.
To understand how alp works, one must understand a bit about UNIX network programming, especially the client-server model and socket programming. Usually client-server programs are built in two halves. The server component typically sits on a UNIX host or NT server. The client portion often but not always runs on PC clones or Macintoshes. The two halves communicate over the network using sockets. This style of programming allows us to write one server program with a well-defined application programming interface (API). We can write a Mac client or a PC client. From the server's perspective, it doesn't matter as long as both clients correctly use the API defined by the server.
One way to implement a client-server application is to cause the server application to create a socket and listen for clients attempting to connect to that socket. Each listening socket is identified by a port number. Some port numbers are universally defined (the well-known ports), while others depend on a particular operating environment. On most UNIX systems, the mapping of service names to port numbers is listed in /etc/services.
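To make the /etc/services format concrete, here is a minimal sketch in Python (not the Perl used for alp) that reads entries of the usual "name port/protocol aliases" form. The function name parse_services and the sample entries are my own illustrations; a real file on your system will be much longer.

```python
def parse_services(text):
    """Map (service name, protocol) -> port from /etc/services-style text."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()      # drop comments and blanks
        if not line:
            continue
        fields = line.split()
        if len(fields) < 2 or "/" not in fields[1]:
            continue                              # not a "port/protocol" entry
        port, proto = fields[1].split("/", 1)
        if not port.isdigit():
            continue
        # The first field is the official name; any further fields are aliases.
        for name in [fields[0]] + fields[2:]:
            table.setdefault((name, proto), int(port))
    return table


# Illustrative sample in the standard format (assumed, not copied from any host).
SAMPLE = """\
ftp     21/tcp
smtp    25/tcp   mail     # Simple Mail Transfer Protocol
domain  53/tcp
domain  53/udp
"""

services = parse_services(SAMPLE)
```

With this table, services[("smtp", "tcp")] gives the port number to hand to a probe such as alp, which takes the port as a number rather than a name.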
Sockets come in many flavors. The two most common in the IP world are Transmission Control Protocol (TCP) sockets and User Datagram Protocol (UDP) sockets. Our discussions will only cover these types. TCP is connection-based and is reliable; in other words, its header contains enough information to ensure delivery of the segment. UDP, on the other hand, is connectionless and unreliable. Why would anyone use a network communication protocol that is unreliable? Let's examine IP, TCP, and UDP a little more closely.
As you can see from Figure 1, an IP header contains at least 20 bytes of information. This includes a source IP address, a destination IP address, and a protocol. The protocol is usually TCP or UDP, but there are several other choices. In UNIX, the protocols are usually listed in /etc/protocols along with their numeric representation. This information, coupled with the other housekeeping information, is enough to route IP datagrams and verify that the header of a delivered datagram is valid. However, there is no acknowledgement that the datagram was received at the destination. Verification is left up to higher layers.
Figure 2 shows the IP header in outline and the TCP header in more detail. While the IP header contains the information necessary to get the packet to its destination, the TCP header contains the information necessary to assemble the segments into a cohesive, correctly sequenced whole. It also contains the controls necessary to acknowledge the receipt of a segment. Additional bookkeeping functionality is built into the header. Common TCP applications include ftp, telnet, and rlogin.
Figure 3 shows the much smaller UDP header, which gives up TCP's guarantees in exchange for faster processing. It includes none of the sequencing or acknowledgement information found in the TCP header. Why? Because if a layer above the UDP layer must intelligently examine the data in the packet anyway, why bother checking it at a lower layer too? This also makes UDP stateless: no connection is set up or tracked. Some UDP applications are TFTP, BOOTP, and SNMP. DNS can use both UDP and TCP, but usually it uses UDP.
Although NFS is an RPC application, it frequently runs on top of UDP. Because the protocol is stateless, if you have an NFS-mounted filesystem and your access to the server goes down and then comes back up, the NFS client should not notice the interruption. If you are running NFS over a wide area network (WAN), you may find better performance by using TCP for NFS traffic since there is more transmission reliability. Figure 4 summarizes the relationship between the various discussed headers and the nomenclature used to refer to the networking elements.
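The fixed 20-byte layout can be decoded in a few lines. The following is a sketch in Python, assuming a header without options; the function name, the dictionary keys, and the hand-built sample header (with addresses from the 192.0.2.0/24 documentation range and a checksum left at zero) are illustrations of mine, not part of alp.

```python
import socket
import struct

# Fixed 20-byte IPv4 header (no options), in network byte order:
# version/IHL, TOS, total length, ID, flags/fragment, TTL, protocol,
# header checksum, source address, destination address.
IPV4_HEADER = struct.Struct("!BBHHHBBH4s4s")


def parse_ipv4_header(raw):
    """Decode the fixed part of an IPv4 header into a dict."""
    (ver_ihl, _tos, total_len, _ident, _frag,
     ttl, protocol, _checksum, src, dst) = IPV4_HEADER.unpack(raw[:20])
    return {
        "version": ver_ihl >> 4,
        "header_bytes": (ver_ihl & 0x0F) * 4,   # IHL counts 32-bit words
        "total_bytes": total_len,
        "ttl": ttl,
        "protocol": protocol,                    # 6 = TCP, 17 = UDP
        "source": socket.inet_ntoa(src),
        "destination": socket.inet_ntoa(dst),
    }


# Hand-built sample: 192.0.2.1 -> 192.0.2.2 carrying TCP (checksum not computed).
SAMPLE = IPV4_HEADER.pack(0x45, 0, 40, 1, 0, 64, 6, 0,
                          socket.inet_aton("192.0.2.1"),
                          socket.inet_aton("192.0.2.2"))
header = parse_ipv4_header(SAMPLE)
```

Note that nothing in these fields acknowledges receipt; the protocol number simply tells the receiving host which higher layer gets the payload.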
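As a rough illustration of the sequencing and acknowledgement fields, this Python sketch decodes the fixed 20 bytes of a TCP header. The names and the sample SYN segment (port 1025 to port 25) are assumptions made for the example, and the checksum is left at zero.

```python
import struct

# Fixed 20-byte TCP header: source port, destination port, sequence number,
# acknowledgement number, data offset + flags, window, checksum, urgent pointer.
TCP_HEADER = struct.Struct("!HHIIHHHH")
FLAG_NAMES = {0x01: "FIN", 0x02: "SYN", 0x04: "RST",
              0x08: "PSH", 0x10: "ACK", 0x20: "URG"}


def parse_tcp_header(raw):
    """Decode the fixed part of a TCP header into a dict."""
    (src_port, dst_port, seq, ack,
     offset_flags, window, _checksum, _urgent) = TCP_HEADER.unpack(raw[:20])
    return {
        "source_port": src_port,
        "destination_port": dst_port,
        "sequence": seq,                 # orders the segments
        "acknowledgement": ack,          # confirms what has been received
        "header_bytes": (offset_flags >> 12) * 4,
        "flags": [name for bit, name in sorted(FLAG_NAMES.items())
                  if offset_flags & bit],
        "window": window,
    }


# Sample: the SYN that opens a connection from port 1025 to port 25 (SMTP).
SYN_SEGMENT = TCP_HEADER.pack(1025, 25, 1000, 0, (5 << 12) | 0x02, 8192, 0, 0)
segment = parse_tcp_header(SYN_SEGMENT)
```

The SYN flag shown here is the first step of the handshake that a connect() call triggers, which is what makes a single-sided TCP probe possible.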
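For comparison, the whole UDP header is four 16-bit fields. This Python sketch builds and decodes one; the helper names and the sample payload are hypothetical, and the checksum is set to zero, which in UDP over IPv4 means "not computed".

```python
import struct

# The entire UDP header: source port, destination port, length, checksum.
UDP_HEADER = struct.Struct("!HHHH")


def build_udp_datagram(src_port, dst_port, payload):
    """Prefix payload with a UDP header (checksum 0 = not computed)."""
    length = UDP_HEADER.size + len(payload)      # header plus data
    return UDP_HEADER.pack(src_port, dst_port, length, 0) + payload


def parse_udp_header(raw):
    """Decode the 8-byte UDP header into a dict."""
    src_port, dst_port, length, checksum = UDP_HEADER.unpack(raw[:8])
    return {"source_port": src_port, "destination_port": dst_port,
            "length": length, "checksum": checksum}


datagram = build_udp_datagram(1025, 53, b"query")
udp = parse_udp_header(datagram)
```

There is no sequence number, acknowledgement number, or flag field to decode, so the receiver has nothing with which to detect a lost or reordered datagram at this layer.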
To actually do client/server programming, you must set up a server that uses a predefined socket to communicate through. Then you develop a client that connects to the server's socket. The calls necessary to do this are illustrated in Figure 5, which compares a standard client with the alp client. This illustration is based on the illustration in UNIX Network Programming, which shows a connection-oriented client-server exchange in its most basic form.
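The call sequence can be sketched on a single machine. The following Python example is an illustration under my own assumptions (a loopback echo server and the function names start_server, standard_client, and alp_client); it is not the code of Listing 1. The standard client connects, writes, and reads, while the alp-style client only connects and closes.

```python
import socket
import threading


def start_server(connections):
    """Server side: socket(), bind(), listen(), then accept() each client."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))          # port 0: let the OS pick a free port
    srv.listen(5)

    def serve():
        for _ in range(connections):
            conn, _addr = srv.accept()
            with conn:
                data = conn.recv(1024)  # empty if the client just hangs up
                if data:
                    conn.sendall(data)  # echo the request back

    worker = threading.Thread(target=serve, daemon=True)
    worker.start()
    return srv, worker


def standard_client(addr, message):
    """Standard client: connect, write a request, read the reply, close."""
    with socket.create_connection(addr, timeout=2) as s:
        s.sendall(message)
        return s.recv(1024)


def alp_client(addr):
    """alp-style client: connect and close, with no request or reply."""
    try:
        socket.create_connection(addr, timeout=2).close()
        return True
    except OSError:
        return False
```

From the server's point of view the alp-style client looks like a caller that hangs up immediately, which is why it needs no knowledge of the application's API.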
alp was implemented as a Perl script (Listing 1) to be a UNIX style command. (Listings for this article can be found at: www.samag.com or ftp.mfi.com in /pub/sysadmin.) It requires three command-line arguments. The host is specified by IP address or name. The socket is specified by port number, because there is no way to correlate socket naming conventions between hosts. Finally, the service is specified by name for reporting purposes. There is a fourth, optional, command-line argument that allows specifying command-line switches. Getting and processing the command-line arguments is relatively straightforward; I will concentrate on the setup and use of the socket code.
alp only works with TCP sockets, so $protocol is set to 'tcp'. We then get the protocol number by using the getprotobyname() call. We need a destination host that can be either an IP address or name. We use gethostbyname() or gethostbyaddr() to fill in the missing host variable.
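Python's socket module exposes the same lookups under the same names, so the setup step can be sketched as follows. This is an illustration rather than the Perl from Listing 1; the helper names are mine, and the fallback to built-in constants is an assumption I added for hosts that lack a protocols database.

```python
import socket


def lookup_protocol(name="tcp"):
    """Return the protocol number for a protocol name such as 'tcp'."""
    try:
        return socket.getprotobyname(name)
    except OSError:
        # Assumed fallback for systems without /etc/protocols.
        return {"tcp": socket.IPPROTO_TCP, "udp": socket.IPPROTO_UDP}[name]


def resolve_host(host):
    """Accept a name or a dotted-quad address; return (name, address)."""
    address = socket.gethostbyname(host)      # passes an IP address through
    try:
        name = socket.gethostbyaddr(address)[0]
    except OSError:
        name = host                           # no reverse entry; keep the input
    return name, address
```

Whichever form the user supplies, the probe ends up with both a numeric address for connect() and a name for reporting.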
The socket address information is set up for the appropriate address family, service port, and destination host. This is used later in the connect() call.
Next, you call socket() to create a socket on your computer. The socket is S. The PF_INET identifies the protocol family as Internet. The common alternative is PF_UNIX, or the UNIX protocol family, which is used for interprocess communication (IPC) on a single host. The equivalent, alternative naming is AF_INET and AF_UNIX (AF stands for address family). For TCP sockets, the socket type is SOCK_STREAM. For UDP, the socket type is SOCK_DGRAM. These are the most common types of sockets. This call should succeed; if it doesn't, there is a problem at the local end. A somewhat helpful message is printed if you are running the command in the verbose mode.
Next, we call connect() to attempt to connect to the server's socket. This determines whether or not the application is accessible. An appropriate response is printed. The alarm() call sets up a 5-second timeout so that if the connect() call does not return an answer, we can proceed with an error message.
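The family and type arguments can be seen side by side in a short Python sketch. Python's socket module uses the AF_ names; the helper functions here are hypothetical and exist only for the example. The second function shows the UNIX family doing IPC between two endpoints on one host.

```python
import socket


def describe(sock):
    """Return the (family, type) names of a socket."""
    return sock.family.name, sock.type.name


def socket_flavors():
    """Create one TCP-style and one UDP-style Internet socket."""
    tcp = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    udp = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        return describe(tcp), describe(udp)
    finally:
        tcp.close()
        udp.close()


def unix_ipc_roundtrip(message=b"hello"):
    """Pass a message between two connected AF_UNIX sockets on this host."""
    left, right = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        left.sendall(message)
        return right.recv(1024)
    finally:
        left.close()
        right.close()
```

Creating a socket involves no network traffic at all, which is why a failure at this step points to the local machine rather than the server.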
Finally, we call close() to close the socket. If the socket opened correctly and stayed open, the close() should be successful. In any case, a message is printed if you are running the command in the verbose mode.
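Putting the steps together, the whole probe fits in one short function. The following is a minimal sketch in Python, not a translation of Listing 1: it uses the socket's own timeout in place of alarm(), and the function signature and message wording are my own assumptions rather than alp's actual output.

```python
import socket


def alp(host, port, service, timeout=5.0):
    """Try a TCP connection to host:port; return (is_up, message)."""
    try:
        address = socket.gethostbyname(host)
    except OSError as err:
        return False, f"{service} on {host}: cannot resolve host ({err})"

    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)   # socket()
    sock.settimeout(timeout)           # stands in for the alarm() call
    try:
        sock.connect((address, port))                          # connect()
    except socket.timeout:
        return False, f"{service} on {host}: no answer within {timeout:g} seconds"
    except OSError as err:
        return False, f"{service} on {host}: not accepting connections ({err})"
    else:
        return True, f"{service} on {host} is alive"
    finally:
        sock.close()                                           # close()
```

A call such as alp("mailhost", 25, "smtp") (the host name is a placeholder) returns a flag for scripts plus a one-line message for people, in the spirit of a basic ping.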
alp was developed on a 7-year-old Gateway 2000 486-66 with 64MB running Linux 1.2.13. It was run without conversion on a Sun running Solaris 2.6 and a second PC running Linux 2.0.34. It should be highly portable if you have the proper C header files converted for use with Perl, and if you have the Socket.pm file in the proper place.
Three examples of alp execution are shown in Listing 2. I show the command syntax if the wrong number of command-line arguments is provided. In the basic form, the command returns a single-line response, much like a basic ping command. In verbose mode, additional information is provided. This is often useful for debugging problems.
Some common TCP applications that one might test include:
File Transfer Protocol
Simple Mail Transfer Protocol
Domain Name Service
RPC bind, necessary for NFS as well as RPC
NFS Lock Manager
There are two ways you might want to utilize alp. It makes a good addition to the ping command in that once basic network connectivity is established, the viability of network applications can be confirmed. The second way to use this command is to use a central computer to monitor various servers to determine a realistic picture of application, as well as network and host, uptime.
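The second use, a central monitor, amounts to running the same probe over a list of services and recording the result with a timestamp. Here is a sketch in Python under my own assumptions: the record format, the function names, and the example host are hypothetical, and the probe is passed in as a parameter so it can be swapped out.

```python
import socket
import time


def tcp_up(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        socket.create_connection((host, port), timeout=timeout).close()
        return True
    except OSError:
        return False


def survey(targets, probe=tcp_up, now=time.time):
    """Probe each (host, port, service) and return one log line per target."""
    lines = []
    for host, port, service in targets:
        status = "up" if probe(host, port) else "down"
        stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now()))
        lines.append(f"{stamp} {host} {port} {service} {status}")
    return lines
```

Appending the returned lines to a file on each run builds the history needed to report application uptime alongside host and network uptime.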
Remember, alp only determines whether a socket connection can be established to the server. This is a good indicator that the server is up, but it can give a false positive when the server accepts new connections but does not service them once they are established. Also, remember that alp only works with TCP sockets: it is a single-sided client, and a UDP socket cannot be tested transparently from one side because there is no connection handshake to observe. nmap does test UDP ports, by doing some fancy checking of error information. Since I discovered nmap, I probably won't bother implementing a UDP version of alp.
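The difference is easy to see on the loopback interface. In this Python sketch (the function names are hypothetical), both probes target a port that was just released, so nothing is listening on it. The TCP connect fails at once, while the UDP send from an unconnected socket reports success.

```python
import socket


def unused_port(kind):
    """Bind to an ephemeral loopback port, then release it."""
    s = socket.socket(socket.AF_INET, kind)
    s.bind(("127.0.0.1", 0))
    port = s.getsockname()[1]
    s.close()
    return port


def tcp_probe_closed_port():
    """connect() to a port with no listener: the handshake is refused."""
    port = unused_port(socket.SOCK_STREAM)
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.settimeout(2)
    try:
        s.connect(("127.0.0.1", port))
        return "connected"
    except ConnectionRefusedError:
        return "refused"
    except OSError:
        return "failed"
    finally:
        s.close()


def udp_probe_closed_port():
    """sendto() a port with no listener: the call still succeeds."""
    port = unused_port(socket.SOCK_DGRAM)
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sent = s.sendto(b"ping", ("127.0.0.1", port))
        return f"sent {sent} bytes, no error reported"
    finally:
        s.close()
```

Any sign that the UDP port is closed comes back separately as an ICMP error, which is the kind of error information a UDP scanner has to inspect.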
I used a sequence of alp commands in a cron job to gather information on many of our significant Internet services. A number of changes were made to make the alp output more useful for reporting purposes, all of which are reflected in this article. We are collecting data every 10 minutes but may modify the interval to reflect our environment.
These data will be displayed in a modified horizontal bar chart with cells filled in with black when the service is up. We will use colors to indicate the reason for the outage. If the outage is caused by routine scheduled maintenance such as monthly upgrades, hardware installation, etc., we will use green. If the outage is sudden but planned (e.g., installing security patches that cannot wait until the next scheduled downtime), we will use yellow. Finally, the unexpected service interruptions such as power failures will be in red. This will clearly identify the reason for the outage, along with an explanation.
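As a rough idea of the report, this Python sketch renders one service's samples as a row of text cells and computes the percentage of samples that were up. The status names, the letters standing in for the colors, and the row layout are assumptions of mine for illustration; the planned chart itself is graphical.

```python
# One character per 10-minute sample: '#' stands for a black "up" cell,
# and G, Y, R stand for the green, yellow, and red outage categories.
CELLS = {"up": "#", "scheduled": "G", "urgent": "Y", "unexpected": "R"}


def chart_row(service, samples):
    """Render one service's samples as a text bar plus an uptime percentage."""
    cells = "".join(CELLS[s] for s in samples)
    uptime = 100.0 * samples.count("up") / len(samples) if samples else 0.0
    return f"{service:<6}|{cells}| {uptime:.1f}% up"


row = chart_row("smtp", ["up", "up", "unexpected", "up"])
```

Each row is built from the same up/down samples the cron job collects, with the outage category added by hand once the cause is known.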
References
UNIX Network Programming by W. Richard Stevens. Prentice Hall, 1990. Note: This is now in its second edition, with two of the three volumes now available.
Advanced Programming in the UNIX Environment by W. Richard Stevens. Addison-Wesley, 1992.
TCP/IP Illustrated (The Protocols), Volume 1 by W. Richard Stevens. Addison-Wesley, 1994.
TCP/IP Illustrated (The Implementation), Volume 2 by Gary Wright and W. Richard Stevens. Addison-Wesley, 1995.
Programming Perl, Second Edition by Larry Wall, Tom Christiansen, and Randal L. Schwartz, with Stephen Potter. O'Reilly & Associates, 1996.
Advanced Perl Programming by Sriram Srinivasan. O'Reilly & Associates, 1997.
About the Author
Ron Jachim is Manager of Systems for the Barbara Ann Karmanos Cancer Institute, where he is responsible for the systems half of the Information Systems Group. He has 14 years of networking experience and both a BA and an MS in Computer Science. His thesis was on fuzzy queries. He can be reached at: firstname.lastname@example.org.