TCP/IP Troubleshooting in AIX Using Network Packet Traces
Have you ever wanted to troubleshoot network problems and found that the tools at your disposal fell short of giving you the details? Or have you ever wanted to buy a protocol analyzer but could not afford one? Many systems administrators have encountered these obstacles.
Some UNIX systems provide utilities that can function as a poor man's protocol analyzer. These utilities put an interface into promiscuous mode, thus allowing the capture of frames. AIX provides the iptrace utility. This utility, combined with ipreport, provides a formatted and readable network-packet trace output. In this article, I discuss how you can use AIX's iptrace and ipreport to troubleshoot network problems. I provide a sample output of the program and a brief interpretation of this output. Additionally, I provide a shell program for monitoring some netstat statistics over time.
In this article, I assume that the reader is familiar with IP, TCP, and UDP packet formats. For convenience, I have illustrated the packet structures for each of the three protocols in Figure 1. (For those who need more detailed information, Douglas Comer's Internetworking with TCP/IP, Volume I, 3rd Edition and W. Richard Stevens' TCP/IP Illustrated, Volume 1 are excellent references.)
iptrace and ipreport
The iptrace utility allows you to record TCP/IP network packets going through your network interfaces. The program gives some flexibility by providing command flags that act as filters to capture only the packets that meet your criteria. The syntax for running the command is as follows:
/usr/sbin/iptrace [ -a ] [ -P Protocol ] [ -i Interface ]
    [ -p Port ] [ -s Host [ -b ] ] [ -d Host [ -b ] ] LogFile
Please refer to the man page for more details on using the command.
iptrace on its own does not produce human-readable text. You must also run ipreport, passing it the name of the log file that iptrace wrote.
In this article, we will run iptrace using the startsrc command. In AIX, you use startsrc and the corresponding stopsrc to start and stop a subsystem. Listing 1 shows a script that gives an example of how iptrace and ipreport can be invoked.
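For readers without Listing 1 at hand, the general shape of such a wrapper can be sketched as follows. This is a hypothetical sketch, not the article's script: the function names (run, trace_host), the DRY_RUN switch, and the log file location are assumptions made for illustration, and the iptrace and ipreport flags shown are one plausible choice rather than the ones the original listing uses.

```shell
#!/bin/sh
# Hypothetical sketch of an iptrace/ipreport wrapper (not Listing 1).
# Set DRY_RUN=1 to print the commands instead of executing them.

run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"            # show the command (quoting is flattened)
    else
        "$@"
    fi
}

# trace_host HOST MINUTES
trace_host() {
    host=$1
    minutes=$2
    log=${TRACE_LOG:-/tmp/iptrace.$host.out}   # assumed log location

    # -a suppresses ARP packets; -s HOST with -b captures traffic
    # in both directions between this machine and HOST.
    run startsrc -s iptrace -a "-a -s $host -b $log"
    run sleep $(( minutes * 60 ))
    run stopsrc -s iptrace

    # -r decodes RPC packets, -n numbers the packets, and -s prefixes
    # each line with the protocol it belongs to.
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "ipreport -rns $log > $log.txt"
    else
        ipreport -rns "$log" > "$log.txt"
    fi
}
```

Calling trace_host 192.0.2.90 5 with DRY_RUN=1 set prints the four commands without touching the system, which is a convenient way to check the arguments before starting a real trace. The address here is a documentation placeholder.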
Running the following:
# hostname -s
dev01
# iptracethis.ksh xxx.xxx.30.90 5
0513-059 The iptrace Subsystem has been started. Subsystem PID is 16197.
0513-044 The stop of the iptrace Subsystem was completed successfully.
will capture the network traffic between host dev01 and the host with an address of xxx.xxx.30.90 for five minutes. Figure 2 is an example of the packet trace (the first two bytes of each IP address have been changed to protect the innocent):
The previous trace provides a lot of information. You can see, from Packets 1 to 3, the establishment of a TCP connection, also known as the three-way handshake. Looking at the TCP portion of Packet 1, the source host (SRC = xxx.xxx.30.90), or Host A, sends a SYN segment with its initial sequence number (th_seq=d930100) to the destination host (DST = xxx.xxx.23.19), or Host B. Host A also indicates the port number that it wants to communicate with on Host B (3300), as well as the port number it is using itself (1902). Notice also that Host A is the one that initiates the connection. In this particular scenario, Host A is a PC client (MS Windows) that is trying to establish a connection to Host B, which is an RS/6000 server (AIX 4.1.4) running a Sybase database server. The database server in this case listens on port 3300.
In Packet 2, notice that Host B is now the source of this packet (SRC = xxx.xxx.23.19). In this packet, Host B responds to Host A with its own SYN segment, which contains its initial sequence number (th_seq=4e5c4401). Also in this segment, Host B acknowledges Host A's SYN segment (Packet 1) by sending an ACK (th_ack=d930101) that is 1 plus the initial sequence number sent by Host A from the previous packet.
In Packet 3, Host A acknowledges Host B's SYN segment by sending an ACK (th_ack=4e5c4402) that is 1 plus the initial sequence number of Host B. This completes the three-way handshake, so named because three packets are needed to establish the connection.
Continuing the example, it turned out that the connection between Host A and Host B was unpredictable. There were times during the conversation when the connection seemed very unstable. Further analysis of the packet trace was necessary.
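The acknowledgment numbers in the handshake can be checked with a little hexadecimal arithmetic. The following is a minimal sketch; the function name expected_ack is made up for this illustration, and the input values are the th_seq fields quoted above.

```shell
# Print the acknowledgment number (in hex) that should answer a SYN
# whose initial sequence number is given in hex, as ipreport shows it.
expected_ack() {
    # Sequence numbers are 32 bits wide, so the sum wraps at 2^32.
    printf '%x\n' $(( (0x$1 + 1) & 0xffffffff ))
}

expected_ack d930100     # Host A's initial sequence number (Packet 1)
expected_ack 4e5c4401    # Host B's initial sequence number (Packet 2)
```

The two calls print d930101 and 4e5c4402, matching the th_ack values seen in Packets 2 and 3.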
In Packets 23 and 24, you can see that the TCP sections are identical, indicating a possible retransmission of the same packet. This is not necessarily bad because the retransmission might have been done on purpose by TCP, if it hadn't received a response from the other host. But in this example, this is not a retransmission as TCP defines it. Furthermore, Packet 24 was sent almost immediately after Packet 23. This can be shown by subtracting the transmission time of Packet 23 from that of Packet 24, as follows:
  12:33:57.971356416 (Packet 24)
- 12:33:57.967711744 (Packet 23)
=        0.003644672 seconds
A gap of about 3.6 milliseconds does not indicate a TCP retransmission, because TCP typically waits on the order of a full second before it retransmits a packet for the first time.
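When many packet pairs need checking, the subtraction can be scripted. The sketch below assumes timestamps in the hh:mm:ss.nnnnnnnnn form shown above, with exactly nine fractional digits; the function name ts_diff_ns is hypothetical. It works in whole nanoseconds so that no precision is lost to floating-point rounding.

```shell
# Print the difference, in nanoseconds, between two timestamps of the
# form hh:mm:ss.nnnnnnnnn (earlier timestamp first).
ts_diff_ns() {
    awk -v t1="$1" -v t2="$2" '
        # Convert a timestamp to nanoseconds since midnight.
        function to_ns(t,    f) {
            split(t, f, /[:.]/)
            return (f[1] * 3600 + f[2] * 60 + f[3]) * 1000000000 + f[4]
        }
        BEGIN { printf "%d\n", to_ns(t2) - to_ns(t1) }
    '
}

ts_diff_ns 12:33:57.967711744 12:33:57.971356416
```

For Packets 23 and 24 this prints 3644672, that is, roughly 3.6 milliseconds.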
For some reason, probably because of a bug in a PC device driver, the client PC sent identical TCP packets one after another. The rest of the trace (not shown) reveals that this happened frequently. The duplicates wasted computer resources and network bandwidth, and they caused the unpredictable behavior of the PC application. After the device driver was changed, the problem disappeared. Figure 3 shows the network trace done with the new driver installed in the client PC:
This example shows how to use iptrace, coupled with a good understanding of the TCP/IP protocol suite, to troubleshoot nagging network problems.
Listing 2 shows netstatus.sh. With this program, you can monitor the number of packets entering and leaving your host, as well as the number of errors encountered. The program runs netstat at a 20-second interval and averages the results, which it then saves to a file. The output shows netstat statistics per second. Figure 4 is a sample output of this program. The colls column in the output is meaningless if you are using Token Ring.
The syntax for running the shell program is as follows:
netstatus.sh interface
where interface is the network interface that you want to monitor.
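The averaging step can be illustrated with a short filter. This is a sketch under stated assumptions, not the code of Listing 2: it assumes interval output whose first five numeric columns are input packets, input errors, output packets, output errors, and collisions, and whose first data line holds cumulative totals rather than an interval sample. The function name per_second and the sample figures are made up for the illustration.

```shell
# Read interval-style netstat output on stdin and print per-second
# rates for the first five columns. $1 is the interval in seconds.
per_second() {
    awk -v interval="$1" '
        $1 !~ /^[0-9]+$/ { next }                 # skip header lines
        !seen_totals { seen_totals = 1; next }    # skip cumulative totals
        {
            printf "%.1f %.1f %.1f %.1f %.1f\n",
                $1 / interval, $2 / interval, $3 / interval,
                $4 / interval, $5 / interval
        }
    '
}

# Hypothetical sample: two header lines, a totals line, one sample.
printf '%s\n' \
    '    input   (tr0)     output' \
    ' packets  errs  packets  errs  colls' \
    '    1000     0      900     0      0' \
    '     200     0      100     2      0' | per_second 20
```

With the sample input above, the filter prints 10.0 0.0 5.0 0.1 0.0, meaning 10 packets per second in and 5 per second out over the 20-second interval.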
An effective way of using netstatus.sh is to schedule it using cron at a selected interval, as in the following:
0,15,30,45 8-22 * * 1-5 /home/santosj/bin/netstatus.sh tr0 > /dev/null 2>&1
This entry, which must occupy a single line in the crontab file, runs the program (monitoring interface tr0) on the hour and at 15, 30, and 45 minutes past the hour. It runs from 8:00 am through 10:45 pm, Monday through Friday. See Figure 4.
In addition to monitoring errors, you can use the program to do some benchmarking. I have used it to monitor the amount of traffic that the host is receiving (and transmitting) given a certain increase in the number of business transactions generated by a certain application. Also, if you have a Web server, you can determine at what time of day your site gets the most traffic.
As a Sys Admin reader, you are free to use and modify the shell programs in this article (at your own risk, of course), but you may not sell the code or incorporate it into a commercial product without my consent. You can download the listings from Sys Admin's Web site at: www.sysadminmag.com or from ftp.mfi.com in /pub/sysadmin/.
About the Author
James M. Santos is an independent consultant specializing in systems administration and performance tuning. He has worked in the computer industry for over thirteen years. He teaches UNIX at Columbia University on a part-time basis. He can be reached at: email@example.com.