Article

Isolating Performance Problems

Steve Nice

Due to the complexity of today's computing environment, isolating performance problems can be difficult and time consuming. If you mix in hardware from a variety of vendors, add a pinch of software from various sources, and stir in support from third party suppliers, you've created a recipe for disaster. When trying to pinpoint a performance problem, all these ingredients must be considered. It is the server? Is it the client? Is it the network? In this article, I'll discuss ways to find the anwer.

In a recent scenario, users at a remote location were complaining that performance was poor when using a particular application. However, the same application did not experience performance problems at the local site. The application was critical to the business, thus the problem had a high profile. The only way to deal with problems of such complexity is to break them down into components and eliminate possibilities one by one. Figure 1 shows the high-level components. These components can be broken down further to identify each object the application uses.

The Client

Figure 2 shows the client objects -- a mix of Windows 95 and Windows NT 4 Service Pack 4 clients running on a variety of hardware platforms with varying amounts of memory. Standardization is the key here. This potpourri of hardware and software does not help when trying to pinpoint a performance problem. Ensuring that the software is the same version across all platforms is the first task. Microsoft Office 97 contains a useful program called MSINFO32.EXE. It's tucked away in Program files\common files\Microsoft shared\Msinfo. It can be started from the RUN command or from MSWord via the Help - About menu. It allows you to view the active modules and confirm version numbers of active programs (see Figure 3).

To find the module you're looking for, click on the Module Name heading to sort it into alphabetical order. Scroll down the module list to identify the name of the application. In this case, the OBDC drivers were identified as version 3.510.3711.0. Normally, the module name is similar to the name of the application you're looking for. Once you've identified the module version number, it can be compared with the same module on another PC. The module on the remote PC was an earlier version, so I downloaded the latest ODBC administrator and drivers from Microsoft's Web site. The manufacturer's Web site would be the first place to look for patches or updates.

Instead of trying to identify each version of every module, I relocated a local PC, which was performing fine, to the remote office. As expected, the PC performed poorly at the remote site. This indicated a network problem of some sort, so concentrating on network-related issues became a priority. Network performance for Windows 95/98 and Windows NT4 can be improved by adjusting the TCP Large Windows (TCPLW) and Selective Acknowledgements (SACK).

Selective Acknowledgements (as documented in RFC 2018) allows TCP to recover from IP packet loss without re-sending packets that were already received by the receiver. TCPLW (as documented in RFC1323) allow packets larger than the default 64K to be transmitted. A patch is available for Windows 95 to allow TCPLW and SACK. It is called Windows Socket 2 Update and can be downloaded from: http://www.microsoft.com/windows95/downloads/contents/ \
wuadmintools/s_wunetworkingtools/w95sockets2/default.asp. Support for TCPLW and SACK is already enabled in Windows 98. Windows NT clients must have TCPLW enabled by editing the registry. Using Regedt32, navigate to: HKLM\CurrentControlSet\Services\TCPIP Parameters. Add valueTcpWindowSize, if it's not already there, and set to 0x4470.

The Network

Figure 4 shows the network objects. It's a heterogeneous network running at 10/100 MHz and supporting TPC/IP, SPX/IPX, and NetBUIE protocols. Cisco routers connect the LANS over a 2 Meg SMDS line and ISDN2 for backup. SMDS lines offer a cost-effective alternative to leased lines. Ethernet LAN switches are employed at each site. This is basically a type of fast, multi-port Ethernet bridge, designed to increase the performance of your LAN. Ethernet LAN switches also provide segmentation of the network, meaning that collision domains are separated. All other applications were running fine over the network, so network hardware could be dismissed.

The Server

Figure 5 shows the Server objects. The servers comprise an HP K and D class running HP-UX 10.20, Open Ingres, and SCO's Vision ODBC Server. The server's kernel had been tuned to a database parameter set. In addition to the kernel, certain network parameters can be tuned (see Figure 6). We're interested specifically in tcp_send and tcp_receive. This is the default socket buffer size in bytes for inbound and outbound data. If you have large data packets, then this can be set to 32768. Use:

nettune -s tcp_send 32768
nettune -s tcp_receive 32768

Also, the tcp_pmtu parameter, once enabled, allows the system to use the largest possible packet size, rather than the default of 512 bytes. On most systems, the administrator must explicitly enable this. Use:

nettune -s tcp_pmtu 1

Finding a network problem is simply a process of elimination. In Figures 2 and 4, there are three layers to the client and server. Each of these layers has its own network functions that filter through each layer. Use:

ping ip-address

to view network response times. It's difficult to put a number on acceptable response time. It depends on the speed of your link. Response time in my example varied between 20ms and 30ms, which was acceptable. In some cases, if the ping fails, there may be a firewall between you and the server that is blocking the ping, or perhaps the server may have a network interface card problem. Or, you may have a routing problem. Use:

netstat -r

to view your routing table. Table 1 shows an example. From this example, you can see that to get to destination 192.168.0.24, the packets must go through gateway 192.168.0.100. If there is a problem with the gateway, you will not be able to get to destination 192.168.0.24. Thus, you may think the problem lies with the destination when actually it's the gateway that's down. Test the gateway by doing a ping and ensure you get a reply.

Having established that physical communications are in working order, the next task is to test the network using another application. This test will help eliminate the network as the potential problem. Using ftp, I transferred a file from the remote client to the server and then did the same ftp from a local client to the server. The ftps finished within seconds of each other, therefore eliminating the physical network as the problem.

Tracing the Route of the Problem

After eliminating the obvious, it's time to do some tracing. On HP-UX, use the nettl command to capture packets from the network. It's a five-step procedure starting with the capture itself.

Step 1: Begin LAN Tracing to a Raw Trace File

nettl -tn 0x30800000 -e driver -size 1024 -tracemax 99999 -f /tmp/raw0

This starts the trace where tn is the abbreviation for traceon. The next argument is a ‘logical AND' of the subsystem values to trace. For example, to trace Inbound Protocol Data Unit and Protocol or connection states, you would ‘AND' 0x20000000 with 0x04000000 to give 0x24000000. The following table shows the possible trace options.

Value Meaning

0x80000000 Inbound Protocol Header

0x04000000 Protocol or connection states

0x40000000 Outbound Protocol Header

0x02000000 Invalid events or condition

0x20000000 Inbound Protocol Data Unit

0x01000000 Special kind of trace that contain a log message

0x10000000 Outbound Protocol Data Unit

0x00800000 Packets whose source and destination system is the same

0x08000000 Protocol or connection states

In the example above, we're tracing Inbound and Outbound Protocol Data Unit and loopback packets. The -e is the appropriate network driver. Use all, if you have no idea what network card your system has.

-size portsize -- This argument sets the size in kilobytes (KB) of the trace buffer used to hold trace messages until they are written to the file. The default size for this buffer is 68 KB. The possible range for portsize is 1 to 1024. Setting this value too low increases the possibility of dropped trace messages from fast subsystems.

-tracemax maxsize -- Tracing uses a circular file method such that when one file fills up, a second is used. Two trace files can exist on a system at any given time. maxsize specifies the maximum size in kilobytes (KB) of both trace files combined. The default value for the combined file sizes is 1000 KB. The possible range for maxsize is 100 to 99999.

-f filename -- The location and filename of the raw trace file.

Step 2: Reproduce the Network “Event”

This is when I get the users to do their normal activity.

Step 3: Stop the Raw Trace

nettl -tf -e all

Do this command as soon as possible. The raw trace files get very large, very fast.

Step 4: Create Filters for the Raw Trace

The trace will capture an enormous amount of data. We're only interested in the IP addresses of the server and the PC that is doing the testing, so a filter file is needed to filter out the interesting bits. The filter file is a plain text file containing some of the following text.

To trace layer 1 (physical) use:

filter source aa-bb-cc-dd-ee-ff
filter dest aa-bb-cc-dd-ee-ff

where aa-bb-cc-dd-ee-ff is the physical MAC address. Use this if you do not know the IP address, but you do know the MAC address.

To trace at layer 3 (network) use:

filter ip_saddr aaa.bbb.ccc.ddd
filter ip_daddr aaa.bbb.ccc.ddd

where aaa.bbb.ccc.ddd is the IP address of the machine you want to trace.

To trace at layer 4 (transport) use:

filter tcp_sport aaaa
filter tcp_dport aaaa

where aaaa is the port address you want to trace. Port numbers are used by applications that communicate across the network. There are predefined port numbers for various applications such as ftp and telnet. Beyond this, applications can use any port they wish. Problems occur when two applications are using the same port number.

To trace at layer 5 (session), use:

filter rpcprogram program id.

The session layer allows users on different machines to establish sessions between them. A session allows ordinary data transport, as does the transport layer, but it also provides some enhanced services useful in some applications.

Here's an example of a filter file that captures IP packets to and from address 192.168.0.2 and 192.168.0.24:

filter ip_saddr 192.168.0.2
filter ip_daddr 192.168.0.2
filter ip_saddr 192.168.0.24
filter ip_daddr 192.168.0.24

Step 5: Use netfmt to Extract Information from the Raw Trace File

Using the netfmt command, you can extract information from the raw trace file using the filter file you've just created.

netfmt -N -n -l -c /tmp/filterfile -f /tmp/raw0.TRC0 > /tmp/trace0

This command reads the raw trace file located at /tmp/raw0. TRC0 uses the /tmp/filterfile and outputs the results to /tmp/trace0. The arguments are for formatting the trace.

To capture on the ‘fly' involves an interactive trace that displays the trace file on the screen and tee's the output to a file. Ensure you've set up the filter file:

nettl -tn 0x30800000 -e lan100 | netfmt -F -N -n -l -c \
 /tmp/filterfile | tee /tmp/fmt0

Ctrl/c to stop the trace and then:

netlt -tf -e all

Examine the Trace

Figure 7 contains an example of a trace. Starting from the top, you can see that TCP puts a header at the front of each segment. The network speeds, timestamp, the subsystem or driver, and the trace kind (Inbound Protocol Data Unit) are inserted by the netfmt command. No filter was specified for this trace, so you can see all the network layers. The Ethernet layer gives the source and destination MAC address and the frame length. Ethernet has its own addressing system, and no two machines have the same MAC (Ethernet) address. Each Ethernet controller comes with an address built in from the manufacturer, which has 48 bits allocated for the Ethernet address.

Following is the IP Header showing the source and destination IP address, length of data, and time to live (ttl). The time to live is a number that is decremented whenever the datagram passes through a system. If the ttl reaches zero, the packet is discarded.

The TCP Header's normal size is 20 bytes. Each TCP segment contains the source and destination port number to identify the sending and receiving application, respectively. The combination of IP address and the port number is commonly called a socket. Each TCP header has a sequence number used by the recipient to make sure that it has received all the datagrams in the correct order. To verify the segment has arrived at its destination, the recipient must return an “acknowledgement”. In the example above, an acknowledgement of 0xd3c49d indicates that the receiving computer has received all the data up to number 0xd3c49d. If the sender doesn't get an acknowledgement within a reasonable amount of time, it sends the data again. The flags and fragment offset are used to keep track of the pieces when a datagram has to be split up. This can happen when datagrams are forwarded through a network for which they are too big.

Six flags are included in a TCP header:

SYN -- Synchronize sequence numbers to initiate a connection. Each time a new connection is established, the SYN flag is turned on.

FIN -- When set, this flag implies that the sender has finished sending data.

ACK -- When the ACK field is on, the acknowledgement number in the corresponding field is valid. Actually, the acknowledgement number contains the next sequence number that the sender of the acknowledgement expects to receive.

RST -- This flag is used to reset the TCP connection.

URG -- The urgent pointer is valid when this flag is set. This pointer is a positive offset that must be added to the sequence number field of the segment to yield the sequence number of the last byte of urgent data.

PSH-- Indicates that the receiver should pass this data to the application as soon as possible.

Identifying the Problem

To resolve the problem in the example scenario, a local user was asked to perform tasks using the application so the packets could be captured using the nettl command. A remote user was then asked to perform the same tasks. These packets were also captured. Once captured, the packets were put through a filter file so that only the server and client were in the formatted trace. How did this help identify the problem and, more importantly, the solution?

When looking for errors and inconsistencies on a trace, keep an eye open for data corruption, sequence number errors, time to live times, and packet times. Using the trace, it was possible to compare the difference between the local capture and the remote capture. A timing problem began to emerge. When examining each trace side by side, everything was the same to begin with. The client would request data, and the server would respond. Identical data being sent back from the server could be seen. However, differences began to appear in the timing of the packets. Each trace contains a time stamp:

Timestamp : Mon Sep 13 BST 1999 12:07:44.715890

After the first batch of data, there was a delay of 1 second before the next batch began. This was shown in the trace as Packet Filtered Out -- basically saying no packets here for 192.168.0.2. After the second batch of data, there was another delay. Only this time, it was 2 seconds, then again with 3 seconds, and finally 4 seconds delay. Examining the packets before and after the delay identified the application. In this case, the ODBC server was causing the problem.

The RPC call section in the header contains information and data relating to the application using the network. This section contains information for the application and will not be decoded by the filter, except for the header. The header information includes an abbreviated name of the application that is utilizing network resources. This should give an idea of the application that is using the network. In the above example, INGSQLD is the ODBC driver name. To confirm that, the ODBC drivers were causing the problem, they were bypassed and the database was accessed directly. This caused no delay. Having identified the ODBC driver as the problem, I implemented other ODBC drivers which proved successful.

If you have a complex problem to solve, try to break the problem into manageable portions, list all components, and eliminate them one by one. Eventually, you'll find the problem and be able to implement an alternative that will prove your findings. Users at the example remote location now have access speeds equivalent to those users in the local office.

About the Author

Steve Nice is a freelance network consultant with over ten years experience in IT. He is a Microsoft Certified System Engineer who is based in the city of London and can be contacted at: steve.nice@rocketmail.com.