Cover V03, I06

Monitoring and Optimizing NFS Performance

Robert Berry

Introduction

The Network File System (NFS) is an essential part of your LAN communications, and its transparency to network users depends directly on its response time. Finding that optimum performance can be difficult, and therefore, performance tracking and optimization of NFS can be a challenging task for systems administrators. The UNIX environment provides tools that enable you to gather the statistics needed to pinpoint areas in which to improve network response time. As a first step, you must familiarize yourself with the network resources and the users of those resources. Then compile the statistics on the current configuration, and determine whether the statistics give you good news or bad news. Finally, if you determine that the news is bad, you must identify the bottlenecks, based on the statistical results.

Familiarize Yourself with Your Network

You may believe you already know your network inside and out. But take a moment and think. Are you familiar with the type of work conducted by each client on the network? Do you know the types of RPC requests that are typically generated by this work? Do you know which servers provide most of the resources for each client? Do you know the network's busiest time? Its lag times? Essentially, you need to know who does what, where, and when on the network.

Information of this sort helps you to model your networking environment. If you understand the nature of the network's workload, you should be able to develop an extremely accurate representation of your system when you try to create a set of benchmarks.

Gather Stats on Your Current Network Configuration

The UNIX environment supplies you with numerous tools for gathering network statistics. Some of the more useful are netstat, nfsstat, vmstat, iostat, uptime, and spray. The following sections show an example of each (except for vmstat and iostat) and explain their usefulness for collecting information. vmstat and iostat were explained thoroughly in a previous issue (see Bill Genosa's "Monitoring Performance with iostat and vmstat," March/April 1994, p.6).

netstat

The command-line syntax is:

netstat [-n] -i 

Figure 1 demonstrates a sample output.

The netstat utility gives you information on the reliability of your local network interface. The first column of the output is the device name of your network interface. The second column, Mtu, represents the maximum transmission unit. The third column, Net/Dest, is the actual network to which your interface is connected; this will be the numbered address if the "-n" option is used. The Address column (column four) displays the local host's name or, again, the actual IP address if "-n" is used. The remaining columns display the number of input and output packets, as well as the number of errors that occurred with each. The Collis column displays the number of collisions detected while the host was transmitting.

The input and output error columns are of most concern here. A high number of input errors could result from electrical problems, from corrupt packets being received from another host with a damaged network interface, from damaged cables, or from a device driver that has an improper buffer size. A high number in the output error column may indicate a problem with your own network interface.

This analysis assumes that your network has been up and running for some time. However, a high number of errors could show up in either category if your system has just recovered from a network-wide power outage -- particularly if you have many diskless clients. The key word here is high: both input and output errors should be as close to zero as possible. Still, there will usually be some errors present, especially if you have recently disconnected and reconnected cables or if your network has periods of intense traffic.

The number in the collision column will likely not be zero, but should be a low number relative to the number in the output packet column. You can calculate the percentage of collisions observed by a particular host by dividing the number in the collision column by the number in the output packet column and multiplying the quotient by one hundred. Hal Stern, in Managing NFS and NIS (O'Reilly & Associates, Inc.), suggests that a collision rate of over 5 percent indicates a congested network in need of reorganizing.

A collision rate can also be obtained for the entire network. To calculate this you would add all hosts' output packet columns and all hosts' collision columns, divide the latter by the former, and multiply by one hundred, as above. This method is more appropriate than taking the sum of all the collision rates for each individual host and dividing by the number of hosts, because by this method the busier hosts will weigh more heavily on the average than the less busy hosts. Again according to Hal Stern, if the rate is greater than 10 percent, your network is ripe for partitioning.
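The arithmetic above is easy to automate. The sketch below assumes you have collected each host's output-packet and collision counts from its netstat -i output into a small table; the hostnames and figures are made up for illustration.

```shell
# Hypothetical per-host figures gathered from each host's "netstat -i":
#   hostname  output-packets  collisions
cat > /tmp/net_stats <<'EOF'
alpha  120000 4200
beta    80000 1500
gamma  305000 9100
EOF

# Network-wide rate: total collisions / total output packets * 100.
# Because the totals are summed before dividing, busier hosts weigh
# more heavily on the average, as described above.
awk '{ out += $2; coll += $3 }
     END { printf "network collision rate: %.1f%%\n", coll / out * 100 }' /tmp/net_stats
```

A per-host rate is the same calculation applied to a single row; compare the two to spot a host whose collisions are out of line with its traffic.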

One caveat is in order here. If you notice a host with significantly more collisions than a similar host with similar network usage, this may be an indication of electrical problems rather than network congestion.

nfsstat

nfsstat displays statistical information concerning the status of your NFS and remote procedure calls (RPCs) for both the server and client aspect of your system. Each field in the output is a window into the heart of your network operations.

The command-line syntax has three useful forms:

nfsstat -s

nfsstat -c

nfsstat 

The first form displays server-side statistics only; the second displays client-side statistics only; and the third displays both server and client statistics. Sample output for the first two commands is shown in Figure 2 and Figure 3. The server display indicates how successfully your server is receiving packets from each client. The fields in the display are as follows:

calls -- Indicates the number of RPC calls received.

badcalls -- Indicates the number of calls rejected by the RPC layer. Such a rejection would be generated by an authentication failure. It also includes the combined totals of the badlen and xdrcall fields.

nullrecv -- Indicates the number of times a nfsd daemon was scheduled to run but did not receive a packet from the NFS service socket queue.

badlen -- Indicates that the server received RPC calls that were too short in length.

xdrcall -- Indicates that RPC calls were received that could not decode the XDR headers.

The client display indicates how successful your client is in communicating with all the NFS servers. The fields in this display are as follows:

calls -- Indicates the total number of calls made to the NFS servers.

badcalls -- Indicates the number of RPC calls that returned an error either by timeouts or because of an interruption of the RPC call itself.

retrans -- Indicates the number of times a call had to be retransmitted because there was no response from the server.

badxid -- Indicates the number of times a reply from a server was received which didn't correspond to any outstanding call. When a request is generated it is given an XID. At any one time, there are several calls requesting services on any number of servers. Occasionally, a response is received with an XID that has already been serviced. At this time badxid is incremented. I will discuss the significance of this field later.

timeout -- Indicates the actual number of calls that timed out waiting for a server's response.

wait -- Indicates the number of times a call had to wait because a client handle was either busy or unavailable.

The remaining fields of the client RPC section are not relevant to the current topic, and are omitted from the discussion here.

uptime

uptime is a simple tool that allows you to get the current time, the amount of time the system has been up, the number of users on the system, and the three load averages (see Figure 4). The three load averages are a rough measure of CPU usage over 1-, 5-, and 15-minute intervals.

What's considered high for these three categories depends on the number of CPUs on your system and whether or not your tasks are CPU-intensive. Æleen Frisch, in Essential System Administration (O'Reilly & Associates, Inc.), notes that any value under 3 would not be critical.

spray

The command-line syntax is:

spray hostname [-c count] [-l length] [-d delay]

spray reports the number of packets sent to a particular host; the time needed to send those packets; the number of packets received by the host; and the number and percent of packets that were dropped by the host (see Figure 5).

spray is a useful but somewhat limited tool. The output gives the number of packets that didn't make it to the destination, but it doesn't indicate at what point in the network the packets were lost. Another limitation is that real-world traffic varies in packet size and usually arrives in random bursts, whereas by default spray sends 1162 packets of 86 bytes each. With the "-c" and "-l" options, you can minimize this limitation by varying the number and size of packets. With the "-d" option, you can even simulate some delay between packets.
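To illustrate what to do with spray's counts, the snippet below derives the drop figures from a hypothetical run; the sent and received values are invented for the example (on a real network you would read them from spray's own report, as in Figure 5).

```shell
# Hypothetical counts from one spray run against a server:
sent=1162        # packets transmitted (spray's default count)
received=1105    # packets the target host reported receiving

# Dropped packets and drop percentage = (sent - received) / sent * 100
awk -v s="$sent" -v r="$received" \
    'BEGIN { printf "dropped %d packets (%.1f%%)\n", s - r, (s - r) / s * 100 }'
```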

Running spray from each of your machines will give you a good estimate of a server's performance capabilities and of the speed of a particular machine's network interface. You may find that a server that receives a large portion of the network traffic has a slow network interface; you might then decide to move the file systems to a faster machine or provide it with a faster network interface.

Obtaining NFS Benchmarks

You can use the UNIX tools described above to measure your network's performance under normal conditions. This will give you a set of benchmarks by which to judge your system. This will be handy the next time a user comes up and complains about the network being sluggish. Simply run the test again and compare it with past results.

The key here is knowing what "normal" is on your network. This is the point where being completely familiar with the network workload is important. The benchmarks will serve no purpose if they do not accurately represent the types and proportions of RPC requests commonly generated on the network.

To produce benchmarks for your system, you may purchase any one of many NFS benchmark traffic generators or you may build your own using UNIX utilities. I chose the latter in this case.

Certain UNIX commands can generate the same RPC requests that are normally generated by the work conducted on your network. Generating an NFS RPC mixture in this fashion can be far more flexible than using a ready-made package. Such packages are inflexible, incapable of changing to fit a changing workload. With a script, as the nature of the workload changes on your network, you can reflect the changes in the script.

NFS Traffic Generation Script

The first step in creating your own NFS traffic generation script is to know which RPC requests are generated by the work conducted on your network. To get a listing of the NFS RPC percentages generated by your network, run the nfsstat utility on each of your servers. This information will help you build a script that comes as close as possible to an accurate representation of your network usage.

Next, you will need to know what UNIX utilities generate what RPC requests. Figure 6 gives a sample of some basic utilities and the NFS RPC requests they generate. Use a combination of these in your script to generate the NFS traffic for your benchmarks, paying close attention to the NFS RPC percentages reported for your network.

Listing 1 provides an example of an NFS traffic-generating script. This example is simple, but keep in mind that an NFS traffic generation script can be whatever you want it to be, as long as it closely represents your network workload. For instance, in this scenario, the network is a UNIX network where large CAD and raster files traverse back and forth across network lines. Under heavy network usage, RPC request percentages on a particular client will be approximately 50 percent reads and 40 percent writes, with the remainder divided among various other RPC requests, such as getattr and lookup.

The sample script starts with an uptime report, to give you an indication of your CPU usage. This is not essential; I added it to give an overall picture of the network. What is necessary is that you become superuser before running this script. The reason is that the script next runs the nfsstat utility and displays the current RPC request percentages for the client before reinitializing all the percentages back to zero with the nfsstat "-z" parameter. The nfsstat utility requires you to be superuser in order to use the "-z" parameter.

The meat of the script is the series of cp commands. To generate the 50 percent reads and 40 percent writes, the script copies a large file within an NFS directory and then copies it once from the NFS directory to a local client directory and then once more. You may have to experiment to achieve the desired RPC percentages. For example, when this script was being built, it turned out that the client was able to cache fairly large files. With the file located in the cache, few or no disk reads were being requested. To get around this, the file had to be made very much larger -- in the case here, it was 12MB in size.

Finally, the script performs some cleanup and generates another uptime report along with the final nfsstat client report to check the RPC request percentages produced. Also added for good measure is the spray utility. The script runs spray on each server to give you some idea of the server's current packet handling capabilities.
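A minimal sketch of a script along these lines might look like the following. It is not Listing 1: the paths are placeholders (NFSDIR would be an NFS-mounted directory on a real network, not a local one), the file is scaled down from 12MB to 1MB so the sketch runs quickly, and the privileged or network-dependent commands are commented out.

```shell
#!/bin/sh
# Sketch of an NFS traffic-generation script. NFSDIR and LOCALDIR are
# placeholder paths; on a real network NFSDIR would be an NFS mount.
NFSDIR=${NFSDIR:-/tmp/nfs_demo}
LOCALDIR=${LOCALDIR:-/tmp/local_demo}
mkdir -p "$NFSDIR" "$LOCALDIR"

uptime                          # baseline load averages
# nfsstat -c                    # current client RPC percentages
# nfsstat -z                    # zero the counters (superuser only)

# A file large enough to defeat the client's cache; the article used
# 12MB, scaled down to 1MB here so the sketch runs quickly.
dd if=/dev/zero of="$NFSDIR/big" bs=1024 count=1024 2>/dev/null

cp "$NFSDIR/big" "$NFSDIR/big.copy"   # read and write within the NFS dir
cp "$NFSDIR/big" "$LOCALDIR/big.1"    # NFS reads, local writes
cp "$NFSDIR/big" "$LOCALDIR/big.2"

rm -f "$NFSDIR/big" "$NFSDIR/big.copy" "$LOCALDIR/big.1" "$LOCALDIR/big.2"
uptime                          # post-run load averages
# nfsstat -c                    # final RPC percentages to verify the mix
# spray server1; spray server2  # per-server packet-handling check
```

Run it repeatedly, checking the nfsstat percentages after each pass, and adjust the file size and the number of cp operations until the mix approximates your network's observed RPC profile.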

Remember that your script can be any sequence of UNIX utilities as long as they reflect the RPC requests generated by your network's workload. I used the cp utility here because it generates read and write RPC requests (see Figure 6). You will need to experiment with combinations of utilities to meet your own requirements. It is also a good idea to run your script at various times over several days to see if it will produce close to the same results each time.

Possible NFS Performance Bottlenecks

When examining possible performance bottlenecks on your network, keep in mind that there are two sides to the network: the server side and the client side. Are the server hardware and software inadequate for the client's jobs, or are the client's jobs too numerous and difficult for the server hardware and software?

Server

On the server side, a number of key hardware components can cause bottlenecks and should be watched closely. I mentioned earlier the network interface itself, but some others you should consider on a server are the CPU, memory, and the hard disk.

Regarding the CPU, the concern is not so much the speed of the CPU, although faster is better, but how fast jobs are scheduled for CPU usage. A potential bottleneck is an increased latency in scheduling NFS daemons. nfsd daemons have kernel process priority, and under normal conditions, nfsd daemons are run by the CPU immediately upon an NFS request. But if the server has a number of I/O interrupts or other kernel priority calls running, NFS requests can build while nfsd daemons are waiting for CPU time. A solution might be to limit local access to a server to reduce the number of I/O and kernel priority system calls.

iostat and vmstat provide useful information on CPU job loading.

The main concern regarding memory as a bottleneck is to ensure that the server has enough to handle all its processes. This will reduce page swapping, which can interfere with NFS services.

With hard disks, as with CPUs, the bottleneck is caused not so much by the speed of the drive (although, once again, the faster the better), but the overloading of NFS disk access requests. If you have a disk that receives more than its share of NFS requests, you might want to consider spreading the heavily used filesystems over several disks.

Client

In some instances you might discover that the server isn't the bottleneck of the network. In fact, it might turn out that there is no bottleneck at all, there is only a client that wants too much in too little time. If this is the case, then some constraints must be placed on that client.

A client sends an NFS request to a particular server. If it doesn't receive a reply within the allotted time period, the request will timeout and be retransmitted. The client does not respect the fact that you've tuned the server to the best of its hardware capabilities. It doesn't care if the request is still queued on the server and will be served eventually. All it knows is that it didn't receive a reply in the allotted time, so it sends the request again. The server will then respond even more slowly as NFS requests build.

You may see an indication of this problem with the nfsstat utility. If you run nfsstat with the "-rc" flag and you notice a large number in the badxid field and an even larger number in the timeout field, then it is likely that your client is demanding too much from your server. A simple correction for this problem is to increase the timeout parameter in the mount utility.
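On many implementations the timeout is an NFS mount option. Option names, units, and defaults vary by vendor, so check your mount man page; the server name, export path, and values below are hypothetical.

```shell
# Remount a filesystem with a longer NFS timeout. On SunOS-style
# systems, timeo is in tenths of a second and retrans is the number of
# retransmissions attempted before reporting "server not responding".
umount /home
mount -o timeo=20,retrans=5 server:/export/home /home
```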

Conclusion

Monitoring and optimizing NFS performance is a challenging process. UNIX provides you with useful tools to perform this task. Each of the tools covered here provides extensive capabilities, of which only a small sample was touched upon in this article. I suggest that you experiment with these tools and develop your own cause-and-effect analysis of NFS performance.

Bibliography

Frisch, Æleen. Essential System Administration. Sebastopol, CA: O'Reilly & Associates, Inc., 1991.

Peek, Jerry, Tim O'Reilly, and Mike Loukides. UNIX Power Tools. Sebastopol, CA: O'Reilly & Associates/Bantam Books, 1993.

Stern, Hal. Managing NFS and NIS. Sebastopol, CA: O'Reilly & Associates, Inc., 1992.

About the Author

Robert Berry has been working with SunOS and DG/UX since 1991. He received his BS degree from the University of Maryland and is working on an MS degree from the University of West Florida. He is currently the Systems Administrator and Networking Manager at Spectrum Sciences & Software, Inc. His interests are in PC-to-UNIX networking and network programming. Robert Berry can be contacted at 242 Vickie Leigh Rd., Fort Walton Beach, FL 32547. Fax (904) 862-8111.