Article

Monitoring Performance with iostat and vmstat

William Genosa

As a system administrator, I use a variety of tools including sar, ps, trace, iostat, and vmstat, to identify problems related to system performance. In learning how to use iostat and vmstat, I saw that both could be much more useful if they could archive reports in the same fashion as sar. I could then use iostat and vmstat to monitor applications that run for several hours to see where I might be able to improve performance.

sar produces reports based on data collected from the sadc command, which must be set up to run in cron at the desired interval (I run it every 15 minutes on my systems). The data files are stored in the directory /var/adm/sa. The name of each data file includes the day of the month, such that sa15 would be the data file for the 15th of the month. With this scheme in mind, I have written a shell program to collect statistics from iostat and vmstat. This article explains how the program works and provides an example of how the output files look. Also included is a discussion of the iostat and vmstat commands.

The name of the program is sysstat (Listing 1), and there is a cron entry to run this script every fifteen minutes. I begin my program by defining where the output files will be stored in the filesystem. (Note that it is always a good idea to avoid hard-coding file names in your code: instead, assign filenames to variables so that your code can be easily modified.) I chose to place the output files in the directory /var/adm/stat. The TIME variable is used to create a timestamp of each record that is appended to the output files. It is also used to check for midnight, which is the first time cron will run sysstat. If the output files do not exist, they are created. If they do exist, the data in the output files will be from a prior month. That data must be truncated before the program appends the data for the current day.

The program then uses a for loop to collect statistics on every drive on the system. The lspv command (list physical volume) builds the list, and awk extracts the first field, which is the name of the physical volume. The iostat command is run for four iterations, sampling every two seconds. The sampling rate was kept short because there are ten drives on this system. If your machine has fewer disks, you may want to increase the sampling interval to five seconds. The output of iostat is piped into grep to extract only the disk information. There will be four lines of information for each disk, one for each iteration of iostat. The first line is discarded by the tail command (see the description of iostat below). The remaining three lines are piped into awk, which splits each field into an array. Each array will hold three values which must be summed and then divided by three to get an average. awk outputs the average, using the printf statement to preserve the correct number of spaces between fields and to ensure the correct number of digits to the right of the decimal. The TIME variable is output for the first line in each stanza (hdisk0). Note that the same logic applies to the collection of vmstat statistics. Figure 1 is a sample output report for iostat; Figure 2 shows a sample for vmstat.

iostat

The syntax of iostat is:

iostat [sampling interval[number of iterations]]

Thus the command iostat 3 4 samples and produces output every three seconds for four iterations. The first report always provides cumulative statistics since the last system reboot and should be ignored, as it does not accurately represent the current system activity. The remaining reports provide statistics gathered between the sampling intervals and will provide a more accurate snapshot of how the system is managing its resources. The iostat command produces output in a two-part report. The first part reports on cpu activity and the second, on disk I/O activity. My program uses only the disk I/O statistics because the vmstat command produces a more comprehensive report of cpu statistics. The following is a description of each field output by the disk I/O section of an iostat report:

%tm_act -- The percent of time the disk was active or the bandwidth utilization of the drive.

Kbps -- Amount of data read and written in kb/sec for the drive.

tps -- Transfers/sec (i/o requests) made to the disk. A single i/o request can be made up of several logical requests.

msps -- Average milliseconds per seek. Most disks do not report this data.

Kb_read -- The number of kilo bytes read from the drive.

Kb_wrtn -- The number of kilo bytes written to the drive.

These statistics can be used to identify disk I/O delays due to poor load balancing. You will probably observe that the disk which contains the operating system will show higher activity than other disks on the system. This is normal and to be expected, but perhaps applications running on your system can be spread across physical volumes that are separate from the operating system. Strategic placement of executables, data, and temporary work areas can significantly improve system performance. One final note: in order for iostat to function under AIX, the operating system attribute to "continuously maintain disk i/o history" must be set to true. You can check your system with the following command, which will list the effective attribute iostat for logical device sys0:

lsattr -l'sys0' -E -a 'iostat'

If this attribute is set to false on your system, you can set it to true with the following command, which changes logical device attribute iostat:

chdev -l 'sys0' -a'iostat=true'

vmstat

The syntax for vmstat is the same as for iostat. Also similarly, the first report from vmstat contains cumulative statistics from the last system reboot and should be ignored. vmstat reports on 17 statistics grouped under five categories. The categories and statistics are listed below, along with a description.

Processes

The AIX operating system is a multitasking operating system which allows all processes to compete for use of the cpu. The scheduler determines when processes will run. Each process is assigned a priority and a slot in the process table. Processes must be in real memory to run. If a process is scheduled to run but a memory page for part of that process is not in real memory, that process is blocked and placed in the wait queue. Processes ready to run are placed in the runque. vmstat reports on processes which are in the runque and processes that are blocked.

r -- The number of runnable processes in the runque. This number should be a single-digit number on a healthy and stable system.

b -- The number of processes scheduled to be executed but blocked, waiting for the virtual memory manager to page the part of that process which is on disk into real memory. This number should also be a single digit on a healthy and stable system.

Memory

Memory is controlled by the virtual memory manager. Virtual memory includes all of real memory as well as all the paging space. Disk paging space allows the virtual memory manager to overbook real memory. Virtual memory addresses must be translated into real memory addresses by the virtual memory manager. Address translations take time to resolve, so the virtual memory manager caches frequently used memory addresses in Translation Lookaside Buffers. A page fault occurs when the virtual memory manager attempts to access a memory page that is not in real memory. Real memory that is not used is placed in the free list. The virtual memory manager is responsible for maintaining the free list.

avm -- The number of active 4Kb disk blocks being used for page space or back store.

fre -- The number of available 4Kb real memory frames. This number should be high right after you reboot your system. As applications require memory, the virtual memory manager will allocate real memory from the free list to those applications. The virtual memory manager will try to maintain the free list above the operating system parameter MINFREE. If the virtual memory manager needs to free memory, it will page out real memory frames to disk back store.

Paging

Virtual memory address space is partitioned into segments of 256 Mb of contiguous space. Segments are further partitioned into pages. There are different types of segments. A persistent segment is used to permanently store pages that are part of files and executables. Working segments use the paging space or back store for transitory pages with no permanent storage space. Process stack and data regions, as well as shared library text regions, will be paged out to working segments.

re -- The number of currently unused frames reclaimed by the system after they were placed back on the free list. The number of frames accessed as a result of a read ahead pre-fetch from disk is also reported under this column.

pi -- The number of page-ins from disk.

po -- The number of page-outs to disk.

fr -- The number of frames freed to replenish the free list.

sr -- The number of frames examined for page out. The virtual memory manager uses various criteria when selecting the frames which can be placed back on the free list. The idea is not to page out frames that may soon be needed again.

cy -- Real memory frames are referenced by the virtual memory manager through a Page Frame Table. This statistic indicates the number of cycles the virtual memory manager made while scanning the entire Page Frame Table in search of candidates to be placed back on the free list.

Faults

A fault is defined as an interrupt. Interrupts can either be hardware or software interrupts. A disk interrupt would be an example of a hardware interrupt. A system call is an example of a software interrupt implemented with a software interrupt instruction that branches to the system call handler routine.

in -- The number of device or hardware interrupts. This number will never be less than 100 due to the 10-millisecond system clock.

sy -- The number of system calls. System calls allow user processes to exchange data with the kernel and use system resources such as disk I/O.

cs -- The number of context switches. Because AIX is a multitasking operating system, all processes appear to run simultaneously. In actuality, cpu time is given to each process in time slices. When a process has used up its time slice, it must relinquish the cpu to another process. The cpu must save the working environment of the current process and load in a new working environment for the next process to be executed. This is known as a context switch. AIX, in combination with the RS/6000 architecture, handles context switches very efficiently.

CPU

A process that executes within its own code and does not require the system or kernel resources is operating in user mode. While a process is executing system calls, it is operating in kernel or system mode.

us -- The percent of time the cpu is operating in user mode.

sy -- The percent of time the cpu is operating in kernel mode.

id -- The percent of time the cpu is idle with no processes available for execution and no pending i/o.

wa -- The percent of time the cpu is idle with no processes available for execution but with pending i/o requests.

Conclusion

I want to conclude with the advice that it is always easier to diagnose problems if you have a profile of normal system activity. Understanding how your system performs before trouble occurs can help immeasurably in the determination of performance problems.

Bibliography

IBM. Performance Monitoring and Tuning Guide. IBM Publication SC23-2365-01.

Loukides, Mike. System Performance Tuning. Sebastopol, CA: O'Reilly & Associates, 1990.

Heise, Russell. "The vmstat Tool," IBM AIXtra: September/October 1993.

About the Author

Bill Genosa is a systems administrator for American Express, where he has responsibility for RS6000 workstations and servers. He can be reached at 186 Bryant Avenue, Floral Park, NY 11001, or via email as wgenosa@attmail.com.