Monitoring Performance with iostat and vmstat
 
William Genosa 
As a system administrator, I use a variety of tools, including
sar, ps, trace, iostat, and vmstat, to identify problems related
to system performance. While learning to use iostat and vmstat,
I saw that both would be much more useful if they could archive
reports in the same fashion as sar. I could then use them to
monitor applications that run for several hours and see where
I might be able to improve performance.
sar produces reports based on data collected from the
sadc 
command, which must be set up to run in cron at the
desired 
interval (I run it every 15 minutes on my systems).
The data files 
are stored in the directory /var/adm/sa. Each data file name
ends with the day of the month, so sa15 is the data file for
the 15th. With this scheme in mind,
I have written a shell program to collect statistics
from iostat 
and vmstat. This article explains how the program works
and 
provides an example of how the output files look. Also
included is 
a discussion of the iostat and vmstat commands. 
The name of the program is sysstat (Listing 1), and
there 
is a cron entry to run this script every fifteen minutes.
I begin my program by defining where the output files
will be stored 
in the filesystem. (Note that it is always a good idea
to avoid hard-coding 
file names in your code: instead, assign filenames to
variables so 
that your code can be easily modified.) I chose to place
the output 
files in the directory /var/adm/stat. The TIME variable
is used to timestamp each record that is appended to the
output files. It is also used to detect midnight, which is
when cron runs sysstat for the first time each day. If the
output files do not exist at that point, they are created.
If they do exist, the data in them is from a prior month,
and it must be truncated before the program appends the data
for the current day.
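The bookkeeping described above might be sketched as follows. This
is an illustration rather than the code in Listing 1: the names
STATDIR, stat_file, and prepare_file, the iostat and vmstat file
prefixes, and the HH:MM timestamp format are all assumptions. Only
the directory /var/adm/stat, the TIME variable, and the sar-style
day-of-month suffix come from the description.

```shell
#!/bin/sh
# Sketch only; names and formats here are assumptions, not Listing 1.
# A matching crontab entry could look like this (hypothetical path):
#   0,15,30,45 * * * * /usr/local/bin/sysstat

STATDIR=${STATDIR:-/var/adm/stat}   # output directory named in the article

# Build a sar-style file name, e.g. /var/adm/stat/iostat15.
stat_file() {   # usage: stat_file PREFIX DAY
    printf '%s/%s%s\n' "$STATDIR" "$1" "$2"
}

# On the midnight run, empty the file (it holds a prior month's data).
# On any other run, create it only if it is missing.
prepare_file() {   # usage: prepare_file FILE TIME
    if [ "$2" = "00:00" ] || [ ! -f "$1" ]; then
        : > "$1"
    fi
}

DAY=$(date +%d)      # day of the month, e.g. 15
TIME=$(date +%H:%M)  # record timestamp, e.g. 00:15
```

A caller would run prepare_file "$(stat_file iostat "$DAY")" "$TIME"
before appending any records for the current run.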
The program then uses a for loop to collect statistics
on 
every drive on the system. The lspv command (list physical
volume) builds the list, and awk extracts the first
field, 
which is the name of the physical volume. The iostat
command 
is run for four iterations, sampling every two seconds.
The sampling interval was kept short because there are ten
drives on this system. If your machine has fewer disks, you
may want to increase the sampling interval to five seconds.
The output of iostat is piped
into 
grep to extract only the disk information. There will
be four 
lines of information for each disk, one for each iteration
of iostat. 
The first line is discarded by the tail command (see
the description 
of iostat below). The remaining three lines are piped
into 
awk, which splits each field into an array. Each array
will 
hold three values which must be summed and then divided
by three to 
get an average. awk outputs the average, using the printf
statement to preserve the correct number of spaces between
fields 
and to ensure the correct number of digits to the right
of the decimal. 
The TIME variable is written on the first line of each stanza
(the hdisk0 line). The same logic applies to the collection
of vmstat statistics. Figure 1 is a sample output report for
iostat; Figure 2 shows a sample for vmstat.
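The grep, tail, and awk steps can be sketched on their own. The
sketch assumes each disk line carries the disk name followed by
numeric columns only (%tm_act, Kbps, tps, Kb_read, Kb_wrtn, with
the msps column blank). The function name average_disk is
hypothetical, tail -n +2 is the POSIX spelling for skipping the
first line, and the output widths are arbitrary; Listing 1 may
differ in all of these details.

```shell
#!/bin/sh
# Sketch only: average the interval reports for one disk, skipping
# the first (since-boot) report. Assumed layout of each disk line:
#   name %tm_act Kbps tps Kb_read Kb_wrtn

average_disk() {   # usage: iostat 2 4 | average_disk hdisk0
    grep "^$1 " | tail -n +2 | awk '
        {
            # Accumulate every numeric column of this report line.
            for (i = 2; i <= NF; i++) sum[i] += $i
            name = $1; nf = NF; n++
        }
        END {
            if (n == 0) exit 1          # no interval reports seen
            printf "%-8s", name
            for (i = 2; i <= nf; i++) printf " %8.1f", sum[i] / n
            printf "\n"
        }'
}
```

With four iterations, three lines survive the tail step, so each
column is divided by three, as described above.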
iostat 
The syntax of iostat is: 
 
iostat [sampling_interval [number_of_iterations]]
 
Thus the command iostat 3 4 samples and produces 
output every three seconds for four iterations. The
first report always 
provides cumulative statistics since the last system
reboot and should 
be ignored, as it does not accurately represent the
current system 
activity. The remaining reports provide statistics gathered
between 
the sampling intervals and will provide a more accurate
snapshot of 
how the system is managing its resources. The iostat command
produces a two-part report: the first part covers CPU activity
and the second covers disk I/O activity. My program uses only
the disk I/O statistics because the vmstat command produces
a more comprehensive report of CPU statistics. The following
is a description of each field in the disk I/O section of an
iostat report:
%tm_act -- The percentage of time the disk was active, that is,
the bandwidth utilization of the drive.
Kbps -- The amount of data read and written, in kilobytes per
second, for the drive.
tps -- Transfers per second (I/O requests) made to the disk.
A single I/O request can be made up of several logical requests.
msps -- Average milliseconds per seek. Most disks do not report
this data.
Kb_read -- The number of kilobytes read from the drive.
Kb_wrtn -- The number of kilobytes written to the drive.
These statistics can be used to identify disk I/O delays
due to poor 
load balancing. You will probably observe that the disk
which contains 
the operating system will show higher activity than
other disks on 
the system. This is normal and to be expected, but perhaps
applications 
running on your system can be spread across physical
volumes that 
are separate from the operating system. Strategic placement
of executables, 
data, and temporary work areas can significantly improve
system performance. 
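One way to spot an unbalanced load is to list the disks whose
%tm_act exceeds some limit. The sketch below assumes each input
record has the disk name in the first column and %tm_act in the
second, with no timestamp column. The function name busy_disks and
the 40 percent limit in the usage line are illustrative choices,
not recommendations from this article.

```shell
#!/bin/sh
# Sketch only: print the name and %tm_act of every disk whose
# utilization is above the given limit.
# Assumed record layout: name %tm_act Kbps tps Kb_read Kb_wrtn

busy_disks() {   # usage: busy_disks LIMIT < records
    awk -v limit="$1" '$2 + 0 > limit + 0 { print $1, $2 }'
}

# Example: busy_disks 40 < records.txt   (records.txt is hypothetical)
```

If the same one or two disks appear in this list throughout a long
job, that is a hint that data or work areas could be moved to a
less active physical volume.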
One final note: in order for iostat to function under AIX,
the operating system attribute to "continuously maintain disk
I/O history" must be set to true. You can check your system
with the following command, which lists the effective value
of the iostat attribute for logical device sys0:
 
lsattr -l sys0 -E -a iostat
 
If this attribute is set to false on your system, you
can set it to true with the following command, which
changes logical 
device attribute iostat: 
 
chdev -l sys0 -a iostat=true
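
A script can make the same check before it starts collecting. The
sketch below only parses text: it assumes lsattr prints one line
whose first field is the attribute name (iostat) and whose second
field is its value (true or false). That output layout and the
function name iostat_history_on are assumptions, so verify them
against your own system before relying on this.

```shell
#!/bin/sh
# Sketch only: succeed (exit 0) when the lsattr output on stdin
# shows the iostat attribute set to true.
# Assumed input line: iostat <value> <description...>

iostat_history_on() {   # usage: lsattr -l sys0 -E -a iostat | iostat_history_on
    awk '$1 == "iostat" { found = 1; ok = ($2 == "true") }
         END { exit !(found && ok) }'
}
```

A collection script could print a warning and stop when this
function fails, rather than archiving empty disk statistics.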
 
vmstat 
The syntax for vmstat is the same as for iostat. Also
similarly, the first report from vmstat contains cumulative
statistics from the last system reboot and should be
ignored. vmstat 
reports on 17 statistics grouped under five categories.
The categories 
and statistics are listed below, along with a description. 
Processes 
The AIX operating system is a multitasking operating system which
allows all processes to compete for use of the CPU. The scheduler
determines when processes will run. Each process is assigned a
priority and a slot in the process table. Processes must be in
real memory to run. If a process is scheduled to run but a memory
page for part of that process is not in real memory, that process
is blocked and placed in the wait queue. Processes ready to run
are placed in the run queue. vmstat reports on processes which
are in the run queue and processes that are blocked.
r -- The number of runnable processes in the run queue. This
should be a single-digit number on a healthy and stable system.
b -- The number of processes scheduled to be executed but blocked,
waiting for the virtual memory manager to page the part of that
process which is on disk into real memory. This should also be a
single-digit number on a healthy and stable system.
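The single-digit rule of thumb can be checked mechanically. The
sketch below assumes the 17-column vmstat layout described in this
article (r b avm fre re pi po fr sr cy in sy cs us sy id wa), with
r and b as the first two fields. The function name check_queues is
hypothetical.

```shell
#!/bin/sh
# Sketch only: report vmstat samples where r or b has two or more
# digits. Header lines are skipped because their first field is
# not a number. Assumed layout: r is field 1, b is field 2.

check_queues() {   # usage: vmstat 2 4 | check_queues
    awk '$1 ~ /^[0-9]+$/ && ($1 >= 10 || $2 >= 10) {
             print "r=" $1, "b=" $2
         }'
}
```

The first data line from vmstat holds since-boot figures, so a
caller may want to drop it before piping the rest into this check.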
Memory 
Memory is controlled by the virtual memory manager.
Virtual memory 
includes all of real memory as well as all the paging
space. Disk 
paging space allows the virtual memory manager to overbook
real memory. 
Virtual memory addresses must be translated into real
memory addresses 
by the virtual memory manager. Address translations
take time to resolve, 
so the virtual memory manager caches frequently used
memory addresses 
in Translation Lookaside Buffers. A page fault occurs
when the virtual 
memory manager attempts to access a memory page that
is not in real 
memory. Real memory that is not used is placed in the
free list. The 
virtual memory manager is responsible for maintaining
the free list. 
avm -- The number of active 4 KB disk blocks being used for page
space or back store.
fre -- The number of available 4 KB real memory frames. This
number should be high right after you reboot your system. As
applications require memory, the virtual memory manager will
allocate real memory from the free list to those applications.
The virtual memory manager will try to keep the free list above
the operating system parameter MINFREE. If the virtual memory
manager needs to free memory, it will page out real memory frames
to disk back store.
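Because each frame is 4 KB, a fre value converts to megabytes as
fre * 4 / 1024. A small helper, with the hypothetical name
frames_to_mb, makes archived figures easier to read:

```shell
#!/bin/sh
# Sketch only: convert a count of 4 KB frames to megabytes.

frames_to_mb() {   # usage: frames_to_mb FRAMES
    awk -v frames="$1" 'BEGIN { printf "%.1f\n", frames * 4 / 1024 }'
}

# Example: frames_to_mb 2560 reports 10.0 (2560 frames * 4 KB = 10 MB)
```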
Paging 
Virtual memory address space is partitioned into segments of
256 MB of contiguous space. Segments are further partitioned
into pages.
There are different types of segments. A persistent
segment is used 
to permanently store pages that are part of files and
executables. 
Working segments use the paging space or back store
for transitory 
pages with no permanent storage space. Process stack
and data regions, 
as well as shared library text regions, will be paged
out to working 
segments. 
re -- The number of currently unused frames reclaimed by the
system after they were placed back on the free list. The number
of frames accessed as a result of a read-ahead pre-fetch from
disk is also reported under this column.
pi -- The number of page-ins from disk.
po -- The number of page-outs to disk.
fr -- The number of frames freed to replenish the free list.
sr -- The number of frames examined for page out. The virtual
memory manager uses various criteria when selecting the frames
which can be placed back on the free list. The idea is not to
page out frames that may soon be needed again.
cy -- Real memory frames are referenced by the virtual memory
manager through a Page Frame Table. This statistic indicates the
number of cycles the virtual memory manager made while scanning
the entire Page Frame Table in search of candidates to be placed
back on the free list.
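To pick out the samples in which the system was actually paging to
or from disk, the pi and po columns can be filtered. The sketch
below assumes the 17-column layout described in this article, with
pi, po, fr, and sr as fields 6 through 9. The function name
paging_lines is hypothetical.

```shell
#!/bin/sh
# Sketch only: print the paging counters for every vmstat sample
# that shows page-ins or page-outs. Header lines are skipped
# because their first field is not a number.
# Assumed layout: re=5 pi=6 po=7 fr=8 sr=9 cy=10

paging_lines() {   # usage: vmstat 2 4 | paging_lines
    awk '$1 ~ /^[0-9]+$/ && ($6 > 0 || $7 > 0) {
             print "pi=" $6, "po=" $7, "fr=" $8, "sr=" $9
         }'
}
```

Comparing this output across the life of a long-running job shows
when the job pushed the system into paging.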
Faults 
A fault is defined as an interrupt. Interrupts can either
be hardware 
or software interrupts. A disk interrupt would be an
example of a 
hardware interrupt. A system call is an example of a
software interrupt 
implemented with a software interrupt instruction that
branches to 
the system call handler routine. 
in -- The number of device or hardware interrupts. Because the
system clock interrupts every 10 milliseconds (100 times per
second), this number will never be less than 100.
sy -- The number of system calls. System calls allow user
processes to exchange data with the kernel and use system
resources such as disk I/O.
cs -- The number of context switches. Because AIX is a
multitasking operating system, all processes appear to run
simultaneously. In actuality, CPU time is given to each process
in time slices. When a process has used up its time slice, it
must relinquish the CPU to another process. The CPU must save
the working environment of the current process and load in a new
working environment for the next process to be executed. This is
known as a context switch. AIX, in combination with the RS/6000
architecture, handles context switches very efficiently.
CPU 
A process that executes within its own code and does not require
system or kernel resources is operating in user mode. While a
process is executing system calls, it is operating in kernel or
system mode.
us -- The percentage of time the CPU is operating in user mode.
sy -- The percentage of time the CPU is operating in kernel mode.
id -- The percentage of time the CPU is idle with no processes
available for execution and no pending I/O.
wa -- The percentage of time the CPU is idle with no processes
available for execution but with pending I/O requests.
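The four CPU columns can be condensed into one line per sample.
The sketch below assumes the 17-column layout described in this
article, with us, sy, id, and wa as fields 14 through 17. The
function name cpu_summary and the output format are hypothetical.

```shell
#!/bin/sh
# Sketch only: for each 17-field vmstat sample, print the combined
# user and kernel time, the idle time, and the I/O wait time.
# Assumed layout: us=14 sy=15 id=16 wa=17

cpu_summary() {   # usage: vmstat 2 4 | cpu_summary
    awk '$1 ~ /^[0-9]+$/ && NF == 17 {
             printf "busy=%d idle=%d iowait=%d\n", $14 + $15, $16, $17
         }'
}
```

A consistently high iowait figure alongside a busy disk in the
iostat archive points toward an I/O problem rather than a shortage
of CPU.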
Conclusion 
I want to conclude with the advice that it is always easier to
diagnose problems if you have a profile of normal system activity.
Understanding how your system performs before trouble occurs can
help immeasurably in diagnosing performance problems.
Bibliography 
IBM. Performance Monitoring and Tuning Guide. IBM Publication
SC23-2365-01.
Loukides, Mike. System Performance Tuning. Sebastopol, CA:
O'Reilly & Associates, 1990.
Heise, Russell. "The vmstat Tool," IBM AIXtra: September/October
1993.
 
 About the Author
 
Bill Genosa is a systems administrator for American Express,
where he has responsibility for RS/6000 workstations and servers.
He can be reached at 186 Bryant Avenue, Floral Park, NY 11001,
or via email as wgenosa@attmail.com.
 
 