Monitoring Performance with iostat and vmstat
As a system administrator, I use a variety of tools, such as ps, trace, iostat, and vmstat, to identify problems related to system performance. In learning how to use iostat and vmstat, I saw that both could be much more useful if they could archive reports in the same fashion as sar. I could then use iostat and vmstat to monitor applications that run for several hours to see where I might be able to improve their performance.
sar produces reports based on data collected by a separate data-collection command, which must be set up to run in cron at a regular interval (I run it every 15 minutes on my systems). The data files are stored in the directory /var/adm/sa. The name of each data file includes the day of the month, so that sa15 would be the data file for the 15th of the month. With this scheme in mind, I have written a shell program to collect statistics from iostat and vmstat. This article explains how the program works and provides an example of how the output files look. It also includes a discussion of the iostat and vmstat commands.
The name of the program is sysstat (Listing 1), and a cron entry runs this script every fifteen minutes.
I begin my program by defining where the output files will be stored in the filesystem. (Note that it is always a good idea to avoid hard-coding file names in your code: instead, assign filenames to variables so that your code can be easily modified.) I chose to place the output files in the directory /var/adm/stat. The TIME variable is used to create a timestamp for each record that is appended to the output files. It is also used to check for midnight, which is the first time each day that cron will run sysstat. At midnight, if the output files do not exist, they are created. If they do exist, the data in them is from a prior month, and that data must be removed before the program appends the data for the current day.
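A minimal sketch of this setup step, written as a shell function, follows. It is not the code from Listing 1: it assumes the output files follow the sar convention of ending in the two-digit day of the month (for example iostat15), that TIME has the form HH:MM, and that the output directory already exists. The function name prepare_outfile and the file prefixes are hypothetical; only the /var/adm/stat directory comes from the text.

```shell
#!/bin/sh
# Hypothetical sketch of sysstat's file-setup step (not Listing 1).
# STATDIR defaults to the directory used in the article and must exist.
STATDIR=${STATDIR:-/var/adm/stat}

# prepare_outfile PREFIX DAY TIME
#   PREFIX  hypothetical file-name prefix, e.g. "iostat" or "vmstat"
#   DAY     two-digit day of the month, as printed by "date +%d"
#   TIME    hour and minute, as printed by "date +%H:%M" (assumed format)
# Prints the path of today's output file, creating it if necessary.
# At 00:00 (the first cron run of the day) an existing file can only hold
# data from a prior month, so it is emptied before new records are appended.
prepare_outfile() {
    pf_file="$STATDIR/$1$2"
    if [ "$3" = "00:00" ] || [ ! -f "$pf_file" ]; then
        : > "$pf_file"    # create the file, or truncate last month's data
    fi
    echo "$pf_file"
}
```

In the program itself the call might look like IOFILE=$(prepare_outfile iostat "$(date +%d)" "$TIME"), with TIME set once at the top of the script so that the midnight test and the record timestamps agree.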
The program then uses a for loop to collect statistics for every drive on the system. The lspv command (list physical volume) builds the list, and awk extracts the first field, which is the name of the physical volume. The iostat command is run for four iterations, sampling every two seconds. The sampling interval was kept short because there are ten drives on this system. If your machine has fewer disks, you may want to increase the interval to five seconds. The output of iostat is piped to grep to extract only the disk information. There will be four lines of information for each disk, one for each iteration. The first line is discarded by the tail command (see the discussion of iostat below). The remaining three lines are piped to awk, which stores each column in its own array. Each array holds three values, which must be summed and then divided by three to get an average. awk outputs the average, using the printf statement to preserve the correct number of spaces between fields and to ensure the correct number of digits to the right of the decimal point.
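The collection loop can be sketched as follows. This is an illustration under stated assumptions, not Listing 1: it assumes the AIX form of iostat that accepts a drive name before the interval and count, that lspv prints the volume name in its first column, and that every disk line has the same number of numeric columns. The function names avg_disk_lines and collect_iostat are hypothetical. Instead of a fixed divide-by-three, the awk program divides by the number of lines it actually read, and it matches the disk name with an awk field test rather than grep so that hdisk1 does not also match hdisk10.

```shell
#!/bin/sh
# avg_disk_lines: read iostat disk lines of the form
#   hdiskN  col1  col2 ... colN
# on stdin, discard the first line (cumulative since boot), and print one
# line holding the disk name and the average of each numeric column.
avg_disk_lines() {
    tail -n +2 | awk '
        {
            name = $1; n++; nf = NF
            for (i = 2; i <= NF; i++) sum[i] += $i   # per-column totals
        }
        END {
            if (n == 0) exit 1                       # nothing to average
            printf "%-8s", name
            for (i = 2; i <= nf; i++) printf " %9.1f", sum[i] / n
            printf "\n"
        }'
}

# collect_iostat OUTFILE: append one averaged line per physical volume,
# using four iterations at two-second intervals as described in the text.
collect_iostat() {
    for disk in $(lspv | awk '{ print $1 }'); do
        iostat "$disk" 2 4 | awk -v d="$disk" '$1 == d' | avg_disk_lines
    done >> "$1"
}
```

Each iostat call spans three two-second intervals after its immediate first report, so on a ten-drive system one pass of collect_iostat takes roughly a minute, well inside a fifteen-minute cron cycle.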
The TIME variable is output on the first line of each report (hdisk0). The same logic applies to the collection of vmstat statistics. Figure 1 is a sample output report from iostat; Figure 2 shows a sample for vmstat.
The syntax of iostat is:
iostat [sampling interval [number of iterations]]
Thus the command iostat 3 4 samples and produces output every three seconds for four iterations. The first report always provides cumulative statistics since the last system reboot and should be ignored, as it does not accurately represent the current activity. The remaining reports provide statistics gathered over the sampling intervals and give a more accurate picture of how the system is managing its resources.
The iostat command produces output in a two-part report. The first part reports on cpu activity and the second on disk I/O activity. My program collects only the disk I/O statistics because the vmstat command produces a more comprehensive report of cpu statistics. The following is a description of each field output by the disk I/O section of an iostat report:
%tm_act -- The percent of time the disk was active, that is, the bandwidth utilization of the drive.
Kbps -- The amount of data read and written, in KB per second, for the drive.
tps -- Transfers per second (I/O requests) made to the disk. A single I/O request can be made up of multiple logical requests.
msps -- Average milliseconds per seek. Most disks do not report this data.
Kb_read -- The number of kilobytes read from the drive.
Kb_wrtn -- The number of kilobytes written to the drive.
These statistics can be used to identify disk I/O delays due to poor load balancing. You will probably observe that the disk holding the operating system shows higher activity than other disks on the system. This is normal and to be expected, but perhaps some of the applications running on your system can be spread across physical disks that are separate from the operating system. Strategic placement of applications, data, and temporary work areas can significantly improve performance.
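One way to put the averaged records to work is to flag disks whose %tm_act stands out from the rest. The sketch below assumes each input line holds a disk name followed by the averaged %tm_act value; the function name busy_disks is hypothetical, and the threshold is a site-specific choice that should come from your own baseline rather than from any AIX default.

```shell
#!/bin/sh
# busy_disks THRESHOLD: read lines of the form "hdiskN %tm_act ..." on
# stdin and print the name of every disk whose %tm_act exceeds THRESHOLD.
# The threshold is chosen by the administrator, not by AIX.
busy_disks() {
    awk -v limit="$1" '$2 + 0 > limit + 0 { print $1 }'
}
```

For example, busy_disks 40 would list the disks that were active more than 40 percent of the time in the records it reads. Lines that carry a leading timestamp would need the field numbers shifted by one.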
One final note: in order for iostat to function under AIX, the operating system attribute to "continuously maintain DISK I/O history" must be set to true. You can check the setting on your system with the following command, which lists the effective value of the iostat attribute for logical device sys0:
lsattr -l sys0 -E -a iostat
If this attribute is set to false on your system, you can set it to true with the following command, which changes the device attribute iostat:
chdev -l sys0 -a iostat=true
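The check and the fix can be combined into a small function. This sketch assumes that lsattr -E prints the attribute name, its value, and its description on one line, so that the value is the second field. The function name ensure_iostat_history is hypothetical, and chdev must be run with root authority.

```shell
#!/bin/sh
# ensure_iostat_history: read the sys0 "iostat" attribute and set it to true
# if it is currently false. Assumes lsattr -E output such as
#   iostat false Continuously maintain DISK I/O history True
# where the second field is the effective value.
ensure_iostat_history() {
    eih_value=$(lsattr -l sys0 -E -a iostat | awk '{ print $2 }')
    if [ "$eih_value" = "false" ]; then
        chdev -l sys0 -a iostat=true      # requires root authority
    else
        echo "sys0 iostat attribute is $eih_value"
    fi
}
```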
The syntax for vmstat is the same as for iostat. Similarly, the first report from vmstat contains cumulative statistics since the last system reboot and should be ignored. vmstat reports on 17 statistics grouped under five categories. The categories and statistics are listed below, along with a description of each.
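Applying the same averaging logic to vmstat might look like the following sketch, in which the function name avg_vmstat is hypothetical. vmstat prints header lines before its data, so the sketch keeps only rows whose first field is a number, drops the first of those (the cumulative report), and averages every column of the rest. It assumes every data row has the same number of columns.

```shell
#!/bin/sh
# avg_vmstat: read the output of "vmstat INTERVAL COUNT" on stdin and print
# one line with the average of each column, ignoring the header lines and
# the first (since-boot) report.
avg_vmstat() {
    awk '$1 ~ /^[0-9]+$/' | tail -n +2 | awk '
        {
            n++; nf = NF
            for (i = 1; i <= NF; i++) sum[i] += $i
        }
        END {
            if (n == 0) exit 1
            for (i = 1; i <= nf; i++)
                printf "%s%.1f", (i > 1 ? " " : ""), sum[i] / n
            printf "\n"
        }'
}
```

For example, vmstat 2 4 | avg_vmstat would print a single line whose 17 values follow the column order r b avm fre re pi po fr sr cy in sy cs us sy id wa.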
AIX is a multitasking operating system that allows all processes to compete for use of the cpu. The scheduler determines when processes will run. Each process is assigned a priority and a slot in the process table. Processes must be in real memory to run. If a process is scheduled to run but a memory page for part of that process is not in real memory, that process is blocked and placed in the wait queue. Processes ready to run are placed in the runque. vmstat reports on processes that are in the runque and processes that are blocked.
r -- The number of runnable processes in the runque. This number should be a single-digit number on a healthy and stable system.
b -- The number of processes scheduled to be executed but blocked, waiting for the virtual memory manager to page the part of that process that is on disk into real memory. This number should also be a single digit on a healthy system.
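The single-digit rule of thumb for r and b is easy to check automatically. This hypothetical check_procs function assumes its input is a vmstat data line, raw or averaged, whose first two fields are r and b.

```shell
#!/bin/sh
# check_procs: read vmstat data lines (r b avm fre ...) on stdin and warn
# when the run queue (r) or the blocked count (b) reaches two digits.
check_procs() {
    awk '{
        if ($1 >= 10) print "warning: run queue r = " $1
        if ($2 >= 10) print "warning: blocked b = " $2
    }'
}
```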
Memory is controlled by the virtual memory manager. Virtual memory includes all of real memory as well as all the paging space; paging space allows the virtual memory manager to overbook real memory. Virtual memory addresses must be translated into real addresses by the virtual memory manager. Address translations take time to resolve, so the virtual memory manager caches frequently used translations in Translation Lookaside Buffers. A page fault occurs when the virtual memory manager attempts to access a memory page that is not in real memory. Real memory that is not used is placed in the free list. The virtual memory manager is responsible for maintaining the free list.
avm -- The number of active 4 KB disk blocks being used for page space or back store.
fre -- The number of available 4 KB real memory frames. This number should be high right after you reboot your system. As applications require memory, the virtual memory manager allocates real memory from the free list to those applications. The virtual memory manager tries to maintain the free list above the operating system parameter MINFREE. If the virtual memory manager needs to free memory, it will page out real memory frames.
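To make the MINFREE relationship concrete, the sketch below compares a fre value with a MINFREE value and converts the margin to kilobytes using the 4 KB frame size given above. The function name check_free is hypothetical, and the caller must supply MINFREE, since it is a tuning parameter whose value varies from system to system. Both arguments must be whole numbers of frames, as reported in a single vmstat row.

```shell
#!/bin/sh
# check_free FRE MINFREE: both arguments are integer counts of 4 KB frames.
# Reports how far the free list is above MINFREE, or warns when it is not.
check_free() {
    if [ "$1" -le "$2" ]; then
        echo "free list at or below MINFREE: the VMM may page out frames"
    else
        cf_margin=$(( $1 - $2 ))
        echo "free list OK: $cf_margin frames above MINFREE ($(( cf_margin * 4 )) KB)"
    fi
}
```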
Virtual memory address space is partitioned into segments of 256 MB of contiguous space. Segments are further partitioned into 4 KB pages. There are different types of segments. A persistent segment is used to permanently store pages that are part of files. Working segments use the paging space or back store for pages with no permanent storage space. Process stack and data regions, as well as shared library text regions, are kept in working segments and will be paged out to paging space.
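As a quick check on the sizes involved, assuming the 4 KB page size used for avm and fre above, each 256 MB segment holds 65,536 pages:

```latex
\frac{256\ \text{MB}}{4\ \text{KB}}
  = \frac{256 \times 1024\ \text{KB}}{4\ \text{KB}}
  = 65{,}536\ \text{pages per segment}
```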
re -- The number of currently unused frames reclaimed by the system after they were placed back on the free list. The number of frames accessed as a result of a read from disk is also reported under this column.
pi -- The number of page-ins from disk.
po -- The number of page-outs to disk.
fr -- The number of frames freed to replenish the free list.
sr -- The number of frames examined for page out. The virtual memory manager uses various criteria to select the frames that can be placed back on the free list. The idea is not to page out frames that may soon be needed again.
cy -- Real memory frames are referenced by the virtual memory manager through a Page Frame Table. This column indicates the number of cycles the virtual memory manager has made scanning the entire Page Frame Table in search of candidates to be placed back on the free list.
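Because page-ins and page-outs move whole pages, the pi and po columns can be turned into a paging volume in kilobytes, which is easier to compare with disk traffic figures. This sketch assumes 4 KB pages, the frame size used for avm and fre above; the function name paging_kb is hypothetical.

```shell
#!/bin/sh
# paging_kb PI PO: convert page-in and page-out counts to kilobytes,
# assuming each page is 4 KB. Accepts averaged (decimal) values.
paging_kb() {
    awk -v pi="$1" -v po="$2" 'BEGIN { printf "%.1f\n", (pi + po) * 4 }'
}
```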
A fault is defined as an interrupt. Interrupts can be either hardware or software interrupts. A disk interrupt is an example of a hardware interrupt. A system call is an example of a software interrupt: it is implemented with a software interrupt instruction that invokes the system call handler routine.
in -- The number of device or hardware interrupts. This number will never be less than 100 per second, due to the 10-millisecond clock interrupt.
sy -- The number of system calls. System calls allow user processes to exchange data with the kernel and use system resources such as disk I/O.
cs -- The number of context switches. Because AIX is a multitasking operating system, all processes appear to run simultaneously. In actuality, cpu time is given to each process in time slices. When a process has used up its time slice, it must relinquish the cpu to another process. The cpu must save the working environment of the current process and load in a new working environment for the next process to be executed. This is known as a context switch. AIX, in combination with the RS/6000 architecture, handles context switching.
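The floor of 100 for the in column follows from the clock rate: one clock interrupt every 10 milliseconds gives

```latex
\frac{1000\ \text{ms per second}}{10\ \text{ms per interrupt}}
  = 100\ \text{clock interrupts per second}
```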
A process that executes within its own code and does not require system or kernel resources is operating in user mode. While a process is executing system calls, it is operating in kernel, or system, mode.
us -- The percent of time the cpu is operating in user mode.
sy -- The percent of time the cpu is operating in kernel mode.
id -- The percent of time the cpu is idle with no processes available for execution and no pending disk I/O.
wa -- The percent of time the cpu is idle with no processes available for execution but with pending disk I/O.
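The four cpu columns together account for all of the cpu's time, so they should add up to about 100; because each is rounded, the total can be off by a point or two. The hypothetical cpu_summary function below assumes a 17-column vmstat data line in the order listed in this article, so that us, sy, id, and wa are fields 14 through 17. Note that the faults sy column (field 12) and the cpu sy column (field 15) share a name.

```shell
#!/bin/sh
# cpu_summary: read 17-column vmstat data lines
#   r b avm fre re pi po fr sr cy in sy cs us sy id wa
# on stdin and summarize the cpu columns (fields 14-17).
cpu_summary() {
    awk 'NF >= 17 {
        us = $14; sy = $15; id = $16; wa = $17; total = us + sy + id + wa
        printf "busy %.1f%% (user %.1f, system %.1f), idle %.1f%%, io wait %.1f%%\n", us + sy, us, sy, id, wa
        if (total < 98 || total > 102) print "note: cpu columns do not sum to about 100"
    }'
}
```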
I want to conclude with the advice that it is always easier to diagnose problems if you have a profile of normal system activity. Knowing how your system performs before trouble occurs can help in the determination of performance problems.
About the Author
Bill Genosa is a systems administrator for American Express, where he has responsibility for RS/6000 workstations and servers.
He can be reached at 186 Bryant Avenue, Floral Park,
or via email as firstname.lastname@example.org.