Monitoring Performance with iostat and vmstat
William Genosa
As a system administrator, I use a variety of tools,
including sar, ps, trace, iostat, and vmstat, to
identify problems related to system performance. In
learning how to
use iostat and vmstat, I saw that both could be much
more useful if they could archive reports in the same
fashion as sar.
I could then use iostat and vmstat to monitor applications
that run for several hours to see where I might be able
to improve
performance.
sar produces reports based on data collected from the
sadc
command, which must be set up to run in cron at the
desired
interval (I run it every 15 minutes on my systems).
The data files
are stored in the directory /var/adm/sa. The name of
each
data file includes the day of the month, such that sa15
would
be the data file for the 15th of the month. With this
scheme in mind,
I have written a shell program to collect statistics
from iostat
and vmstat. This article explains how the program works
and
provides an example of how the output files look. Also
included is
a discussion of the iostat and vmstat commands.
The name of the program is sysstat (Listing 1), and
there
is a cron entry to run this script every fifteen minutes.
I begin my program by defining where the output files
will be stored
in the filesystem. (Note that it is always a good idea
to avoid hard-coding
file names in your code: instead, assign filenames to
variables so
that your code can be easily modified.) I chose to place
the output
files in the directory /var/adm/stat. The TIME variable
is used to create a timestamp of each record that is
appended to the
output files. It is also used to check for midnight,
which is the
first time cron will run sysstat. If the output files
do not exist, they are created. If they do exist, the
data in the
output files will be from a prior month. That data must
be truncated
before the program appends the data for the current
day.
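In outline, that setup looks like the following. The fragment is simplified (the complete script appears in Listing 1), and the crontab path and the file-naming details shown here are examples rather than an exact copy of my code:

# crontab entry: run the collector on the quarter hour (path is an example)
0,15,30,45 * * * * /usr/local/bin/sysstat

#!/bin/ksh
STATDIR=/var/adm/stat                 # output directory kept in a variable
DAY=`date +%d`                        # day of month, as in sar's sa15 scheme
IOFILE=$STATDIR/iostat.$DAY
VMFILE=$STATDIR/vmstat.$DAY
TIME=`date +%H:%M`                    # timestamp written with each record

if [ "$TIME" = "00:00" ]              # midnight run: create the files, or
then                                  # truncate data left from a prior month
        > $IOFILE
        > $VMFILE
else
        touch $IOFILE $VMFILE         # otherwise just make sure they exist
fi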
The program then uses a for loop to collect statistics
on
every drive on the system. The lspv command (list physical
volume) builds the list, and awk extracts the first
field,
which is the name of the physical volume. The iostat
command
is run for four iterations, sampling every two seconds.
The sampling
interval was kept short because there are ten drives on
this system. If
your machine has fewer disks, you may want to increase
the sampling
interval to five seconds. The output of iostat is piped
into
grep to extract only the disk information. There will
be four
lines of information for each disk, one for each iteration
of iostat.
The first line is discarded by the tail command (see
the description
of iostat below). The remaining three lines are piped
into
awk, which stores each field in an array. Each array
holds three values, which are summed and then divided by three to
get an average. awk outputs the average, using the printf
statement to preserve the correct number of spaces between
fields
and to ensure the correct number of digits to the right
of the decimal.
The TIME variable is output for the first line in each
stanza
(hdisk0). Note that the same logic applies to the collection
of vmstat statistics. Figure 1 is a sample output report
for
iostat; Figure 2 shows a sample for vmstat.
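The core of the collection loop can be sketched as shown below. For brevity, this version writes the timestamp on every output line rather than only on the first line of each stanza, and the field handling is simplified; Listing 1 contains the working version:

for PV in `lspv | awk '{ print $1 }'`         # hdisk0, hdisk1, ...
do
        iostat 2 4 | grep "^$PV " | tail -3 |
        awk -v stamp="$TIME" -v disk="$PV" '
                { for (i = 2; i <= NF; i++) sum[i] += $i }
                END {
                        printf "%-6s %-8s", stamp, disk
                        for (i = 2; i <= NF; i++)
                                printf " %8.1f", sum[i] / NR
                        printf "\n"
                }' >> $IOFILE
done

The tail -3 discards the first, cumulative line for each drive, so NR is three when the END block runs and each field average is simply its sum divided by three.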
iostat
The syntax of iostat is:
iostat [sampling interval [number of iterations]]
Thus the command iostat 3 4 samples and produces
output every three seconds for four iterations. The
first report always
provides cumulative statistics since the last system
reboot and should
be ignored, as it does not accurately represent the
current system
activity. The remaining reports provide statistics gathered
between
the sampling intervals and will provide a more accurate
snapshot of
how the system is managing its resources. The iostat
command
produces output in a two-part report. The first part
reports on cpu
activity and the second, on disk I/O activity. My program
uses only
the disk I/O statistics because the vmstat command produces
a more comprehensive report of cpu statistics. The following
is a
description of each field output by the disk I/O section
of an iostat
report:
%tm_act -- The percent of time the disk
was active or the bandwidth utilization of the drive.
Kbps -- The amount of data read and written
to the drive, in Kb per second.
tps -- Transfers/sec (i/o requests) made
to the disk. A single i/o request can be made up of
several logical
requests.
msps -- Average milliseconds per seek.
Most disks do not report this data.
Kb_read -- The number of kilobytes read
from the drive.
Kb_wrtn -- The number of kilobytes written
to the drive.
These statistics can be used to identify disk I/O delays
due to poor
load balancing. You will probably observe that the disk
which contains
the operating system will show higher activity than
other disks on
the system. This is normal and to be expected, but perhaps
applications
running on your system can be spread across physical
volumes that
are separate from the operating system. Strategic placement
of executables,
data, and temporary work areas can significantly improve
system performance.
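A quick way to spot such an imbalance from a live sample is to pull out the second, non-cumulative report and flag the busy drives. The 70 percent threshold and the NDISKS variable below are arbitrary choices for illustration:

NDISKS=10                             # set to the number of drives on the system
iostat 30 2 | grep hdisk | tail -$NDISKS |
awk '$2 > 70 { print $1 " is " $2 "% busy" }'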
One final note: in order for iostat to function under
AIX, the operating system attribute that tells the system
to "continuously maintain disk i/o history" must be set
to true. You can check your system with the following
command, which lists the effective value of the iostat
attribute for the logical device sys0:
lsattr -l'sys0' -E -a 'iostat'
If this attribute is set to false on your system, you
can set it to true with the following command, which
changes the iostat attribute of the logical device:
chdev -l 'sys0' -a'iostat=true'
vmstat
The syntax for vmstat is the same as for iostat. Also
similarly, the first report from vmstat contains cumulative
statistics from the last system reboot and should be
ignored. vmstat
reports on 17 statistics grouped under five categories.
The categories
and statistics are listed below, along with a description.
Processes
AIX is a multitasking operating system which allows all
processes to compete for use of the cpu.
The scheduler
determines when processes will run. Each process is
assigned a priority
and a slot in the process table. Processes must be in
real memory
to run. If a process is scheduled to run but a memory
page for part
of that process is not in real memory, that process
is blocked and
placed in the wait queue. Processes ready to run are
placed in the
runque. vmstat reports on processes which are in the
runque
and processes that are blocked.
r -- The number of runnable processes in
the runque. This number should be a single-digit number
on a healthy
and stable system.
b -- The number of processes scheduled
to be executed but blocked, waiting for the virtual
memory manager
to page the part of that process which is on disk into
real memory.
This number should also be a single digit on a healthy
and stable
system.
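Both counts are easy to read from a live sample; the test below simply applies the single-digit rule of thumb just described:

# r and b are the first two columns; tail -1 skips the cumulative report
vmstat 2 3 | tail -1 |
awk '{ print "runnable:", $1, " blocked:", $2
       if ($1 > 9 || $2 > 9) print "warning: queues are not single digits" }'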
Memory
Memory is controlled by the virtual memory manager.
Virtual memory
includes all of real memory as well as all the paging
space. Disk
paging space allows the virtual memory manager to overbook
real memory.
Virtual memory addresses must be translated into real
memory addresses
by the virtual memory manager. Address translations
take time to resolve,
so the virtual memory manager caches frequently used
memory addresses
in Translation Lookaside Buffers. A page fault occurs
when the virtual
memory manager attempts to access a memory page that
is not in real
memory. Real memory that is not used is placed in the
free list. The
virtual memory manager is responsible for maintaining
the free list.
avm -- The number of active 4Kb disk blocks
being used for page space or back store.
fre -- The number of available 4Kb real
memory frames. This number should be high right after
you reboot your
system. As applications require memory, the virtual
memory manager
will allocate real memory from the free list to those
applications.
The virtual memory manager will try to maintain the
free list above
the operating system parameter MINFREE. If the virtual
memory manager
needs to free memory, it will page out real memory frames
to disk
back store.
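Both columns can be read directly from a sample; converting free frames to megabytes makes the number easier to judge:

# avm and fre are columns 3 and 4; tail -1 skips the cumulative report
vmstat 2 3 | tail -1 |
awk '{ printf "avm: %d pages   free list: %d frames (%.1f Mb)\n",
              $3, $4, $4 * 4 / 1024 }'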
Paging
Virtual memory address space is partitioned into segments
of 256 Mb
of contiguous space. Segments are further partitioned
into pages.
There are different types of segments. A persistent
segment is used
to permanently store pages that are part of files and
executables.
Working segments use the paging space or back store
for transitory
pages with no permanent storage space. Process stack
and data regions,
as well as shared library text regions, will be paged
out to working
segments.
re -- The number of currently unused frames
reclaimed by the system after they were placed back
on the free list.
The number of frames accessed as a result of a read
ahead pre-fetch
from disk is also reported under this column.
pi -- The number of page-ins from disk.
po -- The number of page-outs to disk.
fr -- The number of frames freed to replenish
the free list.
sr -- The number of frames examined for
page out. The virtual memory manager uses various criteria
when selecting
the frames which can be placed back on the free list.
The idea is
not to page out frames that may soon be needed again.
cy -- Real memory frames are referenced
by the virtual memory manager through a Page Frame Table.
This statistic
indicates the number of cycles the virtual memory manager
made while
scanning the entire Page Frame Table in search of candidates
to be
placed back on the free list.
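A sustained non-zero po column is often the first sign of memory pressure. A small check like the one below prints any interval in a one-minute sample that shows page-out activity (the sample length is arbitrary):

# pi, po, fr, and sr are columns 6 through 9; tail -6 skips the cumulative report
vmstat 10 7 | tail -6 |
awk '$7 > 0 { print "paging out " $7 " pages; " $9 " frames scanned" }'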
Faults
A fault is defined as an interrupt. Interrupts can be
either hardware or software. A disk interrupt would be an
example of a
hardware interrupt. A system call is an example of a
software interrupt
implemented with a software interrupt instruction that
branches to
the system call handler routine.
in -- The number of device or hardware
interrupts. This number will never be less than 100
due to the 10-millisecond
system clock.
sy -- The number of system calls. System
calls allow user processes to exchange data with the
kernel and use
system resources such as disk I/O.
cs -- The number of context switches. Because
AIX is a multitasking operating system, all processes
appear to run
simultaneously. In actuality, cpu time is given to each
process in
time slices. When a process has used up its time slice,
it must relinquish
the cpu to another process. The cpu must save the working
environment
of the current process and load in a new working environment
for the
next process to be executed. This is known as a context
switch. AIX,
in combination with the RS/6000 architecture, handles
context switches
very efficiently.
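These three counters occupy columns 11 through 13 of the report and can be watched the same way:

# interrupts, system calls, and context switches from the last sample
vmstat 2 3 | tail -1 |
awk '{ print "in:", $11, "  sy:", $12, "  cs:", $13 }'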
CPU
A process that executes within its own code and does
not require system or kernel resources is operating in user mode.
While a process
is executing system calls, it is operating in kernel
or system mode.
us -- The percent of time the cpu is operating
in user mode.
sy -- The percent of time the cpu is operating
in kernel mode.
id -- The percent of time the cpu is idle
with no processes available for execution and no pending
i/o.
wa -- The percent of time the cpu is idle
with no processes available for execution but with pending
i/o requests.
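The four percentages occupy the last four columns of the report. A rough screen such as the following (the 90 and 40 percent thresholds are arbitrary) helps separate cpu-bound intervals from i/o-bound ones:

# us, sy, id, and wa are columns 14 through 17; tail -1 skips the cumulative report
vmstat 2 3 | tail -1 |
awk '{ printf "us %d  sy %d  id %d  wa %d\n", $14, $15, $16, $17
       if ($14 + $15 > 90) print "   cpu-bound interval"
       if ($17 > 40)       print "   heavy i/o wait" }'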
Conclusion
I want to conclude with the advice that it is always
easier to diagnose
problems if you have a profile of normal system activity.
Understanding
how your system performs before trouble occurs can help
immeasurably
in the determination of performance problems.
Bibliography
IBM. Performance Monitoring and Tuning Guide.
IBM Publication SC23-2365-01.
Loukides, Mike. System Performance Tuning.
Sebastopol, CA: O'Reilly & Associates, 1990.
Heise, Russell. "The vmstat Tool," IBM AIXtra:
September/October 1993.
About the Author
Bill Genosa is a systems administrator for American
Express, where he
has responsibility for RS/6000 workstations and servers.
He can be reached at 186 Bryant Avenue, Floral Park,
NY 11001,
or via email as wgenosa@attmail.com.