The System Performance Monitor
William Genosa
Cars equipped with gauges often provide some indication of
problems before they become serious. Cars equipped with idiot
lights often alert the driver only after the problem has become
severe. The System Performance Monitor is designed as a set of
gauges for a computer. The program provides a visual indication
of system activity, as well as a means of alerting the system
administrator before potential problems cause poor performance
and unscheduled downtime.
The program was written on a 3B2 1000-80 running AT&T System V
Release 3.2.2. The machine was used for database applications,
and as the number of concurrent users steadily increased, its
performance degraded. The purpose of the program is to notify
the system administrator when system tunable parameters may
need to be adjusted.
Program Overview
The program is designed around the sar utility supplied
with
the operating system. sar collects performance statistics
at
time intervals determined by a cron entry for sys. The
System Performance Monitor can also be started by a cron entry
and should be scheduled to run at the same frequency as sar.
The program redirects output to /dev/tty22 where the
information
is displayed on a terminal. Warnings are sent to the
system console
when vital thresholds are exceeded. The thresholds are
determined
by examination of the current system tunable parameters.
AT&T provides
the command /etc/sysdef to display the current settings.
How It Works
The program starts by finding the number of users currently
logged into the system. It counts TCP/IP rlogin and telnet
sessions by looking for pseudo ttys, which are designated by a
letter following the tty description. This example is configured
for 64 pseudo ttys, the maximum allowed on this machine. The
program next examines the runqueue to determine the number of
processes in memory awaiting execution (jobs scheduled for I/O
are not part of this figure). If the runqueue is high (above
five), expect it to be occupied for a high percentage of the
time. If it is not, the system may be I/O bound or swapping.
In that case, check the read and write cache hits and the
available free memory.
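
The session count described at the start of this section can be
sketched as follows. This is only an illustration, not the
author's program: it assumes who-style output in which the second
field is the line name, and it assumes pseudo ttys are named with
a letter directly after "tty" (for example ttyp0) while hard-wired
lines use digits (for example tty22). The function name
count_sessions and the sample data are hypothetical.

```python
import re

# Assumption: a pseudo tty has a letter right after "tty" (ttyp0, ttyq3),
# while hard-wired lines use digits (tty22). Adjust for your system.
PSEUDO_TTY = re.compile(r"^tty[a-z]")


def count_sessions(who_output):
    """Return (total_users, network_users) from who-style text."""
    total = 0
    network = 0
    for line in who_output.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        total += 1
        if PSEUDO_TTY.match(fields[1]):
            network += 1  # rlogin or telnet session on a pseudo tty
    return total, network


# Made-up sample data for illustration only.
sample = """\
root     console   Mar  3 08:01
jsmith   tty22     Mar  3 08:15
mjones   ttyp0     Mar  3 09:02
kbrown   ttyp1     Mar  3 09:40
"""

if __name__ == "__main__":
    total, network = count_sessions(sample)
    print("users: %d, network sessions: %d" % (total, network))
```

On a live system the text would come from the who command rather
than from a fixed string.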
The program goes on to check the process and file table
entries. Each
running process occupies a slot in the process table.
The size of
the process table is a tunable parameter. The program
compares the
number of entries to a threshold so as to warn the administrator
before
a process table overflow occurs. Similarly, each open
file occupies
a slot in the file table. The file table size is also
a tunable parameter.
The number of file table entries is also compared to
a threshold so
that an alarm can be sounded before a file table overflow
occurs.
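
A minimal sketch of the table checks described above. It assumes
the table figures arrive as "used/size" strings, which is how
sar -v usually reports proc-sz and file-sz; verify this against
your own sar output. The 90 percent limit and the function names
are assumptions made for illustration, not values taken from the
author's program.

```python
def table_usage(entry):
    """Parse a 'used/size' field such as '45/200' into two integers."""
    used, size = entry.split("/")
    return int(used), int(size)


def table_warning(name, entry, limit_pct=90):
    """Return a warning string once usage reaches limit_pct, else None.

    limit_pct is a hypothetical threshold; pick one that leaves enough
    headroom before the table overflows on your system.
    """
    used, size = table_usage(entry)
    pct = used * 100 // size  # integer percentage of slots in use
    if pct >= limit_pct:
        return "WARNING: %s table at %d%% (%d of %d slots)" % (
            name, pct, used, size)
    return None


if __name__ == "__main__":
    # Made-up figures for illustration only.
    for name, entry in (("process", "190/200"), ("file", "100/300")):
        message = table_warning(name, entry)
        if message:
            print(message)
```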
When a system runs low on memory, it swaps processes
from memory to
disk. The Performance Monitor checks available memory
and triggers
an alarm on the basis of a threshold set to the tunable
parameter
GPGSLO, which determines when the system will start
swapping
to free memory. The program also checks the number of
pages per second
being swapped from memory to disk, as well as the number
of processes
switching per second and the number of address translation
page faults
(an address translation page fault occurs when the system
attempts
to access a valid page of memory which has been swapped
to disk).
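
The memory check can be sketched as a comparison of free pages
against GPGSLO. This is an illustration under assumptions: free
memory is taken to be a page count (as sar -r typically reports
it), GPGSLO is taken from the current tunable settings, and the
optional early-warning margin is a hypothetical addition that is
not described in the article.

```python
def memory_status(freemem_pages, gpgslo, margin_pages=0):
    """Classify free memory relative to the GPGSLO tunable.

    freemem_pages -- free memory in pages (assumed unit)
    gpgslo        -- value of the GPGSLO tunable, in pages
    margin_pages  -- hypothetical early-warning band above GPGSLO
    """
    if freemem_pages <= gpgslo:
        return "ALARM"  # at or below the point where swapping starts
    if freemem_pages <= gpgslo + margin_pages:
        return "WARN"   # close to the threshold
    return "OK"


if __name__ == "__main__":
    # Made-up figures for illustration only.
    print(memory_status(freemem_pages=300, gpgslo=250, margin_pages=100))
```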
Disk access is slow and can hurt performance, but can
be reduced when
transfers are completed from cache buffers. A tunable
parameter called
BDFLUSHR sets the frequency at which buffers are flushed
to
disk. The program checks successful cache hits on both
read and write
operations. Changing BDFLUSHR is not recommended, but
if your
cache hits are poor and there's enough available free
memory, you
may be able to improve performance by increasing the
disk buffers.
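
For reference, a cache hit percentage is the share of logical
requests that did not need a physical transfer. The sketch below
computes it from two counters; the names are hypothetical, and
the floor passed to cache_is_poor is an assumption supplied by
the caller, since the article does not state the thresholds the
program uses.

```python
def cache_hit_percent(logical, physical):
    """Percentage of logical requests satisfied from the buffer cache.

    logical  -- logical reads (or writes) in the interval
    physical -- physical transfers to or from disk in the same interval
    Returns None when there was no activity to measure.
    """
    if logical <= 0:
        return None
    hits = max(logical - physical, 0)
    return hits * 100.0 / logical


def cache_is_poor(logical, physical, floor_pct):
    """True when the hit percentage falls below a caller-chosen floor."""
    pct = cache_hit_percent(logical, physical)
    return pct is not None and pct < floor_pct


if __name__ == "__main__":
    # Made-up figures for illustration only.
    print(cache_hit_percent(200, 10))
    print(cache_is_poor(200, 120, floor_pct=90))
```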
Terminal and network activity are the last areas the
program monitors.
A high number of received raw characters can indicate
a bad modem
or terminal. A high number of characters being output
may be caused
by reports printing on high-speed printers. The netstat
command
is used to check the health of the network. The number
of collisions
multiplied by 100 is then divided by the number of packets
transmitted
to provide a collision rate as a percentage. A rate above 10
percent is high and may degrade network performance. Possible
causes include faulty hardware, misconfigured hardware, and
poorly scheduled network-intensive tasks.
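
The collision rate arithmetic described above, written out as a
short sketch. The function names are hypothetical; the two
counters are assumed to come from the collision and
transmitted-packet columns of netstat output for one interface.

```python
def collision_rate(collisions, packets_out):
    """Collisions as a percentage of packets transmitted."""
    if packets_out <= 0:
        return 0.0  # nothing transmitted, so no meaningful rate
    return collisions * 100.0 / packets_out


def collisions_high(collisions, packets_out, limit_pct=10.0):
    """True when the rate exceeds the 10 percent figure from the text."""
    return collision_rate(collisions, packets_out) > limit_pct


if __name__ == "__main__":
    # Made-up figures for illustration only.
    rate = collision_rate(150, 1000)
    print("collision rate: %.1f%%" % rate)
    print("high" if collisions_high(150, 1000) else "normal")
```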
Tuning Parameters: How-to and Cautions
The tunable parameters for this machine are kept in
object files under
the directory /etc/master.d. After a parameter is changed
in
one of the object files, the mkboot command must be
executed
to make the object file bootable. Bootable objects on
this machine
are stored in the /boot directory. When making the KERNEL
bootable, always use the -k option with the mkboot command.
Use the command touch /etc/system to force the machine
to generate
a new /unix. Since there is always a chance that your
system
may not boot after you've made changes, be sure to copy
your current
/unix to /Ounix before you make the changes.
Always consult your documentation before making changes
to bootable
parameters. An excellent third-party source on this
topic is the book
System Performance Tuning published by O'Reilly &
Associates.
Be aware that many tunable parameters have dependencies
on other tunable
parameters. For example, if you increase the size of
the process table,
you must also increase the number of active regions
because each process
will have three active regions -- text, data, and stack.
The number
of active regions should therefore be three times the
size of the
process table. Always respect the boundaries of a tunable
parameter. For example, increasing the number of disk buffers
from 64 to 128 is preferable to increasing it from 64 to 135.
Good luck and happy tuning.
About the Author
William Genosa is the Chief System Administrator for a leading
systems integrator. He may be reached at 186 Bryant Avenue,
Floral Park, NY 11001.