Article

The System Performance Monitor

William Genosa

Cars equipped with gauges often provide some indication of problems before they become serious. Cars equipped with idiot lights often alert the driver only after the problem has become severe. The System Performance Monitor is designed as a set of gauges for a computer. The program provides a visual indication of system activity as well as a means of alerting the system administrator before potential problems cause poor performance and unscheduled downtime.

The program was written on a 3B2 1000-80 that ran AT&T System V Rel 3.2.2 and was used for database applications. As the number of concurrent users steadily increased, the performance of the machine degraded. The purpose of the program is to notify the system administrator when system tunable parameters may need to be adjusted.

Program Overview

The program is designed around the sar utility supplied with the operating system. sar collects performance statistics at time intervals determined by a cron entry for sys. The System Performance Monitor can also be started by a cron entry and should be adjusted to run at the same frequency as sar. The program redirects output to /dev/tty22 where the information is displayed on a terminal. Warnings are sent to the system console when vital thresholds are exceeded. The thresholds are determined by examination of the current system tunable parameters. AT&T provides the command /etc/sysdef to display the current settings.

How It Works

The program starts by finding the number of users currently logged into the system. It counts TCP/IP rlogin and telnet sessions by looking for pseudo ttys, which are designated by a letter following the tty description. This example is configured for 64 pseudo ttys, the maximum allowed on this machine. The program next examines the run queue to determine the number of processes in memory awaiting execution (jobs scheduled for I/O are not part of this figure). If the run queue is long (above five), expect it to be occupied for a high percentage of the time. If it is not occupied that often, the system may be I/O bound or swapping; check the read and write cache hits and the available free memory.
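The user count described above can be sketched in a few lines of Python. This is only an illustration, not the author's program: it assumes who-style output in which the second field is the terminal name, and it assumes pseudo ttys look like "ttyp0" (a letter directly after "tty") while hard-wired lines look like "tty22". The function name and the sample login lines are made up for the example.

```python
import re

# Assumption: a pseudo tty has a letter right after "tty" (e.g. "ttyp0"),
# while a hard-wired line has only digits (e.g. "tty22").
PSEUDO_TTY = re.compile(r"^tty[a-z]")


def count_users(who_output):
    """Return (total_users, pseudo_tty_users) from who-style text."""
    total = 0
    pseudo = 0
    for line in who_output.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        total += 1
        if PSEUDO_TTY.match(fields[1]):
            pseudo += 1  # network login (rlogin or telnet session)
    return total, pseudo


# Made-up sample data in the usual who layout: name, line, login time.
sample = """\
root     console   Mar  3 08:01
user1    tty22     Mar  3 08:15
user2    ttyp0     Mar  3 09:02
user3    ttyp1     Mar  3 09:40
"""

total, pseudo = count_users(sample)
print("users logged in: %d (network sessions: %d)" % (total, pseudo))
```

On a live system the text would come from the who command rather than from a string literal.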

The program goes on to check the process and file table entries. Each running process occupies a slot in the process table. The size of the process table is a tunable parameter. The program compares the number of entries to a threshold so as to warn the administrator before a process table overflow occurs. Similarly, each open file occupies a slot in the file table. The file table size is also a tunable parameter. The number of file table entries is also compared to a threshold so that an alarm can be sounded before a file table overflow occurs.
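The table checks above amount to comparing entries in use against the table size. The following Python sketch assumes the figures arrive as a "used/size" string such as "185/200", which is how sar's table-status report is commonly laid out; treat that format, the function names, and the 90 percent default threshold as assumptions for illustration rather than values taken from the program.

```python
def table_usage(entry):
    """Parse a 'used/size' field such as '185/200' into (used, size)."""
    used, size = entry.split("/")
    return int(used), int(size)


def table_warning(name, entry, warn_percent=90):
    """Return a warning string when a table is at or above warn_percent
    of its configured size, otherwise None.

    warn_percent is a hypothetical default; choose it from the tunable
    values that /etc/sysdef reports for your own system.
    """
    used, size = table_usage(entry)
    # Integer arithmetic avoids floating-point surprises at the boundary.
    if used * 100 >= warn_percent * size:
        return "WARNING: %s table at %d of %d entries" % (name, used, size)
    return None


print(table_warning("process", "185/200"))
print(table_warning("file", "120/300"))
```

The first call returns a warning because 185 of 200 slots is above 90 percent; the second returns None because the file table is well under the threshold.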

When a system runs low on memory, it swaps processes from memory to disk. The Performance Monitor checks available memory and triggers an alarm on the basis of a threshold set to the tunable parameter GPGSLO, which determines when the system will start swapping to free memory. The program also checks the number of pages per second being swapped from memory to disk, as well as the number of processes switching per second and the number of address translation page faults (an address translation page fault occurs when the system attempts to access a valid page of memory which has been swapped to disk).

Disk access is slow and can hurt performance, but it can be reduced when transfers are completed from cache buffers. A tunable parameter called BDFLUSHR sets the frequency at which buffers are flushed to disk. The program checks successful cache hits on both read and write operations. Changing BDFLUSHR is not recommended, but if your cache hits are poor and there is enough available free memory, you may be able to improve performance by increasing the number of disk buffers.
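A cache hit rate is the share of logical requests that were satisfied without a physical disk transfer. The Python sketch below shows that arithmetic; the function name and the sample figures are illustrative assumptions, and the counts would normally be taken from sar's buffer activity report.

```python
def cache_hit_percent(physical, logical):
    """Percentage of logical requests satisfied from cache buffers.

    physical: transfers that actually went to or from disk
    logical:  requests made against the buffer cache
    Returns None when there was no activity to measure.
    """
    if logical <= 0:
        return None
    return 100.0 * (logical - physical) / logical


# Made-up sample figures: 200 logical reads, 10 of which hit the disk.
print(cache_hit_percent(10, 200))
```

The same function serves for reads and writes; only the pair of counters passed in changes.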

Terminal and network activity are the last areas the program monitors. A high number of received raw characters can indicate a bad modem or terminal. A high number of characters being output may be caused by reports printing on high-speed printers. The netstat command is used to check the health of the network. The collision rate is the number of collisions multiplied by 100 and divided by the number of packets transmitted. A rate above 10 percent is high and may degrade network performance. Possible causes include faulty hardware, misconfigured hardware, and poorly scheduled network-intensive tasks.
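The collision-rate calculation described above is simple enough to show directly. In this Python sketch the function names and the sample counts are assumptions made for illustration; the 10 percent limit is the figure given in the text, and the two counters would come from the interface statistics that netstat reports.

```python
def collision_rate(collisions, packets_out):
    """Collisions as a percentage of packets transmitted."""
    if packets_out <= 0:
        return 0.0  # nothing transmitted, so nothing to report
    return collisions * 100.0 / packets_out


def collision_rate_is_high(collisions, packets_out, limit_percent=10.0):
    """True when the collision rate exceeds the limit (10 percent by default)."""
    return collision_rate(collisions, packets_out) > limit_percent


# Made-up sample counts: 150 collisions in 1000 transmitted packets.
print(collision_rate(150, 1000), collision_rate_is_high(150, 1000))
```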

Tuning Parameters: How-to and Cautions

The tunable parameters for this machine are kept in object files under the directory /etc/master.d. After a parameter is changed in one of the object files, the mkboot command must be executed to make the object file bootable. Bootable objects on this machine are stored in the /boot directory. When making the KERNEL bootable, always use the -k option with the mkboot command. Use the command touch /etc/system to force the machine to generate a new /unix. Since there is always a chance that your system may not boot after you've made changes, be sure to copy your current /unix to /Ounix before you make them.

Always consult your documentation before making changes to bootable parameters. An excellent third-party source on this topic is the book System Performance Tuning published by O'Reilly & Associates. Be aware that many tunable parameters have dependencies on other tunable parameters. For example, if you increase the size of the process table, you must also increase the number of active regions because each process will have three active regions -- text, data, and stack. The number of active regions should therefore be three times the size of the process table. Always obey the boundaries of the tunable parameter. For example, increasing the number of disk buffers from 64 to 128 is preferable to changing it from 64 to 135. Good luck and happy tuning.
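As a worked example of the dependency between the process table and the region table, the Python sketch below applies the three-regions-per-process rule from the paragraph above. The function names and the sample sizes are illustrative assumptions, not values from any particular system.

```python
REGIONS_PER_PROCESS = 3  # text, data, and stack


def required_regions(process_table_size):
    """Minimum number of active regions for a given process table size."""
    return REGIONS_PER_PROCESS * process_table_size


def check_region_table(process_table_size, region_table_size):
    """Return (is_large_enough, regions_needed) for the two table sizes."""
    needed = required_regions(process_table_size)
    return region_table_size >= needed, needed


# Made-up example: a process table raised to 300 entries while the
# region table is still sized at 600.
ok, needed = check_region_table(300, 600)
print("region table large enough:", ok, "- regions needed:", needed)
```

Here the check fails because 300 processes call for 900 regions, so the region table would have to be raised along with the process table.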

About the Author

William Genosa is the Chief System Administrator for a leading systems integrator. He may be reached at 186 Bryant Avenue, Floral Park, NY 11001.