A System Load Monitoring Trilogy
If you've been following my articles in the past two issues of Sys Admin, you've probably noticed that one of my big concerns as system administrator here at R&D Publications has been to seek out new and useful ways to smooth out the CPU load on our system.

The overnight and background job spooling utilities allow our users a great degree of direct control over their use of system resources. From time to time, the users must make decisions such as whether to launch a long series of reports in the background or to run them overnight, instead. Most of our users, however, are not technical enough to comfortably use the standard UNIX utilities to get a handle on the system load. Without a tool to translate the load figures spewed by programs such as uptime into plain English, those users would lack the information on which to base job scheduling decisions.
To address this problem, and to assist me in gauging the effects of various efficiency-related system policies and tools, I have developed the set of shell scripts described in this article. The first script, load, provides a single number and an English-language description of the current system load for nontechnical users. The second script, a, generates some useful instantaneous statistics for the administrator's perusal, including the system load, the total number of system jobs, and the average number of jobs per user. The final script, sysload.sh, is a long-term system load tracking facility with automatic averaging. All information processed by these scripts is generated using the standard UNIX utilities ps, who, and uptime.
load: Characterizing the Current System Load
The system command uptime (actually a link to the w command, equivalent to w -t) displays a line of system status information containing the elapsed time since system boot-up, the number of users, and the system CPU load (as the number of jobs in the run queue) averaged over the last 1, 5, and 15 minutes. The load script (Listing 1) runs uptime and pipes the output to an awk script to extract the first of the three average load values and display a status report based on that value.
Line 11 extracts the load value based on the number of tokens detected in the uptime output text. The precise format of the line printed by uptime actually varies with the length of time the system has been up. Therefore, the awk script sets the val variable to the value of the third-to-last token, and then strips the trailing comma.
The rest of the script simply displays some text based on the value of val. The text tells a user what impact a CPU-intensive job is likely to have on system performance at the current load level. The user is then in a better position to weigh the potential impact of his or her job against the criticality of that job, and decide whether or not to run the job in the background.
A sample output of the load script is shown in Figure 1. If your computer system's horsepower differs significantly from ours (a 486-33 ISA machine), then you may want to alter the load values in the script's comparison lines to better reflect the capabilities of your particular machine.
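Since Listing 1 itself isn't reproduced here, the core of a script like load can be sketched as follows. The threshold values (1, 2, 4) and the message wording are my own illustrative assumptions, not the published listing:

```shell
#!/bin/sh
# load (sketch) -- describe the current 1-minute load average in
# plain English.  Thresholds and messages are assumptions; tune them
# to your machine's horsepower, as discussed above.

# The 1-minute average is the third-to-last token of the uptime line
# (the token count varies with how long the system has been up), and
# it carries a trailing comma that must be stripped.
val=`uptime | awk '{ v = $(NF-2); sub(",", "", v); print v }'`

# The Bourne shell has no floating-point math, so compare in awk.
msg=`echo "$val" | awk '{
    if      ($1 < 1) print "Light load: a big job should run fine right now."
    else if ($1 < 2) print "Moderate load: a big job will slow things down somewhat."
    else if ($1 < 4) print "Heavy load: consider spooling big jobs overnight."
    else             print "Very heavy load: please hold off on big jobs."
}'`

echo "Current system load: $val"
echo "$msg"
```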
a: Displaying System and User Process Statistics
One very powerful window into the system process table is the ps command. I wrote the a shell script (Listing 2) to analyze the data provided by ps and display a summary containing statistics otherwise difficult to glean from the raw output.

When extracting data about user patterns and trends from the system process table, it is useful to first separate the "signal" from the "noise." Therefore, a breaks the list of all system processes down into three categories: root processes, printer processes, and user processes. Root processes (getty, other daemons, etc.) and printer processes (the master print scheduler and the intermittent printer request handlers) are not large contributors to the system load, and are therefore segregated from user applications when collecting user process data.
The a script recognizes one further dichotomy: shell interpreter processes are distinguished from other kinds of user processes. Shell processes tend to be dormant while their subprocesses run. This is certainly not always the case, so I've included code to summarize the user process statistics both with and without shell interpreter instances taken into consideration.
The output from a sample a run is shown in Figure 2. Most of the work is performed in lines 18-34. There is some tricky coding there, so I'll annotate what I've done.
In line 18, the innermost in-line statement

ps -u root

generates a list of all processes owned by root. This list is piped to wc -l to produce a single number representing a count of the number of lines in the ps output. Finally, this number is decremented by 1 (using the expr command) to compensate for the header line produced by ps, and the result is assigned to an environment variable. The next line repeats the same procedure to count lp processes, and then the sum of the root and lp process counts is assigned to the otherpros variable.
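In shell, that counting idiom looks roughly like this (the rootpros and lppros variable names are my guesses; only otherpros appears in the article):

```shell
# Count root-owned processes; ps prints one header line, so subtract 1.
rootpros=`expr \`ps -u root | wc -l\` - 1`

# Repeat the procedure for the lp print-spooler processes.  (On a
# system with no lp user, ps may print nothing at all here, so clamp
# the count at zero.)
lppros=`expr \`ps -u lp 2>/dev/null | wc -l\` - 1`
[ "$lppros" -lt 0 ] && lppros=0

# Root and printer processes together form the "noise" count.
otherpros=`expr $rootpros + $lppros`
echo "root: $rootpros  lp: $lppros  other: $otherpros"
```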
In line 22, a total system process count is computed by running ps -e, counting the output lines, and subtracting 3 (one for the header line, and two for the processes spawned by invocation of the a command itself). To get the number of user processes, I subtract the value of otherpros from totpros. The result is assigned to userpros.
Lines 25-28 count up the number of user shell interpreters currently active, and assign that value to shpros. Since root processes have already been counted up in a class of their own, any shell interpreters owned by root are excluded from the count. To calculate the total number of non-shell user processes, the value of shpros is subtracted from userpros and the result is assigned to nonshpos (line 29).
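Put together, lines 22-29 correspond to something like the following sketch. The ps -ef column layout and the shell-name pattern are assumptions; adjust them for your ps flavor:

```shell
# Total processes: ps -e output minus 3 (the header line plus the two
# processes spawned by running this very pipeline).
totpros=`expr \`ps -e | wc -l\` - 3`

# otherpros (the root + lp count) is assumed to be set already.
otherpros=${otherpros:-0}
userpros=`expr $totpros - $otherpros`

# Count non-root shell interpreters: the owner is column 1 of ps -ef,
# and a shell shows up as sh/csh/ksh (possibly "-sh" for a login shell).
shpros=`ps -ef | awk '$1 != "root" && $NF ~ /(^|\/)-?(sh|csh|ksh)$/ { n++ }
                      END { print n + 0 }'`
nonshpos=`expr $userpros - $shpros`
echo "total: $totpros  user: $userpros  shells: $shpros  non-shell: $nonshpos"
```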
To calculate the processes-per-user averages, it is necessary to find out how many "distinct" users are logged in to the system, since a single user may be logged in on multiple terminals or have several multiscreen sessions active on a single terminal. Line 30 calculates the number of distinct users by listing the user ID of all processes, sorting by the ID, eliminating duplicates, and counting the number of lines in the output. The result is assigned to nusers.
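A line to that effect might read as follows (assuming the owner is the first column of ps -ef; on most systems sort -u could replace sort | uniq):

```shell
# Count distinct users: print the owner of every process (skipping the
# header line), sort the owners, collapse duplicates, and count what
# remains.
nusers=`ps -ef | awk 'NR > 1 { print $1 }' | sort | uniq | wc -l`
echo "distinct users: $nusers"
```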
The final calculations in lines 31-34 produce the averages to two decimal places, applying a standard multiplication-and-division technique useful with integer-only math. The integer and fractional portions of the average values are calculated separately.
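The multiply-then-split technique looks like this; the figures here are hypothetical:

```shell
# Two-decimal average with integer-only shell math: scale the dividend
# by 100 before dividing, then split off the integer and fractional
# parts.  Example figures: 27 user processes across 4 distinct users.
userpros=27
nusers=4

scaled=`expr $userpros \* 100 / $nusers`    # 2700 / 4 = 675
intpart=`expr $scaled / 100`                # 6
fracpart=`expr $scaled % 100`               # 75

# Pad a single-digit fraction so that, e.g., 6.05 doesn't print as 6.5.
[ "$fracpart" -lt 10 ] && fracpart=0$fracpart

echo "Average processes per user: $intpart.$fracpart"   # prints 6.75
```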
sysload.sh: Recording a Periodic System Load History
The two scripts described above provide instantaneous snapshots, but contain no provisions for maintaining a history. The last script for this month is a facility for recording long-term system load history information into a set of log files. These files may be inspected periodically in order to seek out cyclical trends or patterns of light and heavy system usage.
sysload.sh (Listing 3) writes to three log files, given the symbolic names DAYLOG, LOADLOG, and AVGLOG. You must fill in the actual pathnames for these files in lines 26-28, and the pathnames for the debugging versions in lines 30-32.

The DAYLOG file is used for the periodic sampling invocations of sysload.sh. You decide how often to sample the system load, and create a cron table entry that schedules the command accordingly. For example, on our system the script runs every fifteen minutes between 8 A.M. and 5:45 P.M., Monday through Friday. The cron table entry appears as follows:

0,15,30,45 8-17 * * 1-5 /usr/local/sysload.sh

where /usr/local is where the sysload.sh script resides.
Figure 3 shows the entire contents of our DAYLOG file as I write this. Each one-line entry contains the date, the time, and the system load. In Listing 3, these daily runs are processed in lines 38-50.
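Each sampling run boils down to appending one such line; a sketch, with an assumed pathname and date format (the actual entry format is shown in Figure 3):

```shell
# One DAYLOG sample: date, time, and current 1-minute load average.
DAYLOG=/tmp/dayload                  # placeholder; see lines 26-28
load=`uptime | awk '{ v = $(NF-2); sub(",", "", v); print v }'`
echo "`date '+%m/%d/%y %H:%M'` $load" >> $DAYLOG
```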
After all sampling for the day is complete, sysload.sh must be run one more time with the argument final instead. Several things happen at that point:
1. The entire contents of DAYLOG are appended onto LOADLOG. LOADLOG thus contains a cumulative record of all daily samples ever taken.

2. The average load for the day (as per all entries in DAYLOG) is computed, and a line containing this information is appended onto LOADLOG. The same line is also appended onto AVGLOG.

3. On Friday of each week, the five most recent daily averages in AVGLOG are themselves averaged, and a line containing the weekly average is appended onto AVGLOG.

4. The DAYLOG file is deleted, and the next weekday's samples are thus written to a new DAYLOG file.
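The final run's bookkeeping can be sketched as follows. The pathnames are placeholders, the log-line format is an assumption, and step 3's tail -5 is a simplification that ignores any weekly lines already present in AVGLOG:

```shell
#!/bin/sh
# sysload.sh final (sketch): roll up the day's samples.
DAYLOG=/tmp/dayload
LOADLOG=/tmp/loadlog
AVGLOG=/tmp/avglog

# Seed a sample DAYLOG so this sketch is runnable stand-alone.
printf '01/15/93 08:00 1.00\n01/15/93 08:15 2.00\n' > $DAYLOG

# 1. Keep every raw sample forever.
cat $DAYLOG >> $LOADLOG

# 2. Average the day's samples (the load is the last field of each
#    line) and append the result to both LOADLOG and AVGLOG.
avg=`awk '{ sum += $NF; n++ } END { printf "%.2f\n", (n ? sum / n : 0) }' $DAYLOG`
line="`date '+%m/%d/%y'` daily average: $avg"
echo "$line" >> $LOADLOG
echo "$line" >> $AVGLOG

# 3. On Fridays, also append a weekly average of the last five lines.
if [ "`date +%w`" = "5" ]; then
    wavg=`tail -5 $AVGLOG | awk '{ sum += $NF; n++ } END { printf "%.2f\n", sum / n }'`
    echo "week ending `date '+%m/%d/%y'`: $wavg" >> $AVGLOG
fi

# 4. Start tomorrow's DAYLOG from scratch.
rm -f $DAYLOG
```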
Our cron table entry for the end-of-day sysload.sh invocation is:

0 18 * * 1-5 /usr/local/sysload.sh final

The last daily run happens at 5:45, so the final run is scheduled for 6:00 P.M. Figure 4 shows the tail portion of the contents of a representative AVGLOG file.
These utilities have provided several benefits to me as a system administrator. With the help of the load program, nontechnical users are now confident enough to diagnose aberrant system slowdowns, and they often bring such events to my attention before I'm even aware of them. The a program, in conjunction with SCO's vmstat utility, gives me a fairly good, quick map of system utilization at any given moment, and sysload.sh allows me to report long-term system load statistics to management in order to help project hardware and software requirements for the company. I hope the scripts prove useful to you in your administration duties, as well.
I recently discovered a bug in one of the Onite system scripts published in the Sys Admin Premiere issue. In isonite.sh (Listing 7, page 24), the script that tells whether a particular job name exists in the overnight queue, the line printed as:

[ -r $SPOOLDIR/$1 ] && exit 0

is bogus. The line should be corrected to read:

[ -f $SPOOLDIR/P$priority/$1 ] && exit 0
About the Author
Leor Zolman wrote BDS C, the first C compiler targeted for personal computers. He is currently a system administrator and software developer for R&D Publications, Inc., and a columnist for both The C Users Journal and Windows/DOS Developer's Journal. Leor's first book, Illustrated C, has just been published by R&D. He may be reached in care of R&D Publications, Inc., or via net E-mail as email@example.com ("...!uunet!rdpub!leor").