AIX 3.2* logs all warnings and errors, both hardware-
and software-related,
in the file /var/adm/ras/errlog. Entries are always
appended
and are never deleted. All entries contain binary data.
The system
administrator uses the command errpt to create a readable
output of this error log. No tools exist to automatically
highlight
severe errors or to notify the system administrator
of new entries
in the error log file. As a result, it's often the case
that the error
log file gets completely ignored. This is unfortunate
because the
error log file contains vital information, such as disk
drive failures
and adapter problems, which can't be ignored.
I present here two tools that I have developed to automate
error reporting.
They can be invoked on the command line or by cron.
Their
purpose is to notify the system administrator of new
entries in the
error log file. The tools can be used for daily and
monthly error
reporting and can mail the output to user root. Both
are shell scripts
which can be executed by either the Bourne or Korn shell.
The following commands are supplied by the AIX operating
system and
are used in the scripts below.
errclear <days> -- Removes all entries in
the error log file which are older than the number of
days supplied
as the parameter. This command must be used to reduce
the size of
the error log file.
Daily Error Reporting
The shell script /var/adm/ras/daily (in Listing 1) generates
the daily error report. It is usually invoked by cron
but
can also be invoked from the command line.
This program mails all selected errors which occurred
today (flag
-t), or since the last boot (default), to user root.
The last
boot time is found in /etc/utmp and can be displayed
with
the command who -b. The program uses the errpt command
to extract entries from the system's error log.
Even if no errors occurred, daily mails its output to
root
to indicate that the error report program was executed.
In this case
the system administrator receives mail with an empty
message. A sample
output of errpt is shown in Figure 1.
As the figure shows, each line is a separate entry.
The first column
is a unique error identification. The second column
specifies the
time the entry was made, in the format mmddhhmmyy (month,
day,
hour, minute, and year). The third column describes
the error type,
which can be T (temporary), P (permanent), or U
(unknown). The fourth column describes the class of
the error, which
can be H (hardware), S (software), or O (entries
made by the super-user using the errlogger command).
The fifth
column is the resource name of the software product
which made the
entry, and the last column is a verbose explanation
of the error.
A powerful report generator, errpt can select entries
in the
error log file by time, error identifier, type of error
(hardware,
software), class of error (permanent, temporary), device
type for
hardware errors, and product names for software errors.
If a spare terminal is available, errpt can be invoked
in
concurrent mode. All error log entries are reported
as soon as they
have been added to the error log file.
Monthly Error Reports
Daily reports are for system administrator notification.
If error
logs are to be kept for a longer period of time, they
must be stored
elsewhere. This is mandatory: if the system's error
log file is not
reduced with the errclear command, the error log will
eat
up all available disk space in that file system. The
script in Listing 2
groups error reports by month and stores them in a
compressed file.
It also cleans the system's error log file by deleting
all entries
from the previous month. cron invokes the script the
first
five days of each month.
This script keeps monthly error reports for exactly
one year. The
reports are stored in directory /var/adm/errors, which
has
to be created by the system administrator.
cron Entries
To automate the reporting, two entries are added to
the crontab
file for user root. The first entry generates daily
error reports.
The second entry invokes the monthly error reporting,
which also removes
last month's error log entries. This script is invoked
the first five
days each month, in case the system is shut down during
the weekend
or bank holidays. The two crontab entries are as follows:
59 23 * * * /usr/bin/ksh /var/adm/errors/daily -t
59 23 1-5 * * /usr/bin/ksh /var/adm/errors/monthly
Now, when the system administrator logs in each day,
he or she will find a mail message containing a list
of the most recently
reported errors.
Bibliography
IBM. IBM AIX Commands Reference: Vol 2. GC23-2366-02.
About the Author
Thomas Richter has studied mathematics and Computer
Science
at the University of Ulm, Germany. He has worked on
various UNIX platforms
as a software developer and C/C++ as main programming
languages. His
projects include compiler construction, device drivers,
and network
programming. He has also administered various UNIX machines
for the
last 8 hears. He has worked for IBM UK since January
1993.