PICA Framework for Integrated Alarms (PIFIA)
As we have already said, the current alarm manager is very basic,
so the rest of this section is purely vaporware. The design revolves
around the following concepts:
1. Flexibility (did we mention it was important to us?)
2. Simplicity
An alarm is a special type of object: simply one that we will
normally want to execute. It needs a few extra attributes, such
as priority, but, like any other object, it is installed on the
appropriate hosts, so we can always run any alarm by hand, even
if the "central" host is down or unreachable.
Each alarm is actually invoked through a special script called
"<alarmname>-picacaller". That way we can use scripts that were
not originally designed for PICA. The only thing you need to
define is the calling convention (one of the alarm's extra
attributes), which describes the parameters the alarm will be
called with. Its definition can take advantage of the many
features of the preprocessor, such as variable substitution or
on-the-fly Perl code generation. The calling convention will
probably be a way to tell the caller script to expect command-line
arguments, but don't count on it.
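Since none of this exists yet, the following is only a rough sketch
of what a caller could do with a calling convention, written in
Python rather than through the PICA preprocessor. The template
syntax ("$name" placeholders), the function name build_command, and
the example script path and variables are all assumptions made for
illustration; they are not part of PICA.

```python
import shlex
from string import Template


def build_command(script, convention, variables):
    """Expand a calling-convention template into an argv list.

    `convention` is a hypothetical template such as "-w $warn -p $path".
    It is split into arguments first and substituted afterwards, so a
    value containing spaces still ends up as a single argument.
    """
    args = [Template(token).substitute(variables)
            for token in shlex.split(convention)]
    return [script] + args


if __name__ == "__main__":
    # Hypothetical alarm script and attribute values.
    argv = build_command("/usr/local/bin/check_disk",
                         "-w $warn -p $path",
                         {"warn": "90", "path": "/var"})
    print(argv)
```

Splitting before substituting is a deliberate choice here: the
resulting list can be handed to the operating system directly,
without going through a shell.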
What We Have Now (WWHN)
Until we have a more elaborate alarm framework, we rely on a very
simple alarm management system. A "scheduler" script is run
periodically from crond. It runs every alarm of a given priority,
collects each alarm's output, and generates a report that is sent
via email (or whatever you specify). At this point, each alarm has
to manage the persistence of its own variables.
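The scheduler loop described above can be sketched as follows. This
is an illustration in Python, not the actual PICA script: the Alarm
record, its fields, the numeric priorities, and the report layout
are assumptions. Delivery of the report (email or otherwise) is left
out; the sketch just prints it.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class Alarm:
    name: str       # hypothetical alarm name
    priority: int   # hypothetical numeric priority
    command: list   # argv of the program to run as the alarm


def run_alarms(alarms, priority, timeout=60):
    """Run every alarm of the given priority and collect its output."""
    results = []
    for alarm in alarms:
        if alarm.priority != priority:
            continue
        try:
            proc = subprocess.run(alarm.command, capture_output=True,
                                  text=True, timeout=timeout)
            output = proc.stdout + proc.stderr
        except (OSError, subprocess.TimeoutExpired) as exc:
            # A missing or hung alarm is itself worth reporting.
            output = f"could not run alarm: {exc}\n"
        results.append((alarm.name, output))
    return results


def build_report(results):
    """Join the non-empty alarm outputs into one plain-text report."""
    sections = []
    for name, output in results:
        if output.strip():
            sections.append(f"== {name} ==\n{output.rstrip()}\n")
    return "\n".join(sections)


if __name__ == "__main__":
    # Hypothetical alarm: any program that prints something will do.
    alarms = [Alarm("demo", 1,
                    [sys.executable, "-c", "print('disk almost full')"])]
    print(build_report(run_alarms(alarms, priority=1)), end="")
```

Alarms that print nothing are left out of the report, so a quiet
run produces an empty report.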
For now, we have some basic alarm scripts to:
- Check if a given RPM package is installed (also check version).
- Check if critical processes are running, and restart them if
they are down.
- Check file and directory permissions (and correct them).
- Check filesystem usage and send a notification if it exceeds
a given threshold.
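As an illustration of the last kind of alarm in the list, here is a
minimal filesystem usage check. It is a sketch in Python and not one
of the actual PICA alarm scripts; the function names, the message
format, and the default of checking "/" against 90% are assumptions.

```python
import shutil
import sys


def usage_percent(path):
    """Return the percentage in use of the filesystem holding `path`."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0  # some pseudo-filesystems report a size of zero
    return 100.0 * usage.used / usage.total


def check_filesystems(paths, threshold, usage_fn=usage_percent):
    """Return one message per filesystem whose usage exceeds `threshold`."""
    messages = []
    for path in paths:
        percent = usage_fn(path)
        if percent > threshold:
            messages.append(
                f"{path}: {percent:.1f}% used (threshold {threshold}%)")
    return messages


if __name__ == "__main__":
    # Hypothetical defaults: check "/" against a 90% threshold.
    for line in check_filesystems(sys.argv[1:] or ["/"], threshold=90):
        print(line)
```

The script prints one line per filesystem over the threshold and
nothing otherwise, so its output alone says whether there is a
problem. The percentage may differ slightly from what df shows,
because df usually accounts for reserved blocks differently.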
Each of these alarms sends a notification when its condition is
met. The alarms remember when they sent the last notification and
will send another only after a given "remind" period has passed.
Thus, we avoid flooding sysadmins with notifications if a problem
persists. In the near future, all these notification and persistence
features will be integrated into the alarm framework, and will be
transparent to the alarm script.
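The "remind" logic each alarm currently has to implement on its own
can be sketched like this. It is a Python illustration, not the
code of the existing alarms: the JSON state file, the function
names, and the idea of clearing the state once the problem is gone
are assumptions.

```python
import json
import os
import time


def should_notify(state_file, remind_seconds, now=None):
    """Decide whether to notify, and record the time if we do.

    Returns False while the last recorded notification is younger
    than `remind_seconds`; otherwise stores `now` and returns True.
    """
    now = time.time() if now is None else now
    try:
        with open(state_file) as fh:
            last_sent = json.load(fh)["last_sent"]
    except (OSError, ValueError, KeyError, TypeError):
        last_sent = None  # no usable state: treat as never notified
    if last_sent is not None and now - last_sent < remind_seconds:
        return False
    with open(state_file, "w") as fh:
        json.dump({"last_sent": now}, fh)
    return True


def clear_state(state_file):
    """Forget the last notification (assumed: called once the problem is gone)."""
    try:
        os.remove(state_file)
    except FileNotFoundError:
        pass
```

An alarm would call should_notify() only after detecting a problem,
and send its notification only when the function returns True.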
Because the scheduler script only collects the output of the alarm
scripts, we can run virtually any program as an alarm. This is very
useful for sending reports from excellent external programs, such
as TripWire, PortSentry, and intrusion detection systems.