PICA Framework for Integrated Alarms (PIFIA)

As we have already said, the alarm manager is very basic, so the rest of this section is purely vaporware. The design revolves around the following concepts:

1. Flexibility (did we mention it was important to us?)

2. Simplicity

An alarm is a special type of object: it's seen simply as one that we will normally want to execute. It needs some more attributes, like priority, but, like the rest of the objects, it's installed in the appropriate hosts, so we will always have the possibility of executing any alarm by hand, even if the "central" host is down or unreachable.

The actual call of each alarm is done via a special script called "<alarmname>-picacaller". That way we can use scripts not originally designed for PICA. The only thing you need to define is the calling convention (one of the alarm's extra attributes), which describes the parameters the alarm will be called with. In its definition, we can take advantage of the many features of the preprocessor, such as variable substitution or on-the-fly Perl code generation. There will probably be a way to tell the caller script to expect command-line arguments, but don't count on it.

What We Have Now (WWHN)

Until we have a more elaborate alarm framework, we are relying on a very simple alarm management system. We have a "scheduler" script that is run periodically from crond. This script runs every alarm of a given priority, collects each alarm's output, and generates a report that is sent via email (or whatever you specify). At this point, each alarm has to manage its own variable persistence.
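To make the scheduler idea concrete, here is a minimal sketch in Python. It is not the actual PICA scheduler: the Alarm record, its numeric priority field, and the report layout are all assumptions made for illustration, and delivery of the report (email or otherwise) is left out.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class Alarm:
    name: str       # hypothetical label used in the report
    command: list   # argv of the alarm program
    priority: int   # hypothetical numeric priority


def run_alarms(alarms, priority, timeout=60):
    """Run every alarm of the given priority and collect its output."""
    results = []
    for alarm in alarms:
        if alarm.priority != priority:
            continue
        try:
            proc = subprocess.run(
                alarm.command,
                stdout=subprocess.PIPE,
                stderr=subprocess.STDOUT,  # keep error messages in the report
                text=True,
                timeout=timeout,
            )
            output = proc.stdout
        except (OSError, subprocess.TimeoutExpired) as exc:
            # A missing or hung alarm is itself worth reporting.
            output = f"could not run alarm: {exc}\n"
        results.append((alarm.name, output))
    return results


def build_report(results):
    """Join the output of every alarm that printed something."""
    sections = []
    for name, output in results:
        if output.strip():
            sections.append(f"== {name} ==\n{output.rstrip()}\n")
    return "\n".join(sections)


if __name__ == "__main__":
    # Illustrative only: any program can stand in for an alarm here.
    demo = [Alarm("disk", ["df", "-h"], priority=1)]
    sys.stdout.write(build_report(run_alarms(demo, priority=1)))
```

In this sketch an alarm that prints nothing is treated as "nothing to report", so a quiet run produces an empty report. That is a design assumption, not something the current scheduler necessarily does.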

For now, we have some basic alarm scripts to:

Check if a given RPM package is installed (also check version).
Check if critical processes are running, and restart them if they are down.
Check file and directory permissions (and correct them).
Check usage of filesystems and send an alarm if it's over a given threshold.
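As an illustration of the last kind of alarm, the following Python sketch reports filesystems whose usage exceeds a threshold. The function names and message format are hypothetical, and the percentage is computed from shutil.disk_usage, so it may differ slightly from what df prints.

```python
import shutil


def usage_percent(path):
    """Return the percentage of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0  # some pseudo-filesystems report a size of zero
    return 100.0 * usage.used / usage.total


def check_filesystems(paths, threshold, get_usage=usage_percent):
    """Return one message per filesystem above `threshold` percent."""
    messages = []
    for path in paths:
        percent = get_usage(path)
        if percent > threshold:
            messages.append(
                f"{path}: {percent:.1f}% used (threshold {threshold}%)"
            )
    return messages


if __name__ == "__main__":
    # Print only when something is wrong, so a quiet run means "all fine".
    for line in check_filesystems(["/"], threshold=90):
        print(line)
```

Because the script only writes to standard output, it fits the scheduler's model of an alarm: whatever it prints ends up in the report.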

All these alarms send a notification if any of their conditions are met. The alarms remember when they sent the last notification and will send another only if a given "remind" period has passed. Thus, we avoid flooding sysadmins with notifications if a problem persists. In the near future, all these notification and persistence features will be integrated into the alarm framework and will be transparent to the alarm script.
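The "remind" behaviour can be sketched as follows. This is only one possible way for an alarm to keep its own state: the JSON state file, the key naming, and the should_notify function are assumptions for illustration, not part of PICA.

```python
import json
import time


def should_notify(state_file, key, remind_seconds, now=None):
    """Return True if `key` has never been notified, or if the remind
    period has passed since the last notification.

    When it returns True, the current time is recorded as the last
    notification for `key`.
    """
    now = time.time() if now is None else now
    try:
        with open(state_file) as fh:
            state = json.load(fh)
    except (FileNotFoundError, ValueError):
        state = {}  # no state yet, or an unreadable file: start fresh
    last = state.get(key)
    if last is not None and now - last < remind_seconds:
        return False  # still inside the remind period, stay quiet
    state[key] = now
    with open(state_file, "w") as fh:
        json.dump(state, fh)
    return True
```

An alarm would call should_notify only after it has detected a problem, for example with a key such as "disk:/" and a remind period of 3600 seconds, and print its message only when the call returns True.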

Because the scheduler script only collects the output of the alarm scripts, we can run practically any program as an alarm. This is very useful for sending reports from excellent external programs, such as TripWire, PortSentry, and intrusion detection systems.