BBWARN: A Notification Extension for Big Brother
Like any UNIX system administrator, I would rather work in a proactive mode than a reactive one. One tool that has enabled me to achieve this goal is BB, the "Big Brother Systems and Network Monitor" written by Sean MacGuire. BB is available at http://www.maclawran.ca/. This Web-based UNIX system and network monitor runs on most UNIX platforms and has an NT client (see sidebar). An article describing BB appeared in the March 1997 issue of Sys Admin.
Although BB's built-in notification capabilities include email, numeric pager and SMS, I found the methods available too narrow and restrictive for my purposes. All alerts are sent to the same admin(s) on a 24x7 basis. Problems with a particular service can't be sent to one particular admin; any alert will be sent to all admins defined in the configuration file. I viewed this as a major drawback and decided to modify BB to make it more flexible. Fortunately, BB is uniquely designed to accommodate various alterations and adaptations, so I set to work. When properly modified, specific alerts can be sent out to admins according to various criteria defined in a configuration file. These will allow for notifications to be sent based on combinations of hosts, services, dates, and times of day.
Although the BBWARN extension is based on version 1.06a of BB, the following modifications should work with most previous versions. I recommend that you first install BB and configure the basic notification feature properly.
First, make a backup of your current installation of BB on the host defined by the BBPAGER keyword in the bb-hosts file. Next, as root (or the userid you use for BB), retrieve the bbwarn.tgz file from the Sys Admin ftp archives (ftp.mfi.com in /pub/sysadmin/1997/mar97.tar.z). On the BBPAGER host, unzip and untar the archive in a working directory. Finally, copy the files found in the tar archive to your current BB installation on BBPAGER host:
bb.c -> $BBHOME/src
bbwarn.cfg -> $BBHOME/etc
bbwarn.sh -> $BBHOME/bin
bb-page.sh -> $BBHOME/bin
sendmsg -> $BBHOME/bin
Make sure the scripts are executables and set the permissions and ownerships according to your local installation.
BB is based on a client/server design. It sends out a status or a notification message to a server process. The client program that performs that task is bb. The server process (called bbd) is usually installed on a single host. It receives both types of messages, but can also be installed on two hosts; one to receive the status message (the BBDISPLAY host) and the other to receive the notification message (the BBPAGER host). Status and notification messages are explicitly transmitted within the BB scripts themselves. This means that when a condition red (alert) occurs, two messages must be coded in the scripts: a status message sent to the BBDISPLAY host and a notification message sent to the BBPAGER host. I have modified the bb program such that only one script call need be executed. The bb program verifies the condition level (green, yellow, red) of the status message sent to the BBDISPLAY host, and, if the message is a condition red, it also sends a notification message to the BBPAGER host. Because all client hosts use this bb program, you must recompile bb.c for each hardware/OS platform supported with BB, then copy the platform specific bb executable to the $BBHOME/bin of each client host in your BB installation.
After installing the new files, some changes to the current BB installation must be made to the BBPAGER host:
1. $BBHOME/etc/bbsys.sh - Add these lines at the end:
EGREP="/usr/bin/egrep" # change to correct paths for
AWK="/bin/awk" # your environment
export EGREP AWK TRUE TR
2. $BBHOME/etc/bbinc.sh - Add these lines at the end:
SVCERRLIST="conn:0500 cpu:0200 disk:0100 dns:0800 ftp:0900
http:0600 msgs:0400 oracle:1000 procs:0300 smtp:1100" export
Again, change the values to reflect your current installation. In this case, I have added dns/oracle/smtptrap checking. The service:id pair designates the code that appears along with the IP address of the failed host in a numeric pager message. The service token is the file extension found in the name of the status file in the $BBHOME/www/logs directory.
3. $BBHOME/etc/bbdef.sh - Add these lines at the end:
SUFFIX=",ATH0" # Special modem handling, modify
# according to your setup
BBWARN="TRUE" # set to "" to disable
export BBMASTER SUFFIX BBWARN
Also the TTYLINE variable can be defined with multiple modem devices. If you have more than one modem line, they can all be defined in the TTYLINE variable, and the first modem device available will be used. If more than one admin is to be notified, BBWARN will make paging calls simultaneously on different modem lines. If no modems are available within the allowable waiting period, an email message will be sent to the recipient(s) in BBMASTER.
4. $BBHOME/src/bbd.c - Replace the line:
sprintf(pager, "%s/bin/bb-page.sh", bbhome);
sprintf(pager, "%s/bin/bbwarn.sh", bbhome);
Recompile bbd.c and copy the executable to $BBHOME/bin. This change allows the bbwarn.sh script to be executed when a notification message is received by the bbd program. The bbwarn.sh determines whether a notification message is necessary, and, if so, it calls the bb-page.sh script.
Because BBWARN does not require explicit calls made to the BBPAGER host when sending a notification, the code in the scripts that made those calls must be removed. Actually, it will work if they are not removed, but I recommend cleaning them up.
Remove all lines that make this call:
$BB $BBPAGE ...
Also, remove the "IF" statement blocks if $BB $BBPAGE calls are the only lines executed within them. Also, don't forget to remove those blocks in any scripts that you created.
At the heart of BBWARN is the configuration file found in $BBHOME/etc/bbwarn.cfg. An example is shown in Listing 1. This file contains entries made up of hosts/services/time/recipients combinations that constitute the BBWARN matrix. If a notification entry (defined by the name of the file created in $BBHOME/www/logs: host1.cpu or host1,domain.cpu) matches the configuration line definition, recipients are sent notification. The format of an entry in the configuration file is:
hosts: Which hosts to include
exhosts: Which hosts to exclude
services: Which services to include
exservices: Which services to exclude
day: On these days
time: At this time
recipients: Who/what to notify
The value * , which means all, can be used as a wildcard for the hosts/services/day/time parameters. It can also be used for exhosts and exservices but would be rather useless; it would mean discard all hosts or services! The hosts/exhosts/services/exservices parameters can also be specified with regular expressions.
The day parameter can be specified as a single day or a range of days. The day parameter value ranges from 0 (Sunday) to 6 (Saturday) according to the date command %w directive. The time parameter can be set at a specific time but is more commonly used within a time range. The recipients parameter is the email address, numeric pager, or SMS number.
Here's an example:
This configuration line will match any host for any service on any day of the week at any time of the day and will email a notification to "robert@localhost". The same line can be rewritten as:
The * day parameter is replaced with the range "0-6" (Sunday->Saturday) and the time parameter uses the "0000-2359" range instead of the * value.
Here is the same example with some exclude specifications:
*,host1 host2 ny.*,*,smtp,0-6,0000-2359,robert@localhost
This line matches any host except for host1, host2 and host names matching ny* (as in nycity, nylake,...) for any service except SMTP on any day of the week at any time of the day and will email a notification to "robert@localhost".
The following example includes further modifications:
*,host1 host2 ny.*,*,smtp,0-2 6,0700-1200 1300-1800,999-9999
This line matches on any host except for host1, host2 and hostnames matching ny*, for any service except SMTP on Sunday, Monday, Tuesday, Saturday between 7 AM to 12 PM and 1 and 6 PM and sends a notification to the numeric pager at 999-9999.
Still more specific date/time modifications:
*,host1 host2 ny.*,*,smtp,0-2,0700-1200,robert@localhost
*,host1 host2 ny.*,*,smtp,6,1300-1800,robert@localhost
Because you may want to be paged at different times depending upon the day of the week, two lines are needed to accomplish this task. The first line matches any host, excepting host1, host2 and hostnames matching ny*, for any service except SMTP on Sunday, Monday, Tuesday between 7 AM to 12 PM and send notification to "robert@localhost". The second line sends email notification on Saturday between 1 and 6 PM.
A special entry exists in the configuration file that enables a BB administrator to ignore alert conditions from certain hosts or certain services enumerated on that line. The line beginning with the ignforall: token is used to discard all hosts and services that match the regular expression. The token line can be specified anywhere is the configuration file.
Consider the following:
This line means that no attempt to page will be made if the host.service combination matches *cpu, *msgs, host99* or host1.disk. Furthermore, any error conditions coming from host99 as well as all alert conditions created by the cpu and msgs services will likewise be ignored. The disk service from host1 will also be discarded. This is especially handy if you want to exclude some hosts/services temporarily in the event of a maintenance assignment. Simply remove the entries when the assignment is over to re-enable notification for those hosts/services.
To disable all notifications, simply set the configuration line to:
Since the .* matches everything, all conditions are ignored. Another option is to set the BBWARN variable in $BBHOME/etc/bbdef.sh to "", but you would have to stop and then restart BB. Set the BBWARN variable to "" to disable the notification feature and save some system resources, because only the first few lines of the bbwarn.sh script will be executed.
Tips and Warnings
If you decide to use a BBPAGER host other than BBDISPLAY, make sure you have an identical copy of $BBHOME/etc/bb-hosts installed on each host. If the bb-hosts files don't match, notification messages run the risk of containing errors. Worse yet, there is a good chance that no message will be sent at all!
The environment variable PAGERDELAY, defined in $BBHOME/etc/bbdef.sh, controls the number of minutes between notification. This variable exists in the original notification feature of BB and is still used for the same purpose. However, this delay is now applied individually on each notification destination. Consider the following configuration file:
host1 host2,,disk,,0-2,0600-2000,disk@localhost 999-9999
host1 host2,,smtp,,0-2,0700-1800,smtp@localhost 999-9999
Assume the PAGERDELAY is set to 15 minutes. A notification from host1 concerning disk service is received at 10h00; disk@locahost receives an email, and a numeric page is sent to 999-9999. At 10h05, a notification is received for host2 pertaining to the SMTP service; smtp@localhost receives an email, but the 999-9999 numeric pager isn't called because it was notified 5 minutes ago. This feature is particularly dear to my heart and serves as a memorial to the untimely death of a much-loved pager following a bout of repeated notifications. Remember, if you wish to receive all notifications, set PAGERDELAY to 0.
Extending Big Brother
With BB it is fairly easy to add local or remote checking because of access to the source code. Simply add your own scripts to the runbb.sh script, and they will be executed along with the standard scripts. If you intend to add your own scripts, make sure the service extension of the new log file is unique (.sybase, .oracle, etc.) and add each one to the SVCERRLIST variable along with a code in order to be notified of an error condition. When adding new environment variables in the BB configuration files, don't forget to export those variables.
Try to make your code as portable as possible using the most basic commands. It is best to use a basic script as a template and build from that. Don't forget about fully qualified domain names (FQDN) in your scripts, especially if you intend to make them available on the 'Net. Although you may not need FQDN, you should be aware that many other sites do use BB with FQDN.
Should you encounter any unresolvable problems, set the first line of the faulty script(s) to #!/bin/sh -x and restart BB using the nohup command. All output from the script(s) will be appended to nohup.out and will give you all the information necessary to fix the problem. When dealing with multiple OS platforms make sure to test each one, because some shell commands are similar but not identical. If you decide to write a C program instead of a shell script, each of your hardware/OS platforms must have a C compiler, or your program will not run on all hosts. Portability should be factored into all your scripting decisions. When a new hardware/OS platform is installed at your site, and your scripts/programs can only support your current platform, you will rue the day that you didn't heed this hard-earned advice.
If it wasn't for BBWARN, I would still suffer from Sys Admin Catch 22. I could submit to being frightened into wakefulness in the middle of the night for a problem that could easily have waited until morning. Or, I could be tempted to disable existing warning systems, and thereby incur the wrath of the technology gods in a dazzling, early morning display of severe, righteous, and semi-public punishment. Nobody should be forced to endure either scenario. Using BBWARN, I've managed to keep my sanity (and that of the other administrator that works with me) by creating a notification schedule that allows me to determine how and when I'll be notified about critical problems.
About the Author
Robert-Andre is an independent UNIX consultant working in Montreal, Canada. When not working as a system administrator or C programmer, he enjoys spending time with his girlfriend and riding his Marinoni bicycle or his Ducati motorcycle. He can be reached at email@example.com