System administrators are responsible for ensuring the
availability of their machines to users. When servers crash or
backups fail, users are inconvenienced, which will most likely
result in trouble for the system administrator. If you are a
sys admin responsible for more than one machine and your
machines are networked together, you may want to set up a
Sentinel to watch over them and inform you of a system crash
or backup failure. A modem and a beeper would also be required.
My network consists of eight 3B2 hosts running AT&T
System V Rel 3.2.2, networked together with TCP/IP over
Ethernet. The Sentinel is cpu4, and the two servers I am
concerned with are cpu5 and cpu8.
My servers reboot each weeknight and run unattended
level zero backups of the entire system. A failure on a
server would result in downtime the next morning that would
cost my company money. If a server crashes at night, the
Sentinel alerts me so that I can take corrective action
before my users arrive in the morning.
This makes my boss very happy.
The program makes use of the ruptime command, which reports
the status of hosts on a local area network. Output
of the command uses the following format:
cpu1 up 9:40, 5 users, load 0.04, 0.08, 0.05
cpu2 up 2+04:25, 8 users, load 0.08, 1.23, 0.40
cpu3 up 8:35, 5 users, load 1.00, 1.01, 1.06
cpu4 up 8:43, 6 users, load 1.04, 1.00, 0.95
cpu5 up 8:47, 75 users, load 4.60, 3.90, 4.65
cpu6 up 5+07:50, 0 users, load 0.00, 0.00, 0.00
cpu7 down 0:20
cpu8 up 8:45, 65 users, load 4.70, 4.20, 3.85
The first field represents the name of the host. The
second field displays the status of the host. The third
field shows how long the host has been up and running in
days, hours, and minutes. The fourth and fifth fields tell
how many users are logged in. The last four fields display
the system load, or number of processes, over the last one,
five, and fifteen minutes.
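
As a rough sketch of how that output could be screened, the
Bourne shell function below prints the name of each host whose
second field is "down". The name down_hosts is hypothetical and
is not taken from the Sentinel listing; the function reads
ruptime-style lines on standard input so that it can also be
tried on saved output.

```shell
#!/bin/sh
# down_hosts: hypothetical helper, not part of the article's listings.
# Reads ruptime-style lines on standard input and prints the name
# (field 1) of every host whose status (field 2) is "down".
down_hosts() {
    awk '$2 == "down" { print $1 }'
}

# Typical use on a live network:
#   ruptime | down_hosts
```

Fed the sample output shown earlier, the function would print
only cpu7, the one host reported as down.
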
The program also makes use of uucp. My backup scripts have
been modified to test for the exit status of cpio because
I use cpio to create backups. If the backup is successful
on the server, I send a file to cpu4, the Sentinel, using
uuto:
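    # Write the archive to tape (-o output, -B 5120-byte blocks, -v verbose)
    cpio -oBv -O/dev/RSA/qtape2 < /tmp/backup.list
    if [ "$?" -ne 0 ]
    then
        echo The backup failed on `hostname` at `date`. | mail root
    else
        echo The backup was successful on `hostname` at `date`. | mail root
        date > /tmp/backup.ok    # create the flag file; its contents do not matter
        uuto /tmp/backup.ok cpu4!bill
    fi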
The routine tests for a successful backup by checking
the exit status of cpio, which the shell stores in $?.
If the exit status is zero, I create a file called
backup.ok and send it to cpu4 using the uuto utility.
If sent from the server cpu5, this file would arrive on
cpu4 in the directory where uuto deposits files for the
user bill. If the Sentinel doesn't find the file, it
uses the cu utility to page me on my beeper.
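
The check on the Sentinel side might look something like the
sketch below. It is not the article's Listing 2: the function
name check_backup, the PAGE_CMD variable, and the step that
removes the flag file after a successful check are all
assumptions made for illustration. The flag file's location is
passed in as an argument rather than assumed.

```shell
#!/bin/sh
# Hypothetical sketch; the names here are not from the article's listings.
# PAGE_CMD is the command used to page. It defaults to cu, which dials
# the Systems entry named as its argument.
PAGE_CMD=${PAGE_CMD:-cu}

# check_backup ENTRY FILE
# If FILE exists, the backup succeeded: remove the file so that the next
# night's check starts clean (an assumption, not stated in the article).
# Otherwise page the administrator using the Systems entry ENTRY.
check_backup() {
    if [ -f "$2" ]
    then
        rm -f "$2"
    else
        $PAGE_CMD "$1"
    fi
}
```

A call such as check_backup badback5 followed by the path of
the received backup.ok would then page only when the file is
missing. Setting PAGE_CMD=echo allows a dry run without a modem.
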
The sample entries here have been appended to the uucp
Systems file, /usr/lib/uucp/Systems. Similar entries would be
needed on your Sentinel host. The first field of the Systems
file usually represents a hostname. I have created bogus
hostnames to allow the Sentinel host to dial a beeper number
and send a code which briefly describes the problem.
The entries that follow are called with cu; they are used to
notify the system administrator by sending a code to
his or her beeper.
### The server called cpu5 has crashed.
cpu5down Any ACU 9600 93631448,,,,,5551111
### The server called cpu8 has crashed.
cpu8down Any ACU 9600 93631448,,,,,8881111
### The backup has failed on cpu5.
badback5 Any ACU 9600 93631448,,,,,5552222
### The backup has failed on cpu8.
badback8 Any ACU 9600 93631448,,,,,8882222
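
The naming pattern of these entries lends itself to a small
lookup. The sketch below is an illustration only: entry_name is
a hypothetical helper, and it assumes a POSIX shell for the
${2#cpu} expansion, which older Bourne shells may lack.

```shell
#!/bin/sh
# entry_name: hypothetical helper that maps an event on a host to the
# bogus Systems hostnames shown above.
#   entry_name crash  cpu5   prints cpu5down
#   entry_name backup cpu5   prints badback5
entry_name() {
    case "$1" in
    crash)  echo "${2}down" ;;
    backup) echo "badback${2#cpu}" ;;   # strip the leading "cpu"
    *)      echo "entry_name: unknown event: $1" >&2; return 1 ;;
    esac
}
```

Paging for a crash on cpu5 would then amount to running cu
with the name that entry_name crash cpu5 prints.
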
The next entry is a sample entry for those of you who
have a SkyPager beeper. Notice the use of the pound
key and the trailing comma.
Any ACU 9600 918007597243,,,,,6182093#,,,,5551111#,
The Sentinel program should run constantly to keep watch
over the critical hosts. For this reason I start the
program from an rc script which runs at boot time. (See the
sidebar for a brief explanation of how rc scripts work.)
The rc script I use (see Listing 1) is called sentinel
and is located in the directory /etc/init.d. It is then
linked to /etc/rc2.d/S99sentinel. No action will be taken
upon shutdown. If there were a need to take action,
the same file would also be linked to /etc/rc0.d/K99sentinel.
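
For readers without the listing at hand, an rc script of this
kind generally has the shape sketched below. This is not
Listing 1: the install path in SENTINEL and the function name
sentinel_rc are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical /etc/init.d/sentinel; NOT the article's Listing 1.
# SENTINEL is an assumed install path for the Sentinel program.
SENTINEL=${SENTINEL:-/usr/local/bin/sentinel}

sentinel_rc() {
    case "$1" in
    start)
        if [ -x "$SENTINEL" ]
        then
            "$SENTINEL" &            # run in the background at boot
            echo "Sentinel started."
        else
            echo "Sentinel program not found: $SENTINEL"
        fi
        ;;
    stop)
        :                            # no action is taken at shutdown
        ;;
    *)
        echo "usage: sentinel {start|stop}"
        ;;
    esac
}

# When run as an rc script, dispatch on the first argument.
if [ $# -gt 0 ]
then
    sentinel_rc "$1"
fi
```

With the script in /etc/init.d, the link described above could
be made with ln /etc/init.d/sentinel /etc/rc2.d/S99sentinel.
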
The Sentinel program (Listing 2) can be modified to watch for
other critical events on your network. This is an example of
system administration in practice. Systems will crash and
backups will fail, but you can still attempt to minimize
the effect this will have on you.
About the Author
William Genosa is the Chief System Administrator for
a leading systems integrator. He may be reached at
186 Bryant Avenue, Floral Park, NY 11001.