checkcron: Checking for the Unexpected
Steven G. Isaacson
Most system administrators use daemons like cron and
sendmail
to help keep their systems running smoothly. Daemons
run unobtrusively
in the background, starting up as needed to do their
work and then
going back to sleep until more work needs to be done.
Running in the background is good because for the most
part you don't
need to see what is happening. But running in the background
also
means that nothing obvious happens when things go wrong.
What Could Go Wrong?
On our main development system we use cron to bundle
up source
code and then transfer it to various machines on the
network. The
files are bundled, transferred, then unpacked on the
target system.
One night errors were reported on the target system.
The next day
changes were made to the source code to correct the
problem, but that
night the same errors appeared. This went on for several
days, until
someone discovered that the new code had not been transferred
to the
target system. The new code had not been transferred
because cron
had failed.
This particular problem could be addressed by having
the target system
move or remove the file when it was done with it, which
would cause
a "missing file" error to be generated the
next night. But
that only addresses one part of a complex system.
What Else Could Go Wrong?
Recently NIS failed on our "communication box,"
a computer
dedicated to handling all of our incoming mail, that
is, mail from
outside of the company. Without NIS the alias file was
useless and
two days' worth of mail bounced back.
What's needed is a general solution, a way to check
on background
processes that doesn't itself rely upon background processes.
A General Solution
First, how do you tell if a background process like
cron is
still running? Type ps -fu root, pipe the results to
grep,
and look for cron (on some systems you cannot specify
the user
and so must look through all processes).
ps -fu root | grep cron
That's easy enough to make into a shell script, and you
could echo a warning if grep exits with a nonzero exit status,
indicating that /etc/cron was not found. The script could check
for sendmail, NIS, ypbind -- any background processes
you want to keep tabs on.
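As a rough sketch of that idea (this is not one of the article's listings),
the check might look like the following. The function name report_missing
and the wording of the warning are assumptions made for illustration. The
function reads a process listing on standard input so that it can be fed
directly from ps.

```shell
#!/bin/sh
# Sketch only: report_missing is a hypothetical helper, not from the article.
# It reads a process listing on stdin and warns if no line mentions the
# daemon named in $1.
report_missing() {
    daemon=$1
    # grep exits 0 when at least one line matches, nonzero otherwise.
    if grep "$daemon" >/dev/null; then
        return 0
    fi
    echo "WARNING: $daemon not found"
    return 1
}

# Intended use on a live system:
#   ps -fu root | report_missing cron
```

The same function could be called once for each daemon of interest.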
But there are two problems.
Two Problems
The first problem is a technical one. You need to make
sure that you
find what you're looking for ... and not what you're
looking for.
Let me explain.
When you type ps and grep for "cron," a new
process, with the word "cron" on its command
line, is started.
Sometimes that process shows up and sometimes it doesn't,
depending
upon the load on the system. So if cron was found in
the ps
output, was it /etc/cron, "grep cron," or
both?
So why not just look for /etc/cron?
Checking for /etc/cron doesn't work because as soon
as you
grep for /etc/cron, /etc/cron shows up as an
argument on the grep command line.
Listing 1 illustrates this problem with two examples.
The first example usually works; the second never does.
With the addition of a filter, you can make the check
work every time.
ps -fu root | sed '/grep/d' | grep cron
The command sequence (ps | sed | grep) looks as
if it won't work because the grep-delete occurs before the
call to grep.
But it does work. The order of commands in a pipeline is the
order in which data flows, not the order in which the processes
start. The shell parses the whole command line and then starts
all three processes almost simultaneously, so the grep process
may already exist, and appear in the ps output, by the time sed
reads that output.
So, if the "grep cron" line appears in the
ps output,
the sed command deletes it. If "grep cron"
doesn't
appear, it's not deleted. Either way, you get the information
you
need.
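The filtered check can be wrapped in a small function. This is a minimal
sketch rather than the article's own code: the name is_running is
hypothetical, and the function reads ps-style output on standard input so
that the filter can be tried on saved output as well as on a live system.

```shell
#!/bin/sh
# is_running: hypothetical helper, not taken from the article's listings.
# Reads a process listing on stdin; succeeds only if a line other than
# grep's own entry mentions the daemon named in $1.
is_running() {
    # sed drops every line containing "grep" before grep searches the rest.
    # The exit status of the pipeline is the exit status of grep.
    sed '/grep/d' | grep "$1" >/dev/null
}

# Intended use on a live system:
#   ps -fu root | is_running cron || echo "cron is not running"
```

One caveat: the filter also discards any genuine process whose ps line
happens to contain the string grep, so a daemon with "grep" in its name or
arguments would be reported as missing.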
The Real Problem
The second problem is the real problem.
How do you automatically check background processes
to see if they
are still running? That is, how can you make it so the
checkcron
script is run every so often without your having to
remember to do
it? (Don't say cron!)
I got around this automated process problem by using
something manual,
my .profile. I simply added a call to checkcron. Now
whenever I log in I know within a few seconds if there
is a problem.
Installation
checkcron is in Listing 2.
Installation is trivial. Customize the program for your
system (by
editing the line with the list of daemons), and then
add one line
to $HOME/.profile:
checkcron &
Every time you log in, you'll see a background pid
number echoed to your screen and then whatever you normally
see when
you log in.
For each daemon that cannot be found, checkcron echoes
an error
message to your screen. If all daemons are accounted
for, it does
nothing. Simple.
checkcron may also be run from the command line if you
have
been logged in for a while and simply want to double-check
your daemons.
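The following is not Listing 2. It is a minimal sketch of a script that
behaves as described above: silent when every daemon is found, one message
for each daemon that is missing. The DAEMONS list, the function name
check_daemons, and the message text are all assumptions made for
illustration.

```shell
#!/bin/sh
# Sketch of a checkcron-style script. NOT the article's Listing 2.
# DAEMONS is an assumed list; edit it to match your system.
DAEMONS="cron sendmail ypbind"

# check_daemons: hypothetical helper. Reads a process listing on stdin
# and prints one message for each daemon in DAEMONS that is absent.
check_daemons() {
    # Remove grep's own entries once, then search the cleaned listing.
    listing=$(sed '/grep/d')
    for d in $DAEMONS; do
        if printf '%s\n' "$listing" | grep "$d" >/dev/null; then
            :   # found: say nothing
        else
            echo "checkcron: $d is not running"
        fi
    done
}

# Intended use on a live system:
#   ps -fu root | check_daemons
```

Taking the ps snapshot once and searching it for every daemon keeps the
result consistent across the whole list.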
About the Author
Steven G. Isaacson has been writing C and Informix
4GL applications
since 1985. He is currently developing automated testing
tools for
FourGen Software, the leading developer of accounting
software and
CASE Tools for the UNIX market. He may be reached via
email at
uunet!4gen!steve1 or steve1%4gen@uunet.uu.net.