Cover V02, I06

nov93.tar


checkcron: Checking for the Unexpected

Steven G. Isaacson

Most system administrators use daemons like cron and sendmail to help keep their systems running smoothly. Daemons run unobtrusively in the background, starting up as needed to do their work and then going back to sleep until more work needs to be done.

Running in the background is good because for the most part you don't need to see what is happening. But running in the background also means that nothing obvious happens when things go wrong.

What Could Go Wrong?

On our main development system we use cron to bundle up source code and then transfer it to various machines on the network. The files are bundled, transferred, then unpacked on the target system.

One night errors were reported on the target system. The next day changes were made to the source code to correct the problem, but that night the same errors appeared. This went on for several days, until someone discovered that the new code had not been transferred to the target system. The new code had not been transferred because cron had failed.

This particular problem could be addressed by having the target system move or remove the file when it was done with it, which would cause a "missing file" error to be generated the next night. But that only addresses one part of a complex system.
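A minimal sketch of that target-side safeguard follows. It assumes the nightly transfer leaves a tar bundle at a fixed path. The function name unpack_bundle and the .done suffix are hypothetical and are not part of the setup described above.

```shell
#!/bin/sh
# unpack_bundle (hypothetical helper), run nightly on the target system.
#   $1 = path of the bundle the transfer job is expected to deliver
#   $2 = directory to unpack it into
unpack_bundle() {
    bundle=$1
    dest=$2

    # If last night's transfer failed, the bundle is not here: say so.
    if [ ! -f "$bundle" ]; then
        echo "unpack_bundle: missing file: $bundle" >&2
        return 1
    fi

    # Unpack, then move the bundle aside so that a failed transfer
    # tomorrow cannot silently re-deliver today's code.
    (cd "$dest" && tar xf -) < "$bundle" || return 1
    mv "$bundle" "$bundle.done"
}
```

With this in place, a transfer that never happened shows up as a "missing file" message the next night instead of as a quiet rerun of old code.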

What Else Could Go Wrong?

Recently NIS failed on our "communication box," a computer dedicated to handling all of our incoming mail, that is, mail from outside of the company. Without NIS the alias file was useless and two days' worth of mail bounced back.

What's needed is a general solution, a way to check on background processes that doesn't itself rely upon background processes.

A General Solution

First, how do you tell if a background process like cron is still running? Type ps -fu root, pipe the results to grep, and look for cron (on some systems you cannot specify the user and so must look through all processes).

ps -fu root | grep cron

That's easy enough to make into a shell script, and you could echo a warning if grep exits with a nonzero status, indicating that /etc/cron was not found. The script could check for sendmail, the NIS daemons such as ypbind, or any other background process you want to keep tabs on.
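A minimal sketch of that first-cut check follows. It is written as a shell function that reads a ps listing on standard input; the name warn_if_missing is hypothetical. Fed live ps output, it still has the weakness described below: the grep inside it can match its own command line.

```shell
#!/bin/sh
# warn_if_missing NAME (hypothetical helper): read a ps listing on
# standard input and print a warning if NAME does not appear in it.
# grep exits with status 0 on a match and nonzero when nothing matched.
warn_if_missing() {
    if ! grep "$1" > /dev/null; then
        echo "WARNING: $1 not found"
    fi
}

# Example use, one ps snapshot per daemon:
#   for d in cron sendmail ypbind; do ps -fu root | warn_if_missing "$d"; done
```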

But there are two problems.

Two Problems

The first problem is a technical one. You need to make sure that you find what you're looking for ... and not what you're looking for. Let me explain.

When you type ps and grep for "cron," a new process, with the word "cron" on its command line, is started. Sometimes that process shows up in the ps output and sometimes it doesn't, depending on whether grep has started by the time ps takes its snapshot of the process table, which in turn depends on the load on the system. So if cron was found in the ps output, was it /etc/cron, "grep cron," or both?

So why not just look for /etc/cron?

Checking for /etc/cron doesn't work because as soon as you grep for /etc/cron, /etc/cron shows up as an argument on the grep command line.

Listing 1 illustrates this problem with two examples. The first example usually works, the second one never works. With the addition of a filter, you can make it always work.

ps -fu root | sed '/grep/d' | grep cron

The command sequence (ps | sed | grep) looks as if it won't work, because the sed command that deletes grep lines comes before the call to grep.

But it does work. It works because the shell starts the three processes (almost simultaneously) only after it has parsed the whole command line: before you can attach pipes, there must be programs to attach them to. The order of the commands describes where the data flows, not which program starts first.

So, if the "grep cron" line appears in the ps output, the sed command deletes it. If "grep cron" doesn't appear, it's not deleted. Either way, you get the information you need.
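The same filter can be wrapped in a small function that reports, through its exit status, whether a daemon is running. This is a sketch; the name daemon_running is hypothetical.

```shell
#!/bin/sh
# daemon_running NAME (hypothetical helper): read a ps listing on
# standard input, delete every line mentioning "grep" (including our
# own search, if ps caught it), then succeed only if NAME remains.
daemon_running() {
    sed '/grep/d' | grep "$1" > /dev/null
}

# Example use:
#   ps -fu root | daemon_running /etc/cron || echo "cron is not running"
```

One trade-off: sed '/grep/d' also discards any real process whose command line happens to contain "grep", and a short pattern such as cron also matches names like crond, so the pattern should be as specific as your ps output allows.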

The Real Problem

The second problem is the real problem.

How do you automatically check background processes to see if they are still running? That is, how can you make it so the checkcron script is run every so often without your having to remember to do it? (Don't say cron!)

I got around this automated process problem by using something manual, my .profile. I simply added a call to checkcron. Now whenever I log in I know within a few seconds if there is a problem.

Installation

checkcron is in Listing 2.

Installation is trivial. Customize the program for your system (by editing the line with the list of daemons), and then add one line to $HOME/.profile:

checkcron &

Every time you log in, you'll see the process ID of the background job echoed to your screen, followed by whatever you normally see when you log in.

For each daemon that cannot be found, checkcron echoes an error message to your screen. If all daemons are accounted for, it does nothing. Simple.
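Listing 2 is not reproduced here, but the behavior just described can be sketched as follows. The sketch combines an editable daemon list, the sed filter, and one message per missing daemon. The helper list_procs, the message text, and the daemon names are assumptions; edit DAEMONS for your system.

```shell
#!/bin/sh
# checkcron (sketch): report any daemon missing from the process list.
# Edit this line for your system; these names are only examples.
DAEMONS="/etc/cron sendmail ypbind"

# list_procs (hypothetical helper): print the process table, falling
# back to ps -ef where ps -fu root is not supported.
list_procs() {
    ps -fu root 2>/dev/null || ps -ef
}

checkcron() {
    for d in $DAEMONS; do
        # Drop grep lines (our own search) before looking for the daemon.
        if ! list_procs | sed '/grep/d' | grep "$d" > /dev/null; then
            echo "checkcron: $d not running" >&2
        fi
    done
    return 0    # silence means every daemon was found
}

checkcron
```

Saved as an executable file named checkcron in a directory on your PATH, it can be started from $HOME/.profile with checkcron & as shown above.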

checkcron may also be run from the command line if you have been logged in for a while and simply want to double-check your daemons.

About the Author

Steven G. Isaacson has been writing C and Informix 4GL applications since 1985. He is currently developing automated testing tools for FourGen Software, the leading developer of accounting software and CASE Tools for the UNIX market. He may be reached via email at uunet!4gen!steve1 or steve1%4gen@uunet.uu.net.