procmon: A Process Monitor
Anyone responsible for maintaining a UNIX system must deal with ongoing processes, that is, those that are started when the system is brought up and that we never want to exit. Sometimes, that just isn't possible. Processes die, and some of the reasons for this include maximum parameters being reached, programming errors, or a system resource not being available when needed.
For some processes, you can combat this situation by using the /etc/inittab file on System V systems. /etc/inittab contains a list of processes that are to be executed when the system enters a run level, and specifies what to do when each process exits.
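The inittab file consists of four colon-separated fields: an identifier, the run level(s) in which the entry applies, the action init takes, and the process to execute.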
A sample /etc/inittab file is shown in Figure 1.
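To make the four-field layout concrete, here is a small Python sketch (it is not part of procmon) that splits one inittab entry into its fields. The class name InittabEntry, the function parse_inittab_line, and the sample cron entry are hypothetical illustrations.

```python
from typing import NamedTuple, Optional


class InittabEntry(NamedTuple):
    ident: str      # unique identifier for the entry
    runlevels: str  # run levels in which the entry is active, e.g. "23"
    action: str     # what init does with it, e.g. "respawn", "once", "wait"
    process: str    # command line that init runs


def parse_inittab_line(line: str) -> Optional[InittabEntry]:
    """Split one /etc/inittab line into its four colon-separated fields.

    Returns None for blank lines and '#' comment lines; raises ValueError
    if the line has fewer than four fields.
    """
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    # maxsplit=3 keeps any colons that appear inside the process field.
    ident, runlevels, action, process = line.split(":", 3)
    return InittabEntry(ident, runlevels, action, process)


# Hypothetical entry: keep cron running in run levels 2 and 3.
entry = parse_inittab_line("cr:23:respawn:/usr/sbin/cron")
print(entry.action, entry.process)
```

The respawn action is the one that makes init restart a process when it exits, which is the behavior procmon reproduces on systems without inittab.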
inittab is a powerful mechanism, but it is found only in System V variants of UNIX. In addition, /etc/inittab and the init command give no indication that the process has exited and been restarted, unless the process dies continuously and init prints a message on the console. The message is typically something like "Command is respawning too rapidly."
BSD-based operating systems, such as SunOS, BSD/OS from BSD Inc, FreeBSD, and others, do not use /etc/inittab to control processes.
The problem then becomes "How can I provide a system-independent method of monitoring and restarting critical system processes?" The answer is procmon.
procmon is a perl script that is started during system startup and runs for the life of the system. It has been written to be a system daemon, and it behaves as such. The purpose of the program is to monitor the operation of a set of defined processes and, if they are not present in the process list, to restart them. It also logs its actions, noting when a process fails and when it is restarted. A sample log, which is generated through the UNIX syslog facility, is shown in Figure 2.
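The core idea, look for each pattern in the process list and restart whatever is missing, can be sketched in Python as follows. This is an illustration, not a translation of Listing 1: it assumes a plain substring match against System V ps -ef output, and the names check_once, is_running, and system_v_process_list are hypothetical. The start function is passed in so the pass can be exercised without actually launching anything.

```python
import subprocess
import syslog
from typing import Callable, Iterable, List, Tuple


def is_running(pattern: str, ps_lines: Iterable[str]) -> bool:
    """True if any line of the process list contains the pattern."""
    return any(pattern in line for line in ps_lines)


def check_once(
    watch_list: List[Tuple[str, str]],
    ps_lines: List[str],
    start: Callable[[str], None],
) -> List[str]:
    """One pass over the watch list: restart every process that is missing.

    Returns the patterns whose start command was issued.
    """
    restarted = []
    for pattern, command in watch_list:
        if is_running(pattern, ps_lines):
            continue
        # Same wording as the procmon log record described later.
        syslog.syslog(syslog.LOG_WARNING, f"{pattern} is NOT running")
        start(command)
        restarted.append(pattern)
    return restarted


def system_v_process_list() -> List[str]:
    """Current process list from System V 'ps -ef' (header line included)."""
    result = subprocess.run(["ps", "-ef"], capture_output=True, text=True, check=True)
    return result.stdout.splitlines()
```

A real daemon would call check_once(watch_list, system_v_process_list(), start) in a loop, sleeping between passes.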
The syslog output shows procmon starting up and recording what it is doing. The goal here is to capture as much information as possible about the process being monitored.
Automatically Starting procmon
The benefit of a program such as procmon can only be realized when the program is started at system boot time. How this is accomplished depends upon the UNIX variant you are using. On System V, the lines shown in Figure 3 are added to /etc/rc2, or to a file in the /etc/rc2.d directory, which is the preferred method.
procmon is a daemon process. It handles all of the system signals and disconnects itself from its controlling terminal. When procmon starts, it prints a line indicating which configuration parameters it is using, and then quietly moves to the background. From then on, all logging is done through the syslog facility. The output printed when procmon starts is shown in Figure 4.
The procmon Files
procmon uses two configuration files: procmon.cfg and procmon.cmd. Of the two, only procmon.cmd is absolutely necessary. If procmon.cfg exists, it is used to alter the base configuration of the program. I will discuss both files in detail here.
The procmon Configuration File
The default configuration file is /etc/procmon.cfg. If this file is not found when procmon starts, procmon uses the default parameters built into the program. The configuration file gives the system administrator a way to change the location of the procmon.cmd file and the delay between checks of the commands in the list.
If no /etc/procmon.cfg file is found, the program looks in the /usr/local/bin directory for procmon.cmd and uses a delay of five minutes between checks. The sample file shown in Figure 5 uses a delay of 15 minutes (900 seconds) and a configuration directory of /etc. Notice that the delay value is in seconds, not minutes.
The /etc/procmon.cfg file is not parsed by procmon itself; if it exists, perl loads it into procmon as code. This means that comments using the # symbol are supported, and that the last line of the file must contain the statement 1;. perl's require expects the loaded file to finish by returning a true value, and 1; provides it.
The reason for using this configuration file is that the behavior of the program can be modified without touching the code. The delay_between variable defines the amount of delay between passes through the list of commands. For example, if delay_between is 300, procmon waits 300 seconds between passes.
The ConfigDir variable tells procmon where the procmon.cmd file is located. The program defaults to looking for it in /usr/local/bin. The sample in Figure 5 places the file in /etc.
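Because procmon.cfg is perl code, procmon gets it for free with require. A Python program cannot do that, so the sketch below shows the same "defaults, then override from an optional file" behavior by reading simple assignments. It assumes the file holds lines of the form $delay_between = 900; and $ConfigDir = "/etc"; (Figure 5 is not reproduced here, so that form is an assumption), and it ignores anything else, including the trailing 1;. The function name load_config is hypothetical.

```python
import re
from pathlib import Path

# Built-in defaults described in the text: a five-minute delay and
# procmon.cmd in /usr/local/bin.
DEFAULTS = {"delay_between": 300, "ConfigDir": "/usr/local/bin"}

# Assumed line shape: $name = value;  with the value optionally double-quoted.
_ASSIGN = re.compile(r'^\$(\w+)\s*=\s*"?([^";]*)"?\s*;')


def load_config(path: str = "/etc/procmon.cfg") -> dict:
    """Return the defaults, overridden by any known variables set in path."""
    config = dict(DEFAULTS)
    cfg = Path(path)
    if not cfg.exists():
        return config                          # no file: keep built-in parameters
    for raw in cfg.read_text().splitlines():
        line = raw.split("#", 1)[0].strip()    # drop # comments
        match = _ASSIGN.match(line)
        if match and match.group(1) in config:
            name, value = match.group(1), match.group(2).strip()
            config[name] = int(value) if name == "delay_between" else value
    return config
```

The command file path would then be os.path.join(config["ConfigDir"], "procmon.cmd"), and the monitor would sleep config["delay_between"] seconds between passes.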
If you look at Listing 1, the procmon source code, you will see a require statement, which is how the configuration file is loaded into procmon. The perl require command causes perl to read the named file, procmon.cfg, into the current program. This is a powerful feature of perl: it frees developers to concentrate on the problem they are trying to solve, rather than on the mundane task of processing configuration files.
The procmon Command File
The procmon command file contains the list of processes that are to be monitored. Each entry contains two fields separated by an exclamation point (!): the pattern to search for in the process list, and the command to execute if the pattern is not found. A sample file is shown in Figure 6.
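Reading the command file amounts to splitting each entry at the exclamation point. The Python sketch below assumes one entry per line and, as a convenience not stated in the text, skips blank lines, # comments, and lines without a separator. The function name load_commands and the cron command path in the example are hypothetical.

```python
from typing import List, Tuple


def load_commands(text: str) -> List[Tuple[str, str]]:
    """Parse procmon.cmd-style text into (pattern, command) pairs."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue                 # assumption: blank lines and comments are skipped
        # partition splits at the first '!', so a command may itself contain '!'.
        pattern, sep, command = line.partition("!")
        if not sep or not pattern or not command:
            continue                 # malformed entry: missing separator or empty field
        entries.append((pattern.strip(), command.strip()))
    return entries


print(load_commands("named!/etc/named\ncron!/etc/cron\n"))
```

Splitting only at the first separator keeps the pattern simple while leaving the command line untouched.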
In this file, procmon watches for named and cron. If named is not in the process list, the command /etc/named is started. The same holds true for the cron command. Again, the purpose of keeping this information in a configuration file is to let the system administrator change the list without modifying the program. If the contents of the file change, the procmon daemon must be restarted to read the new list.
The syslog facility records several messages, which are discussed below.
Startup messages are recorded by syslog when procmon starts. The appropriate information is substituted for each <value>. <timestamp> is replaced by the current time through syslog, <PID> is the process identification number of the procmon process, and <system_name> is the name of the system, as recorded by syslog.
<timestamp> <system_name> procmon[<PID>]: Process Monitor started
<timestamp> <system_name> procmon[<PID>]: Loaded config file <value>
<timestamp> <system_name> procmon[<PID>]: Command File: <value>
<timestamp> <system_name> procmon[<PID>]: Loop Delay=<value>
<timestamp> <system_name> procmon[<PID>]: Adding <value> to stored process list
<timestamp> <system_name> procmon[<PID>]: Monitoring : <value> processes
The following messages are printed during monitoring and report the status of the monitored processes.
<timestamp> <system_name> procmon[<PID>]: <process> running as PID
This record is printed after every check, and indicates
that the monitored
process is running.
<timestamp> <system_name> procmon[<PID>]: <process> is NOT running
This record is printed when the monitored process cannot be found in the process list.
<timestamp> <system_name> procmon[<PID>]: Last Failure of <process> @
This record shows when the last (previous) failure of the process occurred.
<timestamp> <system_name> procmon[<PID>]: issuing
<start_command> to system
This record is printed before the identified command is issued.
<timestamp> <system_name> procmon[<PID>]: <start_command>
This record is printed after the command has been issued to the system.
Looking at the syslog may give you clues about the state of things after the command was issued.
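The "Last Failure" record implies that procmon remembers, per process, when it last failed. A minimal Python sketch of that bookkeeping follows. The class name FailureHistory, the choice to emit the "Last Failure" line only when an earlier failure exists, and the timestamp format after the @ are all assumptions, since the exact record format is not reproduced above.

```python
import time
from typing import Dict, List, Optional


class FailureHistory:
    """Remember the previous failure time of each monitored process."""

    def __init__(self) -> None:
        self._last: Dict[str, float] = {}

    def record_failure(self, process: str, now: Optional[float] = None) -> List[str]:
        """Note a failure and return the message texts to send to syslog."""
        now = time.time() if now is None else now
        messages = [f"{process} is NOT running"]
        previous = self._last.get(process)
        if previous is not None:
            # Assumed timestamp format; procmon's own may differ.
            stamp = time.strftime("%b %d %H:%M:%S", time.localtime(previous))
            messages.append(f"Last Failure of {process} @ {stamp}")
        self._last[process] = now
        return messages
```

Keeping only the previous failure time is enough to spot a process that has started dying repeatedly, which is exactly the history inittab does not provide.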
Enhancements and Deficiencies
The procmon code in Listing 1 was written to run on System V systems. It has been in operation successfully since 1994. However, some enhancements would be useful. For example, it would be wise to report a critical message in syslog if the command returns anything other than 0, since a non-zero exit code generally indicates that the command did not start.
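The first suggested enhancement could look like the following Python sketch: issue the command through the shell, as the command file supplies it as a single string, and log at LOG_CRIT when the exit status is non-zero. The function name start_command and the critical message wording are hypothetical. Note that this only catches commands that fail immediately; a daemon that forks into the background and dies later still returns 0, and a command that does not background itself will block the monitor until it exits.

```python
import subprocess
import syslog


def start_command(command: str) -> int:
    """Issue a restart command and escalate to LOG_CRIT if it fails."""
    syslog.syslog(syslog.LOG_INFO, f"issuing {command} to system")
    status = subprocess.call(command, shell=True)
    if status != 0:
        syslog.syslog(syslog.LOG_CRIT, f"{command} exited with status {status}")
    return status
```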
It would also be better to include a BSD option for parsing the ps output, and to add an option in the configuration file to choose between System V and BSD.
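The BSD option mostly comes down to running a different ps command and reading the PID from a different column. The Python sketch below assumes System V ps -ef (PID in the second column) and BSD ps ax (PID in the first column); the style names "sysv" and "bsd", which a configuration variable would select, and the functions find_pid and read_ps are hypothetical.

```python
import subprocess
from typing import Optional

# Assumed command and zero-based PID column for each ps flavor:
#   System V "ps -ef": UID PID PPID C STIME TTY TIME CMD  -> PID is field 1
#   BSD      "ps ax":  PID TT STAT TIME COMMAND           -> PID is field 0
PS_STYLES = {
    "sysv": (["ps", "-ef"], 1),
    "bsd": (["ps", "ax"], 0),
}


def find_pid(pattern: str, ps_output: str, style: str) -> Optional[int]:
    """Return the PID of the first process line containing pattern, or None."""
    _, pid_field = PS_STYLES[style]
    for line in ps_output.splitlines()[1:]:     # skip the header line
        if pattern in line:
            return int(line.split()[pid_field])
    return None


def read_ps(style: str) -> str:
    """Run the ps command for the chosen style and return its output."""
    command, _ = PS_STYLES[style]
    return subprocess.run(command, capture_output=True, text=True, check=True).stdout
```

Keeping the differences in one table means adding another ps flavor is a one-line change rather than a new code path.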
The procmon script helps to ensure that operation-critical applications remain in operation. While a similar mechanism exists through /etc/inittab and the init command, not all systems support it. Moreover, inittab provides no logging or history mechanism for determining whether there is a significant problem to be addressed.
About the Author
Chris Hare is the Operations Manager of fONOROLA i*internet, a Canadian national Internet Access Provider. He is a co-author of the books Inside UNIX and Internet Firewalls and Network Security. Along with his full-time job and writing, he is the president of the Unilabs Research Group, and is presently working on his third book, The UNIX Professional Reference, for New Riders Publishing.
Chris can be reached at firstname.lastname@example.org or email@example.com.