Zombie Processes
Sydney S. Weinstein, CDP, CCP
UNIX is full of colorful terminology (such as killing
children --
oops, I mean child processes), and zombies are a particularly
dramatic
example. Before I explain what a zombie process is,
however, a short
tutorial on reading the process status listing, which
describes UNIX's
process table, is in order.
The UNIX Command Process Status
Each process running under UNIX has its own entry in
the system's
Process Table. This entry describes the process, its
current status,
its accounting information, and its resources. It is
used by the scheduler
to assign CPU resources to the process, and by the system
to account
for which processes are using various other system resources.
Note
that I have been saying processes instead of programs:
since UNIX
has a fork system call, a single program can be made
up of
many processes. Each process has its own entry in the
process table.
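As a minimal sketch of one program becoming two processes, the
following Python fragment calls os.fork, Python's wrapper around the
fork system call. The function name spawn_child and the exit code are
my own illustrative choices, not anything taken from a particular
system.

```python
import os

def spawn_child(exit_code=0):
    """Fork once; in the parent, return (parent_pid, child_pid, child_exit_code)."""
    parent_pid = os.getpid()
    child_pid = os.fork()
    if child_pid == 0:
        # Child: a separate process with its own PID and process table entry.
        os._exit(exit_code)
    # Parent: collect the child's exit status so its entry does not linger.
    _, status = os.waitpid(child_pid, 0)
    return parent_pid, child_pid, os.WEXITSTATUS(status)

if __name__ == "__main__":
    parent, child, code = spawn_child(5)
    print("parent", parent, "forked child", child, "which exited with", code)
```

Both processes run the same program text after the fork; only the
return value of os.fork tells them apart.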
There are two major flavors of UNIX, and therefore two
major flavors
of the process table. Each flavor of process table also
has its own
flavor of ps command. Those UNIX systems based on the
USL System
V model support a ps command that selects by terminal,
process,
or user and has output similar to that shown in Figure
1. The other
ps command format, for systems descended from the Berkeley
Software Distribution (BSD) UNIX, has no direct selection
option except
for current user or all processes. With either command,
the process
table is printed out in order of the entries in the system's process
table, which at times can be a pretty useless order.
Neither ps command provides much detail in its usual full
listing (ps -ef for USL, ps -ax for BSD). Both provide the reference
number or Process ID (PID); the terminal associated
with the command,
if any, or a "?" if the process is not attached
to a terminal;
the accumulated CPU time; and the command. In addition,
USL versions
add the owner of the process, its parent's process ID,
the time it
started executing (was created), and something called
C, which is
a flag that specifies how CPU-intensive a process is,
for use by the
scheduler. The higher the number (roughly a percentage
of CPU utilization
over the last second), the more CPU-intensive a process
is. The BSD
version provides a status flag column, which lists what
state the
process is in.
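To make the System V column layout concrete, here is a rough sketch
that splits ps -ef style lines into the fields just described. The
sample listing is invented for illustration, and the parser assumes
that the start-time column contains no spaces, which is not true of
every ps implementation.

```python
FIELDS = ["uid", "pid", "ppid", "c", "stime", "tty", "time", "cmd"]

# Invented sample in the System V `ps -ef` layout; not from a real system.
SAMPLE = """\
     UID   PID  PPID  C    STIME TTY      TIME CMD
    root     1     0  0 09:00:01 ?        0:02 /etc/init
     syd   214     1  0 09:12:40 tty01    0:01 -sh
     syd   301   214  0 09:30:15 tty01    0:00 <defunct>
"""

def parse_ps_ef(text):
    """Split ps -ef style lines into dictionaries keyed by column name."""
    rows = []
    for line in text.splitlines()[1:]:          # skip the header line
        parts = line.split(None, len(FIELDS) - 1)
        if len(parts) != len(FIELDS):
            continue                            # blank or malformed line
        row = dict(zip(FIELDS, parts))
        row["cmd"] = row["cmd"].rstrip()
        for key in ("pid", "ppid", "c"):
            row[key] = int(row[key])
        rows.append(row)
    return rows

if __name__ == "__main__":
    for row in parse_ps_ef(SAMPLE):
        print(row["pid"], row["ppid"], row["tty"], row["cmd"])
```

Keeping the parent's process ID as a number makes it easy to look up
which process is responsible for any given entry.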
Zombies
Figure 1 and Figure 2 both include a
process with the command
field showing
<defunct>. This is the dreaded zombie process.
The BSD ps
even shows the state as Z (for Zombie). A process becomes
a
zombie (or defunct) when it has actually exited, but
its parent has
not yet picked up the status of that exit. To revert
to traditional
terms: a process is created with a fork and reaped with
a join. The
UNIX fork system call matches the fork concept. The
join is
performed with the wait system call. So a zombie process
has
exited, but its parent has yet to issue the wait system
call.
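The life of a zombie can be sketched in a few lines of Python. This is
only an illustration under some assumptions: it relies on os.waitid
with the WNOWAIT option (available on Linux, not necessarily
elsewhere) to learn that the child has exited without reaping it, and
the helper names entry_exists and zombie_lifecycle are my own.

```python
import os

def entry_exists(pid):
    """True if the system still knows about pid (signal 0 only probes)."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True      # exists, but belongs to someone else
    return True

def zombie_lifecycle(exit_code=7):
    """Return (entry present while zombie, exit code, entry present after wait)."""
    pid = os.fork()
    if pid == 0:
        os._exit(exit_code)              # child exits at once
    # Block until the child has exited, but leave it unreaped (WNOWAIT).
    # From here until the waitpid call below, the child is a zombie.
    os.waitid(os.P_PID, pid, os.WEXITED | os.WNOWAIT)
    before = entry_exists(pid)
    _, status = os.waitpid(pid, 0)       # the "join": reap the zombie
    after = entry_exists(pid)
    return before, os.WEXITSTATUS(status), after

if __name__ == "__main__":
    print(zombie_lifecycle())
```

Between the two wait calls, a ps listing taken from another terminal
would show the child as defunct.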
The wait system call returns several parameters to the
parent
process's information area. These include the exit status and the
amounts of system and user CPU time used by this child process. These
three values (exit status, system CPU time, and user CPU time) need a
place to be remembered until the
parent executes
the wait system call, and that place is in the process
table
entry for the now defunct (or exited) process.
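One way to see all three values arrive together is the BSD-derived
wait4 call, which Python exposes as os.wait4. The sketch below is an
illustration only; the function name run_and_account and the amount of
busy work are assumptions of mine, and older systems deliver the CPU
times through other interfaces.

```python
import os

def run_and_account(work_iterations=200_000, exit_code=3):
    """Fork a child that burns some CPU; return (exit code, user s, system s)."""
    pid = os.fork()
    if pid == 0:
        total = 0
        for i in range(work_iterations):   # use a little user CPU time
            total += i * i
        os._exit(exit_code)
    # wait4 returns the PID, the raw status word, and a resource usage record.
    _, status, usage = os.wait4(pid, 0)
    return os.WEXITSTATUS(status), usage.ru_utime, usage.ru_stime

if __name__ == "__main__":
    code, user_time, system_time = run_and_account()
    print("exit status", code, "user", user_time, "system", system_time)
```

Until the parent makes this call, the kernel has nowhere to put these
numbers except the exited child's own process table entry.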
People always want to kill these zombie processes, and
they wonder
why the kill command has no effect on them. Like true
zombies,
they cannot be killed because they are already dead
-- or, in other
words, because they have already exited. The kill system
call
(actually a misnomer, as you'll see) sends a signal
to a running process.
The running process then reacts to that signal (by ignoring
it, exiting,
or by fielding it). However, a zombie is not a running
process, just
an entry in the process table. Since there is no process
to receive
it, the signal is ignored.
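The following sketch tries to kill a zombie and then checks what the
parent finally collects. As before, it assumes a platform where Python
offers os.waitid and WNOWAIT (Linux, for example), and the function
name kill_a_zombie is hypothetical.

```python
import os
import signal

def kill_a_zombie(exit_code=3):
    """Return (exited normally?, killed by signal?, exit code) for the child."""
    pid = os.fork()
    if pid == 0:
        os._exit(exit_code)
    # Wait until the child has exited, without reaping it: it is now a zombie.
    os.waitid(os.P_PID, pid, os.WEXITED | os.WNOWAIT)
    # The signal is accepted, but there is no running process left to act on it.
    os.kill(pid, signal.SIGKILL)
    _, status = os.waitpid(pid, 0)
    return os.WIFEXITED(status), os.WIFSIGNALED(status), os.WEXITSTATUS(status)

if __name__ == "__main__":
    print(kill_a_zombie())
```

The status the parent collects is still the child's own exit code, not
a death by signal, because the exit had already been recorded before
the signal was sent.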
If zombie entries in the process table do no harm, then
why do people
try to get rid of them? They may not do harm, but they
do take up
process table slots. In the older, pre-SVR4 UNIX kernels,
the process
table is of fixed size, so a large number of zombie
processes can
get in the way. Eventually, the process table fills
up and no new
processes can be created. This is signified by the "can't
fork"
error message.
Reaping Zombies
So how do you get rid of zombies? The only way is to
get the process's
parent to execute the wait system call. Generally, shells
are
the parents of processes. A shell will "reap"
all of its outstanding
children just after it executes a command and before
it prompts for
the next command. So, if the zombies all belong to a
single terminal,
just executing a command (or often just hitting a carriage
return
to force the next prompt) is sufficient to cause the
shell to execute
the wait call in its command loop and reap the zombies
that
are waiting for it.
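The reaping step in such a command loop might look roughly like the
sketch below. It is not taken from any real shell; the function names
are hypothetical, and it uses waitpid with the WNOHANG option so that
the loop collects whatever has already exited without stalling on
children that are still running.

```python
import os
import time

def spawn_children(exit_codes):
    """Fork one short-lived child per exit code; return their PIDs."""
    pids = []
    for code in exit_codes:
        pid = os.fork()
        if pid == 0:
            os._exit(code)      # child: exit at once and await reaping
        pids.append(pid)
    return pids

def reap_exited_children():
    """Collect every child that has already exited, without blocking."""
    reaped = {}
    while True:
        try:
            pid, status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break               # no children at all
        if pid == 0:
            break               # children exist, but none has exited yet
        reaped[pid] = os.WEXITSTATUS(status) if os.WIFEXITED(status) else None
    return reaped

def reap_until(expected_count, timeout=5.0):
    """Poll, as a shell might before each prompt, until enough are reaped."""
    collected = {}
    deadline = time.time() + timeout
    while len(collected) < expected_count and time.time() < deadline:
        collected.update(reap_exited_children())
        time.sleep(0.01)
    return collected

if __name__ == "__main__":
    pids = spawn_children([1, 2, 3])
    print(reap_until(len(pids)))
```

A real shell would call something like reap_exited_children once each
time around its command loop, which is why pressing return is often
enough to clear the zombies.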
If, however, the parent process is hung, it will never
reap its children,
which means that the zombies will stay there. There
is a way around
this problem. When a parent exits, its remaining children
are re-assigned
to process 1, init. init constantly sits in a loop,
reaping children. So killing the parent will cause all
of its child
zombie processes to be reassigned to init, which will
immediately
reap them.
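The re-assignment can be watched with a small experiment, sketched
here for a still-running child (an exited one is handed over the same
way). The function name orphan_new_parent is my own. Note one
assumption: on some modern systems the new parent is a designated
"subreaper" process rather than process 1, so the sketch only reports
that the parent changed, not what it changed to.

```python
import os
import time

def orphan_new_parent(timeout=5.0):
    """Return (pid of the parent that exited, parent pid the orphan sees later)."""
    read_fd, write_fd = os.pipe()
    middle = os.fork()
    if middle == 0:
        # Middle process: create a child, then exit so the child is orphaned.
        os.close(read_fd)
        middle_pid = os.getpid()
        if os.fork() == 0:
            # Grandchild: wait until the system gives us a new parent.
            deadline = time.time() + timeout
            while os.getppid() == middle_pid and time.time() < deadline:
                time.sleep(0.01)
            os.write(write_fd, str(os.getppid()).encode())
            os._exit(0)
        os._exit(0)
    # Original process: reap the middle process, then read the orphan's report.
    os.close(write_fd)
    os.waitpid(middle, 0)
    data = b""
    while True:
        chunk = os.read(read_fd, 64)
        if not chunk:
            break
        data += chunk
    os.close(read_fd)
    return middle, int(data)

if __name__ == "__main__":
    old_parent, new_parent = orphan_new_parent()
    print("parent was", old_parent, "and is now", new_parent)
```

On a traditional system the second number printed is 1, the process ID
of init.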
Coming Events
As you may have noticed, the output of the ps command is not very
useful. It comes out in process table order, rather
than in parent-child
order. I've written a tool in perl that takes the output
of the ps
command and sorts it into a much more usable order.
In a later issue
of Sys Admin, I'll present this perl script and explain
how it works. In the interim, you might want to search
your local
archive site for a copy of the perl utility.
In Summary
Zombies are nothing harmful, just a sign that the parent
process has
yet to collect the status of its sub-processes that
have exited. Causing
the parents to wake up and reap their children will
cause the zombies
to go away.
About the Author
Sydney S. Weinstein, CDP, CCP is a consultant, columnist,
lecturer, author,
professor, and President of Datacomp Systems, Inc.,
a consulting and
contracting programming firm specializing in databases,
data presentation
and windowing, transaction processing, networking, testing
and test suites,
and device management for UNIX and MS-DOS. He can be
contacted care of
Datacomp Systems, Inc., 3837 Byron Road, Huntingdon
Valley, PA 19006-2320
or via electronic mail on the Internet/USENET mailbox
syd@DSI.COM (dsinc!syd
for those who cannot do Internet addressing).