Cover V02, I01
Article
Figure 1
Figure 2

jan93.tar


Zombie Processes

Sydney S. Weinstein, CDP, CCP

UNIX is full of colorful terminology (such as killing children -- oops, I mean child processes), and zombies are a particularly dramatic example. Before I explain what a Zombie process is, however, a short tutorial on reading the process status listing, which describes UNIX's process table, is in order.

The UNIX Command Process Status

Each process running under UNIX has its own entry in the system's Process Table. This entry describes the process, its current status, its accounting information, and its resources. It is used by the scheduler to assign CPU resources to the process, and by the system to account for which processes are using various other system resources. Note that I have been saying processes instead of programs: since UNIX has a fork system call, a single program can be made up of many processes. Each process has its own entry in the process table.

There are two major flavors of UNIX, and therefore two major flavors of the process table. Each flavor of process table also has its own flavor of ps command. Those UNIX systems based on the USL System V model support a ps command that selects by terminal, process, or user and has output similar to that shown in Figure 1. The other ps command format, for systems descended from the Berkeley Software Distribution (BSD) UNIX, has no direct selection option except for current user or all processes. With either command, the process table is printed out in order of the entries in the systems process table, which at times can be a pretty useless order.

Neither ps command provides much detail in its default output (ps -ef for USL, ps -ax for BSD). Both provide the reference number or Process ID (PID); the terminal associated with the command, if any, or a "?" if the process is not attached to a terminal; the accumulated CPU time; and the command. In addition, USL versions add the owner of the process, its parent's process ID, the time it started executing (was created), and something called C, which is a flag that specifies how CPU-intensive a process is, for use by the scheduler. The higher the number (roughly a percentage of CPU utilization over the last second), the more CPU-intensive a process is. The BSD version provides a status flag column, which lists what state the process is in.

Zombies

Figure 1 and Figure 2 both include a process with the command field showing <defunct>. This is the dreaded zombie process. The BSD ps even shows the state as Z (for Zombie). A process becomes a zombie (or defunct) when it has actually exited, but its parent has not yet picked up the status of that exit. To revert to traditional terms: a process is created with a fork and reaped with a join. The UNIX fork system call matches the fork concept. The join is performed with the wait system call. So a zombie process has exited, but its parent has yet to issue the wait system call.

The wait system call returns several parameters to the parent process's information area. These include the exit status and the amount of system and user CPU time used by this child process. These three values need a place to be remembered until the parent executes the wait system call, and that place is in the process table entry for the now defunct (or exited) process.

People always want to kill these zombie processes, and they wonder why the kill command has no effect on them. Like true zombies, they cannot be killed because they are already dead -- or, in other words, because they have already exited. The kill system call (actually a misnomer, as you'll see) sends a signal to a running process. The running process then reacts to that signal (by ignoring it, exiting, or by fielding it). However, a zombie is not a running process, just an entry in the process table. Since there is no process to receive it, the signal is ignored.

If zombie entries in process table do no harm, then why do people try to get rid of them? They may not do harm, but they do take up process table slots. In the older, pre-SVR4 UNIX kernels, the process table is of fixed size, so a large number of zombie processes can get in the way. Eventually, the process table fills up and no new processes can be created. This is signified by the "can't fork" error message.

Reaping Zombies

So how do you get rid of zombies? The only way is to get the process's parent to execute the wait system call. Generally, shells are the parents of processes. A shell will "reap" all of its outstanding children just after it executes a command and before it prompts for the next command. So, if the zombies all belong to a single terminal, just executing a command (or often just hitting a carriage return to force the next prompt) is sufficient to cause the shell to execute the wait call in its command loop and reap the zombies that are waiting for it.

If, however, the parent process is hung, it will never reap its children, which means that the zombies will stay there. There is a way around this problem. When a parent exits, its remaining children are re-assigned to process 1, init. init constantly sits in a loop, reaping children. So killing the parent will cause all of its child zombie processes to be reassigned to init, which will immediately reap them.

Coming Events

If you've noticed, the output of the ps command is not very useful. It comes out in process table order, rather than in parent-child order. I've written a tool in perl that takes the output of the ps command and sorts it into a much more usable order. In a later issue of Sys Admin, I'll present this perl script and explain how it works. In the interim, you might want to search your local archive site for a copy of the perl utility.

In Summary

Zombies are nothing harmful, just a sign that the parent process has yet to collect the status of its sub-processes that have exited. Causing the parents to wake up and reap their children will cause the zombies to go away.

About the Author

Sydney S. Weinstein, CDP, CCP is a consultant, columnist, lecturer, author, professor, and President of Datacomp Systems, Inc., a consulting and contracting programming firm specializing in databases, data presentation and windowing, transaction processing, networking, testing and test suites, and device management for UNIX and MS-DOS. He can be contacted care of Datacomp Systems, Inc., 3837 Byron Road, Huntingdon Valley, PA 19006-2320 or via electronic mail on the Internet/USENET mailbox syd@DSI.COM (dsinc!syd for those who cannot do Internet addressing).