The UNIX Process Management System

Chris Hare

In several articles in Sys Admin, I've discussed various applications that monitor the system and aid in the performance-tuning process. In this article I discuss something that few people understand but that UNIX users deal with constantly: the process management system. I then explain why and how I developed a command I call mon, which is a ps lookalike.

I use a System V environment (SCO UNIX) as the basis for my discussion and I assume some level of C knowledge, as I will be discussing C system header files and code.

What Is the Process Management System?

The UNIX process management system is based on a time-sharing kernel. The kernel manages the placement of each process in the CPU for execution. This process management facility offers kernel routines that create processes, kernel routines that handle interrupts, and a process scheduling mechanism.

A Process Definition

A process is a program in motion. The executable code stored on the disk is not a process but a program.

For every executing process, a process context describes the system resources. UNIX shares the system resources by switching the process contexts. Process contexts consist of (1) text, data, and stack segments contained wholly or partially in memory; and (2) kernel process data structures.

The kernel process data structures consist of:

The process u-area, usually two 4 KB pages, which contains information that is needed only when the process is swapped in and which is a candidate for paging. There is one u-area for every process currently running on the system. The contents of the process u-area are listed in /usr/include/sys/user.h.

A process table entry, which contains information used to determine scheduling. There is one entry in the process table for each process running on this system.

Entries in the file and inode tables, which control access to the system's files and filesystems.

A task state segment, which defines the registers identifying where the instructions are located in memory.

The Task State Segment

The Task State Segment (TSS) is a hardware construct. The TSS contains a copy of all of the registers needed to locate the instructions and data used by the process. These are:

-- the general registers

-- the segment registers

-- the flags register

-- the instruction pointer register

-- the selectors for the process's Local Descriptor Table, and the kernel's Global Descriptor Table

-- the Page Directory Base Register (PDBR, held in CR3), and

-- the read-only stack pointers for the privileged execution levels.

The TSS is highly hardware-dependent. The TSS described here would be found in a segmented architecture machine, such as an Intel 386 or 486 machine. Non-segmented architecture machines, such as the Motorola 680x0 series, will have a very different TSS structure. The TSS structure is listed in /usr/include/sys/tss.h.
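
As a rough illustration of the register list above, the following C sketch lays out the 104-byte TSS that the Intel 386 hardware defines. The structure name and field names are hypothetical, chosen for readability; they are not taken from /usr/include/sys/tss.h, which may name and group the fields differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch of the 32-bit Intel 386 TSS layout.
 * Names are illustrative only and are NOT those used in sys/tss.h. */
struct tss386_sketch {
    uint32_t back_link;               /* selector of the previous task's TSS */
    uint32_t esp0, ss0;               /* stack pointer and segment, level 0 */
    uint32_t esp1, ss1;               /* stack pointer and segment, level 1 */
    uint32_t esp2, ss2;               /* stack pointer and segment, level 2 */
    uint32_t cr3;                     /* page directory base */
    uint32_t eip;                     /* instruction pointer */
    uint32_t eflags;                  /* flags register */
    uint32_t eax, ecx, edx, ebx;      /* general registers */
    uint32_t esp, ebp, esi, edi;
    uint32_t es, cs, ss, ds, fs, gs;  /* segment registers (low 16 bits used) */
    uint32_t ldt;                     /* Local Descriptor Table selector */
    uint16_t trap;                    /* debug trap flag in bit 0 */
    uint16_t iomap_base;              /* offset of the I/O permission bitmap */
};

/* Check that the sketch matches the hardware-defined size and offsets. */
void tss_sketch_check(void)
{
    assert(sizeof(struct tss386_sketch) == 104);
    assert(offsetof(struct tss386_sketch, cr3) == 28);
    assert(offsetof(struct tss386_sketch, ldt) == 96);
}
```

The stack pointers for levels 0 through 2 are loaded by the processor on a privilege change but are never written back by it, which is why they are described above as read-only.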

The Binary Executable File Structure

The structure of an executable or binary file is related to the TSS because of the hardware dependence of the CPU architectures. There are several types, the two most important being the Object Module Format, which is generally used by Microsoft and XENIX, and the Common Object File Format, which is typical on SCO and AT&T UNIX systems, as well as others. Regardless of the binary type, each has several distinct components:

a text segment, which contains the actual machine instructions to be executed

a data segment, which contains the variables and structures used by the program

other tables and structures, which contain other useful information about the program, including a symbol table and a comment section.

At the beginning of every binary executable is a header describing the contents of the file. This header identifies the type of binary (80286 versus 80386, for example), the size and offset of the text and data segments, the size and offset of the symbol table, and the entry point of the program.
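
To make the header description concrete, here is a minimal sketch that decodes a few of these fields from the start of a COFF file held in memory. It assumes the standard little-endian COFF layout: a 20-byte file header followed by a 28-byte a.out optional header. The coff_summary structure and the function names are hypothetical and are not taken from the SCO header files.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoded view of the header; names are illustrative only. */
struct coff_summary {
    uint16_t magic;       /* binary type, e.g. 0x14c for i386 COFF */
    uint16_t nsections;   /* number of section headers */
    uint32_t symtab_off;  /* file offset of the symbol table */
    uint32_t nsyms;       /* number of symbol table entries */
    uint32_t text_size;   /* size of the text segment in bytes */
    uint32_t data_size;   /* size of the initialized data in bytes */
    uint32_t entry;       /* entry point address */
};

/* Read little-endian 16-bit and 32-bit values from a byte buffer. */
static uint16_t rd16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 0 on success, or -1 if the buffer is too short or the file
 * carries no a.out optional header (as with an unlinked object file). */
int coff_summarize(const unsigned char *buf, size_t len,
                   struct coff_summary *out)
{
    const unsigned char *aout;
    uint16_t opthdr_size;

    assert(buf != NULL && out != NULL);
    if (len < 20)
        return -1;
    out->magic      = rd16(buf + 0);
    out->nsections  = rd16(buf + 2);
    out->symtab_off = rd32(buf + 8);   /* bytes 4..7 hold a timestamp */
    out->nsyms      = rd32(buf + 12);
    opthdr_size     = rd16(buf + 16);
    if (opthdr_size < 28 || len < 20 + 28)
        return -1;
    aout = buf + 20;                   /* optional header follows directly */
    out->text_size = rd32(aout + 4);
    out->data_size = rd32(aout + 8);
    out->entry     = rd32(aout + 16);
    return 0;
}
```

A caller would read the first 48 bytes of the file into a buffer and pass them to coff_summarize(). Decoding byte by byte, rather than casting the buffer to a structure, keeps the sketch independent of the host's byte order and structure padding.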

Binary programs on SCO UNIX may be of the Intel Object Module Format (OMF) or the AT&T Common Object File Format (COFF). The layout of a binary under SCO UNIX is shown in Figure 1.

Creation of a Process

When a user issues a command to the command interpreter, the fork(S) system call is executed. fork() creates a new entry in a kernel table known as the process table. The process table is fixed in size. I will discuss it in more detail later in this article.

Through the fork() mechanism, only an existing process may create a new process. The fork() call copies the original process to a new process known as the child, and then executes the child process, which may use an exec(S) system call to load another program text for the desired process.

fork() gives the parent process the Process ID (PID) number of the child, and gives the child process a value of zero. By this mechanism, programmers can develop processes that behave differently depending on whether the processes perceive themselves as parent or child. The sample Perl code in Figure 3 illustrates a small program using fork and exec. The fork causes the child program to be started, which prints the date. The parent continues execution, sleeps for 10 seconds, prints the message "the parent is dead", and exits.
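
The same parent and child pattern can be sketched in C. This is not the Perl program from Figure 3: instead of sleeping, the parent here waits for the child and reports its exit status. The function name spawn_and_wait is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child, run argv[0] in it, and wait for it to finish.
 * Returns the child's exit status, or -1 if fork or wait failed
 * or the child did not exit normally. */
int spawn_and_wait(char *const argv[])
{
    pid_t pid;
    int status;

    assert(argv != NULL && argv[0] != NULL);

    pid = fork();
    if (pid < 0)
        return -1;               /* fork failed; no child was created */

    if (pid == 0) {
        /* Child: fork() returned zero. Replace the copied program
         * text with the requested program. */
        execvp(argv[0], argv);
        _exit(127);              /* reached only if the exec failed */
    }

    /* Parent: fork() returned the child's PID. */
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, passing the argument vector { "date", NULL } would run the date command in the child, much as the Figure 3 program does.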

fork() is accomplished by two kernel routines known as newproc and procdup. Collectively, these two routines allocate a new PID number and create an entry in the process table for the process, then perform the steps listed above.

Process Execution

An exec() system call initially handles the process execution. exec() creates and initializes the context for the new process. If there isn't already a copy of the process running in the system, then a process region is assigned for its text segment (executable code), data segment, and stack.

A process region is a data structure that describes the segment in memory. For example, it includes the type of segment (text, data, stack), how many memory pages are in the region, the number of processes sharing the region, and more. On SCO UNIX systems, the region table and associated structures are defined in /usr/include/sys/region.h.
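
The fields named above can be pictured with a small sketch. The structure and function names below are hypothetical and far simpler than the real definitions in /usr/include/sys/region.h. The sketch only illustrates how a count of sharing processes lets several processes use one text region.

```c
#include <assert.h>
#include <stddef.h>

enum seg_type { SEG_TEXT, SEG_DATA, SEG_STACK };

/* Hypothetical, simplified region descriptor; not the SCO structure. */
struct region_sketch {
    enum seg_type type;   /* kind of segment the region describes */
    unsigned pages;       /* number of memory pages in the region */
    unsigned refcnt;      /* number of processes sharing the region */
};

/* A process starts using the region, e.g. a second copy of a program
 * attaching to text that is already in memory. */
void region_attach(struct region_sketch *r)
{
    assert(r != NULL);
    r->refcnt++;
}

/* A process stops using the region. Returns 1 when no process is left
 * and the region could be freed, 0 otherwise. */
int region_detach(struct region_sketch *r)
{
    assert(r != NULL && r->refcnt > 0);
    r->refcnt--;
    return r->refcnt == 0;
}
```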

UNIX creates four processes on system startup that exist for the lifetime of the system. These processes are the memory scheduler, the paging daemon, the buffer flushing daemon, and the init process. It is important to note that all four of these processes are running in kernel mode, not user mode.

Kernel Mode

In kernel mode, processes are not preemptible: the process holds the CPU until it gives it up voluntarily or its time slice expires. While a process executes in kernel mode, signals are saved until the process exits kernel mode, whereupon the signals are processed. This is illustrated in Figure 2. [Editor's note: Zombie processes and processes that hang the system are frequently those trapped in kernel mode. They can't be interrupted in kernel mode. Even signal 9 may not get through.] Some processes, such as zombies, are in a state where they do not respond to signals, typically because they are stuck in kernel mode. The process priorities are listed when you give the appropriate options to the ps command (ps -el on SCO and AT&T systems).

The range for process priorities is 0 to 127, with 0 being the highest priority and 127 the lowest. Priorities 0 to 39 indicate kernel mode, and 40 to 127 indicate user mode. It is important to note that processes whose priority values are 26 or greater can respond to signals, and processes whose priority values are less than 26 will not respond to signals. The more common signals and their values are listed in Figure 4.
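
The priority ranges just described can be written down as two small predicates, which may help when reading the priority column of a ps -el listing. The macro and function names are hypothetical; only the numeric boundaries come from the text above.

```c
#include <assert.h>

#define PRI_LOWEST       127  /* numerically largest value, lowest priority */
#define PRI_FIRST_USER    40  /* 0 to 39 is kernel mode, 40 to 127 user mode */
#define PRI_FIRST_SIGNAL  26  /* values below this do not respond to signals */

/* Returns 1 if the priority value indicates kernel mode. */
int pri_is_kernel_mode(int pri)
{
    assert(pri >= 0 && pri <= PRI_LOWEST);
    return pri < PRI_FIRST_USER;
}

/* Returns 1 if a process at this priority can respond to signals. */
int pri_accepts_signals(int pri)
{
    assert(pri >= 0 && pri <= PRI_LOWEST);
    return pri >= PRI_FIRST_SIGNAL;
}
```

Note that the two boundaries differ: a process at priority 30 is in kernel mode yet can still respond to signals, while one at priority 20 cannot.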

User Mode

User mode is all other states of execution. A user process can only execute instructions from its own text segment, reference its own data segment, and use its own stack.

Some instructions are privileged and require kernel mode to execute. User processes get access to kernel mode by using system calls, predefined kernel routines such as open(), read(), and write(), or loadable device driver routines. Once in kernel mode, the process can execute instructions from the kernel's text segment, access the kernel's data structures, and use a system stack in the kernel's u-area.

Switching from user to kernel mode is not a context switch, but a mode switch. The running process continues to execute after a mode switch. With a context switch, a new TSS is loaded and a new process begins execution.

The System Processes

The memory scheduler -- sched, swapper, or PID 0 -- is responsible for swapping processes in and out of RAM according to their priority and the available memory on the system. Most UNIX systems today perform demand paging rather than swapping, as older UNIX systems did. (See the sidebar, "Paging and Swapping under SCO UNIX," for discussion of demand paging and swapping.)

The paging daemon, typically vhand, or PID 2, steals pages of memory that have not been recently referenced, for use by the system or other processes. If the page contains data or stack segments, then it is saved to the swap device for later retrieval. If the page contains program text, the page is simply reused, since the text can be read again from the executable file.
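
The decision the paging daemon makes for each page can be summarized in a few lines. This is only a sketch of the rule stated above, with hypothetical names; it says nothing about how vhand actually scans memory or tracks page references.

```c
#include <assert.h>

enum page_kind { PAGE_TEXT, PAGE_DATA, PAGE_STACK };

enum steal_action {
    LEAVE_IN_PLACE,   /* recently referenced, so not a candidate */
    SAVE_THEN_REUSE,  /* data or stack: write to the swap device first */
    REUSE_DIRECTLY    /* program text: nothing needs to be saved */
};

/* Decide what to do with one page, following the rule in the text. */
enum steal_action steal_decision(enum page_kind kind, int recently_referenced)
{
    assert(kind == PAGE_TEXT || kind == PAGE_DATA || kind == PAGE_STACK);
    if (recently_referenced)
        return LEAVE_IN_PLACE;
    if (kind == PAGE_TEXT)
        return REUSE_DIRECTLY;
    return SAVE_THEN_REUSE;
}
```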

The buffer flushing daemon, usually bdflush, or PID 3, flushes "dirty" buffers that have been in the cache for too long.

Finally, the init process is the first true user process that is executed. When entering multiuser mode, init creates all of the gettys used to permit login to the system.

Why does /unix not show up in a ps listing? The kernel, /unix on SCO systems, consists of four distinct parts that execute asynchronously, not as a single entity visible by name in a ps listing. These parts are:

the code to initialize the hardware and kernel data structures

the three system processes already mentioned

the system call support for user processes, and

the exception and interrupt handling support for the hardware.

The Process Table

The process table is configured to contain a maximum number of processes that the system can handle. This maximum number may be based on the amount of RAM installed on the machine, the number of users who will use it, or other site-dependent criteria. On many machines, the size of this table is configurable, but on some it is not.

The process table is a list of process structures. A process structure on an SCO system is defined in /usr/include/sys/proc.h. The table contains an entry for every process that the system is currently executing, whether or not that process is actually running.

The configurability of the process table relates to the number of entries, which is defined by the kernel variable NPROC. This defines the maximum size of the process table. When the table is full, the message "no more process" appears on the system console.
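
The effect of a fixed-size table can be shown with a toy model. Everything here is hypothetical: the real process structure in /usr/include/sys/proc.h has many more fields, and NPROC_SKETCH is set to a tiny value only so that the full-table case is easy to reach.

```c
#include <assert.h>
#include <stddef.h>

#define NPROC_SKETCH 4  /* stand-in for the kernel's NPROC; assumed value */

/* Hypothetical, minimal process table entry. */
struct proc_sketch {
    int in_use;  /* nonzero while the slot holds a process */
    int pid;     /* process ID assigned when the slot was allocated */
};

struct proc_table_sketch {
    struct proc_sketch slot[NPROC_SKETCH];
    int next_pid;
};

/* Find a free slot and assign it a PID. Returns NULL when every slot
 * is taken, which is the condition under which a fork would fail. */
struct proc_sketch *proc_alloc(struct proc_table_sketch *t)
{
    int i;

    assert(t != NULL);
    for (i = 0; i < NPROC_SKETCH; i++) {
        if (!t->slot[i].in_use) {
            t->slot[i].in_use = 1;
            t->slot[i].pid = t->next_pid++;
            return &t->slot[i];
        }
    }
    return NULL;  /* table full */
}

/* Release a slot so that a later allocation can reuse it. */
void proc_free(struct proc_sketch *p)
{
    assert(p != NULL && p->in_use);
    p->in_use = 0;
}
```

Because the table never grows, the only way to make room in a full table is for an existing process to exit and give up its slot.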

Building a Custom Process Statistic Program: mon.c

I built mon.c (Listing 1), a ps command lookalike, to explore the system structures that control the creation, execution, and scheduling of processes. Sample output from mon.c is shown in Table 1. If you can't get to the u-area, then you can list the processes, but you can't tell which process is which. The code assumes that if the sysi86() function returns a negative number, then the process no longer exists other than in the process table; that is, the process is defunct.

About the Author

Chris Hare is Ottawa Technical Services Manager for Choreo Systems, Inc. He has worked in the UNIX environment since 1986 and in 1988 became one of the first SCO authorized instructors in Canada. He teaches UNIX introductory, system administration, and programming classes. His current focus is on networking, Perl, and X. Chris can be reached at chare@choreo.ca, or at his home address, chare@unilabs.org.