In several articles in Sys Admin, I've discussed various
applications
that monitor the system and aid in the performance-tuning
process.
In this article I discuss something that few people
understand but
that UNIX users deal with constantly: the process management
system.
I then explain why and how I developed a command I call
mon,
which is a ps lookalike.
I use a System V environment (SCO UNIX) as the basis
for my discussion
and I assume some level of C knowledge, as I will be
discussing C
system header files and code.
The UNIX process management system is based on a time-sharing
kernel.
The kernel manages placement of each process in the
CPU for execution.
This process management facility offers kernel routines
that create
processes; kernel routines that handle interrupts; and
a process scheduling
mechanism.
A process is a program in motion. The executable code
stored on the
disk is not a process but a program.
For every executing process, a process context describes
the system
resources. UNIX shares the system resources by switching
the process
contexts. Process contexts consist of (1) text, data,
and stack segments contained wholly or partially in
memory; (2) kernel process data structures; and (3) a
task state segment, which defines the registers identifying
where the instructions are located in memory.
The Task State Segment
The Task State Segment (TSS) is a hardware construct.
The TSS contains
a copy of all of the registers needed to locate the
instructions and
data used by the process. These are:
-- the general registers
-- the segment registers
-- the flags register
-- the instruction pointer register
-- the selectors for the process's Local Descriptor
Table, and the kernel's Global Descriptor Table
-- the Page Descriptor Register, and
-- the read-only stack pointers for the privileged
execution levels.
The TSS is highly hardware-dependent. The TSS described
here would
be found in a segmented architecture machine, such as
an Intel 386
or 486 machine. Non-segmented architecture machines,
such as the Motorola
680x0 series, will have a very different TSS structure.
The TSS structure
is listed in /usr/include/sys/tss.h.
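To make the register list above concrete, here is a minimal C sketch
of the 104-byte hardware TSS layout on a 32-bit Intel 386. It is not
a copy of /usr/include/sys/tss.h: the structure name, the field names,
and the helper function are hypothetical, and the SCO header may
arrange or name things differently.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative 32-bit Intel 386 TSS layout. All names here are
 * assumptions for this sketch, not the SCO definitions. Selector
 * fields occupy the low 16 bits of their 32-bit slot. */
struct tss386_sketch {
    uint32_t back_link;          /* selector of the previous task */
    uint32_t esp0, ss0;          /* stack pointer/segment, privilege level 0 */
    uint32_t esp1, ss1;          /* stack pointer/segment, privilege level 1 */
    uint32_t esp2, ss2;          /* stack pointer/segment, privilege level 2 */
    uint32_t cr3;                /* page directory base register */
    uint32_t eip;                /* instruction pointer */
    uint32_t eflags;             /* flags register */
    uint32_t eax, ecx, edx, ebx; /* general registers */
    uint32_t esp, ebp, esi, edi;
    uint32_t es, cs, ss, ds, fs, gs; /* segment registers */
    uint32_t ldt;                /* selector for the Local Descriptor Table */
    uint32_t trap_iomap;         /* debug trap bit and I/O map base */
};

/* Compile-time check that the sketch matches the 104-byte hardware size. */
_Static_assert(sizeof(struct tss386_sketch) == 104, "TSS must be 104 bytes");

/* Hypothetical helper: report the size of the sketch structure. */
size_t tss386_sketch_size(void)
{
    return sizeof(struct tss386_sketch);
}
```

On a context switch, the processor saves the outgoing process's
registers into its TSS and loads the incoming process's registers
from another, which is why every field here corresponds to CPU state.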
The Binary Executable File Structure
The structure of an executable or binary file is related
to the TSS
because of the hardware dependence of the CPU architectures.
There
are several types, the two most important being the
Object Module
Format, which is generally used by Microsoft and XENIX,
and the Common
Object File Format, which is typical on SCO and AT&T
UNIX systems,
as well as others. Regardless of the binary type, each
has several
distinct components:
-- a text segment, which contains the actual
machine instructions to be executed
-- a data segment, which contains the variables
and structures used by the program
-- other tables and structures, which contain
other useful information about the program, including
a symbol table and a comment section.
At the beginning of every binary executable is a header
describing
the contents of the file. This header identifies the
type of binary (80286 versus 80386, for example), the
size and offset of the text and data segments, the size
and offset of the symbol table, and the entry point
of the program.
Binary programs on SCO UNIX may be of the Intel Object
Module Format
(OMF) or the AT&T Common Object File Format (COFF).
The layout of
a binary under SCO UNIX is shown in Figure 1.
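As an illustration of how such a header can be read, the following C
sketch decodes the 20-byte file header that begins a COFF binary. The
f_ field names follow the conventional COFF file header; the structure
name, the macro names, and the function names are assumptions made for
this example. The sketch covers only the file header. The section sizes
and offsets and the entry point live in the section headers and the
optional header that follow it, which are not decoded here.

```c
#include <stddef.h>
#include <stdint.h>

#define COFF_FILHSZ    20     /* on-disk size of the COFF file header */
#define COFF_I386MAGIC 0x14c  /* magic number for Intel 386 COFF */

/* Illustrative in-memory copy of the COFF file header. */
struct coff_filehdr_sketch {
    uint16_t f_magic;   /* identifies the target machine */
    uint16_t f_nscns;   /* number of section headers */
    uint32_t f_timdat;  /* creation time stamp */
    uint32_t f_symptr;  /* file offset of the symbol table */
    uint32_t f_nsyms;   /* number of symbol table entries */
    uint16_t f_opthdr;  /* size of the optional header */
    uint16_t f_flags;   /* file flags */
};

/* Read little-endian values byte by byte, so the sketch does not
 * depend on the byte order or struct padding of the host. */
static uint16_t le16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decode the first COFF_FILHSZ bytes of a binary into *out.
 * Returns 0 on success, -1 if the arguments are unusable. */
int coff_parse_filehdr(const unsigned char *buf, size_t len,
                       struct coff_filehdr_sketch *out)
{
    if (buf == NULL || out == NULL || len < COFF_FILHSZ)
        return -1;
    out->f_magic  = le16(buf);
    out->f_nscns  = le16(buf + 2);
    out->f_timdat = le32(buf + 4);
    out->f_symptr = le32(buf + 8);
    out->f_nsyms  = le32(buf + 12);
    out->f_opthdr = le16(buf + 16);
    out->f_flags  = le16(buf + 18);
    return 0;
}
```

A caller would read the first 20 bytes of the file into a buffer, call
coff_parse_filehdr(), and compare f_magic against COFF_I386MAGIC to
decide whether the file is a 386 COFF binary.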
Creation of a Process
When a user issues a command to the command interpreter,
the interpreter executes the fork(S) system call. fork()
creates a new entry in a kernel table known as the
process table. The process
table is fixed
in size. I will discuss it in more detail later in this
article.
Through the fork() mechanism, only an existing process
may create a new process. The fork() call copies the
original
process to a new process known as the child, and then
executes the
child process, which may use an exec(S) system call
to load
another program text for the desired process.
fork() gives the parent process the Process ID (PID)
number
of the child, and gives the child process a value of
zero. By this
mechanism, programmers can develop processes that behave
differently
depending on whether the processes perceive themselves
as parent or
child. The sample Perl code in Figure 3 illustrates
a small program using fork and exec. The fork starts
the child program, which prints the date. The parent
continues execution, sleeps for 10 seconds, prints the
message "the parent is dead", and exits.
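The same fork-then-exec pattern can be sketched in C. This is not a
translation of Figure 3: the function name run_child is hypothetical,
and the parent here waits for the child and returns its exit status
rather than sleeping. The sketch relies on the return value of fork()
to decide which branch each process takes.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child, exec the given program in it, and wait for it.
 * Returns the child's exit status, or -1 on failure.
 * run_child is a name invented for this sketch. */
int run_child(const char *path, char *const argv[])
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;            /* fork failed, no child was created */

    if (pid == 0) {
        /* Child: fork() returned zero. Replace the copied program
         * text with the requested program. */
        execv(path, argv);
        _exit(127);           /* reached only if execv() failed */
    }

    /* Parent: fork() returned the child's PID. */
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, calling run_child("/bin/date", argv) with argv set to
{"date", NULL} would run date in the child while the parent blocks
until it finishes, assuming date lives at that path on the system.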
fork() is accomplished by two kernel routines known
as newproc
and procdup. Collectively, these two routines allocate
a new
PID number and create an entry in the process table
for the process,
then perform the steps listed above.
Process Execution
An exec() system call initially handles the process
execution.
exec() creates and initializes the context for the new
process.
If there isn't already a copy of the process running
in the system,
then a process region is assigned for its text segment
(executable
code), data segment, and stack.
A process region is a data structure that describes
the segment in
memory. For example, it includes the type of segment
(text, data, stack),
how many memory pages are in the region, the number
of processes sharing
the region, and more. On SCO UNIX systems, the region
table and associated
structures are defined in /usr/include/sys/region.h.
UNIX creates four processes on system startup that exist
for the lifetime
of the system. These processes are the memory scheduler,
the paging
daemon, the buffer flushing daemon, and the init process.
It is important
to note that the first three of these processes run
in kernel mode, not user mode; init, as discussed later,
is the first true user process.
Kernel Mode
In kernel mode, processes are not preemptible: the
process holds the CPU until it gives up the CPU
voluntarily or its time-slice expires. While a
process executes
in kernel mode, signals are saved until the process
exits kernel mode,
whereupon the signals are processed. This is illustrated
in Figure 2.
[Editor's note: Zombie processes and processes that
hang the system
are frequently those trapped in kernel mode. They can't
be interrupted
in kernel mode. Even signal 9 may not get through.]
In some situations,
a process is in a state where it does not respond to
signals; zombie processes are one example. These
processes (and others) do not respond because they are
typically stuck in kernel mode. The ps command, given
the appropriate options (ps -el on SCO and AT&T
systems), lists the process priorities.
The range for process priorities is 0 to 127, with 0
being highest
priority, and 127 being the lowest. Priorities 0 to
39 indicate kernel
mode, and 40 to 127 indicate user mode. It is important
to note that processes whose priority values are 26
or greater can respond to signals, and processes whose
priority values are less than 26 will not respond to
signals. The more common signals and their values are
listed in Figure 4.
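The priority ranges above can be expressed as a few small C
predicates, which may be handy when interpreting the PRI column of
ps -el output. The constant and function names are hypothetical; only
the numeric boundaries (0, 26, 40, and 127) come from the discussion
above.

```c
#include <stdbool.h>

/* Boundaries taken from the text; the names are invented here. */
enum {
    PRI_HIGHEST     = 0,    /* highest priority */
    PRI_SIGNAL_MIN  = 26,   /* values below this do not respond to signals */
    PRI_USER_MIN    = 40,   /* values from here up indicate user mode */
    PRI_LOWEST      = 127   /* lowest priority */
};

/* True if pri lies in the legal range 0 to 127. */
bool pri_is_valid(int pri)
{
    return pri >= PRI_HIGHEST && pri <= PRI_LOWEST;
}

/* True if pri indicates kernel mode (0 to 39). */
bool pri_is_kernel_mode(int pri)
{
    return pri_is_valid(pri) && pri < PRI_USER_MIN;
}

/* True if a process at this priority can respond to signals
 * (26 to 127). */
bool pri_accepts_signals(int pri)
{
    return pri_is_valid(pri) && pri >= PRI_SIGNAL_MIN;
}
```

Note that the two ranges overlap: a process at priority 30 is in
kernel mode yet can still respond to signals, while one at priority
20 is in kernel mode and cannot.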
User Mode
User mode is all other states of execution. A user process
can only
execute instructions from its own text segment, reference
its own
data segment, and use its own stack.
Some instructions are privileged and require kernel
mode to execute.
User processes get access to kernel mode by using system
calls, predefined
kernel routines such as open(), read(), and write(),
or loadable device driver routines. Once in kernel mode,
the process
can execute instructions from the kernel's text segment,
access the
kernel's data structures, and use a system stack in
the kernel's u-area.
Switching from user to kernel mode is not a context
switch, but a
mode switch. The running process continues to execute
after a mode
switch. With a context switch, a new TSS is loaded and
a new process
begins execution.
The System Processes
The memory scheduler -- sched, swapper, or PID 0 --
is responsible
for swapping processes in and out of RAM according to
their priority
and the available memory on the system. Most UNIX systems
today perform
demand paging rather than swapping, as older UNIX systems
did. (See
the sidebar, "Paging and Swapping under SCO UNIX,"
for discussion
of demand paging and swapping.)
The paging daemon, typically vhand, or PID 2, steals
pages of memory that have not been recently referenced
so that the system or other processes can use them.
If the page contains data or stack segments, it is
saved to the swap device for later retrieval. If the
page contains program text, the page is simply reused.
The buffer flushing daemon, usually bdflush, or PID
3, flushes "dirty" buffers that have been in the cache
for too long.
Finally, the init process is the first true user process
that is executed. When entering multiuser mode, init
creates
all of the gettys used to permit login to the system.
Why does /unix not show up in a ps listing? The kernel,
/unix on SCO systems, consists of four distinct parts
that
execute asynchronously, not as a single entity visible
by name in
a ps listing. These parts are: