Easy Troubleshooting with crash
crash is a window into the UNIX kernel: you can use it
to learn just about anything regarding the state of the
system. crash is like a debugger for UNIX; it debugs the kernel,
optionally using a mainstore dump, and displays the
value of kernel
data, which includes processes, files, memory, and system
While crash is full of features for kernel hackers,
it is also
useful for system administrators and programmers. When
you have exhausted
the information provided by more common UNIX tools,
you can use crash
to dig as deep as you want into the operating system.
If ps does not tell you enough about a program, you can learn
more from crash. Ncheck will tell you which programs have
a file open, but crash will tell you which files a program
has open. When deadlock occurs, crash will tell you who is
holding locks and which files are locked. If your system
crashes, you can use crash to read the mainstore dump to find
out which program caused the failure.
crash Front-End Programs
This article presents two perl programs that work as
front ends to crash. You can use them as real scripts to
look at a process's system call stack and find out the
amount of memory a process is using, or you can use them
as examples to walk through crash and learn some basic
commands. They will work on the AT&T 3B2 and on AT&T
System V/386 Release 3.2 for Intel platforms. They will
probably not work on other variations of UNIX, although
modifications may be minor for those that come directly
from SVR3.
Tracing a Process
The first sample program shows the system call trace
for a process. This can be helpful if a program is not
working as expected and you want to see what it is doing.
To look at the process trace, you use two commands, proc
and trace.
The proc command lets you view the process table. Much of
the information is the same as what ps displays, but proc
also provides a slot number so that you can use other
commands to learn more about the program. The command
syntax is:
proc [-e] [-f] [-w filename] [-r] [process slot|#process id]
You can use p as an alias for proc. The -e, -f,
and -w options work as described in the sidebar "Invoking
crash." The -r option causes only runnable processes
to be displayed; on a running system it will show only
the crash program itself. If you want to look at specific
processes, you can enter either a slot number or a pound
sign followed by a process id. The slot number is an index
into an internal kernel table used to track processes.
The output of proc resembles that of ps. The first column
has a slot number, which is used as a parameter for the
trace command. ST shows the state of the process. PID is the
process id. The EVENT column lists an address that can be
used to find out what object the program is waiting on. The
NAME field shows the name of the process and the FLAGS
section lists flags that contain more information on the
state of the program. To find out what the flags mean, try
looking for them in /usr/include/sys/proc.h.
Use a case-insensitive search when looking for flag names.
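One way to do that lookup is sketched below in Python (not the perl
of the listings). It scans header text for #define lines whose macro
name contains the text you saw in the FLAGS column, ignoring case.
The name find_flag_defines is hypothetical, and the sketch assumes
the flags are declared as ordinary #define lines.

```python
import re

# Hypothetical helper; it is not part of crash or of the listings.
# Matches "#define NAME rest-of-line", allowing "# define" as well.
DEFINE_RE = re.compile(r"^\s*#\s*define\s+(\w+)\s*(.*)$")

def find_flag_defines(header_text, flag):
    """Return (name, rest_of_line) pairs for every #define whose macro
    name contains `flag`, compared without regard to case."""
    wanted = flag.lower()
    matches = []
    for line in header_text.splitlines():
        m = DEFINE_RE.match(line)
        if m and wanted in m.group(1).lower():
            matches.append((m.group(1), m.group(2).strip()))
    return matches
```

To use it, read /usr/include/sys/proc.h into a string and pass it in
together with the flag text, for example
find_flag_defines(open("/usr/include/sys/proc.h").read(), "load").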
The first step, then, to viewing a stack trace is to use
the process id as an argument to proc to find out what the
process slot value is. Once you have this number, you can
use the trace command to view the stack trace. The trace
command has this syntax:
trace(t) [-e] [-w filename] [-r] [- i[-p] st_addr]
You can use t as an alias for trace. The
-e and -w options work as described in the sidebar "Invoking
crash." The other options are used to dig further into
the stack trace and will not be described here.
The first columns of the output list various forms of
addresses. Following the addresses are system call names.
If the program is not executing a system call, you will not
get a trace, just a message that the program is in user
space. The most recent system call is listed first. The
output often lists calls such as read and open, so you have
some idea of what the process is doing. Arguments to the
calls are also listed, and you can use these to find out
what file or event the process is acting on. (This process
is more involved; I will elaborate on it in a later article.)
A Perl Utility to Show System Call Stack
Listing 1 is a perl program that shows the stack trace
of a given process. It takes a process id as the argument
and calls get_pslot to find out the slot number of the
process. The slot number is then passed to get_trace, which
returns the stack trace. The trace is printed and the
program exits.
get_pslot opens crash and then enters the command "p
#pid", where pid is the process id. The output side
is then closed to force crash to flush its output. The
function reads the crash output until it finds the last
header line; if the process id is valid, the next line
contains the slot number. The function then returns the
slot number. If the process id is invalid, then the
displayed line "pid not valid process id" will be
caught and the function will return a -1.
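The same steps can be sketched in Python rather than perl. This is
only an illustration under stated assumptions: the names run_crash
and parse_slot are hypothetical, crash is assumed to be on the PATH
and to exit when its input ends, and the header line is recognized
by the PID and EVENT column names mentioned earlier.

```python
import subprocess

def run_crash(commands):
    """Feed `commands` to crash on stdin and return what it prints.
    subprocess.run closes stdin after writing, so crash sees end of
    input. Assumes `crash` is on PATH (it usually needs root)."""
    result = subprocess.run(["crash"], input=commands,
                            capture_output=True, text=True)
    return result.stdout

def parse_slot(output):
    """Return the slot number from the output of "p #pid", or -1.
    Assumption: the header is the line holding both PID and EVENT,
    and the slot number is the first field of the line after it."""
    lines = output.splitlines()
    if any("not valid process id" in line for line in lines):
        return -1
    header_at = -1
    for i, line in enumerate(lines):
        fields = line.split()
        if "PID" in fields and "EVENT" in fields:
            header_at = i              # remember the last header seen
    if header_at < 0 or header_at + 1 >= len(lines):
        return -1
    fields = lines[header_at + 1].split()
    if not fields or not fields[0].isdigit():
        return -1
    return int(fields[0])

def get_pslot(pid):
    """Ask crash for the process-table slot of `pid`."""
    return parse_slot(run_crash("p #%d\n" % pid))
```

Keeping the parsing in its own function means you can try it on
saved crash output before running it against a live system.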
get_trace opens crash and then enters the command "t
process-slot" to display the system stack trace. The output
is read into an array and returned.
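A comparable sketch of get_trace in Python follows. The helper
crash_session is a hypothetical name, and the sketch assumes crash
is on the PATH and ends its session when stdin is closed. The
session argument is there only so the function can be exercised
without a real crash binary.

```python
import subprocess

def crash_session(commands):
    """Run crash, send it `commands`, and return its output lines.
    Assumes `crash` is on PATH; closing stdin ends the session."""
    result = subprocess.run(["crash"], input=commands,
                            capture_output=True, text=True)
    return result.stdout.splitlines()

def get_trace(slot, session=crash_session):
    """Return the stack trace for a process slot as a list of lines."""
    if slot < 0:
        raise ValueError("slot must be a non-negative process slot")
    # "t" is the alias for trace described in the text.
    return session("t %d\n" % slot)
```

A caller would print the returned lines one per line, much as
Listing 1 prints the array it gets back.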
Checking the Size of a Process
The ps command shows the largest amount of memory that a
process has used, which can give you a general idea
of the size of a running program. If you want to do further
analysis, crash can provide more information. You can find
out how much of the process is shared with other processes
and how much is private, as well as how much memory is in
core and how much is on the hard disk.
UNIX memory is divided into regions. A region is a functional
unit of memory. Typical regions include the program text,
data, stack, and shared library text. crash lists memory
usage according to a program's regions.
To estimate total memory usage for a single program,
all you need to do is find the program's regions, multiply
the number of pages in each by the page size, and add up
the results.
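As a small worked example of that arithmetic, here is a sketch in
Python. The function name and the page counts are made up for
illustration, and the 2048-byte page size is an assumption that
fits the 3B2; substitute the figures from your own system.

```python
def program_bytes(region_pages, page_size):
    """Total size of a program: pages in each region times the page
    size, summed over all of the program's regions."""
    return sum(pages * page_size for pages in region_pages)

# Illustrative figures only: three regions of 10, 4, and 2 pages.
example_total = program_bytes([10, 4, 2], 2048)   # 16 pages * 2048
```

With these made-up numbers the total is 32768 bytes, or 32 KB.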
When estimating total system memory usage, you also
want to take advantage
of shared text. In System V, if a program is running
more than once,
its text section (instructions) is stored in memory
only once to save
memory. To compute the total memory requirements for
one or more invocations
of a program, the formula is:
size of shared regions + (number of invocations \
* sum of private regions)
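Because shared text is held in memory only once while each
invocation carries its own private regions, the calculation can be
sketched as follows. The function name is hypothetical and the
sizes in the example are invented.

```python
def total_memory_kb(private_kb, shared_kb, invocations):
    """Memory needed for `invocations` copies of one program: shared
    regions are counted once, private regions once per invocation."""
    if invocations < 1:
        return 0
    return shared_kb + invocations * private_kb

# Invented sizes: 100 KB private, 40 KB shared text, three copies.
three_copies = total_memory_kb(100, 40, 3)   # 40 + 3 * 100
```

Running three copies of this imaginary program costs 340 KB, not
the 420 KB you would get by tripling the single-copy figure.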
To find out how much memory a process is using, you
first find out what regions the process is using. To
do this, type
proc -f #pid to get a full listing on the process id. A table
at the end of the display lists all the regions the
program is using.
The first column of the table has process region numbers;
because these are relative to the process, I do not use them
here. The second column shows region slot numbers. The type
column tells you what kind of region you are looking at, and
the flag column lists further
facts about the region. You can look for the meaning
of these flags
Once you have the region slot numbers, you can call the
region command. The syntax is like this:
region [-e] [-f] [-w filename]\
The -e, -f, and -w options are as described in
the sidebar "Invoking crash." The -p option
lets you look for a region by physical address. The argument
is usually a region slot number, although you can use
physical addresses if you know them.
I cover only three columns of the region output here: the
PSZ column, the #VL column, and the type column. The PSZ
column tells you how many total pages of memory and disk the
region uses. This gives you the maximum size of a region,
which is not necessarily the actual in-core size. The #VL
column tells how many pages are actually in RAM (pages are
2 KB on the 3B2 and 4 KB in System V/386; you may want to
check what it is on your system by looking for NBPP in the
file /usr/include/sys/immu.h). The third column to watch is
the type column, which usually marks a region as private or
as shared text (stxt).
To use the region information, run crash when the system is
not paging and the program you want to measure has been
started. Get the region size figures. Add up the private
numbers in the #VL column. This will give you the amount of
in-core memory private to the first invocation of the
program. Next add up the shared text numbers in the #VL
column. This will give you the amount of in-core memory that
the first and each additional invocation of the process
will share. You can then put these numbers into a
spreadsheet to work out how much memory different
combinations of programs will use.
A Perl Utility to Show Process Size
Listing 2 is a perl script that displays the memory
taken up by the given process. The output will show four
figures. SPSZ is the full size of the sharable regions in
kilobytes. PPSZ is the full size of the private regions.
SVL is the size of the sharable regions in core. PVL is the
amount of private regions in core.
In Listing 2 the variable $nbpp will be set to the size of
a page in kilobytes. Function get_page_size returns the size
of a page in the UNIX implementation you are using.
get_regions gets the region numbers from the process table.
get_reg_info gets the region lines for the requested
process. get_reg_size calculates the sizes of the four types
of memory in kilobytes. The values are then displayed on
the screen.
get_page_size opens /usr/include/sys/immu.h and looks
for the define for NBPP. The value of NBPP is
divided by 1024 to return the number of kilobytes in a page.
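A rough Python equivalent is sketched here. It assumes NBPP is
defined as a plain decimal, octal, or hexadecimal literal; if your
header defines it with an expression, the sketch returns None
rather than guessing. The names parse_nbpp and get_page_size_kb
are hypothetical.

```python
import re

# Matches "#define NBPP <literal>", where the literal is hex (0x...),
# octal (leading 0), or decimal. Anything else is not matched.
NBPP_RE = re.compile(
    r"^\s*#\s*define\s+NBPP\s+(0[xX][0-9a-fA-F]+|0[0-7]*|[1-9]\d*)")

def parse_nbpp(header_text):
    """Return NBPP in bytes, or None if no plain literal is found."""
    for line in header_text.splitlines():
        m = NBPP_RE.match(line)
        if not m:
            continue
        literal = m.group(1)
        if literal.lower().startswith("0x"):
            return int(literal, 16)
        if len(literal) > 1 and literal.startswith("0"):
            return int(literal, 8)     # C-style octal
        return int(literal)
    return None

def get_page_size_kb(path="/usr/include/sys/immu.h"):
    """Page size in kilobytes, read from the header named in the text."""
    with open(path) as header:
        nbpp = parse_nbpp(header.read())
    if nbpp is None:
        raise ValueError("no numeric NBPP define found in " + path)
    return nbpp / 1024
```

The division is left as ordinary floating-point division, so a
2048-byte page comes back as 2.0.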
get_regions opens crash then enters the command "p
-f #process-id" to get a full listing, which includes a
table of regions at the end. The function reads lines until
it reaches the region header. The following lines are read
until a blank line or end of file is reached. The region
number is cut out of each of the lines and pushed onto the
region array.
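The read-until-header, read-until-blank logic might look like this
in Python. The default header_marker of "reg#" is a guess at how
the region table's header line can be recognized, not something
taken from the text, so check it against your own crash output.
The slot number is taken from the second column, as described
earlier.

```python
def parse_region_slots(output, header_marker="reg#"):
    """Collect region slot numbers from the output of "p -f #pid".
    Assumption: the header of the region table contains
    `header_marker` (a guess), and the slot is the second column."""
    slots = []
    in_table = False
    for line in output.splitlines():
        if not in_table:
            # Skip everything up to and including the header line.
            in_table = header_marker in line
            continue
        fields = line.split()
        if not fields:                 # a blank line ends the table
            break
        if len(fields) > 1 and fields[1].isdigit():
            slots.append(int(fields[1]))
    return slots
```

The returned list can be joined with spaces to build the argument
list for the region command.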
get_reg_info opens crash and writes the command "region
region1 region2...." The entire output is read
into array reglist
and returned from the function.
get_reg_size checks each line to see if it represents a
private or a shared region. The PSZ and #VL fields are then
cut out and added to the variables for total and in-core
size. The function checks the header line to see whether PSZ
is the second column or the third, so that the program can
run on both the 3B2 and System V/386.
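Here is a sketch of the same idea in Python. Instead of testing
for the second or third column explicitly, it finds PSZ and #VL
by name in the header line, which covers both layouts. Several
things are assumptions: that data fields line up one-to-one with
header fields, that shared text regions carry the type "stxt" (from
the text), and that private regions carry "priv" (a guess; pass your
own names if they differ). The function name is hypothetical.

```python
def region_sizes_kb(lines, nbpp_kb,
                    shared_types=("stxt",), private_types=("priv",)):
    """Sum PSZ and #VL, converted to KB, for shared and private
    regions. Returns a dict with keys SPSZ, PPSZ, SVL, and PVL."""
    totals = {"SPSZ": 0, "PPSZ": 0, "SVL": 0, "PVL": 0}
    psz_col = vl_col = None
    for line in lines:
        fields = line.split()
        if psz_col is None:
            # Still looking for the header; locate the columns by name.
            if "PSZ" in fields and "#VL" in fields:
                psz_col = fields.index("PSZ")
                vl_col = fields.index("#VL")
            continue
        if len(fields) <= max(psz_col, vl_col):
            continue                   # too short to be a region line
        if any(f in shared_types for f in fields):
            prefix = "S"
        elif any(f in private_types for f in fields):
            prefix = "P"
        else:
            continue                   # some other kind of region
        try:
            psz, vl = int(fields[psz_col]), int(fields[vl_col])
        except ValueError:
            continue                   # not numeric; skip the line
        totals[prefix + "PSZ"] += psz * nbpp_kb
        totals[prefix + "VL"] += vl * nbpp_kb
    return totals
```

Pass it the lines returned by the region command and the page size
in kilobytes; the four totals correspond to the four figures that
Listing 2 prints.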
These programs will give you a good start in learning crash.
Use the programs, but also experiment with the crash commands
that they incorporate.
About the Author
Beirne Konarski is a systems analyst for Summit Information
Systems, a subsidiary of Roadway Express. He has a BS
Science from Kent State and is currently working on
his Master's there.
His specialties are SNA and tinkering with UNIX. He
can be reached
via email as firstname.lastname@example.org. Copyright 1993 Beirne