Easy Troubleshooting with crash
Beirne Konarski
crash is a window into the UNIX kernel -- you can use
it
to learn just about anything regarding the state of
the operating
system. crash is like a debugger for UNIX; it debugs
the kernel,
optionally using a mainstore dump, and displays the
value of kernel
data, which includes processes, files, memory, and system
tables.
While crash is full of features for kernel hackers,
it is also
useful for system administrators and programmers. When
you have exhausted
the information provided by more common UNIX tools,
you can use crash
to dig as deep as you want into the operating system.
If ps
does not tell you enough about a program, you can learn
much more
from crash. fuser will tell you which programs have
a file open, but crash will tell you which files a program
has open. When deadlock occurs, crash will tell you
which processes are
holding locks and which files are locked. If your system
panics, you
can use crash to read the mainstore dump to find out
which
program caused the failure.
crash Front-End Programs
This article presents two perl programs that work as
front ends to
crash. You can use them as real scripts to look at a
process's
system call stack and find out the amount of memory
a process is using,
or you can use them as examples to walk through crash
and learn
some basic commands. They will work on the AT&T
3B2 and on AT&T System V/386
Release 3.2 for Intel platforms. They will probably
not work on other
variations of UNIX, although modifications may be minor
for variations
that come directly from SVR3.
Tracing a Process
The first sample program shows the system call trace
for a process.
This can be helpful if a program is not working as expected
and you
want to see what it is doing. To look at the process
stack trace,
you use two commands, proc and trace.
The proc command lets you view the process table. Much
of the
information is the same as what ps displays, but proc
also provides a slot number so that you can use other
commands to
learn more about the program. The command syntax is
like this:
proc [-e] [-f] [-w filename] [-r] [process slot|#process id]
You can use p as an alias for proc. The -e, -f,
and -w options work as described in the sidebar "Invoking
crash." The -r option restricts the display to runnable
processes; on a running system this will show only the
crash program itself. If you want to look at specific processes, you
can enter either
a slot number or a pound sign followed by a process
id. The slot number
is an index into an internal kernel table used to track
processes.
The output of proc resembles that of ps. The first column
has a slot number, which is used as a parameter for
the trace
command. ST shows the state of the process. PID is the
process id. The EVENT column lists an address that can
be used
to find out what object the program is waiting on. The
NAME
field shows the name of the process and the FLAGS section
lists
flags that contain more information on the state of
the program. In
order to find out what the flags mean, try looking for
them in /usr/include/sys/proc.h.
Use case-insensitive search when looking for flag names.
The first step, then, to viewing a stack trace is to
use #process-id
as an argument to proc to find out what the process
slot
value is. Once you have this number, you can use the
trace
command to view the stack trace. The trace command has
the
following syntax:
trace(t) [-e] [-w filename] [-r] [-i [-p] st_addr]
You can use t as an alias for trace. The
-e and -w options work as described in the sidebar "Invoking
crash." The other options are used to dig further
into
the stack trace and will not be described here.
trace Output
The first columns of the output list various forms of
stack addresses.
Following the addresses are system call names. If the
program is not
executing a system call, you will not get a trace, just
a message
that the program is in user space. The most recent system
call is
listed first. The output often lists calls such as read
or
open, so you have some idea of what the process is doing.
Arguments
to the calls are also listed, and you can use these
to find out what
file or event the process is acting on. (This process
is somewhat
involved: I will elaborate on it in a later article.)
A Perl Utility to Show System Call Stack
Listing 1 is a perl program that shows the stack trace
of a given
process. It takes a process id as the argument and calls
function
get_pslot to find out the slot number of the process.
The process
slot number is then passed to get_trace, which returns
the
stack trace. The trace is printed and the program exits.
get_pslot opens crash and then enters the command "p
#pid", where pid is the process id. The output
pipe
is then closed to force crash to flush its output. The
function
reads the crash output until it finds the last header
line;
if the process id is valid, the next line contains the
process slot.
The function then returns the slot number. If the process
id is invalid,
then the displayed line "pid not valid process
id" will be
caught and the function will return a -1.
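The parsing step of get_pslot can be sketched in a few lines. This is an illustration in Python rather than the perl of Listing 1, and it rests on two assumptions that are mine, not the listing's: that the last header line begins with the word SLOT, and that the slot is the first field of the line after it. The sample text is invented to show the shape of the input and is not captured crash output.

```python
def parse_pslot(crash_output):
    """Return the process slot found in `p #pid` output, or -1 if none."""
    lines = crash_output.splitlines()
    # crash reports an unknown pid with a "not valid process id" message.
    if any("not valid process id" in line for line in lines):
        return -1
    for i, line in enumerate(lines):
        # Assumption: the last header line starts with the word SLOT.
        if line.split()[:1] == ["SLOT"]:
            for following in lines[i + 1:]:
                parts = following.split()
                if parts and parts[0].isdigit():
                    return int(parts[0])
            break
    return -1


# Invented sample, only to illustrate the expected layout.
SAMPLE = (
    "PROC TABLE SIZE = 100\n"
    "SLOT ST PID PPID PGRP UID PRI CPU EVENT NAME FLAGS\n"
    "  23 s  417    1  417 101  28   0 d0a4c3e0 myprog load\n"
)

if __name__ == "__main__":
    print(parse_pslot(SAMPLE))
```

If the header on your system differs, only the SLOT test needs to change.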
get_trace opens crash and then enters the command "t
process-slot" to display the system stack trace.
The trace
is read into an array and returned.
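Both functions follow the same pattern: write a command to crash, close the pipe, and read what comes back. A minimal sketch of that pattern follows, again in Python instead of perl. The helper name run_crash and the program parameter are hypothetical, and the default assumes crash can be found on your PATH, which may not hold on your system.

```python
import subprocess


def run_crash(commands, program=("crash",)):
    """Send commands to crash on standard input and return its output lines.

    subprocess.run() writes the input and then closes the pipe, which is
    the step the article relies on to make crash flush its output.
    """
    text = "\n".join(commands) + "\n"
    result = subprocess.run(
        list(program),
        input=text,
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout.splitlines()


def get_trace(slot, program=("crash",)):
    """Return the stack trace lines for a process slot via the t command."""
    return run_crash(["t %d" % slot], program)
```

Passing a different program, such as cat, lets you check the plumbing on a machine where crash is not available.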
Checking the Size of a Process
The ps command shows the largest amount of memory that
the
process has used, which can give you a general idea
of the size of
a running program. If you want to do further analysis,
though, crash
can provide more information. You can find out how much
of the process
is shared with other processes and how much is private,
as well as
how much memory is in core and how much is on the hard
disk.
UNIX memory is divided into regions. A region is a functional
block
of memory. Typical regions include the program text
section, data,
and shared library text. crash lists memory usage according
to a program's regions.
To estimate total memory usage for a single program,
all you need
to do is find the program's regions, multiply the number
of in-core
pages by the page size, and add up the results.
When estimating total system memory usage, you also
want to take advantage
of shared text. In System V, if a program is running
more than once,
its text section (instructions) is stored in memory
only once to save
memory. To compute the total memory requirements for
one or more invocations
of a program, the formula is:
size of shared regions + (number of invocations * sum of private regions)
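As a worked example of this arithmetic, the sketch below counts the shared regions once, since shared text is stored in memory only once, and the private regions once per invocation. The function name and the figures are made up for illustration.

```python
def total_memory_kb(private_kb, shared_kb, invocations):
    """Estimate memory for several invocations of one program, in KB.

    Shared regions are counted once; private regions are counted once
    for every running copy.
    """
    if invocations < 1:
        return 0
    return shared_kb + invocations * private_kb


if __name__ == "__main__":
    # Hypothetical program: 120 KB private, 300 KB shared, three copies.
    print(total_memory_kb(120, 300, 3))  # 300 + 3 * 120 = 660
```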
To find out how much memory a process is using, you
must
first find out what regions the process is using. To
do this, type
proc -f #pid to get a full listing on the process id.
A table
at the end of the display lists all the regions the
program is using.
The first column of the table has process region numbers;
since these
are relative to the process, I do not use them here.
The second column
shows region slot numbers. The type column tells you
what kind
of region you are looking at, and the flag column lists
further
facts about the region. You can look for the meaning
of these flags
in /usr/include/sys/region.h.
Once you have the region slot numbers, you can call
the region
command. The syntax is like this:
region [-e] [-f] [-w filename] [[-p] tbl_entry(s)]
The -e, -f, and -w options are as described in
the sidebar "Invoking crash." The -p option
lets you look for a region by physical address. The
tbl_entry
is usually a region slot number, although you can use
addresses
if you know them.
I cover only three columns of the region output here:
the PSZ
column, the #VL column, and the type column. The PSZ
column
tells you how many total pages of memory and disk the
region occupies.
This gives you the maximum size of a region, which is
usually not
the actual in-core size. The #VL column tells how many
pages
are actually in RAM. Pages are 2 KB on the 3B2 and 4 KB
in System V/386;
you can check the value on your system by looking
at the macro
NBPP in the file /usr/include/sys/immu.h. The third
column to watch is the type column, which will usually
be prvt
or stxt for private or shared regions.
To use the region information, run crash when the system
is
not paging and the program you want to measure has been
active recently.
Get the region size figures. Add up the private numbers
in the #VL
column. This will give you the amount of in-core memory
that the first and each additional invocation of the
program will use. Next add up the
shared text numbers
in the #VL column. This will give you the amount of
in-core
memory that is needed only once, no matter how many
invocations of the program
are running. You can then put these numbers into a spreadsheet
and find
out how much memory different combinations of programs
will use.
A Perl Utility to Show Process Size
Listing 2 is a perl script that displays the memory
taken up by the
given process. The output will show four figures. SPSZ
is the
full size of the sharable regions in kilobytes. PPSZ
is the
full size of the private regions. SVL is the size of
the sharable
regions in core. PVL is the amount of private regions
in core.
In Listing 2 the variable $nbpp will be set to the size
of
a page in kilobytes. Function get_page_size returns
the size
of a page in the UNIX implementation you are using.
get_regions
gets the region numbers from the process table. get_reg_info
gets the region lines for the requested process. get_reg_size
calculates the sizes of the four types of memory in
kilobytes. The
values are then displayed on the screen.
get_page_size opens /usr/include/sys/immu.h and looks
for the define for NBPP. The value of NBPP is
divided by 1024 to return the number of kilobytes in
a page.
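The lookup can be sketched as follows, in Python rather than perl. The sketch assumes that NBPP is defined as a plain decimal or hexadecimal constant; if your header defines it as an expression, the pattern will not match and the function raises an error instead of guessing.

```python
import re

# Matches lines such as "#define NBPP 2048" or "#define NBPP 0x800".
NBPP_PATTERN = re.compile(
    r"^\s*#\s*define\s+NBPP\s+(0[xX][0-9a-fA-F]+|\d+)", re.MULTILINE
)


def page_size_kb(header_text):
    """Return the page size in kilobytes from the text of immu.h."""
    match = NBPP_PATTERN.search(header_text)
    if match is None:
        raise ValueError("NBPP not found as a numeric constant")
    # Base 0 lets int() accept both decimal and 0x-prefixed values.
    return int(match.group(1), 0) // 1024


def get_page_size(path="/usr/include/sys/immu.h"):
    """Read the header file and return the page size in kilobytes."""
    with open(path) as header:
        return page_size_kb(header.read())
```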
get_regions opens crash, then enters the command "p
-f #process-id" to get a full listing, which includes
the list
of regions at the end. The function reads lines until
it reaches the
region header. The following lines are read until a
blank line or
end of file is reached. The region number is cut out
of each of the
lines and pushed onto the region array.
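The scan for region slot numbers might look like the sketch below, written in Python for illustration. The header text of the region table is left as a parameter because it is not spelled out here; the marker used in the sample, like the sample itself, is invented. The only layout taken from the description above is that the region slot number is in the second column and that a blank line ends the table.

```python
def parse_region_slots(proc_output, header_marker):
    """Collect region slot numbers from the table at the end of `p -f` output.

    header_marker is any text that identifies the region table's header
    line on your system; it is supplied by the caller.
    """
    slots = []
    in_table = False
    for line in proc_output.splitlines():
        if not in_table:
            if header_marker in line:
                in_table = True
            continue
        fields = line.split()
        if not fields:
            break  # a blank line ends the region table
        # The second column holds the region slot number.
        if len(fields) >= 2 and fields[1].isdigit():
            slots.append(int(fields[1]))
    return slots


# Invented sample, only to show the shape of the input.
SAMPLE_PROC = (
    "(process details omitted)\n"
    "preg reg# regva type flags\n"
    "   0   14 80800000 text rdonly\n"
    "   1   15 80880000 data\n"
    "\n"
    "   9   99 ignored after blank line\n"
)

if __name__ == "__main__":
    print(parse_region_slots(SAMPLE_PROC, "reg#"))
```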
get_reg_info opens crash and writes the command "region
region1 region2...." The entire output is read
into array reglist
and returned from the function.
get_reg_size checks each line to see if it represents
a private
or a shared region. The PSZ and #VL fields are then
cut out and added to the variables for total and in-core
memory. The
function checks the header line to see whether PSZ is
in the
second column or the third, so that the program can
run on both the
3B2 and System V/386.
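A sketch of the summing step is shown below, in Python rather than perl. It finds the PSZ and #VL columns from the header line, which is how it copes with the column moving between platforms, and it classifies a line by looking for the word prvt or stxt anywhere on it. It assumes that header words and data fields line up one for one when split on white space. The sample region lines are invented.

```python
def region_sizes_kb(region_lines, page_kb):
    """Sum full (PSZ) and in-core (#VL) sizes for shared and private regions."""
    totals = {"SPSZ": 0, "PPSZ": 0, "SVL": 0, "PVL": 0}
    psz_col = vl_col = None
    for line in region_lines:
        fields = line.split()
        if "PSZ" in fields and "#VL" in fields:
            # Header line: remember where the two columns are.
            psz_col, vl_col = fields.index("PSZ"), fields.index("#VL")
            continue
        if psz_col is None or len(fields) <= max(psz_col, vl_col):
            continue
        if "prvt" in fields:
            prefix = "P"
        elif "stxt" in fields:
            prefix = "S"
        else:
            continue  # neither private nor shared text; skipped here
        if not (fields[psz_col].isdigit() and fields[vl_col].isdigit()):
            continue
        totals[prefix + "PSZ"] += int(fields[psz_col]) * page_kb
        totals[prefix + "VL"] += int(fields[vl_col]) * page_kb
    return totals


# Invented sample, only to show the shape of the input.
SAMPLE_REGIONS = [
    "SLOT PSZ #VL RCNT TYPE FLAGS",
    "  14  20  12    2 stxt done",
    "  15   8   5    1 prvt done",
    "  16   4   2    1 prvt done",
]

if __name__ == "__main__":
    print(region_sizes_kb(SAMPLE_REGIONS, 4))
```

With a 4 KB page, the sample gives 80 KB of shared regions with 48 KB in core, and 48 KB of private regions with 28 KB in core.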
Summary
These programs will give you a good start in learning
crash.
Use the programs, but also experiment with the crash
commands
that they incorporate.
About the Author
Beirne Konarski is a systems analyst for Summit Information
Systems, a subsidiary of Roadway Express. He has a BS
in Computer
Science from Kent State and is currently working on
his Master's there.
His specialties are SNA and tinkering with UNIX. He
can be reached
via email as beirnek@summitis.com. Copyright 1993 Beirne
Konarski.