G. Clark Brown
Frozen terminals are one of the most common system administration
problems in a UNIX system. When the user says, "It
locked up on
me!" you need a step-by-step approach for diagnosing
and solving
the problem with a minimum of damage to the user's work
and to the
rest of the system. Admittedly, certain brute force
methods (e.g.,
rebooting) will almost always free the terminal, but
often such methods
also cause the user to lose his/her work. Moreover,
because brute
force methods seldom help you learn why the terminal
froze,
they do nothing to help you avoid future lockups. This
article describes
how to diagnose the cause of most common terminal lockups
and supplies
a tool that will unfreeze terminals in some situations.
To the user, the problem is simple: "When I press
on the keys,
nothing happens." The administrator's view is more
complex; a
frozen terminal can be caused by a wide variety of problems.
The hardware,
the cable, or the port might be faulty. The settings
of the UNIX line
or the terminal might be wrong. The program the user is running may
have a bug in it, or it may not be ready to accept the user's
keystrokes.
Or, perhaps a faulty flow control signal has been received
by the
terminal or the host.
First, get as much information as you can from the user. What is the
exact behavior now? Are keys echoed? Are carriage returns echoed? Does
the break key work? What does the screen look like?
Second, use the ps(1) command to determine what program
is
running on the user's line. You will need to know the
device name
of the user's terminal, so you should have a list of
these names prepared
in advance. If you don't have a list, try to figure
out the name by
tracing wires and using the tty(1) command on terminals
that
are plugged into ports near the one that is locked up.
If the user was on /dev/tty1d, for example, you could
use the
ps -t tty1d or ps -ef | grep 1d commands to get a list
of programs running on the line.
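
The lookup above can be wrapped in a small helper. This is a minimal
sketch, assuming a POSIX shell and a ps that accepts the -t option; the
function names tty_short_name and procs_on_line are illustrative and do
not come from the article's listings.

```shell
# tty_short_name DEVICE: strip a leading /dev/ so the name suits `ps -t`.
tty_short_name() {
    printf '%s\n' "${1#/dev/}"
}

# procs_on_line DEVICE: list the processes attached to DEVICE.
# Assumes this system's ps supports -f and -t together.
procs_on_line() {
    ps -f -t "$(tty_short_name "$1")"
}

# Example: procs_on_line /dev/tty1d
```

Accepting either /dev/tty1d or tty1d avoids a common slip when the
device name is read from a wiring list.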
Third, use some form of the stty(1) command to check
the settings
on the frozen line. Although stty options vary across
systems,
on most System V systems, the stty -a </dev/tty1d
command will
give a report of the line settings for /dev/tty1d.
Read the UNIX documentation for your host to familiarize
yourself
with the line settings displayed in the stty report.
Note that
most settings are displayed with a leading minus sign
if they are
off and without the minus sign if they are on. Pay particular
attention
to the flow control settings (like ixon, ixany, ixoff,
rtsflow, ctsflow, etc.).
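
Reading a long stty report by eye is error-prone, so a filter that
reports the state of one flag can help. The following is only a sketch,
assuming a report in the usual word-per-setting form with settings
separated by blanks and semicolons; flag_state is a hypothetical name.

```shell
# flag_state FLAG: read an `stty -a` report on standard input and
# print "on", "off", or "absent" for the named flag.
flag_state() {
    flag=$1
    # Put each word on its own line, then drop trailing ";" separators.
    tr -s ' \t' '\n\n' | sed 's/;$//' | awk -v f="$flag" '
        $0 == f         { state = "on" }    # flag without a minus sign
        $0 == ("-" f)   { state = "off" }   # flag with a leading minus
        END             { print (state ? state : "absent") }'
}

# Example: stty -a </dev/tty1d | flag_state ixon
```

A result of "absent" usually means that this version of stty does not
report the flag at all, which is common for rtsflow and ctsflow.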
Sometimes, the stty command itself will lock on your
terminal
when you are trying to read the parameters of a locked
up line. If
this happens, press break or the interrupt key and continue
with the diagnosis.
Once you have the information from the user, the output
from the ps
command, and the report from the stty command, you can
begin
the step-by-step elimination of possibilities that will
lead you to
the problem. If you have seen this same problem before,
something
in the information you already have will probably point
you in the
right direction. Otherwise, you should check possible
problems in
the order of most likely to least likely.
Diagnosis and Treatment Steps
Receive Flow Control Lock
The most common cause of frozen terminals is flow control
lock. This
occurs when the terminal or the host gets a signal that
tells it to
stop sending characters. When the line is set to use
XON/XOFF flow
control (the stty report shows ixon), the host will
stop sending characters if it receives a Control-S.
This avoids
overrun problems by allowing the terminal to control
how fast it must
receive traffic from the host, but can also cause a
very simple form
of terminal "lockup." If the user presses
Control-S
(or the STOP or HOLD key on some terminals), the host
will not send any more characters, making the terminal
appear to be
locked up.
This kind of flow control lock can be released by sending
the host
a Control-Q, which, in the XON/XOFF protocol, is the
signal
to resume sending characters. If the terminal was locked
by a Control-S,
sending the Control-Q will cause all of the commands
that were
entered while the terminal was "locked" to
suddenly "work."
Tell the user to be more careful to avoid Control-S,
and to
try Control-Q if he/she accidentally locks the terminal
again.
A Tool to Unfreeze Receive Flow Control for Another Terminal
Sometimes, even though the terminal is locked because
of flow control,
pressing Control-Q will not fix it. For example, if
the parity
setting on the line is incorrect, Control-S will be
recognized
as a valid character, but Control-Q will be ignored
because
its parity is wrong. In some modes the host can't see
the Control-Q
until the line buffer in the driver is flushed. Also,
sometimes
the Control-S was transmitted over a modem line that
is now
disconnected, making it difficult to send the necessary
Control-Q.
In these situations, it is helpful to have a program
that can be run
by the superuser at another terminal to solve the problem.
The program
shown in Listing 1 will do this on most UNIX systems.
Compile it with
cc -O ctrlq.c -o ctrlq. Then use the ctrlq /dev/tty1d
command to free the device /dev/tty1d.
Listing 1 uses the ioctl(2) system call to send two
special
commands to the device driver for the line. The first
command, TCXONC,
tells the driver to start sending characters to the
terminal again,
even if it has received the XOFF (Control-S) character.
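The second command, TCFLSH, flushes the driver's output buffer. Note
that on most systems a flush discards any characters still waiting in
the buffer rather than transmitting them, so some pending output may be
lost. The flush may seem unnecessary, but experience has shown this
second step to be necessary in some cases.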
Transmit Flow Control Lock and Keyboard Lock
Problems with flow control signals in the terminal to
host direction
cause keyboard lock. The terminal will appear to be
frozen because
it is obeying a host-issued command to stop sending
the user's keystrokes
to the host. The stop command might be Control-S, or
it might
be an escape sequence that specifically locks the keyboard.
When in
the locked state, some terminals will display a "LOCK"
or
"HOLD" message on the status line, some light
an LED indicator
on the keyboard, and some just stop working without
explanation.
Keyboard lock is often caused by "garbage"
on the line. Using
a modem over a noisy phone line or running a program
with the wrong
type of terminal settings can both produce transmissions
that the
host cannot accurately decode.
On most terminals, there are only two ways to free a
locked keyboard.
You can turn the terminal off and then back on, or the
host can send
the command that unlocks the keyboard. The first choice will free
the terminal, but leaves you with a blank screen. Moreover, because
power-on also resets several other modes, you cannot be sure afterward
that keyboard lock was really the cause of the problem.
Many terminals have other modes that will stop them
from working properly,
for example, transparent print and invisible text mode.
If you have
a consistent problem with one of these, you should check
the terminal
manual and the program that causes the problem to eliminate
the command
that is putting the terminal into that mode.
A Tool to Correct Keyboard Lock
The program keyfree.c (Listing 2) sends Control-Q and
some other common keyboard-unlock commands to the terminal
specified
on the command line. Check your terminal manuals to
find what commands
work for the terminals on your system. Delete the commands
that you
don't need, so that they won't clutter up the screen
when you run
this program.
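
For sites that prefer a shell one-liner, the same idea can be sketched
without a compiled program. This is not the article's Listing 2. It
assumes that a Control-Q followed by the ANSI (ECMA-48) keyboard action
mode reset, ESC [ 2 l, is appropriate for the terminal; that second
sequence is an assumption and must be checked against the terminal
manual. The name unlock_sequences is hypothetical.

```shell
# unlock_sequences: write common keyboard-unlock commands to standard
# output. Redirect the output to the frozen terminal's device.
unlock_sequences() {
    printf '\021'       # Control-Q (XON)
    printf '\033[2l'    # ESC [ 2 l: ANSI keyboard action mode reset (assumed)
}

# Example (as root): unlock_sequences >/dev/tty1d
```

Keeping each sequence on its own line makes it easy to delete the ones
your terminals do not need.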
Line Insanity
When a full-screen program dies suddenly, it leaves
the line in raw
mode (the stty report will show -icanon and -echo).
When the user types, nothing happens. In particular,
the normal carriage
return does not work, but Control-J acts like a carriage
return.
The solution to this condition is to put the line back
into normal
"cooked" mode. On most machines, you do this
by typing Control-J
stty sane Control-J. On some machines, you use Control-J
tset
Control-J instead.
On most systems you can also correct such line mode
problems from
another terminal with stty sane </dev/tty1d. In fact,
the stty
command always works on its standard input, so you can
read and change
settings of other lines by using the redirection shown
in this example.
Only the superuser (root) can change settings on another
user's lines.
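
Before resetting someone else's line, it can be worth confirming that
the line really is in raw mode. A rough sketch, assuming the stty report
marks disabled settings with a leading minus sign as described earlier;
line_is_raw is an illustrative name.

```shell
# line_is_raw: read an `stty -a` report on standard input and succeed
# only if both -icanon and -echo appear as whole words.
line_is_raw() {
    # One word per line; blanks, tabs, and semicolons all separate words.
    words=$(tr -s ' \t;' '\n\n\n')
    printf '%s\n' "$words" | grep -qx -e '-icanon' &&
    printf '%s\n' "$words" | grep -qx -e '-echo'
}

# Example (as root):
#   stty -a </dev/tty1d | line_is_raw && stty sane </dev/tty1d
```

Matching whole words keeps a setting such as -echoe from being mistaken
for -echo.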
Simple Hardware Problems
If none of these solutions cures the problem, there
may be a hardware
problem. The most common problem is a cable knocked
loose on the back
of the terminal or on the host. These should be checked
and screwed
in, if possible.
The terminal setup should also be checked. Make sure
that the baud
rate agrees with the stty report. If the terminal is
set for
seven data bits, the CS7 flag should be set. For eight
data
bits, the line should be set to CS8. If parity is set
to NONE
on the terminal, stty should report -parenb. If parity
is EVEN, stty should report parenb -parodd. For
ODD parity, look for parenb parodd.
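
The mapping from terminal setup to stty flags can be written down as a
small lookup. This sketch simply encodes the rules in the paragraph
above, using the lowercase flag names that stty normally prints;
expected_flags is a hypothetical helper.

```shell
# expected_flags BITS PARITY: print the stty flags that should appear
# in the report for a terminal set to BITS data bits and PARITY.
expected_flags() {
    case $1 in
        7) size=cs7 ;;
        8) size=cs8 ;;
        *) echo "unsupported data bits: $1" >&2; return 1 ;;
    esac
    case $2 in
        none|NONE) parity='-parenb' ;;
        even|EVEN) parity='parenb -parodd' ;;
        odd|ODD)   parity='parenb parodd' ;;
        *) echo "unsupported parity: $2" >&2; return 1 ;;
    esac
    printf '%s %s\n' "$size" "$parity"
}

# Example: expected_flags 7 even
```

Compare the printed flags with the stty report for the line; any
mismatch points to the setting that needs to change on one end.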
If the host echoes about half of the characters typed,
it is usually
a parity or data bits problem. If it consistently echoes
x
or some other incorrect character, it is usually a baud
rate problem.
Application Program Problems
A terminal will also appear to be frozen if the user's
application
program "hangs." A "hung" program
has stopped reading
keys, but has not exited. By using the ps commands described
above, you should be able to get a list of processes
running on the
terminal. You may also find the who -u command useful,
since
it gives the process number of the primary program for
the terminal.
Also, on most UNIX systems /etc/fuser /dev/tty1d will
give
a list of all process numbers that are using /dev/tty1d.
In the case of a hung application, you must kill the
attached processes
(with the kill command) to free the terminal. Always
use the
kill <pid> command first, since it gives the program
a chance
to save its work files and exit gracefully. Then if
ps -fp<pid>
shows that the program is still running, use kill -1
<pid>.
It is a bad idea to use kill -9 <pid> unless you
cannot stop
the program with one of the milder forms. The -9 option
forces
the process to quit immediately without giving it a
chance to clean
up work files or stop child processes.
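
The escalation described above can be sketched as a shell function. It
assumes a POSIX kill that accepts -s with signal names. It uses kill -0,
which only tests whether the process still exists, as a stand-in for
checking with ps. The name stop_process and the default five-second
pause are illustrative choices.

```shell
# stop_process PID [SECONDS]: try plain kill (TERM), then kill -1 (HUP),
# then kill -9 (KILL), pausing SECONDS after each attempt.
stop_process() {
    pid=$1
    pause=${2:-5}
    for sig in TERM HUP KILL; do
        kill -s "$sig" "$pid" 2>/dev/null || return 0   # already gone
        sleep "$pause"                                  # time to clean up
        kill -0 "$pid" 2>/dev/null || return 0          # exited; stop here
    done
    return 1    # still listed even after SIGKILL
}

# Example (as root): stop_process 1234 10
```

A nonzero return usually means the process is stuck inside the kernel
or is waiting to be reaped by its parent, so further signals will not
help.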
Serious Hardware Problems
If there is no program running on the line, and you
have checked all
of the flow control and stty problems, you may have
a more
unusual hardware problem. The line may be broken, or
there may be
a hardware flow control problem.
Using a breakout box (or a voltmeter), verify that pin 2 or 3 on the
RS-232C connector from the host is active when the terminal
is disconnected.
Check that the remaining pin in the 2,3 pair is active
on the terminal
when the line from the host is disconnected.
Checking hardware flow control (CTS/RTS or DTR/DSR)
requires knowledge
of how the cables are designed to work and what stty
settings
are supposed to be in effect. Using the breakout box,
the manual for
the terminal, and the man page for stty, check that
everything matches. The details of hardware flow control
implementations
vary greatly from one type of UNIX to another, and are
complex enough
to be covered in another article by themselves.
Most hardware problems can be checked by swapping terminals.
Find
out if the line or the terminal has the problem by switching
the locked-up
terminal to a line that you know works. If the problem
terminal works
on the new line, test the terminal from the good line
on the problem
line. This test will show if the terminal itself has
broken. A similar
exercise at the host end with two different ports can
show if the
cable is defective.
System Restart
If all else fails, shutting down the system and restarting
it will
sometimes clear the problem. This corrects most hardware
and software
flow control lockups. It also re-initializes the device
drivers, which
will temporarily correct lockups caused by bugs in the
driver code.
Like cycling the power on the terminal, shutting down
the machine
often corrects the problem without explaining why it
happened in the
first place. Of course, for an uncommon problem, a quick fix that leaves
the cause unexplained is usually better for the user than hours of
analytic downtime. But if the problem occurs often, you should
investigate it as much as you can before rebooting.
Prevention
For some kinds of lock-up, once you have accurately
diagnosed the
cause, you can implement a permanent solution. If a
program has an
error, you can correct it. If line noise is causing
the problem, you
can upgrade your modems and phone lines. But even when
you can't absolutely
eliminate a cause, you can still take preventative steps
to reduce
the frequency of terminal lockups.
First, you can educate the users about terminal lockups.
Make sure
they know how to document the conditions that lead to
the lock-up.
If there are problems that they encounter often, show
them how to
correctly recover on their own.
Many sites also schedule regular shutdowns. Shutting
down the machine
on a regular basis (once a week) may avoid some obscure
bugs in the
device drivers that only occur after the terminal has
been running
for a long time.
Final Notes
This article gives you a "first draft" procedure
for freeing
frozen terminals. To customize it for your site, you
should make your
own list of common problems and incorporate appropriate
solutions
as you encounter them. Of course, the most important
step in this
procedure is to diagnose the problem as you solve it.
An accurate
diagnosis is the first step toward avoiding the same
trouble in the
future.
About the Author
G. Clark Brown is a senior software engineer at Structured
Software Solutions Inc. in Plano, TX. As developer/support
contact
for SSSI's FacetTerm and Facet/PC products, he deals
with a variety
of installation and configuration problems that relate
to connecting
ASCII terminals to UNIX and making them work with applications.
Clark
has been doing this with applications that he has written
for 16 years
(nine years with UNIX).