
The Cure for Hung Up, Locked Up, and Frozen Terminals

G. Clark Brown

Introduction

Frozen terminals are one of the most common system administration problems in a UNIX system. When the user says, "It locked up on me!" you need a step-by-step approach for diagnosing and solving the problem with a minimum of damage to the user's work and to the rest of the system. Admittedly, certain brute force methods (e.g., rebooting) will almost always free the terminal, but often such methods also cause the user to lose his/her work. Moreover, because brute force methods seldom help you learn why the terminal froze, they do nothing to help you avoid future lockups. This article describes how to diagnose the cause of most common terminal lockups and supplies a tool that will unfreeze terminals in some situations.

To the user, the problem is simple: "When I press on the keys, nothing happens." The administrator's view is more complex; a frozen terminal can be caused by a wide variety of problems. The hardware, the cable, or the port might be faulty. The settings of the UNIX line or the terminal might be wrong. The program the user is running may have a bug in it, or it may not be ready to accept the user's keystrokes. Or, perhaps a faulty flow control signal has been received by the terminal or the host.

Preliminary Checks

Begin by collecting the following information from the user:

What was he/she doing at the time of the lock-up? What program was running and where was it in its processing? What was the last key pressed before the problem occurred?

Has this problem been seen before in similar situations? What was done to free the terminal?

What is the exact behavior now? Are keys echoed? Are carriage returns echoed? Does the break key work? What does the screen look like?

Second, use the ps(1) command to determine what program is running on the user's line. You will need to know the device name of the user's terminal, so you should have a list of these names prepared in advance. If you don't have a list, try to figure out the name by tracing wires and using the tty(1) command on terminals that are plugged into ports near the one that is locked up.

If the user was on /dev/tty1d, for example, you could use the ps -t tty1d or ps -ef | grep 1d commands to get a list of programs running on the line.

Third, use some form of the stty(1) command to check the settings on the frozen line. Although stty options vary across systems, on most System V systems, the stty -a </dev/tty1d command will give a report of the line settings for /dev/tty1d.

Read the UNIX documentation for your host to familiarize yourself with the line settings displayed in the stty report. Note that most settings are displayed with a leading minus sign if they are off and without the minus sign if they are on. Pay particular attention to the flow control settings (like ixon, ixany, ixoff, rtsflow, ctsflow, etc.).

Sometimes, the stty command itself will lock on your terminal when you are trying to read the parameters of a locked up line. If this happens, press break or the interrupt key and continue with the diagnosis.
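The check above can also be made from a small C program. The following is a minimal sketch, assuming a system with the POSIX termios interface; the function names are hypothetical and are not part of the article's listings. Opening the line with O_NONBLOCK is one way to keep the open from waiting on a line that has no carrier, which is a common reason for a settings query to hang.

```c
#include <fcntl.h>
#include <stdio.h>
#include <termios.h>
#include <unistd.h>

/* Format the software flow-control flags the way stty prints them:
 * a leading minus sign means the flag is off. */
void format_flow_flags(const struct termios *t, char *buf, size_t len)
{
    snprintf(buf, len, "%sixon %sixany %sixoff",
             (t->c_iflag & IXON)  ? "" : "-",
             (t->c_iflag & IXANY) ? "" : "-",
             (t->c_iflag & IXOFF) ? "" : "-");
}

/* Read the flow-control settings of another line (hypothetical helper).
 * O_NONBLOCK keeps open() from waiting for carrier; O_NOCTTY keeps the
 * line from becoming our controlling terminal.
 * Returns 0 on success, -1 on any error. */
int report_flow(const char *device, char *buf, size_t len)
{
    struct termios t;
    int fd = open(device, O_RDONLY | O_NONBLOCK | O_NOCTTY);

    if (fd < 0)
        return -1;
    if (tcgetattr(fd, &t) < 0) {
        close(fd);
        return -1;
    }
    close(fd);
    format_flow_flags(&t, buf, len);
    return 0;
}
```

A caller would pass the device name from the ps step, for example report_flow("/dev/tty1d", buf, sizeof buf), and print buf on success.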

Once you have the information from the user, the output from the ps command, and the report from the stty command, you can begin the step-by-step elimination of possibilities that will lead you to the problem. If you have seen this same problem before, something in the information you already have will probably point you in the right direction. Otherwise, you should check possible problems in the order of most likely to least likely.

Diagnosis and Treatment Steps

Receive Flow Control Lock

The most common cause of frozen terminals is flow control lock. This occurs when the terminal or the host gets a signal that tells it to stop sending characters. When the line is set to use XON/XOFF flow control (the stty report shows ixon), the host will stop sending characters if it receives a Control-S. This avoids overrun problems by allowing the terminal to control how fast it must receive traffic from the host, but can also cause a very simple form of terminal "lockup." If the user presses Control-S (or the STOP or HOLD key on some terminals), the host will not send any more characters, making the terminal appear to be locked up.

This kind of flow control lock can be released by sending the host a Control-Q, which, in the XON/XOFF protocol, is the signal to resume sending characters. If the terminal was locked by a Control-S, sending the Control-Q will cause all of the commands that were entered while the terminal was "locked" to suddenly "work." Tell the user to be more careful to avoid Control-S, and to try Control-Q if he/she accidentally locks the terminal again.

A Tool to Unfreeze Receive Flow Control for Another Terminal

Sometimes, even though the terminal is locked because of flow control, pressing Control-Q will not fix it. For example, if the parity setting on the line is incorrect, Control-S will be recognized as a valid character, but Control-Q will be ignored because its parity is wrong. In some modes the host can't see the Control-Q until the line buffer in the driver is flushed. Also, sometimes the Control-S was transmitted over a modem line that is now disconnected, making it difficult to send the necessary Control-Q.
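The parity case can be worked through with a few lines of C. This is an illustrative sketch only; add_parity is a hypothetical helper, and the scenario it models (a terminal sending seven data bits plus a parity bit to a host line that treats all eight bits as data) is an assumption chosen to match the example above.

```c
/* Return the 8-bit frame a terminal would send for a 7-bit character.
 * The top bit is chosen so the total count of 1 bits is even, or odd
 * when `odd` is nonzero. */
unsigned char add_parity(unsigned char c7, int odd)
{
    unsigned char c = c7 & 0x7f;
    int ones = 0;

    for (int bit = 0; bit < 7; bit++)
        ones += (c >> bit) & 1;

    /* Even parity sets the top bit when the count of ones is odd;
     * odd parity does the opposite. */
    int parity_bit = (ones % 2) ^ (odd ? 1 : 0);
    return (unsigned char)(c | (parity_bit << 7));
}
```

With odd parity, Control-S (0x13, three 1 bits) goes out unchanged as 0x13, while Control-Q (0x11, two 1 bits) goes out as 0x91. A host line that is not checking or stripping parity sees the first byte as a valid XOFF but does not see the second as XON.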

In these situations, it is helpful to have a program that can be run by the superuser at another terminal to solve the problem. The program shown in Listing 1 will do this on most UNIX systems. Compile it with cc -O ctrlq.c -o ctrlq. Then use the ctrlq /dev/tty1d command to free the device /dev/tty1d.

Listing 1 uses the ioctl(2) system call to send two special commands to the device driver for the line. The first command, TCXONC, tells the driver to start sending characters to the terminal again, even if it has received the XOFF (Control-S) character. The second command, TCFLSH, flushes the driver's queue for the line, clearing out any characters that are stuck in it. The flush may seem unnecessary, but experience has shown this second step to be necessary in some cases.
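For systems that provide the POSIX termios calls, the same two steps can be sketched with tcflow() and tcflush(). This is not Listing 1; it is a hedged illustration under the assumption that tcflow(fd, TCOON) and tcflush(fd, TCOFLUSH) behave like the two ioctl commands on your system. The function names are hypothetical. Note that TCOFLUSH discards output that is still queued, so use it only when losing that output is acceptable.

```c
#include <fcntl.h>
#include <termios.h>
#include <unistd.h>

/* Restart suspended output on an open terminal line, then discard
 * whatever is still queued for output. Returns 0 on success, -1 on
 * error (for example, when fd is not a terminal). */
int release_output(int fd)
{
    if (tcflow(fd, TCOON) < 0)       /* restart output stopped by XOFF */
        return -1;
    if (tcflush(fd, TCOFLUSH) < 0)   /* discard queued output */
        return -1;
    return 0;
}

/* Open a line by name and release it (hypothetical wrapper).
 * O_NONBLOCK keeps open() from waiting for carrier on a dead line. */
int free_line(const char *device)
{
    int fd = open(device, O_RDWR | O_NONBLOCK | O_NOCTTY);
    int rc;

    if (fd < 0)
        return -1;
    rc = release_output(fd);
    close(fd);
    return rc;
}
```

A superuser at another terminal would call free_line("/dev/tty1d") from a small main() that takes the device name from the command line.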

Transmit Flow Control Lock and Keyboard Lock

Problems with flow control signals in the terminal to host direction cause keyboard lock. The terminal will appear to be frozen because it is obeying a host-issued command to stop sending the user's keystrokes to the host. The stop command might be Control-S, or it might be an escape sequence that specifically locks the keyboard. When in the locked state, some terminals will display a "LOCK" or "HOLD" message on the status line, some light an LED indicator on the keyboard, and some just stop working without explanation.

Keyboard lock is often caused by "garbage" on the line. Using a modem over a noisy phone line or running a program with the wrong terminal type setting can both send the terminal characters that it misinterprets as a keyboard-lock command.

On most terminals, there are only two ways to free a locked keyboard. You can turn the terminal off and then back on, or the host can send the command that unlocks the keyboard. The first choice will free the terminal, but leaves you with a blank screen. Also, because powering on resets several other modes as well, a successful power cycle does not prove that keyboard lock was the cause.

Many terminals have other modes that will stop them from working properly, for example, transparent print and invisible text mode. If you have a consistent problem with one of these, you should check the terminal manual and the program that causes the problem to eliminate the command that is putting the terminal into that mode.

A Tool to Correct Keyboard Lock

The program keyfree.c (Listing 2) sends Control-Q and some other common keyboard-unlock commands to the terminal specified on the command line. Check your terminal manuals to find what commands work for the terminals on your system. Delete the commands that you don't need, so that they won't clutter up the screen when you run this program.
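The general shape of such a program might look like the sketch below. It is not Listing 2. The table holds Control-Q plus one example escape sequence, the ANSI keyboard action mode reset (ESC [ 2 l), which is an assumption about the terminal in use; replace the entries with the sequences from your own terminal manuals. The function name is hypothetical.

```c
#include <string.h>
#include <unistd.h>

/* Unlock commands to try, in order. Edit this table for your site. */
static const char *const unlock_cmds[] = {
    "\021",     /* Control-Q (XON) */
    "\033[2l",  /* ANSI keyboard action mode reset (assumed terminal) */
};

/* Write every unlock command to fd. Returns the total number of
 * bytes written, or -1 on the first failed or short write. */
long send_unlock_cmds(int fd)
{
    long total = 0;
    size_t count = sizeof unlock_cmds / sizeof unlock_cmds[0];

    for (size_t i = 0; i < count; i++) {
        size_t len = strlen(unlock_cmds[i]);
        ssize_t written = write(fd, unlock_cmds[i], len);

        if (written != (ssize_t)len)
            return -1;
        total += (long)len;
    }
    return total;
}
```

The caller opens the terminal device for writing and passes the descriptor in. Keeping the sequences in one table makes it easy to delete the ones a site does not need.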

Line Insanity

When a full-screen program dies suddenly, it can leave the line in raw mode (the stty report will show -icanon and -echo). When the user types, nothing seems to happen. In particular, the normal carriage return does not work, but Control-J acts like a carriage return.

The solution to this condition is to put the line back into normal "cooked" mode. On most machines, you do this by typing Control-J stty sane Control-J. On some machines, you use Control-J tset Control-J instead.

On most systems you can also correct such line mode problems from another terminal with stty sane </dev/tty1d. In fact, the stty command always works on its standard input, so you can read and change settings of other lines by using the redirection shown in this example. Only the superuser (root) can change settings on another user's lines.
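To show which settings matter here, the following sketch turns the key flags back on through the POSIX termios interface. It is a simplified illustration, not a replacement for stty sane, which resets many more settings (including the special characters). The function names are hypothetical. The ICRNL flag is the one that makes the carriage return key work again: with it off, the Return key sends a carriage return that the host does not treat as the end of a line, while Control-J sends a real newline.

```c
#include <termios.h>

/* Turn the main "cooked" mode flags back on in a settings structure.
 * Only a handful of flags are restored here. */
void make_cooked(struct termios *t)
{
    t->c_iflag |= ICRNL;                 /* map carriage return to newline */
    t->c_oflag |= OPOST | ONLCR;         /* newline becomes CR LF on output */
    t->c_lflag |= ICANON | ECHO | ISIG;  /* line editing, echo, signals */
}

/* Apply make_cooked() to an open line. Returns 0 on success, -1 on error. */
int cook_line(int fd)
{
    struct termios t;

    if (tcgetattr(fd, &t) < 0)
        return -1;
    make_cooked(&t);
    return tcsetattr(fd, TCSANOW, &t);
}
```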

Simple Hardware Problems

If none of these solutions cures the problem, there may be a hardware problem. The most common problem is a cable knocked loose on the back of the terminal or on the host. These should be checked and screwed in, if possible.

The terminal setup should also be checked. Make sure that the baud rate agrees with the stty report. If the terminal is set for seven data bits, stty should report cs7. For eight data bits, it should report cs8. If parity is set to NONE on the terminal, stty should report -parenb. If parity is EVEN, stty should report parenb -parodd. For ODD parity, look for parenb parodd.
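The mapping between the terminal's setup and the stty words can be expressed as a small C function. This is a sketch assuming the POSIX termios flag names (CSIZE, CS7, CS8, PARENB, PARODD); format_frame_flags is a hypothetical name.

```c
#include <stdio.h>
#include <termios.h>

/* Describe the character size and parity settings in stty's notation. */
void format_frame_flags(const struct termios *t, char *buf, size_t len)
{
    const char *size;

    switch (t->c_cflag & CSIZE) {
    case CS7: size = "cs7"; break;
    case CS8: size = "cs8"; break;
    default:  size = "cs5/cs6"; break;   /* rarely used with terminals */
    }

    if (t->c_cflag & PARENB)
        snprintf(buf, len, "%s parenb %s", size,
                 (t->c_cflag & PARODD) ? "parodd" : "-parodd");
    else
        snprintf(buf, len, "%s -parenb", size);
}
```

A terminal set to seven data bits with even parity should therefore match a line that formats as "cs7 parenb -parodd".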

If the host echoes about half of the characters typed, it is usually a parity or data bits problem. If it consistently echoes x or some other incorrect character, it is usually a baud rate problem.
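The "about half" figure follows from simple counting. The sketch below assumes one common mismatch: the terminal sends seven data bits plus a parity bit, while the host line treats all eight bits as data. Only characters whose parity bit happens to be 0 arrive unchanged. The function name is hypothetical.

```c
/* Count the 7-bit codes (0 through 127) whose parity bit would be 0,
 * so the byte on the wire is the same with or without parity. */
int count_unchanged(int odd)
{
    int unchanged = 0;

    for (int c = 0; c < 128; c++) {
        int ones = 0;

        for (int bit = 0; bit < 7; bit++)
            ones += (c >> bit) & 1;

        /* Parity bit for even parity is ones % 2; odd parity inverts it. */
        if (((ones % 2) ^ (odd ? 1 : 0)) == 0)
            unchanged++;
    }
    return unchanged;
}
```

For either even or odd parity the count is 64 of the 128 codes, which is why roughly every other keystroke appears to echo correctly.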

Application Program Problems

A terminal will also appear to be frozen if the user's application program "hangs." A "hung" program has stopped reading keys, but has not exited. By using the ps commands described above, you should be able to get a list of processes running on the terminal. You may also find the who -u command useful, since it gives the process number of the primary program for the terminal. Also, on most UNIX systems /etc/fuser /dev/tty1d will give a list of all process numbers that are using /dev/tty1d.

In the case of a hung application, you must kill the attached processes (with the kill command) to free the terminal. Always use the kill <pid> command first; it sends the default terminate signal (SIGTERM, signal 15), which gives the program a chance to save its work files and exit gracefully. Then, if ps -fp<pid> shows that the program is still running, use kill -1 <pid> to send a hangup signal (SIGHUP). It is a bad idea to use kill -9 <pid> unless you cannot stop the program with one of the milder forms. The -9 option (SIGKILL) forces the process to quit immediately without giving it a chance to clean up work files or stop child processes.
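The escalation described above can be sketched in C with kill(2). This is an illustration, not a tool from the article; the function names and the grace period are hypothetical. It uses kill(pid, 0), which sends no signal and only tests whether the process exists. A terminated process that its parent has not yet collected still counts as existing for this test.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/* Signals to try, mildest first, in the order the text recommends. */
static const int escalation[] = { SIGTERM, SIGHUP, SIGKILL };

/* Return 1 if the process exists, 0 if it is gone. EPERM means the
 * process exists but we are not allowed to signal it. */
int still_running(pid_t pid)
{
    return kill(pid, 0) == 0 || errno == EPERM;
}

/* Send each signal in turn, waiting grace_seconds after each one.
 * Returns the signal after which the process was gone, 0 if it was
 * not running to begin with, or -1 if it survived every signal. */
int stop_gently(pid_t pid, unsigned grace_seconds)
{
    size_t count = sizeof escalation / sizeof escalation[0];

    if (!still_running(pid))
        return 0;
    for (size_t i = 0; i < count; i++) {
        kill(pid, escalation[i]);
        sleep(grace_seconds);
        if (!still_running(pid))
            return escalation[i];
    }
    return -1;
}
```

A generous grace period (several seconds) gives the program time to save its work files before the next, harsher signal is tried.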

Serious Hardware Problems

If there is no program running on the line, and you have checked all of the flow control and stty problems, you may have a more unusual hardware problem. The line may be broken, or there may be a hardware flow control problem.

Using a breakout box (or a voltmeter), verify that pin 2 or 3 on the RS-232C connector from the host is active when the terminal is disconnected. Check that the remaining pin in the 2,3 pair is active on the terminal when the line from the host is disconnected.

Checking hardware flow control (CTS/RTS or DTR/DSR) requires knowledge of how the cables are designed to work and what stty settings are supposed to be in effect. Using the breakout box, the manual for the terminal, and the man page for stty, check that everything matches. The details of hardware flow control implementations vary greatly from one type of UNIX to another, and are complex enough to be covered in another article by themselves.

Most hardware problems can be checked by swapping terminals. Find out if the line or the terminal has the problem by switching the locked-up terminal to a line that you know works. If the problem terminal works on the new line, test the terminal from the good line on the problem line. This test will show if the terminal itself has broken. A similar exercise at the host end with two different ports can show if the cable is defective.

System Restart

If all else fails, shutting down the system and restarting it will sometimes clear the problem. This corrects most hardware and software flow control lockups. It also re-initializes the device drivers, which will temporarily correct lockups caused by bugs in the driver code.

Like cycling the power on the terminal, shutting down the machine often corrects the problem without explaining why it happened in the first place. Of course, for an uncommon problem, a quick fix that leaves the cause unexplained is usually better for the user than hours of analytic downtime. But if the problem occurs often, you should investigate it as much as you can before rebooting.

Prevention

For some kinds of lock-up, once you have accurately diagnosed the cause, you can implement a permanent solution. If a program has an error, you can correct it. If line noise is causing the problem, you can upgrade your modems and phone lines. But even when you can't absolutely eliminate a cause, you can still take preventative steps to reduce the frequency of terminal lockups.

First, you can educate the users about terminal lockups. Make sure they know how to document the conditions that lead to the lock-up. If there are problems that they encounter often, show them how to correctly recover on their own.

Many sites also schedule regular shutdowns. Shutting down the machine on a regular basis (once a week) may avoid some obscure bugs in the device drivers that only occur after the terminal has been running for a long time.

Final Notes

This article gives you a "first draft" procedure for freeing frozen terminals. To customize it for your site, you should make your own list of common problems and incorporate appropriate solutions as you encounter them. Of course, the most important step in this procedure is to diagnose the problem as you solve it. An accurate diagnosis is the first step toward avoiding the same trouble in the future.

About the Author

G. Clark Brown is a senior software engineer at Structured Software Solutions Inc. in Plano, TX. As developer/support contact for SSSI's FacetTerm and Facet/PC products, he deals with a variety of installation and configuration problems that relate to connecting ASCII terminals to UNIX and making them work with applications. Clark has been doing this with applications that he has written for 16 years (nine years with UNIX).