Article

Standard UNIX Network Diagnostic Tools

Emmett Dulaney

Once the software has been installed, the appropriate cards added, and the cabling completed, it's time to test the network to see if all components are communicating as they should. A variety of diagnostic tools can be purchased from third parties, but the UNIX operating system contains many that do a remarkably competent job, without requiring the outlay of additional capital.

The UNIX network diagnostic utilities are included with almost every vendor's UNIX networking package. They are useful not only on a first install, but also at any time when an administrator might need to check the status of the connections. There are seven such utilities, or tools: ping, netstat, ifconfig, rwho, ruptime, rlogin, and remsh. The last four, all of which start with an "r," work only when two or more UNIX stations are involved, and not with connected processors running another operating system.

ping

Submarines use sonar waves to test the water and see if any other objects, including other submarines, are out there. This process is known as "pinging," and the creators of the utility thought the analogy close enough to borrow its name.

Typically, /usr/etc or /var/etc contains ping whose purpose in life is to test network applications and connections. ping reads addresses and entries from the /etc/hosts file and is able to communicate with host machines listed there. The first entry in this file is always:

127.0.0.1  me loopback localhost

This provides an internal address to the network card. You can test the status of that card by using ping me. This creates a loop wherein a signal is sent through the internal hardware to verify that all is working properly.

Two versions of ping are presently in use. The first continues to send signals until interrupted; the second performs a quick operation and reports the outcome. A sample session with the first looks like:

# /usr/etc/ping me

PING me: 56 data bytes
64 bytes from 127.0.0.1: icmp_seq=0. time=2 60th of sec
64 bytes from 127.0.0.1: icmp_seq=1. time=1 60th of sec
64 bytes from 127.0.0.1: icmp_seq=2. time=1 60th of sec
64 bytes from 127.0.0.1: icmp_seq=3. time=2 60th of sec
64 bytes from 127.0.0.1: icmp_seq=4. time=1 60th of sec
64 bytes from 127.0.0.1: icmp_seq=5. time=1 60th of sec
64 bytes from 127.0.0.1: icmp_seq=6. time=1 60th of sec
64 bytes from 127.0.0.1: icmp_seq=7. time=1 60th of sec
64 bytes from 127.0.0.1: icmp_seq=8. time=1 60th of sec
64 bytes from 127.0.0.1: icmp_seq=9. time=1 60th of sec
64 bytes from 127.0.0.1: icmp_seq=10. time=1 60th of sec
(<Ctrl><D> pressed by user):

----me PING Statistics----

11 packets transmitted, 11 packets received, 0% packet loss
round-trip (60th of sec) min/avg/max = 1/1/2
#

For the second, a sample session looks like:

# /usr/etc/ping me
PING me: is alive

In both instances, all is as should be -- the packets were sent and received, with zero packet loss. To test another machine on the network, specify its name in place of "me":

# /usr/etc/ping QUEEN
PING QUEEN: 56 data bytes
64 bytes from 197.9.200.17: icmp_seq=0. time=2 60th of sec

64 bytes from 197.9.200.17: icmp_seq=1. time=1 60th of sec

64 bytes from 197.9.200.17: icmp_seq=2. time=2 60th of sec

64 bytes from 197.9.200.17: icmp_seq=3. time=1 60th of sec

(<Ctrl><D> pressed by user):

----SUN7 PING Statistics----
4 packets transmitted, 4 packets received, 0% packet loss

round-trip (60th of sec) min/avg/max = 1/1/2
#

In this example, known host QUEEN is able to be reached and no errors occurred during the packet testing. Errors that can occur include the inability to send packets due to the machine being down, or an unknown host on the network, as shown in the following example:

# /usr/etc/ping SUN7
PING SUN7: 56 data bytes
(<Ctrl><D> pressed by user after several seconds):

----SUN7 PING Statistics----

11 packets transmitted, 0 packets received, 100% packet loss

#

In this instance, the machine is down, so nothing is received or echoed back. With the second ping version, the result would indicate that the machine is not alive. If you are using the version of ping that continues testing until interrupted, you can specify the number of bytes to be sent in each packet, and the number of packets to be sent, causing ping to automatically time out at the desired count:

# /usr/etc/ping me 64 4
PING me: 64 data bytes
72 bytes from 127.0.0.1: icmp_seq=0. time=1 60th of sec
72 bytes from 127.0.0.1: icmp_seq=1. time=1 60th of sec
72 bytes from 127.0.0.1: icmp_seq=2. time=1 60th of sec
72 bytes from 127.0.0.1: icmp_seq=3. time=1 60th of sec

----me PING Statistics----

4 packets transmitted, 4 packets received, 0% packet loss
round-trip (60th of sec) min/avg/max = 1/1/2
#

Apart from providing information about whether or not the other host can be communicated with, ping reports two other items. "icmp_seq" is the sequence number in which the packets are arriving at the host. If they are not in numeric order, then packets are being scrambled and there could be a hardware conflict. "time" is the amount of time in milliseconds that it is taking for a round trip of the packet, from send to receive.

Netstat

ping is useful for checking the status of one host. netstat gives you information on how the whole network is interacting with this host. netstat can be called with several options that provide more detailed information:

-A -- adds associated protocol control block to the display

-a -- shows all network interfaces

-i -- displays configured network interfaces

-n -- shows the output in numeric form

-r -- gives a routing table display, where applicable

-s -- displays statistics

netstat is a very powerful diagnostic tool, as Figures 1 through 5 show. Figure 1 presents a standard invocation of netstat, while Figure 2 changes the display from alphanumeric representations of the hosts to straight numeric. Figure 3 lists the routing tables -- in this instance, no router is used, thus the only entry other than "me" is for the host itself. Also note the interface of "en0," symbolizing an ethernet card\connection.

Figure 4 shows all the networked interfaces. The processes that have been highlighted are network daemons that are running. Notice the states that processes are in when netstat -a is invoked. ESTABLISHED means a connection is presently in progress. LISTEN indicates the host is waiting for a connection request, and TIME_WAIT denotes that the host is waiting for sufficient time to pass to ensure the remote has received the acknowledgment to close its connection.

Other possible states, many of which sound very similar to each other, are:

LASTACK -- the host has sent and received a request to close the connection and is waiting for acknowledgment from the remote.

CLOSING -- the connection is closing.

SYNSENT -- the machine is waiting for a connection open request from the remote.

SYNRECEIVED -- the open connection request has been sent and received and now the host is awaiting request acknowledgment.

CLOSEWAIT -- the machine is awaiting a close request from the local host after receiving such a request from the remote.

FINWAIT1 -- the local has requested a close and is now waiting for the same from the remote.

FINWAIT2 -- the local is waiting for a connection termination request from the remote.

Figure 5 is probably the most useful of all. It shows the statistics for network transactions that have occurred on this host. The statistical categories are described in the following paragraphs.

The "ip" section deals with incoming packets. Numbers greater than zero can indicate problems with the internal boards or with the cabling leading to the host.

The "icmp" section pertains to the Internet Control Message Protocol, which sends error or control messages from one host to another. In this example, no errors have occurred.

The "tcp" section contains important information including "acks for unsent data," as well as out-of-order packets, packets discarded for any reason, and the number of packets received after close. Packets received after close are lost for good -- they should never occur, but often do in the real world. "Packets" sent represents the total number of packets TCP has sent to the network board, while "packets received" is the number sent from the board to TCP. "Connection requests" is the number of times this host has accessed another, while "connection accepts" is the number of times other hosts have accessed this one. "Connections closed" is an unpredictable number and so not useful as a reference. The one thing to note is that the number should always exceed the number of connections established.

ifconfig

ifconfig, which usually resides in the /usr/etc or /var/etc subdirectories, shows the value of the network interface, and can also be used to set it. To see the value, you must specify the network card. In the example in Figure 3, an ethernet card is shown by netstat as "en0," and that is what is specified here:

# ifconfig en0
en0: flags=3<UP,BROADCAST>
inet 197.9.200.7 netmask ffffff00 broadcast 197.9.200.255
#

This machine is configured as 197.009.200.007 and the broadcast address is the reserved 255 ID.

ifconfig can be used to change values as well. For example, to change from a class C network to A, the command would be:

# ifconfig en0 128.9.200.7 netmask 255.255.255.0 broadcast
128.9.200.255

rwho

The rwho utility, much like the who utility but on a larger scale, shows who is logged onto each host machine attached to the network. This can be crucial in verifying that users on other hosts are able to access a machine. Figure 6 shows the output from a typical running of the utility.

By default, the only users shown are those who have not been idle for an hour or longer. Idle time is depicted in minutes at the rightmost column of the display. Constant activity is represented without a time in this column, as with the second listing, user "quick."

Users who have been idle an hour or more are not shown. Using the "-a" option, however, causes all users to be shown, regardless of idle time, as shown in Figure 7.

ruptime

Just as rwho is a who process for the entire network, ruptime is an uptime process for each machine on the network. Figure 8 shows a sample output from this command.

Each machine's host name is given, as well as the amount of time the host has been on the network in terms of days and hours. KINGS, for example, has been on the network 20 days, 11 hours, and 29 minutes. Following that is the number of users and the load. Loads are averages in three columns -- the last minute, the last five minutes, and the last fifteen minutes.

Both ruptime and rwho obtain their information from the rwhod daemon process running on every host machine:

# ps -ef | grep rwho
root  264 1 0 Aug 9 ?  21:57 /usr/etc/rwhod
#

This daemon maintains files that are traditionally kept in the /usr/spool/rwho subdirectory, and it updates the information every three minutes. Thus it is possible for a user to be logged on for two minutes and not show up in an rwho listing if the files have not updated yet.

It is the responsibility of the rwhod daemon process to produce a list of who is on the current machine, broadcast that to all other machines, and listen for other rwhod's broadcasts of their status to this host. This information is kept in data files within the subdirectory -- one for each host. You can check the last update time by listing these files, and you can use the od -- octal dump -- utility to view the contents.

rlogin

Once you've confirmed that a host machine is up and talking to the network (as verified with ruptime), the next step is to test access to the machine. To login on a remote machine as the same user you are on the current machine, use the rlogin utility with a parameter of the remote host name. This establishes a connection as though your terminal were directly connected to the remote host.

The rlogin process first attempts to log you in without a password by checking for entries in the /etc/hosts.equiv file. If it cannot find the file or an entry for you in the file, it next checks the /etc/passwd file to find your $HOME directory, which searches for a .rhosts file that will allow you to login without verification. If it cannot find that, it prompts you for a password.

Giving the password correctly allows you into the system. If you enter the password incorrectly, you must give the login and password combination all over again, but the connection stays live.

Once connected and successfully logged in, you can perform any UNIX command as if you were sitting at a terminal connected to that host. When you are finished with the session, typing exit closes the connection and returns you to your own machine.

To connect to the remote machine as another user (suppose you are user karen_d on this machine, but have an account as karen on the other machine), follow the normal command with "-l" and the name of the user you will be on the other machine. For example:

$ rlogin KINGS -l karen

When a remote login has been established, this will appear in the process table as the rlogind daemon:

# ps -ef | grep rlogind
root 5924  259 0 19:39:28 ttyv00a 0:00 rlogind 197.9.200.12
#

The user name is not given (though it will appear in who listings); instead, the address of the remote host is shown -- in this case 197.9.200.12.

Toggling Back and Forth

When remotely logged into a host, you can jump back and forth between the remote host and the one you are truly sitting at. To come back to your host, enter a tilde (~) and <z>. To return to the remote host, type "exit" on your machine. Figure 9 shows a representation of this.

The tilde is interpreted as the default escape character. If this is inconvenient, you can redefine the escape character by using the "-e" option. For example, to change it to the dollar sign, the syntax is:

rlogin KINGS -e$ -l karen

remsh

One of the most useful methods of testing the status of a host in relation to the network is to remotely run a job on that host. TCP/IP has a utility that allows you to do this without logging in to the remote machine. The name of the utility is dependent upon the vendor who supplied the version, but it will usually be rsh or remsh -- both indicating that you are remotely running a shell process. Here, remsh is used to mean either/or.

For remsh to be successful, the local and remote host must have proper permissions into each other. /etc/hosts.equiv and/or .rhosts files must allow one machine to access another without password verification. Figure 10 demonstrates using the df utility to test this.

If one user does not have permission to run the process remotely, the "-l" option can be used, as with rlogin, to specify another user. If no command is given following the host name

remsh KINGS

then an rlogin session is initiated.

Quotation marks become all important with remsh commands. When run from QUEEN,

remsh KINGS cat this >> that

appends the contents of KINGS:this file to the QUEEN:that file. However,

remsh KINGS "cat this >> that"

appends the contents of KINGS:this file to the KINGS:that file. You will always get what you ask for, so be careful to specify exactly what you want.

Summary

These seven utilities, standard with UNIX networking packages, allow you to verify that all hosts are communicating correctly. They can not only tell you the status of each machine, but also give you the ability to access it, tell who is logged on, and perform remote shell operations.

About the Author

Emmett Dulaney has contributed to several books on UNIX, and is currently a product developer for New Riders Publishing. He can be reached on the Internet as Edulaney@Newrider.mhs.compuserve.com.