Standard UNIX Network Diagnostic Tools
Emmett Dulaney
Once the software has been installed, the appropriate
cards added,
and the cabling completed, it's time to test the network
to see if
all components are communicating as they should. A variety
of diagnostic
tools can be purchased from third parties, but the UNIX
operating
system contains many that do a remarkably competent
job, without requiring
the outlay of additional capital.
The UNIX network diagnostic utilities are included with almost every
vendor's UNIX networking package. They are useful not only on a first
install, but also at any time an administrator needs to check the
status of the connections. There are seven such utilities, or tools:
ping, netstat, ifconfig, rwho, ruptime, rlogin, and remsh. The last
four, all of which start with an "r," work only between two or more
UNIX stations, and not with connected machines running another
operating system.
ping
Submarines use sonar waves to test the water and see
if any other
objects, including other submarines, are out there.
This process is
known as "pinging," and the creators of the
utility thought
the analogy close enough to borrow its name.
Typically, ping resides in /usr/etc or /var/etc; its purpose in life
is to test network connections. ping looks up host names and addresses
in the /etc/hosts file and is able to communicate with the host
machines listed there. The first entry in this file is normally:
127.0.0.1 me loopback localhost
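This entry defines the loopback address, an internal address that the
host's own networking software answers. You can test the local TCP/IP
software by using ping me. This creates a loop wherein a signal is
sent and answered inside the host, without going out on the cable, to
verify that the local networking software is working properly.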
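To make the layout of that /etc/hosts entry concrete, here is a minimal Python sketch that splits a line into its address, official name, and aliases. The function name parse_hosts_line is hypothetical, and the sketch assumes the usual whitespace-separated layout with optional "#" comments.

```python
def parse_hosts_line(line):
    """Split one /etc/hosts line into (address, name, aliases)."""
    # Drop any trailing comment, then surrounding whitespace.
    line = line.split("#", 1)[0].strip()
    fields = line.split()
    if len(fields) < 2:
        return None  # blank, comment-only, or incomplete line
    address, name, *aliases = fields
    return address, name, aliases


# The loopback entry quoted in the article.
entry = parse_hosts_line("127.0.0.1 me loopback localhost")
print(entry)
```

Any of the names on the line ("me", "loopback", or "localhost") can be given to ping in place of the address.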
Two versions of ping are presently in use. The first
continues
to send signals until interrupted; the second performs
a quick operation
and reports the outcome. A sample session with the first
looks like:
# /usr/etc/ping me
PING me: 56 data bytes
64 bytes from 127.0.0.1: icmp_seq=0. time=2 60th of sec
64 bytes from 127.0.0.1: icmp_seq=1. time=1 60th of sec
64 bytes from 127.0.0.1: icmp_seq=2. time=1 60th of sec
64 bytes from 127.0.0.1: icmp_seq=3. time=2 60th of sec
64 bytes from 127.0.0.1: icmp_seq=4. time=1 60th of sec
64 bytes from 127.0.0.1: icmp_seq=5. time=1 60th of sec
64 bytes from 127.0.0.1: icmp_seq=6. time=1 60th of sec
64 bytes from 127.0.0.1: icmp_seq=7. time=1 60th of sec
64 bytes from 127.0.0.1: icmp_seq=8. time=1 60th of sec
64 bytes from 127.0.0.1: icmp_seq=9. time=1 60th of sec
64 bytes from 127.0.0.1: icmp_seq=10. time=1 60th of sec
(<Ctrl><D> pressed by user):
----me PING Statistics----
11 packets transmitted, 11 packets received, 0% packet loss
round-trip (60th of sec) min/avg/max = 1/1/2
#
For the second, a sample session looks like:
# /usr/etc/ping me
PING me: is alive
In both instances, all is as it should be -- the packets were sent
and received, with zero packet loss. To test another machine on the
network, specify its name in place of "me":
# /usr/etc/ping QUEEN
PING QUEEN: 56 data bytes
64 bytes from 197.9.200.17: icmp_seq=0. time=2 60th of sec
64 bytes from 197.9.200.17: icmp_seq=1. time=1 60th of sec
64 bytes from 197.9.200.17: icmp_seq=2. time=2 60th of sec
64 bytes from 197.9.200.17: icmp_seq=3. time=1 60th of sec
(<Ctrl><D> pressed by user):
----QUEEN PING Statistics----
4 packets transmitted, 4 packets received, 0% packet loss
round-trip (60th of sec) min/avg/max = 1/1/2
#
In this example, the known host QUEEN can be reached, and no errors
occurred during the packet testing. Errors that can occur include an
unknown host name, or packets going unanswered because the machine is
down, as shown in the following example:
# /usr/etc/ping SUN7
PING SUN7: 56 data bytes
(<Ctrl><D> pressed by user after several seconds):
----SUN7 PING Statistics----
11 packets transmitted, 0 packets received, 100% packet loss
#
In this instance, the machine is down, so nothing is received or
echoed back. With the second ping version, the result would indicate
that the machine is not alive. If you are using the version of ping
that continues testing until interrupted, you can specify the number
of bytes to be sent in each packet and the number of packets to be
sent, causing ping to stop automatically at the desired count:
# /usr/etc/ping me 64 4
PING me: 64 data bytes
72 bytes from 127.0.0.1: icmp_seq=0. time=1 60th of sec
72 bytes from 127.0.0.1: icmp_seq=1. time=1 60th of sec
72 bytes from 127.0.0.1: icmp_seq=2. time=1 60th of sec
72 bytes from 127.0.0.1: icmp_seq=3. time=1 60th of sec
----me PING Statistics----
4 packets transmitted, 4 packets received, 0% packet loss
round-trip (60th of sec) min/avg/max = 1/1/1
#
Apart from reporting whether the other host can be reached, ping
reports two other items. "icmp_seq" is the sequence number of each
packet as it arrives back at the host. If the numbers are not in
numeric order, then packets are being scrambled and there could be a
hardware conflict. "time" is the amount of time a round trip of the
packet takes, from send to receive. In the sessions shown here it is
reported in sixtieths of a second; many other versions of ping report
it in milliseconds.
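As a rough illustration of how these fields can be checked automatically, the following Python sketch parses reply lines in the format of the sessions above, confirms that the sequence numbers arrive in order, and recomputes the summary figures. The function name summarize is hypothetical, and truncating the average to an integer is an assumption made to match the sample statistics.

```python
import re

# Reply lines copied from the QUEEN session shown earlier.
SAMPLE = """\
64 bytes from 197.9.200.17: icmp_seq=0. time=2 60th of sec
64 bytes from 197.9.200.17: icmp_seq=1. time=1 60th of sec
64 bytes from 197.9.200.17: icmp_seq=2. time=2 60th of sec
64 bytes from 197.9.200.17: icmp_seq=3. time=1 60th of sec
"""

REPLY = re.compile(r"icmp_seq=(\d+)\.?\s+time=(\d+)")


def summarize(text, transmitted):
    """Return loss, ordering, and round-trip figures for ping replies."""
    replies = [(int(seq), int(rtt)) for seq, rtt in REPLY.findall(text)]
    seqs = [seq for seq, _ in replies]
    times = [rtt for _, rtt in replies]
    return {
        "received": len(replies),
        "loss_pct": 100 * (transmitted - len(replies)) // transmitted,
        "in_order": seqs == sorted(seqs),
        "min": min(times) if times else None,
        # Assumption: integer average, to match the sample output.
        "avg": sum(times) // len(times) if times else None,
        "max": max(times) if times else None,
    }


summary = summarize(SAMPLE, transmitted=4)
print(summary)
```

An "in_order" value of False, or a loss percentage above zero, is the cue to look more closely at the cabling or the interface.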
netstat
ping is useful for checking the status of one host.
netstat
gives you information on how the whole network is interacting
with
this host. netstat can be called with several options
that
provide more detailed information:
-A -- adds the address of the associated protocol control block to the display
-a -- shows the state of all sockets, including those listening for connections
-i -- displays configured network interfaces
-n -- shows the output in numeric form
-r -- gives a routing table display, where applicable
-s -- displays statistics
netstat is a very powerful diagnostic tool, as Figures
1 through
5 show. Figure 1 presents a standard invocation of netstat,
while Figure 2 changes the display from alphanumeric
representations
of the hosts to straight numeric. Figure 3 lists the
routing tables
-- in this instance, no router is used, so the only entry other than
"me" is for the host itself. Also note the interface "en0," which
denotes an Ethernet card or connection.
Figure 4 shows all the networked interfaces. The processes
that have
been highlighted are network daemons that are running.
Notice the
states that processes are in when netstat -a is invoked.
ESTABLISHED
means a connection is presently in progress. LISTEN
indicates
the host is waiting for a connection request, and TIME_WAIT
denotes that the host is waiting for sufficient time
to pass to ensure
the remote has received the acknowledgment to close
its connection.
Other possible states, many of which sound very similar
to each other,
are:
LAST_ACK -- the host has both sent and received a request to close
the connection and is waiting for acknowledgment from the remote.
CLOSING -- both ends have requested a close, and the host is waiting
for the remote to acknowledge its request.
SYN_SENT -- the host has sent a connection open request and is
waiting for the matching request from the remote.
SYN_RCVD -- the open connection request has been both sent and
received, and now the host is awaiting request acknowledgment.
CLOSE_WAIT -- the remote has requested a close, and the connection is
awaiting a close request from the local host.
FIN_WAIT_1 -- the local host has requested a close and is now waiting
for an acknowledgment or for the same request from the remote.
FIN_WAIT_2 -- the local close request has been acknowledged, and the
host is waiting for a connection termination request from the remote.
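When a host has many connections, a tally of these states is easier to read than the raw listing. The Python sketch below counts the final column of each TCP line. The sample lines are hypothetical (they are not taken from Figure 4), and the column layout is an assumption based on the common BSD-style netstat -a display.

```python
from collections import Counter

# Hypothetical netstat -a lines; the layout varies between vendors.
SAMPLE = """\
Proto Recv-Q Send-Q  Local Address   Foreign Address  (state)
tcp        0      0  QUEEN.telnet    KINGS.1032       ESTABLISHED
tcp        0      0  *.telnet        *.*              LISTEN
tcp        0      0  QUEEN.1044      KINGS.login      TIME_WAIT
tcp        0      0  *.login         *.*              LISTEN
"""


def count_states(netstat_output):
    """Count TCP connections by the state shown in the last column."""
    states = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Only TCP lines carry a state; skip headers and other protocols.
        if len(fields) >= 6 and fields[0].startswith("tcp"):
            states[fields[-1]] += 1
    return states


counts = count_states(SAMPLE)
for state, number in sorted(counts.items()):
    print(state, number)
```

A large and growing count in one of the closing states is a reasonable prompt to look at the host on the other end of those connections.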
Figure 5 is probably the most useful of all. It shows
the statistics
for network transactions that have occurred on this
host. The statistical
categories are described in the following paragraphs.
The "ip" section deals with incoming packets. Error counts greater
than zero in this section can indicate problems with the internal
boards or with the cabling leading to the host.
The "icmp" section pertains to the Internet
Control Message
Protocol, which sends error or control messages from
one host to another.
In this example, no errors have occurred.
The "tcp" section contains important information, including "acks for
unsent data," out-of-order packets, packets discarded for any reason,
and the number of packets received after close. Packets received after
close are lost for good -- they should never occur, but often do in
the real world. "Packets sent" represents the total number of packets
TCP has sent to the network board, while "packets received" is the
number sent from the board to TCP. "Connection requests" is the number
of times this host has accessed another, while "connection accepts" is
the number of times other hosts have accessed this one. "Connections
closed" is an unpredictable number and so not useful as a reference.
The one thing to note is that the number should always exceed the
number of connections established.
ifconfig
ifconfig, which usually resides in the /usr/etc or /var/etc directory,
shows the configuration of a network interface and can also be used to
set it. To see the configuration, you must specify the interface. In
the example in Figure 3, an Ethernet card is shown by netstat as
"en0," and that is what is specified here:
# ifconfig en0
en0: flags=3<UP,BROADCAST>
inet 197.9.200.7 netmask ffffff00 broadcast 197.9.200.255
#
This machine is configured as 197.9.200.7, and the broadcast address
uses the reserved host ID of 255.
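The relationship between the three values that ifconfig prints can be checked with a short calculation. This Python sketch, using the standard ipaddress module, converts the hexadecimal netmask to dotted form and derives the network and broadcast addresses. The function name describe is hypothetical.

```python
import ipaddress


def describe(address, hex_mask):
    """Return (dotted netmask, network, broadcast) for an interface."""
    # ifconfig prints the mask in hexadecimal, e.g. ffffff00.
    mask = ipaddress.IPv4Address(int(hex_mask, 16))
    iface = ipaddress.IPv4Interface(f"{address}/{mask}")
    network = iface.network
    return str(mask), str(network), str(network.broadcast_address)


# Values taken from the ifconfig en0 output above.
mask, network, broadcast = describe("197.9.200.7", "ffffff00")
print(mask, network, broadcast)
```

If the broadcast address computed this way differs from the one ifconfig reports, the netmask or the broadcast setting on the interface deserves a second look.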
ifconfig can be used to change values as well. For example, to move
the interface from the class C network to an address on the class B
network 128.9, subnetted here with a class C netmask, the command
would be:
# ifconfig en0 128.9.200.7 netmask 255.255.255.0 broadcast 128.9.200.255
rwho
The rwho utility, much like the who utility but on
a larger scale, shows who is logged onto each host machine
attached
to the network. This can be crucial in verifying that
users on other
hosts are able to access a machine. Figure 6 shows the
output from
a typical running of the utility.
By default, only users who have been idle for less than an hour are
shown. Idle time is depicted in minutes in the rightmost column of the
display. Constant activity is represented without a time in this
column, as with the second listing, user "quick." Using the "-a"
option, however, causes all users to be shown, regardless of idle
time, as shown in Figure 7.
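The default filtering rule is simple enough to state as code. In this Python sketch, each entry is a user name paired with idle time in minutes. The function name visible_users and every sample entry except the user "quick" are hypothetical.

```python
def visible_users(entries, show_all=False):
    """Return the names rwho would list, given (name, idle_minutes) pairs."""
    # Without -a, anyone idle for an hour or more is left out.
    return [name for name, idle in entries if show_all or idle < 60]


# Hypothetical sample: "quick" is constantly active, "root" is long idle.
entries = [("karen", 12), ("quick", 0), ("root", 95)]

print(visible_users(entries))                 # default listing
print(visible_users(entries, show_all=True))  # equivalent of rwho -a
```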
ruptime
Just as rwho is a who process for the entire network,
ruptime is an uptime process for each machine on the
network. Figure 8 shows a sample output from this command.
Each machine's host name is given, as well as the amount of time the
host has been on the network in days, hours, and minutes. KINGS, for
example, has been on the network 20 days, 11 hours, and 29 minutes.
Following that is the number of users and the load. Loads are averages
in three columns -- the last minute, the last five minutes, and the
last fifteen minutes.
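For scripts that watch uptime or load, a line of ruptime output can be broken into its parts. The Python sketch below assumes the common "up days+hours:minutes" layout, which may differ from the one in Figure 8. The host name and uptime come from the KINGS example; the user count and load figures are hypothetical, as is the function name parse_ruptime.

```python
import re

# Assumed layout: HOST up DAYS+HH:MM, N users, load A, B, C
LINE = re.compile(
    r"^(?P<host>\S+)\s+up\s+(?:(?P<days>\d+)\+)?(?P<hours>\d+):(?P<minutes>\d+),"
    r"\s+(?P<users>\d+)\s+users?,\s+load\s+(?P<load>[\d., ]+)$"
)


def parse_ruptime(line):
    """Return host, uptime in minutes, user count, and load averages."""
    match = LINE.match(line.strip())
    if match is None:
        return None  # for example, a host reported as down
    days = int(match["days"] or 0)
    uptime = (days * 24 + int(match["hours"])) * 60 + int(match["minutes"])
    return {
        "host": match["host"],
        "uptime_minutes": uptime,
        "users": int(match["users"]),
        "load": [float(value) for value in match["load"].split(",")],
    }


info = parse_ruptime("KINGS  up 20+11:29,  3 users,  load 0.10, 0.05, 0.00")
print(info)
```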
Both ruptime and rwho obtain their information from
the rwhod daemon process running on every host machine:
# ps -ef | grep rwho
root 264 1 0 Aug 9 ? 21:57 /usr/etc/rwhod
#
This daemon maintains files that are traditionally kept
in the /usr/spool/rwho subdirectory, and it updates
the information
every three minutes. Thus it is possible for a user
to be logged on
for two minutes and not show up in an rwho listing if
the
files have not updated yet.
It is the responsibility of the rwhod daemon process
to produce
a list of who is on the current machine, broadcast that
to all other
machines, and listen for other rwhod's broadcasts of
their
status to this host. This information is kept in data
files within
the subdirectory -- one for each host. You can check
the last update
time by listing these files, and you can use the od
--
octal dump -- utility to view the contents.
rlogin
Once you've confirmed that a host machine is up and
talking to the
network (as verified with ruptime), the next step is
to test
access to the machine. To login on a remote machine
as the same user
you are on the current machine, use the rlogin utility
with
a parameter of the remote host name. This establishes
a connection
as though your terminal were directly connected to the
remote host.
The rlogin process first attempts to log you in without a password by
checking for entries in the /etc/hosts.equiv file. If it cannot find
the file or an entry for you in the file, it next checks the
/etc/passwd file to find your $HOME directory, where it searches for a
.rhosts file that will allow you to log in without verification. If it
cannot find that, it prompts you for a password.
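The order of those checks can be sketched as a small decision function. This is a simplified model in Python, not the actual rlogind logic: the function name rlogin_auth is hypothetical, /etc/hosts.equiv is reduced to a set of trusted host names, and .rhosts is reduced to a set of (host, user) pairs, with None standing for a missing file.

```python
def rlogin_auth(hosts_equiv, rhosts, client_host, user):
    """Return which step of the sequence decides the login."""
    # Step 1: a matching /etc/hosts.equiv entry allows entry without a password.
    if hosts_equiv is not None and client_host in hosts_equiv:
        return "hosts.equiv"
    # Step 2: a matching entry in the user's $HOME/.rhosts does the same.
    if rhosts is not None and (client_host, user) in rhosts:
        return ".rhosts"
    # Step 3: otherwise the user is prompted for a password.
    return "password"


# Hypothetical configurations for a login from QUEEN as karen_d.
print(rlogin_auth({"QUEEN"}, None, "QUEEN", "karen_d"))
print(rlogin_auth(None, {("QUEEN", "karen_d")}, "QUEEN", "karen_d"))
print(rlogin_auth(set(), set(), "QUEEN", "karen_d"))
```

When a login unexpectedly prompts for a password, working through the same three steps by hand usually shows which file is missing the entry.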
Giving the password correctly allows you into the system.
If you enter
the password incorrectly, you must give the login and
password combination
all over again, but the connection stays live.
Once connected and successfully logged in, you can perform
any UNIX
command as if you were sitting at a terminal connected
to that host.
When you are finished with the session, typing exit
closes the connection
and returns you to your own machine.
To connect to the remote machine as another user (suppose
you are
user karen_d on this machine, but have an account as
karen
on the other machine), follow the normal command with
"-l"
and the name of the user you will be on the other machine.
For example:
$ rlogin KINGS -l karen
When a remote login has been established, this will
appear
in the process table as the rlogind daemon:
# ps -ef | grep rlogind
root 5924 259 0 19:39:28 ttyv00a 0:00 rlogind 197.9.200.12
#
The user name is not given (though it will appear in
who listings); instead, the address of the remote host
is
shown -- in this case 197.9.200.12.
Toggling Back and Forth
When remotely logged into a host, you can jump back
and forth between
the remote host and the one you are truly sitting at.
To come back
to your host, enter a tilde (~) and <z>. To return
to the remote
host, type "exit" on your machine. Figure
9 shows a representation
of this.
The tilde is interpreted as the default escape character.
If this
is inconvenient, you can redefine the escape character
by using the
"-e" option. For example, to change it to
the dollar sign,
the syntax is:
rlogin KINGS -e$ -l karen
remsh
One of the most useful methods of testing the status
of a host in
relation to the network is to remotely run a job on
that host. TCP/IP
has a utility that allows you to do this without logging
in to the
remote machine. The name of the utility is dependent
upon the vendor
who supplied the version, but it will usually be rsh
or remsh
-- both indicating that you are remotely running a shell
process.
Here, remsh is used to mean either.
For remsh to be successful, the local and remote host
must
have proper permissions into each other. /etc/hosts.equiv
and/or .rhosts files must allow one machine to access
another
without password verification. Figure 10 demonstrates
using the df
utility to test this.
If one user does not have permission to run the process
remotely,
the "-l" option can be used, as with rlogin,
to specify
another user. If no command is given following the host
name
remsh KINGS
then an rlogin session is initiated.
Quotation marks become all-important with remsh commands.
When run from QUEEN,
remsh KINGS cat this >> that
appends the contents of the file KINGS:this to the file QUEEN:that,
because the shell on QUEEN handles the unquoted redirection itself.
However,
remsh KINGS "cat this >> that"
appends the contents of the file KINGS:this to the file KINGS:that,
because the quoted redirection is passed to KINGS as part of the
command. You will always get what you ask for, so be careful to
specify exactly what you want.
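The difference comes down to how the local shell divides the command line into words. The Python sketch below uses the standard shlex module to approximate that division and report which host ends up handling the redirection. It is an illustration only: the function name where_redirect_runs is hypothetical, and the sketch assumes the ">>" operator is set off by spaces, as in the two commands above.

```python
import shlex


def where_redirect_runs(command_line):
    """Report whether '>>' is handled by the local or the remote shell."""
    tokens = shlex.split(command_line)
    # A bare '>>' word is consumed by the local shell before remsh runs.
    if ">>" in tokens:
        return "local"
    # A '>>' inside a quoted argument travels to the remote host intact.
    if any(">>" in token for token in tokens):
        return "remote"
    return "none"


print(where_redirect_runs('remsh KINGS cat this >> that'))
print(where_redirect_runs('remsh KINGS "cat this >> that"'))
```

Typed on QUEEN, the first command reports "local", so that is written on QUEEN; the second reports "remote", so that is written on KINGS.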
Summary
These seven utilities, standard with UNIX networking packages, allow
you to verify that all hosts are communicating correctly. Not only can
they tell you the status of each machine, but they also let you access
it, see who is logged on, and perform remote shell operations.
About the Author
Emmett Dulaney has contributed to several books on
UNIX, and is currently
a product developer for New Riders Publishing. He can
be reached on the
Internet as Edulaney@Newrider.mhs.compuserve.com.