The Short Life and Hard Times of An Internet Packet
Dorian Deane
The Header
As more and more LAN users beg to be connected to the
rest of the
world, it becomes increasingly important for systems
administrators,
most of whom are probably already familiar with the
TCP/IP protocol
suite, to understand how that suite operates in wide
area networks.
To show the relationship between TCP/IP and internetwork
communications,
this article follows the life of a single internet packet
from its construction, through its long voyage, to its final
arrival at its destination.
Where possible along the way, I have tried to add practical
information
about telnet, a TCP-based application most commonly
used to provide
a terminal-like interface on remote hosts.
The Synchronization
Though data communications protocols are often presented
in an operating
system-neutral and hardware-neutral manner, I prefer
to stick with
UNIX and the ubiquitous IEEE 802.3 ethernet, sacrificing
the needs
of the few for the good of the many. I occasionally
refer to /etc/rc.local,
which is found in most BSD-based systems, so if you
run a System V
shop, you'll have to make the appropriate translations
to your own
rc files.
To understand how the TCP/IP protocol stack translates
to the construction
of an internet packet, it's useful to think of the process
as protocols
encapsulated within protocols (see Figure 1). Each protocol
adds its
own header to the protocol before it -- ethernet adds
a header
to an IP packet, IP adds a header to the TCP packet
it receives, and
TCP adds a header to the "data," which may
really be data
from a user's perspective or may be yet another protocol
above that
data. As a case in point, ethernet sees the IP packet
as nothing more
than data. There is an exception to this rule, however:
each protocol
layer has to know which protocol it has just encapsulated,
so that
when its counterpart at the other end has removed its
own header,
it knows where to send the remaining data. The ethernet
header has
a "Type" field for this purpose and IP defines
a "Protocol"
field. TCP uses ports, which I'll discuss later.
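As a rough illustration of this nesting, here is a toy Python sketch. It is not a wire-accurate implementation: each "header" carries only the one field its layer uses to identify what it has encapsulated, and the function names are invented for the example. The values 0x0800 (IP in the ethernet "Type" field), 6 (TCP in the IP "Protocol" field), and 23 (the telnet port) are the standard assignments.

```python
import struct

ETHERTYPE_IP = 0x0800   # ethernet "Type" value meaning "payload is IP"
IPPROTO_TCP = 6         # IP "Protocol" value meaning "payload is TCP"
TELNET_PORT = 23        # TCP port identifying the telnet service

def tcp_wrap(data, dst_port):
    # Toy TCP header: just the 16-bit destination port.
    return struct.pack("!H", dst_port) + data

def ip_wrap(segment, protocol):
    # Toy IP header: just the 8-bit Protocol field.
    return struct.pack("!B", protocol) + segment

def ether_wrap(packet, ethertype):
    # Toy ethernet header: just the 16-bit Type field.
    return struct.pack("!H", ethertype) + packet

def unwrap(frame):
    # The receiver peels the headers off in reverse order, using each
    # field to decide which layer gets the remainder.
    (ethertype,) = struct.unpack("!H", frame[:2])
    (protocol,) = struct.unpack("!B", frame[2:3])
    (dst_port,) = struct.unpack("!H", frame[3:5])
    return ethertype, protocol, dst_port, frame[5:]

segment = tcp_wrap(b"hello", TELNET_PORT)
frame = ether_wrap(ip_wrap(segment, IPPROTO_TCP), ETHERTYPE_IP)
```

Calling unwrap(frame) hands back the three demultiplexing values and the original data, which is all each layer's counterpart needs to know where to send what remains.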
Keep in mind that, despite such common usage as "TCP does
this and IP does that," such references should be read as
statements about protocol definitions rather than descriptions
of actual software or hardware.
We are, for
the most part, only addressing protocol definitions
here -- not
protocol implementations. It would therefore be more
correct to say,
"TCP defines this functionality and IP defines
that functionality."
For reasons of convenience, I've retained the common
usage.
The Data
This article traces the very first TCP packet produced
by telnet
as it tests the willingness of a remote host to respond.
The first
TCP "packet" does not carry any real telnet
data --
rather, telnet merely asks a lower-level library routine
to
open up a connection and then waits for a reply from
TCP before beginning
its own negotiations. Remember that telnet is an application
like rlogin (another program to provide remote login
service)
or ftp (the file transfer program, not the protocol),
and sits
logically on top of TCP. (If you have a UNIX machine but no
network, you can see what telnet looks like by telnetting to
your own machine through the loopback interface with "telnet
127.0.0.1" -- unless your machine is non-standard, you will
see a login prompt on your local machine.)
When a user enters "telnet nic.ddn.mil," a
message like "Trying
192.112.36.5 ..." will appear on the screen and
then, with luck,
"Connected to nic.ddn.mil." The IP number
(192.112.36.5) means
that telnet was able to find the internet address for
nic.ddn.mil
and "Connected" means that your host has opened
up a TCP connection
with nic, but the remote telnet server (telnetd) has
not yet
responded. If you had instead gotten a message like
"nic.ddn.mil:
unknown host," it would not necessarily mean the host doesn't
exist; it might only mean that telnet was unable to find the
IP address in the name databases your machine uses.
A name database
is usually a flat file like /etc/hosts, or a name service
such
as NIS (the Network Information Service, which used to be
Sun's Yellow Pages) or DNS (the Domain Name System). Thus, an
"unknown
host"
message may mean that you have an incomplete /etc/hosts
file,
a problem with a name server, a problem with your local
IP router,
a recalcitrant remote router, or any combination thereof.
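The flat-file case can be sketched in Python as follows. This is only an illustration of the lookup, assuming the usual /etc/hosts layout (an address followed by one or more names, with "#" starting a comment); the function names and the sample table are made up for the example, and a real resolver would go on to consult NIS or DNS as configured.

```python
def parse_hosts(text):
    """Map each name found in hosts-file text to its IP address."""
    table = {}
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()   # drop comments
        if len(fields) >= 2:
            address, names = fields[0], fields[1:]
            for name in names:
                # First entry wins, and names are case-insensitive.
                table.setdefault(name.lower(), address)
    return table

SAMPLE_HOSTS = """\
127.0.0.1     localhost
192.112.36.5  nic.ddn.mil nic   # address quoted in the text
"""

def resolve(name, table):
    # A miss only means the name is not in *this* database.
    address = table.get(name.lower())
    return address if address else name + ": unknown host"

table = parse_hosts(SAMPLE_HOSTS)
```

Note that resolve() reports "unknown host" for any name missing from the table, whether or not such a host exists somewhere else.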
In the Internet protocol stack, telnet is the application
program,
and as such, it doesn't care much about the address
you give it. Telnet
just looks for the IP number in the appropriate name
translation database,
and if it can verify that there's an existing IP address
for the name,
it passes its initial query to TCP. Most versions of
telnet
accept either a logical name or the IP address itself.
TCP, on the other hand, is not much concerned with IP
numbers, since
it's not responsible for routing. TCP keeps track of
port numbers,
and the two end points which comprise the telnet session
are uniquely
identified by a port and host number pair. If you look
at /etc/services,
you will see various "well-known" TCP port
numbers. These
ports are not ports in the normal sense of the word;
they do not refer
to absolute I/O addresses or any other sort of physical
interface.
Rather, TCP uses the numbers in /etc/services to identify
the
upper-level protocol, such as Simple Mail Transfer Protocol
(25) and
Network News Transfer Protocol (119). When a telnet
client
wants to open a connection with the telnet daemon (i.e.,
telnet
server), it usually sends its initial query to the port
it finds in
/etc/services, which should be 23 (to talk directly
to the
SMTP server instead, try "telnet hostname 25,"
and, so you
don't get stuck there, type "help" and a carriage
return to
see how to proceed).
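To make the idea of a "port and host number pair" concrete, here is a small Python sketch of how connections might be told apart. The Demux class is purely hypothetical, as are the client address 10.0.0.1 and the client port numbers; only the well-known port values come from the text.

```python
WELL_KNOWN = {"telnet": 23, "smtp": 25, "nntp": 119}  # values from the text

class Demux:
    """Toy table of TCP connections keyed by both end points."""

    def __init__(self):
        self.connections = {}

    def open(self, local, remote, owner):
        # local and remote are (host, port) pairs; the pair of pairs
        # is what makes a connection unique.
        self.connections[(local, remote)] = owner

    def deliver(self, local, remote):
        # Returns the owner of the matching connection, or None.
        return self.connections.get((local, remote))

demux = Demux()
server = ("192.112.36.5", WELL_KNOWN["telnet"])
# Two sessions to the same server port stay distinct because the
# client ports (hypothetical values) differ.
demux.open(server, ("10.0.0.1", 1025), "session A")
demux.open(server, ("10.0.0.1", 1026), "session B")
```

Both sessions share port 23 on the server, yet a packet from client port 1025 can only ever be handed to session A.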
Once the message, "Trying 192.112.36.5 ..."
is printed on
the screen, telnet blocks, waiting for a reply from
TCP. While
TCP was designed to accept data at this point, most
versions of telnet
cautiously ask TCP to establish a connection before
beginning their
option negotiations or presenting a login prompt. This
first TCP packet,
then, consists only of the header. The header carries the
source and destination TCP port numbers (the host IP numbers
that complete each end point travel in the IP header), as well
as fields that allow TCP to keep track of which packets have
been sent, which have been acknowledged, and which have not.
A checksum added by TCP on the originating host allows for
detection of corrupted data. TCP also sets timers in order to
keep track of potentially lost packets or to stop sending in
the case of an overburdened network; putting back together
packets fragmented by intermediate routers is left to IP on the
destination host. In addition, TCP keeps track of packet ordering
-- not
a small task given that most packets inserted into the
Internet go
through multiple routers by no consistent path and therefore
can easily
arrive out of sequence. This is an area of great complexity
and TCP
is, without doubt, the most complicated layer in the
protocol stack.
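For readers who like to see the bytes, the following Python sketch packs a header-only opening packet of the kind described above. It assumes the standard 20-byte TCP header with no options; the source port, sequence number, and window size are arbitrary illustrative values, and the checksum is simply left at zero because computing it requires addresses from the IP layer.

```python
import struct

# src port, dst port, sequence, acknowledgment, data offset, flags,
# window, checksum, urgent pointer: 20 bytes in network byte order.
TCP_HEADER = struct.Struct("!HHIIBBHHH")
FLAG_SYN = 0x02          # "synchronize": asks to open a connection

def build_syn(src_port, dst_port, seq, window=4096):
    data_offset = 5 << 4          # header length: five 32-bit words
    checksum = 0                  # left zero in this sketch
    return TCP_HEADER.pack(src_port, dst_port, seq, 0,
                           data_offset, FLAG_SYN, window, checksum, 0)

def parse(header):
    (src, dst, seq, ack, offset, flags,
     window, checksum, urgent) = TCP_HEADER.unpack(header)
    return {"src_port": src, "dst_port": dst, "seq": seq, "ack": ack,
            "header_len": (offset >> 4) * 4,
            "syn": bool(flags & FLAG_SYN)}

syn = build_syn(src_port=1025, dst_port=23, seq=1000)
fields = parse(syn)
```

The packet is all header and no data, which matches what telnet's first request to TCP puts on the wire.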
Among other things, TCP is responsible for making applications
such
as telnet believe they have their own private connection
to
a remote host, and this requires some real sleight-of-hand
when dealing
with packet-switched networks such as the Internet,
where even the
packet fragments can arrive out of order. For the purposes
of this
article, however, it's useful just to think of TCP as
the manager
of a virtual circuit -- that circuit being, for now,
a telnet
connection between two hosts. In more formal terms,
TCP is responsible
for a reliable, end-to-end, full duplex, virtual circuit
between two
hosts.
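The ordering half of that sleight-of-hand can be sketched in Python. This is a deliberately simplified model, assuming each arriving piece is tagged with the sequence number of its first byte; real TCP also deals with windows, overlapping data, and sequence number wraparound, none of which appear here. The function name and sample data are illustrative only.

```python
def reassemble(segments, first_seq):
    """Return the contiguous byte stream starting at first_seq.

    segments is an iterable of (sequence_number, data) pairs in
    arrival order; duplicates and gaps are tolerated.
    """
    pending = {}
    for seq, data in segments:
        pending.setdefault(seq, data)      # ignore retransmitted copies
    stream = b""
    next_seq = first_seq
    while next_seq in pending:             # stop at the first gap
        data = pending.pop(next_seq)
        stream += data
        next_seq += len(data)
    return stream, next_seq                # next_seq is what gets ACKed

# Out of order, with one duplicate, as the Internet might deliver them.
arrived = [(1005, b" wor"), (1000, b"hello"), (1009, b"ld"), (1000, b"hello")]
stream, next_expected = reassemble(arrived, 1000)
```

The application above TCP sees only the ordered stream; it never learns that the pieces arrived shuffled.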
After having added its header, TCP passes the packet
to IP. Both telnet
and TCP can now sit back and wait for IP to come back
with a reply.
TCP never talks directly to remote hosts; instead, TCP
talks to IP.
In this scenario using UNIX and ethernet, however, IP
doesn't talk
to remote hosts either; technically, IP talks to ethernet
(and ARP).
Unlike TCP, which tries to manage everything (keeping
a finger on
each packet not yet acknowledged, verifying header and
data integrity
with checksums, resending packets when necessary, etc.),
the IP module
on your host is happy as long as things appear to be
working, regardless
of reality. After deciding that the packet is destined
for a site
beyond the local network, IP adds its own header and
sends the packet
to ethernet, telling it the ethernet address of the
local router.
IP doesn't care about the packet order. IP doesn't even
care if the
packet got there.
Again in formal terms, IP provides a connectionless,
unreliable, best-effort
delivery system. It is connectionless because it doesn't
try to present
the layer above it with the appearance of an end-to-end
circuit the
way TCP does. It is unreliable because it doesn't make
any guarantees
about what happens to the data after passing it on to
ethernet.
The IP address of the default route which the local
host will use
to direct packets to remote sites is usually added with
a command
in /etc/rc.local like
route add default nn.nn.nn.nn h
where nn.nn.nn.nn is the IP address of the default router
(each "nn" standing for a decimal number between 0 and 255)
and "h" is a "metric," or hop count, to that router.
How does IP know whether the destination address is
local or not?
Easy! IP learns which is the network portion of its
own address by
examining the all-important subnet mask. This mask is
usually set
with ifconfig in /etc/rc.local or found by sending an
ICMP "address mask request" (see Figure 2).
As commonly used,
an octet of all binary ones (decimal 255, hex ff) indicates a
network or subnet portion and all zeroes defines the host
portion of the address. Note that if a Class B network is
divided into subnets joined by IP routers, then a Class C-style
mask (255.255.255.0) would be used to indicate that packets
destined for other subnets must use a router.
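The local-or-remote test amounts to a bitwise AND, as this Python sketch shows. The addresses are hypothetical (a Class B network, 128.6.0.0, subnetted on the third octet), and next_hop() is an invented name; the point is only that IP compares the masked source and destination addresses.

```python
import ipaddress

def network_of(address, mask):
    # AND the address with the mask to keep only the network bits.
    return int(ipaddress.IPv4Address(address)) & int(ipaddress.IPv4Address(mask))

def next_hop(source, destination, mask, default_router):
    """Deliver directly if both share a network, else use the router."""
    if network_of(source, mask) == network_of(destination, mask):
        return destination
    return default_router

# Hypothetical Class B network 128.6.0.0, subnetted on the third octet.
MASK = "255.255.255.0"
ROUTER = "128.6.4.1"

same_subnet = next_hop("128.6.4.2", "128.6.4.99", MASK, ROUTER)
other_subnet = next_hop("128.6.4.2", "128.6.5.7", MASK, ROUTER)
```

With the Class C-style mask, a host on subnet 5 is reached through the router; with the natural Class B mask of 255.255.0.0, the same destination would wrongly look local.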
IP is used for packets destined for hosts on the local
network as
well as for routing data across networks. Although you
might think
you could get by with just TCP and ethernet for local
deliveries,
in the current design, TCP talks to logical ports and
IP handles the
internet addresses. The extra level of indirection gives
you much
more flexibility. ARP uses ethernet broadcasts to find
the hardware
(ethernet) address of the target machine given nothing
more than the
internet address. ARP is a separate protocol, used to
solve just this
one problem, and is not strictly part of the TCP or
IP stack. But
without ARP, it would be necessary to keep a table of
IP and ethernet
mappings for each host on the local network. This is
done in some
cases, but it is not the preferred method.
On the assumption that all machines know their own ethernet
address,
an ARP request is broadcast to every host on the local
net. In ethernet,
as in IP, a broadcast means a ones-filled address field,
normally
shown in hexadecimal notation as ff:ff:ff:ff:ff:ff.
Each host reads
the broadcast ethernet frame, sees from the "Type"
field
that this is an ARP request, and passes it to the ARP
module, which
then decides if it should reply. If ARP sees its own
host IP address,
it sends a message to the initial requestor and includes
its own ethernet
address as data. The ARP request packet includes the
IP and ethernet
addresses of the requestor to facilitate the reply.
Once the reply
is received, the host will hold the IP-to-ethernet mapping
in a cache
for a certain time until it is either reconfirmed or
flushed as being
out of date. On most machines, "arp -a" shows
the contents
of the ARP cache. Even if your network is connected
to subnets via
an IP router, you won't see the ethernet addresses of any
remote machines;
ethernet addresses are meaningless outside of a local
network.
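An ARP request of the kind just described can be laid out in Python as below. This is a sketch of the frame contents only, assuming the standard ARP packet layout for ethernet and IP; it does not touch a real network interface, and the sender's ethernet address and both IP addresses are invented for the example.

```python
import socket
import struct

BROADCAST = b"\xff" * 6          # ff:ff:ff:ff:ff:ff
ETHERTYPE_ARP = 0x0806           # ethernet "Type" value for ARP
ARP_REQUEST = 1

def mac_bytes(text):
    # Accepts tcpdump-style addresses such as 8:0:20:fc:b6:d5.
    return bytes(int(part, 16) for part in text.split(":"))

def arp_request(sender_mac, sender_ip, target_ip):
    ether = BROADCAST + mac_bytes(sender_mac) + struct.pack("!H", ETHERTYPE_ARP)
    # Hardware type 1 (ethernet), protocol 0x0800 (IP), address
    # lengths 6 and 4, then the operation code.
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, ARP_REQUEST)
    arp += mac_bytes(sender_mac) + socket.inet_aton(sender_ip)
    arp += b"\x00" * 6 + socket.inet_aton(target_ip)   # target MAC unknown
    return ether + arp

def should_reply(frame, my_ip):
    # A host answers only when the target IP field is its own.
    is_arp = frame[12:14] == struct.pack("!H", ETHERTYPE_ARP)
    return is_arp and frame[38:42] == socket.inet_aton(my_ip)

frame = arp_request("8:0:20:1:2:3", "10.0.0.1", "10.0.0.2")
```

Every host on the wire receives the frame because of the broadcast destination, but should_reply() is true only on the machine that owns the target IP address.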
Using tcpdump (public domain software available from
your favorite
archive site) or an equivalent such as etherfind (on
Suns),
you can see the ARP requests sent out each time a higher-level
protocol like IP needs to get a hardware address. Again, neither
the application
(telnet, in this case), nor TCP, nor IP involves itself
in
such details -- telnet and TCP wait on IP, which waits
for
a reply from ARP, which, for the telnet session shown
here,
replies with
oingo# ./tcpdump arp
14:30:03.372965 arp who-has boingo.littlenet.com tell \
oingo.littlenet.com
14:30:03.373425 arp reply boingo.littlenet.com is-at 8:0:20:fc:b6:d5
The reply comes immediately after the request, telling
oingo.littlenet.com that boingo's ethernet address is
8:0:20:fc:b6:d5. To try this yourself, run a utility
such as
tcpdump, and from another window (or another machine),
try
to telnet to a host not already listed in the arp cache.
To retrace briefly the path of the telnet packet: at
this point,
it has been passed from telnet to TCP to IP. IP used
the subnet
mask to verify that nic.ddn.mil was not a local address.
Then
IP asked ARP for the ethernet address of the default
router. Once
IP knows the ethernet address of the router, it passes
the packet
off to ethernet. The ethernet driver slaps its own header
in front
of the data, adds a checksum as a trailer to the packet,
and (finally!)
sends the packet out on the wire. Each machine on the
LAN sees the
packet and reads the ethernet address, but only one
machine, the router,
sees its own address in the destination field and looks
at the packet
further. (The rest of the machines throw the packet
away, never having
interrupted the operating system.) The router's ethernet
interface
reads the ethernet frame into local memory, verifies
data integrity
with the checksum trailer, strips off the ethernet header,
and copies
the remaining IP packet into the queue.
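The framing and checking steps can be approximated in Python. Treat this as a sketch under assumptions: zlib.crc32 uses the same CRC-32 polynomial as ethernet, but the preamble, minimum frame size, and exact bit ordering on the wire are ignored, and the sender's address is made up (the router's address simply reuses the one shown in the tcpdump output).

```python
import struct
import zlib

ETHERTYPE_IP = 0x0800

def build_frame(dst_mac, src_mac, payload):
    header = dst_mac + src_mac + struct.pack("!H", ETHERTYPE_IP)
    body = header + payload
    trailer = struct.pack("<I", zlib.crc32(body))   # 4-byte checksum
    return body + trailer

def receive(frame, my_mac):
    """Return the encapsulated packet, or None if the frame is dropped."""
    if frame[:6] != my_mac:
        return None                      # addressed to someone else
    body, trailer = frame[:-4], frame[-4:]
    if struct.pack("<I", zlib.crc32(body)) != trailer:
        return None                      # corrupted in transit
    return body[14:]                     # strip the 14-byte header

ROUTER = bytes.fromhex("080020fcb6d5")   # address borrowed from the dump
HOST = bytes.fromhex("080020010203")     # hypothetical sender
frame = build_frame(ROUTER, HOST, b"IP packet goes here")
corrupted = frame[:20] + bytes([frame[20] ^ 1]) + frame[21:]
```

Only the machine whose address matches the destination field gets the payload back, and a single flipped bit is enough for the checksum comparison to reject the frame.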
The next routing decision is often the simplest. For
many corporate
LANs, there is only one route for leaving the subnet.
From the telnet
host, the packet will travel to that single router.
The IP layer in
the router sees from the IP address in the header that the
packet is indeed destined for a machine it knows how to reach,
so it continues processing rather than discarding the packet or
generating an ICMP message telling the host to use some other
router.
Routers
keep track of which networks they can reach by talking
to their neighbors,
using protocols like the Routing Information Protocol (RIP, on
relatively simple networks) or Open Shortest Path First (OSPF,
for larger networks),
or any of
several others.
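How a router might learn reachability from a neighbor can be sketched in Python in the style of a distance-vector protocol such as RIP. This is a simplified model, not RIP itself: timers, split horizon, and the message format are all omitted, and the router name and network numbers are hypothetical. The one RIP detail kept is that a metric of 16 means "unreachable."

```python
INFINITY = 16   # RIP treats a metric of 16 as unreachable

def merge_advertisement(table, neighbor, advertised):
    """Update table in place from one neighbor's route advertisement.

    table maps network -> (metric, next_hop); advertised maps
    network -> metric as seen by the neighbor.
    """
    for network, metric in advertised.items():
        cost = min(metric + 1, INFINITY)       # one more hop via neighbor
        current = table.get(network)
        if current is None:
            if cost < INFINITY:
                table[network] = (cost, neighbor)
        elif cost < current[0] or current[1] == neighbor:
            # Better route, or news from the router already in use.
            table[network] = (cost, neighbor)
    return table

def can_reach(table, network):
    entry = table.get(network)
    return entry is not None and entry[0] < INFINITY

routes = {"128.6.4.0": (1, "direct")}
merge_advertisement(routes, "router-b", {"192.112.36.0": 3, "128.6.4.0": 2})
```

After the merge, the router knows it can reach 192.112.36.0 in four hops through router-b, while its shorter direct route to 128.6.4.0 is left alone.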
The packet is now queued for injection onto the serial
line. When
its turn comes, it is passed on to the next router,
usually via a
serial line protocol such as SDLC or HDLC.
From there (assuming a very simple network), the packet
will be passed
over a serial line to the router of the Internet service
provider
-- an organization such as BARRNet, SURAnet, or Alternet.
Once there, any number of paths through the Internet are
possible, depending on which routes are least congested, least
expensive to reach, and so on.
Using traceroute (another excellent public domain program),
you can follow the route taken by a packet as it forges
its way across
the Internet, as in Figure 3 (in which I have changed
some of the
addresses for reasons of privacy).
You can see that packets do not necessarily follow a
direct path to
their destination. Not only that, no single packet is
guaranteed to
follow the same path as its predecessor.
Again, it is between these routers that protocols like
RIP, IGP, SPF,
and BGP are used, but these will not be discussed here.
For the purposes
of this article, the Internet as it exists beyond the
router that
sends the packet out can remain a black box -- better
yet, a black
cloud. This is not a particularly harmful assumption.
So long as you
have a good understanding of the local protocols and
how they interact
with your local routers, you can yell with slightly
more confidence
at whoever provides your link further downline.
The Trailer
In the case of the sample telnet packet, the last router
-- the
one directly attached to the ddn.mil network -- sends
the
packet on to its ethernet driver, possibly having first
made an ARP
request in order to discover nic's (nic.ddn.mil's) hardware
address. The ethernet driver adds its own header (just
as on the local
side, described earlier) and sends the packet out onto
the ethernet
wire. Because ethernet is a broadcast medium, all machines
on the
ddn.mil network (or likely a subnet thereof) will see
the packet
but only host "nic" will save it and pass
it to IP.
Depending on what the ethernet driver finds in the header's
"Type" field, it might instead pass the packet to some other
protocol, such as the Xerox Network Systems (XNS) protocols
used by 3Com, Novell's IPX, or any of several dozen others.
The IP module also verifies the integrity of its own
header, using
the checksum field produced on the originating side,
and after checking
the "Protocol" field, passes the packet on
to TCP.
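The header check can be written in a few lines of Python. The routine below is the standard Internet checksum (a ones'-complement sum of 16-bit words); the 20-byte header is an arbitrary sample chosen for illustration, with addresses 192.168.0.1 and 192.168.0.199, and is not taken from the telnet session in this article.

```python
def internet_checksum(data):
    """Ones'-complement sum of 16-bit words, as used by IP and TCP."""
    if len(data) % 2:
        data += b"\x00"                     # pad odd-length input
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:                      # fold carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

# A sample 20-byte IP header with its checksum field (bytes 10-11) zeroed.
header = bytes.fromhex("45000073000040004011" "0000" "c0a80001c0a800c7")
checksum = internet_checksum(header)
filled = header[:10] + checksum.to_bytes(2, "big") + header[12:]

# The receiver runs the same sum over the whole header, checksum
# field included; an intact header yields zero.
verified = internet_checksum(filled) == 0
```

The sender computes the value with the checksum field set to zero and stores it; any receiver along the way can repeat the sum over the filled-in header and expect zero.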
TCP, project manager that it is, acknowledges the packet
and tries
to open a connection to telnet's well-known port number.
This
awakens the telnet daemon, telnetd, which will then
either fork a copy of itself to manage a new telnet
session
or, for any of a number of reasons, refuse the connection.
A connection
might be refused if the telnet service is not recognized
by
inetd (something of a grandfather daemon), or if the
router
on the far end disallows packets from certain addresses.
It is not uncommon to see "Connected to [host name]"
on the
screen and then never get a login prompt. telnet doesn't
have
much to do with that first "Connected" message
-- all the
message means is that TCP was able to send a single
packet and receive
a single packet in reply. So if you are left hanging
after the "Connected"
message, it is likely that telnetd or inetd on the remote
end is having a problem.
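This behavior is easy to reproduce on a single machine. The Python sketch below opens a listening socket on the loopback interface that never accepts the connection or sends anything, then connects to it. It assumes a typical BSD-style socket implementation, where the kernel completes the TCP handshake on behalf of a listening socket; the function name and the half-second timeout are arbitrary choices for the example.

```python
import socket

def connect_without_accept(timeout=0.5):
    """Connect to a listener that never accepts or sends anything."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))       # any free port on loopback
    server.listen(1)                    # kernel completes handshakes
    client = None
    try:
        # Succeeds once TCP's handshake is done; no server code runs.
        client = socket.create_connection(server.getsockname(),
                                          timeout=timeout)
        connected = True
        try:
            got_data = bool(client.recv(1))   # wait for a "login prompt"
        except socket.timeout:
            got_data = False                  # connected, but silence
        return connected, got_data
    finally:
        if client is not None:
            client.close()
        server.close()

result = connect_without_accept()
```

The client reports a connection even though no program on the "server" side ever looked at it, which is the same situation as a "Connected" message followed by no login prompt.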
The End of the Journey
Sometimes, when I dial into a computer at the office
and from there
telnet into a machine down the road, I am amazed at
the nearly
instantaneous response time -- especially when I consider
the route
my single keystroke has taken before it can be echoed
on my screen.
In my situation, our East Coast LAN's connection to
the Internet is
via a 56 Kbps leased line to another LAN in Mountain
View, California.
When I dial in from home, I use a VT-100 terminal emulation
program.
Ignoring the whole issue of how many hundreds of CPU
cycles the machine
at my desk uses just to pass the keystroke to the emulation
program,
assume, for now, my program sends the character I have
typed directly
(more or less) to the asynchronous port and on to the
modem. The modem
sends the character at 9600 bps to a terminal server
(asynchronous
gateway) where I have previously opened a telnet connection
(in character mode) to a UNIX machine on the same LAN.
The terminal
server encapsulates the character in an ethernet frame
and passes
it across the LAN to the host. The host passes it up
the protocol
stack -- ethernet to IP to TCP to telnet daemon -- and
from there it goes "across" the operating
system to the local
half of another telnet connection, which I opened with
the
machine down the street. This telnet program sends the
character
back down the stack and onto the same LAN to reach our
router. The
router sends it to another router in Mountain View,
where it is re-encapsulated
within an ethernet frame and put back on the ethernet
(in Mountain
View). From there, it is picked up by another router
which then passes
the lone keystroke to a router owned by BARRNet.
Typically, on its way back across the country to the
machine down
the road from me, the packet will pass through no fewer than half a
than half a
dozen routers before reaching yet another ethernet LAN,
which passes
the keystroke to telnetd running on that machine. So
far, four
telnets are involved: one on the terminal server into
which
I originally dialed, one telnetd server on the local
machine
at my office, another telnet client on the same local
machine,
and another telnetd at the machine down the road. On
that machine,
telnetd echoes the character, which passes back down
the protocol
stack, onto the ethernet, over to the router... In other
words, the
character then works its way across the United States
twice as it
traces the same jagged "triangle" back to
my office machine.
And from there it must travel back across the ethernet
to the terminal
server, out the serial port and across the phone line
to reach my
screen at home. And, if it is after-hours or during
the weekend when
congestion is not bad, the single keystroke makes the
entire round
trip so fast it appears to my eye to be nearly instantaneous.
Most of this traffic could have been avoided if I had
used telnet
in line-buffered mode so that each line would be sent
as one packet
only after I hit RETURN, but this makes many applications
difficult
or impossible to use. As a result, when telnet is used
as a
remote terminal interface, it almost always defaults
to character
mode, thus causing hundreds of "tinygrams"
to fly across the
network, spawning context switches here and there along
the way.
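A little arithmetic shows what those tinygrams cost. The Python sketch below assumes minimum-size TCP and IP headers (20 bytes each, no options), the 14-byte ethernet header and 4-byte trailer, and ethernet's 46-byte minimum payload; it counts only the keystrokes traveling in one direction on one ethernet segment and ignores the echoes and acknowledgments coming back. The 40-character line is a hypothetical example.

```python
ETHER_HEADER, ETHER_TRAILER = 14, 4     # bytes
IP_HEADER, TCP_HEADER = 20, 20          # minimum sizes, no options
ETHER_MIN_PAYLOAD = 46                  # shorter payloads are padded

def frame_size(data_bytes):
    payload = IP_HEADER + TCP_HEADER + data_bytes
    return ETHER_HEADER + max(payload, ETHER_MIN_PAYLOAD) + ETHER_TRAILER

def bytes_on_wire(line_length, character_mode):
    if character_mode:
        return line_length * frame_size(1)   # one frame per keystroke
    return frame_size(line_length)           # one frame per line

LINE = 40   # a hypothetical 40-character command line
char_cost = bytes_on_wire(LINE, character_mode=True)
line_cost = bytes_on_wire(LINE, character_mode=False)
```

Under these assumptions, each keystroke occupies a full minimum-size frame, so typing the line a character at a time puts many times more bytes on the ethernet than sending it as a single packet.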
Further Reading
Comer, Douglas. Internetworking with TCP/IP:
Principles, Protocols, and Architecture. Englewood Cliffs,
NJ:
Prentice Hall, 1991. ISBN 0-13-468505-9.
This is Volume I. You don't really need Volume II unless
you plan to write networking code. This excellent book,
along with
the second volume, is considered by many to be the TCP/IP
bible.
Hedrick, Charles. Introduction to the Internet
Protocols. CSFG, Rutgers University.
An excellent overview and best of all, it's free. You
can get it via anonymous ftp from several dozen archive
sites
around the country, usually under the name of tcp-ip-intro.doc
for the ASCII version and tcp-ip-intro.ps for the PostScript
version.
Malamud, Carl. Analyzing Sun Networks. Van
Nostrand Reinhold, 1992. ISBN 0-442-00366-8.
The first half presents a very good explanation of
the TCP/IP protocol suite; only in its second half does
it become
somewhat Sun-specific.
Leffler, McKusick, Karels, Quarterman. The Design
and Implementation of the 4.3BSD UNIX Operating System.
Addison-Wesley, 1990. ISBN 0-201-06196-1.
A good book to have around, particularly if you administer
BSD-based machines. Some consider this the Berkeley
UNIX bible, though
I wouldn't go quite that far.
Request for Comments: For anyone really interested
in the details, the RFCs are a good place to look. RFCs 854
and 855 describe telnet. RFC 793 describes TCP (save this one for last
-- it's fairly
complicated). rfc-index lists all the RFCs. These are
available
for anonymous ftp from several sites, but nic.ddn.mil
is a good one.
About the Author
Dorian Deane works in the network configuration group
for House
Information Systems, supporting the U.S. House of Representatives.
You may contact him at ddeane@oxygen.house.gov.