Cover V02, I01
Article
Figure 1
Figure 2
Figure 3
Sidebar 1
Sidebar 2

jan93.tar


The Short Life and Hard Times of An Internet Packet

Dorian Deane

The Header

As more and more LAN users beg to be connected to the rest of the world, it becomes increasingly important for systems administrators, most of whom are probably already familiar with the TCP/IP protocol suite, to understand how that suite operates in wide area networks. To show the relationship between TCP/IP and internetwork communications, this article follows the life of a single internet packet from its construction, to long voyage, to final arrival at its destination. Where possible along the way, I have tried to add practical information about telnet, a TCP-based application most commonly used to provide a terminal-like interface on remote hosts.

The Synchronization

Though data communications protocols are often presented in an operating system-neutral and hardware-neutral manner, I prefer to stick with UNIX and the ubiquitous IEEE 802.3 ethernet, sacrificing the needs of the few for the good of the many. I occasionally refer to /etc/rc.local, which is found in most BSD-based systems, so if you run a System V shop, you'll have to make the appropriate translations to your own rc files.

To understand how the TCP/IP protocol stack translates to the construction of an internet packet, it's useful to think of the process as protocols encapsulated within protocols (see Figure 1). Each protocol adds its own header to the protocol before it -- ethernet adds a header to an IP packet, IP adds a header to the TCP packet it receives, and TCP adds a header to the "data," which may really be data from a user's perspective or may be yet another protocol above that data. As a case in point, ethernet sees the IP packet as nothing more than data. There is an exception to this rule, however: each protocol layer has to know which protocol it has just encapsulated, so that when its counterpart at the other end has removed its own header, it knows where to send the remaining data. The ethernet header has a "Type" field for this purpose and IP defines a "Protocol" field. TCP uses ports, which I'll discuss later.

Keep in mind that, despite such common usage as "TCP does this and IP does that," such references should be protocol definitions rather than descriptions of actual software or hardware. We are, for the most part, only addressing protocol definitions here -- not protocol implementations. It would therefore be more correct to say, "TCP defines this functionality and IP defines that functionality." For reasons of convenience, I've retained the common usage.

The Data

This article traces the very first TCP packet produced by telnet as it tests the willingness of a remote host to respond. The first TCP "packet" does not carry any real telnet data -- rather, telnet merely asks a lower-level library routine to open up a connection and then waits for a reply from TCP before beginning its own negotiations. Remember that telnet is an application like rlogin (another program to provide remote login service) or ftp (the file transfer program, not the protocol), and sits logically on top of TCP. (If you have a UNIX machine but no network, you can see what telnet looks like by telneting to your own machine through the loopback interface with "telnet 127.0.0.1" -- Unless your machine is non-standard, you will see a login prompt on your local machine.)

When a user enters "telnet nic.ddn.mil," a message like "Trying 192.112.36.5 ..." will appear on the screen and then, with luck, "Connected to nic.ddn.mil." The IP number (192.112.36.5) means that telnet was able to find the internet address for nic.ddn.mil and "Connected" means that your host has opened up a TCP connection with nic, but the remote telnet server (telnetd) has not yet responded. If you had instead gotten a message like "nic.ddn.mil: unknown host," it would not necessarily mean the host doesn't exist, it might only mean that telnet was unable to find the IP address in the name databases your machine uses. A name database is usually a flat file like /etc/hosts, or a name service such as NIS (Network Information Services, which used to be Sun's Yellow Pages) or DNS (the Domain Name System). Thus, an "unknown host" message may mean that you have an incomplete /etc/hosts file, a problem with a name server, a problem with your local IP router, a recalcitrant remote router, or any combination thereof.

In the Internet protocol stack, telnet is the application program, and as such, it doesn't care much about the address you give it. Telnet just looks for the IP number in the appropriate name translation database, and if it can verify that there's an existing IP address for the name, it passes its initial query to TCP. Most versions of telnet accept either a logical name or the IP address itself.

TCP, on the other hand, is not much concerned with IP numbers, since it's not responsible for routing. TCP keeps track of port numbers, and the two end points which comprise the telnet session are uniquely identified by a port and host number pair. If you look at /etc/services, you will see various "well-known" TCP port numbers. These ports are not ports in the normal sense of the word; they do not refer to absolute I/O addresses or any other sort of physical interface. Rather, TCP uses the numbers in /etc/services to identify the upper-level protocol, such as Simple Mail Transfer Protocol (25) and Network News Transfer Protocol (119). When a telnet client wants to open a connection with the telnet daemon (i.e., telnet server), it usually sends its initial query to the port it finds in /etc/services, which should be 23 (to talk directly to the SMTP server instead, try "telnet hostname 25," and, so you don't get stuck there, type "help" and a carriage return to see how to proceed).

Once the message, "Trying 192.112.36.5 ..." is printed on the screen, telnet blocks, waiting for a reply from TCP. While TCP was designed to accept data at this point, most versions of telnet cautiously ask TCP to establish a connection before beginning their option negotiations or presenting a login prompt. This first TCP packet, then, consists only of the header. The header contains the host IP number and TCP port number pair, as well as fields that allow TCP to keep track of which packets have been sent, which packets have been acknowledged, and which have not. A checksum added by TCP on the originating host allows for detection of corrupted data. TCP also reassembles packets that may have been fragmented by intermediate routers on the Internet and sets timers in order to keep track of potentially lost packets or to stop sending in the case of an overburdened network. In addition, TCP keeps track of packet ordering -- not a small task given that most packets inserted into the Internet go through multiple routers by no consistent path and therefore can easily arrive out of sequence. This is an area of great complexity and TCP is, without doubt, the most complicated layer in the protocol stack. Among other things, TCP is responsible for making applications such as telnet believe they have their own private connection to a remote host, and this requires some real sleight-of-hand when dealing with packet-switched networks such as the Internet, where even the packet fragments can arrive out of order. For the purposes of this article, however, it's useful just to think of TCP as the manager of a virtual circuit -- that circuit being, for now, a telnet connection between two hosts. In more formal terms, TCP is responsible for a reliable, end-to-end, full duplex, virtual circuit between two hosts.

After having added its header, TCP passes the packet to IP. Both telnet and TCP can now sit back and wait for IP to come back with a reply. TCP never talks directly to remote hosts; instead, TCP talks to IP. In this scenario using UNIX and ethernet, however, IP doesn't talk to remote hosts either; technically, IP talks to ethernet (and ARP).

Unlike TCP, which tries to manage everything (keeping a finger on each packet not yet acknowledged, verifying header and data integrity with checksums, resending packets when necessary, etc.), the IP module on your host is happy as long as things appear to be working, regardless of reality. After deciding that the packet is destined for a site beyond the local network, IP adds its own header and sends the packet to ethernet, telling it the ethernet address of the local router. IP doesn't care about the packet order. IP doesn't even care if the packet got there.

Again in formal terms, IP provides a connectionless, unreliable, best-effort delivery system. It is connectionless because it doesn't try to present the layer above it with the appearance of an end-to-end circuit the way TCP does. It is unreliable because it doesn't make any guarantees about what happens to the data after passing it on to ethernet.

The IP address of the default route which the local host will use to direct packets to remote sites is usually added with a command in /etc/rc.local like

route add default nn.nn.nn.nn h

where the n's represent an IP address (therefore each "nn" represents some number between 1 and 254, inclusive) and "h" represents a "metric" or hop count to the default router.

How does IP know whether the destination address is local or not? Easy! IP learns which is the network portion of its own address by examining the all-important subnet mask. This mask is usually set with ifconfig in /etc/rc.local or found by sending an ICMP "address mask request" (see Figure 2). As commonly used, an octet of all binary ones (hex 255) indicates a network or subnet portion and all zeroes defines the host portion of the address. Note that if a Class B subnet is connected via an IP router, then a Class C address mask would be used to indicate that packets destined for other subnets must use a router.

IP is used for packets destined for hosts on the local network as well as for routing data across networks. Although you might think you could get by with just TCP and ethernet for local deliveries, in the current design, TCP talks to logical ports and IP handles the internet addresses. The extra level of indirection gives you much more flexibility. ARP uses ethernet broadcasts to find the hardware (ethernet) address of the target machine given nothing more than the internet address. ARP is a separate protocol, used to solve just this one problem, and is not strictly part of the TCP or IP stack. But without ARP, it would be necessary to keep a table of IP and ethernet mappings for each host on the local network. This is done in some cases, but it is not the preferred method.

On the assumption that all machines know their own ethernet address, an ARP request is broadcast to every host on the local net. In ethernet, as in IP, a broadcast means a ones-filled address field, normally shown in hexadecimal notation as ff:ff:ff:ff:ff:ff. Each host reads the broadcast ethernet frame, sees from the "Type" field that this is an ARP request, and passes it to the ARP module, which then decides if it should reply. If ARP sees its own host IP address, it sends a message to the initial requestor and includes its own ethernet address as data. The ARP request packet includes the IP and ethernet addresses of the requestor to facilitate the reply. Once the reply is received, the host will hold the IP-to-ethernet mapping in a cache for a certain time until it is either reconfirmed or flushed as being out of date. On most machines, "arp -a" shows the contents of the ARP cache. Even if your network is connected to subnets via an IP router, you don't see ethernet address of any remote machines; ethernet addresses are meaningless outside of a local network.

Using tcpdump (public domain software available from your favorite archive site) or an equivalent such as etherfind (on Suns), you can see the ARP requests sent out each time a higher-level command like IP needs to get a hardware address. Again, neither the application (telnet, in this case), nor TCP, nor IP involves itself in such details -- telnet and TCP wait on IP, which waits for a reply from ARP, which, for the telnet session shown here, replies with

oingo# ./tcpdump arp
14:30:03.372965 arp who-has boingo.littlenet.com tell \

oingo.littlenet.com
14:30:03.373425 arp reply boingo.littlenet.com is-at 8:0:20:fc:b6:d5

The reply comes immediately after the request, telling oingo.littlenet.com that boingo's ethernet address is 8:0:20:fc:b6:d5. To try this yourself, run a utility such as tcpdump, and from another window (or another machine), try to telnet to a host not already listed in the arp cache.

To retrace briefly the path of the telnet packet: at this point, it has been passed from telnet to TCP to IP. IP used the subnet mask to verify that nic.ddn.mil was not a local address. Then IP asked ARP for the ethernet address of the default router. Once IP knows the ethernet address of the router, it passes the packet off to ethernet. The ethernet driver slaps its own header in front of the data, adds a checksum as a trailer to the packet, and (finally!) sends the packet out on the wire. Each machine on the LAN sees the packet and reads the ethernet address, but only one machine, the router, sees its own address in the destination field and looks at the packet further. (The rest of the machines throw the packet away, never having interrupted the operating system.) The router's ethernet interface reads the ethernet frame into local memory, verifies data integrity with the checksum trailer, strips off the ethernet address, and copies the remaining IP packet into the queue.

The next routing decision is often the simplest. For many corporate LANs, there is only one route for leaving the subnet. From the telnet host, the packet will travel to that single router. The IP layer in the router sees from the IP address in the header that the packet is indeed a machine it knows how to reach so it will continue with the processing rather than discarding the packet or generating an ICMP message telling the host to use some other router. Routers keep track of which networks they can reach by talking to their neighbors, using protocols like Routing Information Protocol (on relatively simple networks) or Shortest Path First (for larger networks), or any of several others.

The packet is now queued for injection onto the serial line. When its turn comes, it is passed on to the next router, usually via a serial line protocol such as SDLC or HDLC.

From there (assuming a very simple network), the packet will be passed over a serial line to the router of the Internet service provider -- an organization such as BARRnet, SURAnet, or Alternet. Once there, any number of paths through the Internet are possible, depending on which routes are least congested, most expensive to reach, and so on.

Using traceroute (another excellent public domain program), you can follow the route taken by a packet as it forges its way across the internet, as in Figure 3 (in which I have changed some of the addresses for reasons of privacy).

You can see that packets do not necessarily follow a direct path to their destination. Not only that, no single packet is guaranteed to follow the same path as its predecessor.

Again, it is between these routers that protocols like RIP, IGP, SPF, and BGP are used, but these will not be discussed here. For the purposes of this article, the Internet as it exists beyond the router that sends the packet out can remain a black box -- better yet, a black cloud. This is not a particularly harmful assumption. So long as you have a good understanding of the local protocols and how they interact with your local routers, you can yell with slightly more confidence at whoever provides your link further downline.

The Trailer

In the case of the sample telnet packet, the last router -- the one directly attached to the ddn.mil network -- sends the packet on to its ethernet driver, possibly having first made an ARP request in order to discover nic's (nic.ddn.mil's) hardware address. The ethernet driver adds its own header (just as on the local side, described earlier) and sends the packet out onto the ethernet wire. Because ethernet is a broadcast medium, all machines on the ddn.mil network (or likely a subnet thereof) will see the packet but only host "nic" will save it and pass it to IP. Depending on what it finds in the ethernet header's "Type" field, it might pass the packet to some other protocol, such as Xerox Network Standard used by 3Com, Novell's IPX, or any of several dozen others.

The IP module also verifies the integrity of its own header, using the checksum field produced on the originating side, and after checking the "Protocol" field, passes the packet on to TCP.

TCP, project manager that it is, acknowledges the packet and tries to open a connection to telnet's well-known port number. This awakens the telnet daemon, telnetd, which will then either fork a copy of itself to manage a new telnet session or, for any of a number of reasons, refuse the connection. A connection might be refused if the telnet service is not recognized by inetd (something of a grandfather daemon), or if the router on the far end disallows packets from certain addresses.

It is not uncommon to see "Connected to [host name]" on the screen and then never get a login prompt. telnet doesn't have much to do with that first "Connected" message -- all the message means is that TCP was able to send a single packet and receive a single packet in reply. So if you are left hanging after the "Connected" message, it is likely that telnetd or inetd on the remote end is having a problem.

The End of the Journey

Sometimes, when I dial into a computer at the office and from there telnet into a machine down the road, I am amazed at the nearly instantaneous response time -- especially when I consider the route my single keystroke has taken before it can be echoed on my screen.

In my situation, our East Coast LAN's connection to the Internet is via a 56 Kbps leased line to another LAN in Mountain View, California. When I dial in from home, I use a VT-100 terminal emulation program. Ignoring the whole issue of how many hundreds of CPU cycles the machine at my desk uses just to pass the keystroke to the emulation program, assume, for now, my program sends the character I have typed directly (more or less) to the asynchronous port and on to the modem. The modem sends the character at 9600 bps to a terminal server (asynchronous gateway) where I have previously opened a telnet connection (in character mode) to a UNIX machine on the same LAN. The terminal server encapsulates the character in an ethernet frame and passes it across the LAN to the host. The host passes it up the protocol stack -- ethernet to IP to TCP to telnet daemon -- and from there it goes "across" the operating system to the local half of another telnet connection, which I opened with the machine down the street. This telnet program sends the character back down the stack and onto the same LAN to reach our router. The router sends it to another router in Mountain View, where it is re-encapsulated within an ethernet frame and put back on the ethernet (in Mountain View). From there, it is picked up by another router which then passes the lone keystroke to a router owned by BARRNet.

Typically, on its way back across the country to the machine down the road from me, the packet will pass through no less than half a dozen routers before reaching yet another ethernet LAN, which passes the keystroke to telnetd running on that machine. So far, four telnets are involved: one on the terminal server into which I originally dialed, one telnetd server on the local machine at my office, another telnet client on the same local machine, and another telnetd at the machine down the road. On that machine, telnetd echoes the character, which passes back down the protocol stack, onto the ethernet, over to the router... In other words, the character then works its way across the United States twice as it traces the same jagged "triangle" back to my office machine. And from there it must travel back across the ethernet to the terminal server, out the serial port and across the phone line to reach my screen at home. And, if it is after-hours or during the weekend when congestion is not bad, the single keystroke makes the entire round trip so fast it appears to my eye to be nearly instantaneous.

Most of this traffic could have been avoided if I had used telnet in line-buffered mode so that each line would be sent as one packet only after I hit RETURN, but this makes many applications difficult or impossible to use. As a result, when telnet is used as a remote terminal interface, it almost always defaults to character mode, thus causing hundreds of "tinygrams" to fly across the network, spawning context switches here and there along the way.

Further Reading

Comer, Douglas. Internetworking with TCP/IP: Principles, Protocols, and Architecture. Englewood Cliffs, NJ: Prentice Hall, 1991. ISBN 0-13-468505-9. This is Volume I. You don't really need Volume II unless you plan to write networking code. This excellent book, along with the second volume, is considered by many to be the TCP/IP bible.

Hedrick, Charles. Introduction to the Internet Protocols. CSFG, Rutgers University. An excellent overview and best of all, it's free. You can get it via anonymous ftp from several dozen archive sites around the country, usually under the name of tcp-ip-intro.doc for the ASCII version and tcp-ip-intro.ps for the Postscript version.

Malamud, Carl. Analyzing Sun Networks. Van Nostrand Reinhold, 1992. ISBN 0-442-00366-8. The first half presents a very good explanation of the TCP/IP protocol suite; only in its second half does it become somewhat Sun-specific.

Leffler, McKusick, Karels, Quarterman. Design and Implementation of the 4.3BSD Unix Operating System. Addison Wesley, 1990. ISBN 0-201-06196-1. A good book to have around, particularly if you administer BSD-based machines. Some consider this the Berkeley UNIX bible, though I wouldn't go quite that far.

Request for Comments: For anyone really interested in the details, the RFCs are a good place to look. RFC 854/5 describes telnet. RFC 793 describes TCP (save this one for last -- it's fairly complicated). rfc-index lists all the RFCs. These are available for anonymous ftp from several sites, but nic.ddn.mil is a good one.

About the Author

Dorian Deane works in the network configuration group for House Information Systems, supporting the U.S. House of Representatives. You may contact him at ddeane@oxygen.house.gov.