Implementing Voice and Data Convergence

Gilbert Held

In my previous article, “The Emergence of Convergence” (Sys Admin, April 2000), I examined the rationale for integrating voice and data over a common infrastructure. I also noted that there are certain characteristics associated with human speech that cause the transmission of digitized voice to be dependent upon the end-to-end delay or latency associated with the communications infrastructure. In this article, I will focus on a number of practical implementation considerations that can make the integration of voice and data over a common infrastructure a reality. Each of the implementation considerations covered may only reduce end-to-end latency delays by a few to tens of milliseconds. However, the savings add up, and (to paraphrase a quote I like to use) a few milliseconds saved here and there can make convergence a reality!

Delay Considerations

As noted previously, voice begins to become awkward to the recipient when cumulative delays exceed approximately a quarter (250 milliseconds) of a second. When the cumulative delay reaches that value, it is common for one person to begin speaking because he believes the other person has completed his or her portion of a conversation. The resulting conversation is awkward, and both parties may back off in a manner similar to the effect of an Ethernet collision. Unfortunately, the awkward conversation will continue unless the parties begin to employ Citizen Band (CB) strategy and use the word “over”. Because it is unlikely we will perform business conversations in this manner, a preferable solution is to reduce the delay time. Thus, in this article I illustrate methods you can use to minimize delays associated with packets transporting digitized voice.

Data Flow

Understanding the methods that can be used to minimize packet delays requires a frame of reference. Because many people are interested in moving voice over TCP/IP-based networks (including the Internet), I will focus on this networking environment. Figure 1 illustrates the use of the Internet by a company to connect two geographically separated offices. In this example, I assume that at each location the company is using routers with two voice ports that, when connected to the PBX at each location, enable two voice calls to flow along with LAN traffic. Because the Internet is a TCP/IP-based network, the methods discussed will be applicable to transmission via the Internet as well as on private TCP/IP-based networks (or intranets).

Employ Static Routing on Edge Devices

One of the first router configurations you should consider is the routing protocol. For a router functioning as an edge device, you should use a static route that points to the Internet Service Provider's (ISP's) network entry router (if transmitting via the Internet). If your organization operates a private TCP/IP network and has a branch office connected to a router serving several offices from a central location, you should also employ static routing at the branch office. For both situations, the use of static routing eliminates periodic router table updates during which the transmission of data is suspended. Although routing table updates have minimal effect upon data transmission, the periodic delays when routing tables are transferred do add to the latency or delay of packets. Because point-to-point transmission through the Internet or a private TCP/IP network (similar to the network shown in Figure 1) results in traffic passing through two edge devices, employing static routing at both ends doubles the elimination of unnecessary routing table updates and associated delay.

Access List Statement Placement

Most organizations that use routers, especially for Internet connectivity, use access lists as a first line of defense for their internal network. An access list is a series of permit or deny statements that govern the flow of packets. By applying an access list to a router port in an inbound or outbound direction, we form a security policy.

Access lists are processed sequentially, top down. One of the most common statements at the beginning of an access list is an anti-spoofing statement. Such statements comprise a series of deny statements constructed to filter out packets with source addresses commonly used by hackers. Examples of spoofed source addresses include the three blocks of IP addresses for private Internets defined in RFC 1918, as well as the network addresses of the organization operating the router. After anti-spoofing statements, an organization will typically include a series of permit statements to allow packets containing a specific protocol or specific source and destination IP address. Because every access list contains an implicit “deny all” at the end of the list, the previously mentioned access list structure first blocks packets with suspicious source IP addresses, allows packets whose composition is defined, and then filters everything else. A typical access list can consist of 20 to 30 or more statements, with the contents of each packet compared to the parameters of each statement until either a match occurs or the implicit deny all is reached.

The matching of packet data against the contents of a router's access list introduces several delays. First, the packet must be placed in a buffer as its contents are compared against the parameters of the statements in the access list. Thus, a minimum amount of latency is introduced to packet flow since the router must store the packet before determining whether it can forward the packet through an interface. Then during the comparison process, the router must check one or more parameters associated with each access list statement until a match occurs or the end of the list is reached. This introduces a variable packet delay, which further affects the latency.

Although the latest routers are based upon modern high-performance microprocessors, not all organizations can afford to replace every router every few years. This means that the number of comparisons the router makes as it compares fields of the packet header against access list parameters can result in a substantial delay. Because packets containing digitized voice are non-executable and are only reconverted to analog by a voice gateway, there is little advantage to checking such packets for protocol spoofing. Thus, by placing applicable permissions for packets containing digitized voice in access statements at the beginning of a router's access list, you can minimize the processing time.

Authentication and Encryption

In addition to access lists, many organizations employ authentication and encryption to enhance data security. Although it is possible to authenticate the person whose conversation is being transported by packets carrying digitized voice in a manner similar to methods used to authenticate data, is it really necessary? If you are attempting to transport voice over a corporate network and employees commonly talk with one another, it may be possible to use their existing knowledge of one another's voices as an authentication method. While authentication does not effect packet latency, it effects the time required to initiate a digitized call. Thus, eliminating authentication will make the call setup process faster.

In regard to encryption, there are two methods commonly used for packet processing. The first involves encrypting the information field of the packet and re-computing the cycle redundancy check (CRC) in the packet trailer. This technique adds some delay both for the encryption algorithm and processing time associated with the re-computation of the CRC. Additionally, because the encryption hardware or software must look into the packet to locate the information field and CRC field, additional time is required, which further delays the flow of packets.

The second method of packet encryption stores the packet, extracts its header, and forms a new packet by encrypting the old packet and adding the old header to the newly formed packet, which also contains a newly formed CRC. While this encryption technique is slightly easier to perform, it also results in a variable delay based upon the length of the packet. Thus, unless absolutely necessary, you should consider transmitting digitized voice without encrypting the packets containing voice. Even if your organization periodically uses a data network to transport voice, you should note that intercepting packets transmitting digitized voice does not necessarily provide easy to understand information. There are more than 20 voice coders in use, ranging from International Telecommunications Union (ITU) standardized codecs to proprietary coders. Thus, even if a digitized conversation is intercepted, it could prove difficult for a third party to rapidly decipher the conversation. Obviously, if your organization is transporting highly confidential information via voice conversations, the elimination of encryption would not be appropriate.

Codec Selection

One method for reducing packet transit delay is selecting a different voice coder. Generally, there are three metrics involved in the selection of a voice coder: its mean optimum score (MOS), its bandwidth, and its algorithm coding delay.

The MOS is a subjective rating system that grades the quality of reproduced speech generated by different voice coders on a scale from 0 to 5, with 5 being the best. This rating system is subjective because scoring requires one person to talk and another to listen, and humans have different hearing acuity. Despite the subjective scoring, coders with high bandwidth, such as Pulse Code Modulation's (PCM's) 64-Kbps data rate, have a higher MOS than coders operating at a lower data rate. The third metric governing codecs, coding algorithm delay, is inversely proportional to bandwidth and proportional to its MOS score. That is, the lower the bandwidth of the coder, the longer the time required to encode a voice sample into a data packet, while the lower its MOS score. So, if we select a low bandwidth coder in an attempt to support more simultaneous calls over a common communications circuit, we will lower the quality of reconstructed voice as well as increase end-to-end delay or latency.

Commonly used codecs have coding algorithm delays ranging from under 1 ms to approximately 30 ms. For example, PCM and Adaptive Differential PCM (ADPCM), both of which represent toll quality reproduced voice, have an algorithm delay under 1 ms and an MOS typically above 4.2. In comparison, a dual-rate codec, referred to as G.723.1, that operates at either 6.1 or 5.3 Kbps has an MOS of 3.8 and an algorithm delay of 30 ms. In between these codecs are almost two dozen standardized and proprietary codecs that have different MOS scores, bandwidth requirements, and algorithm delays. Because most voice modules used by routers, voice gateways, and similar products support multiple codecs, it may be possible to reduce delay by 20 ms or more by selecting one coder instead of another. Since a voice conversation begins to significantly deteriorate when end-to-end latency exceeds 250 ms, changing a coder can often reduce latency to acceptable levels.

Access Line

The operating rate of the access line from the organization's location into the Internet or private intranet governs ingress and egress delays. Although a 56-Kbps line obviously has a longer delay than a T1 line operating at 1.544 Mbps, the difference between the two with respect to packet delay may not be obvious. Thus, I'll examine the delays associated with different types of access lines.

The delay at ingress and egress to a packet network depends upon both the operating rate of the access line and the length of the packet. Assume each piece of digitized voice is transported via the User Datagram Protocol (UDP) and uses the Real Time Protocol (RTP) for time stamping each packet. This would minimize the length of the packet header and allow a packet transporting a small piece of a digitized voice to have a length of 32 bytes. Then, at a line access rate of 56 Kbps, the ingress and the egress delays become:

32 bytes * 8 bits/byte = .00457 seconds
      56 Kbps

Because a packet entering a network must exit the network, the combined ingress and egress delays for a 32-byte packet become 2 * 0.00457or 9.14 ms. If a T1 line is used, the ingress and egress delays become:

32 bytes * 8 bits/byte = 0.0001666 seconds
      1.536 Mbps

Note that although a T1 line operates at 1.544 Mbps, 8000 bps is used for framing. This limits the data transport capability of the T1 line to 1.536 Mbps. Even so, the packet delay associated with the use of a T1 line is substantially reduced to 0.166 ms for ingress and egress, or a total line access delay of 2 * 0.166 or 0.332 ms, if T1 lines connect two locations to the Internet or a corporate intranet.

Based upon the preceding, you can expect access line delays to range from a high of 9.14 ms when 56-Kbps lines are used, to 0.332 ms when T1 lines are used. Between these two access line operating rates you can select a variety of fractional T1 rates as multiples of n * 64 Kbps, such as 64 Kbps, 128 Kbps, 192 Kbps, and so on. This means that it is also possible to reduce another component of end-to-end delay by adjusting the operating rate of the ingress access line, egress access line, or both.

Network Delay

Perhaps the largest delay that voice packets will encounter are network delays. Such delays result from routers within a network having to examine the header in each packet, query its routing table, and select an output port for transferring the packet based upon the contents of the routing table. While the router is querying its routing table, it must place the packet in a buffer. Additionally, because packets can arrive on several ports that will be forwarded by the router over a common output port, the routing and processing delays associated with each router in a network are random.

Although it is often difficult to predict network delay because it varies throughout the day by changes in network traffic, we have two tools to facilitate delay computations. Those tools are the ping and traceroute (which Microsoft Windows renamed tracert) programs built into most operating systems. You can use ping to obtain the round-trip delay from a router at the egress into a network to a router at the egress from the network. Doing so provides an indication of network delay at a particular time when the ping was executed. To ensure that you obtain an accurate measurement of network delay, you should periodically use the ping utility throughout the business day. Addditionally, consider using the dash T (-t) option included with some versions of ping, which results in continuous pinging at periodic intervals. Then, pipe the results to a file you can import into a spreadsheet for statistical analysis.

Although the round trip delay between many sites on the Internet are under 200 ms, occasionally delays up to 400 ms or more can be experienced when accessing other locations. While the resulting one way delay of 200 ms leaves little room for the other delays mentioned in this article, a relatively long delay is no reason to quit the battle. Instead, you should consider the use of tracert to locate the source of delays through the network.

Through the use of tracert, you may be able to locate one or more routers that contribute a disproportionate amount of network delay on the path between ingress and egress locations. If this occurs, you may be able to contact your ISP or your intranet administrator and request consideration of two possible options. It may be possible to create a route around a dated or heavily utilized router that reduces network delay. If this is not possible, the potential loss of a valued client may entice an ISP to upgrade a router ahead of schedule. Thus, just because the use of ping may indicate a delay that could adversely affect the ability to transport digitized voice over a packet network, you should remember what Yogi Berra would say, “It ain't over till it's over,” and use tracert to determine whether an alternate route or router upgrade could change the situation.

About the Author

Gilbert Held is an award-winning lecturer and author who specializes in the field of data communications. Some of Gil's recent books include Voice and Data Internetworking, 2nd ed. and Cisco Security Architecture (co-authored with Kent Hundley), both published by McGraw Hill. Gil can be reached via email at: gil_held@yahoo.com.