Cover V09, I04
Figure 1
Figure 2
Figure 3
Sidebar 1


The Emergence of Convergence

Gilbert Held

During the 1960s, a popular movie was “The Graduate”, starring Dustin Hoffman and Anne Bancroft. In one memorable scene, a family friend took Hoffman's character aside and, with one word, told him about the future: that word was “plastics”. In the year 2000, we have a new word that helps express the future direction of communications and is the focus of this article: “convergence”. It refers to the evolving integration of voice and data communications onto a common networking infrastructure.

This article, the first in a short series on the integration of voice and data communications, provides an overview of the rationale for convergence, along with some of the technical issues that must be resolved before what is now largely theory and limited implementation can become a widespread reality. In future articles, I will focus on practical techniques for implementing convergence by transmitting voice over IP and frame relay, as well as on the economics of merging voice and data. As with any emerging technology, convergence will not instantly become a reality for all organizations, and some organizations may elect to stay off the “convergence train”. Because organizations have different networking requirements, a solution that suits one organization may be unpalatable to another. However, by becoming conversant with the rationale for merging voice and data networks and the technical issues involved, you can determine whether convergence will work for you.

Driving Forces

Convergence is relatively new in the field of communications technology. It represents the merging of voice and data transport applications onto a common network infrastructure. The rationale for convergence results from a troika of factors: technological developments, network management issues, and economics. Packet switching was developed long after circuit switching and is a more efficient mechanism for transporting information, since data is transported only when there is information to transmit. When recently developed low bit-rate coders are used to digitize voice for transmission over a packet network, the packet network can theoretically be 24 to 48 times more efficient than a circuit-switched network. Additionally, if a common network infrastructure can be used instead of separate voice and data networks, management becomes simpler, the diversity of equipment required to operate separate networks is reduced, and it may be possible to reassign some employees to other duties. Together, more efficient technology and simpler management create the potential for substantial savings.

While convergence is the merging of voice and data onto a common infrastructure, the term does not define how this will occur. Many readers may use Internet telephony products in which a microphone and speaker provide voice I/O; that represents an individual, home-based method of convergence. In a business environment, gateways that translate 64-Kbps PCM-encoded voice into low bit-rate digitized voice for transmission over packet networks are being connected to the corporate PBX. This allows an employee to use a standard telephone and simply dial a new prefix code to be routed through the PBX to the voice gateway. Other methods of convergence include adding voice modules to routers and Frame Relay Access Devices (FRADs), which allow either a PBX trunk line or individual telephone sets to be cabled directly into a voice module. To understand the rationale behind convergence, I'll first provide an overview of why the existing telephone network is based upon a fixed transmission structure, and why that structure leaves room for packet networks to provide higher levels of efficiency.

Network Evolution

The most commonly used network today is the Public Switched Telephone Network (PSTN). The PSTN was designed as a voice transport mechanism, with twisted-pair wire routed from telephone company offices to subscribers. The twisted-pair local loop was engineered to carry approximately 4 kHz of bandwidth. This 4-kHz limit resulted from economics: frequency division multiplexing (FDM), which was initially used to place multiple calls on trunks routed between central office switches, can transport more simultaneous calls when the roughly 20-kHz range of frequencies associated with human speech and hearing is filtered into a 4-kHz band.
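To make the economics concrete, the short Python sketch below counts how many channels fit side by side in a fixed FDM band. The 48-kHz band is an illustrative assumption (it matches the width of the classic 12-channel FDM group), and the function ignores guard bands and filter roll-off.

```python
def fdm_channels(band_hz: int, channel_hz: int) -> int:
    """Return how many channels of width channel_hz fit in an FDM band of width band_hz.

    Guard bands and filter roll-off are ignored in this simplified model.
    """
    return band_hz // channel_hz


GROUP_BAND_HZ = 48_000  # assumed band width, equal to a classic 12-channel FDM group

filtered = fdm_channels(GROUP_BAND_HZ, 4_000)     # voice filtered to 4 kHz
unfiltered = fdm_channels(GROUP_BAND_HZ, 20_000)  # voice left at 20 kHz
print(filtered, unfiltered)  # 12 2
```

Filtering each call to 4 kHz lets the same band carry six times as many conversations in this example.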

As telephone use became ubiquitous, so did cable congestion in urban areas. In an attempt to relieve such congestion, telephone companies introduced the T-carrier during the 1960s along with Pulse Code Modulation (PCM), which was used to place 24 digitized conversations on a common cable pair. The resulting technology, which was referred to as a digital channel bank, consisted of several components as shown in Figure 1.

The codec (coder-decoder) contains the logic necessary to convert each analog voice input into a 64-Kbps PCM-encoded data stream and to restore a received PCM-encoded data stream to its original analog signal. The time division multiplexer (TDM) is responsible for multiplexing 24 digitized voice conversations, as well as for performing the reverse demultiplexing process. Because data multiplexed by time requires the receiver to know the position of each channel's data, a mechanism was required to indicate where one frame of multiplexed data ended and the next began. This mechanism was provided by adding a framing bit to each sequence of 24 8-bit digitized samples, a detail that, for many non-communications professionals, explains a great mystery of life (see the sidebar “The Facts about T1”).
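The framing arithmetic can be checked with a few lines of Python. This is a worked calculation rather than an implementation: it assumes the standard 8,000 samples per second and 8-bit PCM code words, and shows where the 64-Kbps channel rate and the T1 line rate come from.

```python
SAMPLES_PER_SECOND = 8_000  # one sample every 125 microseconds
BITS_PER_SAMPLE = 8         # one PCM code word per sample
CHANNELS = 24               # voice conversations per T1 frame
FRAMING_BITS = 1            # framing bit added to each frame

# Each voice channel: 8,000 samples/s x 8 bits = 64,000 bps (the 64-Kbps time slot).
channel_bps = SAMPLES_PER_SECOND * BITS_PER_SAMPLE

# One frame: 24 samples of 8 bits plus the framing bit = 193 bits.
frame_bits = CHANNELS * BITS_PER_SAMPLE + FRAMING_BITS

# One frame is sent per sample interval, so 8,000 frames per second.
line_bps = frame_bits * SAMPLES_PER_SECOND
payload_bps = CHANNELS * channel_bps

print(channel_bps, frame_bits, line_bps, payload_bps)  # 64000 193 1544000 1536000
```

The single framing bit per frame costs 8,000 bps, which is why the line rate is 1.544 Mbps rather than the 1.536 Mbps needed for the 24 voice channels alone.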

The last major component of the channel bank is the line driver (Figure 1). The line driver initially performed two key functions, required by the characteristics of the copper wire used to interconnect channel banks and by the use of repeaters to extend the cabling distance between them. First, the line driver converted the unipolar signaling of the multiplexer into a bipolar signal better suited to transmission over distances of thousands of feet. Second, because repeaters must periodically receive a pulse to keep their operation synchronized, they cannot tolerate a long string of 0 bits. Thus, the line driver would periodically insert a “1” bit into the digitized voice data stream, a change that was not noticeable to the human ear once the received data stream was converted back into an analog signal.
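The two line-driver functions can be sketched in Python. Alternate mark inversion (AMI), the bipolar format used on T1 lines, sends a 0 bit as no pulse and each 1 bit as a pulse of the opposite polarity to the previous one. The ones-density function is a simplified assumption modeled on zero-code suppression: it replaces an all-zero octet with 00000010 so that every octet contains at least one pulse. Real channel banks differ in the details.

```python
def suppress_zero_octets(octets: list[int]) -> list[int]:
    """Guarantee at least one 1 bit in every octet.

    Simplified assumption modeled on zero-code suppression: an all-zero octet
    is replaced with 00000010. With a 1 in every octet, no run of zeros in this
    octet stream can exceed 14 bits (7 trailing zeros plus 7 leading zeros).
    """
    return [0b00000010 if octet == 0 else octet for octet in octets]


def ami_encode(octets: list[int]) -> list[int]:
    """Alternate mark inversion: 0 -> no pulse (0), each 1 -> pulse opposite to the last (+1/-1)."""
    levels = []
    polarity = +1
    for octet in octets:
        for bit in range(7, -1, -1):  # most significant bit first
            if (octet >> bit) & 1:
                levels.append(polarity)
                polarity = -polarity
            else:
                levels.append(0)
    return levels


stream = suppress_zero_octets([0x80, 0x00, 0x01])
print(stream)              # [128, 2, 1]
print(ami_encode(stream))  # one pulse per octet, polarities +1, -1, +1
```

Because consecutive pulses always alternate in polarity, the line carries no net DC component, and a violation of the alternation can be detected as a transmission error.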

The early digital channel bank of the 1960s evolved into the now ubiquitous T1 transmission facility as well as a grouping of 28 T1s, referred to as a T3 transmission line or circuit. Similarly, the existing telephone company infrastructure evolved from electromechanical switches to complex electronic switches, with some capable of supporting hundreds of thousands of simultaneous calls.

When a call is established over the PSTN, electronic switches at various telephone company offices set up a physical path for the call. That path consists of 64-Kbps time slots through different switches, and 64-Kbps time slots on the transmission facilities interconnecting those switches, all reserved for the duration of the call. Because this path reservation process involves switching time slots on different circuits, it is commonly referred to as circuit switching. Once a call is established, its 64-Kbps time slots remain reserved for that one conversation regardless of how much of the time anyone is actually speaking. Because humans typically converse in half-duplex (unless we shout at each other), reserving a 64-Kbps time slot in each direction is inefficient. Furthermore, we normally pause periodically as we speak. Probably no more than 30 to 40 percent of the 64-Kbps time slot's capacity is used during a typical voice conversation.
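The reservation behavior described above can be modeled with a hypothetical `Trunk` class in Python. This is a toy model, not how any telephone switch is built: each trunk holds a pool of idle 64-Kbps slots, call setup either reserves one slot on every hop or blocks, and the slots return to the pool only when the call is released, whether or not anyone spoke.

```python
from typing import List, Optional, Tuple


class Trunk:
    """A TDM facility between two switches with a fixed pool of 64-Kbps time slots."""

    def __init__(self, name: str, slots: int = 24):
        self.name = name
        self.free = set(range(slots))  # idle time slots


def setup_call(path: List[Trunk]) -> Optional[List[Tuple[str, int]]]:
    """Reserve one time slot on every trunk along the path, or nothing if any hop is full."""
    if any(not trunk.free for trunk in path):
        return None  # call blocked
    return [(trunk.name, trunk.free.pop()) for trunk in path]


def release_call(path: List[Trunk], circuit: List[Tuple[str, int]]) -> None:
    """Return each reserved slot to its trunk when the call ends."""
    for trunk, (_, slot) in zip(path, circuit):
        trunk.free.add(slot)


a_b = Trunk("A-B")            # 24 slots
b_c = Trunk("B-C", slots=1)   # a nearly full facility
path = [a_b, b_c]

call1 = setup_call(path)      # reserves a slot on A-B and on B-C
call2 = setup_call(path)      # None: B-C has no idle slot, even if call1 is silent
release_call(path, call1)     # slots are freed only when the call is released
```

The model never looks at whether the reserved slots carry speech, which is exactly the source of the inefficiency.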

Although the PSTN works well, it is inefficient compared with packet networks. Additionally, more modern voice digitization techniques make it possible to use packet networks to transport digitized voice with greater efficiency than is possible with the PSTN. To illustrate the efficiency of the PSTN in comparison to voice carried over a packet network, I'll first describe the general characteristics of a packet network.

Packet Network Operations

Over the past 40 years, several types of packet networks evolved into common use. Although most people are familiar with the Internet, it is not the only commonly used packet network. Two other popular packet networks are X.25 networks and frame relay networks. Although the technical characteristics of each packet network differ, their basic design goal and method of operation are similar. Each was originally developed to transport data efficiently and economically, and requires hardware or software to convert conventional data streams into packets.

Figure 2 illustrates the general operation of a packet network. Terminal devices, which can include dumb data terminals or more modern PCs that cannot directly form packets, are connected to a packet assembler/disassembler (PAD), which forms packets. Here the term PAD is used generically and should not be confused with the X.25 PAD that is specifically designed to convert data streams from a non-packet-forming device into X.25 packets.
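A generic PAD's job can be sketched in Python as splitting a terminal's byte stream into addressed, numbered packets and reassembling them at the far end. The `Packet` fields and the 128-byte default payload size are assumptions for illustration (128 bytes happens to be the X.25 default maximum packet size); this is not the X.25 packet format.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Packet:
    source: str       # address of the originating PAD (hypothetical format)
    destination: str  # address used by each switch to forward the packet
    sequence: int     # position of this payload in the original stream
    payload: bytes


def assemble(data: bytes, source: str, destination: str, max_payload: int = 128) -> List[Packet]:
    """Split a terminal's byte stream into addressed, numbered packets."""
    return [
        Packet(source, destination, seq, data[offset:offset + max_payload])
        for seq, offset in enumerate(range(0, len(data), max_payload))
    ]


def disassemble(packets: List[Packet]) -> bytes:
    """Rebuild the original byte stream, even if packets arrive out of order."""
    return b"".join(p.payload for p in sorted(packets, key=lambda p: p.sequence))


message = b"hello world " * 25                  # 300 bytes from a dumb terminal
packets = assemble(message, "pad-1", "host-9")
print([len(p.payload) for p in packets])       # [128, 128, 44]
print(disassemble(list(reversed(packets))) == message)  # True
```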

Note the mixture of packets flowing between switch 1 and switch 3 in Figure 2. This illustrates one of the key features of packet networks -- their ability to share common transmission facilities among different data sources. Another feature of packet networks is that packets are only formed and transmitted over the packet network when there is data to transmit. Because packets from different data sources share a common transmission facility by time, a packet network represents a type of time division multiplexing. However, instead of using fixed time slots that are employed by channel banks, a packet network only transmits packets when a data source is active. To define how to route each packet, addressing information is added to each one. This technique in which packets are addressed, interleaved by time, and only transmitted when there is data to send is referred to as statistical multiplexing. Now that I've addressed the operation of packet networks, I'll compare circuit switching to packet switching technologies.
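The difference between fixed time slots and statistical multiplexing can be illustrated with a small Python sketch. The activity pattern for the hypothetical sources A, B, and C is invented for illustration: the TDM function gives every source a slot in every frame, while the statistical multiplexer forwards only the data that exists, tagged with its source address.

```python
# Hypothetical activity: what each source has to send in each interval (None = idle).
activity = [
    {"A": "a1", "B": None, "C": None},
    {"A": None, "B": None, "C": "c1"},
    {"A": "a2", "B": "b1", "C": None},
    {"A": None, "B": None, "C": None},
]


def tdm(frames):
    """Fixed slots: every source owns a slot in every frame, whether or not it is active."""
    return [[frame[source] for source in sorted(frame)] for frame in frames]


def stat_mux(frames):
    """Statistical multiplexing: only active sources send, each unit tagged with its address."""
    return [
        (source, data)
        for frame in frames
        for source, data in sorted(frame.items())
        if data is not None
    ]


slots = tdm(activity)
units = stat_mux(activity)
total_slots = sum(len(frame) for frame in slots)
used_slots = sum(slot is not None for frame in slots for slot in frame)
print(f"TDM: {used_slots} of {total_slots} slots carried data")  # TDM: 4 of 12 slots carried data
print("Statistical multiplexing:", units)
```

Here TDM carries data in only 4 of its 12 slots, while the statistical multiplexer sends just the 4 addressed units of data, leaving the rest of the facility's capacity for other sources.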

Circuit Versus Packet Switching

Figure 3 compares circuit switching and packet switching technologies. Because circuit switching results in the establishment of a time slot that represents a portion of a circuit's capacity for the duration of a conversation, it represents a poor method of bandwidth utilization. In comparison, packet switching allows multiple data sources to share the entire capacity of a transmission facility and represents a better method of bandwidth utilization.

The structure of a packet network requires each node to examine the destination address of each packet in order to make forwarding decisions. In comparison, a circuit-switched network allocates a channel or time slot from source to destination. Thus, packet networks introduce a variable delay that depends upon the number of nodes between source and destination and on the traffic queued at each node, while circuit-switching delay is primarily propagation delay and is minimal. Although network delay has only a marginal effect on most data transmission, this characteristic of packet networks is a major stumbling block that is delaying the convergence of voice and data onto a common packet network infrastructure.
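A rough Python model shows why delay grows with the number of nodes. It assumes store-and-forward operation, in which each node must receive an entire packet before forwarding it, so serialization delay is paid on every hop; the packet size, link speeds, queueing delays, and propagation delay in the usage lines are hypothetical values.

```python
from typing import List, Tuple


def packet_delay_ms(packet_bytes: int, hops: List[Tuple[int, float]], propagation_ms: float) -> float:
    """End-to-end delay of one packet through store-and-forward nodes.

    hops holds (link_bps, queueing_ms) for each link. Because every node must
    receive the whole packet before forwarding it, serialization delay is paid
    once per hop, and queueing delay depends on other traffic at each node.
    """
    serialization_ms = sum(packet_bytes * 8 * 1000 / link_bps for link_bps, _ in hops)
    queueing_ms = sum(queue_ms for _, queue_ms in hops)
    return propagation_ms + serialization_ms + queueing_ms


def circuit_delay_ms(propagation_ms: float) -> float:
    """A reserved time slot adds essentially nothing beyond propagation delay."""
    return propagation_ms


# Hypothetical path: three 64-Kbps links with different queueing delays.
hops = [(64_000, 0.0), (64_000, 10.0), (64_000, 25.0)]
print(packet_delay_ms(64, hops, propagation_ms=20.0))  # 79.0
print(circuit_delay_ms(20.0))                          # 20.0
```

Changing only the queueing values, which depend on other traffic, changes the packet total from one packet to the next; the circuit-switched figure does not change.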

The ability of several data sources to share a packet network's common transmission facilities makes a packet network more efficient than a circuit-switched network. However, compared with a circuit-switched network, a packet network has two additional characteristics that can be considered hazardous to the flow of data. First, most packet networks are designed to lose data. For example, when the arrival rate of data at a router exceeds its service rate, the router places the excess data in its buffer. However, if the arrival rate continues to exceed the router's service rate, the buffer fills and the router is forced to drop packets. In a data transmission environment, the originator can set a timer and, if no response to a packet is received by the time the timer decrements to zero, simply retransmit the packet. Unfortunately, real-time digitized voice transported via packets cannot be retransmitted, since the delay would result in a piece of reconstructed voice awkwardly out of time alignment with previously reconstructed voice. Second, most packet networks will attempt only a predefined number of retransmissions. If the timer continues to expire without an acknowledgement, the session may be dropped, a situation commonly referred to as a planned disconnect.
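Both characteristics can be sketched in Python. The `Router` class is a hypothetical model of one output port with a finite buffer that discards arrivals once full (tail drop), and `send_with_retries` models a sender that retransmits on each timeout and gives up after a fixed number of attempts. Neither reflects any particular router or protocol; the buffer size, rates, and retry limit are assumptions.

```python
from collections import deque
from typing import Callable


class Router:
    """One output port with a finite buffer that discards arrivals once it is full (tail drop)."""

    def __init__(self, buffer_packets: int, service_per_tick: int):
        self.queue = deque()
        self.buffer_packets = buffer_packets
        self.service_per_tick = service_per_tick
        self.dropped = 0

    def tick(self, arrivals: int) -> int:
        """Buffer this tick's arrivals (dropping any that do not fit), then send what the link allows."""
        for _ in range(arrivals):
            if len(self.queue) < self.buffer_packets:
                self.queue.append("packet")
            else:
                self.dropped += 1
        sent = min(self.service_per_tick, len(self.queue))
        for _ in range(sent):
            self.queue.popleft()
        return sent


def send_with_retries(acked: Callable[[int], bool], max_attempts: int = 3) -> str:
    """Retransmit each time the timer expires; give up after max_attempts (a planned disconnect)."""
    for attempt in range(1, max_attempts + 1):
        if acked(attempt):  # True if a response arrived before the timer reached zero
            return f"acknowledged on attempt {attempt}"
    return "session dropped"


router = Router(buffer_packets=4, service_per_tick=2)
for _ in range(5):
    router.tick(arrivals=3)  # arrival rate (3 per tick) exceeds service rate (2 per tick)
print(router.dropped)        # 3

print(send_with_retries(lambda attempt: attempt == 2))  # acknowledged on attempt 2
print(send_with_retries(lambda attempt: False))         # session dropped
```

Retransmission rescues the data packets in this model, but for real-time voice a packet that arrives one or more timer periods late is as useless as one that was dropped.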

Although the planned data loss and planned disconnects of packet networks are obstacles to transmitting real-time voice, the biggest obstacle is their variable network delay. Digitizing voice and transporting it over a packet network so that it can be reconstructed at its destination to sound natural requires an end-to-end delay of under 200 to 250 ms. Otherwise, the delay between the two parties in a conversation becomes too great for one person to recognize that the other has finished speaking. Often, both parties will attempt to talk at the same time, and only by reverting to a “CB” mode of operation and saying “over” can they have an orderly conversation. Because a good portion of the delay associated with using packet networks to transmit voice is directly related to the voice digitization method, I'll address that topic next.

Voice Digitization

As previously noted, the PSTN uses PCM to digitize voice. While PCM results in a “toll-quality” reproduced voice, its 64-Kbps bandwidth requirement is inefficient when compared to more modern voice digitization techniques.

Beginning in the 1970s, engineers experimented with a number of techniques for digitizing voice at a low bit rate. Through the use of digital signal processor (DSP) chips and various encoding schemes, a family of code excited linear prediction (CELP; pronounced Kleep) algorithms was developed, enabling voice to be digitized at data rates as low as 5.1 Kbps. When such voice digitization algorithms are used in conjunction with packet networks, a packet network can theoretically obtain an efficiency 24 to 48 times that of a circuit-switched network. Because traditional telephone companies have billions of dollars invested in equipment developed to support circuit-switched, PCM-digitized calls, their ability to migrate rapidly to a new technology is rather limited. In comparison, a relatively new communications carrier, Qwest Communications, is developing a packet network to route calls over the high-speed fiber optic infrastructure it recently installed across the United States.
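The article does not show how the 24-to-48 figure is derived. One plausible back-of-the-envelope reading, sketched below in Python, divides the 64-Kbps PCM rate by the 5.1-Kbps codec rate and then credits the packet network for sending nothing while a speaker is silent. The activity factors are assumptions drawn from the 30 to 40 percent utilization estimate earlier in the article, plus a 50 percent half-duplex case, and packet header overhead is ignored, so the results are upper bounds.

```python
PCM_BPS = 64_000  # one circuit-switched time slot


def efficiency_gain(codec_bps: float, activity: float) -> float:
    """Ratio of a 64-Kbps time slot to the average bandwidth of packetized voice.

    activity is the fraction of the call during which one direction carries
    speech; packets are assumed to be sent only then. Header overhead is
    ignored, so the result is an upper bound.
    """
    return PCM_BPS / (codec_bps * activity)


for activity in (0.5, 0.4, 0.3):
    print(f"activity {activity:.0%}: {efficiency_gain(5_100, activity):.1f}x")
# activity 50%: 25.1x
# activity 40%: 31.4x
# activity 30%: 41.8x
```

Under these assumptions the results fall inside the 24-to-48 range; adding real packet headers would pull them down, which is one reason the article calls the figure theoretical.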

Although there are considerable benefits to routing both voice and data over a common network infrastructure, transporting voice over a packet network is subject to certain key restrictions. As previously mentioned, network delay is a major constraint. Another constraint is the delay associated with low bit-rate voice digitization techniques. For example, the algorithm used to produce a 5.1-Kbps voice digitization data rate introduces a 30-ms delay. In comparison, PCM has an algorithmic delay of about 125 µs (a single sample interval), more than two orders of magnitude less. When you add the delay associated with low bit-rate voice digitization techniques to the delays associated with routing voice-encoded data through a packet network, it is relatively easy to reach a total delay that adversely affects the ability to reconstruct natural-sounding voice at the destination. Equipment vendors and communications carriers are attacking this problem by attempting to establish a quality of service (QoS) that guarantees the maximum delay through a packet network.
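The delay budget can be expressed as a simple Python check. The 180-ms network delay is a hypothetical figure, the 200-ms limit is the low end of the range cited earlier, PCM's coding delay is treated as negligible, and the function ignores other contributors such as decoding and packetization time; it only shows how a 30-ms coding delay can push an otherwise acceptable path over the limit.

```python
LIMIT_MS = 200.0  # low end of the 200-250 ms range for natural-sounding voice


def delay_budget(codec_ms: float, network_ms: float, limit_ms: float = LIMIT_MS):
    """Return the one-way delay and whether it stays within the limit."""
    total_ms = codec_ms + network_ms
    return total_ms, total_ms <= limit_ms


NETWORK_MS = 180.0  # hypothetical delay through the packet network

for coder, codec_ms in (("PCM (treated as negligible)", 0.0), ("5.1-Kbps low bit-rate coder", 30.0)):
    total_ms, ok = delay_budget(codec_ms, NETWORK_MS)
    print(f"{coder}: {total_ms:.0f} ms, {'acceptable' if ok else 'too slow'}")
# PCM (treated as negligible): 180 ms, acceptable
# 5.1-Kbps low bit-rate coder: 210 ms, too slow
```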


If you follow the communications-oriented trade press, you have probably encountered an alphabet soup of acronyms for establishing a quality of service on the Internet and frame relay networks, as well as on ingress and egress LANs. Some of the more common ones include 802.1p, a priority scheme for frames flowing through LAN switches; RSVP (Resource ReSerVation Protocol), which is designed to reserve bandwidth through a TCP/IP network; and Multiprotocol Label Switching (MPLS), a technique for routing packets more efficiently through a packet network. Each of these techniques, along with frame relay service level agreements (SLAs) and others, was developed to provide a QoS that minimizes delay through a particular type of network. Doing so would make it possible to transmit voice over different types of packet networks, and thus make convergence possible.
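As an illustration of what a priority scheme such as 802.1p enables, the Python sketch below models a switch port that always transmits the highest-priority frame waiting. 802.1p defines eight priority values (0 through 7); treating larger values as more urgent and serving them with strict priority is a simplifying assumption, since actual switches map those values onto their own queues and scheduling policies.

```python
import heapq
from itertools import count


class PrioritySwitchPort:
    """Output port that always sends the most urgent waiting frame (strict priority).

    Assumption: larger 802.1p values (0-7) are treated as more urgent.
    """

    def __init__(self):
        self._heap = []
        self._arrival = count()  # tie-breaker: FIFO order within one priority

    def enqueue(self, frame: str, priority: int) -> None:
        if not 0 <= priority <= 7:
            raise ValueError("802.1p priority must be between 0 and 7")
        # heapq is a min-heap, so the priority is negated to pop the largest first.
        heapq.heappush(self._heap, (-priority, next(self._arrival), frame))

    def dequeue(self) -> str:
        return heapq.heappop(self._heap)[2]


port = PrioritySwitchPort()
port.enqueue("file transfer", priority=0)
port.enqueue("e-mail", priority=0)
port.enqueue("voice", priority=6)
order = [port.dequeue() for _ in range(3)]
print(order)  # ['voice', 'file transfer', 'e-mail']
```

The voice frame arrived last but leaves first, so its delay through the port no longer depends on how much data traffic is queued ahead of it.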

Currently, voice and data can be merged on private TCP/IP networks and over public and private frame relay networks, since frame relay provides the capability to obtain a predefined network delay. In comparison, because a QoS cannot currently be achieved on the Internet, the quality of reconstructed digitized voice may or may not be acceptable. While Internet telephony is certainly growing, I will examine in a future article why using the Internet to transport digitized voice may not presently be a viable business solution. How can a communications carrier construct a packet network to transmit digitized voice? How can an organization use communications facilities from a carrier to construct a private packet network that transmits both voice and data? The answer to both questions is “bandwidth”. By carefully monitoring delay and increasing bandwidth when needed, you can ensure that digitized voice is transported across a packet network and faithfully reconstructed at its destination. For example, since reconstructed voice begins to sound awkward once the one-way delay exceeds 200 to 250 ms, periodically measuring the delay will help you determine when you need a higher-speed wide area network transport facility. However, bandwidth is not free, and when considering convergence for your organization, you must also consider the economics associated with it.
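A hypothetical monitoring rule can be sketched in Python using the standard `statistics` module: flag the WAN link for an upgrade when the 95th percentile of measured one-way delay exceeds 200 ms. The percentile and threshold are assumptions chosen for illustration, and how the delay samples are collected is outside the scope of the sketch.

```python
import statistics
from typing import List


def needs_more_bandwidth(one_way_ms: List[float], limit_ms: float = 200.0, pct: int = 95) -> bool:
    """Return True when the pct-th percentile of measured one-way delay exceeds limit_ms.

    The 'inclusive' method interpolates between measured values, so the result
    never exceeds the largest sample. Expects a list of several measurements.
    """
    cut_points = statistics.quantiles(one_way_ms, n=100, method="inclusive")
    return cut_points[pct - 1] > limit_ms


quiet_week = [100.0 + i for i in range(40)]   # hypothetical samples, 100-139 ms
busy_week = [100.0] * 10 + [260.0] * 10       # hypothetical samples with congestion
print(needs_more_bandwidth(quiet_week))  # False
print(needs_more_bandwidth(busy_week))   # True
```

Using a high percentile rather than the average keeps occasional congestion from being hidden by many fast measurements, which matters because a listener notices the slow moments.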

About the Author

Gilbert Held is an award-winning lecturer and author who specializes in the field of data communications. Some of Gil's recent books include Voice and Data Internetworking, 2nd ed. and Cisco Security Architecture (co-authored with Kent Hundley), both published by McGraw Hill. Gil can be reached via email at