

4. Technical info about VoIP

This section covers the technical background needed to understand VoIP.

4.1 Overview on a VoIP connection

To set up a VoIP communication we need:

  1. First, an ADC to convert analog voice to digital signals (bits).
  2. The bits then have to be compressed into a format suitable for transmission: there are a number of codecs, which we'll see later.
  3. Next we have to insert our voice packets into data packets using a real-time protocol (typically RTP over UDP over IP).
  4. We need a signaling protocol to call users: ITU-T H.323 does that.
  5. At the receiving end we have to disassemble the packets, extract the data, convert it back to an analog voice signal and send it to the sound card (or phone).
  6. All of this must be done in real time, because we cannot wait too long for a vocal answer! (see the QoS section)

                        Base architecture

Voice )) ADC - Compression Algorithm -  Assembling RTP in UDP/IP -----
                                                         ---->      |
                                                         <----      |
Voice (( DAC - Decompress. Algorithm -  Disass. RTP from UDP/IP  -----

4.2 Analog to Digital Conversion

This is done in hardware, typically by the ADC integrated in the sound card.

Today every sound card can convert a 22050 Hz band at 16 bits per sample (by the Nyquist principle, sampling it requires a frequency of 44100 Hz), giving a throughput of 2 bytes * 44100 (samples per second) = 88200 bytes/s, or 176.4 kbytes/s for a stereo stream.

For VoIP we don't need a 22 kHz bandwidth (and we don't need 16 bits either!): next we'll see the other codings used for it.
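The arithmetic above can be checked with a few lines of Python. This is only a sketch; the function name `pcm_throughput` is ours, not part of any library.

```python
def pcm_throughput(sample_rate_hz, bits_per_sample, channels=1):
    """Return the uncompressed PCM data rate in bytes per second."""
    return sample_rate_hz * bits_per_sample * channels // 8


# Sound card figures quoted in the text
mono = pcm_throughput(44100, 16)               # 88200 bytes/s
stereo = pcm_throughput(44100, 16, channels=2) # 176400 bytes/s

# Telephone-quality voice: 8 kHz sampling, 8 bits per sample
phone = pcm_throughput(8000, 8)                # 8000 bytes/s = 64 kbit/s

print(mono, stereo, phone * 8)
```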

4.3 Compression Algorithms

Now that we have digital data, we can convert it to a standard format that can be transmitted quickly.

PCM, Pulse Code Modulation, Standard ITU-T G.711

  • Voice bandwidth is 4 kHz, so the sampling frequency has to be 8 kHz (for Nyquist).
  • We represent each sample with 8 bits (giving 256 possible values).
  • Throughput is 8000 Hz * 8 bit = 64 kbit/s, the same as a typical digital phone line.
  • In real applications the mu-law (North America) and A-law (Europe) variants are used, which code the analog signal on a logarithmic scale: each 8-bit code covers the dynamic range of a 14-bit (mu-law) or 13-bit (A-law) linear sample (see Standard ITU-T G.711).
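To give an idea of what coding on a logarithmic scale means, here is a minimal sketch of mu-law companding using the continuous formula with mu = 255. It is an illustration only: the real G.711 encoder uses a segmented approximation with its own bit layout, so the codes produced here are NOT G.711 bytes. All function names are ours.

```python
import math

MU = 255  # companding parameter of the mu-law curve


def mulaw_compress(x):
    """Map a linear sample in [-1.0, 1.0] onto the logarithmic scale."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mulaw_expand(y):
    """Inverse of mulaw_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)


def encode_8bit(x):
    """Compress a sample, then quantize it to one of 256 levels (0..255)."""
    x = max(-1.0, min(1.0, x))
    return int(round((mulaw_compress(x) + 1.0) / 2.0 * 255))


def decode_8bit(code):
    """Turn an 8-bit level back into a linear sample."""
    return mulaw_expand(code / 255 * 2.0 - 1.0)


quiet = 0.01
print(encode_8bit(quiet), decode_8bit(encode_8bit(quiet)))
```

Quiet samples get many more quantization levels than loud ones, which is why 8 bits are enough for speech: a plain linear 8-bit quantizer would have a step of about 0.008, almost as large as the quiet sample above.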

ADPCM, Adaptive differential PCM, Standard ITU-T G.726

It encodes only the difference between the current sample and a prediction based on the previous ones, requiring 32 kbps (see Standard ITU-T G.726).
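The idea of sending differences can be sketched as follows. Note that this is a deliberately simplified, NON-adaptive differential coder with a fixed step size; G.726 adapts both its predictor and its quantizer, which this sketch does not attempt. The names and the `step`/`bits` parameters are our own assumptions.

```python
def dpcm_encode(samples, step=4, bits=4):
    """Encode integer samples as small quantized differences."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    predicted = 0
    codes = []
    for sample in samples:
        diff = sample - predicted
        code = max(lo, min(hi, round(diff / step)))  # clip to 'bits' bits
        codes.append(code)
        predicted += code * step  # follow what the decoder will rebuild
    return codes


def dpcm_decode(codes, step=4):
    """Rebuild the samples by accumulating the differences."""
    predicted = 0
    samples = []
    for code in codes:
        predicted += code * step
        samples.append(predicted)
    return samples


voice = [0, 3, 9, 14, 20, 22, 19, 12]
codes = dpcm_encode(voice)
print(codes)
print(dpcm_decode(codes))
```

Each code fits in 4 bits instead of 8, which is where the halving from 64 kbps to 32 kbps comes from; the price is a small reconstruction error.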

LD-CELP, Standard ITU-T G.728
CS-ACELP, Standard ITU-T G.729 and G.729a
MP-MLQ, Standard ITU-T G.723.1, 6.3kbps, Truespeech
ACELP, Standard ITU-T G.723.1, 5.3kbps, Truespeech
LPC-10, able to reach about 2.4 kbps!!

These last codecs are the most important because they guarantee a very low minimum bandwidth by using source coding; the G.723.1 codecs also have a very high MOS (Mean Opinion Score, used to measure voice fidelity), but pay attention to the processing power they require, up to 26 MIPS!

4.4 RTP Real Time Transport Protocol

Now we have the encoded data and we want to encapsulate it in the TCP/IP stack. We follow this structure:

VoIP data packets
RTP
UDP
IP
I,II layers

VoIP data packets live in RTP (Real-Time Transport Protocol) packets which are inside UDP-IP packets.
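This nesting has a cost: every voice packet carries an RTP, a UDP and an IP header. The sketch below estimates the resulting bandwidth at IP level, assuming a 12 byte RTP header (no CSRC entries), an 8 byte UDP header, a 20 byte IPv4 header (no options) and one packet every 20 ms. The 20 ms interval and the function name are our assumptions, and layer I/II framing is not counted.

```python
IP_HEADER = 20   # bytes, IPv4 without options
UDP_HEADER = 8   # bytes
RTP_HEADER = 12  # bytes, fixed header without CSRC entries


def ip_bandwidth_bps(codec_bps, frame_ms=20):
    """Return the bit rate of one voice stream including RTP/UDP/IP headers."""
    payload_bytes = codec_bps * frame_ms / 1000 / 8
    packet_bytes = payload_bytes + RTP_HEADER + UDP_HEADER + IP_HEADER
    packets_per_second = 1000 / frame_ms
    return packet_bytes * 8 * packets_per_second


for name, rate in (("G.711", 64000), ("G.726", 32000)):
    print(name, rate, "->", ip_bandwidth_bps(rate), "bit/s on the wire")
```

With these assumptions the 40 bytes of headers add 16 kbit/s to every stream, whatever the codec, so the lower the codec rate the heavier the relative overhead.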

First, VoIP doesn't use TCP because it is too heavy for real-time applications, so UDP (datagram) is used instead.

With UDP alone we cannot put packets back in the order they were sent (which is a must in VoIP) because there is no notion of a connection: each packet is independent of the others (the datagram concept). So we have to introduce a new protocol, such as RTP, that is able to manage this.
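As an illustration of how a sequence number lets the receiver restore the order, here is a minimal reordering buffer. It is a sketch under simplifying assumptions: it ignores the 16-bit wraparound of real RTP sequence numbers and waits forever for a missing packet, whereas a real receiver would give up after a short delay. The class name is ours.

```python
import heapq


class ReorderBuffer:
    """Release packets in sequence-number order, holding early arrivals."""

    def __init__(self, first_seq=0):
        self.next_seq = first_seq
        self.heap = []

    def push(self, seq, payload):
        """Store one packet and return the payloads that are now in order."""
        if seq >= self.next_seq:  # anything older was already played
            heapq.heappush(self.heap, (seq, payload))
        ready = []
        while self.heap and self.heap[0][0] <= self.next_seq:
            s, p = heapq.heappop(self.heap)
            if s == self.next_seq:
                ready.append(p)
                self.next_seq += 1
            # otherwise it is a duplicate of a played packet: drop it
        return ready


buf = ReorderBuffer()
for seq in (0, 2, 1, 3):  # packets 1 and 2 arrive swapped
    print(seq, "->", buf.push(seq, "packet-%d" % seq))
```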

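                    Real Time Transport Protocol
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |V=2|P|X|  CC   |M|     PT      |       sequence number         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           timestamp                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           synchronization source (SSRC) identifier            |
   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
   |            contributing source (CSRC) identifiers             |
   |                             ....                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+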


  • V is the version of RTP in use
  • P is the padding flag: when set, the packet ends with padding bytes that are not part of the payload, used to bring the packet up to a required size
  • X indicates the presence of a header extension
  • CC is the number of CSRC identifiers following the fixed header. CSRC fields are used, for example, in conferences.
  • M is a marker bit
  • PT is the payload type
  • The sequence number and the timestamp let the receiver restore the order and the timing of the packets
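The fields of the header can be extracted with Python's standard `struct` module. This is a minimal sketch following the layout drawn above; the function name and the dictionary keys are our own choice, and header extensions are detected but not decoded.

```python
import struct


def parse_rtp_header(packet):
    """Decode the fixed RTP header and the CSRC list from raw bytes."""
    if len(packet) < 12:
        raise ValueError("an RTP packet is at least 12 bytes long")
    # Network byte order: two single bytes, a 16-bit and two 32-bit fields
    b0, b1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    cc = b0 & 0x0F
    csrc_end = 12 + 4 * cc
    if len(packet) < csrc_end:
        raise ValueError("packet too short for its CSRC count")
    csrc = list(struct.unpack("!%dI" % cc, packet[12:csrc_end]))
    return {
        "version": b0 >> 6,
        "padding": bool(b0 & 0x20),
        "extension": bool(b0 & 0x10),
        "csrc_count": cc,
        "marker": bool(b1 & 0x80),
        "payload_type": b1 & 0x7F,
        "sequence_number": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
        "csrc": csrc,
    }


# A hand-built example packet: version 2, payload type 0, 160 bytes of audio
example = struct.pack("!BBHII", 0x80, 0, 1234, 160, 0xDEADBEEF) + b"\x00" * 160
print(parse_rtp_header(example))
```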

For a complete description of the RTP protocol and all its applications see RFCs 1889 and 1890 (since superseded by RFCs 3550 and 3551).

4.5 RSVP

There are also other protocols used in VoIP, like RSVP, which can manage Quality of Service (QoS).

RSVP is a signaling protocol that requests a certain amount of bandwidth and latency at every network hop that supports it.

For detailed info about RSVP see RFC 2205.

4.6 Quality of Service (QoS)

We have said many times that VoIP applications require real-time data streaming, because we expect an interactive voice exchange.

Unfortunately, TCP/IP cannot guarantee this: it only makes a "best effort" to deliver packets. So we need to introduce tricks and policies that manage the packet flow in EVERY router we cross.

The main ones are:

  1. The TOS field in the IP header, which describes the type of service: its three precedence bits rank a packet from 0 (routine) up to 7, so higher values mean higher urgency, and a separate flag requests low delay.
  2. Packet queuing methods:
    1. FIFO (First In First Out), the simplest method, which forwards packets in arrival order.
    2. WFQ (Weighted Fair Queuing), which shares the link fairly among data flows (for example, FTP cannot consume all the available bandwidth), typically alternating one packet from a UDP flow with one from a TCP flow.
    3. CQ (Custom Queuing), in which users decide the priority.
    4. PQ (Priority Queuing), which uses a number of queues (typically 4), each with its own priority level: packets in the first queue are sent first, then (when the first queue is empty) packets from the second one, and so on.
    5. CB-WFQ (Class Based Weighted Fair Queuing), like WFQ but with the addition of classes (up to 64), each with an associated bandwidth value.
  3. Shaping, which limits the source to a fixed bandwidth in:
    1. download
    2. upload
  4. Congestion avoidance, such as RED (Random Early Detection).
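Of the queuing methods listed, Priority Queuing is the easiest to sketch. The toy scheduler below follows the description given for PQ: a lower-priority queue is served only when all the higher-priority ones are empty. The class name and the packet labels are made up for the example; a real router does this in its forwarding path, not in Python.

```python
from collections import deque


class PriorityQueuing:
    """Toy PQ scheduler: queue 0 is the most urgent."""

    def __init__(self, levels=4):
        self.queues = [deque() for _ in range(levels)]

    def enqueue(self, packet, priority):
        """Append a packet to the queue of the given priority level."""
        self.queues[priority].append(packet)

    def dequeue(self):
        """Return the next packet to send, or None if all queues are empty."""
        for queue in self.queues:
            if queue:
                return queue.popleft()
        return None


pq = PriorityQueuing()
pq.enqueue("ftp-1", 3)
pq.enqueue("voice-1", 0)
pq.enqueue("ftp-2", 3)
pq.enqueue("voice-2", 0)
print([pq.dequeue() for _ in range(4)])
```

Voice leaves first even though the FTP packets arrived earlier. The drawback is also visible: as long as the first queue keeps receiving packets, the others are never served.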

For exhaustive information about QoS see Differentiated Services at IETF.
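On the sending host, marking packets through the TOS field is a matter of one socket option. The sketch below assumes the original RFC 791 layout of the TOS byte (three precedence bits, where a higher value means higher priority, followed by a low-delay flag). The helper names are ours; `socket.IP_TOS` is available on Linux but not on every platform, and routers along the path are free to ignore or rewrite the value.

```python
import socket


def tos_byte(precedence, low_delay=False):
    """Build a TOS byte: precedence in the top 3 bits, then the D flag."""
    if not 0 <= precedence <= 7:
        raise ValueError("precedence must be between 0 and 7")
    return (precedence << 5) | (0x10 if low_delay else 0)


def open_marked_socket(tos):
    """Return a UDP socket whose outgoing packets carry the given TOS byte."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, tos)
    return sock


voice_tos = tos_byte(5, low_delay=True)
print(hex(voice_tos))
# sock = open_marked_socket(voice_tos)  # then sock.sendto(...) as usual
```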

4.7 H323 Signaling Protocol

The H.323 protocol is used, for example, by Microsoft Netmeeting to make VoIP calls.

This protocol allows a variety of elements to talk to each other:

  1. Terminals, the clients that initiate a VoIP connection. Although terminals can talk to each other without anything else, we need some additional elements for a scalable setup.
  2. Gatekeepers, which essentially provide:
    1. an address translation service, to use names instead of IP addresses
    2. admission control, to allow or deny some hosts or some users
    3. bandwidth management
  3. Gateways, the points of conversion between TCP/IP and the PSTN.
  4. Multipoint Control Units (MCUs), to provide conferencing.
  5. Proxy servers are also used.
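To make the three gatekeeper services more concrete, here is a toy model of them. It is purely illustrative: it does not implement any real H.323 message, and the class name, method names, aliases and addresses are all invented for the example.

```python
class ToyGatekeeper:
    """Toy model of address translation, admission and bandwidth control."""

    def __init__(self, total_bandwidth_kbps):
        self.aliases = {}      # name -> IP address
        self.allowed = set()   # names permitted to place calls
        self.free_kbps = total_bandwidth_kbps

    def register(self, alias, ip_address, allowed=True):
        """Record a terminal's name and address."""
        self.aliases[alias] = ip_address
        if allowed:
            self.allowed.add(alias)

    def admit(self, caller, callee, kbps):
        """Return the callee's IP address if the call is admitted, else None."""
        if caller not in self.allowed or callee not in self.aliases:
            return None        # admission control
        if kbps > self.free_kbps:
            return None        # bandwidth management
        self.free_kbps -= kbps
        return self.aliases[callee]  # address translation

    def release(self, kbps):
        """Give back the bandwidth of a finished call."""
        self.free_kbps += kbps


gk = ToyGatekeeper(total_bandwidth_kbps=128)
gk.register("alice", "10.0.0.1")
gk.register("bob", "10.0.0.2")
gk.register("mallory", "10.0.0.9", allowed=False)
print(gk.admit("alice", "bob", 64))    # admitted
print(gk.admit("mallory", "bob", 64))  # denied: caller not allowed
```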

H.323 allows not only VoIP but also video and data communications.

Concerning VoIP, H.323 can carry the audio codecs G.711, G.722, G.723, G.728 and G.729, while for video it supports H.261 and H.263.

More info about H.323 is available at Openh323 Standards, at this H.323 web site, and in its standard description: ITU H-series Recommendations.

You can find it implemented in various applications like Microsoft Netmeeting, Net2Phone, DialPad, ... and also in the freeware products you can find at the Openh323 Web Site.
