4.1 Encoding-Independent Recommendations

Applications that send no packets during silence distinguish the first packet of a talkspurt (the first packet after a silence period) by setting the marker bit in the RTP data header. Applications without silence suppression set the bit to zero.
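The marker-bit rule can be illustrated with a minimal sketch. The function name `marker_bits` and the per-interval speech flags are hypothetical, and treating the very start of the stream as following silence is an assumption made here, not something the text states.

```python
def marker_bits(frames_have_speech, silence_suppression=True):
    """Return (frame_index, marker_bit) for every packet that is sent.

    frames_have_speech: one bool per packetization interval
    (hypothetical input; True means the interval contains speech).
    """
    sent = []
    in_silence = True  # assumption: the stream start counts as following silence
    for index, speech in enumerate(frames_have_speech):
        if not silence_suppression:
            # Every interval is sent, and the marker bit is always zero.
            sent.append((index, 0))
            continue
        if speech:
            # First packet after a silence period starts a talkspurt.
            sent.append((index, 1 if in_silence else 0))
            in_silence = False
        else:
            # Nothing is sent during silence.
            in_silence = True
    return sent
```

With silence suppression, only the speech intervals produce packets, and only the first packet of each talkspurt carries a set marker bit.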

The RTP clock rate used for generating the RTP timestamp is independent of the number of channels and the encoding; it equals the number of sampling periods per second. For N-channel encodings, each sampling period (say, 1/8000 of a second) generates N samples. (This terminology is standard, but somewhat confusing, as the total number of samples generated per second is then the sampling rate times the channel count.)
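As a small worked example of this terminology, the sketch below separates the timestamp advance per packet (counted in sampling periods) from the number of samples carried. The function names are hypothetical, and the packet duration is assumed to be a whole number of milliseconds.

```python
def timestamp_increment(sampling_rate_hz, packet_ms):
    """RTP timestamp advance per packet, in sampling periods.

    Independent of the channel count and of the encoding.
    """
    return sampling_rate_hz * packet_ms // 1000


def samples_in_packet(sampling_rate_hz, packet_ms, channels):
    """Total samples carried: N samples per sampling period for N channels."""
    return timestamp_increment(sampling_rate_hz, packet_ms) * channels


# 8000 Hz, 20 ms packets: the timestamp advances by 160 whether the
# stream is mono or stereo, but a stereo packet carries 320 samples.
mono_step = timestamp_increment(8000, 20)
stereo_samples = samples_in_packet(8000, 20, channels=2)
```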

If multiple audio channels are used, channels are numbered left-to-right, starting at one. In RTP audio packets, information from lower-numbered channels precedes that from higher-numbered channels. For more than two channels, the convention of the AIFF-C audio interchange format [1] should be followed, using the following notation:

   l    left
   r    right
   c    center
   S    surround
   F    front
   R    rear

   channels    description                 channel
                               1     2     3     4     5     6
   ___________________________________________________________
   2           stereo          l     r
   3                           l     r     c
   4           quadrophonic    Fl    Fr    Rl    Rr
   4                           l     c     r     S
   5                           Fl    Fr    Fc    Sl    Sr
   6                           l     lc    c     r     rc    S

Samples for all channels belonging to a single sampling instant must be within the same packet. The interleaving of samples from different channels depends on the encoding. General guidelines are given in Sections 4.2 and 4.3.
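The channel table can be transcribed into a lookup structure, as in the sketch below. The names `CHANNEL_LAYOUTS`, `channel_number`, and `holds_whole_instants` are hypothetical. Because the table lists two four-channel layouts, each channel count maps to a list of layouts. The last helper only checks that a packet's sample count is a multiple of the channel count, which is a necessary condition for keeping each sampling instant within one packet.

```python
# Channel labels in transmission order, transcribed from the table above.
CHANNEL_LAYOUTS = {
    2: [("l", "r")],                            # stereo
    3: [("l", "r", "c")],
    4: [("Fl", "Fr", "Rl", "Rr"),               # quadrophonic
        ("l", "c", "r", "S")],
    5: [("Fl", "Fr", "Fc", "Sl", "Sr")],
    6: [("l", "lc", "c", "r", "rc", "S")],
}


def channel_number(layout, label):
    """Return the 1-based channel number of a label within a layout."""
    return layout.index(label) + 1


def holds_whole_instants(sample_count, channels):
    """True if a packet can contain only complete sampling instants."""
    return channels > 0 and sample_count % channels == 0
```

For example, in the quadrophonic layout the rear-left channel `Rl` is channel 3, so its information follows `Fl` and `Fr` in the packet.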

The sampling frequency should be drawn from the set: 8000, 11025, 16000, 22050, 24000, 32000, 44100 and 48000 Hz. (Apple Macintosh computers have native sample rates of 22254.54 and 11127.27 Hz, which can be converted to 22050 and 11025 Hz with acceptable quality by dropping 4 or 2 samples, respectively, in a 20 ms frame.) However, most audio encodings are defined for a more restricted set of sampling frequencies. Receivers should be prepared to accept multi-channel audio, but may choose to play only a single channel.
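The figures of 4 and 2 dropped samples follow from the rate difference over one 20 ms frame, as the sketch below checks. The text does not say which samples to drop; spacing the dropped samples evenly across the frame is an assumption made here for illustration, and both function names are hypothetical.

```python
def samples_to_drop(native_rate_hz, target_rate_hz, frame_ms=20):
    """Surplus samples per frame, rounded to the nearest whole sample."""
    return round((native_rate_hz - target_rate_hz) * frame_ms / 1000)


def drop_evenly(frame, n_drop):
    """Remove n_drop samples at evenly spaced positions (an assumption).

    Intended for n_drop much smaller than len(frame).
    """
    if n_drop <= 0:
        return list(frame)
    step = len(frame) / n_drop
    # One dropped index near the middle of each of the n_drop segments.
    dropped = {int(step * k + step / 2) for k in range(n_drop)}
    return [s for i, s in enumerate(frame) if i not in dropped]


# 22254.54 Hz yields about 445 samples in 20 ms; 22050 Hz needs 441.
native_frame = list(range(445))
converted = drop_evenly(native_frame, samples_to_drop(22254.54, 22050))
```

This is plain decimation without filtering; the text only claims acceptable quality for this specific small rate difference.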

The following recommendations are default operating parameters. Applications should be prepared to handle other values. The ranges given are meant to give guidance to application writers, allowing a set of applications conforming to these guidelines to interoperate without additional negotiation. These guidelines are not intended to restrict operating parameters for applications that can negotiate a set of interoperable parameters, e.g., through a conference control protocol.

For packetized audio, the default packetization interval should have a duration of 20 ms, unless otherwise noted when describing the encoding. The packetization interval determines the minimum end-to-end delay; longer packets introduce less header overhead but higher delay and make packet loss more noticeable. For non-interactive applications such as lectures, or for links with severe bandwidth constraints, a higher packetization delay may be appropriate. A receiver should accept packets representing between 0 and 200 ms of audio data. This restriction allows reasonable buffer sizing for the receiver.
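The trade-off between header overhead and delay can be made concrete with a short sketch. It assumes an 8000 Hz, single-channel encoding with one byte per sample and counts only the 12-byte fixed RTP header (no CSRC entries, header extensions, or UDP/IP headers). All function names are hypothetical.

```python
RTP_HEADER_BYTES = 12  # fixed RTP header only; lower-layer headers ignored


def payload_bytes(sampling_rate_hz, packet_ms, channels=1, bytes_per_sample=1):
    """Payload size for a sample-based encoding (assumed parameters)."""
    return sampling_rate_hz * packet_ms // 1000 * channels * bytes_per_sample


def header_overhead(sampling_rate_hz, packet_ms, channels=1, bytes_per_sample=1):
    """Fraction of each packet taken up by the RTP header."""
    payload = payload_bytes(sampling_rate_hz, packet_ms, channels, bytes_per_sample)
    return RTP_HEADER_BYTES / (RTP_HEADER_BYTES + payload)


def receiver_accepts(packet_ms):
    """Range of packet durations a receiver should accept."""
    return 0 <= packet_ms <= 200


# Overhead shrinks as the packetization interval grows, at the cost of delay.
overhead_by_interval = {ms: header_overhead(8000, ms) for ms in (10, 20, 40, 200)}
```

Under these assumptions a 20 ms packet carries 160 payload bytes, so the RTP header accounts for 12 of 172 bytes, roughly 7 percent.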
