How to Select an Audio CODEC
By Jacques Fric
Managing Director,
Telecom Division
Since our first codec in 1988, ATA has installed several thousand codecs in 25 countries, many of which operate 24 hours a day. Selecting a codec is a difficult task that can only be done well using objective criteria. This document is based on the combined experience of ATA and our customers: broadcast professionals like you.
To select your audio codec, you must decide what its main use will be (speech or music), which communication link you will use (leased line, ISDN or POTS), and how you plan to operate it (interactive or broadcast-only programs). Audio quality is everything, and it depends on the analog and digital circuits of the codec and on the algorithm it uses. Important standardization efforts are in progress for codec algorithms in order to ensure interoperability between different vendors' equipment. The effect of data communication errors on the quality of the audio communication should not be neglected. Within your budget (investment and operation), you will then find the most suitable product.
As there is no single answer to your requirements, ATA offers a very wide range of audio codecs from which you can select the one best suited to your needs.
Speech or Music?
For speech, a G.722 codec is a good choice: it is a standard, it requires only 64 kbit/s (or 56 kbit/s), and it adds almost no delay (5 ms), allowing live and interactive operation without expensive echo cancellation devices.
For music, a 15 kHz or 20 kHz codec is recommended. If live and interactive operation is required, a codec using a very low delay algorithm, for instance the CNET 4 SB algorithm (6 ms), should be used. For one-way program transmission, an algorithm adding significant delay can be used (ISO MPEG Layer 1, 2, 3 or CNET TDAC, for instance). These algorithms have a delay between 60 ms and 250 ms.
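The selection rule above can be sketched as a small lookup. This is only an illustration: the delay figures are the ones quoted in the text (with the upper end of the 60 ms to 250 ms range used for the high-delay algorithms), while the function name and the 10 ms limit for interactive use are assumptions made for the example.

```python
# Coding delays quoted in the text, in milliseconds.
# For MPEG and TDAC the text gives a range (60-250 ms); the worst case is used.
CODING_DELAY_MS = {
    "G.722": 5,
    "CNET 4SB": 6,
    "ISO MPEG Layer 1/2/3": 250,
    "CNET TDAC": 250,
}

# Main use of each algorithm, as described in the text.
SUITED_FOR = {
    "G.722": "speech",
    "CNET 4SB": "music",
    "ISO MPEG Layer 1/2/3": "music",
    "CNET TDAC": "music",
}


def candidates(material, interactive, max_interactive_delay_ms=10):
    """List algorithms matching the material and the operating mode.

    max_interactive_delay_ms is a hypothetical threshold chosen for this
    sketch; it is not a figure from any standard.
    """
    names = [name for name, use in SUITED_FOR.items() if use == material]
    if interactive:
        names = [n for n in names if CODING_DELAY_MS[n] <= max_interactive_delay_ms]
    return names


print(candidates("speech", interactive=True))
print(candidates("music", interactive=True))
print(candidates("music", interactive=False))
```

For interactive music only the low-delay algorithm remains, while one-way transmission leaves all three music algorithms open and the choice then depends on bit rate and quality.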
Audio Quality
The audio quality depends on two main factors: the audio quality of the linear codec included in the codec, and the type and implementation of the algorithm used.
Audio Quality of the linear Codec Analog Interface
Do not be impressed by marketing hype such as 18-bit or 20-bit A/D or D/A converters, or by extravagant S/N ratios. Most of the time, the real performance of a codec is far below the capability of such converters. The only way to check the quality of the linear codec analog interface is to use a distortion meter to test the S/N+THD versus frequency from 20 Hz to 20 kHz at full capacity of the codec, the level versus frequency, and the S/N+THD versus level (Fig 1a, 1b, 1c).
The distortion meter feeds a pure tone into the input of the codec and measures at the output both the level and the ratio between the energy outside the original frequency (wide band noise plus the harmonics caused by non-linearity of the audio chain) and the energy at the original frequency (see fig 2a, 2b). The S/N+THD is the ratio between the signal and the noise plus total harmonic distortion. This figure is the most important quality criterion. As an example, for an analog interface, a measured S/N+THD better than 80 dB from 20 Hz to 20 kHz is excellent.
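The measurement principle described above can be sketched in a few lines. This is a minimal illustration and not a replacement for a distortion meter: it assumes the recorded block contains a whole number of periods of the test tone, and the function name and test values are made up for the example.

```python
import math


def sn_thd_db(samples, sample_rate, tone_hz):
    """Estimate S/(N+THD) in dB for a recorded test tone.

    Assumes the block holds a whole number of tone periods, so the
    fundamental can be extracted by plain correlation.
    """
    n = len(samples)
    w = 2 * math.pi * tone_hz / sample_rate
    # Amplitudes of the cosine and sine components at the test frequency.
    a = 2 / n * sum(s * math.cos(w * i) for i, s in enumerate(samples))
    b = 2 / n * sum(s * math.sin(w * i) for i, s in enumerate(samples))
    # Energy at the original frequency.
    signal = n * (a * a + b * b) / 2
    # Everything left after removing the fundamental: noise plus harmonics.
    residual = sum(
        (s - a * math.cos(w * i) - b * math.sin(w * i)) ** 2
        for i, s in enumerate(samples)
    )
    return 10 * math.log10(signal / max(residual, 1e-30))


# Synthetic check: a 1 kHz tone with a third harmonic 80 dB below it.
fs, f0, n = 48000, 1000, 4800
tone = [
    math.sin(2 * math.pi * f0 * i / fs) + 1e-4 * math.sin(2 * math.pi * 3 * f0 * i / fs)
    for i in range(n)
]
print(round(sn_thd_db(tone, fs, f0), 1))
```

With the synthetic signal the result is approximately 80 dB, the figure quoted above as excellent for an analog interface. Repeating the measurement at several frequencies and levels gives curves of the kind described for Fig 1.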
The physical interface must comply with professional standards (XLR, balanced). Isolation with built-in transformers is an advantage, as it provides real high-voltage galvanic isolation. Only a few products include such transformers, as it is difficult to build very low distortion transformers (especially at low frequencies such as 20 Hz). Also check the overvoltage protection of the audio interface (an accidental overvoltage should not damage the codec).
Digital interface
With an AES/EBU interface, the audio chain is digital from end to end, so the quality must be better than 110 dB at full capacity over the entire bandwidth. AES/EBU should therefore be used whenever possible. The physical interface must comply with professional standards (XLR, balanced). Beware that AES/EBU interfaces at the encoder need to work either in synchronous mode, if the terminal can be synchronized (the encoder synchronizes the terminal with a clock derived from the network), or in asynchronous mode (the clock of the transmitting terminal is carried through the network to the receiving terminal). Check that your audio codec supports this feature. Be aware that in asynchronous mode, clock adjustment is performed by discrete phase shifts of the transmitted clock, and this may degrade the quality. So test the quality in real operation.
Audio Filters
They must be flat within the bandwidth (less than 0.1 dB ripple) and steep outside it to provide efficient "out of band" noise rejection (in ATA codecs, for instance, a 235th-order digital filter is used for this purpose).
Electromagnetic Compatibility (EMC), Electrostatic Discharge Immunity and Safety Standard Compliance
An example of an EMC test is given in fig 3.

Algorithms used
Compression algorithms are designed to reduce the amount of data exchanged on the line without statistically "audible" degradation. Do not expect miracles from these algorithms: since they rely on statistical properties, they are statistically good or excellent. Bit reduction is possible by eliminating both redundancy in the audio signal (through more efficient encoding) and irrelevancy, that is, information which cannot be heard by the human ear (identified with a mathematical psychoacoustic model of human hearing). Except for the CCITT G.722 algorithm, which is optimized for speech and very well defined, the other algorithms are most of the time a subject of controversy.
Standardization of 15 kHz and 20 kHz algorithms is a recent effort, so you will still find equipment on the market that uses proprietary algorithms. Low delay (less than 6 ms) algorithms such as CNET 4SB and APT, for example, are not standardized. These algorithms are very useful for applications requiring low delay (live and interactive programs). They are "time" oriented, and bit reduction relies mainly on redundancy reduction through a more appropriate encoding of a highly time-correlated signal (splitting into 4 frequency bands to adapt the bit allocation to the required accuracy, noise shaping and prediction are used). The quality is good and, as they rely on a very simple psychoacoustic model, multiple encodings are also possible (up to 3) without severe degradation. But they do not allow compression ratios better than 4.
Other algorithms rely on a more complex psychoacoustic model for irrelevancy removal and allow higher compression ratios (up to 12). They are more frequency oriented or hybrid (time/frequency). Redundancy in stereo audio information (joint stereo) can also be exploited in two modes:
- Intensity stereo, which is used in layers 1 and 2 (the stereo image may be modified, so be cautious about this feature as it may degrade some audio signals more than it improves others).
- Main and Side stereo, which is used in layer 3 (the stereo image is good, but the improvement is lower).
In both cases, simple or joint stereo processing is selected dynamically according to a redundancy measurement of the current audio signal.
Psychoacoustic models define the audio information masked by a high-level audio signal (to be removed) as well as the accuracy required for the non-masked audio signal (the noise generated by the quantizing process is itself masked!). An illustration of the masking process is given in Fig 4. Again, be aware of the limits of the psychoacoustic model. The capabilities and the limits of human hearing are not completely understood, as it is a complex process involving not only the ear but also the brain. So the models used are simplified approximations of the "ideal objective model". Stereo human hearing is different from mono, so the psychoacoustic model to be used depends on the type of signal, not to mention the differences in hearing between people depending on age. Improvements are expected in the domain of psychoacoustics and joint stereo processing.
Beware that the more efficient an algorithm is in bit reduction, the worse it is in multiple encoding. So for high compression (15 kHz or 20 kHz at 64 kbit/s or less), it is not recommended to perform multiple encoding/decoding. This is not because these algorithms are bad, but because they are so good that they use up all the error margin at the first encoding: the decoded audio signal differs from the original signal in such a way that the difference is just under the threshold of audible degradation! If you do it twice, the second time the error will statistically exceed the masking threshold and will be audible.
Standardized Algorithms
The ISO committee has now standardized the algorithms and the specifications for their use in the Audio MPEG layer 1, 2, 3 document. Layer 1 may be used for contribution (bit rate 192 kbit/s per mono channel) and layer 2 (bit rate 128 kbit/s per channel) for distribution. Layer 3 (bit rate 64 kbit/s and less) is more suited to news and current affairs but is also acceptable for final broadcast of musical programs.
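To relate these bit rates to the compression ratios mentioned earlier, the arithmetic can be sketched as follows. The reference of 16-bit linear audio sampled at 48 kHz (768 kbit/s per mono channel) is an assumption made for this example; the text does not state which linear format the ratios refer to.

```python
def linear_bit_rate_kbps(sample_rate_hz=48000, bits_per_sample=16):
    """Bit rate of one uncompressed mono channel (assumed reference format)."""
    return sample_rate_hz * bits_per_sample / 1000


def compression_ratio(coded_kbps, sample_rate_hz=48000, bits_per_sample=16):
    """Ratio between the linear bit rate and the coded bit rate."""
    return linear_bit_rate_kbps(sample_rate_hz, bits_per_sample) / coded_kbps


# Per-channel bit rates quoted in the text for each layer.
for layer, kbps in (("Layer 1", 192), ("Layer 2", 128), ("Layer 3", 64)):
    print(layer, kbps, "kbit/s ->", compression_ratio(kbps))
```

Under that assumption the three layers correspond to ratios of 4, 6 and 12, which is in line with the "up to 12" figure given for the more complex algorithms.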
Beware that for ISO layer 1, 2, 3, only the format of the frame is defined, so only the decoder is standardized. The encoder is not fully defined, so the quality depends on the quality of the implementation. So when vendors using these algorithms (Layer 1, 2, 3) refer to tests performed by the ISO or the CCIR for the quality of their codecs, nothing proves that the quality of their implementation is equivalent, even if they use the same algorithm. The tests performed by the ISO or CCIR are a benchmark of the best implementation of each algorithm, intended to compare different algorithms. The results are the upper limit in quality for each algorithm. The only way to test the quality of a codec is to perform a subjective audio test with a pair of codecs in real operation (over ISDN, for instance).
The ISO Committee has defined a method for performing these tests. Short ABC sequences (15 seconds) are used, where A is always the original cut of music, B is either the original or the coded version, and C is the opposite of B. Various types of music are tested as well as speech (male and female). Some pieces are well known for being very demanding, such as triangle, glockenspiel, and especially harpsichord and "Suzanne Vega".
Trained listeners are preferred, especially for good algorithms, for rating the cuts on a 5-step scale: 5 Excellent, 4 Good, 3 Acceptable, 2 Poor, 1 Bad. A good algorithm must have no rating under "4" and an average of "4.5". An excellent algorithm must have no rating under "4.5" and an average of "4.75".
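The rating criteria above can be expressed as a short check. This sketch assumes that "an average of 4.5" means an average of at least 4.5 (and likewise for 4.75); the function name and the sample ratings are invented for illustration.

```python
def classify_algorithm(ratings):
    """Classify a set of listener ratings (1 to 5) using the criteria in the text.

    The averages are treated as minimum values, which is an interpretation
    of the text rather than something it states explicitly.
    """
    lowest = min(ratings)
    average = sum(ratings) / len(ratings)
    if lowest >= 4.5 and average >= 4.75:
        return "excellent"
    if lowest >= 4 and average >= 4.5:
        return "good"
    return "below good"


print(classify_algorithm([5, 5, 4.5, 5]))   # lowest 4.5, average 4.875
print(classify_algorithm([4, 5, 5, 4.5]))   # lowest 4, average 4.625
print(classify_algorithm([3, 5, 5, 5]))     # one rating under 4
```

Note that a single low rating is enough to disqualify an algorithm, whatever the average: this is why demanding test pieces matter so much.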
An example of a test report is given in fig 5. Obviously the result depends on the bit rate used for the test: the same algorithm does not get the same rating at different bit rates. The lower the bit rate, the worse the rating. So ask for the bit rate at which an algorithm was rated. An alternative is to use special test equipment that takes psychoacoustic properties into account in its measurements (FHG MNR, for instance). As this equipment uses a particular psychoacoustic model, and given the earlier remarks about the reliability of such models, it gives you only an estimate of the quality. But performing such tests is easier than formal ISO-compliant subjective tests.
So depending on your application, select the most appropriate codec and algorithm at the bit rate demanded by your quality requirement. Take into account economic considerations such as investment and transmission cost. The lower the bit rate, the lower the cost of transmission. So most of the time, the right choice will be a compromise between quality and cost.
Ruggedness to transmission errors
CCITT G.722 (7.5 kHz) and CNET 4 SB (15 kHz) algorithms are highly resistant. Other algorithms (ISO MPEG layer 1, 2, 3 and CNET TDAC) may be very sensitive to errors and need error protection. All digital telecommunication networks are good but not perfect. For instance, a BER (Bit Error Rate) of 0.0001%, which is good enough for data transmission, may cause very disturbing pulsed audio noise every ten seconds! In a live transmission of an audio program this is not acceptable. In the CNET TDAC algorithm, a 493/511 BCH FEC (Forward Error Correction) is used, allowing real-time correction of up to two single-bit errors among 511 bits. This allows correct operation on medium quality telecom networks (BER < 0.01%). In the ISO MPEG layer 1, 2, 3 algorithms, there is provision for error detection only (CRC). This is not suitable for transmission, so the CMTT Committee added specifications for error correction (Flexible Reed Solomon FEC), allowing correct operation on medium quality digital networks. This feature must be included in ISO MPEG layer 1, 2, 3 audio codecs.
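The orders of magnitude above can be checked with a little arithmetic. The sketch below assumes independent (random) bit errors and a 64 kbit/s link; both are assumptions made for the example, as real networks often produce errors in bursts, and the function names are illustrative.

```python
import math


def seconds_between_errors(ber, bit_rate_bps):
    """Average time between bit errors, assuming independent errors."""
    return 1.0 / (ber * bit_rate_bps)


def block_failure_probability(ber, block_bits=511, correctable=2):
    """Probability that a block holds more errors than the FEC can correct.

    The defaults match the 493/511 BCH code described in the text
    (up to two errors corrected among 511 bits).
    """
    corrected = sum(
        math.comb(block_bits, k) * ber**k * (1 - ber) ** (block_bits - k)
        for k in range(correctable + 1)
    )
    return 1.0 - corrected


# BER of 0.0001% (1e-6) on an assumed 64 kbit/s link.
print(seconds_between_errors(1e-6, 64000))

# Residual block failure rate behind the FEC at a BER of 0.01% (1e-4).
print(block_failure_probability(1e-4))
```

At a BER of 0.0001% on 64 kbit/s, an error occurs on average about every 15.6 seconds, the same order of magnitude as the figure quoted above. The second value shows how rarely a 511-bit block exceeds the two-error correction capability, even on a network a hundred times worse.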
Interoperability with Other Vendors' Audio Codecs
When connecting to other vendors' audio codecs, first check that all of these codecs comply with international standards. International standards are published by CCITT (G.722, H.221, H.242) and CMTT (Q58) for transmission, and by ISO (Layer 1, 2, 3) for audio coding algorithms. Most equipment claims compliance with these standards, but much of it is not able to communicate with other vendors' audio codecs. If interoperability is required, perform tests with several real audio codecs from different vendors.
Interoperability on 64 kbit/s networks
In many cases the bit stream resulting from the encoding process is greater than 64 kbit/s (128, 192, 256, 384 kbit/s). When such a bit stream has to be transmitted over 64 kbit/s networks (ISDN, for instance), a standard must be used to define how the bit streams are split and recombined. As the CMTT has now approved a standard for this operation, it is recommended to use equipment that includes this feature.
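The idea of splitting and recombining can be illustrated with a simple round-robin scheme. This is not the CMTT standard: it is a deliberately simplified sketch that ignores framing and the different delays that separate 64 kbit/s channels may have, which a real implementation must compensate. All names are illustrative.

```python
import math


def channels_needed(bit_rate_kbps, channel_kbps=64):
    """Number of 64 kbit/s channels required to carry the coded stream."""
    return math.ceil(bit_rate_kbps / channel_kbps)


def split_stream(stream, n_channels):
    """Distribute the bytes of the stream over the channels in turn."""
    return [stream[i::n_channels] for i in range(n_channels)]


def recombine_stream(parts):
    """Interleave the per-channel bytes back into the original order."""
    n_channels = len(parts)
    out = bytearray(sum(len(p) for p in parts))
    for i, part in enumerate(parts):
        out[i::n_channels] = part
    return bytes(out)


n = channels_needed(192)              # a 192 kbit/s coded stream
data = bytes(range(20))               # stand-in for coded audio
parts = split_stream(data, n)
print(n, recombine_stream(parts) == data)
```

The sketch only works because both ends agree on the number of channels and on the byte order, which is exactly what a common standard has to guarantee between different vendors.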
Ergonomics
The user's dream is a single box with everything inside. For transmission over ISDN, for instance, some products need two 1U codec chassis along with an inverse MUX and up to four ISDN terminal adapters, with all the cables for connecting them to the codec, and at least a terminal or a PC for configuring all the equipment. The ATA HiFi Scoop 3 ISDN combines all these features in a single 1U chassis. There is no need for a terminal: operation is possible from the front keypad/LCD interface. The only connections are the RJ45 telephone plugs for ISDN and the XLR plugs for audio.
| © Copyright 2007 ATA Audio Corporation, Inc.. All rights reserved. |