Along with some spirited rhetoric, they have published some TWELP 600 samples (including source). The comparison, especially in the 600 bit/s range, is very useful to my work.
I’ve extracted a random subset of the 600 bit/s a_eng.wav samples, broken up into a small chunks to make them easier to compare. Have a listen, and see what you think:
|Sample||Source||MELP 600e||Codec 2 700c||TWELP 600|
The samples do have quite a bit of background noise. The usual approach for noisy samples is to use a noise suppression algorithm first, e.g. we use the Speex noise suppression in FreeDV. However it’s also a test of the codecs robustness to background noise, so I didn’t perform any noise suppression for the Codec 2 samples.
I am broadly in agreement with their results. Using the samples provided, the TWELP codec appears to be comparable to MELP 2400, with Codec 2 2400 a little behind both. This is consistent with other Codec 2 versus MELP/AMBE comparisons at 2400 bits/s. That’s not a rate I have been focussing on, most of my work has been directed at lower rates required for HF Digital voice.
I think – for these samples – their 600 bit/s codec also does better than Codec 2 700C, but not by a huge margin. Their results support our previous findings that Codec 2 is as good as (or even a little better) than MELP 600e. It does depend on the samples used, as I will explain below.
DSP Innovations have done some fine work in handling non-speech signals, a common weakness with speech codecs in this range.
As to claims of superior technology, and “30 year old technology”:
- MELP 2400 was developed in the 1990’s, and DSP Innovations own results show similar speech quality, especially at 2400 bits/s.
- AMBE is in widespread use, and uses a very similar harmonic sinusoidal model to Codec 2.
- The fundamental work on speech compression was done in the 1970s and 80’s, and much of what we use today (e.g. in your mobile phone) is based on incremental advances on that.
- As any reader of this blog will know, Codec 2 has been under continual development for the past decade. I haven’t finished, still plenty of “DSP Innovation” to come!
While a fine piece of engineering, TWELP isn’t in a class of it’s own – it’s still a communications quality speech codec in the MELP/AMBE/Codec 2 quality range. They have not released any details of their algorithms, so they cannot be evaluated objectively by peer review.
PESQ and Perceptual evaluation of speech quality
DSP Innovations makes extensive use of the PESQ measure, for both this study and for comparisons to other competitors.
Speech quality is notoriously hard to estimate. The best way is through controlled subjective testing but this is expensive and time consuming. A utility to accurately estimate fine differences in speech quality would be a wonderful research tool. However in my experience (and the speech coding R&D community in general), such a tool does not exist.
The problem is even worse for speech codecs beneath 4 kbit/s, as they distort the signal so significantly.
The P.862 standard acknowledges these limits, and explicitly states in Table 3 “Factors, technologies and applications for which PESQ has not currently been validated … CELP and hybrid codecs < 4 kbit/s". The standard they are quoting does not support use of PESQ for their tests.
PESQ is designed for phone networks, and much higher bit rate codecs. In section 2 of the standard they present best-case correlation results of +/- 0.5 MOS points (note on a scale of 1-5, this is +/- 10% error). That’s when it is used for speech codecs > 4 kbit/s that it is designed for.
So DSP Innovations statements like “Superiority of the TWELP 2400 and MELPe 2400 over CODEC2 2400 is on average 0.443 and 0.324 PESQ appropriately” are unlikely to be statistically valid.
The PESQ algorithm (Figure 4a of the standard) throws away all phase information, keeping just the FFT power spectrum. This means it cannot evaluate aspects of the speech signal that are very important for speech quality. For example PESQ could not tell the difference between voiced speech (like a vowel) an unvoiced (like a consonant) with the same power spectrum.
DSP Innovations haven’t shown any error bars or standard deviations on their results. Even the best subjective tests will have error bars wider than the PESQ results DSP Innovations are claiming as significant.
I do sympathise with them. This isn’t a huge market, they are a small company, and subjective testing is expensive. Numbers look good on a commercial web site from a marketing sense. However I suggest you disregard the PESQ numbers.
Speech codecs tend to work well with some samples and fall over with others. It is natural to present the best examples of your product. DSP Innovations chose what speech material they would present in their evaluation of Codec 2. I have asked them to give me the same courtesy and code speech samples of my choice using TWELP. I have received no response to my request.
Support and Porting
An open source codec can be ported to another machine in seconds (rather than months that DSP Innovations quote) with a cross compiler. At no cost.
Having the source code makes minor problems easy to fix yourself. We have a community that can answer many questions. For tougher issues; well I’m available for paid support – just like DSP Innovations.
Also …. well open source is just plain cool. As a reminder, here are the reasons I started Codec 2, nearly 10 years ago.
To quote myself:
A free codec helps a large amount of people and promotes development and innovation. A closed codec helps a small number people make money at the expense of stifled business and technical development for the majority.
Open Source Low Rate Speech Codec Part 1, the post that started Codec 2.
P.862 PESQ standard.
CODEC2 vs TWELP on 2400 bps. DSP Innovations evaluate Codec 2, MELP, and TWELP at 2400 bits/s.
CODEC2 vs TWELP on 700 bps. DSP Innovations evaluate Codec 2, MELP, and TWELP at 600 (ish) bits/s.
AMBE+2 and MELPe 600 Compared to Codec 2. An earlier comparison, using samples from DSP Innovations.