Yesterday I was chatting on the #freedv IRC channel, and a good question was asked: how close is Codec 2 to AMBE+2? Turns out – reasonably close. I also discovered, much to my surprise, that Codec 2 700C is better than MELPe 600!
Samples
Original | AMBE+2 3000 | AMBE+ 2400 | Codec 2 3200 | Codec 2 2400 |
---|---|---|---|---|
Listen | Listen | Listen | Listen | Listen |
Listen | Listen | Listen | Listen | Listen |
Listen | Listen | Listen | Listen | Listen |
Listen | Listen | Listen | Listen | Listen |
Listen | Listen | Listen | Listen | Listen |
Listen | Listen | Listen | Listen | Listen |
Listen | Listen | Listen | Listen | Listen |
Listen | Listen | Listen | Listen | Listen |
Listen | Listen | Listen | Listen | Listen |
Original | MELPe 600 | Codec 2 700C |
---|---|---|
Listen | Listen | Listen |
Listen | Listen | Listen |
Listen | Listen | Listen |
Listen | Listen | Listen |
Listen | Listen | Listen |
Listen | Listen | Listen |
Listen | Listen | Listen |
Listen | Listen | Listen |
Listen | Listen | Listen |
Here are all the samples in one big tarball.
Discussion
I don’t have an AMBE or MELPe codec handy so I used the samples from the DVSI and DSP Innovations web sites. I passed the original “DAMA” speech samples found on these sites through Codec 2 (codec2-dev SVN revision 3053) at various bit rates. Turns out both sites used the same DAMA source samples for their AMBE and MELPe demos, which was handy.
These particular samples are “kind” to codecs – I consistently get good results with them when I test with Codec 2. I’m guessing they also allow other codecs to be favorably demonstrated. During Codec 2 development I make a point of using “pathological” samples such as hts1a, cg_ref, kristoff, mmt1 that tend to break Codec 2. There are some samples of AMBE and MELP using my samples on the Codec 2 page.
I usually listen to samples through a laptop speaker, as I figure it’s close to the “use case” of a PTT radio. Small speakers do mask codec artifacts, making them sound better. I also tried a powered loudspeaker with the samples above. Through the loudspeaker I can hear AMBE reproducing the pitch fundamental – a bass note that can be heard on some males (e.g. 7), whereas Codec 2 is filtering that out.
I feel AMBE is a little better; Codec 2 is a bit clicky or impulsive (e.g. on sample 1). However it’s not far behind. In a digital radio application, with a small speaker and some acoustic noise about, I feel the casual listener wouldn’t discern much difference. Try replaying these samples through your smart-phone’s browser at an airport and let me know if you can tell them apart!
On the other hand, I think Codec 2 700C sounds better than MELPe 600. Codec 2 700C is more natural. To my ear MELPe has very coarse quantisation of the pitch, hence the “Mr Roboto” sing-song pitch jumps. The 700C level is a bit low, an artifact/bug to do with the post filter. Must fix that some time. As a bonus Codec 2 700C also has lower algorithmic delay, around 40ms compared to MELPe 600’s 90ms.
Curiously, Codec 2 uses just 1 voicing bit, which means either voiced or unvoiced excitation in each frame. The claim to fame of xMBE (and indeed MELP) over simpler vocoders is the use of mixed excitation: some of the spectrum is voiced (regular pitch harmonics), some unvoiced (noise like). Given how close Codec 2 gets with a single voicing bit, this suggests the benefits of mixed excitation need to be re-examined.
I haven’t finished developing Codec 2. In particular Codec 2 700C is very much a “first pass”. We’ve had a big breakthrough this year with 700C and development will continue, with benefits trickling up to other modes.
However the 1300, 2400, 3200 modes have been stable for years and will continue to be supported.
Next Steps
Here is the blog post that kicked off Codec 2 – way back in 2009. Here is a video of my linux.conf.au 2012 Codec 2 talk that explains the motivations, IP issues around codecs, and a little about how Codec 2 works (slides here).
What I spoke about then is still true. Codec patents and license fees are a useless tax on business and stifle innovation. Proprietary codecs borrow as much as 95% of their algorithms from the public domain – which are then sold back to you. I have shown that open source codecs can meet and even exceed the performance of closed source codecs.
Wikipedia suggests that AMBE license fees range from USD$100k to USD$1M. For “one license fee” we can improve Codec 2 so it matches AMBE+2 in quality at 2400 and 3000 bit/s. The results will be released under the LGPL for anyone to use, modify, improve, and inspect at zero cost. Forever.
Maybe we should crowd source such a project?
Command Lines
This is how I generated the Codec 2 wave files:
~/codec2-dev/build_linux/src/c2enc 3200 9.wav - | ~/codec2-dev/build_linux/src/c2dec 3200 - - | sox -t raw -r 8000 -s -2 - 9_codec2_3200.wav
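The one-liner above handles a single sample at a single mode. A minimal sketch of how it could be looped over every sample and mode follows. Several things here are assumptions rather than anything from my actual workflow: the inputs are taken to be named 1.wav to 9.wav in the current directory, the function names and the C2BIN and DRY_RUN variables are made up for illustration, and the 700C output file name simply follows the same pattern as the 3200 one.

```shell
#!/bin/sh
# Sketch only: a batch version of the single command above.
# Assumptions (hypothetical, not from the post): inputs are 1.wav..9.wav in
# the current directory; C2BIN points at the directory holding c2enc/c2dec.
C2BIN="${C2BIN:-$HOME/codec2-dev/build_linux/src}"

# Print the output file name for a given sample number and Codec 2 mode.
c2_out_name() {
    printf '%s_codec2_%s.wav\n' "$1" "$2"
}

# Encode and decode one sample at one mode, then wrap the raw decoder
# output (8 kHz, signed 16 bit) as a wave file with sox.
c2_process() {
    in_sample="$1"
    in_mode="$2"
    "$C2BIN/c2enc" "$in_mode" "$in_sample.wav" - \
        | "$C2BIN/c2dec" "$in_mode" - - \
        | sox -t raw -r 8000 -s -2 - "$(c2_out_name "$in_sample" "$in_mode")"
}

# Run every sample through every mode.
# With DRY_RUN=1 only the output file names are listed and nothing is run.
c2_batch() {
    for mode in 3200 2400 700C; do
        for sample in 1 2 3 4 5 6 7 8 9; do
            if [ "${DRY_RUN:-0}" = 1 ]; then
                c2_out_name "$sample" "$mode"
            else
                c2_process "$sample" "$mode"
            fi
        done
    done
}
```

Source the file and call c2_batch to generate everything, or set DRY_RUN=1 first to see which files would be written. The sox flags are copied from the command above; newer SoX releases may want `-e signed-integer -b 16` in place of `-s -2`.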
Links
DSP Innovations, MELPe samples. Can anyone provide me with TWELP samples from these guys? I couldn’t find any on the web that include the input, uncoded source samples.
Comments
If we found a way to shave 100bps off Codec2 700C to make a Codec2 600, would it still sound better or worse than MELPe 600?
Still, the fact that for 100bps extra overhead we provide worthwhile improvements whilst reducing algorithmic complexity is great news. It means we’re very much in the ballpark. We’re a viable competitor, and it’s difficult to complain about our asking price. 🙂
Yes I think we can reach 600 bit/s (and below) with the same quality. The odd choice of 700 bit/s rate fell out due to other factors, e.g. the cohpsk HF modem symbol rate.
Hello David, have you ever thought about working together with the Xiph.org people (developers of the Opus and Vorbis codecs)?
73
Actually I am in email contact with them at the moment.
Do you think both sides could benefit from each other’s work? Given that source code exchange is only possible one way (Opus/BSD > Codec2/(L)GPL).
73
I dealt with AMBE+2 for many years at my 2 previous jobs (Inmarsat I4 voice in particular) and we always admired how it fit the bill for this purpose. In the typical noisy environment inside aircraft (also in the cargo while on the ground, with doors open and APU, GPU, hydraulics etc. running during maintenance) this codec sounded awesome, even compared to the I3’s G.711 (Swift64 ISDN), not to mention Iridium and, worst of all, Inmarsat classic aero voice (both IMBE?). We always suspected that AMBE+2 was great at filtering out noise and encoding just the voiced parts of the source signal, but we had no real insight (you’ve heard of IP rights??). Voice sounded slightly robotised but it was no problem identifying the speaker. One thing we found that upset its superior intelligibility was using DECT handsets as an input source; G.711 ISDN desk phones were fine.
There was one funny episode while discussing the qualities of AMBE and their preferred GSM FR codec with our supplier for PBX systems, when their sales rep said why don’t we just license AMBE then??? You know the rumours about fees 😉
Even though my post is not very technical, I hope it gives you a feeling for where AMBE+2 shines (at presumably 2400bps).
Unfortunately I no longer have access to SATCOM terminals for tests with Codec 2. But I’m pretty convinced that preprocessing of the signal is most important, and if this is upset by e.g. DECT’s ADPCM the codec’s intelligibility falls apart.
I hope Codec 2 will surpass AMBE and make it (and its business model) obsolete.
Thanks for your time and efforts developing the algorithms and keeping a community alive here.
Thanks Diego – some interesting stories! Yes noise reduction at the input of a codec is really important. Lots more work to do on Codec 2 – we’ll fix the problems one by one.
Hi David, I love to tell stories of the good old times in aviation 😉
I’ve been loosely following your progress since I got to know about Codec 2 and am impressed by your persistence and your way of dealing with new problems, also outside of your field.
I have just one random OT question, which I think you might be able to answer. If not never mind…
Have you ever heard of a real world use of TCM (no, not Chinese medicine, trellis coded modulation)? I always found the concept interesting, but never found any trace of it outside of academia.
Is it too “costly” to implement or impractical or is it just obsolete due to better alternatives?
I’ve read about TCM from time to time, and some colleagues have used it. I think it’s used on some of the NASA deep space probes and other sat coms closer to home. Wouldn’t be a CPU load issue, it’s just convolutional coding and modulation combined. I suspect the latest breed of FEC like Turbo Code and LDPC have made it obsolete.
Thanks!
So kind of what I suspected… Obsolete before it was in widespread use 🙂
The V32 QAM 9600 bit/s Telephone modem used TCM. There’s a PDF describing it in this repository “spra099.pdf”.
https://github.com/ObjectToolworks/v32qam
73
Thanks Steve
The TI app note seems quite interesting, I just skimmed through it. I’ll have to take some time to read it properly soon.
73
What about the comparison to TWELP?
As per Links section above – I can’t find any TWELP samples on the Internet. Can someone point me at some?