Royalty Free Codecs

Jean-Marc Valin (primary author of Speex and part of the CELT team) has made me aware of a proposal for developing a royalty-free audio codec under the banner of the IETF. The debates on the mailing list address some persistent issues in codec development: patents and royalties. What follows is a post I made to the IETF codec mailing list:

My name is David Rowe. I have a PhD in the field of speech compression and 20 years' experience in the design and real-time implementation of speech codecs and other DSP algorithms. I helped found the Speex project and have been a minor contributor over the years. I have also run small businesses that use speech codecs and experienced first-hand the pain of trying to use codecs like g729 with $40k license fees.

My experience as both a codec designer and a business using codecs is that codec royalties are a useless tax on business and telecommunication. The license fees benefit no one but the people receiving the royalties. Why can I get a first-class operating system like Linux for free but have to pay to use a tiny bit of software on it like g729?

We have the skills in the open source community to build better codecs using open source techniques. The world will be a better place for it.

Patent free, competitive DSP algorithms can and should be developed, and there are precedents:

  1. Speex and the various other open codecs.
  2. A patent-free, royalty-free line echo canceller (Oslec) which is successfully replacing expensive royalty-based solutions. I recall hearing much the same FUD about patents when developing this algorithm. And yet it works, it has pushed out royalty-based code in many thousands of cases, and it’s now part of the Linux kernel (try doing that with closed code).

What has held Speex adoption back is lack of standardisation: people have been forced to use royalty-based codecs like g729 for bit-exact interoperability reasons. Well, it looks like we have a chance to fix that.

Speex has shown it’s possible to build a patent-free codec. A lot of the algorithms involved in codecs are just well-known math, e.g. transforms and vector and scalar quantisation. There are usually alternative ways to perform a given operation if an annoying patent is in the way. It really is no big deal. Most of the underlying technology in modern speech codecs (DSP fundamentals, quantisation, LPC, pitch prediction etc.) was published in the 1960s and 1970s.
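To show how ordinary this "well-known math" is, here is a minimal Python sketch of LPC analysis: the autocorrelation method plus the Levinson-Durbin recursion, both textbook techniques. It is an illustration under my own assumptions (the function names, a Hamming window, order 10 by default), not code taken from Speex or any standardised codec.

```python
import math


def autocorrelation(x, order):
    """Return r[0..order], the autocorrelation of frame x at lags 0..order."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(order + 1)]


def levinson_durbin(r, order):
    """Solve the LPC normal equations with the Levinson-Durbin recursion.

    Returns (a, err) where a[0] = 1 and the prediction error filter is
    A(z) = 1 + a[1] z^-1 + ... + a[order] z^-order, i.e. the residual is
    e[n] = sum_k a[k] * x[n - k]. err is the final prediction error energy.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # silent or degenerate frame: keep the lower-order solution
        # Correlation of the order-(i-1) residual at lag i.
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection (PARCOR) coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a, err


def lpc(frame, order=10):
    """Hamming-window one frame of samples and return (a, err) as above."""
    n = len(frame)
    window = [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]
    windowed = [s * w for s, w in zip(frame, window)]
    return levinson_durbin(autocorrelation(windowed, order), order)
```

As a quick check, a synthetic first-order autoregressive signal such as x[n] = 0.9 x[n-1] + white noise should give a[1] close to -0.9 and a[2] close to zero at order 2. Real codecs add refinements (lag windowing, bandwidth expansion, conversion to LSPs before quantisation) that this sketch leaves out.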

Where the patents come in is that someone gets a good quality codec working using some clever technique in 1% of the algorithm, then patents that 1%. I would imagine they explicitly search for a novel technique in order to lock up their codec. Then, when that codec is standardised, we are stuck paying them royalties for bit-exact interoperability reasons.

The g729 patents do not cover linear prediction, vector quantisation, CELP, Line Spectrum Pairs or pitch prediction. All these techniques are also in Speex, which sounds about the same as g729 at 8 kbit/s.
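Vector quantisation is a good example of that unowned common ground. Below is a minimal Python sketch of a full-search vector quantiser using an (optionally weighted) squared-error measure. The toy codebook is entirely hypothetical; real codecs train their codebooks offline (e.g. with the LBG/k-means algorithm) and often split a vector such as a set of LSP frequencies into sub-vectors, which this sketch does not attempt.

```python
def vq_search(target, codebook, weights=None):
    """Full-search vector quantiser.

    Returns (index, distortion) for the codebook entry closest to target
    under a (optionally weighted) squared-error measure. Only the index is
    transmitted; the decoder holds an identical copy of the codebook.
    Ties go to the lowest index.
    """
    if weights is None:
        weights = [1.0] * len(target)
    best_index, best_dist = -1, float("inf")
    for index, entry in enumerate(codebook):
        dist = sum(w * (t - c) ** 2 for w, t, c in zip(weights, target, entry))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index, best_dist


# Hypothetical toy codebook: 4 entries, so each 2-D vector costs 2 bits.
CODEBOOK = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0], [0.0, 2.0]]

index, dist = vq_search([0.9, 1.2], CODEBOOK)
decoded = CODEBOOK[index]  # what the decoder reconstructs from the index
```

The weights argument is where a codec would plug in a perceptual weighting; with uniform weights it reduces to plain nearest-neighbour search.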

It’s easy to replace that 1% early in the codec design process: we simply make royalty-free a priority rather than maximising royalties. In other words, design the codec to help as many people as possible, rather than designing it to make a small number of people wealthy. Isn’t that what the IETF is all about?

I think it’s a great idea to release source code from day 1. Open source development has been shown to be superior to closed development, so I am sure we can develop a better codec faster with open source, peer-reviewed code.

As for funding an open development effort and travel to meetings, the dollars involved are trivial compared with the cumulative cost of license fees down the track under the closed approach. So supporting an open codec is a great business decision.

15 thoughts on “Royalty Free Codecs”

  1. Well put. g729 and its ilk need to be knocked off their perch as “standard procedure”. The sooner royalty-free codecs make their way to the top of the list, the better for everyone worldwide.

    And cool! I didn’t notice that OSLEC has been making its way into the kernel. Well done to all of you involved.

  2. Hear, hear!

    David, I know you have recently gotten interested in amateur radio, and by now I expect you have heard of Icom’s “D-Star” digital radio scheme. All of D-Star is available to the public, except the CODEC. They (foolishly, IMHO) used DVSI’s AMBE CODEC, which is proprietary, patented, and closed.

    As far as I am concerned, the use of a proprietary CODEC for D-Star has doomed it from the start — I am not going to use it. I believe that many other radio amateurs feel the same way, for instance, here: http://thek3ngreport.blogspot.com/2008/08/proprietary-has-no-place-in-amateur.html

    A quick peek at the AMBE page on wikipedia (http://en.wikipedia.org/wiki/Advanced_Multi-Band_Excitation) shows that none other than Bruce Perens is thinking about an open-source CODEC for amateur radio. (http://codec2.org/)

    It is interesting to see the confluence of the VoIP and amateur radio stuff continue to evolve.

    Jeff, N1KDO

  3. (Love the blog and your projects by the way)

    First of all, there can be little doubt that the lack of freely available open source codecs has had a chilling effect on the adoption of video (and, to a certain extent, audio) on the Internet. This has allowed proprietary “solutions” like Flash (which is sort of an incidental solution to the Internet video problem) to gain a hold. We are just beginning to see an honest-to-god VIDEO tag show up in browsers, rather than an add-on plugin.

    A similar problem faces VOIP. Lack of high-quality, interoperable open codecs adds to the cost of hardware, and prevents interoperable open source software from being as useful as it could.

    In amateur radio, D-Star digital voice is becoming popular, but it’s centered around a patented voice codec. This effectively means that you can’t write an implementation of D-Star without licensing the AMBE codec. Were it not for this dependence, we’d already have open source implementations of digital voice that would interoperate with this system. Instead, we are seeing increasing adoption of an inherently closed system in amateur radio, in a way that benefits patent holders at the cost of stifling experimentation and innovation.

    I suspect that most of what keeps the open source community out of this arena is a) writing codecs (especially good ones) requires specialized knowledge and b) fear of being sued for inadvertently incorporating patented techniques. Perhaps you could comment on (or point me to) some of the techniques that the Speex team used to avoid getting slapped with frivolous patent infringement lawsuits. Or was there really no interest from codec manufacturers in doing so?

  4. @hads – thanks for your kind comments

    @jeff – Yes that is a bit sad about proprietary codecs being enshrined in a Ham Radio mode. MBE and MELP are fine codecs but guys like Jean-Marc have shown that there are plenty of clever people in the open source world who can do just as well. I spoke to Bruce about this initiative last year (I am mentioned in the http://codec2.org link). Something I will have a crack at one day, funded or not.

    @mark – good to see you like the blog. Yes a closed codec is totally against the Ham ethos, of sharing, learning, experimenting. It benefits no one. These guys would have locked down transistors and antennas if they had a chance. There is more to life than gathering as much money as possible for yourself.

    I think (a) is the main issue: the skills involved are fairly rare and you need to be highly motivated. But hey, if people can write great open source operating systems…

    (b) is more FUD than anything. I studied the earlier DVSI codecs pretty well (in the early 90s); they had some nice innovations and great quality. However, they didn’t invent every technique used in their codec. Like I said in the post above, they developed a good codec, then locked up 1% of it with patents. Once enough people adopt it, you are stuck with them. It’s a great business if you can sleep at night. Just not really great for anyone else in the world.

    Jean-Marc could comment better on how he avoided any patent issues. My understanding is (i) he looked at what was patented then (ii) worked out how to get good speech quality by using techniques in the public domain.

    The key thing to understand is that the patents only cover one way to build a codec. There are many ways to get to the same quality codec. It’s like driving from A to B. You can get there via C, or via D. But if someone owns a toll gate at C, and then forces you to go that way… it locks out a perfectly viable, and potentially free, alternative.

  5. Your blog never stops being an interesting read, that’s for sure!

    To the point, you couldn’t have picked a better time to blog about this. The first browser supporting HTML5 came out days ago, and there are some huge discussions involving browser vendors (Mozilla, Opera, Google, Apple) in two working groups (WHATWG & W3C) about <video> and <audio> codecs for the Web.

    Days ago, there was a message[1] on WHATWG’s mailing list from its editor mentioning that there’s “no suitable codec that all vendors are willing to implement and ship” (sic).

    We definitely live in interesting times, and they bring some new challenges for the FL/OSS community in general.

    Perhaps your (open) hardware skills could help you coordinate an effort to create open Ogg Vorbis/Theora decoder hardware? That would probably help their adoption.

    Regards,
    Faidon

    1: http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2009-June/020620.html

  6. Hi Faidon,

    Thanks for the link re codecs. Do the Ogg Vorbis/Theora decoders actually need hardware acceleration? While open hardware would be great, doing it in spare cycles on a general purpose CPU would be better.

    Cheers, David

    codec2.org says some misleading things about AMBE and MELP. It describes them as more robust than Speex. Any radio link codec needs to use massive FEC to achieve robustness, e.g. the full rate GSM codecs run at about 13 kbps, but the channel runs at about 22 kbps. The difference is a lot of FEC bits, carefully crafted to protect the most important bits in the codec data (e.g. the MSBs of the energy field). Some codecs are certainly more robust than others, but nothing at low bit rates is awfully robust without some serious FEC. Each bit is just too important when there aren’t a lot of them. :-)

    I first saw AMBE used in about 1992. Its patents must be near the end of their days now. MELP is newer. Codecs with expired patents are more interesting for open source solutions than codecs developed in the open, like Speex. You can be sure no patent holders will spring a surprise on you when it’s that old. :-) Sadly AMBE is a rather poor codec, so it’s not that interesting these days.

    Steve

  8. Hi Steve – in the early 1990’s I was involved in some codec “smack down” tests where MBE (IMBE, AMBE) consistently delivered better speech quality and much better robustness to bit errors than CELP codecs at equivalent bit rates.

    One reason for poor bit error performance of CELP is the memory of the adaptive codebook (pitch predictor). Also xMBE is just more efficient for a given speech quality than CELP so you have more bits free for FEC.

    The earlier xMBE codecs I studied had unequal error protection, for example the pitch was heavily protected, low order amplitude bits less protected.

    I am pretty sure xMBE/MELP are still superior to CELP type codecs at low bit rates, which is why we see codecs like xMBE in many low bandwidth (< 5 kbit/s including FEC) digital radio applications. Above 8 kbit/s CELP codecs like Speex and g.729 have better quality, but below 5 kbit/s xMBE and MELP are much better. Good point about some patents on xMBE expiring, although I still maintain a concerted open source development effort would deliver a low bit rate codec (< 5 kbit/s including FEC) with similar quality. The patents are probably only important for interoperability. I imagine much of the interesting stuff is protected by confidential information, i.e. perhaps DVSI only distribute binaries. – David

  9. I worked on VSELP at Motorola. If you slow down the half rate GSM codec to 30ms frames you get roughly the VSELP codec for MIRS/iDEN (GSM has some extra bells and whistles, but they share most of the same features). The other codec for MIRS/iDEN is AMBE. Both those codecs were vying for the utility services, and AMBE won. Apparently it won hands down. However, when you hear people bitching about how bad iDEN sounds, they are using the AMBE codec. I’m not clear why Motorola put AMBE into iDEN, as they had to pay for it. I guess it was for public utilities interworking. That version of VSELP is very intolerant of poor mics and ADCs. Much more so than other codecs I have worked with. However, given a good input signal it blows AMBE away.

    The AMBE vs CELP comparisons I saw in the early 90s were very skewed, comparing the nastiest of ancient CELP codecs (e.g. the early and seriously unfinished VSELP in IS54) with the latest and greatest in AMBE.

    Steve

  10. All the codec trials I saw pitted the best everyone else could offer at that time against xMBE. I gather by about 1996 some other algorithms had caught up and MELP won out in the DoD stakes – I bet one big factor was less restrictive licensing.

    You know, I wonder if there are any CELP codecs today that can get close to xMBE/MELP vocoder quality at a source coding rate of less than 3 kbit/s?

    All this talk of low bit rate (sub 5 kbit/s) vocoders is getting me motivated to kick off a project in this area! Like Oslec, it seems like a closed-source itch that needs scratching. My thesis was on sinusoidal speech coding, but I didn’t do much work in the quantisation area. However with the resources of the open source community I am sure this can be sorted.

    I do have a reasonable quality unquantised sinusoidal codec as a starting point.

    – David

  11. How early in the 90s were your experiences? If you said 90-91 I think you’re probably right. AMBE had a head start. If you said 93-94 I think you would be wrong. Various CELP based techniques were really coming into focus by then.

    Getting much below 4 kbps is a problem for CELP-based codecs. They can work really well at about 4 kbps (iDEN is 4.2 kbps). Somewhere between there and 3 kbps they all seem to fall off a cliff. :-) If you are only looking to get below 5 kbps, Speex does a reasonable job. I don’t think Jean-Marc has put huge effort into optimising the lowest rates, so that might be an interesting area to work on.

    MELP doesn’t have less restrictive licensing. Licensing it is a pain. It’s just the US government that has an easy time with its licensing.

    Steve

  12. Can you post a link to any samples of the iDEN 4.2 kbit/s CELP codec, Steve? I would be interested to hear it. Is the iDEN codec patent-free and open? It seems to me 4.2 kbit/s is getting into the range needed for HF/VHF digital radio.

  13. The iDEN VSELP codec is structurally close to the half rate GSM codec. I imagine most of the patents on half rate GSM (there appear to be many of those) also apply to the iDEN codec. It will be several years before they all expire.

    The arithmetic coding used in more modern CELP based codecs should beat what this VSELP codec could do.

    Steve
