Codec 2

David Rowe, VK5DGR

Introduction

Codec2 is an open source low bit rate speech codec designed for communications quality speech at 2400 bit/s and below. Applications include low bandwidth HF/VHF digital radio and VOIP trunking. Codec 2 operating at 2400 bit/s can send 26 phone calls using the bandwidth required for one 64 kbit/s uncompressed phone call. It fills a gap in open source, free-as-in-speech voice codecs beneath 5000 bit/s and is released under the GNU Lesser General Public License (LGPL).

The motivations behind the project are summarised in this blog post.

Individuals can support Codec2 development via a Donation. Companies can support Codec 2 by paid contract development, for example a project to meet a specific need of your company.

News

  1. Dec 11, 2012: FreeDV released! FreeDV is a GUI application that combines Codec 2 and the FDMDV modem in a single, user friendly application that runs on Linux and Windows. It enables anyone with an SSB radio to start using digital voice.

  2. Sep 15, 2012: Significant quality increases at 2400, 1400, and 1200 bit/s from the LPC post filter and a new 3200 bit/s mode. You can listen to the new 3200 bit/s mode (compared to g729 and AMR-NB) below.

  3. August 2, 2012: Codec 2 was awarded the ARRL technical innovation award for 2012.

  4. July 2012: A commercial development project kicks off to improve the quality of Codec 2 for VOIP applications. The company wishes to remain anonymous, but I wanted to acknowledge their kind contribution.

Status

Alpha release of 3200, 2400, 1400, and 1200 bit/s codecs in the codec-dev SVN branch.

Here are some samples (SVN revision 714).

Codec Male Female
Original male female
Codec 2 3200 bit/s male female
Codec 2 2400 bit/s male female
Codec 2 1400 bit/s male female
Codec 2 1200 bit/s male female

Here is Codec 2 operating at 2400 bit/s compared to some other low bit rate codecs:

Codec Male Female
Original male female
Codec 2 2400 bit/s male female
MELP 2400 bit/s male female
AMBE 2000 bit/s male female
LPC-10 2400 bit/s male female

Notes: The MELP samples are from an early 1998 simulation. I would welcome any samples processed with a modern version of MELP. The AMBE samples were generated using a DV-Dongle, a USB device containing the DVSI AMBE2000 chip. The LPC-10 samples were generated using the Spandsp library.

Here is Codec 2 operating at 3200 bit/s compared to some higher bit rate CELP codecs, typically used for VOIP and mobile phone work:

Codec Male Female
Original male female
Codec 2 3200 bit/s male female
AMR 4750 bit/s male female
g.729a 8000 bit/s male female

Here is a counter example where AMBE works better compared to Codec 2. In particular the low frequency reproduction is much better. Thank you Kristoff ON1ARF for sending in these samples. Why AMBE works so much better than Codec 2 for this sample compared to say the male sample above (hts1a.wav) is an interesting mystery that I am exploring. Low frequency energy? Input filtering perhaps? Or a corner case where the Codec 2 parameter estimators (pitch, voicing etc) break down? One nice thing about open source DSP development is people tend to send you samples that break the algorithms. By investigating the performance of the algorithm with these samples the algorithms can be improved.

Codec Kristoff
Original Kristoff
Codec 2 2400 bit/s Kristoff
AMBE 2000 bit/s Kristoff

Here are some samples with acoustic background noise, similar to what would be experienced when driving a truck. As you can see (well, hear) background noise is a tough test for low bit rate vocoders. They achieve high compression rates by being highly optimised for human speech, at the expense of performance with non-speech signals like background noise and music. Note that Codec 2 has just one voicing bit, unlike mixed excitation algorithms like AMBE and MELP.

Codec Male with truck noise
Original male
Codec 2 2400 bit/s male
AMBE 2000 bit/s male
LPC-10 2400 bit/s male

Progress to Date

  1. Linux/gcc simulation (c2sim) which is a test-bed for various modelling and quantisation options – these are controlled via command line switches. Separate encoder (c2enc) and decoder (c2dec) programs that demo Codec2 on the command line. Runs at approximately 10x real time on a modern x86 PC.
  2. Original thesis code has been refactored and brought up to modern gcc standards.
  3. LPC modelling is working nicely and there is a first pass scalar LSP quantiser working at 36 bits/frame with good voice quality. Lots could be done to reduce the LSP bit rate. A novel approach to LPC modelling uses a single bit to correct low frequency LPC errors. The LSP quantisers (simple uniform, hand designed tables) are designed to simply keep low frequency LSP quantisation errors less than 25 Hz. Keeping LSP errors less than 25 Hz was found to be more important than minimum MSE in subjective tests. This is a surprise, as regular quantiser design (e.g. Lloyd-Max and k-means for VQ design) assumes minimum MSE is the key. A minimal sketch of the 25 Hz idea is shown after this list.
  4. A phase model has been developed that uses 0 bits for phase and 1 bit/frame for voiced/unvoiced decision but delivers OK speech quality. This works suspiciously well – codecs with a single bit voiced/unvoiced decision aren’t meant to sound this good. Usually mixed voicing at several bits/frame is required. However the phase and voicing still represent the largest quality drop in the codec.

    The LPC post filter helps compensate for some of this quality drop. With incremental improvements Codec 2 is now approaching the quality of CELP codecs in the 4000 – 8000 bit/s range.

    To give you an idea of what is possible with Codec 2, this is what the codec sounds like at a 10ms frame rate with original phases. The other codec parameters are fully quantised, and the LPC post filter is enabled:

    Original        Codec 2 (original phases)   g.729a 8000 bit/s
    hts1a male      hts1a male                  hts1a male
    hts2a female    hts2a female                hts2a female

  5. An experimental post-filter has been developed to improve performance with speech mixed with background noise. The post-filter delivers many of the advantages of mixed voicing but unlike mixed voicing requires zero bits.
  6. Non-Linear Pitch (NLP) pitch estimator working OK, and a simple pitch tracker has been developed to help with some problem frames.
  7. An algorithm has been developed for interpolating sinusoidal model parameters from the 20ms frame rate required for 2400 bit/s coding to the internal native 10ms frame rate.
  8. For the lower bit rates, changing the frame rate from 20 to 40ms seems to have a remarkably small effect on speech quality. This has resulted in 1400 and 1200 bit/s versions of the codec.
  9. “Pathological samples” have been collected that break Codec 2 or make it perform poorly. These will be investigated to improve the quality of the codec.
  10. With the kind help of Jean-Marc Valin, Vector Quantisation of the LSPs and a joint Pitch-Energy VQ have been prototyped.
  11. An FDMDV HF modem has been implemented to allow testing of Codec 2 over HF radio channels.
  12. The Freeswitch project has integrated an early version of Codec 2. Patches have been developed to integrate Codec 2 into Asterisk, kindly funded by Ed W from Mailasail. The code for Asterisk support is in Codec 2 SVN, here is the README.
  13. Windows support has been added, including a DLL and import library that contains Codec 2 and the FDMDV modem. See the win32 directory of Codec 2 SVN.
  14. An LPC post filter has been developed that significantly increases quality with no increase in the bit rate.
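
To make the 25 Hz idea in item 3 concrete, here is a minimal sketch of a uniform scalar quantiser whose step size keeps the worst case error under 25 Hz. The 50 Hz step and quantising the LSP directly in Hz are illustrative assumptions, not the actual Codec 2 quantiser tables:

  /* Illustrative sketch only, not the Codec 2 tables: round a low
     frequency LSP (in Hz) to a uniform 50 Hz grid, so the worst case
     quantisation error is 50/2 = 25 Hz. */
  #include <math.h>

  float quantise_lsp_hz(float lsp_hz)
  {
      const float step = 50.0f;
      return step*floorf(lsp_hz/step + 0.5f);   /* round to nearest level */
  }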

Current work areas:

  1. Improve speech quality by work on the LSP quantisation, phase modelling, voicing estimation, post-filter, and interpolation.
  2. Gather feedback from real-world testing of Alpha code.
  3. Integration with modem code to develop an open source digital voice system for HF/VHF radio.

Browse:

https://svn.code.sf.net/p/freetel/code/codec2-dev

Check Out:


Development version (latest and greatest but may not compile cleanly at any given time):
 
$ svn co https://svn.code.sf.net/p/freetel/code/codec2-dev codec2-dev
 

The Mailing List

For any questions, comments, support, or suggestions for applications by end users and developers, please post to the Codec2 Mailing List.

Quick Start

To encode the file raw/hts1a.raw then decode it to a raw file (8 kHz, 16 bit ints) hts1a_c2.raw:

  $ svn co https://svn.code.sf.net/p/freetel/code/codec2-dev codec2-dev
  $ cd codec2-dev
  $ ./configure
  $ make
  $ cd src
  $ ./c2demo ../raw/hts1a.raw hts1a_c2.raw
  $ play -r 8000 -s -2 hts1a_c2.raw

Fun with pipes:

  $ ./c2enc 1400 ../raw/morig.raw - | ./c2dec 1400 - - | play -t raw -r 8000 -s -2 -

Encode a voice file with Codec 2 at 1400 bit/s and send through FDMDV modem, play decoded audio:

  $ ./c2enc 1400 ../raw/hts1a.raw - | ./fdmdv_mod - - | ./fdmdv_demod - - | ./c2dec 1400 - - | play -t raw -r 8000 -s -2 -

Development Roadmap

Here is a project road map showing planned development; completed milestones are in blue:

Dave Witten KD0EAG is building a GUI application called FDMDV-2 that can run on Linux and Windows. This GUI application will include Codec 2 and the FDMDV modem, and will connect to an SSB transceiver to create a Digital Voice system for HF radio. Dave has drawn a block diagram of FDMDV-2:

How it Works

What follows is a basic introduction to the core Codec 2 algorithms, with the maths written as 'C code' to make it more familiar.

Also see:

  1. A presentation on Codec 2 in Power Point or Open Office form.
  2. At linux.conf.au 2012 I presented a graphical description of how Codec 2 works, see the Links section below. This is a really gentle introduction.

Codec2 uses "harmonic sinusoidal speech coding". Sinusoidal coding was developed at the MIT Lincoln Labs in the mid 1980s, starting with some gentlemen called R.J. McAulay and T.F. Quatieri. I worked on these codec algorithms for my PhD during the 1990s. Sinusoidal coding is a close relative of the xMBE codec family, and both often use mixed voicing models similar to those used in MELP.

Speech is modelled as a sum of sinusoids:

 
  for(m=1; m<=L; m++)
    s[n] += A[m]*cos(Wo*m*n + phi[m]);

The sinusoids are multiples of the fundamental frequency Wo (omega-naught), hence the name “harmonic sinusoidal coding”. For each frame, we analyse the speech signal and extract a set of parameters:

 
  Wo, {A}, {phi}

Where Wo is the fundamental frequency (also known as the pitch), { A } is a set of L amplitudes and { phi } is a set of L phases. L is chosen to be equal to the number of harmonics that can fit in a 4 kHz bandwidth:

 
  L = floor(pi/Wo)

Wo is specified in radians normalised to 4 kHz, such that pi radians = 4 kHz. The fundamental frequency in Hz is:

 
  F0 = (8000/(2*pi))*Wo
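
For example, a 100 Hz pitch gives Wo = 2*pi*100/8000, or about 0.0785 radians, so L = floor(pi/Wo) = 40 harmonics fit below 4 kHz.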

We then need to encode (quantise) Wo, { A }, { phi } and transmit them to a decoder which reconstructs the speech. A frame might be 10-20ms in length so we update the parameters every 10-20ms (100 to 50 Hz update rate).
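
To make the decoder side of this concrete, here is a minimal sketch of synthesising one frame from a received set of parameters. The 80 sample (10ms at 8 kHz) frame size and the function name are assumptions for illustration; this is not the Codec 2 source:

  /* Minimal sketch of one frame of harmonic sinusoidal synthesis.
     Assumes A[] and phi[] are indexed 1..L. The real decoder also
     smooths parameters across frame boundaries, omitted here. */
  #include <math.h>

  #define N  80             /* samples per 10ms frame at 8 kHz */
  #define PI 3.141592654f

  void synthesise_frame(float s[], float Wo, float A[], float phi[])
  {
      int L = (int)floorf(PI/Wo);   /* harmonics below 4 kHz */
      int n, m;

      for(n=0; n<N; n++) {
          s[n] = 0.0f;
          for(m=1; m<=L; m++)
              s[n] += A[m]*cosf(Wo*m*n + phi[m]);
      }
  }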

The speech quality of the basic harmonic sinusoidal model is pretty good, close to transparent. It is also relatively robust to Wo estimation errors. Unvoiced speech (e.g. consonants) is well modelled by a bunch of harmonics with random phases. Speech corrupted with background noise also sounds OK; the background noise doesn't introduce any grossly unpleasant artifacts.

As the parameters are quantised to a low bit rate and sent over the channel, the speech quality drops. The challenge is to achieve a reasonable trade off between speech quality and bit rate.

Codec 2 Block Diagrams

Here are some block diagrams that illustrate the major signal processing elements for a fully quantised configuration of Codec 2. This example includes the LPC correction bit, which was a feature of the 2550 bit/s version.

The encoder:

The decoder:

These figures were explained in a presentation I gave at the DCC 2011 conference, for more information see the video of that talk.

Example Bit Allocation

Parameter                      bits/frame
Spectral magnitudes (LSPs)     36
Joint Pitch and Energy         8
Voicing (updated each 10ms)    2
Spare                          2
Total                          48

At a 20ms update rate, 48 bits/frame is 48/0.02 = 2400 bit/s.

Challenges

The tough bits of this project are:

1. Parameter estimation, in particular voicing estimation.

2. Reduction of a time-varying number of parameters (L changes with Wo each frame) to a fixed number of parameters required for a fixed bit rate. The trick here is that { A } tend to vary slowly with frequency, so we can “fit” a curve to the set of { A } and send parameters that describe that curve.

3. Discarding the phases { phi }. In most low bit rate codecs phases are discarded and synthesised at the decoder using a rule-based approach. This also implies the need for a "voicing" model, as voiced speech (vowels) tends to have a different phase structure to unvoiced speech (consonants). The voicing model needs to be accurate (not introduce distortion) and relatively low bit rate.

4. Quantisation of the amplitudes { A } to a small number of bits while maintaining speech quality. For example 30 bits/frame at a 20ms frame rate is 30/0.02 = 1500 bits/s, a large part of our 2400 bit/s “budget”.

5. Performance with different speakers and background noise conditions. This is where you come in – as codec2 develops please send me samples of its performance with various speakers and background noise conditions and together we will improve the algorithm. This approach proved very powerful when developing Oslec. One of the cool things about open source!

Can I help?

Look at the Development Roadmap above and see if there is anything that interests you.

Not all of this project is DSP. There are many general C coding tasks like code review, writing user applications, testing, and even patent review.

I will happily accept sponsorship for this project. For example research grants, or development contracts from companies interested in seeing an open source low bit rate speech codec.

You can also donate to the codec2 project via PayPal (which also allows credit card donations).

Thanks to the following for your kind PayPal donations:

Brian Morrison, Andreas Weller, Stuart Brock, Bryan Greenawalt, Anthony Cutler (many times), Martin Flynn, Melvyn Whitten (many times), Glen English, William Scholey, Andreas Bier, David Witten, Clive Ousbey, David Bern, Bryan Pollock, Mario Dearanzeta, Gerhard Burian, Tim Rob, Daniel Cussen, Gareth Davies, Simon Eatough, Neil Brewitt, Robert Eugster, Ramon Gandia, A J Carr, Van Jacobson, Eric Muehlstein, Cecil Casey, Nicola Giacobbe, John Ackermann, Joel Kolstad, Curt E. Mills, James Ahlstrom, Chris Inwood, Gary Greene, Robert McGwier, Jacek Radzikowski, Lars Morich, CW Black, Thomas Azlin, Pablo Di Noto, Pavel Nikulin, Ernest Martin, Cesar Bremer, Mark VandeWettering, Poovan Chetty, Martin Barratt, Patrick Strasser, Alexandru Csete, Stuart Brock, Rob Nottage, Steven G Harnish, Alok Prasad, Satish Mali, Paul Russell, Steven Heimann, Einstel Limited, Jonathan Taylor, Andrew Howell, Rey Cyril, Aristea Papadopoulou, Anthony Best, Leif Burrow, Stephane Fillod, Andreas Bier, Stephane Mabille, Anthony Bombardiere, Gerald Muething, Daniel Curry, Graham Bryce, BSD Lazarus, David Bern, Peter Marks, M J Hill, Simone Arrigo, Thomas Seiler, J B Nicholson-Owens, Zoltan Deák-Lugossy.

Thanks to the following for sponsoring Codec 2 development:

Ed W from Mailasail (Asterisk support).

Thanks to the following for your kind Equipment donations:

Melvyn Whitten (headphones & DV Dongle), TAPR (Yaesu FT-817ND radio).

Thanks to the following for travel and conference support:

Melvyn Whitten, TAPR.

Thanks to the following for submitting patches or helping out with the code and algorithms:

Bruce Perens, Bill Cowley, Jean-Marc Valin, Gregory Maxwell, Peter Ross, Edwin Torok, Mathieu Rene, Brian West, Bruce Robertson, Stuart Marsden, Daniel Ankers, Thomas Sprinkmeier, Peter Lawrence.

Thanks also to the many people who have sent emails of encouragement, publicised Codec 2, used it in their applications, and participated in the mailing list. If I have forgotten anyone above, please let me know!

Is it Patent Free?

I think so – much of the work is based on old papers from the 60s, 70s and 80s, and the PhD thesis work used as a baseline for this codec was original. A nice little mini project would be to audit the patents used by proprietary 2400 bit/s codecs (MELP and xMBE) and compare.

Proprietary codecs typically have small, novel parts of the algorithm protected by patents. However proprietary codecs also rely heavily on large bodies of public domain work. The patents cover perhaps 5% of the codec algorithms. Proprietary codec designers did not invent most of the algorithms they use in their codec. Typically, the patents just cover enough to make designing an interoperable codec very difficult. These also tend to be the parts that make their codecs sound good.

However there are many ways to make a codec sound good, so we simply need to choose and develop other methods.

Is Codec2 compatible with xMBE or MELP?

Nope – I don’t think it’s possible to build a compatible codec without infringing on patents or access to commercial in confidence information.

Hacking

All of my development is on an Ubuntu Linux box. If you would like to play with Codec2 here are some notes:

  • src/sim.sh performs the processing steps required to output speech files at various stages of processing, for example:

    $ cd codec2/src
    $ ./sim.sh hts1a

    will produce hts1a_uq (unquantised, i.e. baseline sinusoidal model), hts1a_phase0 (zero phase model), hts1a_lpc10 (10th order LPC model) etc.

  • You can then listen to all of these samples (and the original) using:

      $ ./listensim.sh hts1a
  • Specific notes about LPC and Phase modelling are below.
  • There are some useful scripts in the scripts directory, for example wav2raw.sh, raw2wav.sh, playraw.sh, menu.sh. Note that sim.sh and listensim.sh are in the src directory as that’s where they get used most of the time.
  • The blog post Testing Codec 2 Algorithms describes the steps I take when working on Codec 2.

LPC Modelling Notes

Linear Predictive Coding (LPC) modelling is used to model the sine wave amplitudes { A }. LPC is widely used in speech coding, although applying LPC modelling to frequency domain coding is fairly novel; LPCs are mainly used in time domain codecs like LPC-10 and CELP.

LPC modelling has a couple of advantages:

  • From time domain coding we know a lot about LPC, for example how to quantise them efficiently using Line Spectrum Pairs (LSPs).
  • The number of amplitudes varies each frame as Wo and hence L vary. This makes the { A } tricky to quantise and transmit. However it is possible to convey the same information using a fixed number of LPCs which makes the quantisation problem easier.

To test LPC modelling:

  $ ./c2sim ../raw/hts1a.raw --lpc 10 -o hts1a_lpc10.raw

The blog post [4] discusses why LPC modelling works so well when the { A } are recovered via the RMS method (Section 5.1 of the thesis). The equal area model of the LPC spectrum versus the harmonics seems to work remarkably well, especially compared to sampling the LPC spectrum, with SNRs of up to 30dB on female frames.
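
Here is a simplified sketch of the RMS idea: each harmonic amplitude is recovered as the square root of the LPC power spectrum energy in that harmonic's band. The FFT size and function name are assumptions for illustration; this is not the Codec 2 source:

  /* Pw[] holds FFT_ENC/2 points of the LPC power spectrum covering
     0..4 kHz; Wo is in radians (pi radians = 4 kHz). Each A[m] is the
     RMS energy of the bins within +/- Wo/2 of harmonic m. */
  #include <math.h>

  #define FFT_ENC 512       /* assumed FFT size, so pi maps to 256 bins */
  #define PI      3.141592654f

  void lpc_to_amplitudes(float A[], const float Pw[], float Wo, int L)
  {
      int m, i;

      for(m=1; m<=L; m++) {
          int   a = (int)((m - 0.5f)*Wo*FFT_ENC/(2.0f*PI) + 0.5f);
          int   b = (int)((m + 0.5f)*Wo*FFT_ENC/(2.0f*PI) + 0.5f);
          float e = 0.0f;

          if (b > FFT_ENC/2)
              b = FFT_ENC/2;            /* clamp the top harmonic */
          for(i=a; i<b; i++)
              e += Pw[i];
          A[m] = sqrtf(e);              /* RMS energy of the band */
      }
  }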

There is a problem with modelling the low order (e.g. m=1, i.e. the fundamental) harmonics for some male samples. The amplitude of the m=1 harmonic is raised by as much as 30dB after LPC modelling as (I think) LPC spectra must have zero derivative at DC. This means LPC is poor at modelling very low frequency harmonics, which unfortunately the ear is very sensitive to. To correct this, an extra bit has been added in some versions of the codec to correct LPC modelling errors on the first harmonic. When set, this bit instructs the decoder to attenuate the LPC modelled harmonic by 30dB.
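
A minimal sketch of what the decoder side of this correction bit might look like (assumed logic for illustration, not the Codec 2 source):

  /* If the LPC correction bit is set, pull the first (m=1) harmonic
     back down by 30dB to undo the LPC modelling error described above. */
  #include <math.h>

  void apply_lpc_correction(float A[], int lpc_correction_bit)
  {
      if (lpc_correction_bit)
          A[1] *= powf(10.0f, -30.0f/20.0f);   /* -30dB is about x0.032 */
  }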

Phase Modelling Notes

I have a "zero order" phase model under constant development. This model synthesises the phases of each harmonic at the decoder side. The model is described in the source code of phase.c.

The zero phase model requires just one voicing bit to be transmitted to the decoder; all other phase information is synthesised using a rule-based model. It seems to work OK for most speech samples, but adds a "clicky" artefact to some low pitched speakers. For reasons I don't yet understand, the model quality drops when the zero phase model is combined with LPC based amplitude modelling. Also see the blog posts below for more discussion of phase models.

To determine voicing we use the MBE algorithm over the first 1 kHz. This attempts to fit an "all voiced" harmonic spectrum, then compares the fit in a Mean Square Error (MSE) sense. This works OK and is fast, but like all parameter estimators it screws up occasionally. The worst type of error is when voiced speech is accidentally declared unvoiced. So I have biased the threshold towards voiced decisions, which reduces the speech quality a little. More work is required, maybe a mixture of two estimators so their errors are uncorrelated.
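
As an illustration only (the SNR formulation and threshold value here are assumptions, not the Codec 2 source), the biased decision boils down to something like:

  /* Declare the frame voiced if the "all voiced" harmonic fit over the
     first 1 kHz is good enough. The threshold is kept low so that
     errors tend towards "voiced", as discussed above. */
  #include <math.h>

  int voicing_decision(float signal_energy, float fit_error_energy)
  {
      float snr_dB = 10.0f*log10f(signal_energy/fit_error_energy);

      return snr_dB > 4.0f;              /* illustrative threshold only */
  }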

Unvoiced speech can be represented well by random phases and a Wo estimate that jumps around randomly. If Wo is small the number of harmonics is large which makes the noise less periodic and more noise like to the ear. With Wo jumping around phase tracks are discontinuous between frames which also makes the synthesised signal more noise like and prevents time domain pulses forming that the ear is sensitive to.
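
For the unvoiced case described above the rule really is that simple; here is a minimal sketch (illustrative only, not the phase.c source):

  /* Give each harmonic a random phase in [-PI, PI] so the synthesised
     unvoiced speech sounds noise-like rather than periodic. */
  #include <stdlib.h>

  #define PI 3.141592654f

  void unvoiced_phases(float phi[], int L)
  {
      int m;

      for(m=1; m<=L; m++)
          phi[m] = 2.0f*PI*((float)rand()/RAND_MAX) - PI;
  }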

Running the Phase Model


  $ ./c2sim ../raw/hts1a.raw hts1a.mdl --phase0 -o hts1a_phase0.raw
 

Octave Scripts

  • pl.m – plot a segment from a raw file
  • pl2.m – plot the same segments from two different files to compare
  • plamp.m – menu based GUI interface to “dump” files, move back and forward through file examining time and frequency domain parameters, lpc model etc

              $ CFLAGS=-DDUMP ./configure
              $ make clean
              $ make
              $ cd src
              $ ./c2sim ../raw/hts1a.raw --lpc 10 --dump hts1a
              $ cd ../octave
              $ octave
              octave:1> plamp("../src/hts1a",25)
  • plphase.m – similar to plamp.m but for analysing phase models

              $ ./c2dec --phase [0|1] --dump hts1a_phase
              $ cd ../octave
              $ octave
              octave:1> plphase("../src/hts1a_phase",25)
  • plpitch.m – plot two pitch contours (.p files) and compare
  • plnlp.m – plots a bunch of NLP pitch estimator states. Screenshot

Directories


  script   - shell scripts to simplify common operations
  speex    - LSP quantisation code borrowed from Speex for testing
  src      - C source code
  octave   - Matlab/Octave scripts
  pitch    - pitch estimator output files
  raw      - speech files in raw format (16 bit signed linear, 8 kHz)
  unittest - Unit test source code
  wav      - speech files in wave file format

Other Uses

The DSP algorithms contained in codec2 may be useful for other DSP applications, for example:

  • The nlp.c pitch estimator is a working, tested, pitch estimator for human speech. NLP is an open source pitch estimator presented in C code complete with a GUI debugging tool (plnlp.m screenshot). It can be run stand-alone using the tnlp.c unit test program. It could be applied to other speech coding research. Pitch estimation is a popular subject in academia, however most pitch estimators are described in papers, with the fine implementation details left out.
  • The basic analysis/synthesis framework could be used for high quality speech synthesis.

Links

  1. Codec 2 presentation in Power Point or Open Office form.
  2. Bruce Perens introducing the codec2 project concept
  3. July 1997, David’s PhD Thesis, “Techniques for Harmonic Sinusoidal Coding”, used for baseline algorithm
  4. Aug 2009, Open Source Low rate Speech Codec Part 1 – Introduction
  5. Aug 2009, Open Source Low rate Speech Codec Part 2 – Spectral Magnitudes
  6. Sep 2009, Open Source Low rate Speech Codec Part 3 – Phase and Male Speech
  7. Sep 2009, Open Source Low rate Speech Codec Part 4 – Zero Phase Model
  8. Aug 2010, Codec2 V0.1 Alpha Released
  9. September 21 2010 – Slashdotted!
  10. Oct 2010, Codec 2 – Alpha Release and Voicing
  11. Dec 2010, Codec 2 V0.1A
  12. Dec 2010, Testing Codec 2 Algorithms
  13. Jan 2011, Codec 2 Over the Air
  14. Sep 2011, Codec 2 talk at the 2011 ARRL/TAPR Digital Communications Conference in Baltimore, Video and Slides.
  15. Nov 15, 2011, Codec 2 at 1400 bits/s describes the first pass at a 1400 bit/s version, including an introduction to Line Spectrum Pairs (LSPs) and Vector Quantisation (VQ).
  16. 27 Jan 2012, Codec 2 talk at linux.conf.au 2012 (voted best talk of conference!) Video and Slides. This talk has a really easy to understand graphical description of Codec 2, a discussion on patent free codecs, and the strong links between Ham Radio and the Open Source movement. More on lca.conf.au 2012 in this blog post.
  17. 4 March 2012, Jean-Marc Valin has done some great work on joint VQ of the frame to frame pitch and gain differences, documented in his blog post A Pitch-Energy Quantizer for Codec2