Codec 2 at 450 bit/s

Thomas Kurin is a Masters student at the Institute of Electronics Engineering, part of the Friedrich-Alexander University Erlangen-Nuremberg. Thomas and his supervisor Stefan Erhardt recently contacted me with some exciting news – a version of Codec 2 running at 450 bit/s, including a 16 kHz mode!

I’m very happy that Codec 2 can be used as a starting point for academic research, and thrilled about the progress Thomas has made. It’s also great for me to have a contribution to Codec 2 on such a deep technical level. Thomas has done a lot of work in vector quantisation, an area that I struggle in, and has developed an innovative 16 kHz mode.

Any speech codec running as low as 450 bit/s is pretty special, and this work is also a great starting point for further innovation and quality improvements.

Thomas has sent me some samples, and is working with me to merge his new mode into codec2-dev, and has kindly offered to tell us his story below. Well done Thomas :-)

Here are Thomas (left) and Stefan (right) working together on their innovative 450 bit/s Codec 2 mode at the University of Erlangen:

Samples

Now this is bleeding edge, very low bit rate speech coding. Don’t expect Hi-Fi. The use case is channels where no other speech signal can get through. Ham radio operators will know what I mean.

If integrated with FreeDV, a 450 bit/s codec translates to a 2dB improvement in SNR over FreeDV 700D. This would support digital speech over HF radio at lower than -4dB SNR. Weak signal or FreeDV EME anyone?
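
As a rough check of the 2dB figure, here is a minimal sketch. It assumes the modem needs the same energy per bit at both rates, so the required signal power scales directly with bit rate. The function name `snr_gain_db` is made up for illustration and is not part of Codec 2 or FreeDV.

```python
import math


def snr_gain_db(old_bit_rate, new_bit_rate):
    """SNR saving in dB when the bit rate drops from old to new.

    Assumption: energy per bit is held constant, so the required
    signal power is proportional to the bit rate.
    """
    return 10.0 * math.log10(old_bit_rate / new_bit_rate)


# 700 bit/s codec (as used in FreeDV 700D) versus the new 450 bit/s mode
gain = snr_gain_db(700, 450)
print(f"{gain:.2f} dB")
```

Under that assumption the saving comes out at a little under 2dB, which matches the estimate above.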

Listen to the samples, and please add your comments. How does it compare to Codec 2 at 1300 and 700 bit/s? Could you communicate using this mode?

Sample       1300     700C     450 8kHz   450 16kHz
hts1a        Listen   Listen   Listen     Listen
hts2a        Listen   Listen   Listen     Listen
forig        Listen   Listen   Listen     Listen
ve9qrp_10s   Listen   Listen   Listen     Listen
mmt1         Listen   Listen   Listen     Listen
vk5qi        Listen   Listen   Listen     Listen
cq_ref       Listen   Listen   Listen     Listen

The samples I use are deliberately chosen to give codecs a hard time. mmt1 is the worst: it has high level truck background noise added to it.

Thomas’ Story

As a masters student at FAU Erlangen-Nürnberg I have been working on and with Codec 2, and especially with 700C, for the last few months. I experimented with the source code and the Vector Quantisation. This led to the 450 bit/s Codec, which is described in this post.

The 450 Codec builds on the 700C Codec. It consists of 3 major changes:

  1. The training data used for Vector Quantisation (VQ) was changed to include multiple languages, as for other languages the VQ sometimes performed poorly because certain vectors were missing. The dataset used consisted of approximately 30 minutes of English, 30 minutes of German, 15 minutes of Russian, 15 minutes of Chinese and 15 minutes of Spanish. This Codebook allowed a switch from the two stage VQ with 9 * 2 = 18 bit per frame to a one stage VQ with just 9 bit per frame. The audio quality is of course somewhat changed, but still understandable.
  2. The energy quantisation was changed from 4 bit to 3 bit, as no change in quality was detectable with the one stage VQ. Together with the VQ change this leads to a reduction of 9 + 1 = 10 bit, from a 28 bit frame to an 18 bit frame. At the same frame rate, this means a 450 bit/s Codec.
  3. The biggest change was the inclusion of a 16k mode. This means the 8kHz sampled and encoded signal can be converted to a 16kHz sampled signal at the decoder. To achieve this, the codebook was trained with data sampled at 16kHz. Then only the vectors for the 0-4 kHz frequencies (= 8kHz sample rate) are used for encoding. The decoder uses the indices to look up the 16kHz sampled and trained vectors. This works because vectors that are similar between 0-4 kHz are mostly also similar between 4-8 kHz. This results in a pseudo-wideband format without any additional bits. For some speakers this improves quality, but for others it just creates noise. But that’s the beauty of the system: the same data can be decoded both to an 8kHz sampled signal and to a 16kHz sampled signal. Therefore the transmitted signal stays the same and the receiver can choose, depending on quality, which mode to listen to. The 16k mode still needs some refinement, but it’s fascinating that it works at all.
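
The three changes above can be illustrated with a small toy sketch. It is not the actual Codec 2 code: the codebook is random rather than trained, the vector lengths `K_NARROW` and `K_WIDE` are hypothetical, and the 40 ms frame period is an assumption (it is what 28 bit per frame at 700 bit/s implies). The point is only to show the frame bit budget, and how one 9 bit index can be decoded either as a narrowband or as a wideband vector.

```python
import numpy as np

FRAME_PERIOD_MS = 40  # assumption: implied by 28 bit/frame at 700 bit/s


def bit_rate(bits_per_frame, frame_period_ms=FRAME_PERIOD_MS):
    """Bit rate in bit/s for a given number of bits per frame."""
    return bits_per_frame * 1000 / frame_period_ms


# Frame bit budget as described in the list above
BITS_700C = 28
VQ_BITS_SAVED = 9      # second 9 bit VQ stage dropped
ENERGY_BITS_SAVED = 1  # energy quantiser reduced from 4 bit to 3 bit
BITS_450 = BITS_700C - VQ_BITS_SAVED - ENERGY_BITS_SAVED

# Toy single stage codebook with 2**9 entries (random here, trained in reality)
rng = np.random.default_rng(0)
N_ENTRIES = 2 ** 9
K_NARROW = 20  # hypothetical number of spectral samples covering 0-4 kHz
K_WIDE = 40    # hypothetical number of spectral samples covering 0-8 kHz
codebook_wide = rng.normal(size=(N_ENTRIES, K_WIDE))
# The encoder only ever sees the 0-4 kHz part of each wideband vector
codebook_narrow = codebook_wide[:, :K_NARROW]


def encode(target_narrow):
    """Return the index of the entry closest to the 0-4 kHz target vector."""
    errors = np.sum((codebook_narrow - target_narrow) ** 2, axis=1)
    return int(np.argmin(errors))


def decode(index, wideband=False):
    """Look up the same index in the narrowband or the wideband codebook."""
    codebook = codebook_wide if wideband else codebook_narrow
    return codebook[index]


# One transmitted index, two ways to decode it at the receiver
target = codebook_narrow[123] + 0.01 * rng.normal(size=K_NARROW)
index = encode(target)
narrow = decode(index)
wide = decode(index, wideband=True)
print(bit_rate(BITS_700C), bit_rate(BITS_450))
print(index, narrow.shape, wide.shape)
```

Because the choice between the two lookups is made entirely at the decoder, the transmitted bits are identical in both modes, which is why the 16k mode costs no extra bits.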

Over the next week or so I will patch the changes with David against codec2-dev. As it is a new codec, it shouldn’t change any of the other codecs when patching, as most of the patching will be to include new files.

Reading Further

Codec 2
Codec 2 700C

16 thoughts on “Codec 2 at 450 bit/s”

  1. Wow, this is great. The power of open source. People building upon the work of others. To me this sounds better than any other vocoder out there at such a low bitrate.

  2. Well, that is impressive. Okay, we won’t win awards in audiophile magazines for audio clarity, but we are within striking distance of being able to carry out a QSO over 300 baud acoustic couplers.

    I don’t think even MELP can match this. Mighty fine work!

  3. When I started listening to the samples I nearly gave up right away, calling the result of only academic use. But then I decided to keep surfing through more of the samples. Well, what should I say? Results are really mixed, from nearly unintelligible to nearly as good as 1300. It depends so much on the input audio, I guess. Maybe a decent ‘preconditioning’ kind of filter might improve results, by removing noise, hum and other non-voice parts of the audio before encoding…

    1. I was trying to find some background to my above mentioned concept of pre-filtering speech. The most relevant paper I came across is this one: https://pdfs.semanticscholar.org/41ae/15ea6034edcb0af9b6c7033e398a934d47b6.pdf

      Seems like there is potential in merging noise filtering with vocoding, which makes me ask: is this actually already being done? :-) The more I dig, the more questions come up…

      In any case I’m enjoying reading this kind of progress reports, it keeps me thinking and reasoning, maybe it helps rather than annoys my fellow commenters 😉

      1. Yes the usual approach for low bit rate speech coding is to remove non-speech signals like background noise. Also a good idea to equalise to remove non-essential acoustic features like highpass or lowpass spectral slope.

        1. Somehow with the truck sample it’s not working as nicely as it did with AMBE+2 in the Inmarsat network. There I did lots of commissioning and troubleshooting inside various unfortunate locations on board aircraft…
          On the other hand that was at about 2400 bit/s, not 450 or 700. Also, I didn’t have a chance to check it against Codec 2, as I have been out of that job for nearly 5 years.

          I wish I could contribute something more substantial than subjective impressions from the past 😉

          1. Actually that sample came from Inmarsat – I participated in a codec trial for a new Inmarsat service in the early 1990s (probably what became Mini-M), and this was one of the samples they sent us.

  4. That’s quite interesting, I was mainly on Swift broadband and the earlier swift64, mini M was already some relic of a past time 😉

    The biggest problem used to be chaining of codecs. ADPCM of DECT fame in particular made any of the MBE based codecs fail miserably, be it AMBE or IMBE, Iridium or Inmarsat.
    Thus we did lots of testing with a normal ISDN deskphone, connected directly to an S0 bus on the Satcom box.
    We always found that G.711 from the phone, transcoded into AMBE+2 in the HSD, sent via Inmarsat’s I4 network to the ground, recoded into G.711 and finally terminated in a terrestrial ISDN network, sounded much better than G.711 routed directly (without any conversions, as far as we expected) over the I3 constellation to the same ISDN ground network. Apart from bit loss over the space segment, the only other difference we could imagine, also given AMBE’s noise free speech, was some really well working “noise voice separator”; we always attributed this to the codec. The same was absolutely not true for the older MBE family codecs, as found in Inmarsat classic voice services (also called safety services) and Iridium.

    Only much later did we find another voice service that worked satisfactorily to my ears: GSM encoded voice over QoS managed IP connections. There the voice clarity and overall perception was largely dependent on the handset. We used smartphones with SIP clients as well as Cisco sccp wlan phones. The best sound came from Cisco, then iPhone and Samsung flagships, next wired LAN handsets; the worst were iPod touch and cheaper Androids.

    Another monster post. Hopefully with some helpful information. :-)

  5. Try with voice cw (audio morse) ?
    I’m thinking better intelligibility maybe, SNR … ?
    I don’t know, that is a question only.

    1. You could encode the information into a morse code-like signal by hand – not representing letters, but binary ones and zeros. A normal (?) operator can do 20 wpm, that is approximately 500 bit/minute or 8.33 bit/s. That implies a factor of ~60 times slower than real-time.
      In other words: For one second of speech you need to send 1 minute of Morse code… Exhausting…

      A better idea would be to use PSK125: The transmission would “only” require about 3.6 times real-time: Talk for 10 seconds and transmit/receive for 36 seconds (or wait for 26 seconds if starting transmission while still talking)…

      With bit rates as low as 450 bit/s a great new playground pops up – I am curious for new DV applications by ham radio enthusiasts all over the world!

  6. This is really cool. I wonder how hard it would be to apply the Wavenet machine learning work to this to help clean up the audio.

  7. The code can now be found in the svn repository. Feel free to test and report back here: Is the new mode intelligible? Or is it “too much”?

    Please note that Thomas added plosive detection and coding (the sound samples in this post were still without it) – without spending another bit…

    We are now working on quality improvement, doing tests with a tracking equalizer and the postfilter for sharpening the formants…
