Codec 2 Wideband

I’m spending a month or so improving the speech quality of a couple of Codec 2 modes. I have two aims:

  1. Make the 700 bit/s codec sound better, to improve speech quality on low SNR HF channels (beneath 0dB).
  2. Develop a higher quality mode in the 2000 to 3000 bit/s range, that can be used on HF channels with modest SNRs (around 10dB).

I ran some numbers on the new OFDM modem and LDPC codes, and it turns out we can get 3000 bit/s of codec data through a 2000 Hz channel at SNRs down to 7dB.

Now 3000 bit/s is broadband for me – I’ve spent years being very frugal with my bits while I play in low SNR HF land. However it’s still a bit low for Opus which kicks in at 6000 bit/s. I can’t squeeze 6000 bit/s through a 2000 Hz RF channel without higher order QAM constellations which means SNRs approaching 20dB.
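As a rough sanity check on these figures, the Shannon bound gives the lowest SNR at which a given bit rate could theoretically pass through a given bandwidth. This is only a sketch under stated assumptions: an AWGN channel, and SNR measured in the same 2000 Hz as the signal, which may not match how the SNRs quoted above were measured. Real modems and codes need several dB more than the bound, so it says nothing about what a particular modem achieves. The function names are made up for illustration.

```python
import math

def shannon_capacity(bandwidth_hz, snr_db):
    """Shannon upper bound on bit rate for an AWGN channel.

    Assumes snr_db is measured in the same bandwidth as bandwidth_hz.
    """
    snr_linear = 10 ** (snr_db / 10)
    return bandwidth_hz * math.log2(1 + snr_linear)

def min_snr_db(bit_rate, bandwidth_hz):
    """Lowest SNR (dB) at which the Shannon bound permits bit_rate."""
    return 10 * math.log10(2 ** (bit_rate / bandwidth_hz) - 1)

# Theoretical floor for the two bit rates discussed above.
for rate in (3000, 6000):
    snr = min_snr_db(rate, 2000)
    print(f"{rate} bit/s in 2000 Hz: Shannon minimum SNR {snr:.1f} dB")

# Theoretical ceiling for a 2000 Hz channel at 7 dB.
print(f"2000 Hz at 7 dB: at most {shannon_capacity(2000, 7):.0f} bit/s")
```

Doubling the bit rate in a fixed bandwidth raises the theoretical minimum SNR by several dB, and the gap between the bound and a practical modem grows further once higher order constellations are involved.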

So – what can I do with 3000 bit/s and Codec 2? I decided to try wideband(-ish) audio – the sort of audio bandwidth you get from Skype or AM broadcast radio. So I spent a few weeks modifying Codec 2 to work at 16 kHz sample rate, and Jean-Marc gave me a few tips on using DCTs to code the bits.

It’s early days but here are a few samples:

Description Sample
1 Original Speech Listen
2 Codec 2 Model, original amplitudes and phases Listen
3 Synthetic phase, one bit voicing, original amplitudes Listen
4 Synthetic phase, one bit voicing, amplitudes at 1800 bit/s Listen
5 Simulated analog SSB, 300-2600Hz BPF, 10dB SNR Listen

Couple of interesting points:

  • Sample (2) is as good as Codec 2 can do; it’s the unquantised model parameters (harmonic phases and amplitudes). It’s all downhill from here as we quantise or toss away parameters.
  • In (3) I’m using a one bit voicing model, this is very vocoder and shouldn’t work this well. MBE/MELP all say you need mixed excitation. Exploring that conundrum would be a good Masters degree topic.
  • In (3) I can hear the pitch estimator making a few mistakes, e.g. around “sheet” on the female.
  • The extra 4kHz of audio bandwidth doesn’t take many more bits to encode, as the ear has a log frequency response. It’s maybe 20% more bits than 4kHz audio.
  • You can hear some words like “well” are muddy and indistinct in the 1800 bit/s sample (4). This usually means the formants (spectral peaks) are not well defined, so we might be tossing away a little too much information.
  • The clipping on the SSB sample (5) around the words “depth” and “hours” is an artifact of the PathSim AGC. But dat noise. It gets really fatiguing after a while.
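The point above about the ear's log frequency response can be illustrated with a perceptual frequency warping. The sketch below uses the common mel formula, which is an assumption chosen for illustration (other warpings such as Bark or ERB give different numbers), and it does not reproduce how the 20% estimate was arrived at. It only shows that doubling the bandwidth in Hz adds far less than double on a perceptual scale.

```python
import math

def hz_to_mel(f_hz):
    """Common mel-scale formula; one of several perceptual warpings."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

narrowband_mel = hz_to_mel(4000.0)   # 0-4 kHz audio bandwidth
wideband_mel = hz_to_mel(8000.0)     # 0-8 kHz audio bandwidth

# Fraction of extra perceptual "range" added by the upper 4 kHz.
extra = wideband_mel / narrowband_mel - 1.0

print(f"0-4 kHz spans {narrowband_mel:.0f} mel")
print(f"0-8 kHz spans {wideband_mel:.0f} mel")
print(f"The extra 4 kHz adds {100 * extra:.0f}% on the mel scale")
```

If spectral amplitudes were sampled at equal mel spacing, the number of samples (and, loosely, the bits) would grow by this fraction rather than by 100%, which is the same ballpark as the rough figure quoted above.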

Wideband audio is a big paradigm shift for Push To Talk (PTT) radio. You can’t do this with analog radio: 2000 Hz of RF bandwidth, 8000 Hz of audio bandwidth. I’m not aware of any wideband PTT radio systems – at best they all work at 4000 Hz audio bandwidth. DVSI has a wideband codec, but at a much higher bit rate (8000 bit/s).

Current wideband codecs shoot for artifact-free speech (and indeed general audio signals like music). Codec 2 wideband will still have noticeable artifacts, and probably won’t like music. Big question is will end users prefer this over SSB, or say analog FM – at the same SNR? What will 8kHz audio sound like on your HT?

We shall see. I need to spend some time cleaning up the algorithms, chasing down a few bugs, and getting it all into C, but I plan to be testing over the air later this year.

Let me know if you want to help.

Play Along

Unquantised Codec 2 with 16 kHz sample rate:

$ ./c2sim ~/Desktop/c2_hd/speech_orig_16k.wav --Fs 16000 -o - | play -t raw -r 16000 -s -2 -

With “Phase 0” synthetic phase and 1 bit voicing:

$ ./c2sim ~/Desktop/c2_hd/speech_orig_16k.wav --Fs 16000 --phase0 --postfilter -o - | play -t raw -r 16000 -s -2 -

Links

FreeDV 2017 Road Map – this work is part of the “Codec 2 Quality” work package.

Codec 2 page – has an explanation of the way Codec 2 models speech with harmonic amplitudes and phases.

5 thoughts on “Codec 2 Wideband”

  1. This shows a lot of promise. I really struggle with the lower bandwidth digital speech modes, in particular the 700bits/s samples you previously published. However these are really good, easily understandable to me. I agree with your comments about sample 4 being perhaps a little too compressed but I can still understand it easily.

    This is the first time that I’ve preferred the sound of the digital over the analogue in these comparisons. Keep up the good work

    1. Thanks Russell. The idea is to come up with something clearly better than SSB given a channel with a moderate SNR.

  2. This sounds like a great step forward. I played with a version of Codec2 earlier this year, using it in a machine learning project. The aim was to use the compression and parametric representation of vocal audio from the codec to simplify speech learning for a neural network.

    Unfortunately I used an unofficial GitHub version of the Codec (yes, I know I was warned!), but really the issue I had was that too much compression (the 1300 bit/s version) made learning speech sequences harder, which outweighed whatever the reduced data rate should have made easier. The background of what I did is here: http://babble-rnn.consected.com

    Anyway, this new work looks great. Let me know if I can help out. I can scrub up my C skills again!

    I’d like to rerun my previous neural network with it again as well. I think there are also approaches that could be applied from machine learning that could test and optimise different design or tuning choices made in your new work. Just ideas at the moment though.

    Thanks for the great work!
    Phil

    1. Hi Phil,

      I took a look at Babble, that’s an interesting project and good to see Codec 2 was useful for your project. That’s what open source is all about. Maybe a mel-spaced spectral amplitude vector as used in 700C would be a good feature set to play with.

      Would be great to have some help with C coding on a high quality Codec 2 mode. I will email you directly.

      – David

  3. Channel capacity calculation shows you only need 1.16 kHz for 3 kbit/s at 7dB. 2 kHz should achieve a very low error rate.

    I think anyone who heard #2 sans background noise vs typical HF SSB would be blown away. That sounds really good. Certainly far better than the C4FM I routinely hear from repeaters.

    Amazing work.
