I’m spending a month or so improving the speech quality of a couple of Codec 2 modes. I have two aims:
- Make the 700 bit/s codec sound better, to improve speech quality on low SNR HF channels (beneath 0dB).
- Develop a higher quality mode in the 2000 to 3000 bit/s range, that can be used on HF channels with modest SNRs (around 10dB)
I ran some numbers on the new OFDM modem and LDPC codes, and turns out we can get 3000 bit/s of codec data through a 2000 Hz channel at down to 7dB SNR.
Now 3000 bit/s is broadband for me – I’ve spent years being very frugal with my bits while I play in low SNR HF land. However it’s still a bit low for Opus which kicks in at 6000 bit/s. I can’t squeeze 6000 bit/s through a 2000 Hz RF channel without higher order QAM constellations which means SNRs approaching 20dB.
So – what can I do with 3000 bit/s and Codec 2? I decided to try wideband(-ish) audio – the sort of audio bandwidth you get from Skype or AM broadcast radio. So I spent a few weeks modifying Codec 2 to work at 16 kHz sample rate, and Jean Marc gave me a few tips on using DCTs to code the bits.
It’s early days but here are a few samples:
|2||Codec 2 Model, orignal amplitudes and phases||Listen|
|3||Synthetic phase, one bit voicing, original amplitudes||Listen|
|4||Synthetic phase, one bit voicing, amplitudes at 1800 bit/s||Listen|
|5||Simulated analog SSB, 300-2600Hz BPF, 10dB SNR||Listen|
Couple of interesting points:
- Sample (2) is as good as Codec 2 can do, its the unquantised model parameters (harmonic phases and amplitudes). It’s all down hill from here as we quantise or toss away parameters.
- In (3) I’m using a one bit voicing model, this is very vocoder and shouldn’t work this well. MBE/MELP all say you need mixed excitation. Exploring that conundrum would be a good Masters degree topic.
- In (3) I can hear the pitch estimator making a few mistakes, e.g. around “sheet” on the female.
- The extra 4kHz of audio bandwidth doesn’t take many more bits to encode, as the ear has a log frequency response. It’s maybe 20% more bits than 4kHz audio.
- You can hear some words like “well” are muddy and indistinct in the 1800 bit/s sample (4). This usually means the formants (spectral) peaks are not well defined, so we might be tossing away a little too much information.
- The clipping on the SSB sample (5) around the words “depth” and “hours” is an artifact of the PathSim AGC. But dat noise. It gets really fatiguing after a while.
Wideband audio is a big paradigm shift for Push To Talk (PTT) radio. You can’t do this with analog radio: 2000 Hz of RF bandwidth, 8000 Hz of audio bandwidth. I’m not aware of any wideband PTT radio systems – they all work at best 4000 Hz audio bandwidth. DVSI has a wideband codec, but at a much higher bit rate (8000 bits/s).
Current wideband codecs shoot for artifact-free speech (and indeed general audio signals like music). Codec 2 wideband will still have noticeable artifacts, and probably won’t like music. Big question is will end users prefer this over SSB, or say analog FM – at the same SNR? What will 8kHz audio sound like on your HT?
We shall see. I need to spend some time cleaning up the algorithms, chasing down a few bugs, and getting it all into C, but I plan to be testing over the air later this year.
Let me know if you want to help.
Unquantised Codec 2 with 16 kHz sample rate:
$ ./c2sim ~/Desktop/c2_hd/speech_orig_16k.wav --Fs 16000 -o - | play -t raw -r 16000 -s -2 -
With “Phase 0” synthetic phase and 1 bit voicing:
$ ./c2sim ~/Desktop/c2_hd/speech_orig_16k.wav --Fs 16000 --phase0 --postfilter -o - | play -t raw -r 16000 -s -2 -
FreeDV 2017 Road Map – this work is part of the “Codec 2 Quality” work package.
Codec 2 page – has an explanation of the way Codec 2 models speech with harmonic amplitudes and phases.