Thomas Kurin is a Masters student at the Institute of Electronics Engineering, part of the Friedrich-Alexander University Erlangen-Nuremberg. Thomas and his supervisor Stefan Erhardt recently contacted me with some exciting news – a version of Codec 2 running at 450 bit/s, including a 16 kHz mode!
I’m very happy that Codec 2 can be used as a starting point for academic research, and thrilled about the progress Thomas has made. It’s also great for me to have a contribution to Codec 2 on such a deep technical level. Thomas has done a lot of work in vector quantisation, an area that I struggle in, and has developed an innovative 16 kHz mode.
Any speech codec running as low at 450 bit/s is pretty special, and this work is also a great starting point for further innovation and quality improvements.
Thomas has sent me some samples, and is working with me to merge his new mode into codec2-dev, and has kindly offered to tell us his story below. Well done Thomas
Here are Thomas (left) and Stefan (right) working together on their innovative 450 bit/s Codec 2 mode at the University of Erlangen:
Now this is bleeding edge, very low bit rate speech coding. Don’t expect Hi-Fi. The use case is channels where no other speech signal can get through. Ham radio operators will know what I mean.
If integrated with FreeDV, a 450 bit/s codec translates to a 2dB improvement in SNR over FreeDV 700D. This would support digital speech over HF radio at lower than -4dB SNR. Weak signal or FreeDV EME anyone?
Listen to the samples, and please add your comments. How does it compare to Codec 2 at 1300 and 700 bit/s? Could you communicate using this mode?
|Sample||1300||700C||450 8kHz||450 16kHz|
The samples I use are deliberately chosen to give codecs a hard time. mmt1 is the worst, it has high level truck background noise added to it.
As a masters student at FAU Erlangen-Nürnberg I have been working on and with Codec 2 and especially with 700C for the last few months. I experimented with the source code and the Vector Quantisation. This lead to the 450 bit/s Codec which shall be shown in the following post.
The 450 Codec builds on the 700C Codec. It consists of 3 major changes:
- The training data used for Vector Quantisation (VQ) was changed to include multiple languages, as for other languages the VQ sometimes performed poorly because certain vectors were missing. The used dataset consisted of approximately 30 minutes English, 30 minutes German, 15 minutes Russian, 15 minutes Chinese and 15 minutes Spanish. This Codebook allowed a switch from the two stage VQ with 9 * 2 = 18 bit per frame to a one stage VQ with just 9 bit per frame for VQ. The audio quality is of course somewhat changed, but still understandable.
- The energy quantisation was changed from 4 bit to 3 bit as no change in quality for the one stage VQ was detectable. This leads to a reduction of 9 + 1 = 10 bit, from a 28 bit frame to a 18 bit frame. This means a 450 bit/s Codec.
- The biggest change was the inclusion of a 16k mode. This means the 8kHz sampled and encoded signal can be converted to a 16kHz sampled signal at the decoder. To achieve this, the codebook was trained with data sampled at 16kHz. Then only the vectors for the 0-4 kHz frequencies (= 8kHz sample rate) are used for encoding. The decoder uses the indices to look up the 16kHz sampled and trained vectors. This works because a vector that is similar between 0-4 kHz mostly also looks similar in 4-8 kHz. This results in a pseudo-wideband format without any additional bits. For some speakers this improves quality, but for some speakers it just creates noise. But that’s the beauty of the system, as the same data can both be decoded to a 8kHz sampled signal, and to a 16kHz sampled signal. Therefore the transmitted signal stays the same and the receiver can choose dependent on quality which mode he wants to listen on.The 16k mode still needs some refinement but its fascinating that it works at all.
Over the next week or so I will patch the changes with David against codec2-dev. As it is a new codec, it shouldn’t change any of the other codecs when patching, as most of the patching will be to include new files.