Yesterday my friend and fellow open source speech coder Jean-Marc Valin (of Speex and Opus fame) emailed me with some exciting news. W. Bastiaan Kleijn and friends have published a paper called “Wavenet based low rate speech coding“. Basically they take bit stream of Codec 2 running at 2400 bit/s, and replace the Codec 2 decoder with the WaveNet deep learning generative model.
What is amazing is the quality – it sounds as good an an 8000 bit/s wideband speech codec! They have generated wideband audio from the narrowband Codec model parameters. Here are the samples – compare “Parametrics WaveNet” to Codec 2!
This is a game changer for low bit rate speech coding.
I’m also happy that Codec 2 has been useful for academic research (Yay open source), and that the MOS scores in the paper show it’s close to MELP at 2400 bit/s. Last year we discovered Codec 2 is better than MELP at 600 bit/s. Not bad for an open source codec written (more or less) by one person.
Now I need to do some reading on Deep Learning!