Jean-Marc Valin has been working on Neural Network (NN) based speech synthesis in his project called LPCNet. It has similar speech quality to WaveNet, but is based on an architecture called WaveRNN, and includes many innovations.
Jean-Marc’s work is aimed at reducing the synthesis CPU load to a level that a modern mobile phone or Raspberry Pi CPU can handle, and he has made significant progress in that direction.
As well as being useful for his research, this code is a working, open source reference system for NN based synthesis projects. He has also written an ICASSP 2019 paper on LPCNet, which explains many of the finer details of NN speech synthesis. These are fantastic resources for other people coming up to speed in NN synthesis. Well done Jean-Marc!
Over the past few weeks Jean-Marc has kindly answered many NN-noob questions from me. I have used the answers to comment his code and add to his README. There are still many aspects of how this code works that I do not understand. However, I can drive his software well enough to synthesise high quality speech:
The first sample was from inside the training database, the second outside.
The network is driven by some speech-codec-like parameters, but it’s not actually running as a speech codec at present. However, it’s a great starting point for high quality speech (de)coding, or indeed speech synthesis.
How I trained
My GTX1060 GPU isn’t quite up to spec, so for training I had to reduce the batch_size to 16, and run for 60 epochs. I used the TSP speech database discussed in the LPCNet README, and followed Jean-Marc’s suggestion of resampling it twice (once at +5% Fs, once at -5% Fs), to get 3x the training data. Training took 14 hours. Synthesis runs 10 times slower than real time on my GPU, however much of this is overhead. If the Keras code were ported to C, it would be close to real time on a modern laptop/phone CPU.
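The resampling trick above can be sketched in a few lines of Python. This is only an illustration of the idea, not the procedure I actually ran: the function names `resample_linear` and `augment` are made up for this example, and plain linear interpolation stands in for a proper resampler (a real tool would use a band-limited filter). The point is simply that the original plus a +5% copy and a -5% copy gives roughly three times as many training samples.

```python
def resample_linear(samples, ratio):
    """Resample a list of samples by linear interpolation.

    ratio > 1 produces more output samples than input, ratio < 1 fewer.
    Hypothetical helper for illustration; not band-limited.
    """
    n_out = round(len(samples) * ratio)
    out = []
    for i in range(n_out):
        pos = i / ratio                      # position in the input signal
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)   # clamp at the last sample
        frac = pos - lo
        out.append((1.0 - frac) * samples[lo] + frac * samples[hi])
    return out


def augment(samples, shift=0.05):
    """Return the original followed by +shift and -shift resampled copies."""
    up = resample_linear(samples, 1.0 + shift)
    down = resample_linear(samples, 1.0 - shift)
    return samples + up + down


# Toy input: a ramp standing in for real speech samples.
speech = [float(i) for i in range(1000)]
training = augment(speech)
print(len(speech), len(training))
```

Played back at the original sample rate, the two extra copies sound slightly slower or faster and lower or higher in pitch, so the network sees more variety from the same recordings.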
Jean-Marc’s blog post on LPCNet, including links to LPCNet source code and his ICASSP 2019 paper.
WaveNet and Codec 2
FFTNet, some good figures that helped me get my head around the idea of sampling a probability distribution.
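For readers new to the idea of sampling a probability distribution, here is a minimal Python sketch of my understanding. It assumes the network emits one score per output level (256 levels here, as you would get with 8 bit mu-law), turns the scores into probabilities with a softmax, then draws one level at random according to those probabilities. The scores below are invented for the example and the helper names are hypothetical; this is not code from LPCNet.

```python
import math
import random


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)                        # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_index(probs, rng):
    """Draw one index at random, weighted by probs."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


rng = random.Random(0)                        # fixed seed so runs are repeatable

# Invented scores: the "network" strongly favours level 128.
logits = [0.0] * 256
logits[128] = 8.0

probs = softmax(logits)
draws = [sample_index(probs, rng) for _ in range(1000)]
print("fraction of draws at level 128:", draws.count(128) / len(draws))
```

Most draws land on the favoured level, but not all of them. Taking the most likely level every time would be a different operation (an argmax) and would throw away that randomness.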