LPCNet and Codec 2 Part 2

Since the last post I’ve trained LPCNet using quantised Codec 2 parameters. I’ve also modified the Codec 2 decoder (c2dec) to dump the received features to a disk file so I can use them to synthesise speech using LPCNet.

So now I can decode a Codec 2 bit stream using an LPCNet decoder:

Condition                                              Sample
Original 8 kHz sample                                  Original
Codec 2 1300 encoder and Codec 2 decoder               codec2_1300
Codec 2 1300 encoder and LPCNet decoder                lpcnet_1300
Codec 2 2400 encoder and Codec 2 decoder               codec2_2400
Codec 2 2400 encoder and LPCNet decoder                lpcnet_2400
Codec 2 unquantised 6th order LSP and LPCNet decoder   lpcnet_6lsps

I think it sounds pretty good, especially at 2400 bits/s. Even the lpcnet_1300 decoder sounds better than Codec 2 at 2400. LPCNet is much more natural, without the underwater sound of low rate vocoded speech.

Fewer Features

The Kleijn et al paper showed that NN-based synthesisers can produce high quality speech from the feature sets of legacy vocoders like Codec 2. This implies the legacy vocoders are sending a lot of extra information that is not normally used by the legacy decoder/synthesis algorithms. If high quality speech is not required, it could be argued we are sending “too much” information, and scope exists to reduce the bit rate by coarser quantisation or by sending fewer features.

The lpcnet_6lsps sample uses just 6th order Linear Prediction, in the form of 6 Line Spectral Pairs (LSPs). An LPC order of 10 is common for 8 kHz sampled speech. There is no way 6th order LPC would work (i.e. provide intelligible speech) for any existing LPC based vocoder (CELP, MELP etc). This sample has some odd artefacts, but it is intelligible, and also quite natural sounding compared to a vocoder.
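To make the "6th order LPC gives 6 LSPs" relationship concrete, here is a minimal sketch of the textbook LPC to LSP conversion using NumPy. It is not the Codec 2 or LPCNet code: the function name `lpc_to_lsp`, the tolerance `eps`, and the example filter (three pole pairs at radius 0.9) are all made up for illustration.

```python
import numpy as np

def lpc_to_lsp(a, eps=1e-6):
    """Return LSP frequencies in radians (ascending) for the LPC polynomial
    A(z) = a[0] + a[1] z^-1 + ... + a[p] z^-p, with a[0] = 1."""
    # Extend A(z) by one coefficient so it can be added to its own reverse.
    a_ext = np.append(np.asarray(a, dtype=float), 0.0)
    p_poly = a_ext + a_ext[::-1]   # symmetric polynomial P(z)
    q_poly = a_ext - a_ext[::-1]   # antisymmetric polynomial Q(z)
    # For a stable A(z) the roots of P and Q lie on the unit circle.
    roots = np.concatenate([np.roots(p_poly), np.roots(q_poly)])
    angles = np.angle(roots)
    # Keep one angle per conjugate pair and drop the trivial roots at z = +1, -1.
    keep = (angles > eps) & (angles < np.pi - eps)
    return np.sort(angles[keep])

# Hypothetical stable 6th order filter: three pole pairs at radius 0.9.
poles = 0.9 * np.exp(1j * np.array([0.3, 1.0, 2.0]))
a6 = np.real(np.poly(np.concatenate([poles, poles.conj()])))

lsps = lpc_to_lsp(a6)
# Convert radians to Hz, assuming an 8 kHz sample rate.
print(len(lsps), np.round(lsps * 8000 / (2 * np.pi)))
```

A pth order predictor always yields p LSP frequencies between 0 and pi, so dropping from order 10 to order 6 means 4 fewer parameters to quantise and send per update.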

Curiously, when the synthesis breaks down (e.g. “depth of the well”), it sounds to me like the pitch is halved. I can hear this on both the male and female segments.

Next Steps

I still have much to learn in this area, but the initial results are promising, especially at 2400 bits/s. The main difference between the Codec 2 feature sets at 2400 and 1300 is the update rate, so it would be useful to explore those differences further with a series of tests.
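As a rough illustration of how the update rate drives the bit rate, the sketch below computes bit rate from bits per frame and frame period. The frame sizes used (48 bits every 20 ms for the 2400 mode, 52 bits every 40 ms for the 1300 mode) are an assumption based on the published Codec 2 mode descriptions and are not stated in this post, so check them against the Codec 2 source before relying on them. The helper name `bit_rate` is hypothetical.

```python
def bit_rate(bits_per_frame, frame_ms):
    """Bit rate in bits/s for a codec sending bits_per_frame every frame_ms milliseconds."""
    return bits_per_frame * 1000 / frame_ms

# Assumed frame layouts (not taken from this post): mode -> (bits per frame, frame period in ms)
modes = {"2400": (48, 20), "1300": (52, 40)}

for name, (bits, ms) in modes.items():
    updates_per_second = 1000 / ms
    print(f"Codec 2 {name}: {bits} bits every {ms} ms, "
          f"{updates_per_second:.0f} frames/s, {bit_rate(bits, ms):.0f} bits/s")
```

Under these assumptions the two modes carry a similar number of bits per frame, and most of the rate difference comes from sending a frame half as often.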

This work is not far from being usable “over the air” as a FreeDV mode, and promises a significant jump in quality. Codec 2 1300 is the vocoder used for FreeDV 1600, so it may be possible to develop a drop in replacement for initial testing.

The “best” (in terms of quality at a given bit rate) encoder and feature set (especially under quantisation) is an open research question. It is amazing that it works with the Codec 2 bit stream at all – we should be able to do better with a custom encoder.

3 thoughts on “LPCNet and Codec 2 Part 2”

  1. Hi David,
    congratulations, those samples sound great!
    As the EsHailsat2 will be in operation in about 1 month, I think that is what we need to compete with the D-Star folks. I am ready to try 2400 mode without error correction, or any other mode, over the satellite. I will need to set up the Linux coding / decoding :-).
    Gerhard OE3GBB

    1. Hi Gerhard,
      That sounds like a fine project. Yes – there are a few chunks of C code we would need to write to get such a system working, but many of the building blocks are there.

      1. Hi,
        one big problem will be the drifting LO frequency of standard LNBs, which even makes it hard to copy SSB. So any DV or digital mode will only be possible with special LO LNBs or at the few WebSDRs that have that feature. One way would be automatic tuning of the decoder within a wider passband than we have now with FreeDV. A new and wide field for experiments :-).
