Low Order LPC and Bandpass Filtering

I’ve been working on the Linear Predictive Coding (LPC) modeling used in the Codec 2 700 bit/s mode to see if I can improve the speech quality. Given this mode was developed in just a few days I felt it was time to revisit it for some tuning.

LPC fits a filter to the speech spectrum. We update the LPC model every 40ms for Codec 2 at 700 bit/s (10 or 20ms for the higher rate modes).

Speech codecs typically use a 10th order LPC model. This means the filter has 10 coefficients, and every 40ms we have to send them to the decoder over the channel. For the higher bit rate modes I use about 37 bits/frame for this information, which is the majority of the bit rate.
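To put those numbers side by side, here is a rough bit budget. It is plain arithmetic; the function name is just for illustration and is not part of Codec 2.

```python
def bits_per_frame(bit_rate_bps, frame_ms):
    """Total bits available in one frame at a given bit rate."""
    return bit_rate_bps * frame_ms / 1000


# 700 bit/s mode, LPC model updated every 40ms
total_bits_700 = bits_per_frame(700, 40)

# Figure quoted above for the LPC information in the higher rate modes
lpc_bits_higher_modes = 37

print(f"700 bit/s, 40ms frames: {total_bits_700:.0f} bits/frame for everything")
print(f"10th order LPC in the higher rate modes: about {lpc_bits_higher_modes} bits/frame")
print(f"shortfall: {lpc_bits_higher_modes - total_bits_700:.0f} bits")
```

So the LPC bits used in the higher rate modes would not fit in a 700 bit/s frame even if nothing else was sent.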

However, I discovered I can get away with a 6th order model if the input speech is filtered the right way. This has the potential to significantly reduce the bit rate.

The Ear

Our ear perceives speech based on the frequency of peaks in the speech spectrum. When the peaks in the speech spectrum are indistinct, we have trouble understanding what is being said. The speech starts to sound muddy. With analog radio like SSB (or in a crowded room), the troughs between the peaks fill with noise as the SNR degrades, and eventually we can’t understand what’s being said.

The LPC model is pretty good at representing peaks in the speech spectrum. With a 10th order LPC model (p=10) you get 10 poles. Each pair of poles can represent one peak, so with p=10 you get up to 5 independent peaks, with p=6, just 3.
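The "one pole pair per peak" rule can be checked with a small sketch. This is not Codec 2 code: the peak frequencies and the pole radius of 0.95 are made-up values, chosen only so the peaks are well separated. It builds the all-pole filter 1/A(z) from conjugate pole pairs and counts the peaks in its spectrum.

```python
import cmath
import math

FS = 8000  # sample rate in Hz, as used in the sox command lines


def poles_to_lpc(peaks_hz, radius=0.95, fs=FS):
    """Build A(z) coefficients using one conjugate pole pair per peak."""
    a = [1.0]
    for f in peaks_hz:
        theta = 2 * math.pi * f / fs
        # (1 - r e^{j theta} z^-1)(1 - r e^{-j theta} z^-1)
        section = [1.0, -2 * radius * math.cos(theta), radius * radius]
        out = [0.0] * (len(a) + 2)
        for i, ai in enumerate(a):          # polynomial multiplication
            for j, sj in enumerate(section):
                out[i + j] += ai * sj
        a = out
    return a


def lpc_spectrum_db(a, n_points=512, fs=FS):
    """Magnitude of 1/A(z) in dB on a grid from 0 Hz up to fs/2."""
    mags = []
    for k in range(n_points):
        w = math.pi * k / n_points
        A = sum(ak * cmath.exp(-1j * w * i) for i, ak in enumerate(a))
        mags.append(-20 * math.log10(abs(A)))
    return mags


def count_peaks(mags):
    """Count strict local maxima, ignoring the two end points."""
    return sum(1 for i in range(1, len(mags) - 1)
               if mags[i - 1] < mags[i] > mags[i + 1])


a6 = poles_to_lpc([500, 1500, 2500])               # 3 pole pairs
a10 = poles_to_lpc([400, 1000, 1700, 2500, 3300])  # 5 pole pairs
print("p =", len(a6) - 1, "peaks =", count_peaks(lpc_spectrum_db(a6)))
print("p =", len(a10) - 1, "peaks =", count_peaks(lpc_spectrum_db(a10)))
```

If the pole pairs are moved close together or the radius is reduced, neighbouring peaks merge, which is why the text says "up to" 5 peaks.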

I discovered that LPC has some problems if the speech spectrum has big differences between the low and high frequency energy. To find the LPC coefficients, we use an algorithm that minimises the mean square error. It tends to “throw poles” at the highest energy part of the signal (frequently near DC), while ignoring the still important, lower energy peaks at higher frequencies above 1000Hz. So there is a mismatch between the way LPC analysis works and how our ears perceive speech.
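A toy example shows the effect. The sketch below assumes the usual autocorrelation method solved with the Levinson-Durbin recursion; the post does not say which solver c2sim uses, so treat that as an assumption. The "speech" is an idealised autocorrelation of two tones, a strong one at 200Hz and one 40dB weaker at 2000Hz (both values invented), and the model only has one pole pair (p=2) to spend.

```python
import math

FS = 8000  # sample rate in Hz


def levinson_durbin(r, order):
    """Solve the LPC normal equations from autocorrelation r[0..order].

    Returns (a, err) where a[0] == 1, A(z) = sum a[i] z^-i, and err is the
    remaining mean square prediction error.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a, err


def two_tone_autocorr(f_strong, f_weak, weak_db, n_lags, fs=FS):
    """Ideal autocorrelation of two tones, the second weak_db dB lower."""
    p_weak = 10 ** (-weak_db / 10)
    return [math.cos(2 * math.pi * f_strong * k / fs)
            + p_weak * math.cos(2 * math.pi * f_weak * k / fs)
            for k in range(n_lags)]


r = two_tone_autocorr(200, 2000, 40, 3)
a, err = levinson_durbin(r, 2)

# Angle of the single conjugate pole pair of 1/A(z), converted to Hz
pole_hz = math.acos(-a[1] / (2 * math.sqrt(a[2]))) * FS / (2 * math.pi)
print(f"pole pair lands near {pole_hz:.0f} Hz")
```

The least squares fit puts its only pole pair on the strong low frequency tone, because that is where almost all of the error energy is. The weak tone at 2000Hz barely moves the result.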

For example I found that samples like hts1a and ve9qrp code quite well, but cq_ref and kristoff struggle. The former have just 12dB between the LF and HF parts of the speech spectrum, the latter 40dB. This may be due to microphones, input filtering, or analog shaping.
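One way to put a number on the LF/HF difference is to compare the energy below and above some split frequency. This is only a sketch of that idea, not how the figures above were measured: the 1000Hz split, the 320 sample frame, and the function names are all assumptions, and the plain DFT is slow but needs nothing outside the standard library.

```python
import cmath
import math


def band_energy(frame, f_lo, f_hi, fs=8000):
    """Sum of |X[k]|^2 over DFT bins with f_lo <= frequency < f_hi."""
    n = len(frame)
    energy = 0.0
    for k in range(n // 2 + 1):
        if f_lo <= k * fs / n < f_hi:
            X = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(frame))
            energy += abs(X) ** 2
    return energy


def lf_hf_difference_db(frame, split_hz=1000.0, fs=8000):
    """Energy below split_hz relative to energy at and above it, in dB."""
    lf = band_energy(frame, 0.0, split_hz, fs)
    hf = band_energy(frame, split_hz, fs / 2 + 1, fs)
    return 10 * math.log10(lf / hf)


# Synthetic test frame: a 250Hz tone plus a 2000Hz tone at 1/100 the amplitude
frame = [math.sin(2 * math.pi * 250 * i / 8000)
         + 0.01 * math.sin(2 * math.pi * 2000 * i / 8000)
         for i in range(320)]
print(f"LF/HF difference: {lf_hf_difference_db(frame):.1f} dB")
```

For real speech you would average this over many frames, and probably apply a window to each frame first.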

Another problem with using an unconventionally low LPC order like p=6 is that the model “runs out of poles”. Some speech signals may have 4 or 5 peaks, so the poor LPC model gets all confused and tries to reach a compromise that just sounds bad.

My Experiments

I messed around with a bunch of band pass filters that I applied to the speech samples before LPC modeling. These filters whip the speech signal into a shape that the LPC model can work with. I ran various samples (hts1a, hts2a, cq_ref, ve9qrp_10s, kristoff, mmt1, morig, forig, x200_ext, vk5qi) through them to come up with the best compromise for the 700 bit/s mode.

Here is what p=6 LPC modeling sounds like with no band pass filter. Here is a sample of p=6 LPC modeling with a 300 to 2600Hz input band pass filter with very sharp edges.

Even though the latter sample is band limited, it is easier to understand as the LPC model is doing a better job of clearly representing those peaks.

Filter Implementation

After some experimentation with sox I settled on two different filter types: a sox “bandpass 1000 2000” worked on some samples, whereas on others with more low frequency content “bandpass 1500 2000” sounded better. Some helpful discussions with Glen VK1XX suggested that a two band AGC was common in broadcast audio pre-processing, and might be useful here.

However, through a process of frustrated experimentation (I was stuck on cq_ref for a day) I found that a very sharp skirted filter between 300 and 2600Hz did a pretty good job. Like p=6 LPC, a 2600Hz cut off is quite uncommon for speech coding, but SSB users will find it strangely familiar…
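For anyone who wants to try a similar filter outside sox, here is a minimal sketch of a sharp 300 to 2600Hz band pass as a windowed-sinc FIR. It is not the sox implementation: the Blackman window and the 255 taps are assumptions picked to give steep skirts at an 8kHz sample rate.

```python
import cmath
import math


def lowpass_taps(cutoff_hz, n_taps, fs):
    """Blackman-windowed sinc low pass FIR (n_taps should be odd)."""
    mid = (n_taps - 1) / 2
    fc = cutoff_hz / fs  # cutoff in cycles per sample
    taps = []
    for n in range(n_taps):
        t = n - mid
        ideal = 2 * fc if t == 0 else math.sin(2 * math.pi * fc * t) / (math.pi * t)
        window = (0.42 - 0.5 * math.cos(2 * math.pi * n / (n_taps - 1))
                  + 0.08 * math.cos(4 * math.pi * n / (n_taps - 1)))
        taps.append(ideal * window)
    return taps


def bandpass_taps(f_lo, f_hi, n_taps=255, fs=8000):
    """Band pass = low pass at f_hi minus low pass at f_lo."""
    hi = lowpass_taps(f_hi, n_taps, fs)
    lo = lowpass_taps(f_lo, n_taps, fs)
    return [h - l for h, l in zip(hi, lo)]


def gain_db(taps, f_hz, fs=8000):
    """Filter gain in dB at a single frequency."""
    w = 2 * math.pi * f_hz / fs
    H = sum(h * cmath.exp(-1j * w * n) for n, h in enumerate(taps))
    return 20 * math.log10(max(abs(H), 1e-12))


def fir_filter(taps, samples):
    """Direct form FIR filtering (slow, but fine for short test signals)."""
    return [sum(h * samples[i - j] for j, h in enumerate(taps) if i - j >= 0)
            for i in range(len(samples))]


taps = bandpass_taps(300, 2600)
for f in (100, 1000, 3200):
    print(f"{f:5d} Hz: {gain_db(taps, f):7.1f} dB")
```

More taps give sharper skirts at the cost of more delay: 255 taps at 8kHz is about 16ms of filter delay.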

Note that for the initial version of the 700 bit/s mode (currently in use in FreeDV 700) I have a different band pass filter design I chose more or less at random on the day that sounds like this with p=6 LPC. This filter now appears to be a bit too severe.

Plots

Here is a little chunk of speech from hts1a:

Below are the original (red) and p=6 LPC models (green line) without and with a sox “bandpass 1000 2000” filter applied. If the LPC model were perfect, green and red would be superimposed. Open each image in a new browser tab then jump back and forth. See how the two peaks around 550 and 1100Hz are better defined with the bandpass filter? The error (purple) in the 500 – 1000 Hz region is much reduced, better defining the “twin peaks” for our long suffering ears.

Here are three spectrograms of me saying “D G R”. The dark lines represent the spectral peaks we use to perceive the speech. In the “no BPF” case you can see the spectral peaks between 2.2 and 2.3 seconds are all blurred together. That’s pretty much what it sounds like too – muddy and indistinct.

Note that compared to the original, the p=6 BPF spectrogram is missing the pitch fundamental (dark line near 0 Hz), and a high frequency peak at around 2.5kHz is indistinct. Turns out neither of these matter much for intelligibility – they just make the speech sound band limited.

Next Steps

OK, so over the last few weeks I’ve spent some time looking at the effects of microphone placement, and input filtering on p=6 LPC models. Now time to look at quantisation of the 700 mode parameters then try it again over the air and see if the speech quality is improved. To improve performance in the presence of bit errors I’d also like to get the trellis based decoding into a real world usable form. When the entire FreeDV 700 mode (codec, modem, error handling) is working OK compared to SSB, time to look at porting to the SM1000.

Command Line Magic

I’m working with the c2sim program, which lets me explore Codec 2 in a partially quantised or incomplete state. I pipe audio in and out between various sox stages.

Note these simulations sound a lot better than the final Codec 2 at 700 bit/s as nothing else is quantised/decimated, e.g. it’s all at a 10ms frame rate with original phases. It’s a convenient way to isolate the LPC modeling step with as much fidelity as we can.

If you want to sing along here are a couple of sample command lines. Feel free to ask me any questions:

sox -r 8000 -s -2 ../../raw/hts1a.raw -r 8000 -s -2 -t raw - bandpass 1000 2000 | ./c2sim - --lpc 6 --lpcpf -o - | play -t raw -r 8000 -s -2 -
 
sox -r 8000 -s -2 ../../raw/cq_ref.raw -r 8000 -s -2 -t raw - sinc 300 sinc -2600 | ./c2sim - --lpc 6 --lpcpf -o - | play -t raw -r 8000 -s -2 -

Reading Further

Open Source Low Rate Speech Codec Part 2
LPC Post Filter for Codec 2
