My endeavor to produce a digital voice mode that competes with SSB continues. For a big chunk of 2016 I took a break from this work as I was gainfully employed on a commercial HF modem project. However since December I have once again been working on a 700 bit/s codec. The goal is voice quality roughly the same as the current 1300 bit/s mode. This can then be mated with the coherent PSK modem, and possibly the 4FSK modem for trials over HF channels.
I have diverged somewhat from the prototype I discussed in the last post in this saga. Lots of twists and turns in R&D, and sometimes you just have to forge ahead in one direction leaving other branches unexplored.
|vk5qi 1% BER||Listen||Listen|
Note the 700C samples are a little lower level, an artifact of the post filtering as discussed below. What I listen for is intelligibility, how easy is the same to understand compared to the reference 1300 bit/s samples? Is it muffled? I feel that 700C is roughly the same as 1300. Some samples a little better (cq_ref), some (ve9qrp_10s, mmt1) a little worse. The artifacts and frequency response are different. But close enough for now, and worth testing over air. And hey – it’s half the bit rate!
I threw in a vk5qi sample with 1% random errors, and it’s still usable. No squealing or ear damage, but perhaps more sensitive that 1300 to the same BER. Guess that’s expected, every bit means more at a lower bit rate.
Some of the samples like vk5qi and cq_ref are strongly low pass filtered, others like ve9qrp are “flat” spectrally, with the high frequencies at about the same level as the low frequencies. The spectral flatness doesn’t affect intelligibility much but can upset speech codecs. Might be worth trying some high pass (vk5qi, cq_ref) or low pass (ve9qrp_10s) filtering before encoding.
Below is a block diagram of the signal processing. The resampling step is the key, it converts the time varying number of harmonic amplitudes to fixed number (K=20) of samples. They are sampled using the “mel” scale, which means we take more finely spaced samples at low frequencies, with coarser steps at high frequencies. This matches the log frequency response of the ear. I arrived at K=20 by experiment.
The amplitudes and even the Vector Quantiser (VQ) entries are in dB, which is very nice to work in and matches the ears logarithmic amplitude response. The VQ was trained on just 120 seconds of data from a training database that doesn’t include any of the samples above. More work required on the VQ design and training, but I’m encouraged that it works so well already.
Here is a 3D plot of amplitude in dB against time (300 frames) and the K=20 frequency vectors for hts1a. You can see the signal evolving over time, and the low levels at the high frequency end.
The post filter is another key step. It raises the spectral peaks (formants) an lowers the valleys (anti-formants), greatly improving the speech quality. When the peak/valley ratio is low, the speech takes on a muffled quality. This is an important area for further investigation. Gain normalisation after post filtering is why the 700C samples are lower in level than the 1300 samples. Need some more work here.
The two stage VQ uses 18 bits, energy 4 bits, and pitch 6 bits for a total of 28 bits every 40ms frame. Unvoiced frames are signalled by a zero value in the pitch quantiser removing the need for a voicing bit. It doesn’t use differential in time encoding to make it more robust to bit errors.
Days and days of very careful coding and checks at each development step. It’s so easy to make a mistake or declare victory early. I continually compared the output speech to a few Codec 2 1300 samples to make sure I was in the ball park. This reduced the subjective testing to a manageable load. I used automated testing to compare the reference Octave code to the C code, porting and testing one signal processing module at a time. Sometimes I would just printf rows of vectors from two versions and compare the two, old school but quite effective and spotting the step where the bug crept in.
The Octave simulation code can be driven by the scripts newamp1_batch.m and newamp1_fby.m, in combination with c2sim.
To try the C version of the new mode:
codec2-dev/build_linux/src$ ./c2enc 700C ../../raw/hts1a.raw - | ./c2dec 700C - -| play -t raw -r 8000 -s -2 -
Some thoughts on FEC. A (23,12) Golay code could protect the most significant bits of 1st VQ index, pitch, and energy. The VQ could be organised to tolerate errors in a few of its bits by sorting to make an error jump to a ‘close’ entry. The extra 11 parity bits would cost 1.5dB in SNR, but might let us operate at significantly lower in SNR on a HF channel.
Over the next few weeks we’ll hook up 700C to the FreeDV API, and get it running over the air. Release early and often – lets find out if 700C works in the real world and provides a gain in performance on HF channels over FreeDV 1600. If it looks promising I’d like to do another lap around the 700C algorithm, investigating some of the issues mentioned above.