VQ Index Optimisation

Following on from the subset vector quantiser [1], I’ve been working on improving speech quality in the presence of bit errors without the use of Forward Error Correction (FEC). I’ve tried two techniques:

  1. Optimisation of VQ indexes [2][3]
  2. Trellis based maximum likelihood decoding of a sequence of VQ indexes [4]

Digital speech and FEC aren’t a great mix. Speech codecs often work fine with a few percent BER, which is unheard of for data applications that need zero bit errors. This is because humans can extract intelligible information from payload speech data with errors. On the channels of interest we like to play at low SNRs and high BERs (around 10%). However, not many FEC codes work at 10% BER, and the ones that do require large block sizes, introducing latency which is problematic for Push To Talk (PTT) speech. So I’m interested in exploring alternatives to FEC that allow gradual degradation of speech in channels with high bit error rates.

Vector Quantisation

Here is a plot that shows a 2 dimensional Vector Quantiser (VQ) in action. The cloud of small dots is the source data we wish to encode. Each dot represents 2 source data samples, plotted for convenience on a 2D plot. The circles are the VQ entries, which are trained to approximate the source data. We arrange the VQ entries in a table, each with a unique index. To encode a source data pair, we find the nearest VQ entry, and send the index of that entry over the channel. At the decoder, we reconstruct the pair using a simple table look up.
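A minimal numpy sketch of that encode/decode path (the names here are mine for illustration, not from the Codec 2 sources):

import numpy as np

def vq_encode(x, codebook):
    # index of the nearest codebook entry (squared Euclidean distance)
    dist = np.sum((codebook - x)**2, axis=1)
    return int(np.argmin(dist))

def vq_decode(index, codebook):
    # decoding is just a table look up
    return codebook[index]

# toy example: an 8 entry (3 bit) VQ of 2D vectors
rng = np.random.default_rng(0)
codebook = rng.normal(size=(8, 2))
x = np.array([0.5, -0.2])
index = vq_encode(x, codebook)      # 3 bit index sent over the channel
x_hat = vq_decode(index, codebook)  # reconstructed pair at the decoder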

When we get a bit error in that index, we tend to jump to some random VQ entry, which can introduce a large error in the decoded pair. Index optimisation re-arranges the VQ indexes so a bit error in the received index will result in jumping to a nearby VQ entry, minimising the effect of the error. This is shown by the straight lines in the plot above. Each line shows the decoded VQ value for a single bit error. They aren’t super close (the nearest neighbours), as I guess it’s hard to simultaneously optimise all VQ entries for all single bit errors.

For my experiments I used a 16th order VQ, so instead of a pair, each VQ entry encodes 16 samples. These 16 samples represent the speech spectrum. It’s hard to plot a 16 dimensional value, but the same ideas apply. We can optimise the VQ indexes so that a single bit error will lead to a decoded value “close” to the desired value. This gives us some robustness to bit errors for free. Unlike Forward Error Correction, no additional bits need to be sent. Also unlike FEC – the errors aren’t corrected – just masked to a certain extent (the decoded speech sounds a bit better).
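To make that concrete, here is a sketch of the cost function being minimised – my paraphrase of the pseudo-Gray coding idea in [3], not the actual code in [5]. It measures the expected distortion from single bit errors, given an assignment of channel indexes to codebook entries:

import numpy as np

def index_cost(codebook, perm, p_use, nbits):
    # perm[i] is the channel index assigned to codebook entry i,
    # p_use[i] is how often the encoder selects entry i
    inv = np.argsort(perm)                 # channel index -> codebook entry
    cost = 0.0
    for i in range(len(codebook)):
        for b in range(nbits):
            rx = perm[i] ^ (1 << b)        # one bit error in the index
            j = inv[rx]                    # entry we decode instead
            cost += p_use[i] * np.sum((codebook[i] - codebook[j])**2)
    return cost

The binary switch algorithm then repeatedly tests pairwise swaps of indexes, keeping any swap that lowers this cost. The codebook entries themselves never change, which is why the robustness comes for free.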

The Trellis system is described in detail in [4]. This looks at a sequence of received VQ indexes, and the trajectory they take on the figure above. It makes the assumption that the speech signal changes fairly slowly in time, so the trajectory we trace on the figure above tends to be small jumps across the VQ “space”. A large jump means a possible bit error. Simultaneously, we look at the likelihood of receiving each vector index. In a noisy channel, we are more sure of some bits, and less sure of others. These are all arranged on a 2D “trellis” which we search to find the most likely path. This tends to work quite well when the vectors are sampled at a high rate (10 or 20ms), less so when we sample them less often (say 40ms).
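Here’s a sketch of the search in Python – a standard Viterbi recursion, simplified from the scheme in [4]. The real thing uses trained index transition probabilities; as a stand-in I’ve used the distance between decoded vectors, weighted by a tuning constant w, so treat this as illustrative only:

import numpy as np

def trellis_decode(p1, codebook, nbits, w=1.0):
    # p1[t, b]: estimated probability that bit b of frame t's index is 1
    nframes = p1.shape[0]
    n = 2**nbits                          # number of states; fine for small VQs
    bits = (np.arange(n)[:, None] >> np.arange(nbits)) & 1
    # log likelihood of each candidate index at each frame, from the soft bits
    obs = np.log(np.where(bits[None, :, :] == 1,
                          p1[:, None, :], 1.0 - p1[:, None, :]) + 1e-12).sum(axis=2)
    # transition score: big jumps across the VQ space are unlikely
    d2 = ((codebook[:, None, :] - codebook[None, :, :])**2).sum(axis=2)
    score, back = obs[0].copy(), np.zeros((nframes, n), dtype=int)
    for t in range(1, nframes):
        cand = score[:, None] - w * d2    # cand[i, j]: arrive at state j from i
        back[t] = np.argmax(cand, axis=0)
        score = cand[back[t], np.arange(n)] + obs[t]
    path = [int(np.argmax(score))]        # trace back the most likely path
    for t in range(nframes - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]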

The details of the two algorithms tested are in the GitHub PRs [5][6].

Results

This plot compares a few methods. The x axis is normalised SNR, and the y-axis is spectral distortion. Results were an average of 30 seconds of speech on an AWGN channel.

  1. No errors (blue, bottom) is the basic VQ with no channel errors. Compared to the input samples, it has an average distortion of 3dB. That gets you rough but usable “communications quality” speech.
  2. Vanilla AWGN (green) is the spectral distortion as we lower the Eb/No (SNR), and bit errors gradually creep in. FreeDV 700C [8] uses no FEC so would respond like this to bit errors.
  3. The red curve is similar to FreeDV 700D/E – we are using a rate 0.5 LDPC code to protect the speech data. This works well until it doesn’t – you hit a threshold in SNR and the code falls over, introducing more errors than it corrects.
  4. The upper blue curve is the index optimised VQ (using the binary switch algorithm). This works pretty well (compare to green) at zero cost – we’ve just shuffled a few indexes in the VQ table.
  5. Black is when we combine the FEC with index optimisation. Even better at high Eb/No, and the “knee” where the FEC falls over is much less obvious than “red”.
  6. Cyan is the Trellis decoder, quite a good result at low Eb/No, but a “long tail” – it makes a few mistakes even at high Eb/No.

Here are some speech samples showing the index optimisation and trellis routines in action. They were generated at an Eb/No = 1dB (6% BER) operating point on the plot above. The Codec 2 700C mode is provided as a control. In these tests the 700C mode quantises the speech spectrum with 22 bits/frame at a 40ms frame rate, the “600” mode with just 12 bits/frame at a 30ms frame rate. I’ve just applied the index optimisation and trellis decoding to the 600 mode.

Mode  BER   VQ    Decoder  Sample1  Sample2  Sample3
700C  0.00  Orig  Normal   Listen   Listen   Listen
700C  0.06  Orig  Normal   Listen   Listen   Listen
600   0.00  Orig  Normal   Listen   Listen   Listen
600   0.06  Orig  Normal   Listen   Listen   Listen
600   0.06  Opt   Normal   Listen   Listen   Listen
600   0.06  Opt   Trellis  Listen   Listen   Listen

The index optimisation seems effective, especially on samples 1 and 2. The improvements are less noticeable on the longer sample3, although the longer sample makes it harder to do a quick A/B test. The Trellis scheme is even better at reducing the pops and clicks, but I feel on sample1 at least it tends to “smooth” the speech, so it becomes a little less intelligible.

Discussion

In this experiment I compared the spectral distortion of two non-redundant techniques to FEC based protection on a Spectral Distortion versus Eb/No scale.

While experimenting with this work I found an interesting trade off between update rate and error protection. With a higher update rate, we notice errors less. Unfortunately a higher update rate increases the bit rate too.

The non-FEC techniques have a gradual “fuzzy” degradation versus a knee. This is quite useful for digital speech systems, e.g. at the bottom of a fade we might get “readability 3” speech [7], that bounces back up to “readability 5” after the fade. The ear will put it all back together using “Brain FEC”. With FEC based schemes you get readability 5 – R2D2 noises in the fade – then readability 5 again.

So non-FEC schemes have some potential to lower the “minimum SNR” the voice link can handle.

It’s clear that index optimisation does help intelligibility, with or without FEC.

At low Eb/No, the PER is 50%! So every 2nd 12-bit vector index has at least 1 bit error, and yet we are getting (readability 3 – readable with difficulty) speech. However the rule of thumb I have developed experimentally still applies – you still need PER=0.1/BER=0.01 for “readability 5” speech.
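For reference: with independent bit errors the packet (index) error rate is PER = 1 - (1 - BER)^N, so for our 12 bit indexes at BER = 0.06, PER = 1 - 0.94^12 ≈ 0.52 – the 50% figure above.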

There are some use cases where not using FEC might be useful. A rate 0.5 FEC system requires twice the RF bandwidth, and modem synchronisation is harder as each symbol has half the energy. It introduces latency as the FEC codewords for a decent code are larger than the vocoder frame size. When you lose a FEC codeword, you tend to lose a large chunk of speech. Frame sync is slower, as it happens at the FEC codeword rate, making recovery after a deep fade or PTT sync slower. So having a non-FEC alternative in our toolkit for low SNR digital speech is useful.

On a personal note I quite enjoyed this project. It was tractable, and I managed to get results in a reasonable amount of time without falling down too many R&D rabbit holes. It was also fun and rewarding to come to grips with at least some of the math in [3]. I am quite pleased that (after the usual fight with the concept of coding gain) I managed to reconcile FEC and non-FEC results on a single plot that roughly compares to the perceptual quality of speech.

Ideas for further work:

  1. Test across a larger number of samples to get a better feel for the effectiveness of these algorithms.
  2. The index optimisation could be applied to Codec 2 700C (and hence FreeDV 700C/D/E). This would however break compatibility.
  3. More work with the trellis scheme might be useful. In a general sense, this is a model that takes into account various probabilities, e.g. how likely is it that we received a certain codeword? We could also include source probability information – e.g. for a certain speaker (or globally across all speakers) some vectors will be more likely than others. The probability tables could be updated in real time, as when the channel is not faded, we can trust that each index is probably correct.
  4. The “600” mode above is a prototype Codec 2 mode based on this work and [1]. We could develop that into a real world Codec 2/FreeDV mode and see how it goes over the air.
  5. The new crop of Neural Net vocoders use a similar parameter set and VQ, so index optimisation/FEC trade offs may also be useful there (they may do this already). For example we could run FreeDV 2020 without any FEC, freeing up RF bandwidth for more pilot symbols so it handles HF channels better.

Reading Further

[1] Subset Vector Quantiser
[2] Codec 2 at 450 bit/s
[3] Pseudo-Gray coding, K. Zeger; A. Gersho, 1990
[4] Trellis Decoding for Codec 2
[5] PR – Vector Quantiser Index Optimisation
[6] PR – Trellis decoding of VQ
[7] RST Scale
[8] FreeDV Technology

Controlled FreeDV Testing

This post presents some results from a controlled experiment to test FreeDV against SSB over a range of real world HF channels.

As described in [2] I take a 10 second speech sample and transmit it using SSB then FreeDV. By transmitting them more or less at the same time, we get to test them over the same channel. The peak power of the SSB and FreeDV signals are adjusted to be the same. The SSB is compressed [3] so that we are generating as much SSB talk power as possible.

Over the past few weeks we’ve collected 1158 samples of FreeDV and SSB signals, using a network of KiwiSDRs and some scripts to automate the collection of off-air samples. Every 30 minutes my IC7200 would click and whir into life as the laptop connected to it transmitted a test signal in various FreeDV modes. Simultaneously, the signals would be received by a remote KiwiSDR and recorded as a wave file. This was then decoded to produce a side by side SSB versus FreeDV audio signal.

Jose, LU5DKI, also collected some samples for me using his station. Jose is very experienced at using FreeDV with SDRs over international DX paths.

I’ll present some of the more interesting samples here. If you open the spectrogram in a new tab you should be able to see a larger version. The spectrograms are like a waterfall but time flows left to right. The tests were conducted with FreeDV 700C/D/E [4].

Serial Comment SSB & FreeDV Spectrogram
0036 Lower limit of 700D, which can handle lower SNRs than other modes. SSB very difficult copy. Listen
0105 FreeDV 700C with some co-channel SSB, results in a few errors in the decoded voice, SSB affected more. Listen
0106 Fade at the start of SSB and 700D, however 700D takes a few seconds to sync up. Listen
0119 Impulse noise, e.g. atmospherics, lightning, largely suppressed in decoded FreeDV. Listen
0146 Slow fade, occasionally nulling out the entire signal. Listen
0500 700D falling over on a weak fast fading/NVIS channel, note the moth-eaten spectrogram. Listen
0501 700E coping with the same NVIS channel as Serial 0500 – 700E is designed to handle fast fading. Listen
0629 “Barber pole” frequency selective fading carving a notch out of the 700D signal, but no errors at 5dB SNR. SSB improves only slowly with increasing SNR – still quite noisy. Listen
j01 DX 700D from Argentina to Rotorua, New Zealand. Both SSB and FreeDV pretty good. Listen
j03 700D DX sample from Argentina to Ireland, SSB a weak copy but 700D not getting sync. Listen
j03 700E As above but with 700E, which is decoding with some errors. 700E was designed for long distance paths. Listen

Notes:

  1. The initial “hello” sounds buzzy as the microphone equaliser [5] is still kicking in.
  2. Most of these samples are low SNR as that’s an interesting area of operation for me. Also – the experiment didn’t collect many high SNR samples, perhaps due to the limitations of my station (50W into a simple dipole) and band conditions. I would have liked to have collected some high SNR/fast fading examples.
  3. Unfortunately the output audio levels aren’t normalised, so the FreeDV part of the sample will sound louder. I didn’t apply any noise cancellation to the SSB samples, but if you have access to such software please feel free to download the samples and see what you can do.

Credits

Special thanks to Jose LU5DKI, Mooneer K6AQ, and Peter VK3RV for their help in manually collecting samples and discussion around these experiments. Thanks also to the KiwiSDR community, a very useful resource for experimental radio.

Links

[1] Codec 2 HF Data Modes Part 3 Similar script based automated testing for Codec 2 HF data modes.
[2] Automated Voice Testing Pull Request, containing more details of the experiment and test software.
[3] FreeDV 700E and Compression, including a description of the SSB compressor used here.
[4] Summary of various modes in FreeDV manual.
[5] Codec 2 700C Equaliser Part 2

Subset Vector Quantiser

I’ve returned to speech coding after a year of playing with VHF and HF modems. This is somewhat daunting for me, as speech coding is R&D, which tends to be very open ended; it’s possible to work for months with no clear outcomes. In contrast the modem work is straightforward engineering, and I get the positive feedback of “having stuff work” on a regular basis.

So I’m trying to time box the speech coding projects to a few days work each. This is quite a personal challenge, as there are just so many variables and paths to follow. It’s so easy to go off on a tangent and watch the months pass!

In this particular project I’m looking at the Codec 2 700C Vector Quantiser (VQ), and exploring ways to make it more inherently robust to high pass and low pass filtering at the edges of the spectrum. The broad goal is to improve speech quality at a given bit rate, or support a lower bit rate at the same speech quality. I’m targeting bit rates in the 600 bit/s range, and the lower end of the quality range (communications quality speech).

Subset Vector Quantiser

It’s well known that the most important speech information is between 300 and 3000 Hz. Energy outside that range makes the speech sound nicer, but doesn’t help intelligibility much. Analog modes such as SSB exploit this by band limiting the speech so that the transmitter power is used just to punch through information in the narrow bandwidth that matters.

With digital speech the RF bandwidth is not directly linked to the bandwidth of the decoded speech. For example FreeDV 700D uses around 1100 Hz of RF bandwidth, but the decoded speech has energy covering most of the 0 to 4000 Hz range.

Codec 2 encodes and transmits the speech spectrum on a regular basis. As well as encoding parts of the spectrum necessary for intelligibility, it also has to encode other features, such as the high pass and low pass roll off of the microphone and analog filters in the sound card. These features don’t carry any intelligibility, but bits are consumed to encode them. When you have just 28 bits/frame – every bit matters!

Turns out that some of the wilder variations in the speech spectrum from different sources of speech are in the 0-300Hz and 3000-4000Hz regions, for example different high pass or low pass filtering for a particular microphone or sound card. This can upset the Codec 2 quantiser, e.g. it might expend bits modelling a particular low pass filter response – lowering the quality of the perceptually important 300-3000 Hz information.

So I’ve prototyped a Vector Quantiser (VQ) that just uses the information in the 300-3000 Hz range to quantise the speech spectrum. However the VQ is trained on the full range, so it uses that full range to synthesise the speech, hopefully recovering some of the extended spectrum. There is also a limiter before the VQ to reduce the dynamic range of the frames.
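Here’s a sketch of the idea (my own illustration, not the actual code in [4]; the exact bins covering 300-3000 Hz depend on the mel spacing):

import numpy as np

def subset_vq_encode(x, codebook, lo, hi):
    # search using only the bins covering roughly 300-3000 Hz
    dist = np.sum((codebook[:, lo:hi] - x[lo:hi])**2, axis=1)
    return int(np.argmin(dist))

def subset_vq_decode(index, codebook):
    # decode the full K=20 vector: the codebook was trained on the full
    # band, so it "fills in" the 0-300 Hz and 3000-4000 Hz regions
    return codebook[index]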

Here are some samples processed with the newamp1 two stage VQ, and the new subset VQ. Like the newamp1 algorithm, it works on vectors of K=20 mel-spaced magnitude samples. The speech codec is only partially quantised (10ms frames, original phase, unquantised pitch), so it would sound worse in a real world fully quantised codec. The “newamp1” algorithm is used for Codec 2 700C [1] (and employed in FreeDV 700C/D/E), and uses 22 bits/frame (550 bits/s at a 40ms frame rate). The subset VQ uses just 12 bits/frame (300 bits/s at a 40ms frame rate).

Filename newamp1 subset
big dog Listen Listen
cap Listen Listen
fish Listen Listen
hts2a Listen Listen

There are some samples where newamp1 sounds louder – this could be due to the gain limiter stage in the subset algorithm constraining the dynamic range. There also seems to be more high frequency response with newamp1, indicating subset is not recovering the high frequency speech energy as I had hoped. Both are quite intelligible, and acceptable for communications quality speech.

This table presents the mean square spectral distortion in dB*dB:

Filename newamp1 subset
big dog 6.05 8.56
cap 9.03 8.10
fish 10.81 7.31
hts2a 10.51 8.62

Both the samples and objective results show the subset VQ is holding up OK next to the reference newamp1, despite the low bit rate. I’ve found that around 9 dB*dB spectral distortion gives acceptable results for my (low communications quality) use case.
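For reference, here’s how the distortion is measured – the mean of the squared dB differences between the input and quantised vectors. A numpy version of my reading of the metric (the actual script used is train_sub_quant.sh in [4]):

import numpy as np

def mean_sd_dB2(A_dB, A_hat_dB):
    # A_dB, A_hat_dB: (nframes, K) spectral magnitude vectors in dB
    # returns the mean square spectral distortion in dB*dB
    return float(np.mean((A_dB - A_hat_dB)**2))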

I tried adding an artificial low pass filter above 3400 Hz to a couple of the input samples, to simulate what might happen from different microphones.

It seems to work OK with the low pass filtered input samples; they sound pretty similar to the samples using the original source (one of the goals of this work):

Filename subset subset low pass
cap Listen Listen
fish Listen Listen

Conclusions and Further work

I’m surprised that a single stage VQ is working quite well at just 12 bits/frame. It’s also encoding both the average frame energy and the spectral shape (I use a separate scalar quantiser for frame energy in newamp1). This is close to optimal from a VQ theory point of view, but sometimes not practical due to concerns such as the storage/CPU requirements for a single stage VQ.

Further work ideas:

  1. Try a few more samples, and push this work through to a fully quantised speech codec.
  2. Seeing it’s doing well with a single stage, it would be interesting to see if it sounds better with multiple stages.
  3. A single stage VQ enables other tricks, like non-FEC techniques to make the VQ robust to bit errors, such as sorting the VQ indexes [2] or Neural Net style training with bit errors.
  4. Try different companding curves instead of the hard limiter, to better represent louder signals.

Links

[1] Codec 2 700C
[2] Codec 2 at 450 bits/s. Another single VQ Codec 2 mode, that can generate high frequency information using a similar approach to this post.
[3] Codec 2 700C Equaliser Part 2. A previous approach to handle a similar problem, in this case the speech is equalised before hitting a full band VQ.
[4] The script train_sub_quant.sh in this GitHub PR is used to perform the experiments documented in this post.

Simple FreeDV API Examples

Here are some simple examples showing how to use the Codec 2 and FreeDV API in C and Python applications.

Some of the other examples have grown a little too full-featured and complex, so I thought it might be useful to show the simplest possible programs. Instructions are at the top of each file.

The Quisk SDR is a fine example of a GUI application that uses the FreeDV API.

I’m intrigued by the idea of using the FreeDV API with other languages such as Python. I first became aware of this idea through this FreeDV KISS TNC project from xssfox.

Using similar techniques Mark, VK5QI, has implemented a High Altitude Balloon (HAB) telemetry library using a fork of C modem/protocol code from Codec 2 combined with Python. The project includes a nice Python GUI application for receiving HAB telemetry.
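To give a flavour of what calling the FreeDV API from Python looks like, here is a hedged ctypes sketch of a receive loop. The freedv_open/freedv_nin/freedv_rx/freedv_get_n_speech_samples calls are from freedv_api.h, but the library name, mode constant value, and input file are assumptions you’d need to check against your build:

import ctypes

lib = ctypes.CDLL("libcodec2.so")    # library name/path may differ on your system
lib.freedv_open.restype = ctypes.c_void_p
lib.freedv_open.argtypes = [ctypes.c_int]
lib.freedv_nin.argtypes = [ctypes.c_void_p]
lib.freedv_get_n_speech_samples.argtypes = [ctypes.c_void_p]
lib.freedv_rx.argtypes = [ctypes.c_void_p,
                          ctypes.POINTER(ctypes.c_short),
                          ctypes.POINTER(ctypes.c_short)]

FREEDV_MODE_700D = 7                 # check the value in freedv_api.h

f = lib.freedv_open(FREEDV_MODE_700D)
speech = (ctypes.c_short * lib.freedv_get_n_speech_samples(f))()

with open("off_air.raw", "rb") as fin:   # hypothetical 8kHz 16 bit modem samples
    while True:
        nin = lib.freedv_nin(f)          # demod tells us how many samples it wants
        buf = fin.read(2 * nin)
        if len(buf) < 2 * nin:
            break
        demod_in = (ctypes.c_short * nin).from_buffer_copy(buf)
        nout = lib.freedv_rx(f, speech, demod_in)
        # nout decoded speech samples are now in speech[0:nout]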

Codec 2 HF Data Modes Part 3

It works! I’ve spent the last few weeks building automated tests for the new HF data modes. All three modes are working well over real world 40m/20m channels at distances between 100km and 3000km. My goals were:

  • Transfer a total of 1 Mbyte of data – that’s quite a bit for HF.
  • Find some fast fading and long delay spread channels. Hence the 100km (NVIS) tests.
  • Run the tests over a week to experience a range of HF conditions.
  • Look for any corner cases that break the modems.
  • Collect a bunch of real world samples to support further development.

I ended up with 649 off air samples, that I can inspect by browsing the spectrograms:

Here is a histogram of the SNRs tested:

The data modes work fine in simulation down to -4dB, so I’m pleased to see the same is true on real world channels. Curiously I didn’t get any really high SNRs, perhaps my station is too modest (IC7200 into a dipole). At some stage I will need to find a high SNR path to test QAM modes.

I did find a fast fading example (with bonus co-channel SSB):

The spectrogram of this sample shows the “barber pole” lines moving quickly as the notch sweeps through the modem spectrum:

Each test logs some data: in this case 5 good frames from 5 transmitted, despite the fast fading and SSB.

sdr.ironstonerange.com
datac3
waiting for KiwiSDR .
Tx data signal
Stopping KiwiSDR
Process receiver sample
modembufs:    183 bytes:   630 Frms.:     5 SNRAv:  3.66
BER......: 0.0354 Tbits: 10240 Terrs:   363
Coded BER: 0.0000 Tbits:  5120 Terrs:     0
Coded FER: 0.0000 Tfrms:     5 Tfers:     0
FrmGd FrmDt Bytes SNRAv RawBER    Pre  Post UWfails
    5     5   630  3.66 0.0354      5     0       0

Unfortunately I didn’t collect any large delay spread examples – these would show up as closely spaced notches in the spectrogram. In this example the notches are 700Hz apart:

… which means 1/700=1.4ms of delay spread. The worst case sample had notches spaced by 500 Hz (2ms delay spread). The modes I tested are designed to handle up to 6ms so they didn’t get much of a work out. I might need some help from friends in high latitudes….

Running a test

Each test is controlled by the ota_test.sh script, for example:

./ota_test.sh kiwisdr.areg.org.au

This records the received signal from the KiwiSDR while transmitting data frames using my HF radio. After the transmission is complete, the received wave file is run through the demodulator. The received frames are checked and we log some statistics. There are set-up instructions at the top of the script, and even some help:

./ota_test.sh

Automated Over The Air (OTA) data test for FreeDV OFDM HF data modems

  usage ./ota_test.sh [-d] [-f freq_kHz] [-t] [-n Nbursts] [-o model] [-p port] kiwi_url

    -d        debug mode; trace script execution
    -o model  select radio model number ('rigctl -l' to list)
    -m mode   datac0|datac1|datac3
    -t        Tx only, useful for manually observing SDRs which block multiple sessions from one IP

I also wrote scripts to automate the tests via cron, and to summarise results.

Looking at the data over time, I learnt a bit about HF propagation, e.g. I could see 100km NVIS falling over at sunset, but could still send packets 800km away for a few more hours (although with reduced performance). In the morning it was the reverse, I could see SNRs from an 800km path reducing, and NVIS SNRs building up.

I can think of a few improvements to my test system:

  • Rx from multiple KiwiSDRs at the same time, to collect data faster from multiple paths.
  • Add an option to prepend a wave file with SSB for periodic station ID.
  • The system could be used to test voice modes: send a SSB signal, then a FreeDV voice mode signal, to compare them over the same channel. The samples could be normalised to the same peak power.

The tests also reminded me I haven’t tuned the compression (PAPR) of the data modes to get maximum performance out of them for a given peak power. Will work on that next.

Links
README_data – Codec 2 data mode documentation (HF OFDM raw data section)
HF OFDM Data testing – GitHub PR where I’m developing the automated tests.
Codec 2 HF Data Modes Part 1
Codec 2 HF Data Modes Part 2

Codec 2 HF Data Modes Part 2

Over the past few months I’ve been working on HF data modes, in particular building up a new burst acquisition system for our OFDM modem. As usual, what seemed like a small project turned out to be a lot of work! I’ve now integrated all the changes into the FreeDV API and started testing over the air, sending frames of data from a Tx at my home to remote SDRs all over Australia.

Features:

  • Importantly – this work is open source – filling a gap in the HF data world. HF is used for Ham radio, emergency communications and in the developing world where no other infrastructure exists. It needs to be open.
  • High performance waveforms designed for fast fading channels with modern FEC (thanks Bill, VK5DSP).
  • Implemented as a C library that can be cross compiled on many machines, and called from other programs (C and Python examples). You don’t need to be tied to one operating system or expensive, proprietary hardware.
  • Further development is supported by a suite of automated tests.

I’m not aiming to build a full blown TNC myself, just the layer that can move data frames over HF Radio channels. This seems to be where the real need lies, and the best use of my skills. I have however been working with TNC developers like Simon, DJ2LS. Together we have written a set of use cases that we have been developing against. This has been very useful, and a fun learning experience for both of us.

I’ve documented the Codec 2 HF data modes in README_data, which includes simple examples of how to use the API, and simulated/real world results.

Further work:

  • Automated testing over real world channels
  • Tuning performance
  • Port a higher bit rate QAM16 mode to C
  • Working with TNC developers
  • Prototype very simple low cost HF Data links using RTLSDRs and RpiTx transmitters

Reading Further

HF Acquisition Pull Request – journal of the recent development
README_data – Codec 2 data mode documentation (HF OFDM raw data section)
Codec2 HF Data Modes Part 1

FreeDV 700E and Compression

FreeDV 700D [9] is built around an OFDM modem [6] and powerful LDPC codes, and was released in mid 2018. Since then our real world experience has shown that it struggles with fast fading channels. Much to my surprise, the earlier FreeDV 700C mode actually works better on fast fading channels. This is surprising as 700C doesn’t have any FEC, but instead uses a simple transmit diversity scheme – the signal is sent twice at two different frequencies.

So I decided to develop a new FreeDV 700E waveform [8] with the following features:

  1. The ability to handle fast fading through an increased pilot symbol rate, but also with FEC which is useful for static crashes, interference, and mopping up random bit errors.
  2. Uses a shorter frame (80ms), rather than the 160ms frame of 700D. This will reduce latency and make sync faster.
  3. The faster pilot symbol rate will mean 700E can handle frequency offsets better, as well as fast fading.
  4. Increasing the cyclic prefix from 2 to 6ms, allowing the modem to handle up to 6ms of multipath delay spread.
  5. A wider RF bandwidth than 700D, which can help mitigate frequency selective fading. If one part of the spectrum is notched out, we can use FEC to recover data from other parts of the spectrum. On the flip side, narrower signals are more robust to some interference, and use less spectrum.
  6. Compression of the OFDM waveform, to increase the average power (and hence received SNR) for a given peak power.
  7. Trade off low SNR performance for fast fading channel performance. A higher pilot symbol rate and longer cyclic prefix mean less energy is available for data symbols, so low SNR performance won’t be as good as 700D.
  8. It uses the same Codec 2 700C voice codec, so speech quality will be the same as 700C and D when SNR is high.

Over the course of 2020, we’ve refactored the OFDM modem and FreeDV API to make implementing new modem waveforms much easier. This really helped – I designed, simulated, and released the FreeDV 700E mode in just one week of part time work. It’s already being used all over the world in the development version of FreeDV 1.5.0.

My bench tests indicate 700C/D/E are equivalent on moderate fading channels (1Hz Doppler/2ms spread). As the fading speeds up to 2Hz, 700D falls over, but 700C/E perform well. On very fast fading (4Hz/4ms) 700E does better as 700C stops working. 700D works better at lower SNRs on slow fading channels (1Hz Doppler/2ms and slower).

The second innovation is compression of the 700C/D/E waveforms, to increase average power significantly (around 6dB from FreeDV 1.4.3). Please be careful adjusting the Tx drive and especially enabling the Tools – Options – Clipping. It can drive your PA quite hard. I have managed 40W RMS out of my 75W PEP transmitter. Make sure your transmitter can handle long periods of high average power.

I’ve also been testing against compressed SSB, which is pretty hard to beat as it’s so robust to fading. However 700E is hanging on quite well with fast fading, and unlike SSB becomes noise free as the SNR increases. At the same peak power, 700D is doing well while compressed SSB is at -5dB SNR and rather hard on the ears.

SSB Compression

To make an “apples to apples” comparison between FreeDV and SSB at low SNRs I need SSB compressor software that I can run on the (virtual) bench. So I’ve developed a speech compressor using some online wisdom [1][2]. Turns out the “Hilbert Clipper” in [2] is very similar to how I am compressing my OFDM signals to improve their PAPR. This appeals to me – using the same compression algorithm on SSB and FreeDV.

The Hilbert transformer takes the “real” speech signal and converts it to a “complex” signal. It’s the same signal, but now it’s represented by in phase and quadrature signals, or alternatively a vector spinning around the origin. Turns out you can do a much better job at compression by limiting the magnitude of that vector than by clipping the input speech signal. Any clipping tends to spread the signal in frequency, so we have a SSB filter at the output to limit the bandwidth. Good compressors can get down to about 6dB PAPR for SSB, mine is not too shabby at 7-8dB.
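Here’s a sketch of the idea in Python – my paraphrase of [2], not the cohpsk_ch implementation, and the drive level and filter design numbers are placeholders:

import numpy as np
from scipy.signal import hilbert, firwin, lfilter

def hilbert_clipper(speech, fs=8000, drive=4.0, clip=1.0):
    z = hilbert(speech * drive)          # analytic (complex) version of the signal
    mag = np.abs(z)
    # limit the magnitude of the spinning vector, keeping its phase
    z = np.where(mag > clip, z * (clip / np.maximum(mag, 1e-9)), z)
    # clipping splatters energy in frequency, so band limit like an SSB filter
    bpf = firwin(201, [300, 2700], pass_zero=False, fs=fs)
    return lfilter(bpf, 1.0, z.real)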

It certainly makes a difference to noisy speech, as you can see in this plot (in a low SNR channel), and the samples below:

Compression SNR Sample
Off High Listen
On High Listen
Off Low Listen
On Low Listen

FreeDV Performance

Here are some simulated samples. They are all normalised to the same peak power, and all waveforms (SSB and FreeDV) are compressed. The noise power N is in dB, but there is some arbitrary scaling (for historical reasons); a more negative N means less noise. For a given noise power N, the SNRs vary as different waveforms have different peak to average power ratios. I’m adopting the convention of comparing signals at the same (Peak Power)/(Noise Power) ratio. This matches real world transmitters – we need to do the best we can with a given PEP (peak power). So the idea below is to compare samples at the same noise power N, and channel type, as peak power is known to be constant. An AWGN channel is just plain noise; MPP is 1Hz Doppler with 2ms delay spread; and MPD is 2Hz Doppler with 4ms delay spread.

Test  Mode  Channel  N      SNR   Sample
1     SSB   AWGN     -12.5  -5    Listen
      700D  AWGN     -12.5  -1.8  Listen
2     SSB   MPP      -17.0   0    Listen
      700E  MPP      -17.0   3    Listen
3     SSB   MPD      -23.0   8    Listen
      700E  MPD      -23.0   9    Listen

Here’s a sample of the simulated off air 700E modem signal at 8dB SNR in a MPP channel. It actually works up to 4 Hz Doppler and 6ms delay spread, which sounds like a UFO landing.

Comments:

  1. With digital, when you’re in a fade, you’re in a fade! You lose that chunk of speech. A short FEC code (less than twice the fade duration) isn’t going to help you much. We can’t extend the code length because of latency (this is PTT speech). Sigh.
  2. This explains why 700C (with no FEC) does so well – we lose speech during deep fades (where FEC breaks anyway) but it “hangs on” as the Doppler whirls around and sounds fine in the “anti-fades”. The voice codec is robust to a few % BER all by itself, which helps.
  3. Analog SSB is nearly impervious to fading, no matter how fast. It’s taken a lot of work to develop modems that “hang on” in fast fading channels.
  4. Analog SSB degrades slowly with decreasing SNR, but also improves slowly with increasing SNR. It’s still noisy at high SNRs. DSP noise reduction can help.

Let’s take a look at the effect of compression. Here is a screen shot from my spectrum analyser in zero-span mode. It’s displaying power against time from my FT-817 being driven by two waveforms. Yellow is the previous, uncompressed 700D waveform, purple is the latest 700D with compression. You can really get a feel for how much higher the average power is. On my radio I jumped from 5-10W RMS to 40W RMS.

Jose’s demo

Jose, LU5DKI sent me a wave file sample of him “walking through the modes” over a 12,500km path between Argentina and New Zealand. The SSB is at the 2:30 mark:

This example shows how well 700E can handle fast fading over a path that includes Antarctica:

A few caveats:

  • Jose’s TK-80 radio is 40 years old and doesn’t have any compression available for SSB.
  • FreeDV attenuates the “pass through” off air radio noise by about 6dB, so the level of the SSB will be lower than the FreeDV audio. However that might be a good idea with all that noise.
  • Some noise reduction DSP might help, although it tends to fall over at low SNRs. I don’t have a convenient command line tool for that. If you do – here is Jose’s sample. Please share the output with us.

I’m interested in objective comparisons of FreeDV and SSB using off air samples. I’m rather less interested in subjective opinions. Show me the samples…

Conclusions and Further Work

I’m pleased with our recent modem waveform development and especially the compression. It’s also good fun to develop new waveforms, and it’s getting easier as the FreeDV API software matures. We’re getting pretty good performance over a range of channels now, and learning how to make modems for digital voice play nicely over HF channels. I feel our SSB versus FreeDV comparisons are maturing too.

The main limitation is the Codec 2 700C vocoder – while usable in practice it’s not exactly HiFi. Unfortunately speech coding is hard – much harder than modems. More R&D than engineering, which means a lot more effort – with no guarantee of a useful result. Anyhoo, let’s see if I can make some progress on speech quality at low SNRs in 2021!

Links

[1] Compression – good introduction from AB4OJ.
[2] DSP Speech Processor Experimentation 2012-2020 – Sophisticated speech processor.
[3] Playing with PAPR – my initial simulations from earlier in 2020.
[4] Jim Ahlstrom N2ADR has done some fine work on FreeDV filter C code – very useful once again for this project. Thanks Jim!
[5] Modems for HF Digital Voice Part 1 and Part 2 – gentle introduction in to modems for HF.
[6] Steve Ports an OFDM modem from Octave to C – the OFDM modem Steve and I built – it keeps getting better!
[7] FreeDV 700E uses one of Bill’s fine LDPC codes.
[8] Modem waveform design spreadsheet.
[9] FreeDV 700D Released

Command Lines

Writing these down so I can cut and paste them to repeat these tests in the future….

Typical FreeDV 700E simulation, wave file output:

./src/freedv_tx 700E ../raw/ve9qrp_10s.raw - --clip 1 | ./src/cohpsk_ch - - -23 --mpp --raw_dir ../raw --Fs 8000 | sox -t .s16 -r 8000 -c 1 - ~/drowe/blog/ve9qrp_10s_700e_23_mpd_rx.wav

Looking at the PDF (histogram) of the signal magnitude is interesting. Let’s generate some compressed FreeDV 700D:

./src/freedv_tx 700D ../raw/ve9qrp.raw - --clip 1 | ./src/cohpsk_ch - - -100  --Fs 8000 --complexout > ve9qrp_700d_clip1.iq16

Now take the complex valued output signal and plot the PDF and CDF of the magnitude (and time domain and spectrum):

octave:1> s=load_raw("../build_linux/ve9qrp_700d_clip1.iq16"); s=s(1:2:end)+j*s(2:2:end); figure(1); plot(abs(s)); S=abs(fft(s)); figure(2); clf; plot(20*log10(S)); figure(3); clf; [hh nn] = hist(abs(s),25,1); cdf = empirical_cdf(1:max(abs(s)),abs(s)); plotyy(nn,hh,1:max(abs(s)),cdf);

This is after clipping, so 100% of the samples have a magnitude less than 16384. Also see [3].

When testing with real radios it’s useful to play a sine wave at the same PEP level as the modem signals under test. I could get 75W RMS (and PEP) out of my IC-7200 using this test (13.8VDC power supply):

./misc/mksine - 1000 160 16384 | aplay -f S16_LE

We can measure the PAPR of the sine wave with the cohpsk_ch tool:

./misc/mksine - 1000 10 | ./src/cohpsk_ch - /dev/null -100 --Fs 8000
cohpsk_ch: Fs: 8000 NodB: -100.00 foff: 0.00 Hz fading: 0 nhfdelay: 0 clip: 32767.00 ssbfilt: 1 complexout: 0
cohpsk_ch: SNR3k(dB):    85.23  C/No....:   120.00
cohpsk_ch: peak.....: 10597.72  RMS.....:  9993.49   CPAPR.....:  0.51 
cohpsk_ch: Nsamples.:    80000  clipped.:     0.00%  OutClipped:  0.00%

CPAPR = 0.5dB, should be 0dB, but I think there’s a transient as the Hilbert Transformer FIR filter memory fills up. Close enough.

By chaining cohpsk_ch together in various ways we can build a SSB compressor, and simulate the channel by injecting noise and fading:

./src/cohpsk_ch ../raw/ve9qrp_10s.raw - -100 --Fs 8000 | ./src/cohpsk_ch - - -100 --Fs 8000 --clip 16384 --gain 10 | ./src/cohpsk_ch - - -100 --Fs 8000 --clip 16384 | ./src/cohpsk_ch - - -17 --raw_dir ../raw --mpd --Fs 8000 --gain 0.8 | aplay -f S16_LE

cohpsk_ch: peak.....: 16371.51  RMS.....:  7128.40   CPAPR.....:  7.22

A PAPR of 7.2 dB is pretty good for a few hours work – the cool kids get around 6dB [1][2].
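For reference, the CPAPR figure cohpsk_ch prints falls out of a simple peak and RMS measurement (over the complex baseband signal in cohpsk_ch’s case); a numpy equivalent for an array of real samples:

import numpy as np

def papr_dB(x):
    # peak to average power ratio of a block of samples, in dB
    peak = np.max(np.abs(x.astype(float)))
    rms = np.sqrt(np.mean(x.astype(float)**2))
    return 20 * np.log10(peak / rms)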

Open IP over VHF/UHF 4

For the last few weeks I’ve been building up some automated test software for my fledgling IP over radio system.

Long term automated tests can help you thrash out a lot of issues. Part of my cautious approach in taking small steps to build a complex system. I’ve built up a frame repeater system where one terminal “pings” another terminal – and repeats this thousands of times. I enjoyed writing service scripts [3] to wrap up the complex command lines, and bring the system “up” and “down” cleanly. The services also provide some useful debug options like local loopback testing of the radio hardware on each terminal.

I started testing the system “over the bench” with two terminals pinging and ponging frames back and forth via cables. After a few hours I hit a bug where the RpiTx RF would stop. A few repeats showed this sometimes happened in a few minutes, and other times after a few hours.

This led to an interesting bug hunt. I quite enjoy this sort of thing, peeling off the layers of a complex system, getting closer and closer to the actual problem. It was fun to learn about the RpiTx [2] internals. A very clever system of a circular DMA buffer feeding PLL fractional divider values to the PLLC registers on the Pi. The software application chases that DMA read pointer around, trying to keep the buffer full.

By dumping the clock tree I eventually worked out some other process was messing with the PLLC register. Evariste on the RpiTx forum then suggested I try “force_turbo=1” [4]. That fixed it! My theory is the CPU freq driver (wherever that lives) was scaling all the PLLs when the CPU shifted clock speed. To avoid being caught again I added some logic to check PLLC and bomb out if it appears to have been changed.

A few other interesting things I noticed:

  1. I’m running 10 kbit/s for these tests, with a 10kHz shift between the two FSK tones and a carrier frequency of 144.5MHz. I use a FT-817 SSB Rx to monitor the transmission, which has a bandwidth of around 3 kHz. A lot of the time a FSK burst sounds like broadband noise, as the FT-817 is just hearing part of the FSK spectrum. However if you tune to the high or low tone frequency (just under 144.500 or 144.510) you can hear the FSK tones. A nice audio illustration of FSK in action.
  2. On start up RpiTx uses ntp to calibrate the frequency, which leads to slight shifts in the frequency each time it starts. Enough to be heard by the human ear, although I haven’t measured them.

I’ve just finished a 24 hour test where the system sent 8600 bursts (about 6 Mbyte in each direction) over the link, and everything is running nicely (100% of packets were received). This gives me a lot of confidence in the system. I’d rather know if there are any stability issues now than when the device under test is deployed remotely.

I feel quite happy with that result – there’s quite a lot of signal processing software and hardware that must be playing nicely together to make that happen. Very satisfying.

Next Steps

Now it’s time to put the Pi in a box, connect a real antenna and try some over the air tests. My plan is:

  1. Set up the two terminals several km apart, and see if we can get a viable link at 10 kbit/s, although even 1 kbit/s would be fine for initial tests. Enough margin for 100 kbit/s would be even better, but happy to work on that milestone later.
  2. I’m anticipating some fine tuning of the FSK_LDPC waveforms will be required.
  3. I’m also anticipating problems with urban EMI, which will raise the noise floor and set the SNR of the link. I’ve instrumented the system to measure the noise power at both ends of the link, so I can measure this over time. I can also measure received signal power, and estimate path loss. Knowing the gain of the RTLSDR, we can measure signal power in dBm, and estimate noise power in dBm/Hz.
  4. There might be some EMI from the Pi, lets see what happens when the antenna is close.
  5. I’ll run the frame repeater system over several weeks, debug any stability issues, and collect data on S, N, SNR, and Packet Error Rate.

Reading Further

[1] Open IP over VHF/UHF Part 1 Part 2 Part 3
[2] RpiTx – Radio transmitter software for Raspberry Pis
[3] GitHub repo for this project with build scripts, a project plan and a bunch of command lines I use to run various tests. The latest work in progress will be an open pull request.
[4] RpiTx Group discussion of the PLLC bug discussed above

Open IP over VHF/UHF 3

The goal of this project is to develop a “100 kbit/s IP link” for VHF/UHF using just a Pi and RTLSDR hardware, and open source signal processing software [1]. Since the last post, I’ve integrated a bunch of components and now have a half duplex radio data system running over the bench.

Recent progress:

  1. The Tx and Rx signal processing is now operating happily together on a Pi, CPU load is fine.
  2. The FSK_LDPC modem and FEC [2] have been integrated, so we can now send and receive coded frames. The Tx and Rx command line programs have been modified to send and receive bursts of frames.
  3. I’ve added a PIN diode Transmit/Receive switch, which I developed for the SM2000 project [3]. This is controlled by a GPIO from the Pi. There is also logic to start and stop the Pi Tx carrier at the beginning and end of bursts – so it doesn’t interfere with the Rx side.
  4. I’ve written a “frame repeater” application that takes packets received from the Rx and re-transmits them using the Tx. This will let me run “ping” tests over the air. A neat feature is it injects the received Signal and Noise power into the frame it re-transmits. This will let me measure the received power, the noise floor, and SNR at the remote station.
  5. The receiver in each terminal is very sensitive, and inconveniently picks up frames transmitted by that terminal. After trying a few approaches I settled on a “source filtering” design. When a packet is transmitted, the Tx places a “source byte” in the frame that is unique to that terminal. A one byte MAC address I guess. The local receiver then ignores (filters) any packets with that source address, and only outputs frames from other terminals.
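Here’s a sketch of the source filtering logic (the names and the 0x42 address are mine for illustration, not from the project repo [4]):

MY_SOURCE = 0x42   # one byte "MAC address", unique to this terminal

def make_frame(payload: bytes) -> bytes:
    return bytes([MY_SOURCE]) + payload

def rx_filter(frame: bytes):
    # drop anything we transmitted ourselves, pass everything else up
    if frame and frame[0] == MY_SOURCE:
        return None
    return frame[1:]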

Here is a block diagram of the Pi based terminal, showing hardware and software components:

When I build my next terminal, I will try separate Tx and Rx antennas, as a “minimalist” alternative to the TR switch. The next figure shows the transmit control signals in action. Either side of a burst we need to switch the TR switch and turn the Tx carrier on and off:

Here’s the current half duplex setup on the bench:

Terminal2, on the left, comprises the Pi, RTLSDR, and TR switch. Terminal1 (right) is the HackRF/RTLSDR connected to my laptop. Instead of a TR switch I’m using a hybrid combiner (a 3dB loss, but not an issue for these tests). This also shows how different SDR Tx/Rx hardware can be used with this system.

I’m using 10,000 bit/s for the current work, although that’s software configurable. When I start testing over the air I’ll include options for a range of bit rates, eventually shooting for 100 kbits/s.

Here’s a demo video of the system:

Next Steps

The command lines to run everything are getting unwieldy so I’ll encapsulate them in some “service” scripts to start and stop the system neatly. Then box everything up, try a local RF link, and check for stability over a few days. Once I’m happy I will deploy a terminal and start working through the real world issues. The key to getting complex systems going is taking tiny steps. Test and debug carefully at each step.

It’s coming together quite nicely, and I’m enjoying a few hours of work on the project every weekend. It’s very satisfying to build the layers up one by one, and a pleasant surprise when the pieces start playing nicely together and packets move magically across the system. I’m getting to play with RF, radios, modems, packets, and even building up small parts of a protocol. Good fun!

Reading Further

[1] Open IP over UHF/VHF Part 1 and Part 2.
[2] FSK LDPC Data Mode – open source data mode using a FSK modem and powerful LDPC codes.
[3] SM2000 Part 3 – PIN TR Switch and VHF PA
[4] GitHub repo for this project with build scripts, a project plan and a bunch of command lines I use to run various tests. The latest work in progress will be an open pull request.

Speech Spectral Quantisation using VQ-VAE

As an exercise to learn more about machine learning, I’ve been experimenting with Vector Quantiser Variational AutoEncoders (VQ VAE) [2]. Sounds scary, but it’s basically embedding a vector quantiser in a Neural Network so they train together. I’ve come up with a simple network that quantises 80ms (8 x 10ms frames) of spectral magnitudes in 88 bits (about 1100 bits/s).

I arrived at my current model through trial and error, using this example [1] as a starting point. Each 10ms frame is a vector of energies from 14 mel-spaced filters, derived from LPCNet [6]. The network uses conv1D stages to downsample and upsample the vectors, with a two stage VQ (11 bits per stage) in the Autoencoder “bottleneck”. The VQ is also encoding total frame energy, so the remaining parameters for a vocoder would be pitch and (maybe) voicing.
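The bottleneck itself is straightforward. Here’s a numpy sketch of the two stage (residual) VQ forward pass, leaving out the straight-through gradient trick used in training [2] – 88 bits per 80ms block implies 4 such 22 bit vectors after downsampling:

import numpy as np

def two_stage_vq(z, cb1, cb2):
    # z: (n, 16) bottleneck vectors; cb1, cb2: (2048, 16) codebooks (11 bits each)
    i1 = np.argmin(((z[:, None, :] - cb1[None, :, :])**2).sum(axis=2), axis=1)
    r = z - cb1[i1]                       # stage 2 quantises the residual
    i2 = np.argmin(((r[:, None, :] - cb2[None, :, :])**2).sum(axis=2), axis=1)
    return cb1[i1] + cb2[i2], i1, i2      # 11 + 11 = 22 bits per vector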

This work (spectral quantisation) is applicable to “old school” vocoders like Codec 2 and is also being used with newer Neural Vocoders in some research papers.

I haven’t used it to synthesise any speech yet but it sure does make nice plots. This one is a 2D histogram of the encoder space, white dots are the stage 1 VQ entries. The 16 dimensional data has been reduced to 2 dimensions using PCA.

If the VQ is working, we should expect more dots in the brighter colour areas, and fewer in the darker areas.

Here is a sample input (green) output (red) of 8 frames:

This is a transition region, going from voiced to unvoiced speech. It seems to handle it OK. The numbers are (frame_number, SD), where SD is the Spectral Distortion in dB*dB. When we get a high SD frame, quite often it’s not crazy wrong, more an educated guess that will probably sound OK, e.g. a different interpolation profile for the frame energy across a transition. Formants are mostly preserved.

The VQ seems to be doing something sensible; after 20 epochs I can see most VQ entries are being used, and the SD gets better with more bits. The NN part trains much faster than the VQ.

Here is a histogram of the SDs for each frame:

The average SD is around 7.5 dB*dB, similar to some of the Codec 2 quantisers. However this is measured on every 10ms frame in an 8 frame sequence, so it’s a measure of how well it interpolates/decimates in time as well. As I mentioned above – some of the “misses” that push the mean SD higher are inconsequential.

Possible Bug in Codec 2 700C

I use similar spectral magnitude vectors for Codec 2 700C [5] – however when I tried that data the SD was about double. Hmmm. I looked into it and found some bugs/weaknesses in my approach for Codec 2 700C (for that codec the spectral magnitudes are dependent on the pitch estimator, which occasionally loses it). So that was a nice outcome – trying to get the same result two different ways can be a pretty useful test.

Further Work

Some ideas for further work:

  1. Use kmeans for training.
  2. Inject bit errors when training to make it robust to channel errors.
  3. Include filtered training material to make it robust to recording conditions.
  4. Integrate into a codec and listen to it.
  5. Try other networks – I’m still learning how to engineer an optimal network.
  6. Make it work with relu activations; I can only get it to work with tanh.

Reading Further

[1] VQ VAE Keras MNIST Example – my starting point for the VQ-VAE work
[2] Neural Discrete Representation Learning
[3] My Github repo for this work
[4] Good introduction to PCA
[5] Codec 2 700C – also uses VQ-ed mel-spaced vectors
[6] LPCNet: DSP-Boosted Neural Speech Synthesis