Subset Vector Quantiser

I’ve returned to speech coding after a year of playing with VHF and HF modems. This is somewhat daunting for me, as speech coding is R&D, which tends to be very open ended; it’s possible to work for months with no clear outcomes. In contrast the modem work is straight forward engineering, and I get the positive feedback of “having stuff work” on a regular basis.

So I’m trying to time box the speech coding projects to a few days work each. This is quite a personal challenge, as there are just so many variables and paths to follow. It’s so easy to go off on a tangent and watch the months pass!

In this particular project I’m looking at the Codec 2 700C Vector Quantiser (VQ), and exploring ways to make it more inherently robust to high pass and low pass filtering at the edges of the spectrum. The broad goal is to improve speech quality at a given bit rate, or support a lower bit rate at the same speech quality. I’m targeting bit rates in the 600 bit/s range, and the lower end of the quality range (communications quality speech).

Subset Vector Quantiser

It’s well known that the most important speech information is between 300 and 3000 Hz. Energy outside that range makes the speech sounds nicer, but doesn’t help intelligibility much. Analog modes such as SSB exploit this by band limiting the speech so that the transmitter power is used just to punch through information in the narrow bandwidth that matters.

With digital speech the RF bandwidth is not directly linked to the bandwidth of the decoded speech. For example FreeDV 700D uses around 1100 Hz of RF bandwidth, but the decoded speech has energy covering most of the 0 to 4000 Hz range.

Codec 2 encodes and transmits the speech spectrum on a regular basis. As well as encoding parts of the spectrum necessary for intelligibility, it also has to encode other features, such as the high pass and low pass roll off the microphone and analog filters in the sound card. These features don’t carry any intelligibility, but bits are consumed to encode them. When you have just 28 bits/frame – every bit matters!

Turns out that some of the wilder variations in the speech spectrum from different sources of speech are in the 0-300Hz and 3000-4000Hz regions, for example different high pass or low pass filtering for a particular microphone or sound card. This can upset the Codec 2 quantiser, e.g. it might expend bits modelling a particular low pass filter response – lowering the quality of the perceptually important 300-3000 Hz information.

So I’ve prototyped a Vector Quantiser (VQ) that just uses the information in the 300-3000 Hz range to quantise the speech spectrum. However the VQ is trained on the full range, so it uses that full range to synthesise the speech, hopefully recovering some of the extended spectrum. There is also a limiter before the VQ to reduce the dynamic range of the frames.

Here are some samples processed with the newamp1 two stage VQ, and the new subset VQ. Like the newamp1 algorithm, it works on vectors of K=20 mel-spaced magnitude samples. The speech codec is only partially quantised (10ms frames, original phase, unquantised pitch), so it would sound worse in a real world fully quantised codec. The “newamp1” algorithm is used for Codec 2 700C [1] (and employed in FreeDV 700C/D/E), and uses 22 bit/frame (550 bits/s at a 40ms frame rate). The subset VQ uses just 12 bits frame (300 bits/s at a 40ms frame rate).

Filename newamp1 subset
big dog Listen Listen
cap Listen Listen
fish Listen Listen
hts2a Listen Listen

There are some samples where newamp1 sounds louder – this could be due to the gain limiter stage in the subset algorithm constraining the dynamic range. There also seems to be more high frequency response with newamp1, indicating subset is not recovering the high frequency speech energy as I had hoped. Both are quite intelligible, and acceptable for communications quality speech.

This table presents the mean square spectral distortion in dB*dB:

Filename newamp1 subset
big dog 6.05 8.56
cap 9.03 8.10
fish 10.81 7.31
hts2a 10.51 8.62

Both the samples and objective results show the subset VQ is holding up OK next to the reference newamp1, despite the low bit rate. I’ve found that around 9 dB*dB spectral distortion gives acceptable results for my (low communications quality) use case.

I tried adding an artificial low pass filter above 3400 Hz to a couple of the input samples, to simulate what might happen from different microphones.

It seems to work OK with the low pass filtered input samples, they sound pretty similar to the sample using the original source (one of the goals of this work):

Filename subset subset low pass
cap Listen Listen
fish Listen Listen

Conclusions and Further work

I’m surprised that a single stage VQ is working quite well at just 12 bits/frame. It’s also encoding both the average frame energy and the spectral shape (I use a separate scalar quantiser for frame energy in newamp1). This is quite optimal from a VQ theory point of view, but sometimes not possible due to practical concerns such as the storage/CPU requirements for a single stage VQ.

Further work ideas:

  1. Try a few more samples, and push this work through to a fully quantised speech codec.
  2. Seeing it’s doing well with a single stage, it would interesting to see if it sounds better with multiple stages.
  3. A single stage VQ enables other tricks, like non-FEC techniques to make the VQ robust to bit errors, such as sorting the VQ indexes [2] or Neural Net style training with bit errors.
  4. Try different companding curves instead of the hard limiter, to better represent louder signals.

Links

[1] Codec 2 700C
[2] Codec 2 at 450 bits/s. Another single VQ Codec 2 mode, that can generate high frequency information using a similar approach to this post.
[3] Codec 2 700C Equaliser Part 2. A previous approach to handle a similar problem, in this case the speech is equalised before hitting a full band VQ.
[4] The script train_sub_quant.sh in This GitHub PR” is used to perform the experiments documented in this post.

Simple FreeDV API Examples

Here are some simple examples showing how to use the Codec 2 and FreeDV API in C and Python applications.

Some of the other examples have grown a little too full featured and complex so I thought it might be useful to show the simplest possible programs. Instructions at the top of each file.

The Quisk SDR is a fine example of a GUI application that uses the FreeDV API.

I’m intrigued by the idea of using the FreeDV API with other languages such as Python. I first became aware of this idea through this FreeDV KISS TNC project from xssfox.

Using similar techniques Mark, VK5QI, has implemented a High Altitude Balloon (HAB) telemetry library using a fork of C modem/protocol code from Codec 2 combined with Python. The project includes a nice Python GUI application for receiving HAB telemetry.

Codec 2 HF Data Modes Part 3

It works! I’ve spent the last few weeks building automated tests for the new HF data modes. All three modes are working well over real world 40m/20m channels at distances between 100km and 3000km. My goals were:

  • Transfer a total 1 Mbyte of data – that’s quite a bit for HF.
  • Find some fast fading and long delay spread channels. Hence the 100km (NVIS) tests.
  • Run the tests over a week to experience a range of HF conditions.
  • Look for any corner cases that break the modems.
  • Collect a bunch of real world samples to support further development.

I ended up with 649 off air samples, that I can inspect by browsing the spectograms:

Here is a histogram of the SNRs tested:

The data modes work fine in simulation down to -4dB, so I’m pleased to see the same is true on real world channels. Curiously I didn’t get any really high SNRs, perhaps my station is too modest (IC7200 into a dipole). At some stage I will need to find a high SNR path to test QAM modes.

I did find a fast fading example (with bonus co-channel SSB):

The spectrogram of this sample shows the “barber pole” lines moving quickly as the notch sweeps through the modem spectrum:

Each test logs some data: in this case 5 good frames from 5 transmitted, despite the fast fading and SSB.

sdr.ironstonerange.com
datac3
waiting for KiwiSDR .
Tx data signal
Stopping KiwiSDR
Process receiver sample
modembufs:    183 bytes:   630 Frms.:     5 SNRAv:  3.66
BER......: 0.0354 Tbits: 10240 Terrs:   363
Coded BER: 0.0000 Tbits:  5120 Terrs:     0
Coded FER: 0.0000 Tfrms:     5 Tfers:     0
FrmGd FrmDt Bytes SNRAv RawBER    Pre  Post UWfails
    5     5   630  3.66 0.0354      5     0       0

Unfortunately I didn’t collect any large delay spread examples – these would show up as closely spaced notches in the spectrogram. In this example the notches are 700Hz apart:

… which means 1/700=1.4ms of delay spread. The worst case sample had notches spaced by 500 Hz (2ms delay spread). The modes I tested are designed to handle up to 6ms so they didn’t get much of a work out. I might need some help from friends in high latitudes….

Running a test

Each test is controlled by the ota_test.sh script, for example:

./ota_test.sh kiwisdr.areg.org.au

This records the received signal from the KiwiSDR while transmitting data frames using my HF radio. After the transmission is complete, the received wave file is run through the demodulator. The received frames are checked and we log some statistics. There are set-up instructions at the top of the script, and even some help:

./ota_test.sh

Automated Over The Air (OTA) data test for FreeDV OFDM HF data modems

  usage ./ota_test.sh [-d] [-f freq_kHz] [-t] [-n Nbursts] [-o model] [-p port] kiwi_url

    -d        debug mode; trace script execution
    -o model  select radio model number ('rigctl -l' to list)
    -m mode   datac0|datac1|datac3
    -t        Tx only, useful for manually observing SDRs which block multiple sessions from one IP

I also wrote scripts to automate the tests via cron, and to summarise results.

Looking at the data over time, I learnt a bit about HF propogation, e.g. I could see 100km NVIS falling over at sunset, but could still send packets 800km away for a few more hours (although with reduced performance). In the morning it was the reverse, I could see SNRs from a 800km path reducing, and NVIS SNRs building up.

I can think of a few improvements to my test system:

  • Rx from multiple KiwiSDRs at the same time, to collect data faster from multiple paths.
  • Add an option to pre-pend a wavefile with SSB for periodic station ID.
  • The system could be used to test voice modes: send a SSB signal, then a FreeDV voice mode signal, to compare them over the same channel. The samples could be normalised to the same peak power.

The tests also reminded me I haven’t tuned the compression (PAPR) of the data modes to get maximum performance out of them for a given peak power. Will work on that next.

Links
README_data – Codec 2 data mode documentation (HF OFDM raw data section)
HF OFDM Data testing – GitHub PR where I’m developing the automated tests.
Codec 2 HF Data Modes Part 1
Codec 2 HF Data Modes Part 2

Codec 2 HF Data Modes Part 2

Over the past few months I’ve been working on HF data modes, in particular building up a new burst acquisition system for our OFDM modem. As usual, what seemed like a small project turned out to be a lot of work! I’ve now integrated all the changes into the FreeDV API and started testing over the air, sending frames of data from a Tx at my home to remote SDRs all over Australia.

Features:

  • Importantly – this work is open source – filling a gap in the HF data world. HF is used for Ham radio, emergency communications and in the developing world where no other infrastructure exists. It needs to be open.
  • High performance waveforms designed for fast fading channels with modern FEC (thanks Bill, VK5DSP).
  • Implemented as a C library that can be cross compiled on many machines, and called from other programs (C and Python examples). You don’t need to be tied to one operating system or expensive, proprietary hardware.
  • Further development is supported by a suite of automated tests.

I’m not aiming to build a full blown TNC myself, just the layer that can move data frames over HF Radio channels. This seems to be where the real need lies, and the best use of my skills. I have however been working with TNC developers like Simon, DJ2LS. Together we have written a set of use cases that we have been developing against. This has been very useful, and a fun learning experience for both of us.

I’ve documented the Codec 2 HF data modes in README_data, which includes simple examples of how to use the API, and simulated/real world results.

Further work:

  • Automated testing over real world channels
  • Tuning performance
  • Port a higher bit rate QAM16 mode to C
  • Working with TNC developers
  • Prototype very simple low cost HF Data links using RTLSDRs and RpiTx transmitters

Reading Further

HF Acquisition Pull Request – journal of the recent development
README_data – Codec 2 data mode documentation (HF OFDM raw data section)
Codec2 HF Data Modes Part 1

FreeDV 700E and Compression

FreeDV 700D [9] is built around an OFDM modem [6] and powerful LDPC codes, and was released in mid 2018. Since then our real world experience has shown that it struggles with fast fading channels. Much to my surprise, the earlier FreeDV 700C mode actually works better on fast fading channels. This is surprising as 700C doesn’t have any FEC, but instead uses a simple transmit diversity scheme – the signal is sent twice at two different frequencies.

So I decided to develop a new FreeDV 700E waveform [8] with the following features:

  1. The ability to handle fast fading through an increased pilot symbol rate, but also with FEC which is useful for static crashes, interference, and mopping up random bit errors.
  2. Uses a shorter frame (80ms), rather than the 160ms frame of 700D. This will reduce latency and makes sync faster.
  3. The faster pilot symbol rate will mean 700E can handle frequency offsets better, as well as fast fading.
  4. Increasing the cyclic prefix from 2 to 6ms, allowing the modem to handle up to 6ms of multipath delay spread.
  5. A wider RF bandwidth than 700D, which can help mitigate frequency selective fading. If one part of the spectrum is notched out, we can use FEC to recover data from other parts of the spectrum. On the flip side, narrower signals are more robust to some interference, and use less spectrum.
  6. Compression of the OFDM waveform, to increase the average power (and hence received SNR) for a given peak power.
  7. Trade off low SNR performance for fast fading channel performance. A higher pilot symbol rate and longer cyclic prefix mean less energy is available for data symbols, so low SNR performance won’t be as good as 700D.
  8. It uses the same Codec 2 700C voice codec, so speech quality will be the same as 700C and D when SNR is high.

Over the course of 2020, we’ve refactored the OFDM modem and FreeDV API to make implementing new modem waveforms much easier. This really helped – I designed, simulated, and released the FreeDV 700E mode in just one week of part time work. It’s already being used all over the world in the development version of FreeDV 1.5.0.

My bench tests indicate 700C/D/E are equivalent on moderate fading channels (1Hz Doppler/2ms spread). As the fading speeds up to 2Hz 700D falls over, but 700C/E perform well. On very fast fading (4Hz/4ms) 700E does better as 700C stops working. 700D works better at lower SNRs on slow fading channels (1Hz Doppler/2ms and slower).

The second innovation is compression of the 700C/D/E waveforms, to increase average power significantly (around 6dB from FreeDV 1.4.3). Please be careful adjusting the Tx drive and especially enabling the Tools – Options – Clipping. It can drive your PA quite hard. I have managed 40W RMS out of my 75W PEP transmitter. Make sure your transmitter can handle long periods of high average power.

I’ve also been testing against compressed SSB, which is pretty hard to beat as it’s so robust to fading. However 700E is hanging on quite well with fast fading, and unlike SSB becomes noise free as the SNR increases. At the same equivalent peak power – 700D is doing well when compressed SSB is -5dB SNR and rather hard on the ears.

SSB Compression

To make an “apples to apples” comparison between FreeDV to SSB at low SNRs I need SSB compressor software than I can run on the (virtual) bench. So I’ve developed a speech compressor using some online wisdom [1][2]. Turns out the “Hilbert Clipper” in [2] is very similar to how I am compressing my OFDM signals to improve their PAPR. This appeals to me – using the same compression algorithm on SSB and FreeDV.

The Hilbert transformer takes the “real” speech signal and converts it to a “complex” signal. It’s the same signal, but now it’s represented by in phase and quadrature signals, or alternatively a vector spinning around the origin. Turns out you can do a much better job at compression by limiting the magnitude of that vector, than by clipping the input speech signal. Any clipping tends to spread the signal in frequency, so we have a SSB filter at the output to limit the bandwidth. Good compressors can get down to about 6dB PAPR for SSB, mine is not too shabby at 7-8dB.

It certainly makes a difference to noisy speech, as you can see in this plot (in low SNR channel), and the samples below:

Compression SNR Sample
Off High Listen
On High Listen
Off Low Listen
On Low Listen

FreeDV Performance

Here are some simulated samples. They are all normalised to the same peak power, and all waveforms (SSB and FreeDV) are compressed. The noise power N is in dB but there is some arbitrary scaling (for historical reasons). A more negative N means less noise. For a given noise power N, the SNRs vary as different waveforms have different peak to average power ratio. I’m adopting the convention of comparing signals at the same (Peak Power)/(Noise Power) ratio. This matches real world transmitters – we need to do the best we can with a given PEP (peak power). So the idea below is to compare samples at the same noise power N, and channel type, as peak power is known to be constant. An AWGN channel is just plain noise, MPP is 1Hz Doppler, 2ms delay spread; and MPD is 2Hz Doppler, 4ms delay spread.

Test Mode Channel N SNR Sample
1 SSB AWGN -12.5 -5 Listen
700D AWGN -12.5 -1.8 Listen
2 SSB MPP -17.0 0 Listen
700E MPP -17.0 3 Listen
3 SSB MPD -23.0 8 Listen
700E MPD -23.0 9 Listen

Here’s a sample of the simulated off air 700E modem signal 700E at 8dB SNR in a MPP channel. It actually works up to 4 Hz Doppler and 6ms delay spread. Which sounds likes a UFO landing.

Comments:

  1. With digital when your in a fade, you’re a fade! You lose that chunk of speech. A short FEC code (less than twice fade duration) isn’t going to help you much. We can’t extend the code length because latency (this is PTT speech). Sigh.
  2. This explain why 700C (with no FEC) does so well – we lose speech during deep fades (where FEC breaks anyway) but it “hangs on” as the Doppler whirls around and sounds fine in the “anti-fades”. The voice codec is robust to a few % BER all by itself which helps.
  3. Analog SSB is nearly impervious to fading, no matter how fast. It’s taken a lot of work to develop modems that “hang on” in fast fading channels.
  4. Analog SSB degrades slowly with decreasing SNR, but also improves slowly with increasing SNR. It’s still noisy at high SNRs. DSP noise reduction can help.

Lets take a look at the effect of compression. Here is a screen shot from my spectrum analyser in zero-span mode. It’s displaying power against time from my FT-817 being driven by two waveforms. Yellow is the previous, uncompressed 700D waveform, purple is the latest 700D with compression. You can really get a feel for how much higher the average power is. On my radio I jumped from 5-10W RMS to 40WRMS.

Jose’s demo

Jose, LU5DKI sent me a wave file sample of him “walking through the modes” over a 12,500km path between Argentina and New Zealand. The SSB is at the 2:30 mark:

This example shows how well 700E can handle fast fading over a path that includes Antartica:

A few caveats:

  • Jose’s TK-80 radio is 40 years old and doesn’t have any compression available for SSB.
  • FreeDV attenuates the “pass through” off air radio noise by about 6dB, so the level of the SSB will be lower than the FreeDV audio. However that might be a good idea with all that noise.
  • Some noise reduction DSP might help, although it tends to fall over at low SNRs. I don’t have a convenient command line tool for that. If you do – here is Jose’s sample. Please share the output with us.

I’m interested in objective comparisons of FreeDV and SSB using off air samples. I’m rather less interested in subjective opinions. Show me the samples …….

Conclusions and Further Work

I’m pleased with our recent modem waveform development and especially the compression. It’s also good fun to develop new waveforms and getting easier as the FreeDV API software matures. We’re getting pretty good performance over a range of channels now. Learning how to make modems for digital voice play nicely over HF channels. I feel our SSB versus FreeDV comparisons are maturing too.

The main limitation is the Codec 2 700C vocoder – while usable in practice it’s not exactly HiFi. Unfortunately speech coding is hard – much harder than modems. More R&D than engineering, which means a lot more effort – with no guarantee of a useful result. Anyhoo, lets see if I can make some progress on speech quality at low SNRs in 2021!

Links

[1] Compression – good introduction from AB4OJ.
[2] DSP Speech Processor Experimentation 2012-2020 – Sophisticated speech processor.
[3] Playing with PAPR – my initial simulations from earlier in 2020.
[4] Jim Ahlstrom N2ADR has done some fine work on FreeDV filter C code – very useful once again for this project. Thanks Jim!
[5] Modems for HF Digital Voice Part 1 and Part 2 – gentle introduction in to modems for HF.
[6] Steve Ports an OFDM modem from Octave to C – the OFDM modem Steve and I built – it keeps getting better!
[7] FreeDV 700E uses one of Bill’s fine LDPC codes.
[8] Modem waveform design spreadsheet.
[9] FreeDV 700D Released

Command Lines

Writing these down so I can cut and paste them to repeat these tests in the future….

Typical FreeDV 700E simulation, wave file output:

./src/freedv_tx 700E ../raw/ve9qrp_10s.raw - --clip 1 | ./src/cohpsk_ch - - -23 --mpp --raw_dir ../raw --Fs 8000 | sox -t .s16 -r 8000 -c 1 - ~/drowe/blog/ve9qrp_10s_700e_23_mpd_rx.wav

Looking at the PDF (histogram) of signal magnitude is interesting. Lets generate some compressed FreeDV 700E:

./src/freedv_tx 700D ../raw/ve9qrp.raw - --clip 1 | ./src/cohpsk_ch - - -100  --Fs 8000 --complexout > ve9qrp_700d_clip1.iq16

Now take the complex valued output signal and plot the PDF and CDF the magnitude (and time domain and spectrum):

octave:1> s=load_raw("../build_linux/ve9qrp_700d_clip1.iq16"); s=s(1:2:end)+j*s(2:2:end); figure(1); plot(abs(s)); S=abs(fft(s)); figure(2): clf; plot(20*log10(S)); figure(3); clf; [hh nn] = hist(abs(s),25,1); cdf = empirical_cdf(1:max(abs(s)),abs(s)); plotyy(nn,hh,1:max(abs(s)),cdf);


This is after clipping, so 100% of the samples have a magnitude less than 16384. Also see [3].

When testing with real radios it’s useful to play a sine wave at the same PEP level as the modem signals under test. I could get 75WRMS (and PEP) out of my IC-7200 using this test (13.8VDC power supply):

./misc/mksine - 1000 160 16384 | aplay -f S16_LE

We can measure the PAPR of the sine wave with the cohpsk_ch tool:

./misc/mksine - 1000 10 | ./src/cohpsk_ch - /dev/null -100 --Fs 8000
ohpsk_ch: Fs: 8000 NodB: -100.00 foff: 0.00 Hz fading: 0 nhfdelay: 0 clip: 32767.00 ssbfilt: 1 complexout: 0
cohpsk_ch: SNR3k(dB):    85.23  C/No....:   120.00
cohpsk_ch: peak.....: 10597.72  RMS.....:  9993.49   CPAPR.....:  0.51 
cohpsk_ch: Nsamples.:    80000  clipped.:     0.00%  OutClipped:  0.00%

CPAPR = 0.5dB, should be 0dB, but I think there’s a transient as the Hilbert Transformer FIR filter memory fills up. Close enough.

By chaining cohpsk_ch together is various ways we can build a SSB compressor, and simulate the channel by injecting noise and fading:

./src/cohpsk_ch ../raw/ve9qrp_10s.raw - -100 --Fs 8000 | ./src/cohpsk_ch - - -100 --Fs 8000 --clip 16384 --gain 10 | ./src/cohpsk_ch - - -100 --Fs 8000 --clip 16384 | ./src/cohpsk_ch - - -17 --raw_dir ../raw --mpd --Fs 8000 --gain 0.8 | aplay -f S16_LE

cohpsk_ch: peak.....: 16371.51  RMS.....:  7128.40   CPAPR.....:  7.22

A PAPR of 7.2 dB is pretty good for a few hours work – the cools kids get around 6dB [1][2].

Open IP over VHF/UHF 4

For the last few weeks I’ve been building up some automated test software for my fledgling IP over radio system.

Long term automated tests can help you thrash out a lot of issues. Part of my cautious approach in taking small steps to build a complex system. I’ve built up a frame repeater system where one terminal “pings” another terminal – and repeats this thousands of times. I enjoyed writing service scripts [3] to wrap up the complex command lines, and bring the system “up” and “down” cleanly. The services also provide some useful debug options like local loopback testing of the radio hardware on each terminal.

I started testing the system “over the bench” with two terminals pinging and ponging frames back and forth via cables. After a few hours I hit a bug where the RpiTx RF would stop. A few repeats showed this sometimes happened in a few minutes, and other times after a few hours.

This lead to an interesting bug hunt. I quite enjoy this sort of thing, peeling off the layers of a complex system, getting closer and closer to the actual problem. It was fun to learn about the RpiTx [2] internals. A very clever system of a circular DMA buffer feeding PLL fractional divider values to the PLLC registers on the Pi. The software application chases that DMA read pointer around, trying to keep the buffer full.

By dumping the clock tree I eventually worked out some other process was messing with the PLLC register. Evariste on the RpiTx forum then suggested I try “force_turbo=1” [4]. That fixed it! My theory is the CPU freq driver (wherever that lives) was scaling all the PLLs when the CPU shifted clock speed. To avoid being caught again I added some logic to check PLLC and bomb out if it appears to have been changed.

A few other interesting things I noticed:

  1. I’m running 10 kbit/s for these tests, with a 10kHz shift between the two FSK tones and a carrier frequency of 144.5MHz. I use a FT-817 SSB Rx to monitor the transmission, which has bandwidth of around 3 kHz. A lot of the time a FSK burst sounds like broadband noise, as the FT-817 is just hearing a part of the FSK spectrum. However if you tune to the high or low tone frequency (just under 144.500 or 144.510) you can hear the FSK tones. Nice audio illustration of FSK in action.
  2. On start up RPiTx uses ntp to calibrate the frequency, which leads to slight shifts in the frequency each time it starts. Enough to be heard by the human ear, although I haven’t measured them.

I’ve just finished a 24 hour test where the system sent 8600 bursts (about 6 Mbyte in each direction) over the link, and everything is running nicely (100% of packets were received). This gives me a lot of confidence in the system. I’d rather know if there are any stability issues now than when the device under test is deployed remotely.

I feel quite happy with that result – there’s quite a lot of signal processing software and hardware that must be playing nicely together to make that happen. Very satisfying.

Next Steps

Now it’s time to put the Pi in a box, connect a real antenna and try some over the air tests. My plan is:

  1. Set up the two terminals several km apart, and see if we can get a viable link at 10 kbit/s, although even 1 kbit/s would be fine for initial tests. Enough margin for 100 kbit/s would be even better, but happy to work on that milestone later.
  2. I’m anticipating some fine tuning of the FSK_LDPC waveforms will be required.
  3. I’m also anticipating problems with urban EMI, which will raise the noise floor and set the SNR of the link. I’ve instrumented the system to measure the noise power at both ends of the link, so I can measure this over time. I can also measure received signal power, and estimate path loss. Knowing the gain of the RTLSDR, we can measure signal power in dBm, and estimate noise power in dBm/Hz.
  4. There might be some EMI from the Pi, lets see what happens when the antenna is close.
  5. I’ll run the frame repeater system over several weeks, debug any stability issues, and collect data on S, N, SNR, and Packet Error Rate.

Reading Further

[1] Open IP over VHF/UHF Part 1 Part 2 Part 3
[2] RpiTx – Radio transmitter software for Raspberry Pis
[3] GitHub repo for this project with build scripts, a project plan and a bunch of command lines I use to run various tests. The latest work in progress will be an open pull request.
[4] RpiTx Group discussion of the PLLC bug discussed above

Open IP over VHF/UHF 3

The goal of this project is to develop a “100 kbit/s IP link” for VHF/UHF using just a Pi and RTLSDR hardware, and open source signal processing software [1]. Since the last post, I’ve integrated a bunch of components and now have a half duplex radio data system running over the bench.

Recent progress:

  1. The Tx and Rx signal processing is now operating happily together on a Pi, CPU load is fine.
  2. The FSK_LDPC modem and FEC [2] has been integrated, so we can now send and receive coded frames. The Tx and Rx command line programs have been modified to send and receive bursts of frames.
  3. I’ve added a PIN diode Transmit/Receive switch, which I developed for the SM2000 project [3]. This is controlled by a GPIO from the Pi. There is also logic to start and stop the Pi Tx carrier at the beginning and end of bursts – so it doesn’t interfere with the Rx side.
  4. I’ve written a “frame repeater” application that takes packets received from the Rx and re-transmits them using the Tx. This will let me run “ping” tests over the air. A neat feature is it injects the received Signal and Noise power into the frame it re-transmits. This will let me measure the received power, the noise floor, and SNR at the remote station.
  5. The receiver in each terminal is very sensitive, and inconveniently picks up frames transmitted by that terminal. After trying a few approaches I settled on a “source filtering” design. When a packet is transmitted, the Tx places a “source byte” in the frame that is unique to that terminal. A one byte MAC address I guess. The local receiver then ignores (filters) any packets with that source address, and only outputs frames from other terminals.

Here is a block diagram of the Pi based terminal, showing hardware and software components:

When I build my next terminal, I will try separate Tx and Rx antennas, as a “minimalist” alternative to the TR switch. The next figure shows the transmit control signals in action. Either side of a burst we need to switch the TR switch and turn the Tx carrier on and off:

Here’s the current half duplex setup on the bench:

Terminal2 is on the left, is comprised of the Pi, RTLSDR, and TR switch. Terminal1 (right) is the HackRF/RTLSDR connected to my laptop. Instead of a TR switch I’m using a hybrid combiner (a 3dB loss, but not an issue for these tests). This also shows how different SDR Tx/Rx hardware can be used with this system.

I’m using 10,000 bit/s for the current work, although that’s software configurable. When I start testing over the air I’ll include options for a range of bit rates, eventually shooting for 100 kbits/s.

Here’s a demo video of the system:

Next Steps

The command lines to run everything are getting unwieldy so I’ll encapsulate them is some “service” scripts to start and stop the system neatly. Then box everything up, try a local RF link, and check for stability over a few days. Once I’m happy I will deploy a terminal and start working through the real world issues. The key to getting complex systems going is taking tiny steps. Test and debug carefully at each step.

It’s coming together quite nicely, and I’m enjoying a few hours of work on the project every weekend. It’s very satisfying to build the layers up one by one, and a pleasant surprise when the pieces start playing nicely together and packets move magically across the system. I’m getting to play with RF, radios, modems, packets, and even building up small parts of a protocol. Good fun!

Reading Further

[1] Open IP over UHF/VHF Part 1 and Part 2.
[2] FSK LDPC Data Mode – open source data mode using a FSK modem and powerful LDPC codes.
[3] SM2000 Part 3 – PIN TR Switch and VHF PA
[4] GitHub repo for this project with build scripts, a project plan and a bunch of command lines I use to run various tests. The latest work in progress will be an open pull request.

Speech Spectral Quantisation using VQ-VAE

As an exercise to learn more about machine learning, I’ve been experimenting with Vector Quantiser Variational AutoEncoders (VQ VAE) [2]. Sounds scary but is basically embedding a vector quantiser in a Neural Network so they train together. I’ve come up with a simple network that quantises 80ms (8 x 10ms frames) of spectral magnitudes in 88 bits (about 1100 bits/s).

I arrived at my current model through trial and error, using this example [1] as a starting point. Each 10ms frame is a vector of energies from 14 mel-spaced filters, derived from LPCNet [6]. The network uses conv1D stages to downsample and upsample the vectors, with a two stage VQ (11 bits per stage) in the Autoencoder “bottleneck”. The VQ is also encoding total frame energy, so the remaining parameters for a vocoder would be pitch and (maybe) voicing.

This work (spectral quantisation) is applicable to “old school” vocoders like Codec 2 and is also being used with newer Neural Vocoders in some research papers.

I haven’t used it to synthesise any speech yet but it sure does make nice plots. This one is a 2D histogram of the encoder space, white dots are the stage 1 VQ entries. The 16 dimensional data has been reduced to 2 dimensions using PCA.

If the VQ is working, we should expect more dots in the brighter colour areas, and less in the darker areas.

Here is a sample input (green) output (red) of 8 frames:

This is a transition region, going from voiced to unvoiced speech. It seems to handle it OK. The numbers are (frame_number, SD), where SD is the Spectral Distortion in dB*dB. When we get a high SD frame, quite often it’s not crazy wrong, more an educated guess that will probably sound OK, e.g. a different interpolation profile for the frame energy across a transition. Formants are mostly preserved.

The VQ seems to be doing something sensible, after 20 epochs I can see most VQ entries are being used, and the SD gets better with more bits. The NN part trains much faster that the VQ.

Here is a histogram of the SDs for each frame:

The average SD is around 7.5 dB*dB, similar to some of the Codec 2 quantisers. However this is measured on every 10ms frame in an 8 frame sequence, so it’s a measure of how well it interpolates/decimates in time as well. As I mentioned above – some of the “misses” that push the mean SD higher are inconsequential.

Possible Bug in Codec 2 700C

I use similar spectral magnitude vectors for Codec 2 700C [5] – however when I tried that data the SD was about double. Hmmm. I looked into it and found some bugs/weaknesses in my approach for Codec 2 700C (for that codec the spectral magnitudes a dependant on the pitch estimator which occasionally loses it). So that was a nice outcome – trying to get the same result two different ways can be a pretty useful test.

Further Work

Some ideas for further work:

  1. Use kmeans for training.
  2. Inject bit errors when training to make it robust to channel errors.
  3. Include filtered training material to make it robust to recording conditions.
  4. Integrate into a codec and listen to it.
  5. Try other networks – I’m still learning how to engineer an optimal network.
  6. Make it work with relu activations, I can only get it to work with tanh.

Reading Further

[1] VQ VAE Keras MNIST Example – my starting point for the VQ-VAE work
[2] Neural Discrete Representation Learning
[3] My Github repo for this work
[4] Good introduction to PCA
[5] Codec 2 700C – also uses VQ-ed mel-spaced vectors
[6] LPCNet: DSP-Boosted Neural Speech Synthesis

FSK LDPC Data Mode

I’m developing an open source data mode using a FSK modem and powerful LDPC codes. The initial use case is the Open IP over UHF/VHF project, but it’s available in the FreeDV API as a general purpose mode for sending data over radio channels.

It uses 2FSK or 4FSK, has a variety of LDPC codes available, works with bursts or streaming frames, and the sample rate and symbol rate can be set at init time.

The FSK modem has been around for some time, and is used for several applications such as balloon telemetry and FreeDV digital voice modes. Bill, VK5DSP, has recently done some fine work to tightly integrate the LDPC codes with the modem. The FSK_LDPC system has been tested over the air in Octave simulation form, been ported to C, and bundled up into the FreeDV API to make using it straight forward from the command line, C or Python.

We’re not using a “black box” chipset here – this is ground up development of the physical layer using open source, careful simulation, automated testing, and verification of our work on real RF signals. As it’s open source the modem is not buried in proprietary silicon so we can look inside, debug issues and integrate powerful FEC codes. Using a standard RTLSDR with a 6dB noise figure, FSK_LDPC is roughly 10dB ahead of the receiver in a sample chipset. That’s a factor of 10 in power efficiency or bit rate – your choice!

Performance

The performance is pretty close to what is theoretically possible for coded FSK [6]. This is about Eb/No=8dB (2FSK) and Eb/No=6dB (4FSK) for error free transmission of coded data. You can work out what that means for your application using:

  MDS = Eb/No + 10*log10(Rb) + NF - 174
  SNR = Eb/No + 10*log10(Rb/B)

So if you were using 4FSK at 100 bits/s, with a 6dB Noise figure, the Minimum Detectable Signal (MDS) would be:

  MDS = 6 + 10*log10(100) + 6 - 174
      = -142dBm

Given a 3kHz noise bandwidth, the SNR would be:

  SNR = 6 + 10*log10(100/3000)
      = -8.8 dB

How it Works

Here is the FSK_LDPC frame design:

At the start of a burst we transmit a preamble to allow the modem to syncronise. Only one preamble is transmitted for each data burst, which can contain as many frames as you like. Each frame starts with a 32 bit Unique Word (UW), then the FEC codeword consisting of the data and parity bits. At the end of the data bits, we reserve 16 bits for a CRC.

This figure shows the processing steps for the receive side:

Unique Word Selection

The Unique Word (UW) is a known sequence of bits we use to obtain “frame sync”, or identify the start of the frame. We need this information so we can feed the received symbols into the LDPC decoder in the correct order.

To find the UW we slide it againt the incoming bit stream and count the number of errors at each position. If the number of errors is beneath a certain threshold – we declare a valid frame and try to decode it with the LDPC decoder.

Even with pure noise (no signal) a random sequence of bits will occasionally get a partial match (better than our threshold) with the UW. That means the occasional dud frame detection. However if we dial up the threshold too far, we might miss good frames that just happen to have a few too many errors in the UW.

So how do we select the length of the UW and threshold? Well for the last few decades I’ve been guessing. However despite being allergic to probability theory I have recently started using the Binominal Distribution to answer this question.

Lets say we have a 32 bit UW, lets plot the Binomial PDF and CDF:


The x-axis is the number of errors. On each graph I’ve plotted two cases:

  1. A 50% Bit Error Rate (BER). This is what we get when no valid signal is present, just random bits from the demodulator.
  2. A 10% bit error rate. This is the worst case where we need to get frame sync – a valid, but low SNR signal. The rate half LDPC codes fall over at about 10% BER.

The CDF tells us “what is the chance of this many or less errors”. We can use it to pick the UW length and thresholds.

In this example, say we select a “vaild UW threshold” of 6 bit errors out of 32. Imagine we are sliding the UW over random bits. Looking at the 50% BER CDF curve, we have a probablity of 2.6E-4 (0.026%) of getting 6 or less errors. Looking at the 10% curve, we have a probablity of 0.96 (96%) of detecting a valid frame – or in other words we will miss 100 – 96 = 4% of the valid frames that just happen to have 7 or more errors in the unique word.

So there is a trade off between false detection on random noise, and missing valid frames. A longer UW helps separate the two cases, but adds some overhead – as UW bits don’t carry any payload data. A lower threshold means you are less likely to trigger on noise, but more likely to miss a valid frame that has a few errors in the UW.

Continuing our example, lets say we try to match the UW on a stream of random bits from off air noise. Because we don’t know where the frame starts, we need to test every single bit position. So at a bit rate of 1000 bits/s we attempt a match 1000 times a second. The probability of a random match in 1000 bits (1 second) is 1000*2.6E-4 = 0.26, or about 1 chance in 4. So every 4 seconds, on average, we will get an accidental UW match on random data. That’s not great, as we don’t want to output garbage frames to higher layers of our system. So a CRC on the decoded data is performed as a final check to determine if the frame is indeed valid.

Putting it all together

We prototyped the system in GNU Octave first, then ported the individual components to stand alone C programs that we can string together using stdin/stdout pipes:

$ cd codec2/build_linux$ cd src/
$ ./ldpc_enc /dev/zero - --code H_256_512_4 --testframes 200 |
  ./framer - - 512 5186 | ./fsk_mod 4 8000 5 1000 100 - - |
  ./cohpsk_ch - - -10.5 --Fs 8000  |
  ./fsk_demod --mask 100 -s 4 8000 5 - - |
  ./deframer - - 512 5186  |
  ./ldpc_dec - /dev/null --code H_256_512_4 --testframes
--snip--
Raw   Tbits: 101888 Terr:   8767 BER: 0.086
Coded Tbits:  50944 Terr:    970 BER: 0.019
      Tpkts:    199 Tper:     23 PER: 0.116

The example above runs 4FSK at 5 symbols/second (10 bits/s), at a sample rate of 8000 Hz. It uses a rate 0.5 LDPC code, so the throughput is 5 bit/s and it works down to -24dB SNR (at around 10% PER). This is what it sounds like on a SSB receiver:

Yeah I know. But it’s in there. Trust me.

The command line programs above are great for development, but unwieldy for real world use. So they’ve been combined into single FreeDV API functions. These functions take data bytes, convert them to samples you send through your radio, then at the receiver back to bytes again. Here’s a simple example of sending some text using the FreeDV raw data API test programs:

$ cd codec2/build_linux/src
$ echo 'Hello World                    ' |
  ./freedv_data_raw_tx FSK_LDPC - - 2>/dev/null |
  ./freedv_data_raw_rx FSK_LDPC - - 2>/dev/null |
  hexdump -C
48 65 6c 6c 6f 20 57 6f  72 6c 64 20 20 20 20 20  |Hello World     |
20 20 20 20 20 20 20 20  20 20 20 20 20 20 11 c6  |              ..|

The “2>/dev/null” hides some of the verbose debug information, to make this example quieter. The 0x11c6 at the end is the 16 bit CRC. This particular example uses frames of 32 bytes, so I’ve padded the input data with spaces.

My current radio for real world testing is a Raspberry Pi Tx and RTLSDR Rx, but FSK_LDPC could be used over regular SSB radios (just pipe the audio into and out of your radio with a sound card), or other SDRs. FSK chips could be used as the Tx (although their receivers are often sub-optimal as we shall see). You could even try it on HF, and receive the signal remotely with a KiwiSDR.

I’ve used a HackRF as a Tx for low level testing. After a few days of tuning and tweaking it works as advertised – I’m getting within 1dB of theory when tested over the bench at rates between 500 and 20000 bits/s. In the table below Minimum Detectable Signal (MDS) is defined as 10% PER, measured over 100 packets. I send the packets arranged as 10 “bursts” of 10 packets each, with a gap between bursts. This gives the acquisition a bit of a work out (burst operation is typically tougher than streaming):

Info bit rate (bits/s) Mode NF (dB) Expected MDS (dBm) Measured MDS (dBm) Si4464 MDS (dBm)
1000 4FSK 6 -132 -131 -123
10000 4FSK 6 -122 -120 -110
5000 2FSK 6 -123 -123 -113

The Si4464 is used as an example of a chipset implementation. The Rx sensitivity figures were extrapolated from the nearest bit rate on Table 3 of the Si4464 data sheet. It’s hard to compare exactly as the Si4664 doesn’t have FEC. In fact it’s not possible to fully utilise the performance of high performance FEC codes on chipsets as they generally don’t have soft decision outputs.

FSK_LDPC can scale to any bit rate you like. The ratio of the sample rate to symbol rate Fs/Rs = 8000/1000 (8kHz, 1000 bits/s) is the same as Fs/Rs = 800000/100000 (800kHz, 100k bits/s), so it’s the same thing to the modem. I’ve tried FSK_LDPC between 5 and 40k bit/s so far.

With a decent LNA in front of the RTLSDR, I measured MDS figures about 4dB lower at each bit rate. I used a rate 0.5 code for the tests to date, but other codes are available (thanks to Bill and the CML library!).

There are a few improvements I’d like to make. In some tests I’m not seeing the 2dB advantage 4FSK should be delivering. Syncronisation is trickier for 4FSK, as we have 4 tones, and the raw modem operating point is 2dB further down the Eb/No curve than 2FSK. I’d also like to add some GFSK style pulse shaping to make the Tx spectrum cleaner. I’m sure some testing over real world links will also show up a few issues.

It’s fun building, then testing, tuning and pushing through one bug after another to build your very own physical layer! It’s a special sort of magic when the real world results start to approach what the theory says is possible.

Reading Further

[1] Open IP over UHF/VHF Part 1 and Part 2 – my first use case for the FSK_LDPC protocol described in this post.
[2] README_FSK – recently updated documentation on the Codec 2 FSK modem, including lots of examples.
[3] README_data – new documentation on Codec 2 data modes, including the FSK_LDPC mode described in this post.
[4] 4FSK on 25 Microwatts – Bill and I sending 4FSK signals across Adelaide, using an early GNU Octave simulation version of the FSK_LDPC mode described in this post.
[5] Bill’s LowSNR blog.
[6] Coded Modulation Library Overview – CML is a wonderful library that we are using in Codec 2 for our LDPC work. Slide 56 tells us the theoretical mininum Eb/No for coded FSK (about 8dB for 2FSK and 6dB for 4FSK).
[7] 4FSK LLR Estimation Part 2 – GitHub PR used for development of the FSK_LDPC mode.

RTLSDR Strong Signals and Noise Figure

I’ve been exploring the strong and weak signal performance of the RTLSDR. This all came about after Bill and I performed some Over the Air tests using the RTLSDR. We found that if we turned the gain all the way up, lots of birdies and distortion appeared. However if we wind the gain back to improve strong signal performance, the Noise Figure (NF) increases, messing with our modem link budgets.

Fortunately, there’s been a lot of work on the RTLSDR internals from the open source community. So I’ve had a fun couple of weeks drilling down into the RTLSDR drivers and experimenting. It’s been really interesting, and I’ve learnt a lot about the trade offs in SDR. For USD$22, the RTLSDR is a fine teaching/learning tool – and a pretty good radio.

Strong and Weak Signals

Here is a block diagram of the RTLSDR as I understand it:

It’s a superhet, with an IF bandwidth of 2MHz or less. The IF is sampled by an 8-bit ADC that runs at 28 MHz. The down sampling (decimation) from 28MHz to 2MHz provides some “processing gain” which results in a respectable performance. One part I don’t really understand is the tracking BPF, but I gather it’s pretty broad, and has no impact on strong signals a few MHz away.

There are a few ways for strong signals to cause spurious signals to appear:

  1. A -30dBm signal several MHz away will block the LNA/mixer analog stages at the input. For example a pager signal at 148.6MHz while you are listening to 144.5 MHz. This causes a few birdies to gradually appear, and some compression of your wanted signal.
  2. A strong signal a few MHz away can be aliased into your passband, as the IF filter stop band attenuation is not particularly deep.
  3. A -68dBm signal inside the IF bandwidth will overload the ADC, and the whole radio will fall in a heap.

The levels quoted above are for maximum gain (-g 49), and are consistent with [1] and [5]. If you reduce the gain, the overload levels get higher, but so does your noise figure. You can sometimes work around the first two issues, e.g. if the birdies don’t happen to fall right on top of your signal they can be ignored. So the first two effects – while unfortunate – tend to be more benign than ADC overload.

At the weak signal end of the operating range, we are concerned about noise. Here is how I model the various noise contributions:

The idea of a radio is to use the tuner to remove the narrow band unwanted signals, leaving just your wanted signal. However noise tends to be evenly distributed in frequency, so we are stuck with any noise that is located in the bandwidth of our wanted signal. A common technique is to have enough gain before the ADC such that the signal being sampled is large compared to the ADC quantisation noise. That way we can ignore the noise contribution from the ADC.

However with strong signals, we need to back off the gain to prevent overload. Now the ADC noise becomes significant, and the overall NF of the radio increases.

Another way of looking at this – if the gain ahead of the ADC is small, we have a smaller signal hitting the ADC, which will toggle less bits of the ADC, resulting in coarse quantisation (more quantisation noise) from the ADC.

The final decimation stage reduces ADC quantisation noise. This figure shows our received signal (a single frequency spike), and the ADC quantisation noise (continuous line, with energy at every frequency):

The noise power is the sum of all the noise in our bandwidth of interest. The decimation filter limits this bandwidth, removing most of the noise except for the small band near our wanted signal (shaded blue area). So the total noise power is summed over a smaller bandwidth, noise power is reduced and our SNR goes up. This means that despite just being 8 bits, the ADC performs reasonably well.

Off Air Strong Signals

Here is what my spectrum analyser sees when connected to my antenna. It’s 10MHz wide, centred on 147MHz. There are some strong pager signals that bounce around the -30dBm level, plus some weaker 2 metre band Ham signals between -80 and -100dBm.

When I tune to 144.5MHz, that pager signal is outside the IF bandwidth, so I don’t get a complete breakdown of the radio. However it does cause some strong signal compression, and some birdies/aliased signals pop up. Here is a screen shot from gqrx when the pager signal is active:

In this plot, the signal right on 144.5 is my wanted signal, a weak signal I injected on the bench. The hump at 144.7 is an artefact of the pager signal, which due to strong signal compression just happens to to appear close to (but not on top of) my wanted signal. The wider hump is the IF filter. Here’s a closer look of the IF filter, set to 100kHz using the gqrx “Bandwidth” field:

To see the IF filter shape I connected a terminated LNA to the input of the RTLSDR. This generates wideband noise that is amplified and can be used to visualise the filter shape.

There has been some really cool recent work exploring the IF filtering capabilities of the R820T2 [2]. I tried a few of the newly available IF filter configurations. My experience was the shape factor isn’t great (the filters roll off slowly), so the IF filters don’t have a huge impact on close-in strong signal performance. They do help with attenuating aliased signals.

It was however a great learning and exploring experience, a real deep dive into SDR.

Gqrx is really cool for this sort of work. For example if you tune a strong signal from a signal generator either side of the IF passband, you can see aliases popping up and zooming across the screen. It’s also possible to link gqrx with different RTLSDR driver libraries, to explore their performance. I use the UDP output feature to send samples to my noise figure measuring tool.

Airspy and RTLSDR

The Airspy uses the same tuner chip so I tested it and found about the same strong signal results, i.e. it overloads at the same signal levels as the RTLSDR at max gain. Guess this is no surprise if it’s the same tuner. I once again [5] measured the Airspy noise figure at about 7dB, slightly higher than the 6dB of the RTLSDR. This is consistent with other measurements [1], but quite a way from Airspy’s quoted figure (3.5dB).

This is a good example of the high sample rate/decimation filter architecture in action – the 8-bit RTLSDR delivers a similar noise figure to the 12-bit Airspy.

But the RTLSDR NF probably doesn’t matter

I’ve spent a few weeks spent peering into driver code, messing with inter-stage gains, hammering the little RTLSDR with strong RF signals, and optimising noise figures. However it might all be for naught! At both my home and Bills – external noise appears to dominate. On the 2M band (144.5MHz) we are measuring noise from our (omni) antennas about 20dB above thermal, e.g. -150dBm/Hz compared to the ideal thermal noise level of -174dBm/Hz. Dreaded EMI from all the high speed electronics in urban environments.

EMI can look at lot like strong signal overload – at high gains all these lines pop up, just like ADC overload. If you reduce the gain a bit they drop down into the noise, although its a smooth reduction in level unlike ADC overload which is very abrubt. I guess we are seeing are harmonics of switching power supply signals or other nearby digital devices. Rather than an artefact of Rx overload, we are seeing a sensitive receiver detecting weak EMI signal right down near the noise floor.

So this changes the equation – rather than optimising internal NF we need to ensure enough link margin to get over the ambient noise. An external LNA won’t help, and even a lossy coax run loss might not matter much, as the SNR is more or less set at the antenna.

I’ve settled on the librtlsdr RTLSDR driver/library for my FSK modem experiments as it supports IF filter and inter-stage gain control. I also have a mash up called rtl_fsk.c where I integrate a FSK modem and some powerful LDPC codes, but that’s another story!

Reading Further

[1] Evaluation of SDR Boards V1.0 – A fantastic report on the performance of several SDRs.
[2] RTLSDR: driver extensions – a very interesting set of conference slides discussing recent work with the R820T2 by Hayati Aygun. Many useful links.
[3] librtlsdr fork of rtlsdr driver library that includes support for the IF filter configuration and individual gain stage control discussed in [2].
[4] My fork of librtlsdr – used for NF measurements and rtl_fsk mash up development.
[5] Some Measurements on E4000 and R802T tuners – Detailed look at the tuner by HB9AJG. Fig 5 & 6 bucket curves show overload levels consistent with [1] and my measurements.
[6] Measuring SDR Noise Figure in Real Time