ARM NEON Optimisation

I’ve been trying to optimise NEON DSP code on a Raspberry Pi. Using the intrinsics I managed to get a speed increase of about 3 times over vanilla C with just a few hours’ work. However the results are still significantly slower than the theoretical speed of the machine, which is 4 multiply-accumulates (8 float operations) per cycle. On a 1.2 GHz core that’s 9.6 GFLOPS.

Since then I’ve been looking at ARM manuals, Googling, and trying various ad-hoc ideas. There is a lack of working, fully optimised code examples, and I can’t find any data on cycle times and latency information for the Cortex A53 device used for the RPi. The number of ARM devices and families is bewildering, and trying to find information in a series of thousand-page manuals is daunting.

Fortunately the same NEON assembler seems to work (i.e. it assembles cleanly and you get the right results) on many ARM machines. It’s just unclear how fast it runs and why.

To get a handle on the problem I wrote a series of simple floating point dot product programs, and attempted to optimise them. Each program runs through a total of 1E9 dot product points, using an inner and outer loop. I made the inner loop pretty small (1000 floats) to try to avoid cache miss issues. Here are the results, using cycle counts measured with “perf”:

Program   Test                                        Theory cycles/loop   Measured cycles/loop   GFLOPS
dot1      Dot product, no memory reads                1                    4                      1.2*8/4   = 2.4
dot2      Dot product, no memory reads, unrolled      1                    1                      1.2*8/1   = 9.6
dot3      Dot product with memory reads               3                    9.6                    1.2*8/9.6 = 1.0
dot4      Dot product with memory reads, assembler    3                    6.1                    1.2*8/6.1 = 1.6
dotne10   Dot product with memory reads, Ne10         3                    11                     1.2*8/11  = 0.87

Cycles/loop is how many cycles are executed for one iteration of the inner loop. The last column assumes a 1.2 GHz clock, and 8 floating point ops for every NEON vector multiply-accumulate (vmla.f32) instruction (a multiply and an add for each of the 4 floats processed in parallel).
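The arithmetic behind the GFLOPS column can be sketched in a few lines of Python. This is just a calculator for the figures quoted in this post (the 1.2 GHz clock, 8 flops per vector multiply-accumulate, and 1E9 inner loop iterations); the function names are made up for illustration.

```python
CLOCK_GHZ = 1.2        # clock rate assumed in this post
FLOPS_PER_MAC = 8      # 4 lanes x (one multiply + one add)
TOTAL_LOOPS = 1e9      # inner loop iterations per test run

def cycles_per_loop(total_cycles, loops=TOTAL_LOOPS):
    """Convert a perf cycle count for a whole run into cycles per inner loop."""
    return total_cycles / loops

def gflops(cycles):
    """GFLOPS achieved if one vector multiply-accumulate retires every `cycles` cycles."""
    return CLOCK_GHZ * FLOPS_PER_MAC / cycles

# Measured cycles/loop values from the table
measured = {"dot1": 4, "dot2": 1, "dot3": 9.6, "dot4": 6.1, "dotne10": 11}
for name, cycles in measured.items():
    print(f"{name:8s} {cycles:5.1f} cycles/loop  {gflops(cycles):5.2f} GFLOPS")
```

Feeding a raw perf count through cycles_per_loop() first (e.g. 4,002,420,630 cycles for dot1) gives the same cycles/loop figure to within rounding.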

The only real success I had was dot2, but that’s an unrealistic example as it doesn’t read memory in the inner loop. I guessed that the latencies in the NEON pipeline meant an unrolled loop would work better.

Assuming (as I can’t find any data on instruction timing) two cycles for the memory reads, and one for the multiply-accumulate, I was hoping for 3 cycles for dot3 and dot4. Maybe even better if there is some dual issue magic going on. Best I can do is about 6 cycles.

I’d rather have enough information to “engineer” the system than have to rely on guesses. I’ve worked on many similar DSP optimisation projects in the past which have had data sheets and worked examples as a starting point.

Here is the neon-dot source code on GitLab. If you can make the code run faster – please send me a patch! The output looks something like:

$ make test
sum: 4e+09 FLOPS: 8e+09
sum: 4e+09 FLOPS: 8e+09
sum: 4.03116e+09 target cycles: 1e+09 FLOPS: 8e+09
sum: 4.03116e+09 target cycles: 1e+09 FLOPS: 8e+09
FLOPS: 4e+09
grep cycles dot_log.txt
     4,002,420,630      cycles:u    
     1,000,606,020      cycles:u    
     9,150,727,368      cycles:u
     6,361,410,330      cycles:u
    11,047,080,010      cycles:u

The dotne10 program requires the Ne10 library. There’s a bit of floating point round off in some of the program outputs (adding 1.0 to a big number), that’s not really a bug.

Some resources I did find useful:

  1. tterribe NEON tutorial. I’m not sure if the A53 has the same cycle timings as the Cortex-A discussed in this document.
  2. ARM docs, I looked at D0487 ARMv8 Arch Ref Manual, DDI500 Cortex A53 TRM, DDI502 Cortex A53 FPU TRM, which both reference the DEN013 ARM Cortex-A Series Programmer’s Guide. Couldn’t find any instruction cycle timing in any of them, but section 20.2 of DEN013 had some general tips.
  3. Linux perf was useful for cycle counts, and in record/report mode may help visualise pipeline stalls (but I’m unclear if that’s what I’m seeing due to my limited understanding).

Porting a LDPC Decoder to a STM32 Microcontroller

A few months ago, FreeDV 700D was released. In that post, I asked for volunteers to help port 700D to the STM32 microcontroller used for the SM1000. Don Reid, W7DMR stepped up – and has been doing a fantastic job porting modules of C code from the x86 to the STM32.

Here is a guest post from Don, explaining how he has managed to get a powerful LDPC decoder running on the STM32.

LDPC for the STM32

The 700D mode and its LDPC function were developed and used on desktop (x86) platforms. The LDPC decoder is implemented in the mpdecode_core.c source file.

We’d like to run the decoder on the SM1000 platform which has an STM32F4 processor. This requires the following changes:

  • The code used doubles in several places, while the STM32 has only single precision floating point hardware.
  • It was thought that the memory used might be too much for a system with just 192k bytes of RAM.
  • There are 2 LDPC codes currently supported: HRA_112_112, used in 700D, and H2064_516_sparse, used for Balloon Telemetry. While only the 700D configuration needed to work on the STM32 platform, any changes made to the mainstream code needed to work with the H2064_516_sparse code.


Before making changes it was important to have a well defined test process to validate new versions. This allowed each change to be validated as it was made. Without this the final debugging would likely have been very difficult.

The ldpc_enc utility can generate standard test frames, and the ldpc_dec utility can receive the frames and measure bit errors. So errors can be detected directly and the BER computed. ldpc_enc can also output soft decision symbols to emulate what the modem would receive and pass into the LDPC decoder. A new utility, ldpc_noise, was written to add AWGN to the sample values between the above utilities. Here is a sample run:

$ ./ldpc_enc /dev/zero - --sd --code HRA_112_112 --testframes 100 | ./ldpc_noise - - 1 | ./ldpc_dec - /dev/null --code HRA_112_112 --sd --testframes
single sided NodB = 1.000000, No = 1.258925
code: HRA_112_112
code: HRA_112_112
Nframes: 100
CodeLength: 224 offset: 0
measured double sided (real) noise power: 0.640595
total iters 3934
Raw Tbits..: 22400 Terr: 2405 BER: 0.107
Coded Tbits: 11200 Terr: 134 BER: 0.012

ldpc_noise is passed a “No” (N-zero) level of 1dB. With Eb = 0dB that gives Eb/No = -1dB, and we get a 10% raw BER, and 1% after LDPC decoding. This is a typical operating point for 700D.
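As a quick cross-check of the numbers in that run, here is a small Python sketch of the dB arithmetic. The linear value of No matches the "No = 1.258925" line printed by ldpc_noise. Treating the "double sided (real)" noise power as No/2 is my reading of the output, not something the tool documents here.

```python
def db_to_linear(db):
    """Convert a power ratio in dB to a linear ratio."""
    return 10.0 ** (db / 10.0)

no_db = 1.0     # "No" argument passed to ldpc_noise
eb_db = 0.0     # energy per bit of the test symbols, in dB

no_linear = db_to_linear(no_db)       # about 1.2589, as printed by ldpc_noise
ebno_db = eb_db - no_db               # Eb/No in dB
expected_real_noise = no_linear / 2   # assumption: noise power per real dimension is No/2

print(f"No = {no_linear:.6f}, Eb/No = {ebno_db:.1f} dB")
print(f"expected real noise power = {expected_real_noise:.4f} (measured 0.640595)")
```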

A shell script (ldpc_check) combines several runs of these utilities, checks the results, and provides a final pass/fail indication.

All changes were made to new copies of the source files (named *_test*) so that current users of codec2-dev were not disrupted, and so that the behaviour could be compared to the “released” version.

Unused Functions

The code contained several functions which are not used anywhere in the FreeDV/Codec2 system. Removing these made it easier to see the code that was used and allowed the removal of some variables and record elements to reduce the memory used.

First Compiles

The first attempt at compiling for the STM32 platform showed that the code required more memory than was available on the processor. The STM32F405 used in the SM1000 system has 128k bytes of main RAM.

The largest single item was the DecodedBits array, which was used to save the results for each iteration, using 32 bit integers, one per decoded bit.

    int *DecodedBits = calloc( max_iter*CodeLength, sizeof( int ) );

This used almost 90k bytes!

The decode function used for FreeDV (SumProducts) used only the last decoded set. So the code was changed to save only one pass of values, using 8 bit integers. This reduced the ~90k bytes to just 224 bytes!
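The saving is easy to verify with a little arithmetic. The sketch below assumes max_iter is 100 (the iteration limit quoted later in the results) and takes CodeLength = 224 from the ldpc_dec output above.

```python
MAX_ITER = 100      # assumption: iteration limit used for 700D
CODE_LENGTH = 224   # CodeLength reported by ldpc_dec for HRA_112_112
SIZEOF_INT = 4      # 32 bit int on the STM32
SIZEOF_INT8 = 1     # 8 bit integer

# Original: one 32 bit int per decoded bit, for every iteration
original_bytes = MAX_ITER * CODE_LENGTH * SIZEOF_INT

# Optimised: only the last pass is kept, one byte per decoded bit
optimised_bytes = CODE_LENGTH * SIZEOF_INT8

print(f"original:  {original_bytes} bytes")
print(f"optimised: {optimised_bytes} bytes")
```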

The FreeDV 700D mode requires one LDPC decode every 160ms. At this point the code compiled and ran but was too slow – using around 25ms per iteration, or 300 – 2500ms per frame!

C/V Nodes

The two main data structures of the LDPC decoder are c_nodes and v_nodes. Each is an array where each node contains additional arrays. In the original code these structures used over 17k bytes for the HRA_112_112 code.

Some of the elements of the c and v nodes (index, socket) are indexes into these arrays. Changing these from 32 bit to 16 bit integers and changing the sign element into an 8 bit char saved about 6k bytes.

The next problem was the run time. Each 700D frame must be fully processed in 160 ms and the decoder was taking several times this long. The CPU load was traced to the phi0() function, which was calling two maths library functions. After optimising the phi0 function (see below) the largest use of time was the index computations of the nested loops which accessed these c and v node structures.

With each node having separate arrays for index, socket, sign, and message these indexes had to be computed separately. By changing the node structures to hold an array of sub-nodes instead this index computation time was significantly reduced. An additional benefit was about a 4x reduction in the number of memory blocks allocated. Each allocation block includes additional memory used by malloc() and free() so reducing the number of blocks reduces memory use and possible heap fragmentation.

Additional time was saved by only calculating the degree elements of the c and v nodes at start-up rather than for every frame. That data is kept in memory that is statically allocated when the decoder is initialized. This costs some memory but saves time.

This still left the code calling malloc several hundred times for each frame and then freeing that memory later. This sort of memory allocation activity has been known to cause troubles in some embedded systems and is usually avoided. However the LDPC decoder needed too much memory to allow it to be statically allocated at startup and not shared with other parts of the code.

Instead of allocating an array of sub-nodes for each c or v node, a single array of bytes is passed in from the parent. The initialization function which calculates the degree elements of the nodes also counts up the memory space needed and reports this to its caller. When the decoder is called for a frame, the node’s pointers are set to use the space of this array.

Other arrays that the decoder needs were added to this to further reduce the number of separate allocation blocks.

This leaves the decisions of how to allocate and share this memory up to a higher level of the code. The plan is to continue to use malloc() and free() at a higher level initially. Further testing can be done to look for memory leakage and optimise overall memory usage on the STM32.


There is a non linear function named “phi0” which is called inside several levels of nested loops within the decoder. The basic operation is:

   phi0(x) = ln( (e^x + 1) / (e^x - 1) )

The original code used double precision exp() and log(), even though the input, output, and intermediate values are all floats. This was probably an oversight. Changing to the single precision versions expf() and logf() provided some improvements, but not enough to meet our CPU load goal.

The original code used piecewise approximation for some input values. This was extended to cover the full range of inputs. The code was also structured differently to make it faster. The original code had a sequence of if () else if () else if () … This can take a long time when there are many steps in the approximation. Instead two ranges of input values are covered with linear steps that are implemented with table lookups.

The third range of inputs is non linear and is handled by a binary tree of comparisons to reduce the number of levels. All of this code is implemented in a separate file to allow the original or optimised version of phi0 to be used.

The ranges of inputs are:

             x >= 10      result always 0
      10   > x >=  5      steps of 1/2
       5   > x >= 1/16    steps of 1/16
    1/16   > x >= 1/4096  use 1/32, 1/64, 1/128, .., 1/4096
    1/4096 > x            result always 10

The range of values that will appear as inputs to phi0() can be represented as a fixed point value stored in a 32 bit integer. By converting to this format at the beginning of the function the code for all of the comparisons and lookups is reduced and uses shifts and integer operations. The step levels use powers of 2, which lets the table index computations use shifts and makes the fraction constants of the comparisons simple ones that the ARM instruction set can create efficiently.
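To make the structure concrete, here is a rough Python model of the range-based lookup. It is a sketch of the idea, not the decoder's C code. It assumes x >= 0, a fixed point format with 12 fractional bits, and tables filled with the exact function at the middle of each step. It also uses bit_length() to pick the power-of-2 step in the third range, where the C code uses a binary tree of comparisons.

```python
import math

FRAC_BITS = 12            # assumed fixed point format: resolution of 1/4096
ONE = 1 << FRAC_BITS

def phi0_exact(x):
    """Reference: phi0(x) = ln((e^x + 1) / (e^x - 1)), for x > 0."""
    ex = math.exp(x)
    return math.log((ex + 1.0) / (ex - 1.0))

# Tables sampled at the middle of each step (an assumption for illustration)
T_HALF = [phi0_exact(5.0 + (i + 0.5) / 2.0) for i in range(10)]    # 5 <= x < 10
T_SIXTEENTH = [phi0_exact((i + 0.5) / 16.0) for i in range(80)]    # 1/16 <= x < 5
T_OCTAVE = [phi0_exact(1.5 * 2.0 ** (k - FRAC_BITS)) for k in range(8)]  # 1/4096 <= x < 1/16

def phi0_approx(x):
    """Piecewise approximation of phi0 using integer compares and shifts."""
    xi = int(x * ONE)                       # convert once to fixed point
    if xi >= 10 * ONE:
        return 0.0                          # x >= 10
    if xi >= 5 * ONE:
        return T_HALF[(xi - 5 * ONE) >> (FRAC_BITS - 1)]   # steps of 1/2
    if xi >= ONE >> 4:
        return T_SIXTEENTH[xi >> (FRAC_BITS - 4)]          # steps of 1/16
    if xi >= 1:
        return T_OCTAVE[xi.bit_length() - 1]               # power-of-2 steps
    return 10.0                             # x < 1/4096

for x in (0.0001, 0.01, 1.0, 7.0, 20.0):
    print(f"x = {x:8.4f}  approx = {phi0_approx(x):8.4f}")
```

The index into each table is a subtract and a shift, which is the property that makes this form cheap on the STM32.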


Two of the configuration values are scale factors that get multiplied inside the nested loops. These values are 1.0 in both of the current configurations so that floating point multiply was removed.


The optimised LDPC decoder produces the same output BER as the original.

The optimised decoder uses 12k of heap at init time and needs another 12k of heap at run time. The original decoder just used heap at run time, which was returned after each call. We have traded off the use of static heap to clean up the many small heap allocations and reduce execution time. It is probably possible to reduce the static space further, perhaps at the cost of longer run times.

The maximum time to decode a frame using 100 iterations is 60.3 ms and the median time is 8.8 ms, far below our budget of 160ms!

Future Possibilities

The remaining floating point computations in the decoder are addition and subtraction, so the values could be represented with fixed point values to eliminate the floating point operations.

Some values which are computed from the configuration (degree, index, socket) are constants and could be generated at compile time using a utility called by cmake. However this might actually slow down the operation as the index computations might become slower.

The index and socket elements of C and V nodes could be pointers instead of indexes into arrays.

Experiments would be required to ensure these changes actually speed up the decoder.


Don got his first amateur license in high school but was soon distracted with getting an engineering degree (BSEE, Univ. of Washington), then family and life. He started his IC design career with the CPU for the HP-41C calculator. Then came ICs for printers and cameras, work on IC design tools, and some firmware for embedded systems. Exposure to ARES public service led to a new amateur license – W7DMR and active involvement with ARES. He recently retired after 42 years and wanted to find an open project that combined radio, embedded systems and DSP.

Don lives in Corvallis, Oregon, USA, a small city with the state technical university and several high tech companies.

Open Source Projects and Volunteers

Hi it’s David back again ….

Open source projects like FreeDV and Codec 2 rely on volunteers to make them happen. The typical pattern is people get excited, start some work, then drift away after a few weeks. Gold is the volunteer that consistently works week in, week out until their particular project is done. The number of hours/week doesn’t matter – it’s the consistency that is really helpful to the projects. I have a few contributors/testers/users of FreeDV in this category and I appreciate you all deeply – thank you.

If you would like to help out, please contact me. You’ll learn a lot and get to work towards an open source future for HF radio.

If you can’t help out technically, but would like to support this work, please consider Patreon or PayPal.

Reading Further

LDPC using Octave and the CML library. Our LDPC decoder comes from Coded Modulation Library (CML), which was originally used to support Matlab/Octave simulations.

Horus 37 – High Speed SSTV Images. The CML LDPC decoder was converted to a regular C library, and used for sending images from High Altitude Balloons.

Steve Ports an OFDM modem from Octave to C. Steve is another volunteer who put in a fine effort on the C coding of the OFDM modem. He recently modified the modem to handle high bit rates for voice and HF data applications.

Rick Barnich KA8BMA did a fantastic job of designing the SM1000 hardware. Leading edge, HF digital voice hardware, designed by volunteers.

Band Pass Filter and Power Amplifier for Simple HF Data

Is it possible to move data over HF radio using very simple, low cost hardware and clever SDR software? In the last few posts (here and here) I’ve been constructing and testing building blocks for a simple HF data terminal. This post describes a few more, a 3-8 MHz Band Pass Filter (BPF) and 1W Power Amplifier (PA).

Band Pass Filter

The RTL-SDR samples at 28.8 MHz, capturing a broad chunk of spectrum. In direct mode we just sample the Q-channel, so any energy above 14.4 MHz will be aliased into our passband; e.g. a 21 MHz signal folds down to 7.8 MHz (28.8 - 21), landing in the same part of the spectrum as a real signal near 7 MHz.
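The folding is easy to check numerically. This little Python helper (my own, not part of any SDR tool) returns where a real input frequency lands after sampling at 28.8 MHz:

```python
FS_HZ = 28.8e6   # RTL-SDR ADC sample rate in direct sampling mode

def alias_frequency(f_hz, fs_hz=FS_HZ):
    """Frequency at which a real input at f_hz appears after sampling at fs_hz."""
    f = f_hz % fs_hz
    return min(f, fs_hz - f)   # fold about fs/2

for f_mhz in (7.0, 21.0, 21.8):
    print(f"{f_mhz:5.1f} MHz -> {alias_frequency(f_mhz * 1e6) / 1e6:4.1f} MHz")
```

So energy around 21 to 22 MHz ends up around 7 to 8 MHz, which is why the filter needs good attenuation up there.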

In the previous post we determined the ADC overloads at -30dBm, so we want to remove any strong signals above or near that level. One source of strong signals is broadcast band AM radio between 500 and 1600 kHz.

The use case is “100 mile” data links so I’d like the receiver to work on the 80M (3.5 MHz) as well as 40M (7.1 MHz) bands, which sets the BPF passband at 3 to 8 MHz. I hooked up my spec-an to a 40M antenna and could see AM broadcast signals peaking at -40dBm, so I set a BPF specification of > 20dB attenuation at 1.5 MHz to keep the sum of all those signals well away from the -30dBm limit. At the high frequency end I specified > 30dB attenuation at 21 MHz, to reduce any energy aliased down to 7 MHz.

I designed a cascaded High Pass/Low Pass Filter using some tables from my ancient (but still excellent) copy of “RF Circuit Design”, by Chris Bowick. The Octave rtl_sdr script does the calculations for me. A spreadsheet would work well too.

I simulated the BPF using LTSpice, fixed a few bugs, and tweaked it for real world component values. Here is the circuit and frequency response on log and linear scales:

I soldered up the BPF Manhattan style using commercial axial 1uH inductors and ceramic capacitors, then tested it using the spec-an and tracking generator (note linear scale):

The table at the bottom shows the measured attenuation at some important frequencies. The attenuation is a bit low at 21 MHz, perhaps due to the finite Q of the real world inductors. Quite a good match to the LTSpice simulation and close enough for my experiments. The little step at around 10 MHz is a tracking generator artefact.

The next plot shows the effect of the BPF when my spec-an is connected to my 40M dipole (0 to 10MHz span). Yellow is the received signal without the filter, purple with the filter.

The big spike around 0 Hz is an artefact on the spec-an. The filter is doing a good job of nailing the AM broadcast band energy. You can see a peak around 7.4 MHz where the dipole is resonant. Actually this is a bit of a surprise to me, as I want it resonant around 7.2MHz, I’d better check that out! At 7.2-ish the insertion loss (difference between the purple and yellow) is a few dB as per the tracking generator plot above. It’s more like 6dB at 7.4 MHz (the dipole peak), not quite sure why. An insertion loss of 3dB at 7.2 MHz is OK for my application.

Power Amplifier

A few weeks ago I hooked the rpitx to my 40M dipole and managed to demodulate the 11mW signal a few km away (over an urban channel) using a mag loop and my FT-817. I decided to build a small 1W PA to make the system usable over “100 mile” HF channels. The actual power is not that critical, as we can trade power off against bit rate. For example if a given HF channel supports 100 bit/s at 1W, we then know we can get 1000 bit/s at 10W.

Even low bit rates can be very useful if you have no other communication. A text message or Tweet, allowing for some overhead, averages about 1000 bits. So at 1000 bit/s you can send 1 txt per second, 3600 an hour, or 86,000/day. That’s very useful communication if you are in a disaster situation and want to tell family you are alive. Or perhaps live in a remote area with no other communication. Of course HF channels come and go, so the actual throughput will be less than that.
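Here is the power/bit rate trade-off and the message throughput as a small Python sketch. It assumes the energy per bit needed at the receiver stays constant, so the achievable bit rate scales linearly with transmit power, and it uses the 1000 bits per message figure from above. Function names are just for illustration.

```python
def scaled_bit_rate(ref_rate_bps, ref_power_w, new_power_w):
    """Bit rate at new_power_w, keeping the same energy per bit as the reference."""
    return ref_rate_bps * new_power_w / ref_power_w

def messages_per_day(bit_rate_bps, bits_per_message=1000):
    """Upper bound on messages/day, ignoring fading and channel outages."""
    return bit_rate_bps * 86400 / bits_per_message

rate_10w = scaled_bit_rate(100, 1.0, 10.0)
print(f"10W bit rate: {rate_10w:.0f} bit/s")
print(f"messages/day at {rate_10w:.0f} bit/s: {messages_per_day(rate_10w):.0f}")
print(f"messages/day at 100 bit/s: {messages_per_day(100):.0f}")
```

Even the 1W, 100 bit/s case works out to several thousand short messages a day on a channel that stays open.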

I explored the junk box and found a partially constructed Beach 40. I isolated the driver and PA stage and poked it with my signal generator. Turns out it had a bit too much gain (the rpitx has plenty of drive) so I ended up with this simple PA circuit:

The only spurious output I can see is the 2nd harmonic at -44 dBc, meeting ACMA specs:

The low pass filter at the output has a 3dB point at about 10 MHz which is a little high. It could be brought down a little to increase stop-band attenuation and reduce the 2nd harmonic further. I haven’t done anything about impedance matching the input, as it hits 1W (30dBm) output with 14dBm drive from the rpitx. The 1 inch square heatsink is quite warm after 10 minutes but I can still hold it. It’s not very efficient, 2.9W DC input power for 1W out, however 16dB power gain is quite good for a PA. Anyhoo, it’s a fine starting point for my experiments, we can optimise the PA later if necessary.
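For reference, the PA numbers above work out as follows. This is just the arithmetic on the measured figures quoted in this post, written as Python.

```python
def dbm_to_watts(dbm):
    """Convert power in dBm to Watts."""
    return 10.0 ** ((dbm - 30.0) / 10.0)

drive_dbm = 14.0        # rpitx drive level
output_dbm = 30.0       # PA output (1W)
dc_input_w = 2.9        # DC input power
harmonic_dbc = -44.0    # 2nd harmonic relative to the carrier

gain_db = output_dbm - drive_dbm
efficiency = dbm_to_watts(output_dbm) / dc_input_w
harmonic_dbm = output_dbm + harmonic_dbc

print(f"power gain:   {gain_db:.0f} dB")
print(f"efficiency:   {100 * efficiency:.0f} %")
print(f"2nd harmonic: {harmonic_dbm:.0f} dBm")
```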

Next Steps

OK, so I have most of the building blocks I need for some over the air HF data experiments. There was a bit of engineering involved in building the BPF and PA, but the designs are very simple and can be constructed for a few $ or even from road kill (recycled) components. We now have a very low cost HF data radio, running high performance modems, connected to a Linux computer and Wifi.

Next I will put some software together to estimate data throughput, set the system up with real antennas, and gather some experimental results over real world HF channels.

Reading Further

Rpitx and 2FSK, first part in this series.
Testing a RTL-SDR with FSK on HF, second part in this series.
rtl_sdr.m script that calculates component values for the BPF.

Testing a RTL-SDR with FSK on HF

There’s a lot of discussion about ADC resolution and SDRs. I’m trying to develop a simple HF data system that uses RTL-SDRs in “Direct Sample” mode. This blog post describes how I measured the Minimum Detectable Signal (MDS) of my 100 bit/s 2FSK receiver, and a spreadsheet model of the receiver that explains my results.

Noise in a receiver comes from all sorts of places. There are two sources of concern for this project – HF band noise and ADC quantisation noise. On lower HF frequencies (7MHz and below) I’m guess-timating signals weaker than -100dBm will be swamped by HF band noise. So I’d like a receiver that has a MDS anywhere under that. The big question is, can we build such a receiver using a low cost SDR?

Experimental Set Up

So I hooked up the experimental setup in the figure below:

The photo shows the actual hardware. It was spaced apart a bit further for the actual test:

Rpitx is sending 2FSK at 100 bit/s and about 14dBm Tx power. It then gets attenuated by some fixed and variable attenuators to beneath -100dBm. I fed the signal into a RTL-SDR plugged into my laptop, demodulated the 2FSK signal, and measured the Bit Error Rate (BER).

I tried a command line receiver:

rtl_sdr -s 1200000 -f 7000000 -D 2 - | csdr convert_u8_f | csdr shift_addition_cc `python -c "print float(7000000-7177000)/1200000"` | csdr fir_decimate_cc 25 0.05 HAMMING | csdr bandpass_fir_fft_cc 0 0.1 0.05 | csdr realpart_cf | csdr convert_f_s16 | ~/codec2-dev/build_linux/src/fsk_demod 2 48000 100 - - | ~/codec2-dev/build_linux/src/fsk_put_test_bits -

and also gqrx, using this configuration:

with the very handy UDP output option sending samples to the FSK demodulator:

$ nc -ul 7355 | ./fsk_demod 2 48000 100 - - | ./fsk_put_test_bits -

Both versions demodulate the FSK signal and print the bit error rate in real time. I really love the csdr tools, and gqrx is also great for a more visual look at the signal and the ability to monitor the audio.

For these tests the gqrx receiver worked best. It attenuated nearby interferers better (i.e. better sideband rejection) and worked at lower Rx signal levels. It also has a “hardware AGC” option that I haven’t worked out how to enable in the command line tools. However for my target use case I’ll eventually need a command line version, so I’ll have to improve the command line version some time.

The RF Gods are smiling on me today. This experimental set up actually works better than previous bench tests where we needed to put the Tx in another room to get enough isolation. I can still get 10dB steps from the attenuator at -120dBm (ish) with the Tx a few feet from the Rx. It might be the ferrites on the cable to the switched attenuator.

I tested the ability to get solid 10dB steps using a CW (continuous sine wave) signal using the “test” utility in rpitx. FSK bounces about too much, especially with the narrow spec an settings I need to measure weak signals. The configuration of the Rigol DSA815 I used to see the weak signals is described at the end of this post on the SM2000.

The switched attenuator just has 10dB steps. I am getting zero bit errors at -115dBm, and the modem fell over on the next step (-125dBm). So the MDS is somewhere in between.


This spreadsheet (click for the file) models the receiver:

By poking the RTL-SDR with my signal generator, and plotting the output waveforms, I worked out that it clips at around -30dBm (a respectable S9+40dB). So that’s the strongest signal it can handle, at least using the rtl_sdr command line options I can find. Even though it’s an 8 bit ADC I figure there are 7 magnitude bits (the samples are unsigned chars). So we get 6dB per bit or 42dB dynamic range.

This lets us work out the power of the quantisation noise (42dB beneath -30dBm). This noise power is effectively spread across the entire bandwidth of the ADC, a little bit of noise power for each Hz of bandwidth. The bandwidth is set by the sample rate of the RTL-SDR’s internal ADC (28.8 MHz). So now we can work out No (N-nought), the power/unit Hz of bandwidth. It’s like a normalised version of the receiver “noise floor”. An ADC with more bits would have less quantisation noise.

There follows some modem working which gives us an estimate of the MDS for the modem. The MDS of -117.6dBm is between my two measurements above, so we have a good agreement between this model and the experimental results. Cool!
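For those who prefer code to spreadsheets, the same working can be sketched in Python. The clip level, bit count, and sample rate are from the text above. The "modem working" step is my assumption: I take the MDS as the signal power that gives Eb/No = 9dB (roughly 1% BER for 2FSK) at 100 bit/s, which reproduces the -117.6dBm figure.

```python
import math

CLIP_DBM = -30.0        # measured ADC clipping level
MAGNITUDE_BITS = 7      # 8 bit ADC, one bit for sign
DB_PER_BIT = 6.0
FS_HZ = 28.8e6          # ADC sample rate, used as the noise bandwidth
BIT_RATE = 100.0        # 2FSK bit rate
EBNO_DB = 9.0           # assumption: Eb/No needed for about 1% BER

# Quantisation noise power: dynamic range beneath the clipping level
dynamic_range_db = MAGNITUDE_BITS * DB_PER_BIT          # 42 dB
quant_noise_dbm = CLIP_DBM - dynamic_range_db           # -72 dBm

# Spread that power over the ADC bandwidth to get No in dBm/Hz
no_dbm_hz = quant_noise_dbm - 10 * math.log10(FS_HZ)

# Signal power needed: Eb = S/Rb, so S = No + Eb/No + 10log10(Rb)
mds_dbm = no_dbm_hz + EBNO_DB + 10 * math.log10(BIT_RATE)

print(f"No  = {no_dbm_hz:.1f} dBm/Hz")
print(f"MDS = {mds_dbm:.1f} dBm")
```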

Falling through the Noise Floor

The “noise floor” depends on what you are trying to receive. If you are listening to a 10kHz wide AM signal, you will be slurping up 10kHz of noise, and get a noise power of:

-146.6+10*log10(10000) = -106.6 dBm

So if you want that AM signal to have a SNR of 20dB, you need a received signal level of -86.6dBm to get over the quantisation noise of the receiver.
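The same calculation works for any bandwidth and target SNR. A minimal Python sketch, using the No of -146.6 dBm/Hz from the model above (the function names are mine):

```python
import math

NO_DBM_HZ = -146.6   # quantisation noise density from the receiver model

def noise_power_dbm(bandwidth_hz):
    """Noise power collected in the given bandwidth."""
    return NO_DBM_HZ + 10 * math.log10(bandwidth_hz)

def required_signal_dbm(bandwidth_hz, snr_db):
    """Received signal level needed for a target SNR in that bandwidth."""
    return noise_power_dbm(bandwidth_hz) + snr_db

print(f"10 kHz AM noise power: {noise_power_dbm(10e3):.1f} dBm")
print(f"signal for 20dB SNR:   {required_signal_dbm(10e3, 20.0):.1f} dBm")
print(f"3 kHz SSB noise power: {noise_power_dbm(3e3):.1f} dBm")
```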

I’m trying to receive low bit rate FSK which can handle a lot more noise before it falls over, as indicated in the spreadsheet above. So it’s more robust to the quantisation noise and we can have a much lower MDS.

The “noise floor” is not some impenetrable barrier. It’s just a convention, and needs to be considered relative to the bandwidth and type of the signal you want to receive.

One area I get confused about is noise bandwidth. In the model above I assume the noise bandwidth is the same as the ADC sample rate. Please feel free to correct me if that assumption is wrong! With IQ signals we have to consider negative frequencies and complex to real conversions, which all affect noise power. I muddle through this when I play with modems but if anyone has a straightforward explanation of the noise bandwidth I’d love to hear it!

Blocking Tests

At the suggestion of Mark, I repeated the MDS tests with a strong CW interferer from my signal generator. I adjusted the Sig Gen and Rx levels until I could just detect the FSK signal. Here are the results, all in dBm:

Sig Gen    2FSK Rx MDS    Difference
-51        -116           65
-30        -96            66

The FSK signal was at 7.177MHz. I tried the interferer at 7MHz (177 kHz away) and 7.170MHz (just 7 kHz away) with the same blocking results. I’m pretty impressed that the system can continue to work with a 65dB stronger signal just 7kHz away.

So the interferer desensitises the receiver somewhat. When listening to the signal on gqrx, I can hear the FSK signal get much weaker when I switch the Sig Gen on. However it often keeps demodulating just fine – FSK is not sensitive to amplitude.

I can also hear spurious tones appearing; the quantisation noise isn’t really white noise any more when a strong signal is present. Makes me feel like I still have a lot to learn about this SDR caper, especially direct sampling receivers!

As with the MDS results – my blocking results are likely to depend on the nature of the signal I am trying to receive. For example a SSB signal or higher data rate might have different blocking results.

Still, 65dB rejection on a $27 radio (at least for my test modem signal) is not too shabby. I can get away with a S9+40dB (-30dBm) interferer just 7kHz away with my rx signal near the limits of where I want to detect (-96dBm).


So I figure for the lower HF bands this receiver’s performance is OK – the ADC quantisation noise isn’t likely to impact performance and the strong signal performance is good enough. An overload of -30dBm (S9+40dB) is also acceptable given the use case is remote communications where there are unlikely to be any nearby transmitters in the input filter passband.

The 100 bit/s signal is just a starting point. I can use that as a reference to help me understand how different modems and bit rates will perform. For example I can increase the bit rate to say 1000 bit/s 2FSK, increasing the MDS by 10dB, and still be well beneath my -100dBm MDS target. Good.

If it does fall over in the real world due to MDS performance, overload or blocking, I now have a good understanding of how it works so it will be possible to engineer a solution.

For example a pre-amp with X dB gain would lower the quantisation noise power by X dB and allow us to detect weaker signals, but then the Rx would overload at -30-X dBm. If we have strong signal problems but our target signal is also pretty strong we can insert an attenuator. If we drop in another SDR I can recompute the quantisation noise from its specs, and estimate how well it will perform.
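That trade-off can be written down in a couple of lines of Python. This is a simplified sketch: it refers both limits back to the antenna input, and ignores the pre-amp’s own noise figure and HF band noise. A negative gain models an attenuator.

```python
def with_front_end_gain(mds_dbm, overload_dbm, gain_db):
    """MDS and overload level at the antenna, with gain_db placed ahead of the ADC."""
    return mds_dbm - gain_db, overload_dbm - gain_db

BASE_MDS_DBM = -117.6     # modelled MDS for 100 bit/s 2FSK
BASE_OVERLOAD_DBM = -30.0

for gain in (0.0, 10.0, -10.0):
    mds, overload = with_front_end_gain(BASE_MDS_DBM, BASE_OVERLOAD_DBM, gain)
    print(f"gain {gain:+5.1f} dB: MDS {mds:7.1f} dBm, overload {overload:6.1f} dBm")
```

The gap between the two limits stays the same; the gain just slides the window up or down.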

Reading Further

Rpitx and 2FSK, first part in this series.
Spreadsheet used to do the working for the quantisation noise.

Rpitx and 2FSK

This post describes tests to evaluate the use of rpitx as a 2FSK transmitter.

About 10 years ago I worked on the Village Telco – a system for community telephone networks based on WiFi. At the time we used WiFi SoCs which were open source at the OS level, but the deeper layers were completely opaque which led (at least for me) to significant frustration.

Since then I’ve done a lot of work on the physical layer, in particular building my skills in modem implementation. Low cost SDR has also become a thing, and CPU power has dropped in price. The physical layer is busy migrating from hardware to software. Software can, and should, be free.

So now we can build open source radios. No more chip sets and closed source.

Sadly, some other aspects haven’t changed. In many parts of the world it’s still very difficult (and expensive) to move an IP packet over the “last 100 miles”. So, armed with some new skills and technology, I feel it’s time for another look at developing world and humanitarian communications.

I’m exploring the use of rpitx as the heart of HF and UHF data terminals. This clever piece of software turns a Raspberry Pi into a radio transmitter. Evariste (F5OEO), the author of rpitx, has recently developed the v2beta branch that has improved performance, and includes some support for FreeDV waveforms.

Running Tests

I have some utilities for the Codec 2 FSK modem that generate frames of test bits. I modified the fsk_mod_ext_vco utility to drive a utility Evariste kindly wrote for FreeDV experiments with rpitx. So here are the command lines that generate 600 seconds (10 minutes) of 100 bit/s 2FSK test frames, and transmit them out of rpitx, using a 7.177 MHz carrier frequency:

$ ./fsk_get_test_bits - 60000 | ./fsk_mod_ext_vco - ~/rpitx/2fsk_100.f 2 --rpitx 800 100
~/rpitx $ sudo ./freedv 2fsk_100.f 7177000

On the receive side I used my FT-817 connected to FreeDV to sample the signal as a wave file, then fed the signal into C and Octave versions of the demodulator. The RPi is top left at rear, the HackRF in the foreground was used initially as a reference transmitter:


It works really well! One of the most important tests for any modem is adding calibrated noise and measuring the Bit Error Rate (BER). I tried Eb/No = 9dB (-5.7dB SNR), and obtained 1% BER, right on theory for a 2FSK modem:

 $ ./cohpsk_ch ~/Desktop/2fsk_100_rx_rpi.wav - -26 | ./fsk_demod 2 8000 100 - - | ./fsk_put_test_bits -
FSK BER 0.009076, bits tested 11900, bit errors 108
SNR3k(dB): -5.62

This line takes the sample wave file from the FT-817, adds some noise using the cohpsk_ch utility, then pipes the signal to the FSK demodulator, before counting the bit errors in the test frames. I adjusted the 3rd “No” parameter in cohpsk_ch by hand until I obtained about 1% BER, then checked the SNR against the theoretical SNR for an Eb/No of 9dB.
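The conversion from Eb/No to SNR, and the theoretical BER, can be checked with a few lines of Python. This is a minimal sketch that assumes a 3 kHz noise bandwidth (suggested by the "SNR3k" label above) and the textbook formula for non-coherent 2FSK; the function names are my own and are not part of the Codec 2 tools.

```python
import math

def ebno_to_snr_db(ebno_db, bit_rate, noise_bw_hz=3000.0):
    """Convert Eb/No (dB) to SNR (dB) measured in noise_bw_hz."""
    # SNR = Eb/No * Rb / B, expressed in dB
    return ebno_db + 10.0 * math.log10(bit_rate / noise_bw_hz)

def fsk2_ber(ebno_db):
    """Textbook BER for non-coherent 2FSK: 0.5 * exp(-Eb/No / 2)."""
    ebno = 10.0 ** (ebno_db / 10.0)  # dB to linear
    return 0.5 * math.exp(-ebno / 2.0)

snr = ebno_to_snr_db(9.0, 100)   # 100 bit/s modem
ber = fsk2_ber(9.0)
print(f"SNR in 3 kHz: {snr:.2f} dB, theoretical BER: {ber:.4f}")
```

For Eb/No = 9dB and 100 bit/s this lands close to the -5.7dB SNR and 1% BER quoted above.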

Here are some plots from the Octave version of the demodulator, with no channel noise applied. The first plot shows the time and frequency domain signal at the input of the demodulator. I set the shift at 800 Hz, so you can see one tone at 800 Hz, the other at 1600 Hz:

Here is the output of the FSK demodulator filters (red and blue for the two filter outputs). We can see a nice separation, but the red “high” level is a bit lower than blue. Red is probably the 1600 Hz tone; the FT-817 has a gentle low pass filter in its output, reducing higher frequency tones by a few dB.

There is some modulation on the filter outputs, which I think is explained by the timing offset below:

The sharp jump at 160 samples is expected, that’s normal behaviour for modem timing estimators, where a sawtooth shape is expected. However note the undulation of the timing estimate as it ramps up, indicating the modem sample clock has a little jitter. I guess this is an artefact of rpitx. However the BER results are fine and the average sample clock offset (not shown) is about 50ppm which is better than many sound cards I have seen in use with FreeDV. Many of our previous modem transmitters (e.g. the first pass at Wenet) started with much larger sample clock offsets.

A common question about rpitx is “how clean is the spectrum”. Here is a sample from my Rigol DSA815, with a span of 1MHz around the 7.177 MHz tx frequency. The Tx power is actually 11dBm, but the marker was bouncing around due to FSK modulation. On a wider span all I can see are the harmonics one would expect of a square wave signal. Just like any other transmitter, these would be removed by a simple low pass filter.

So at 7.177 MHz it’s clean to the limits of my spec analyser, and exceeds spectral purity requirements (-43dBc + 10log(Pwatts)) for Amateur (and I presume other service) communications.

Prototyping FreeDV 2200

In the previous post on Codec 2 2200 I described a prototype speech codec for a new, higher quality FreeDV mode. In this post I describe the modem development and initial over the air tests.

Bandwidth, not SNR limited

Turns out I am RF bandwidth limited rather than SNR limited. My target bandwidth over the channel is 2000 Hz, as FreeDV needs to pass through the various crystal filters in Commercial Off the Shelf (COTS) SSB radios. Things get messy near the edge of those filters. I want to stick with QPSK for now as QAM means a 4dB hit in SNR at the same bit rate. By messing with my waveform design spreadsheet it turns out I can get between 2000 and 2400 bit/s of coded speech through the channel by reducing the FEC code rate. This is acceptable for high-ish SNR channels with light fading.

This gives us an AWGN channel operating point of 4dB SNR, so we have 6dB margin for fading if we design for an average channel SNR of 10dB. That feels about right.

For the speech quality target I wanted to use 20ms frame updates (the lower rate Codec 2 1300 and 700C modes use 40ms updates). However that was a bit of a squeeze. I messed about with Vector Quantisation for a while but the quality was slipping beneath the Codec 2 2400 bit/s target I had set for this project. Then I hit on the idea of 25ms updates, and that seems to work well. I can’t hear any difference between 20 and 25ms updates.

Note the interaction between the speech codec frame rate and the modem bandwidth. As the entire system is open source, I can adapt the codec to help the modem. This is a very powerful engineering technique that is not available to teams using closed source codec or modem software.

FreeDV 2200, like FreeDV 1600, has some “unprotected bits”. The LDPC forward error correction only covers part of the frame. I’m not sure about this approach, and will talk to Bill, VK5DSP sometime about a lower rate code that protects the entire frame.

Initial tests

A digital voice system is complex. Building the real time C code for a new codec, FEC, modem stack can take a lot of time and effort. FreeDV 700D took around 1000 man-hours. So I tend to simulate the whole thing in GNU Octave first. This allows me to prototype the entire system with the least possible effort. If it works, I then port to C, and I have a working reference to test the C port against. The basic idea is to find any bad news early.

I modified the Octave version of the modem to handle the different bit rate through changing the number of carriers, symbol rate, and FEC. I started by modifying the raw, uncoded OFDM modem simulations (ofdm_tx.m, ofdm_rx.m), then moved to the versions that include the LDPC FEC.

After a few days of careful programming, testing, and refactoring I was able to transmit and receive test frames using the prototype “FreeDV 2200” waveform. Such a surprise when something complex starts working! Engineers are not used to success, we spend most of our time chasing bugs.

Using the new waveform I ran some coded speech through simulated AWGN and HF channels at the calculated operating points. Ouch! That hurts. Some really high amplitude excursions in the decoded speech, especially when the frames are messed up by a fade or during initial synchronisation.

These sorts of noises used to be common in the early days of speech coding. In contrast, my recent codecs have a “softer” response to bit errors – the speech gets messed up, but not in a way that causes pain.

Error Masking

OK so my initial tests showed that even with 10dB average SNR, sometimes that nasty old HF channel wipes out the codec and it makes loud cracks. What to do? I don’t have any bandwidth left over for more FEC and don’t want to dramatically change the Codec design.

The first thing I did was change the variable bit rate allocation Huffman code to a fixed bit allocation, as explained in the Codec 2 2200 blog post. That helped. The next step was to spend a few days developing some “error masking” ideas. These don’t actually fix the problem, but make it sound more acceptable.

The LDPC decoder kindly tells us when it can’t successfully decode a frame. When that happens, the masking algorithm swings into action. We know some of the bits are rubbish, but not exactly which ones. An AGC algorithm adjusts the energy of the current frame to be similar to the last good one. This helps with the big spikes. If we get a burst of errors the level is smoothly reduced, effectively squelching the signal.

After a few days of tweaking I had this working OK. Here are some plots of the speech signal before and after masking:

Here are some plots that illustrate the masking algorithm. The top plot is the number of errors/frame, they appear in bursts with the channel fading. The lower plot is the energy of the decoded speech before and after masking. When there are no errors, the energy is the same. During an error burst we reduce the gain, exponentially dropping the level if an error burst continues.
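The masking idea can be sketched in a few lines. This is an illustration only, not the Octave code used for the plots: the function name, the per-frame energies in dB, and the 3dB per frame decay are all assumptions of mine. A fixed step in dB per bad frame gives the exponential drop in level described above.

```python
def mask_energy(energies_db, decode_ok, decay_db=3.0):
    """Return per-frame energies (dB) after a simple error masking AGC.

    energies_db : decoded speech energy for each frame, in dB
    decode_ok   : True where the FEC decoder reported a good frame
    decay_db    : extra attenuation per consecutive bad frame (assumed value)
    """
    masked = []
    last_good_db = None  # energy of the most recent good frame
    bad_run = 0          # consecutive bad frames seen so far
    for energy_db, ok in zip(energies_db, decode_ok):
        if ok:
            last_good_db = energy_db
            bad_run = 0
            masked.append(energy_db)      # good frames pass through untouched
        elif last_good_db is None:
            masked.append(energy_db)      # no good frame seen yet, nothing to compare to
        else:
            bad_run += 1
            # Limit to the last good level, then step down as the burst continues
            target_db = last_good_db - decay_db * (bad_run - 1)
            masked.append(min(energy_db, target_db))
    return masked

energies = [60, 61, 90, 95, 88, 60]
ok = [True, True, False, False, False, True]
print(mask_energy(energies, ok))
```

In this made-up example the three bad frames are limited to 61, 58 and 55 dB rather than spiking to around 90 dB, and the level recovers as soon as a good frame arrives.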

Here are some speech samples so you can hear the effect of bit errors:

No errors Listen
HF Channel Listen
HF Channel Masking Listen
HF Channel SSB Listen
HF Channel FreeDV 1600 Listen

The FreeDV 1600 sample takes a while to sync, perhaps due to that deep fade at the start of the sample. The 2200 modem was derived from 700D, which can sync up at very low SNRs. Much to my annoyance the SSB sample doesn’t seem very bothered by the fading – a testament to the robustness of SSB and why it’s so hard to beat. The first few seconds of the SSB sample are “down in the mud” and would probably be lost to untrained listeners.

Over the Air Testing with the Peters

So now I had simulations of every part of FreeDV 2200 ready to roll. The modem simulations work on wave files, which can be played over the air through real HF radios. This is a very important test, as real radios have crystal filters, power amplifiers, and audio filtering that can affect the passage of bits through them.

My first test was across the bench. I played the OFDM signal at low power through my IC-7200 tuned to 7.177MHz, and connected to an external dipole. I received the signal on my FT-817 without an antenna, while recording the signal using FreeDV. I took special care to make sure the IC-7200 wasn’t being over driven, adjusting the transmit drive so the ALC wasn’t moving at all.

The scatter diagram was a bit messed up, and SNR about 13dB, which is pretty typical of tests through real radios, even with no noise. I measure SNR from the scatter diagram, so any distortion in the signal will lower the SNR reported by FreeDV. An important figure of merit is the effect on Bit Error Rate (BER), which can be expressed as an equivalent loss in dB. So I generated a transmit signal with additive noise that had an Eb/No of 3dB. When passed directly to the Rx script I measured a BER of 3%. When the same signal was passed through the radios, then to the Rx script, I measured a BER of 5%.

Looking at the PSK BER versus Eb/No curves, this is a difference of 1.2dB. The distortion of the path through the radios is equivalent to 1.2dB more noise in the system. That’s quite acceptable to me, given the wide bandwidth of the signal compared to the crystal filter skirts, PA distortion, likely rx overload as the radios are so close, and other real world factors.
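Rather than reading the curves by eye, the loss can be estimated numerically. The sketch below assumes the standard uncoded BPSK/QPSK curve, BER = 0.5*erfc(sqrt(Eb/No)), and inverts it by bisection; the function names are my own.

```python
import math

def psk_ber(ebno_db):
    """Theoretical uncoded BPSK/QPSK bit error rate on an AWGN channel."""
    ebno = 10.0 ** (ebno_db / 10.0)  # dB to linear
    return 0.5 * math.erfc(math.sqrt(ebno))

def ebno_for_ber(target_ber, lo_db=-10.0, hi_db=20.0):
    """Find the Eb/No (dB) that gives target_ber, by bisection.

    BER falls monotonically as Eb/No rises, so bisection converges.
    """
    for _ in range(60):
        mid_db = 0.5 * (lo_db + hi_db)
        if psk_ber(mid_db) > target_ber:
            lo_db = mid_db   # too many errors, need more Eb/No
        else:
            hi_db = mid_db
    return 0.5 * (lo_db + hi_db)

# 3% BER direct to the Rx script, 5% BER through the radios
loss_db = ebno_for_ber(0.03) - ebno_for_ber(0.05)
print(f"Loss through the radios: {loss_db:.1f} dB")
```

With the 3% and 5% figures measured above this comes out at about 1.2dB.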

The next step was some ground wave and DX tests. Peter, VK2PTM, happened to be staying with me. So we hopped in his van and set up a receive station about 4km away, where Peter, VK5APR joined us for the tests. We received a very nice signal on Peter’s (VK2PTM Peter that is) KX3, in fact a nicer scatter diagram than with my FT-817. Zero BER, despite just a few watts Tx power.

Peter has documented our FreeDV experiments on his blog.

For the final test, we radiated the same test signal from Adelaide to Peter, VK3RV, about 700km away in Sunbury, near Melbourne. Once again we were on 40M, and this time I was using a few tens of watts. Hard to tell exactly with these waveforms. Peter kindly recorded the FreeDV 2200 signal, and emailed it back to me.

You don’t have to be called Peter to be a FreeDV early adopter. But it helps.

This time there was some significant fading, and bursts of bit errors. The SNR averaged about 8dB over our 60 second sample but being a real HF channel the SNR wandered all over the place:

Note the deep fade at the start of the sample, which is also evident in the second plot of uncoded and coded errors above. The Scatter diagram is the classic cross shape, as the amplitudes of the QPSK symbols fade in and out. The channel noise is evident in the width of the cross.

When played back through the speech codec it worked OK (Listen), and sounds similar to the simulated HF channel samples above. The deep fades are gently muted and the overall effect was not unpleasant. Long distance HF conditions are poor here at the moment, so I didn’t get a chance to test the system on more benign HF channels.

I’m very pleased with these tests. It’s a very complicated system and so many things can go wrong when it hits the real world. The careful work I put into test and simulation has paid off. I am particularly happy with the 2000 Hz wide modem waveform making it through real radios and real channels as I felt that was a major risk.


These two posts have described the development to date of a new FreeDV mode. It’s a work in progress, and I feel another iteration on the speech codec quality and FEC would be useful. I’m quite pleased with the higher bit rate version of the OFDM modem. It would also be nice to get the current work into real time, so we can test it on air – using the open source principle of “release early and often”.

The development process for a new FreeDV waveform is not linear. I started with the codec, got a feel for the bit rate, played with the modem waveform design spreadsheet, then headed back to the codec design. When I put the waveform through simulated channels I found bit errors made the speech codec fall over so it was back to the codec for some rework.

However the cool thing about open source is that we can consider all these factors at the same time. If you are stuck with a closed source codec or modem you have far more limited options and will end up with poorer performance. Ok, I also have the skills to work on both, but that’s why I’m publishing these posts and the source code! It’s a great starting point for anyone else who wants to learn, or build on this work. Yayyy open source!

Help Wanted

An important next step is to put FreeDV 2200 into real time, so we can test with real over the air conversations. This means porting some code from Octave to C, unit tests, and a bunch of integration with the FreeDV API and FreeDV GUI program. Please contact me if you have C programming skills and would like to help. I’ll help you with the Octave side, and you’ll learn a lot!

If you can’t code, but would like to see a high quality FreeDV mode out there, please consider supporting the work using Patreon or PayPal.

Appendix – Command Lines for testing the new Waveform

Here are the command lines I used to test the new FreeDV waveform over the air. I tend to prototype in Octave, to minimise the amount of time it takes to get something over the air that I can listen to. A wrong turn can mean months of work, so an efficient process matters.

Use c2sim to generate files of Codec 2 model parameters from an input speech file:

$ ./c2sim ../../raw/vk5qi.raw --framelength_s 0.0125 --dump vk5qi_l --phase0 --lpc 10 --dump_pitch_e vk5qi_l_pitche.txt

Run an Octave script to encode model parameters to a 2200 (ish) bit/s bit stream:

octave:40> newamp2_batch("../build_linux/src/vk5qi_l","no_output","mode", "enc", "bitstream", "vk5qi_2200_enc.c2");

Run an Octave script to generate OFDM modem transmitter audio. The payload data is known test frames at 2200 bit/s:

octave:41> ofdm_ldpc_tx("ofdm_ldpc_test.raw","2200",1,300);

This example generates 300 seconds (5 minutes) of modem signal.

This raw file can be “played” over the air through a HF SSB radio, then recorded as a wave file by a HF radio receiver. The received wave (or raw) file, is run through the OFDM modem receiver. Using the known test frame data, it measures the Bit Error rate (BER) and plots various statistics. It also generates a file of errors, that shows each bit that was received in error:

octave:42> ofdm_ldpc_rx('~/Desktop/ofdm_ldpc_test_awgn_sunbury001_rx_60.wav',"2200",1,"subury001_error.bin")

This C utility “corrupts” the Codec 2 bit stream using the file of errors, simulating the path of real Codec 2 bits through the channel:

$ ../build_linux/src/insert_errors vk5qi_2200_enc.c2 vk5qi_2200_sunbury001_dec.c2 subury001_error.bin
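Conceptually, inserting errors is just an XOR of the transmitted bits with the error pattern. The sketch below works on plain Python lists of 0/1 values; the file formats used by the real insert_errors utility are not described in this post, so this is an illustration of the idea rather than a replacement for it.

```python
def insert_errors(tx_bits, error_bits):
    """Flip each transmitted bit wherever the error pattern contains a 1."""
    if len(tx_bits) != len(error_bits):
        raise ValueError("bit stream and error pattern must be the same length")
    return [b ^ e for b, e in zip(tx_bits, error_bits)]

tx = [1, 0, 1, 1, 0]
errors = [0, 0, 1, 0, 1]       # 1 marks a bit received in error
rx = insert_errors(tx, errors)
print(rx, "BER:", sum(errors) / len(errors))
```

Because XOR is its own inverse, applying the same error pattern a second time restores the original bits, which makes a handy sanity check.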

Now decode the bit stream to a set of Codec 2 model parameters:

octave:43> newamp2_batch("../build_linux/src/vk5qi_l","output_prefix","../build_linux/src/vk5qi_l_dec_sunbury001", "mode", "dec", "bitstream", "vk5qi_2200_sunbury001_dec.c2", "mask", "subury001_error.bin");

The “mask” parameter uses the error file to implement an error masking algorithm. In a real world system, the LDPC decoder would tell us which frames were bad, so this is a mild cheat to expedite development.

Finally, run the C part of the Codec 2 decoder to listen to the results:

$ ./c2sim ../../raw/vk5qi.raw --framelength_s 0.0125 --pahw vk5qi_l_dec_sunbury001 --hand_voicing vk5qi_l_dec_sunbury001_v.txt -o - | aplay -f S16

Reading Further

Codec 2 2200
modem waveform design spreadsheet
Steve Ports an OFDM modem from Octave to C

Bench Testing HF Radios with a HackRF

This post describes how we implemented a HF channel simulator to bench test a digital HF radio using modern SDRs.

Yesterday Mark and I bench tested a HF radio with calibrated SNR over simulated AWGN and HF channels. We recorded the radio’s transmit signal with an AirSpy HF and GQRX, added calibrated noise and “CCIR Poor” fading, and replayed the signal using a HackRF.

For the FreeDV 700C and 700D work I have developed a utility called cohpsk_ch, that takes a real modem signal, adds channel impairments like noise and fading, and outputs another real signal. It has a built in Hilbert Transformer so it can do complex math cleverness like small frequency shifts and ITU-T/CCIR HF fading channel models.

Set Up

The basic idea is to upconvert an 8 kHz real sample file to HF in real time. I have some utilities to help with this in codec2-dev:

$ svn co codec2-dev
$ cd codec2-dev/octave
$ octave --no-gui
octave:1> cohpsk_ch_fading("../raw/fast_fading_samples.float", 8000, 1.0, 8000*60)
octave:2> cohpsk_ch_fading("../raw/slow_fading_samples.float", 8000, 0.1, 8000*60)
octave:3> exit
$ cd ..
$ mkdir build_linux && cd build_linux
$ cmake -DCMAKE_BUILD_TYPE=Debug ..
$ make
$ cd unittest 

You also need GNU Octave to generate the HF fading files for cohpsk_ch, and you need to install the very useful CSDR tools.

Connect the HackRF to your SSB receiver, we put a 30dB attenuator in line. Tune the radio to 7.177 MHz LSB. First generate a carrier with your HackRF, offset so we get a 500Hz tone in the SSB radio in LSB mode:

$ hackrf_transfer -f 7176500 -s 8000000 -c 127

Now let’s try some DSB audio:

$ cat ../../wav/ve9qrp.wav | csdr mono2stereo_s16 | ./tsrc - - 10 -c |
./tlininterp - - 100 -df | hackrf_transfer -f 5177000 -s 8000000  -t - 2>/dev/null

Don’t change the frequency, but try switching the mode between USB and LSB. Should sound about the same, with a slight frequency offset due to the HackRF. Note that HackRF is tuned to Fs/4 = 2MHz beneath 7.177MHz. “tlininterp” has a simple Fs/4 mixer that we use to shift the signal away from the HackRF DC spike. We up-sample from 8 kHz to 8 MHz in two steps to save MIPs.
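An Fs/4 mixer is cheap because the local oscillator exp(j*pi*n/2) only ever takes the values 1, j, -1 and -j, so the mix needs no multiplies, just sign changes and swaps of the real and imaginary parts. Here is a minimal sketch of the idea; it is not the tlininterp source, and the function name is my own.

```python
def fs4_mix(samples):
    """Shift a complex signal up in frequency by Fs/4.

    Equivalent to multiplying sample n by exp(j*pi*n/2), which cycles
    through 1, j, -1, -j.
    """
    out = []
    for n, x in enumerate(samples):
        phase = n % 4
        if phase == 0:
            out.append(x)                          # x * 1
        elif phase == 1:
            out.append(complex(-x.imag, x.real))   # x * j
        elif phase == 2:
            out.append(-x)                         # x * -1
        else:
            out.append(complex(x.imag, -x.real))   # x * -j
    return out

# A DC input comes out as a tone at Fs/4
print(fs4_mix([1 + 0j] * 8))
```

At an 8 MHz sample rate that puts the signal 2 MHz above the HackRF centre frequency, well clear of the DC spike.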

The “csdr mono2stereo_s16” just repeats the real output samples, so we get a DSB signal at HF. A bit lazy I know, a better approach would be to modify cohpsk_ch to have a complex output option. Let me know if you want to modify cohpsk_ch – I can tell you how.

Checking Calibration

Now I’m pretty confident that cohpsk_ch works well at baseband on digital signals as I have used it extensively in my HF DV work. However I wanted to make sure the off air signal had the correct SNR.

To check the calibration, we generated a 1000 Hz sine wave Signal + Noise signal:

$ ./mksine - 1000 30  | ./../src/cohpsk_ch - - -30 --Fs 8000 --ssbfilt 0 | csdr mono2stereo_s16 | ./tsrc - - 10 -c | ./tlininterp - - 100 -df | hackrf_transfer -f 12177000 -s 8000000  -t - 2>/dev/null 

Then just a noise signal:

cat /dev/zero | ./../src/cohpsk_ch - - -30 --Fs 8000 --ssbfilt 0 | csdr mono2stereo_s16 | ./tsrc - - 10 -c | ./tlininterp - - 100 -df | hackrf_transfer -f 5177000 -s 8000000  -t - 2>/dev/null

With moderate SNRs (say 10dB), Signal + Noise power is roughly Signal power. So I measured the off air power of the above signals using my FT817 connected to a USB sound card, and an Octave script:

$ rec -t raw -r 8000 -s -2 -c 1 - -q | octave --no-gui -qf power_from_stdio.m

I used alsamixer and the plots from the script to make sure I wasn’t overloading the ADC. You need to turn your receiver AGC OFF, and adjust RF/AF gain to get the levels right.

However from the FT817 I was getting results a few dB off due to the crystal filter bandwidth and non-rectangular shape factor. Mark hooked up his AirSpy HF and GQRX, and we piped the received audio over the LAN to the script:

nc -ul 7355 | octave --no-gui -qf power_from_stdio.m

GQRX had a nice flat response from a few 100 Hz to 3kHz, the same bandwidth cohpsk_ch uses for SNR measurement. OK, so now we had sensible numbers, within 0.2dB of the SNR reported by cohpsk_ch. We moved the levels up and down 3dB, made sure everything was repeatable and linear. We went down to 0dB, where signal and noise power is the same, and Signal+Noise power should be 3dB more than Noise alone. Check.
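The 0dB check follows from adding the signal and noise powers. A small sketch of the arithmetic, with function names of my own choosing:

```python
import math

def splusn_over_n_db(snr_db):
    """(Signal+Noise)/Noise in dB for a given SNR in dB."""
    return 10.0 * math.log10(1.0 + 10.0 ** (snr_db / 10.0))

def snr_from_measurements_db(splusn_db, n_db):
    """Recover SNR (dB) from measured Signal+Noise and Noise-only powers (dB)."""
    ratio = 10.0 ** ((splusn_db - n_db) / 10.0)
    if ratio <= 1.0:
        raise ValueError("Signal+Noise power must exceed Noise power")
    return 10.0 * math.log10(ratio - 1.0)

for snr_db in (0.0, 10.0):
    print(f"SNR {snr_db:4.1f} dB -> (S+N)/N = {splusn_over_n_db(snr_db):.2f} dB")
```

At 0dB SNR the Signal+Noise power sits 3dB above the noise alone, and at 10dB the two differ by only about 0.4dB from the SNR itself, which is why Signal+Noise power is roughly Signal power at moderate SNRs.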


Then we could play the HF tx signal at a variety of SNRs, by tweaking the third (No) argument. In this case we set No to -100dB, so no noise:

cat tx_file_from_radio.wav | ./../src/cohpsk_ch - - -100 --Fs 8000 --ssbfilt 0 | csdr mono2stereo_s16 | ./tsrc - - 10 -c | ./tlininterp - - 100 -df | hackrf_transfer -f 5177000 -s 8000000  -t - 2>/dev/null

At the end of the cohpsk_ch run, it will print the SNR it has measured. So you read that and tweak No as required to get the SNR you need. In our case around -30 was 8dB SNR. You can also add fast (--fast) or slow (--slow) fading; here is a fast fading run at about 2dB SNR:

cat tx_file_from_radio.wav | ./../src/cohpsk_ch - - -24 --Fs 8000 --ssbfilt 0 --fast | csdr mono2stereo_s16 | ./tsrc - - 10 -c | ./tlininterp - - 100 -df | hackrf_transfer -f 5177000 -s 8000000  -t - 2>/dev/null

The “--ssbfilt 0” option switches off the 300-2600 Hz filter inside cohpsk_ch that is used to simulate a SSB radio crystal filter. For our tests, the modem waveform was too wide for that filter.


I guess we could also have used the HackRF to sample the signal. The nice thing about SDRs is the frequency response is “flat”, no crystal filters messing things up.

The only thing we weren’t sure about was the sample rate and frequency offset accuracy of the HackRF, for example if the sample clock was a bit off that might upset modems.

The radio we tested delivered performance pretty much on its data sheet at the SNRs tested, giving us extra confidence in the bench testing system described here.

Reading Further

Measuring SDR Noise Figure in Real Time
High Speed Balloon Data Link, here we bench test a UHF FSK data radio
README_ofdm.txt, Lots of examples of using cohpsk_ch to test the latest and greatest OFDM modem.
PathSim is a very nice Windows GUI HF path simulator, that runs well on Linux using Wine.

Solar Boat

Two years ago when I bought my Hartley TS16 sail boat I dreamed of converting it to solar power. In January I installed a Torqeedo electric outboard and a 24V, 100AH Lithium battery pack. That’s working really well. Next step was to work out a way to mount some surplus 200W solar panels on the boat. The idea is to (temporarily) detach the mast, and use the boat on the river Murray, a major river that passes within 100km of where I live in Adelaide, South Australia.

Over the last few weeks I worked with my friend Gary (VK5FGRY) to mount solar panels on the TS16. Gary designed and fabricated some legs from 40mm square aluminium:

With a matching rubber foot on each leg, the panels sit firmly on the gel coat of the boat, and are held down by ropes or octopus straps.

The panels’ maximum power point is at 28.5V (and 7.5A) which is close to the battery pack under charge (3.3*8 = 26.4V) so I decided to try a direct DC connection – no inverter or charger. I ran some tests in the back yard: each panel was delivering about 4A into the battery pack, and two in parallel delivered about 8A. I didn’t know solar panels could be connected in parallel, but happily this means I can keep my direct DC connection. Laying the panels horizontal costs a few amps – a good example of why solar panels are usually angled at the sun. However the azimuth of the boat will be always changing so horizontal is the only choice. The panels are very sensitive to shadowing; a hand placed on a panel, or a small shadow is enough to drop the current to 0A. OK, so now I had a figure for panel output – about 4A from each panel.

This didn’t look promising. Based on my sea voyages with the Torqeedo, I estimated I would need 800W (about 30A) to maintain my target houseboat speed of 4 knots (7 km/hr); that’s 8 panels which won’t fit on my boat! However the current draw on the river might be different without tides and waves, and I wasn’t sure exactly how many AH I would get over a day from the sun. Would trees on the river bank shadow the panels?

So it was off to Younghusband on the Murray, where our friend Chris (VK5CP) was hosting a bunch of Ham Radio guys for an extended Anzac day/holiday weekend. It’s Autumn here, with generally sunny days of about 23C. The sun is up from 6:30am to 6pm.

Turns out that even with two panels – the solar boat was really practical! Over three days we made three trips of 2 hours each, at speeds of 3 to 4 knots, using only the panels for charging. Each day I took friends out, and they really loved it – so quiet and peaceful, and the river scenery is really nice.

After an afternoon cruise I would park the boat on the South side of the river to catch the morning sun, which in Autumn appears to the North here in Australia. I measured the panel current as 2A at 7am, 6A at 9am, 9A at 10am, and much to my surprise the pack was charged by 11am! In fact I had to disconnect the panels as the cell voltage was pushing over 4V.

On a typical run upriver we measured 700W = 4kt, 300W = 3.1kt, 150W = 2.5kt, and 8A from the panels in full sun. Panel current dropped to 2A with cloud which was a nasty surprise. We experienced no shadowing issues from trees. The best current we saw at about noon was 10A. We could boost the current by 2A by putting three guys on one side of the boat and tipping the entire boat (and solar panels) towards the sun!

Even partial input from solar can have a big impact. Let’s say at 4 knots (30A) I can drive for 2 hours using 60% of my 100AH pack. If I back off the speed a little, so I’m drawing 20A, then 10A from the panels will extend my driving time to 6 hours.
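The driving time estimate is just usable pack capacity divided by the net current out of the pack. A small sketch of the sums, where the function name and the 60% usable fraction default are taken from the example above rather than from any battery specification:

```python
def run_time_hours(pack_ah, motor_a, solar_a=0.0, usable_fraction=0.6):
    """Hours of driving from the usable part of the pack.

    The panels supply part of the motor current directly, so the pack
    only has to make up the difference.
    """
    net_a = motor_a - solar_a
    if net_a <= 0:
        return float("inf")  # panels cover the whole load
    return pack_ah * usable_fraction / net_a

print(run_time_hours(100, motor_a=30))              # 4 knots, no sun
print(run_time_hours(100, motor_a=20, solar_a=10))  # backed off, 10A from panels
```

The second case is where the 6 hours comes from: 60AH of usable capacity drained at a net 10A.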

I slept on the boat, and one night I found a paddle steamer (the Murray Princess) parked across the river from me, all lit up with fairy lights:

On our final adventure, my friend Darin (VK5IX) and I were entering Lake Carlet, when suddenly the prop hit something very hard, “crack crack crack”. My poor prop shaft was bent and my propeller was wobbling from side to side:

We gently e-motored back and actually recorded our best results – 3 knots on 300W, 10A from the panels, 10A to the motor.

With 4 panels I would have a very practical solar boat, capable of 4-6 hours cruising a day just on solar power. The 2 extra panels could be mounted as a canopy over the rear of the boat. I have an idea about an extended solar adventure of several days, for example 150km from Younghusband to Goolwa.

Reading Further

Engage the Silent Drive
Lithium Cell Amp Hour Tester and Electric Sailing

Lithium Cell Amp Hour Tester and Electric Sailing

I recently electrocuted my little sail boat. I built the battery pack using some second hand Lithium cells donated by my EV. However after 8 years of abuse from my kids and me those cells are of varying quality. So I set about developing an Amp-Hour tester to determine the capacity of the cells.

The system has a relay that switches a low value power resistor (OK some coat hanger wire) across the 3.2V cell terminals, loading it up at about 27A, roughly the cruise current for my e-boat. It’s about 0.12 ohms once it heats up. This gets too hot to touch but not red hot, it’s only 86W being dissipated along about 1m of wire. When I built my EV I used the coat hanger wire load trick to test 3kW loads, that was a bit more exciting!

The empty beer can in the background makes a useful insulated stand off. Might need to make more of those.

When I first installed Lithium cells in my EV I developed a charge controller for my EV. I borrowed a small part of that circuit; a two transistor flip flop and a Battery Management System (BMS) module:

Across the cell under test is a CM090 BMS module from EV Power. That’s the good looking red PCB in the photos, onto which I have tacked the circuit above. These modules have a switch that opens when the cell voltage drops beneath 2.5V.

Taking the base of either transistor to ground switches on the other transistor. In logic terms, it’s a “not set” and “not reset” operation. When power is applied, the BMS module switch is closed. The 10uF capacitor is discharged, so provides a momentary short to ground, turning Q1 off, and Q2 on. Current flows through the automotive relay, switching on the load to the battery.

After a few hours the cell discharges beneath 2.5V, the BMS switch opens and Q2 is switched off. The collector voltage on Q2 rises, switching on Q1. Due to the latching operation of the flip flop, it stays in this state. This is important, as when the relay opens, the cell will be unloaded, its voltage will rise again, and the BMS module switch will close. In the initial design without a flip flop, this caused the relay to buzz as the cell voltage oscillated about 2.5V as the relay opened and closed! I need the test to stop and stay stopped – it will be operating unattended so I don’t want to damage the cell by completely discharging it.
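The difference between the latching design and the original “expensive buzzer” can be shown with a tiny behavioural model. This is a logic level sketch only: it ignores the transistor details, and the class and function names are my own.

```python
class LatchedTester:
    """Flip flop behaviour: once the BMS switch opens, the load stays off."""

    def __init__(self):
        # Power-up: the capacitor pulse sets the flip flop, relay (load) on
        self.load_on = True

    def update(self, bms_closed):
        if not bms_closed:
            self.load_on = False  # reset by the BMS, and latched
        return self.load_on

def unlatched_tester(bms_closed):
    """Initial design: the relay simply follows the BMS switch."""
    return bms_closed

# BMS switch over time: closed, closed, opens at 2.5V, then the unloaded
# cell recovers and the switch closes again
bms = [True, True, False, True, True]

latched = LatchedTester()
print("latched:  ", [latched.update(s) for s in bms])
print("unlatched:", [unlatched_tester(s) for s in bms])
```

The latched version keeps the load off after the first trip, while the unlatched version switches the load straight back on when the cell voltage recovers, which is the start of the buzzing.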

The LED was inserted to ensure the base voltage on Q1 was low enough to switch Q1 off when Q2 was on (Vce of Q2 is not zero), and has the neat side effect of lighting the LED when the test is complete!

In operation, I point a cell phone at the LED and some multi-meters, set it taking time lapse video, and start the test:

I wander back after 3 hours and jog-shuttle the time lapse video to determine the time when the LED came on:

The time lapse feature on this phone runs in 1/10 of real time. For example Cell #9 discharged in 12:12 on the time lapse video. So we convert that time to seconds, multiply by 10 to get “seconds of real time”, then divide by 3600 to get the run time in hours. Multiplying by the discharge current of 27(ish) Amps we get the cell capacity:

  12:12 time lapse, 27*(12*60+12)*10/3600 = 55AH

So this cell’s a bit low, and won’t be finding its way onto my boat!

Another alternative is a logging multimeter; one could even measure and integrate the discharge current over time. Or I could have just bought or borrowed a proper discharge tester, but where’s the fun in that?
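With 40 cells to work through, the conversion is worth scripting. A minimal sketch, assuming the 27A load and the 1/10 real time time lapse described above; the function name is my own.

```python
def cell_capacity_ah(timelapse_mmss, current_a=27.0, speedup=10):
    """Cell capacity in AH from the time lapse video time when the LED lit.

    timelapse_mmss : "MM:SS" position in the time lapse video
    current_a      : roughly constant discharge current in Amps
    speedup        : seconds of real time per second of video
    """
    minutes, seconds = (int(part) for part in timelapse_mmss.split(":"))
    real_seconds = (minutes * 60 + seconds) * speedup
    return current_a * real_seconds / 3600.0

print(f"Cell #9: {cell_capacity_ah('12:12'):.0f} AH")
```

The current is only approximately constant, as the cell voltage and the wire resistance both drift during the test, so treat the result as a way of ranking cells rather than a precise capacity.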


It was fun to develop, a few Saturday afternoons of sitting in the driveway soldering, occasional burns from 86W of hot wire, and a little head scratching while I figured out how to take the design from an expensive buzzer to a working circuit. Nice to do some soldering after months of software based DSP. I’m also happy that I could develop a transistor circuit from first principles.

I’ve now tested 12 cells (I have 40 to work through), and measured capacities of 50 to 75AH (they are rated at 100AH new). Some cells have odd behavior under load; dipping beneath 3V right at the start of the test rather than holding 3.2V for a few hours – indicating high internal resistance.

My beloved sail e-boat is already doing better. Last weekend, using the best cells I had tested at that point, I e-motored all day on varying power levels.

One neat trick, explained to me by Matt, is motor-sailing. Using a little bit of outboard power, the boat overcomes hydrodynamic friction (it gets moving in the water) and the sail is moved out of stall (like an airplane wing moving to just above stall speed). This means the boat moves a lot faster than under motor or sail alone in light winds. For example the motor was registering just 80W, but we were doing 3 knots in light winds. This same trick can be done with a stink-motor and dinosaur juice, but the e-motor is completely silent, we forgot it was on for hours at a time!

Reading Further

Electric Car BMS Controller
New Lithium Battery Pack for my EV
Engage the Silent Drive
EV Bugs

Measuring SDR Noise Figure in Real Time

I’m building a sensitive receiver for FreeDV 2400A signals. As a first step I tried a HackRF with an external Low Noise Amplifier (LNA), and attempted to measure the Noise Figure (NF) using the system Mark and I developed two years ago.

However I was getting results that didn’t make sense and were not repeatable. So over the course of a few early morning sessions I came up with a real time NF measurement system, and wrinkled several bugs out of it. I also purchased a few Airspy SDRs, and managed to measure NF on them as well as the HackRF.

It’s a GNU Octave script called nf_from_stdio.m that accepts a sample stream from stdio. It assumes the signal contains a sine wave test tone from a calibrated signal generator, and noise from the receiver under test. By sampling the test tone it can establish the gain of the receiver, and by sampling the noise spectrum an estimate of the noise power.
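
A minimal Python sketch of the arithmetic behind this kind of measurement. It follows the textbook "gain method" and is not copied from nf_from_stdio.m: the function and argument names are hypothetical, and the -174 dBm/Hz thermal noise floor assumes roughly room temperature. The tone power and noise density can be in whatever dB units the receiver outputs, as long as both share the same reference.

```python
# Assumption: thermal noise density kT at about 290 K, in dBm per Hz.
THERMAL_NOISE_DBM_HZ = -174.0


def noise_figure_db(tone_power_db, noise_density_db, tone_input_dbm=-100.0):
    """Estimate receiver noise figure from a calibrated test tone.

    tone_power_db    -- measured power of the test tone at the receiver output
    noise_density_db -- measured noise power per Hz at the receiver output
    tone_input_dbm   -- calibrated signal generator level at the receiver input
    """
    # The known input level of the tone gives the receiver gain.
    gain_db = tone_power_db - tone_input_dbm
    # Refer the output noise density back to the receiver input.
    input_noise_dbm_hz = noise_density_db - gain_db
    # Noise figure is the excess over the thermal noise floor.
    return input_noise_dbm_hz - THERMAL_NOISE_DBM_HZ


# Made-up numbers: tone measured at -20 dB, noise at -87 dB/Hz.
print(noise_figure_db(-20.0, -87.0))  # 7.0
```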

The script can be driven from command line utilities like hackrf_transfer or airspy_rx or via software receivers like gqrx that can send SSB-demodulated samples over UDP. Instructions are at the top of the script.


I’m working from a home workbench, with rudimentary RF skills, a strong signal processing background and determination. I do have a good second hand signal generator (Marconi 2031), that cost AUD$1000 at a Hamfest, and a Rigol 815 Spec An (generously donated by Mel K0PFX, and Jim, N0OB) to support my FreeDV work. Both very useful and highly recommended. I cross-checked the sig-gen calibrated output using an oscilloscope and external attenuator (within 0.5dB). The Rigol is less accurate in amplitude (1.5dB on its specs), but useful for relative measurements, e.g. comparing cable attenuation.

The NF test method I have used requires a calibrated signal source. I performed my tests at 435MHz using a -100dBm carrier generated from the Marconi 2031 sig-gen.

Usage and Results

The script accepts real samples from a SSB demod, or complex samples from an IQ source. Tune your receiver so that the sinusoidal test tone is in the 2000 to 4000 Hz range as displayed on Fig 2 of the script. In general for minimum NF turn all SDR gains up to maximum. Check Fig 1 to ensure the signal is not clipping, reduce the baseband gain if necessary.

Noise is measured between 5000 and 10000 Hz, so ensure the receiver passband is flat in that region. When using gqrx, I drag the filter bandwidth out to 12000 Hz.
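
To make the two frequency bands concrete, here is a rough Python sketch of pulling a tone power and a noise density out of a block of real samples. It is an illustration only and not how nf_from_stdio.m is written: the function names, the plain DFT with no windowing, the 48 kHz sample rate, and the synthetic test signal are all assumptions.

```python
import cmath
import math
import random


def bin_power(x, k):
    """One-sided power in DFT bin k of the real samples x.

    Scaled so a sine of amplitude A sitting exactly on bin k gives A*A/2.
    """
    n = len(x)
    acc = sum(x[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n))
    return 2.0 * abs(acc) ** 2 / n ** 2


def tone_and_noise(x, fs, tone_band=(2000.0, 4000.0),
                   noise_band=(5000.0, 10000.0)):
    """Return (tone power, noise power per Hz) in linear units."""
    hz_per_bin = fs / len(x)

    def bins(band):
        return range(math.ceil(band[0] / hz_per_bin),
                     math.floor(band[1] / hz_per_bin) + 1)

    # The test tone is taken as the strongest bin in the tone band.
    tone_power = max(bin_power(x, k) for k in bins(tone_band))
    # Average the noise bins, then normalise to a 1 Hz bandwidth.
    noise = [bin_power(x, k) for k in bins(noise_band)]
    noise_density = sum(noise) / len(noise) / hz_per_bin
    return tone_power, noise_density


# Synthetic check: unit-amplitude 3 kHz tone plus a little white noise.
random.seed(1)
fs, n, sigma = 48000.0, 480, 0.01
samples = [math.sin(2 * math.pi * 3000.0 * i / fs) + random.gauss(0.0, sigma)
           for i in range(n)]
tone_power, noise_density = tone_and_noise(samples, fs)
print(10 * math.log10(tone_power), 10 * math.log10(noise_density))
```

With only 51 noise bins in a single short block the noise density estimate wanders by a fraction of a dB from block to block, which is the same instability described in the text.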

The noise estimates are less stable than the tone power estimate, leading to some sample/sample variation in the NF estimate. I take the median of the last five estimates.
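
A median over the last five estimates is easy to sketch in Python. The class name and interface below are hypothetical and the real script may do this differently; the idea is just that one wild estimate does not drag the displayed NF around.

```python
from collections import deque
from statistics import median


class MedianSmoother:
    """Running median over the most recent `length` estimates."""

    def __init__(self, length=5):
        # A bounded deque silently drops the oldest estimate when full.
        self.history = deque(maxlen=length)

    def update(self, value):
        self.history.append(value)
        return median(self.history)


smoother = MedianSmoother()
# Made-up NF estimates in dB, with one outlier.
for nf in [7.0, 7.2, 12.5, 6.9, 7.1, 7.0]:
    print(smoother.update(nf))
```

Unlike a mean, the median ignores the 12.5 outlier completely once there are a few sensible estimates either side of it.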

I tried supplying samples to nf_from_stdio using two methods:

  1. Using gqrx in UDP mode to supply samples over UDP. This allows easy tuning and the ability to adjust the SDR gains in real time, but requires a few steps to set up.
  2. Using a “single” command line approach that consists of a chain of processing steps concatenated together. Once your signal is tuned you can start the NF measurements with a single step.

Instructions on how to use both methods are at the top of nf_from_stdio.m

Here are some results using both gqrx and command line methods, with and without an external (20dB gain/1dB NF) LNA. They were consistent across two laptops.

Noise figure in dB:

SDR           Gqrx LNA   Cmd Line LNA   Cmd Line no LNA
AirSpy Mini   2.0        2.2            7.9
AirSpy R2     1.7        1.7            7.0
HackRF One    2.6        3.4            11.1

The results with LNA are what we would expect for system noise figures with a good LNA at the front end.
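
As a rough cross-check, the textbook Friis formula for cascaded stages can be sketched in Python. This is an idealised calculation and not part of the measurement script: the function names are made up, and it ignores cable and connector losses, so it gives a lower bound rather than a prediction.

```python
import math


def db_to_linear(db):
    return 10.0 ** (db / 10.0)


def linear_to_db(ratio):
    return 10.0 * math.log10(ratio)


def cascade_nf_db(stages):
    """Friis formula for a chain of stages.

    stages -- list of (nf_db, gain_db) tuples, antenna end first.
    """
    total_factor = 1.0
    gain_so_far = 1.0
    for nf_db, gain_db in stages:
        # Each stage's excess noise is divided by the gain in front of it.
        total_factor += (db_to_linear(nf_db) - 1.0) / gain_so_far
        gain_so_far *= db_to_linear(gain_db)
    return linear_to_db(total_factor)


# LNA (1 dB NF, 20 dB gain) ahead of each SDR, using the "no LNA" figures
# from the table. The SDR gain does not matter as it is the last stage.
for name, sdr_nf_db in [("AirSpy Mini", 7.9), ("AirSpy R2", 7.0), ("HackRF One", 11.1)]:
    print(name, round(cascade_nf_db([(1.0, 20.0), (sdr_nf_db, 0.0)]), 2))
```

The ideal cascade figures come out a little over 1dB, dominated by the LNA, which is why a good LNA at the front end pulls the system NF down so far.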

The “no LNA” Airspy NF results are curious – the Airspy specs state a NF of just 3.5dB. So we contacted Airspy via Twitter and email to see how they measured their stated NF. We haven’t received a response to date. I posted to the Airspy mailing list and one gentleman (Dave – WØLEV) kindly replied and has measured noise figures of 4dB using calibrated noise sources and attenuators.

Looking into the data sheets for the Airspy, it appears the R820T tuner at the front end of the Airspy has a NF of 3.5dB. However a system NF will always be worse than the NF of the first device, as other devices (e.g. the ADC) also inject noise.

Other possibilities for my figures are measurement error, ambient noise sources at my site, frequency dependent NF, or variations in individual R820T samples.

In our past work we have used Bit Error Rate (BER) results as an independent method of confirming system noise figure. We found a close match between theoretical and measured BER when testing with and without a LNA. I’ll be repeating similar low level BER tests with FreeDV 2400A soon.

Real Time Noise Figure

It’s really nice to read the system noise figure in real time. For example you can start it running, then experiment with grounding, tightening connectors, or moving the SDR away from the laptop, or connect/disconnect a LNA in real time and watch the results. Really helps catch little issues in these difficult to perform tests. After all – we are measuring thermal noise, a very weak signal.

Some of the NF problems I could find and remove with a real time measurement:

  • The Airspy mini is nearly 1dB worse on the front left USB port than the rear left USB port on my X220 Thinkpad!
  • The Airspy mini really likes USB extension cables with ferrite clamps – without the ferrite I found the LNA was ineffective in reducing the NF – being swamped by conducted laptop noise I guess.
  • Loose connectors can make the noise figure a few dB worse. Wiggle and tighten them all.
  • Position of SDR/LNA near the radio and other bench equipment.
  • My magic touch can decrease noise figure! Grounding effect I guess?

Development Bugs

I had to work through several problems before I started getting sensible numbers. This was quite discouraging for a while as the numbers were jumping all over the place. However it’s fair to say measuring NF is a tough problem. From what I can Google it’s an uncommon measurement for people in home workshops.

These bugs are worth mentioning as traps for anyone else attempting home NF measurements:

  1. Cable loss: I found a 1.5dB loss in some cable I was using between the sig gen and the SDR under test. I measured the loss by comparing a few cables connected between my sig gen and spec an. While the 815 is not accurate in terms of absolute calibration (rated at 1.5dB), it can still be used for comparative measurements. The cable loss can be added to the calculations or just choose a low loss cable.
  2. Filter shape: I had initially placed the test tone under 1000Hz. However I noticed that the gqrx signal had a few dB of high pass filtering in this region (Fig 2 below). Not an issue for regular USB demodulation, but a few dB really matters for NF! So I moved the test tone to the 2-4kHz region where the gqrx output was nice and flat.
  3. A noisy USB port, especially without a clamp, on the Airspy Mini (photo below). Found by trying different SDRs and USB ports, and finally a clamp. Oh Boy, never expected that one. I was connecting the LNA and the NF was stuck at 4dB – swamped by noise from the USB Port I guess.
  4. Compression: Worth checking the SDR output is not clipped or in compression. I adjusted the sig gen output up and down 3dB, and checked the power estimate from the script changed by 3dB. Also worth monitoring Fig 1 from the script, make sure it’s not hitting the limits. The HackRF needed its baseband gain reduced, but the Airspys were OK.
  5. I used the latest Airspy tools built from source (rather than the Ubuntu 17 package) to get stdout piping working properly and not have other status information from printfs injected into the sample stream!
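
Two of the checks above (cable loss and compression) reduce to simple arithmetic, sketched here in Python. The function names and the 0.5dB tolerance are assumptions for illustration. The cable correction assumes the NF was computed from the sig gen setting: a lossy cable means less tone reaches the SDR than assumed, so the gain is under-estimated and the NF over-estimated by the cable loss.

```python
def correct_nf_for_cable_loss(nf_db, cable_loss_db):
    """Remove the NF over-estimate caused by loss between sig gen and SDR."""
    return nf_db - cable_loss_db


def is_linear(power_estimates_db, step_db=3.0, tolerance_db=0.5):
    """Check the receiver tracks sig gen level changes.

    power_estimates_db -- tone power estimates taken with the sig gen at
                          (nominal - step, nominal, nominal + step)
    """
    low, mid, high = power_estimates_db
    return (abs((mid - low) - step_db) <= tolerance_db
            and abs((high - mid) - step_db) <= tolerance_db)


# Made-up readings: the second set gains only 1.5 dB on the upper step,
# which suggests clipping or compression.
print(is_linear((-23.0, -20.0, -17.1)))
print(is_linear((-23.0, -20.0, -18.5)))
print(correct_nf_for_cable_loss(8.5, 1.5))
```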


Thanks Mark, for the use of your RF hardware, and I’d also like to mention the awesome CSDR tools and fantastic gqrx software – both very handy for SDR work.