Reducing FDMDV Modem Memory

For the SM1000 (SmartMic) project I need to run the FDMDV HF modem on the STM32F4 micro-controller. However, the STM32F4 only has 192k of internal RAM, and the modem in its original form uses over 400k. I wrote a unit test to break down the memory usage:

david@bear:~/tmp/codec2-dev/build_dir$ ./unittest/fdmdv_mem
struct FDMDV..........: 409192
prev_tx_symbols.......: 168
tx_filter_memory......: 1008
phase_tx..............: 168
freq..................: 168
pilot_lut.............: 5120
pilot_baseband1.......: 1840
pilot_baseband2.......: 1840
pilot_lpf1............: 5120
pilot_lpf2............: 5120
S1....................: 2048
S2....................: 2048
phase_rx..............: 168
rx_filter_memory......: 161280
rx_filter_mem_timing..: 3360
rx_baseband_mem_timing: 215040
phase_difference......: 168
prev_rx_symbols.......: 168
fft_buf...............: 4096
kiss_fft_cfg..........: 4

Looks like the memory usage is dominated by just two arrays, rx_filter_memory and rx_baseband_mem_timing. This modem has a rather high “over sampling rate”, M. The symbol (or baud) rate of each carrier is just 50 Hz. However, the sample rate at the output is 8000 Hz, which gives M = 8000/50 = 160. We tend to do our processing at the symbol rate. This means that for every symbol we process, we need 160 samples. If we are running at 1600 bit/s there are 17 carriers (one is used for sync and carries no data). For some operations (like filtering), we need a record of the last 5 symbols. Each sample is 8 bytes as it’s a complex number (cos and sin sample) with two floats. So it’s easy to chew up say (160)(17)(5)(8) = 108,800 bytes in a single array.
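
As a rough check of the arithmetic above, here is a minimal C sketch that lets the compiler work out the size of one per-carrier history buffer. The constant and type names (FS_HZ, complex_sample, carrier_history and so on) are made up for illustration and are not taken from the modem source; it also assumes a 4 byte float.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative constants taken from the text. The names are assumptions,
   not necessarily those used in the modem source. */
#define FS_HZ      8000              /* output sample rate in Hz           */
#define RS_HZ      50                /* symbol rate of each carrier in Hz  */
#define M_OVERSAMP (FS_HZ / RS_HZ)   /* oversampling rate M = 160          */
#define N_CARRIERS 17                /* carriers at 1600 bit/s             */
#define N_SYMBOLS  5                 /* symbols of history kept            */

/* One complex sample: two floats (cos and sin parts). */
typedef struct {
    float real;
    float imag;
} complex_sample;

/* A history buffer holding N_SYMBOLS symbols for every carrier. */
typedef complex_sample carrier_history[N_CARRIERS][N_SYMBOLS * M_OVERSAMP];

/* Size in bytes of one such buffer, computed by the compiler. */
size_t carrier_history_bytes(void)
{
    /* The struct is expected to have no padding between its two floats. */
    assert(sizeof(complex_sample) == 2 * sizeof(float));
    return sizeof(carrier_history);
}
```

With a 4 byte float, carrier_history_bytes() returns 108,800, the same figure as the hand calculation.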

However, there are many ways to implement a modem. By re-arranging the processing steps we can sometimes save memory and/or MIPS, or trade them off to get the combination we want for our target platform.

New Re-sampler

The first step was to change the re-sampling algorithm. The demodulator (figure below) has a timing estimation routine that works out the best place to sample a received symbol. This is important for modem performance: if you don’t sample at the right place you get noisy symbols and more errors. Often the position we wish to sample is half way between two existing samples. So we need a way to work out the value of a sample between two existing samples.

The original algorithm re-ran the 5 symbol filter at the optimal timing point. As we have oversampled by a large factor, we can choose any of the M=160 output samples per symbol. However, this meant keeping the large memory buffer (rx_baseband_mem_timing) listed above. Instead, I implemented a linear re-sampler. This just fits a straight line between two existing filtered samples, then uses the year 8 equation of a straight line y=mx + c to find the new sample y at the optimum timing instant x.
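
The straight line idea can be sketched in a few lines of C. This is only an illustration under assumed names (complex_sample, linear_resample) and an assumed interface where the timing instant is given as a fractional index into an array of filtered samples; the real modem code may organise this differently.

```c
#include <assert.h>

/* One complex sample: two floats (cos and sin parts). */
typedef struct {
    float real;
    float imag;
} complex_sample;

/*
  Linear re-sampler sketch (hypothetical helper, not the modem's own).

  samples : n filtered samples
  t       : fractional index of the desired timing instant, 0 <= t <= n-1

  Fits y = m*x + c through the two samples either side of t, where
  c is the lower sample, m is the difference between the two samples,
  and x (0..1) is the distance of t past the lower sample.
*/
complex_sample linear_resample(const complex_sample *samples, int n, float t)
{
    assert(n >= 2);
    assert(t >= 0.0f && t <= (float)(n - 1));

    int low = (int)t;              /* truncation equals floor as t >= 0   */
    if (low > n - 2) {
        low = n - 2;               /* t == n-1: use the last pair, x = 1  */
    }
    int   high = low + 1;
    float x    = t - (float)low;

    complex_sample y;
    y.real = samples[low].real + x * (samples[high].real - samples[low].real);
    y.imag = samples[low].imag + x * (samples[high].imag - samples[low].imag);
    return y;
}
```

For example, asking for t = 0.5 between the samples (0, 0) and (2, -4) gives (1, -2), the point half way along the line joining them.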

The scatter diagram is a good way to evaluate the new, linear re-sampler. The points should be nice and tight. Here are the scatter diagrams for the original filter based re-sampler, followed by the linear re-sampler.

Any modem changes like this can have a big impact on Bit Error Rate (BER) performance, so we need to test very carefully. I used the Octave fdmdv_ut.m to measure the BER at the nominal operating point SNR of 4dB:

octave:3> fdmdv_ut
Bits/symbol.: 2
Num carriers: 14
Bit Rate....: 1400 bits/s
Eb/No (meas): 7.30 (8.20) dB
bits........: 2464
errors......: 16
BER.........: 0.0065
PAPR........: 13.02 dB
SNR...(meas): 3.99 (5.51) dB

The BER is about the same as the previous version so all good. Sixteen errors isn’t very many, so I also tested at a lower Eb/No (SNR) to get more than 100 errors.

Storing One Signal Instead of 17

The next array to tackle was rx_filter_memory. The demod takes the FDM modem signal centred on 1500 Hz, then downconverts it to 17 parallel baseband signals, one for each carrier. To make matters worse we need to store 5 symbols worth of each baseband signal. This uses a lot of storage. So I changed the processing steps to:

  1. Keep 5 symbols worth of demod input samples (the FDM signal centred on 1500 Hz). This is just one signal, rather than 17.
  2. For each carrier, downconvert all 5 symbols to baseband, filter, then throw away the baseband signal, freeing the memory.
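
The two steps above can be sketched roughly as follows. The function name filter_carrier, its arguments and the tap array are all hypothetical; the point is only that the baseband samples exist in a local variable inside the loop, so no per-carrier buffer survives the call. The oscillator is kept in rectangular form, so no trig calls are needed inside the loop.

```c
#include <assert.h>

/* One complex sample: two floats (cos and sin parts). */
typedef struct {
    float real;
    float imag;
} complex_sample;

/* Complex multiply a*b. */
static complex_sample cmul(complex_sample a, complex_sample b)
{
    complex_sample r;
    r.real = a.real * b.real - a.imag * b.imag;
    r.imag = a.real * b.imag + a.imag * b.real;
    return r;
}

/*
  Downconvert and filter one carrier straight from the shared input buffer.

  rx_fdm    : the last n complex demod input samples, oldest first
  coeffs    : n low pass filter taps (hypothetical values, supplied by caller)
  freq_rect : carrier frequency as a unit phasor, i.e. rotation per sample
  phase     : oscillator phase (unit phasor) at rx_fdm[0]

  Returns one filtered baseband sample for this carrier.
*/
complex_sample filter_carrier(const complex_sample *rx_fdm, const float *coeffs,
                              int n, complex_sample freq_rect,
                              complex_sample phase)
{
    assert(n > 0);

    complex_sample acc = { 0.0f, 0.0f };
    for (int k = 0; k < n; k++) {
        /* Mix down: multiply by the conjugate of the oscillator. */
        complex_sample lo_conj  = { phase.real, -phase.imag };
        complex_sample baseband = cmul(rx_fdm[k], lo_conj);

        /* Accumulate the filter output; baseband is then discarded. */
        acc.real += coeffs[k] * baseband.real;
        acc.imag += coeffs[k] * baseband.imag;

        /* Advance the oscillator by one sample. */
        phase = cmul(phase, freq_rect);
    }
    return acc;
}
```

Calling this once per carrier needs only the single shared rx_fdm buffer, at the cost of repeating the downconversion over all n samples on every call.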

This is wasteful of CPU, as we downconvert 5 symbols worth (5*160 samples), rather than just the 160 new samples. However it saves a lot of memory. The final memory breakdown uses 10% of the original:

david@bear:~/tmp/codec2-dev$ ./build_dir/unittest/fdmdv_mem
struct FDMDV..........: 41916
prev_tx_symbols.......: 168
tx_filter_memory......: 1008
phase_tx..............: 168
freq..................: 168
pilot_lut.............: 5120
pilot_baseband1.......: 1840
pilot_baseband2.......: 1840
pilot_lpf1............: 5120
pilot_lpf2............: 5120
S1....................: 2048
S2....................: 2048
phase_rx..............: 168
rx_fdm_mem............: 8960
rx_filter_mem_timing..: 3360
phase_difference......: 168
prev_rx_symbols.......: 168
fft_buf...............: 4096
kiss_fft_cfg..........: 4

Automated Testing

I use a set of automated tests to check the C and Octave versions are identical. First I run the C version, which processes 25 frames of modem data and logs all of the internal states as Octave-format vectors which are saved to a text file.

david@bear:~/tmp/codec2-dev/unittest$ pushd ../build_dir/ && make && popd && ../build_dir/unittest/tfdmdv

The command line is a bit tricky as the new CMake build system builds out of the source tree, and my program needs to run from the unittest directory in the source tree. Next we run the Octave version, which should be identical. Some automated tests make sure they are. Actually, as they are two different float implementations, I make sure the total error between a C and Octave vector is within 1 part in 1000:

octave:4> tfdmdv
tx_bits..................: OK
tx_symbols...............: OK
tx_baseband..............: OK
tx_fdm...................: OK
pilot_lut................: OK
pilot_coeff..............: OK
pilot lpf1...............: OK
pilot lpf2...............: OK
S1.......................: OK
S2.......................: OK
foff_coarse..............: OK
foff_fine................: OK
foff.....................: OK
rx filt..................: OK
env......................: OK
rx_timing................: OK
rx_symbols...............: OK
rx bits..................: OK
sync bit.................: OK
sync.....................: OK
nin......................: OK
sig_est..................: OK
noise_est................: OK
 
passes: 23 fails: 0
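
A tolerance check of the “1 part in 1000” kind described above might look like the following C sketch. The actual comparison lives in the Octave script, and the exact error metric is not given here, so this version assumes “total error” means the sum of absolute differences compared against the sum of absolute reference values. The name vectors_match is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/*
  Returns 1 if the total absolute error between test_vec and ref_vec is
  within the fraction tol of the total absolute value of ref_vec, else 0.
  tol = 1e-3 corresponds to 1 part in 1000.

  Assumption: this metric is one plausible reading of "total error",
  not necessarily the one the Octave tests use.
*/
int vectors_match(const float *test_vec, const float *ref_vec,
                  size_t n, float tol)
{
    assert(tol > 0.0f);

    double total_err = 0.0;
    double total_ref = 0.0;
    for (size_t i = 0; i < n; i++) {
        double d = (double)test_vec[i] - (double)ref_vec[i];
        double r = (double)ref_vec[i];
        total_err += (d < 0.0) ? -d : d;   /* |difference| */
        total_ref += (r < 0.0) ? -r : r;   /* |reference|  */
    }
    return total_err <= (double)tol * total_ref;
}
```

A relative test like this tolerates the small rounding differences between two float implementations while still catching a genuinely wrong state vector.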

The Octave version also plots the C and Octave states (vectors) against each other, so you can work out what went wrong. This one is the output of the rx filter, which was the site of my recent rewiring:

You may be interested in software development “process”, or delegating software work to teams of people with varying skill levels, or making a profit from software, or a business that requires software development to not destroy said business. I have used similar, bit exact, schemes for fixed point DSP development, which I whimsically describe in this allegorical tale.

These automated tests give me a lot of confidence. So many things can go wrong with a complex system. So as painful as it is, it’s worthwhile to have some quality “gates” wherever you can. Now when I move to the STM32F4 micro-controller, I can be reasonably sure there aren’t any algorithm or C porting bugs. Reasonably sure.

Let’s Do the Time Warp Again

Check out this little snippet of code:

        /*
          now downconvert using current freq offset to get Nfilter+nin
          baseband samples.
    
                     Nfilter             nin
          |--------------------------|---------|
                                      |
                                  phase_rx(c)
    
          This means winding phase(c) back from this point to ensure
          phase continuity.
        */
 
        windback_phase           = -freq_pol[c]*NFILTER;
        windback_phase_rect.real = cos(windback_phase);
        windback_phase_rect.imag = sin(windback_phase);
        phase_rx[c]              = cmult(phase_rx[c],windback_phase_rect);

There are a bunch of local oscillators that are defined by their current phase phase_rx[c], and frequency freq[c]. To calculate the next oscillator output sample we increment the current phase by the frequency. That’s how you make an oscillator in software. It’s normally all done in rectangular coordinates (real and imaginary parts, cos and sin, or in-phase and quadrature depending on where you went to school) as it’s easy on the CPU.

Now, we need to down-convert the signal from 5 symbols (960 samples) in the past. So we need to “wind” the oscillator phase backwards to where it was 960 samples ago. Think of the phase like a hand on a clock. We normally increment it a few minutes for every sample but now we want to wind it back several “days”.

Now that I can send oscillators backwards in time I’ll get to work on that warp drive. Or breaking the Shannon limit. Now that would be useful for these HF channels.

Is There a Better Way?

This work took me about an hour of creative thinking (the fun bit) and several days of implementation pain, off-by-one errors, fighting to understand filter memories (again), and tracking down differences between the Octave and C versions.

I can’t help wondering if there is an easier way, like snapping something together in GNU Radio, or some better way of expressing digital filters in software. A more domain-specific language, or some set of functions that take care of filter memories being shifted so my head doesn’t hurt any more. If the code is easier to understand it would be more hack-able too, and maintainable, and be a better tool for teaching.

I have tried various “toolkit” ideas over the years. They were all going to make DSP software development painless and fast. These sorts of tools are great fun to implement but the idea of a near-zero effort implementation utopia tends to break down somewhere in the messy details. So here I am still grinding away with Octave and C, like I have been since 1990.

Or maybe it’s just hard, and it isn’t going to get any easier? Like hacking the Linux kernel? I mean, I am trying to replace SSB …..

8 comments to Reducing FDMDV Modem Memory

  • Hi David,
    I like to read posts like this one, although my RF knowledge is still limited.
    I think your initial idea is “just” to replace SSB tx/rx hardware, but based on this post I figured out your work could open the door to development of an open-source USB modem (QAM V.34).
    Currently an analog modem is not so useful in a big city, but for faraway small cities it could be the only option.

  • John

    Hi David,

    You ask about tools that work. I have always been a much lower level kind of guy.

    Right now I am working with an RF chip that the mfgr says is ‘bullet’ proof. Testing going OK, but….

    I built a ‘crystal’ set and hooked it to a digital oscope. And guess what several ‘curiosities’. Not a surprise to you I am sure. But my head hurts too when I wonder how many more surprises there are when I measure this chip in a yet different way. One looks like a burst of RF crap every so often!

    I like what John Lobb said while at the helm of Northern Electric. “The product is sold when the warranty expires.”

    Your work is most impressive :) .

    One suggestion would be to add a ‘ping’ command so that if I leave my gear running you can test against it with no real-time operator required on this end. That can eventually be extended to automatic download and test new features, etc.

    Are we having fun yet? :) :)

    John

  • Hi David,
    In fact it is still used in some small cities of Brazil, but the last time I remember using a dial-up modem was about 14 years ago. The price of a commercial USB modem is really low (TRENDnet TFM-561U costs less than $30 and works fine on Linux and *BSD: http://daemonforums.org/showthread.php?t=7709).
    So developing an open-source USB modem would be just out of curiosity or as a proof of concept. I think you are a HAM. Could you please tell me whether there are better alternatives for transferring data to other people over radio? []s, Alan

  • Matthew

    Tool kits can make people lazy and prevent them from learning the fundamentals. I’ve seen too many recent Engineering graduates adept in using tool kits but without a good grounding in the fundamentals and how to apply them. Eventually they come unstuck with problems they can’t solve.

    I certainly enjoy reading and learning from your experiences. I think I’ve said before that there are no good books, courses or references in the practical implementation of DSP. Having your blog and checking it regularly is helping many of us expand our horizons and knowledge. Even this Engineer is now making progress with his difficult to spell Goertzal algorithm after learning the necessary tricks ;-)

    I don’t think it matters if you’re working on DSP or Embedded firmware: the basic problems and the methods to solve them remain the same. Much like the C-language really (*grin*).

    Keep up the excellent work !

    /M.
