AMBE+2 and MELPe 600 Compared to Codec 2

Yesterday I was chatting on the #freedv IRC channel, and a good question was asked: how close is Codec 2 to AMBE+2? Turns out – reasonably close. I also discovered, much to my surprise, that Codec 2 700C is better than MELPe 600!


Sample  Original  AMBE+2 3000  AMBE+ 2400  Codec 2 3200  Codec 2 2400
1       Listen    Listen       Listen      Listen        Listen
2       Listen    Listen       Listen      Listen        Listen
3       Listen    Listen       Listen      Listen        Listen
4       Listen    Listen       Listen      Listen        Listen
5       Listen    Listen       Listen      Listen        Listen
6       Listen    Listen       Listen      Listen        Listen
7       Listen    Listen       Listen      Listen        Listen
8       Listen    Listen       Listen      Listen        Listen
9       Listen    Listen       Listen      Listen        Listen
Sample  Original  MELPe 600  Codec 2 700C
1       Listen    Listen     Listen
2       Listen    Listen     Listen
3       Listen    Listen     Listen
4       Listen    Listen     Listen
5       Listen    Listen     Listen
6       Listen    Listen     Listen
7       Listen    Listen     Listen
8       Listen    Listen     Listen
9       Listen    Listen     Listen

Here are all the samples in one big tar ball.


I don’t have an AMBE or MELPe codec handy so I used the samples from the DVSI and DSP Innovations web sites. I passed the original “DAMA” speech samples found on these sites through Codec 2 (codec2-dev SVN revision 3053) at various bit rates. Turns out the same DAMA source samples were used for both the AMBE and MELPe demonstrations, which was handy.

These particular samples are “kind” to codecs – I consistently get good results with them when I test with Codec 2. I’m guessing they also allow other codecs to be favorably demonstrated. During Codec 2 development I make a point of using “pathological” samples such as hts1a, cg_ref, kristoff, mmt1 that tend to break Codec 2. There are some samples of AMBE and MELP using my samples on the Codec 2 page.

I usually listen to samples through a laptop speaker, as I figure it’s close to the “use case” of a PTT radio. Small speakers do mask codec artifacts, making them sound better. I also tried a powered loud speaker with the samples above. Through the loudspeaker I can hear AMBE reproducing the pitch fundamental – a bass note that can be heard on some males (e.g. 7), whereas Codec 2 is filtering that out.

I feel AMBE is a little better, Codec 2 is a bit clicky or impulsive (e.g. on sample 1). However it’s not far behind. In a digital radio application, with a small speaker and some acoustic noise about – I feel the casual listener wouldn’t discern much difference. Try replaying these samples through your smart-phone’s browser at an airport and let me know if you can tell them apart!

On the other hand, I think Codec 2 700C sounds better than MELPe 600. Codec 2 700C is more natural. To my ear MELPe has very coarse quantisation of the pitch, hence the “Mr Roboto” sing-song pitch jumps. The 700C level is a bit low, an artifact/bug to do with the post filter. Must fix that some time. As a bonus Codec 2 700C also has lower algorithmic delay, around 40ms compared to MELPe 600’s 90ms.

Curiously, Codec 2 uses just 1 voicing bit which means either voiced or unvoiced excitation in each frame. xMBE’s claim to fame (and indeed MELP) over simpler vocoders is the use of mixed excitation. Some of the spectrum is voiced (regular pitch harmonics), some unvoiced (noise like). This suggests the benefits of mixed excitation need to be re-examined.

I haven’t finished developing Codec 2. In particular Codec 2 700C is very much a “first pass”. We’ve had a big breakthrough this year with 700C and development will continue, with benefits trickling up to other modes.

However the 1300, 2400, 3200 modes have been stable for years and will continue to be supported.

Next Steps

Here is the blog post that kicked off Codec 2 – way back in 2009. Here is a video of my 2012 Codec 2 talk that explains the motivations, IP issues around codecs, and a little about how Codec 2 works (slides here).

What I spoke about then is still true. Codec patents and license fees are a useless tax on business and stifle innovation. Proprietary codecs borrow as much as 95% of their algorithms from the public domain – which are then sold back to you. I have shown that open source codecs can meet and even exceed the performance of closed source codecs.

Wikipedia suggests that AMBE license fees range from USD$100k to USD$1M. For “one license fee” we can improve Codec 2 so it matches AMBE+2 in quality at 2400 and 3000 bit/s. The results will be released under the LGPL for anyone to use, modify, improve, and inspect at zero cost. Forever.

Maybe we should crowd source such a project?

Command Lines

This is how I generated the Codec 2 wave files:

~/codec2-dev/build_linux/src/c2enc 3200 9.wav - | ~/codec2-dev/build_linux/src/c2dec 3200 - - | sox -t raw -r 8000 -s -2 - 9_codec2_3200.wav


DVSI AMBE sample page

DSP Innovations, MELPe samples. Can anyone provide me with TWELP samples from these guys? I couldn’t find any on the web that includes the input, uncoded source samples.

Codec 2 700C and Short LDPC Codes

In the last blog post I evaluated FreeDV 700C over the air. This week I’ve been simulating the use of short LDPC FEC codes with Codec 2 700C over AWGN and HF channels.

In my HF Digital Voice work to date I have shied away from FEC:

  1. We didn’t have the bandwidth for the extra bits required for FEC.
  2. Modern, high performance codes tend to have large block sizes (1000’s of bits) which leads to large latency (several seconds) when applied to low bit rate speech.
  3. The error rates we are interested in (e.g. 10% raw, 1% after FEC decoder) are unusual – many codes don’t work well.

However with Codec 2 pushed down to 700 bit/s we now have enough bandwidth for a rate 1/2 code inside a standard 2kHz SSB channel. Over coffee a few weeks ago, Bill VK5DSP offered to develop some short LDPC codes for me specifically for this application. He sent me an Octave simulation of rate 1/2 and 2/3 codes of length 112 and 56 bits. Codec 2 700C has 28 bit frames so this corresponds to 4 or 2 Codec 2 700C frames, which would introduce latencies of between 80 and 160ms – quite acceptable for Push To Talk (PTT) radio.
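The latency figures follow directly from the frame size. A minimal sketch of the arithmetic, assuming the 112 and 56 bit figures are the data bits carried by each codeword and that a 28 bit frame at 700 bit/s lasts 40ms (the names here are illustrative, not taken from Bill's simulation):

```python
# Assumed from the text: 28 bit Codec 2 700C frames at 700 bit/s.
FRAME_BITS = 28
BIT_RATE = 700  # bit/s

# Duration of one codec frame in milliseconds.
frame_ms = 1000 * FRAME_BITS / BIT_RATE


def buffering_latency_ms(data_bits):
    """Frames needed to fill one codeword, and the time spent buffering them."""
    frames = data_bits // FRAME_BITS
    return frames, frames * frame_ms


for data_bits in (56, 112):
    frames, latency = buffering_latency_ms(data_bits)
    print(f"{data_bits} data bits: {frames} frames, {latency:.0f} ms")
```

This only counts the buffering needed to collect a full codeword; decoder processing time would add to it.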

I re-factored Bill’s simulation code to produce ldpc_short.m. This measures BER and PER for Bill’s short LDPC codes, and also plots curves for theoretical, HF multipath channels, a Golay (24,12) code, and the current diversity scheme used in FreeDV 700C.

To check my results I compared the Golay BER and ideal HF multipath (Rayleigh Fading) channel curves to other people’s work. Always a good idea to spot check a few values and make sure they are sensible. I took a simple approach to get results in a reasonable amount of coding time (about 1 day of work in this case). This simulation runs at the symbol rate, and assumes ideal synchronisation. My other modem work (i.e. experience) lets me move back and forth between this sort of simulation and real world modems, for example accounting for synchronisation losses.

Error Distribution and Packet Error Rate

I had an idea that Packet Error Rate (PER) might be important. Without FEC, bit errors are scattered randomly about. At our target 1% BER, many frames will have 1 or 2 bit errors. As discussed in the last post Codec 2 700C is sensitive to bit errors as “every bit counts”. For example one bit error in the Vector Quantiser (VQ) index (a big look up table) can throw the speech spectrum right off.
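To put a number on “many frames”: if bit errors are independent (the no-FEC case described above), the chance that a 28 bit frame contains at least one error follows directly from the BER. A small sketch, with a function name chosen just for illustration:

```python
def frame_error_prob(ber, frame_bits=28):
    """Probability that a frame holds at least one bit error,
    assuming bit errors are independent of each other."""
    return 1.0 - (1.0 - ber) ** frame_bits


# Fraction of 28 bit frames hit by at least one error at 1% BER.
print(f"{frame_error_prob(0.01):.3f}")
```

At 1% BER this comes out at roughly a quarter of all frames carrying at least one bit error.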

However an LDPC decoder will tend to correct all errors in a codeword, or “die trying” (i.e. fail badly). So an average output BER of say 1% will consist of a bunch of perfect frames, plus a completely trashed one every now and again. Digital voice works better with this style of error pattern than a few random errors in each codec packet. So for a given BER, a system that delivers a lower PER is better for our application. I’ve guesstimated a 10% PER target for intelligible low bit rate speech. Let’s see how that works out…


Here are the BER and PER curves for an AWGN channel:

Here are the same curves for HF (multipath fading) channel:

I’ve included a Golay (24,12) block code (hard decision) and uncoded PSK for comparison to the AWGN curves, and the diversity system on the HF curves. The HF channel is modelled as two paths with 1Hz Doppler spread and a 1ms delay.

The best LDPC code reaches the 1% BER/10% PER point at 2dB Eb/No (AWGN) and 6dB (HF multipath). Comparing BER, the coding gain is 2.5 and 3dB (AWGN and HF). Comparing PER, the coding gain is 3 and 5dB (AWGN and HF).

Here is a plot of the error pattern over time using the LDPC code on a HF channel at Eb/No of 6dB:

Note the errors are confined to short bursts – isolated packets where the decoder fails. Even though the average BER is 1%, most of the speech is error free. This is a very nice error distribution for digital speech.

Speech Samples

Here are some speech samples, comparing the current diversity scheme used for FreeDV 700C to LDPC, for AWGN and HF channels. These were simulated by extracting the error pattern from the simulation then inserting these errors in a Codec 2 700C bit stream (see command lines section below).

AWGN Eb/No 2dB Diversity LDPC
HF Eb/No 6dB Diversity LDPC

Next Steps

These results are very encouraging and suggest a gain of 2 to 5dB over FreeDV 700C, and better error distribution (lower PER). Next step is to develop FreeDV 700D – a real world implementation using the 112 data-bit rate 1/2 LDPC code. This will require 4 frames of buffering, and some sort of synchronisation to determine the 112 bit frame boundaries. Fortunately much of the C code for these LDPC codes already exists, as it was developed for the Wenet High Altitude Balloon work.

If most frames at the decoder input are now error free, we can consider more efficient (but less robust) techniques for Codec 2, such as prediction (delta coding). This will decrease the codec bit rate for a given speech quality. We could then choose to reduce our bit rate (making the system more robust for a given channel SNR), or raise speech quality while maintaining the same bit rate.

Command Lines

To generate the decoded speech, first run the Octave ldpc_short simulation to generate an “error pattern” file, then subject the Codec 2 700C bit stream to these error patterns.

octave:67> ldpc_short
$ ./c2enc 700C ../../raw/ve9qrp_10s.raw - | ./insert_errors - - ../../octave/awgn_2dB_ldpc.err 28 | ./c2dec 700C - - | aplay -f S16_LE -

The simulation generates .eps files as direct generation of PNG leads to font size issues. Converting EPS to PNG without transparent background:

mogrify -resize 700x600 -density 300 -flatten -format png *.eps

However I still feel the images are a bit fuzzy, especially the text. Any ideas? Here’s the eps file if someone would like to try to get a nicer PNG conversion for me! The EPS file looks great at any scaling when I render it using the Ubuntu document viewer.

Update: A friend of mine (Erich) has suggested using GIMP for the conversion. This does seem to work well and has options for text and line anti-aliasing. It would be nice to be able to generate nice PNGs directly from Octave – my best approach so far is to capture screen shots.


LowSNR site, where Bill VK5DSP writes about his experiments in low SNR communications.

Wenet High Altitude Balloon SSDV System, developed with Mark VK5QI and Bill VK5DSP, that uses LDPC codes.

LDPC using Octave and the CML library

FreeDV 700C

Codec 2 700C

Testing FreeDV 700C

Since releasing FreeDV 700C I’ve been “instrumenting” the FreeDV GUI program – adding some code to perform various tests of the 700C waveform, especially over the air.

With the kind help of Gerhard OE3GBB, Mark VK5QI, and Peter VK5APR, I have collected some samples and performed some tests. The goals of this work were:

  1. Compare 700C Over the Air (OTA) to simulation on an AWGN channel.
  2. Compare 700C OTA to SSB on an AWGN channel.


Here is a screen shot of the latest FreeDV GUI Options screen:

I’ve added some features to the top three rows:

Test Frames                    Send a payload of known test bits rather than vocoder bits
Channel Noise                  Simulate a channel using AWGN noise
SNR                            SNR of AWGN noise
Attn Carrier                   Attenuate just one carrier
Carrier                        The 700C carrier (1-14) to attenuate
Simulated Interference Tone    Enable an interfering sine wave of specified frequency and amplitude
Clipping                       Enable clipping of 700C tx waveform, to increase RMS power
Diversity Combine for plots    Scatter and Test Frame plots use combined (7 carrier) information.

To explore these options it is useful to run in full duplex mode (Tools-PTT Half Duplex unchecked) and use a loopback sound device:

  $ sudo modprobe snd-aloop

More information on loopback in the FreeDV GUI README.

Clipping the 700C tx waveform reduces the Peak to Average Power ratio (PAPR) which may result in a higher average power over the channel. However clipping distorts the waveform and adds some “shoulders” (i.e. noise) to the spectrum adjacent to the 700C waveform:
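The PAPR effect of clipping can be illustrated with a toy multi-carrier signal. This is only a sketch: the carrier frequencies, phases, sample rate and clip threshold below are made up for the example and are not the real 700C waveform parameters.

```python
import math

FS = 8000  # sample rate in Hz (assumed for this example)
N = 8000   # one second of samples


def papr_db(x):
    """Peak to Average Power Ratio of a real signal, in dB."""
    peak_power = max(v * v for v in x)
    mean_power = sum(v * v for v in x) / len(x)
    return 10 * math.log10(peak_power / mean_power)


def hard_clip(x, threshold):
    """Limit every sample to the range [-threshold, +threshold]."""
    return [max(-threshold, min(threshold, v)) for v in x]


# Hypothetical test signal: 14 equal-amplitude carriers spaced 75 Hz apart.
freqs = [600 + 75 * k for k in range(14)]
phases = [math.pi * k * k / 14 for k in range(14)]
signal = [
    sum(math.cos(2 * math.pi * f * n / FS + p) for f, p in zip(freqs, phases))
    for n in range(N)
]

# Clip at 70% of the original peak (an arbitrary choice).
peak = max(abs(v) for v in signal)
clipped = hard_clip(signal, 0.7 * peak)

print(f"PAPR before clipping: {papr_db(signal):.1f} dB")
print(f"PAPR after clipping:  {papr_db(clipped):.1f} dB")
```

A lower PAPR means the clipped signal can be scaled up to the same peak level and carry more average power. The sketch says nothing about the spectral shoulders, which are the price paid for that gain.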

Several users have noticed this distortion. At this stage I’m unsure if clipping is useful or not.

The Diversity Combine option is useful to explore each of the 14 carriers separately before they are combined into 7 carriers.

Many of these options were designed to explore tx filtering. I have long wondered if any of the FreeDV carriers were receiving less power than others, for example due to ripple or a low pass response from the crystal filter. A low power carrier would have a high bit error rate, adversely affecting overall performance. Plotting the scatter diagram or bit error rate on a carrier by carrier basis can measure the effect of tx filtering – if it exists.

Some of the features above – like attenuating a single carrier – were designed to “test the test”. Much of the work I do on FreeDV (and indeed other projects) involves carefully developing software and writing “code to test the code”. For example to build the experiments described in this blog post I worked several hours a day for several weeks. Not glamorous, but where the real labour lies in R&D. Careful, meticulous testing and experimentation. One percent inspiration … then code, test, test.

Comparing Analog SSB to Digital Voice

One of my goals is to develop a HF DV system that is competitive with analog SSB. So we need a way to compare analog and DV at the same SNR. So I came up with the idea of wave files containing analog SSB and DV at the same average (RMS) power. If these are fed into a SSB transmitter, then they will be received at the same SNR. I added 10 seconds of a 1000Hz sine wave at the start for good measure – this could be used to measure the actual SNR.
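Matching the average power of two signals comes down to measuring each RMS level and applying a gain. A minimal sketch of the idea, using synthetic stand-ins for the analog speech and the modem signal (the test signals and function names are assumptions for illustration, not how the actual files were built):

```python
import math

FS = 8000  # sample rate in Hz (assumed)


def rms(x):
    """Root mean square level of a signal."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def scale_to_rms(x, target_rms):
    """Apply a single gain so the signal has the requested RMS level."""
    gain = target_rms / rms(x)
    return [gain * v for v in x]


# Stand-in for speech: a tone that alternates between loud and quiet bursts.
analog = [
    math.sin(2 * math.pi * 300 * n / FS) * (1.0 if (n // 2000) % 2 == 0 else 0.1)
    for n in range(FS)
]
# Stand-in for the modem signal: a steady tone at a different level.
modem = [0.3 * math.sin(2 * math.pi * 1500 * n / FS) for n in range(FS)]

modem_matched = scale_to_rms(modem, rms(analog))
print(f"analog RMS {rms(analog):.3f}, modem RMS after scaling {rms(modem_matched):.3f}")
```

With real 16 bit audio the gain also has to be checked against the sample range, since equal RMS power does not mean equal peaks.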

I developed two files:

  1. sine_analog_700c
  2. sine_analog_testframes700c

The first has the same voice signal in analog and 700C, the second uses test frames instead of encoded voice.

Interfering Carriers

One feature described above simulates an interfering carrier (like a birdie), something I have seen on the air. Here is a plot of a carrier in the middle of one of the 700C carriers, but about 10dB higher:

The upper RH plot is a rolling plot of bit errors for each carrier. You can see one carrier is really messed up – lots of bit errors. The average bit error rate is about 1%, which is where FreeDV 700C starts to become difficult to understand. These bit errors would not be randomly distributed, but would affect one part of the codec all the time. For example the pitch might be consistently wrong, or part of the speech spectrum. I found that as long as the interfering carrier is below the FreeDV carrier, the effect on bit error rate is negligible.

Take away: The tx station must tune away from any interfering carriers that poke above the FreeDV signal carriers. Placing the interfering tones between FreeDV carriers is another possibility, e.g. a 50Hz shift of the tx signal.

Results – Transmit Filtering

Simulation results suggest 700C should produce reasonable results near 0dB SNR. So that’s the SNR I’m shooting for in Over The Air (OTA) testing.

Mark VK5QI sent me several minutes of test frames so I could determine if there were any carriers with dramatically different bit error rates, which would indicate the presence of some tx filtering. Here is the histogram of BERs for each carrier for Mark’s signal, which was at about 3dB SNR:

There is one bar for each I and Q QPSK bit of the 14 carriers – 28 bars total (note Diversity combination was off). After running for a few minutes, we can see a range of 5E-2 to 8E-2 (5 to 8%). In terms of AWGN modem performance, this is only about 1dB difference in SNR or Eb/No, as per the BER versus Eb/No graphs in this post on the COHPSK modem used for 700C. One carrier being pinned at say 20% BER, or a slope of increasing BER with carrier frequency – would have meant tx filtering trouble.

Peter VK5APR sent me a high SNR signal (he lives just 4 km away). Initially I could see an X-shaped scatter diagram, a possible sign of tx filtering. However this ended up being some amplitude pumping due to Fast AGC on my radio. After I disabled fast AGC, I could see a scatter diagram with 4 clear dots, and no X-shape. Check.

I performed an additional test using my IC7200 as a transmitter, and a HackRF as a receiver. The HackRF has no crystal filter and a very flat response, so any distortion would be due to the IC7200 transmit filtering. Once again – 4 clean dots on the scatter diagram and no X-shape.

So I am happy to conclude that transmit filtering does not seem to be a problem, at least for the radios tested. All performance issues are therefore likely to be caused by me and my algorithms!

Results – Low SNR testing

Peter, VK5APR, configured his station to send the analog/700C equi-power test wave files described above at very low power, such that the received SNR at my station was about 0dB. As we are so close it was reasonable to assume the channel was AWGN, indeed we could see no sign of NVIS fading and the band (40M) was devoid of DX at the 12 noon test time.

Here is the rx signal I received, and the same file run through the 700C decoder. Neither the SSB nor the decoded 700C audio is pretty. However it’s fair to say we could just get a message through on both modes and that 700C is holding its own next to SSB. The results are close to my simulations which was the purpose of this test.

You can decode the off air signal yourself if you download the first file and replay it through the FreeDV GUI program using “Tools – Start/Stop Play File from Radio”.


While setting up these tests, Peter and I conversed comfortably for some time over FreeDV 700C at a high SNR. This proved to me that for our audience (experienced users of HF radio) – FreeDV 700C can be used for conversational contacts. Given the 700C codec is really just a first pass – that’s a fine result.

However it’s a near thing – the 700C codec adds a lot of distortion just compressing the speech. It’s pretty bad even if the SNR is high. The comments on the Codec 2 700C blog post indicate many lay-people can’t understand speech compressed by 700C. Add any bit errors (due to low SNR or fading) and it quickly becomes hard to understand – even for experienced users. This makes 700C very sensitive to bit errors as the SNR drops. But hey – every one of those 28 bits/frame counts at 700 bit/s so it’s not surprising.

In contrast, SSB scales a bit better with SNR. However even at high SNRs, that annoying hiss is always there – which is very fatiguing. Peter and I really noticed that hiss after a few minutes back on SSB. Yuck.

SSB gets a lot of its low SNR “punch” from making effective use of peak power. Here is a plot of the received SSB:

It’s all noise except for the speech peaks, where the “peak SNR” is much higher than 0dB. Our brains are adept at picking out words from those peaks, integrating the received phonetic symbols (mainly vowel energy) in our squishy biological receive filters. It’s a pity we didn’t evolve to detect coherent PSK. A curse on your evolution!

In contrast – 700C allocates just as much power to the silence between words as the most important parts of speech. This suggests we could do a better job at tailoring the codec and modem to peak power, e.g. allocating more power to parts of the speech that really matter. I had a pass at Time Variable Quantisation a few years ago. A variable rate codec might be called for, tightly integrated to the modem to pack more bits/power into perceptually important parts of speech.

The results above assumed equal average power for SSB and FreeDV 700C. It’s unclear if this happens in the real world. For example we may need to “back off” FreeDV drive further than SSB; SSB may use a compressor; and the PAs we are using are generally designed for PEP rather than average power operation.

Next Steps

I’m fairly happy with the baseline COHPSK modem, it seems to be hanging on OK as long as there aren’t any co-channel birdies. The 700C codec works better than expected, has plenty of room for improvement – but it’s sensitive to bit errors. So I’m inclined to try some FEC next. Aim for error free 700C at 0dB, which I think will be superior to SSB. I’ll swap out the diversity for FEC. This will increase the raw BER, but allow me to run a serious rate 0.5 code. I’ll start just with an AWGN channel, then tackle fading channels.


FreeDV 700C
Codec 2 700C

FreeDV 700C

Over the past month the FreeDV 700C mode has been developed, integrated into the FreeDV GUI program version 1.2, and tested. Windows versions (64 and 32 bit) of this program are available for download. Thanks Richard Shaw for all your hard work on the release and installers.

FreeDV 700C uses the Codec 2 700C vocoder with the COHPSK modem. Some early results:

  • The US test team report 700C contacts over 2500km at SNRs down to -2dB, in conditions where SSB cannot be heard.
  • My own experience: the 700C speech quality is not quite as good as FreeDV 1600, but usable for conversation. That’s OK – it’s very early days for the 700C codec, and hey, it’s half the bit rate of 1600. I’m actually quite excited that 700C can be used conversationally at this early stage! I experienced a low SNR channel where FreeDV 700C didn’t work but SSB did, however 700C certainly works at much lower SNRs than 1600.
  • Some testers in Europe report 700C falling over at relatively high SNRs (e.g. 8dB). I also experienced this on a 1500km contact. Suspect this is a bug or corner case we can fix, especially in light of the US team’s results.

Tony, K2MO, has put together this fine video demonstrating the various FreeDV modes over a simulated HF channel:

It’s early days for 700C, and there are mixed reports. However it’s looking promising. My next steps are to further explore the real world operation of FreeDV 700C, and work on improving the low SNR performance further.

Modems for HF Digital Voice Part 2

In the previous post I argued that pushing bits through a HF channel involves much wailing and gnashing of teeth. Now we shall apply numbers and graphs to the problem, which is – in a nutshell – Engineering.

QPSK Modem Simulation

I have worked up a GNU Octave modem simulation called hf_modem_curves.m. This operates at 1 sample/symbol, i.e. the sample rate is the symbol rate. So we take some random bits, map them to QPSK symbols, add noise, then turn the noisy symbols back into bits and count errors:
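The same steps can be sketched in a few lines of Python. This is not hf_modem_curves.m, just a minimal stand-in written for illustration: it assumes unit-energy symbols, a Gray mapping with one bit on each of the I and Q axes, and ideal timing and phase.

```python
import math
import random


def qpsk_ber_sim(ebno_db, nbits=200000, seed=1):
    """Estimate the QPSK bit error rate on an AWGN channel at 1 sample/symbol."""
    rng = random.Random(seed)
    ebno = 10 ** (ebno_db / 10)
    # Unit-energy symbols carry 2 bits, so Eb = 1/2 and No = Eb/ebno.
    # The noise variance on each of I and Q is No/2 = 1/(4*ebno).
    sigma = math.sqrt(1 / (4 * ebno))
    a = 1 / math.sqrt(2)  # I and Q amplitude of a unit-energy QPSK symbol
    errors = 0
    for _ in range(nbits // 2):
        b0, b1 = rng.getrandbits(1), rng.getrandbits(1)
        # Map bits to a symbol: bit 0 -> +a, bit 1 -> -a on each axis.
        i = (a if b0 == 0 else -a) + rng.gauss(0, sigma)
        q = (a if b1 == 0 else -a) + rng.gauss(0, sigma)
        # Demodulate by quadrant (a negative axis value decodes as 1),
        # then count the bits that differ from what was sent.
        errors += (i < 0) != (b0 == 1)
        errors += (q < 0) != (b1 == 1)
    return errors / nbits


for ebno_db in (0, 2, 4, 6):
    print(f"Eb/No {ebno_db} dB: BER {qpsk_ber_sim(ebno_db):.4f}")
```

With this mapping the bits 00 land at 45 degrees and 10 at 135 degrees. More bits give a smoother estimate at the cost of run time.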

The simulation ignores a few real world details like timing and phase synchronisation, so is a best case model. That’s OK for now. QPSK uses symbols that each carry 2 bits of information, here is the symbol set or “constellation”:

Four different points, each representing a different 2 bit combination. For example the bits ’00’ would be the cross at 45 degrees, ’10’ at 135 degrees etc. The plot above shows all possible symbols, but we just send one at a time. However it’s useful to plot all of the received symbols like this, as an indication of received signal quality. If the channel is playing nice, we receive something like this:

Each cross is now a fuzzy dot, as noise has been added by the channel. No bit errors yet – a bit error happens when we get enough noise to move received symbols into another quadrant. This sort of channel is called Additive White Gaussian Noise (AWGN). Line of sight UHF radio is a good example of a real world AWGN channel – all you have to worry about is additive noise.

With a fading or multipath channel like HF we end up with something like:

In a fading channel the received symbol amplitudes bounce up and down as the channel fades in and out. Sometimes the symbols dip down into the noise and we get lots of bit errors. Sometimes the signal is reinforced, and the symbol amplitude gets bigger.

The simulation used for the multipath or HF channel uses a two path model, with additive noise as per the AWGN simulation:

Graphs and Modem Performance

Turns out there are some surprisingly good models to help us work out the expected Bit Error Rate (BER) for a modem. By “model” I mean people have worked out the maths to describe the Bit Error Rate (BER) for a QPSK Modem. This graph shows us how to work out the BER for QPSK (and BPSK):

So the red line shows us the BER given Eb/No (E-B on N-naught), which is a normalised form of Signal to Noise Ratio (SNR). Think about Eb/No as a modem running at 1 bit per second, with the noise power measured in 1 Hz of bandwidth. It’s a useful scale for comparing modems and modulation schemes.

Looking at the black lines, we can see that for an Eb/No of 4dB we can expect a BER of 1E-2, i.e. 0.01 or 1% of our bits will be received in error over an AWGN channel. This curve is for QPSK or BPSK; different curves would be used for other modems like FSK.

Given Eb/No you can work out the SNR if you know the bit rate and noise bandwidth:

    SNR = S/N = EbRb/NoB

or in dB:

    SNR(dB) = Eb/No(dB) + 10log10(Rb/B)

For example at Rb = 1600 bit/s and a noise bandwidth B = 3000 Hz:

    SNR(dB) = 4 + 10log10(1600/3000) = 1.27 dB
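The conversion is easy to wrap in a helper for trying other bit rates and bandwidths. A small sketch; the function name and the 3000 Hz default are choices made for this example:

```python
import math


def ebno_to_snr_db(ebno_db, bit_rate, noise_bw_hz=3000):
    """Convert Eb/No in dB to SNR in dB for a given bit rate and noise bandwidth."""
    return ebno_db + 10 * math.log10(bit_rate / noise_bw_hz)


# The worked example above: Eb/No = 4dB, Rb = 1600 bit/s, B = 3000 Hz.
print(f"{ebno_to_snr_db(4, 1600):.2f} dB")
```

Halving the bit rate at the same Eb/No lowers the required SNR by 3dB, which is why a lower rate codec helps at low SNR.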

OK, so that was for ideal QPSK. Let’s add a few more curves to our graph:

We have added the experimental results for our QPSK simulation (green), and for Differential QPSK (DQPSK – blue). Our QPSK modem simulation (green) is right on top of the theoretical QPSK curve (red) – this is good and shows our simulation is working really well.

DQPSK was discussed in Part 1. Phase differences are sent, which helps with phase errors in the channels but costs us extra bit errors. This is evident on the curves – at the 1E-2 BER line, DQPSK requires 7dB Eb/No, 3dB more (double the power) than QPSK.

Now let’s look at modem performance for HF (multipath) channels, on this rather busy graph (click for larger version):

Wow, HF sucks. Looking at the theoretical HF QPSK performance (straight red line) to achieve a BER of 1E-2, we need 14dB of Eb/No. That’s 10dB worse than QPSK on the AWGN channel. With DQPSK, we need about 16dB.
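The gap between the two theory curves can be reproduced from standard textbook expressions. The sketch below assumes the HF theory curve is the usual flat Rayleigh fading result for coherent BPSK/QPSK; that is an assumption about the straight red line rather than something taken from hf_modem_curves.m.

```python
import math


def ber_awgn(ebno_db):
    """Theoretical BER of coherent BPSK/QPSK on an AWGN channel."""
    ebno = 10 ** (ebno_db / 10)
    return 0.5 * math.erfc(math.sqrt(ebno))


def ber_rayleigh(ebno_db):
    """Theoretical BER of coherent BPSK/QPSK on a flat Rayleigh fading
    channel, where ebno_db is the average Eb/No."""
    ebno = 10 ** (ebno_db / 10)
    return 0.5 * (1 - math.sqrt(ebno / (1 + ebno)))


print(f"AWGN     at  4 dB: {ber_awgn(4):.4f}")
print(f"Rayleigh at  4 dB: {ber_rayleigh(4):.4f}")
print(f"Rayleigh at 14 dB: {ber_rayleigh(14):.4f}")
```

Both the AWGN value at 4dB and the Rayleigh value at 14dB come out close to 1E-2. The Rayleigh BER falls only about tenfold for each extra 10dB, which is why the HF curve looks like a straight line on a log scale.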

For HF, a lot of extra power is required to make a small difference in BER.

Some of the kinks in the HF curves (e.g. green QPSK HF simulated just under red QPSK HF theory) are due to not enough simulation points – it’s not actually possible to do better than theory!

Estimated Performance of FreeDV Modes

Now we have the tools to estimate the performance of FreeDV modes. FreeDV 1600 uses Codec 2 at 1300 bit/s, plus a little FEC at 300 bit/s to give a total of 1600 bit/s. With the FEC, let’s say we can get reasonable voice quality at 4% BER. FreeDV 1600 uses a DQPSK modem.

On an AWGN channel, that’s an Eb/No of 4.4dB for DQPSK, and a SNR of:

    SNR(dB) = 4.4 + 10log10(1600/3000) = 1.7 dB

On a multipath channel, that’s an Eb/No of 11dB for DQPSK, and a SNR of:

    SNR(dB) = 11 + 10log10(1600/3000) = 8.3 dB

As discussed in Part 1, FreeDV 700C uses diversity and coherent QPSK, and has a multipath (HF) performance curve plotted in cyan above, and close to ideal QPSK on AWGN channels. The payload data rate is 700 bit/s, however we have an overhead of two pilot symbols for every 4 data symbols. This means we effectively need a bit rate of Rb = 700*(4+2)/4 = 1050 bit/s to pump 700 bits/s through the channel. It doesn’t have any FEC (yet, anyway), so we need a BER of a little lower than FreeDV 1600, about 2%. Running the numbers:

On an AWGN channel, for 2% BER we need an Eb/No of 3dB for QPSK, and a SNR of:

    SNR(dB) = 3 + 10log10(1050/3000) = -1.5 dB

On a multipath channel, diversity (cyan line) helps a lot, that’s an Eb/No of 8dB, and a SNR of:

    SNR(dB) = 8 + 10log10(1050/3000) = 3.4 dB
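Putting the four operating points side by side makes the gap between the modes easier to see. A sketch that uses only the Eb/No figures quoted above and the pilot overhead of 2 pilot symbols per 4 data symbols (the variable names are illustrative):

```python
import math


def snr_db(ebno_db, bit_rate, noise_bw_hz=3000):
    """SNR in dB for a given Eb/No (dB), bit rate and noise bandwidth."""
    return ebno_db + 10 * math.log10(bit_rate / noise_bw_hz)


# Effective 700C bit rate once pilot symbols are included.
rb_700c = 700 * (4 + 2) / 4
rb_1600 = 1600

# Eb/No operating points read off the curves, as quoted in the text.
awgn_1600 = snr_db(4.4, rb_1600)
hf_1600 = snr_db(11, rb_1600)
awgn_700c = snr_db(3, rb_700c)
hf_700c = snr_db(8, rb_700c)

print(f"700C effective bit rate: {rb_700c:.0f} bit/s")
print(f"AWGN: 1600 {awgn_1600:.1f} dB, 700C {awgn_700c:.1f} dB, gap {awgn_1600 - awgn_700c:.1f} dB")
print(f"HF:   1600 {hf_1600:.1f} dB, 700C {hf_700c:.1f} dB, gap {hf_1600 - hf_700c:.1f} dB")
```

On these numbers 700C has an advantage of a little over 3dB on AWGN and a little under 5dB on the multipath channel.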

The diversity model in the simulation uses two carriers. The amplitudes of each carrier after passing through the multipath model are plotted below:

Often when one carrier is faded, the other is not faded, so when we recombine them at the receiver we get an average that is closer to AWGN performance. However diversity is not perfect, occasionally both carriers are wiped out at the same time by a fade.

So we can see FreeDV 700C is about 4 dB in front of FreeDV 1600, which matches the best reports from early adopters. I’ve had reports of FreeDV 700C operating at as low as -2dB, which is presumably on channels that don’t have heavy fading and are more like AWGN. Also some reports of 700C falling over at high SNRs (around 8dB)! However that is probably a bug, e.g. a sync issue or something else we can track down in time.

Real world channels can vary. The multipath model above doesn’t take into account fast or slow fading, it just calculates the average bit error rate. In practice, slow fading is hard to handle in digital voice applications, as the whole channel might be wiped out for a few seconds.

Now that we have a reasonable 700 bit/s codec – we can also consider other schemes, such as a more powerful FEC code rather than diversity. Like diversity, FEC codes provide “coding gain”, moving our operating point to the left. Really good codes operate at 10% BER, right over on the Eb/No = 2dB region of the curve. No free lunch of course – such codes may require long latency (seconds) or be expensive to decode.

Next Steps

I’d like to “instrument” FreeDV 700C and work with the 700C early adopters to find out how well it’s working, why and how it falls over, and work through any obvious bugs. Then start experimenting with ways to make it operate at lower SNRs, such as more powerful FEC codes or even non-redundant techniques like Trellis decoding.

Now we have shown Codec 2 700C has sufficient quality for conversations over the air, I’m planning another iteration of the Codec 2 700C vocoder design to see if we can improve speech quality.


Modems for HF Digital Voice Part 1.

More Eb/No to SNR worked examples.

Similar modem calculations were used to develop a 100 kbit/s telemetry system to send HD images from High Altitude Balloons.

Modems for HF Digital Voice Part 1

The newly released FreeDV 700C mode uses the Coherent PSK (COHPSK) modem which I developed in 2015. This post describes the challenges of building HF modems for DV, and how the COHPSK modem evolved from the FDMDV modem used for FreeDV 1600.

HF channels are tough. You need a lot of SNR to push bits through them. There are several problems to contend with:

When the transmit signal is reflected off the ionosphere, two or more copies arrive at the receiver antenna a few ms apart. These echoes confuse the demodulator, just like a room with bad echo can confuse a listener.

Here is a plot of a BPSK baseband signal (top). Let’s say we receive two copies of this signal, from two paths. The first is identical to what we sent (top), but the second is delayed a few samples and half the amplitude (middle). When you add them together at the receiver input (bottom), it’s a mess:

The multiple paths combining effectively form a comb filter, notching out chunks of the modem signal. Losing chunks of the modem spectrum is bad. Here is the magnitude and phase frequency response of a channel with the two paths used for the time domain example above:
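The comb shape falls straight out of the two path model: a direct path plus a delayed copy at half the amplitude. A minimal sketch, assuming an 8000 Hz sample rate and an 8 sample (1ms) delay, both picked for illustration rather than taken from the plots:

```python
import cmath
import math


def two_path_response(f_hz, delay_samples=8, gain=0.5, fs=8000):
    """Frequency response of a direct path plus one delayed, attenuated echo:
    H(f) = 1 + gain * exp(-j*2*pi*f*delay/fs)."""
    return 1 + gain * cmath.exp(-2j * math.pi * f_hz * delay_samples / fs)


for f in range(0, 2001, 250):
    h = two_path_response(f)
    mag_db = 20 * math.log10(abs(h))
    phase_deg = math.degrees(cmath.phase(h))
    print(f"{f:5d} Hz  {mag_db:6.1f} dB  {phase_deg:7.1f} deg")
```

With these values the notches fall every 1000 Hz starting at 500 Hz, where the echo arrives in anti-phase and the magnitude drops to half. A stronger echo would make the notches deeper.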

Note that comb filtering also means the phase of the channel is all over the place. As we are using Phase Shift Keying (PSK) to carry our precious bits, strange phase shifts are more bad news.

All of these impairments are time varying, so the echoes/notches and phase shifts drift as the ionosphere wiggles about. As well as the multipath, the modem must deal with noise and operate at SNRs of around 0dB, and cope with frequency offsets between the transmitter and receiver of say +/- 100 Hz.

If commodity sound cards are used for the ADC and DAC, the modem must also handle large sample clock offsets of +/-1000 ppm. For example the transmitter DAC sample clock might be 7996 Hz and the receiver ADC 8004 Hz, instead of the nominal 8000 Hz.
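A quick sketch of the arithmetic behind that example, in Python. The helper name is made up for illustration.

```python
NOMINAL_HZ = 8000.0


def ppm_offset(actual_hz, reference_hz=NOMINAL_HZ):
    """Clock error of actual_hz relative to reference_hz, in parts per million."""
    return (actual_hz - reference_hz) / reference_hz * 1e6


tx_ppm = ppm_offset(7996.0)                 # transmitter DAC vs nominal
rx_ppm = ppm_offset(8004.0)                 # receiver ADC vs nominal
relative_ppm = ppm_offset(8004.0, 7996.0)   # receiver as seen from the transmitter
slip_per_second = 8004.0 - 7996.0           # samples gained by the receiver each second

print(tx_ppm, rx_ppm, relative_ppm, slip_per_second)
```

Each clock is 500 ppm from nominal, but the two errors are in opposite directions, so the modem sees about 1000 ppm between transmitter and receiver, or 8 samples of timing drift every second.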

As the application is Push to Talk (PTT) Digital Voice, the modem must sync up quickly, in the order of 100ms, even with all the challenges above thrown at it. Processing delay should be around 100ms too. We can’t wait seconds for it to train like a data modem, or put up with several seconds of delay in the receive speech due to processing.

Using standard SSB radio sets we are limited to around 2000 Hz of RF bandwidth. This bandwidth puts a limit on the bit rate we can get through the channel. The amplitude and phase distortion caused by typical SSB radio crystal filters is another challenge.

Designing a modem for HF Digital Voice is not easy!


In 2012, the FDMDV modem was developed as our first attempt at a modem for HF digital voice. This is more or less a direct copy of the FDMDV waveform which was developed by Francesco Lanza, HB9TLK and Peter Martinez G3PLX. The modem software was written in GNU Octave and C, carefully tested and tuned, and most importantly – is open source software.

This modem uses many parallel carriers or tones. We are using Differential QPSK, so every symbol contains 2 bits encoded as one of 4 phases.

Let’s say we want to send 1600 bits/s over the channel. We could do this with a single QPSK carrier at Rs = 800 symbols a second. Eight hundred symbols/s times two bits/symbol for QPSK is 1600 bit/s. The symbol period Ts = 1/Rs = 1/800 = 1.25ms. Alternatively, we could use 16 carriers running at 50 symbols/s (symbol period Ts = 20ms). If the multipath channel has echoes 1ms apart it will make a big mess of the single carrier system but the parallel tone system will do much better, as 1ms of delay spread won’t upset a 20ms symbol much:
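The same sums as a small Python sketch, which also shows how much of each symbol a 1 ms echo smears across. The function name is just for illustration.

```python
def symbol_timing(bit_rate, n_carriers, bits_per_symbol=2):
    """Return (symbol rate per carrier in symbols/s, symbol period in ms)."""
    rs = bit_rate / (n_carriers * bits_per_symbol)
    return rs, 1000.0 / rs


DELAY_SPREAD_MS = 1.0

single_rs, single_ts = symbol_timing(1600, n_carriers=1)
multi_rs, multi_ts = symbol_timing(1600, n_carriers=16)

# Fraction of each symbol period overlapped by the delayed echo.
single_overlap = DELAY_SPREAD_MS / single_ts
multi_overlap = DELAY_SPREAD_MS / multi_ts

print(single_rs, single_ts, single_overlap)
print(multi_rs, multi_ts, multi_overlap)
```

The echo covers 80% of a single carrier symbol but only 5% of a symbol on the 16 carrier system.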

We handle the time-varying phase of the channel using Differential PSK (DPSK). We actually send and receive phase differences. Now the phase of the channel changes over time, but can be considered roughly constant over the duration of a few symbols. So when we take a difference between two successive symbols the unknown phase of the channel is removed.

Here is an example of DPSK for the BPSK case. The first figure shows the BPSK signal (top), and the corresponding DBPSK signal (bottom). When the BPSK signal changes, we get a +1 DBPSK value, when it is the same, we get a -1 DBPSK value.

The next figure shows the received DBPSK signal (top). The phase shift of the channel is a constant 180 degrees, so the signal has been inverted. In the bottom subplot the recovered BPSK signal after differential decoding is shown. Despite the 180 degree phase shift of the channel it’s the same as the original Tx BPSK signal in the first plot above.

This is a trivial example. In practice the phase shift of the channel will vary slowly over time, and won’t be a nice neat number like 180 degrees.

DPSK is a neat trick, but has an impact on the modem Bit Error Rate (BER) – if you get one symbol wrong, the next one tends to be corrupted as well. It’s a two for one deal on bit errors, which means crappier performance for a given SNR than regular (coherent) PSK.
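Both effects can be seen in a few lines of Python. This is a minimal sketch using one common differential encoding convention (each transmitted symbol is the previous symbol multiplied by the data bit). The sign convention in the plots above may be the other way around, and the bit pattern is arbitrary.

```python
def dbpsk_encode(bits, start=1):
    """bits are +1/-1. Each Tx symbol is the previous Tx symbol times the bit."""
    tx = [start]
    for b in bits:
        tx.append(tx[-1] * b)
    return tx


def dbpsk_decode(rx):
    """Recover each bit from the product of two successive Rx symbols."""
    return [rx[n] * rx[n - 1] for n in range(1, len(rx))]


bits = [1, -1, -1, 1, -1, 1, 1, -1]
tx = dbpsk_encode(bits)

# A channel with a constant 180 degree phase shift inverts every symbol.
rx_inverted = [-s for s in tx]
decoded_inverted = dbpsk_decode(rx_inverted)

# Now corrupt a single received symbol and count the decoded bit errors.
rx_one_error = list(tx)
rx_one_error[3] = -rx_one_error[3]
decoded_one_error = dbpsk_decode(rx_one_error)
n_errors = sum(1 for a, b in zip(bits, decoded_one_error) if a != b)

print(decoded_inverted == bits, n_errors)
```

The inversion drops out because it appears in both symbols of every product. The single bad symbol also appears in two products, which is where the two for one deal on bit errors comes from.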

To combat frequency selective fading we use a little Forward Error Correction (FEC) on the FreeDV 1600 waveform. So if one carrier gets notched out, we can use bits in the other carriers to recover the missing bits. Unfortunately we don’t have the bandwidth available to protect all bits, and the PTT delay requirement means we have to use a short FEC code. Short FEC codes don’t work as well as long ones.


Over the next few years I spent some time thinking about different modem designs and trying a bunch of different ideas, most of which failed. Research and disappointment. You just have to learn from your mistakes, talk to smart people, and keep trying. Then, towards the end of 2014, a few ideas started to come together, and the COHPSK modem was running in real time in mid 2015.

The major innovations of the COHPSK modem are:

  1. The use of diversity to help combat frequency selective fading. The baseline modem has 7 carriers. A copy of these are made, and sent at a higher frequency to make 14 tones in total. Turns out the HF channel giveth and taketh away. When one tone is notched out another is enhanced (an anti-fade). So we send each carrier twice and add them back together at the demodulator, averaging out the effect of frequency selective fades:
  2. To use diversity we need enough bandwidth to fit a copy of the baseline modem carriers. This implies the need for a vocoder bit rate of much less than 1600 bit/s – hence several iterations of a 700 bit/s speech codec – a completely different skill set – and another 18 months of my life to develop Codec 2 700C.
  3. Coherent QPSK detection is used instead of differential detection, which halves the number of bit errors compared to differential detection. This requires us to estimate the phase of the channel on the fly. Two known symbols are sent followed by 4 data symbols. These known, or Pilot symbols, allow us to measure and correct for the current phase of each carrier. As the pilot symbols are sent regularly, we can quickly acquire – then track – the phase of the channel as it evolves.
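As a rough illustration of the first and third points, here is a Python sketch of pilot based phase correction followed by diversity combining on a single frame of two pilots and four data symbols. The constellation mapping, pilot values and channel gains are all assumptions made up for the example, there is no noise, and the real modem tracks the phase across many frames rather than one.

```python
import cmath

# Gray-mapped QPSK constellation (this mapping is an assumption for the sketch).
QPSK = {(0, 0): 1 + 0j, (0, 1): 1j, (1, 1): -1 + 0j, (1, 0): -1j}
PILOTS = [1 + 0j, -1 + 0j]   # two known symbols (values are hypothetical)


def modulate(bits):
    return [QPSK[(bits[i], bits[i + 1])] for i in range(0, len(bits), 2)]


def demodulate(symbols):
    bits = []
    for s in symbols:
        # Pick the constellation point closest to the received symbol.
        pair = min(QPSK, key=lambda k: abs(s - QPSK[k]))
        bits.extend(pair)
    return bits


def estimate_channel(rx_pilots):
    """Average of rx*conj(known pilot) gives the complex channel gain."""
    return sum(r * p.conjugate() for r, p in zip(rx_pilots, PILOTS)) / len(PILOTS)


tx_bits = [0, 0, 0, 1, 1, 1, 1, 0]          # 4 QPSK data symbols
frame = PILOTS + modulate(tx_bits)

# The same frame is sent on two carriers. One is deep in a fade, the other
# enhanced, and each has its own unknown phase shift.
gains = [0.1 * cmath.exp(2.5j), 1.6 * cmath.exp(-1.0j)]

combined = [0j] * 4
for g in gains:
    rx = [g * s for s in frame]
    h = estimate_channel(rx[:2])
    derotate = cmath.exp(-1j * cmath.phase(h))   # undo the channel phase
    for i, r in enumerate(rx[2:]):
        combined[i] += r * derotate              # add the two copies together

rx_bits = demodulate(combined)
print(rx_bits == tx_bits)
```

The faded carrier contributes very little to the sum, but the copy on the other carrier carries the bits through.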

Here is a figure that shows how the pilot and data symbols are distributed across one frame of the COHPSK modem. More information on the frame design is available in the cohpsk frame design spreadsheet, including performance calculations which I’ll explain in the next blog post in this series.

Coming Next

In the next post I’ll show how reading a few graphs and adding a few dBs together can help us estimate the performance of the FDMDV and COHPSK modems on HF channels.


Modems for HF Digital Voice Part 2

cohpsk_plots.m Octave script used to generate plots for this post.

FDMDV Modem Page

Some earlier musings on FreeDV 1600 and why SSB works so well:

FreeDV Robustness Part 1

FreeDV Robustness Part 2

FreeDV Robustness Part 3

CMA Equalisation of FSK

We’ve just released a new experimental mode for Digital Voice called FreeDV 800XA. This uses the Codec 2 700C mode, 100 bit/s for synchronisation, and a 4FSK modem, actually the same modem that has been so successful for images from High Altitude Balloons.

FSK has the advantage of being a constant amplitude waveform, so efficient class C amplifiers can be used. However as it currently stands, 800XA has no real protection for the multipath common on HF channels, for example symbols that have an echo delayed by a few ms.

So I decided to start looking at equalisers. Some Googling suggested the Constant Modulus Algorithm (CMA) Equaliser might be a suitable choice for FSK, and turned up some sample code on DSP stack exchange.

I had a bit of trouble getting the algorithm to work for bandpass FSK signals, so posted this question on CMA equalisation for FSK. I received some kind help, and eventually made the equaliser work on a simulated HF channel. Here is the Octave simulation cma.m

How it works

The equaliser attempts to correct for the channel using the received signal, which is corrupted by noise.

There is a “gotcha” in using a FIR filter to equalise a channel response. Consider a channel H(z) with a simple 3 sample impulse response h(n). Now we could equalise this with the exact inverse 1/H(z). Here is a plot of our example channel frequency response and the ideal equaliser which is exactly the inverse:

Now here is a plot of the impulse responses of the channel h(n), and equaliser h'(n):

The ideal equaliser response h'(n) is much longer than the 3 samples of the channel impulse response h(n). The CMA algorithm requires our equaliser to be a FIR filter. Counter-intuitively, we need to use an FIR equaliser with a number of taps significantly larger than the expected channel impulse response we are trying to equalise.

One explanation for this – the channel response can be considered to be a Finite Impulse Response (FIR) filter H(z). The exact inverse 1/H(z), when expressed in the time domain, is an Infinite Impulse Response (IIR) filter, which has, you know, an infinitely long impulse response!
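To see this numerically, the Python sketch below computes the first 20 samples of the impulse response of 1/H(z) for a 3 sample channel. The channel taps are made up for illustration (chosen so the inverse is stable) and are not the ones used in the plots.

```python
CHANNEL = [1.0, 0.5, 0.2]   # h(n): hypothetical 3 sample channel impulse response
N_INVERSE = 20              # how many samples of the inverse to compute


def inverse_impulse_response(h, n_samples):
    """Impulse response of 1/H(z), from the recursion sum_k h[k]*g[n-k] = delta[n]."""
    g = []
    for n in range(n_samples):
        acc = 1.0 if n == 0 else 0.0
        for k in range(1, len(h)):
            if n - k >= 0:
                acc -= h[k] * g[n - k]
        g.append(acc / h[0])
    return g


def convolve(a, b):
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


g = inverse_impulse_response(CHANNEL, N_INVERSE)
significant_taps = sum(1 for x in g if abs(x) > 1e-3)
combined = convolve(CHANNEL, g)[:N_INVERSE]   # channel followed by equaliser

print(significant_taps)
print([round(x, 6) for x in combined])
```

For this channel the inverse still has samples above 0.001 well past the third tap, which is why the FIR equaliser needs many more taps than the channel it is correcting.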


The figures below show the CMA equaliser doing its thing in a multipath channel with AWGN noise. In Figure 1 the error is reduced over time, and the lower plot shows the combined channel-equaliser impulse response. If the equaliser was perfect the combined channel-equaliser response would be 1.
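For readers who want to poke at the algorithm, here is a minimal Python sketch of a CMA equaliser. It is not a port of cma.m. It assumes a noise free complex baseband signal with unit magnitude QPSK symbols rather than bandpass FSK, a made up two path channel, and a step size and tap count picked by hand.

```python
import cmath
import math
import random

random.seed(1)

N_TAPS = 11             # equaliser length, much longer than the channel
MU = 0.01               # adaptation step size (assumed)
N_SYMS = 5000
CHANNEL = [1.0, 0.3]    # direct path plus one echo (assumed)

# Constant modulus source: unit magnitude QPSK symbols.
tx = [cmath.exp(1j * math.pi / 2 * random.randrange(4)) for _ in range(N_SYMS)]

# Pass the symbols through the multipath channel.
rx = [sum(CHANNEL[k] * tx[n - k] for k in range(len(CHANNEL)) if n - k >= 0)
      for n in range(N_SYMS)]

# Centre spike initialisation of the equaliser taps.
w = [0j] * N_TAPS
w[N_TAPS // 2] = 1 + 0j

cost = []
for n in range(N_TAPS - 1, N_SYMS):
    x = [rx[n - k] for k in range(N_TAPS)]
    y = sum(wk * xk for wk, xk in zip(w, x))
    err = abs(y) ** 2 - 1.0            # deviation from the constant modulus
    cost.append(err * err)
    # Stochastic gradient step on the cost (|y|^2 - 1)^2.
    w = [wk - MU * err * y * xk.conjugate() for wk, xk in zip(w, x)]

unequalised_cost = sum((abs(r) ** 2 - 1.0) ** 2 for r in rx) / len(rx)
final_cost = sum(cost[-500:]) / 500

print(unequalised_cost, final_cost)
```

The equaliser never sees the transmitted symbols. It only knows that the output should have constant magnitude, and adjusts the taps to make that so.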

Figure 2 below shows the CMA going to work on a FSK signal. The top subplot is the transmitted FSK signal, you can see the two different frequencies in the waveform. The middle plot shows the received signal, after it has been messed up by the multipath channel. It’s clear that the tone amplitudes are different. Looking carefully at the point where the tones transition (e.g. around sample 25 and 65) there is intersymbol interference due to multipath echoes, messing up the start of each FSK symbol.

However in the bottom subplot the equaliser has worked its magic and the waveform is looking quite nice. The tone levels are nearly equal and much of the ISI removed. Yayyyyyy.

Figure 3 shows the magnitude frequency response at several stages in the simulation. The top subplot is the channel response. It’s a comb filter, typical of multipath channels. The middle subplot is the equaliser response. Ideally, this should be the exact inverse of the channel. It’s pretty close at the low end but seems to lose its way at very low and high frequencies. The lower plot is the combined response, which is close to 0dB at the low frequencies. Cool.

Figure 4 is the transmit spectrum of the modem signal (top), and the spectrum after the channel has mangled it (lower). Note one tone is now lower than the other. Also note that the modem signal only has energy in the low-mid range of the spectrum. This might explain why the equaliser does a good job in that region of the spectrum – it’s where we have energy to drive the adaption.

Problems for HF Digital Voice

Unfortunately the CMA equaliser only works well at high SNRs, and takes seconds to converge. I am interested in low SNR (around 0dB in a 3000 Hz noise bandwidth) and it’s Push To Talk (PTT) radio so we need a fast initial training, around 100ms. Then it must follow the time varying HF channel, continually retraining on the fly.

For further work I really should measure BER versus Eb/No for a variety of SNRs and convergence times, and measure what BER improvement we are buying with equalisation. BER is King, much easier than squinting at time domain waveforms.

If the CMA cost function was used with known information (like pilot symbols or the Unique Word we have in 800XA) it might be able to work faster. This would involve deconvolution on the fly, rather than using iterative or adaptive techniques.


Jonathan Olds (Jonti) has been Experimenting with digital SCA signals. This includes an OQPSK modem with a C implementation of a CMA equaliser.

Codec 2 700C

My endeavor to produce a digital voice mode that competes with SSB continues. For a big chunk of 2016 I took a break from this work as I was gainfully employed on a commercial HF modem project. However since December I have once again been working on a 700 bit/s codec. The goal is voice quality roughly the same as the current 1300 bit/s mode. This can then be mated with the coherent PSK modem, and possibly the 4FSK modem for trials over HF channels.

I have diverged somewhat from the prototype I discussed in the last post in this saga. Lots of twists and turns in R&D, and sometimes you just have to forge ahead in one direction leaving other branches unexplored.


Sample 1300 700C
hts1a Listen Listen
hts2a Listen Listen
forig Listen Listen
ve9qrp_10s Listen Listen
mmt1 Listen Listen
vk5qi Listen Listen
vk5qi 1% BER Listen Listen
cq_ref Listen Listen

Note the 700C samples are a little lower level, an artifact of the post filtering as discussed below. What I listen for is intelligibility, how easy is the sample to understand compared to the reference 1300 bit/s samples? Is it muffled? I feel that 700C is roughly the same as 1300. Some samples a little better (cq_ref), some (ve9qrp_10s, mmt1) a little worse. The artifacts and frequency response are different. But close enough for now, and worth testing over air. And hey – it’s half the bit rate!

I threw in a vk5qi sample with 1% random errors, and it’s still usable. No squealing or ear damage, but perhaps more sensitive than 1300 to the same BER. Guess that’s expected, every bit means more at a lower bit rate.

Some of the samples like vk5qi and cq_ref are strongly low pass filtered, others like ve9qrp are “flat” spectrally, with the high frequencies at about the same level as the low frequencies. The spectral flatness doesn’t affect intelligibility much but can upset speech codecs. Might be worth trying some high pass (vk5qi, cq_ref) or low pass (ve9qrp_10s) filtering before encoding.


Below is a block diagram of the signal processing. The resampling step is the key, it converts the time varying number of harmonic amplitudes to a fixed number (K=20) of samples. They are sampled using the “mel” scale, which means we take more finely spaced samples at low frequencies, with coarser steps at high frequencies. This matches the log frequency response of the ear. I arrived at K=20 by experiment.

The amplitudes and even the Vector Quantiser (VQ) entries are in dB, which is very nice to work in and matches the ear’s logarithmic amplitude response. The VQ was trained on just 120 seconds of data from a training database that doesn’t include any of the samples above. More work required on the VQ design and training, but I’m encouraged that it works so well already.
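The resampling idea can be sketched in Python as below. This is an illustration, not the codec's code. It assumes the common 2595*log10(1 + f/700) mel formula, a 200 to 3700 Hz sampling range, plain linear interpolation, and a toy spectrum with a constant slope in dB.

```python
import math

K = 20   # fixed number of samples per frame


def hz_to_mel(f_hz):
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_sample_freqs(f_lo=200.0, f_hi=3700.0, k=K):
    """K frequencies evenly spaced on the mel scale (the range is an assumption)."""
    m_lo, m_hi = hz_to_mel(f_lo), hz_to_mel(f_hi)
    step = (m_hi - m_lo) / (k - 1)
    return [mel_to_hz(m_lo + i * step) for i in range(k)]


def resample(harm_freqs, harm_amps_db, sample_freqs):
    """Linearly interpolate harmonic amplitudes (dB) at the sample frequencies."""
    out = []
    for f in sample_freqs:
        if f <= harm_freqs[0]:
            out.append(harm_amps_db[0])
        elif f >= harm_freqs[-1]:
            out.append(harm_amps_db[-1])
        else:
            i = next(j for j in range(1, len(harm_freqs)) if f <= harm_freqs[j])
            f0, f1 = harm_freqs[i - 1], harm_freqs[i]
            a0, a1 = harm_amps_db[i - 1], harm_amps_db[i]
            out.append(a0 + (a1 - a0) * (f - f0) / (f1 - f0))
    return out


def toy_frame(pitch_hz):
    """Harmonics of pitch_hz below 4 kHz with a sloping toy spectrum in dB."""
    n_harmonics = int(3999 // pitch_hz)
    freqs = [pitch_hz * (m + 1) for m in range(n_harmonics)]
    amps_db = [60.0 - 0.01 * f for f in freqs]
    return freqs, amps_db


sample_freqs = mel_sample_freqs()
low_pitch = resample(*toy_frame(100.0), sample_freqs)    # 39 harmonics in
high_pitch = resample(*toy_frame(250.0), sample_freqs)   # 15 harmonics in
print(len(low_pitch), len(high_pitch))
```

Whatever the pitch, and so however many harmonics go in, K=20 values come out, which is what the vector quantiser needs.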

Here is a 3D plot of amplitude in dB against time (300 frames) and the K=20 frequency vectors for hts1a. You can see the signal evolving over time, and the low levels at the high frequency end.

The post filter is another key step. It raises the spectral peaks (formants) and lowers the valleys (anti-formants), greatly improving the speech quality. When the peak/valley ratio is low, the speech takes on a muffled quality. This is an important area for further investigation. Gain normalisation after post filtering is why the 700C samples are lower in level than the 1300 samples. Need some more work here.

The two stage VQ uses 18 bits, energy 4 bits, and pitch 6 bits for a total of 28 bits every 40ms frame. Unvoiced frames are signalled by a zero value in the pitch quantiser removing the need for a voicing bit. It doesn’t use differential in time encoding to make it more robust to bit errors.
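The bit budget can be checked with a small Python sketch that packs and unpacks one frame. The field order and the helper names are hypothetical; only the field widths, the 40ms frame and the zero pitch index for unvoiced frames come from the description above.

```python
# (name, bits) for each field. The widths are from the text, the order is assumed.
FIELDS = [("vq", 18), ("energy", 4), ("pitch", 6)]
FRAME_MS = 40


def pack(values):
    """Pack the field values into a single integer, first field most significant."""
    word = 0
    for name, nbits in FIELDS:
        v = values[name]
        if not 0 <= v < (1 << nbits):
            raise ValueError(f"{name} does not fit in {nbits} bits")
        word = (word << nbits) | v
    return word


def unpack(word):
    values = {}
    for name, nbits in reversed(FIELDS):
        values[name] = word & ((1 << nbits) - 1)
        word >>= nbits
    return values


def is_voiced(values):
    """A zero pitch index signals an unvoiced frame, so no voicing bit is needed."""
    return values["pitch"] != 0


bits_per_frame = sum(nbits for _, nbits in FIELDS)
bit_rate = bits_per_frame * 1000 / FRAME_MS

frame = {"vq": 0x2ABCD, "energy": 9, "pitch": 0}
decoded = unpack(pack(frame))
print(bits_per_frame, bit_rate, decoded == frame, is_voiced(decoded))
```

Twenty eight bits every 40ms is where the 700 bit/s figure comes from.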

Days and days of very careful coding and checks at each development step. It’s so easy to make a mistake or declare victory early. I continually compared the output speech to a few Codec 2 1300 samples to make sure I was in the ball park. This reduced the subjective testing to a manageable load. I used automated testing to compare the reference Octave code to the C code, porting and testing one signal processing module at a time. Sometimes I would just printf rows of vectors from two versions and compare the two, old school but quite effective at spotting the step where the bug crept in.

Command line

The Octave simulation code can be driven by the scripts newamp1_batch.m and newamp1_fby.m, in combination with c2sim.

To try the C version of the new mode:

codec2-dev/build_linux/src$ ./c2enc 700C ../../raw/hts1a.raw - | ./c2dec 700C - -| play -t raw -r 8000 -s -2 -

Next Steps

Some thoughts on FEC. A (23,12) Golay code could protect the most significant bits of the 1st VQ index, pitch, and energy. The VQ could be organised to tolerate errors in a few of its bits by sorting to make an error jump to a ‘close’ entry. The extra 11 parity bits would cost 1.5dB in SNR, but might let us operate at significantly lower SNR on a HF channel.

Over the next few weeks we’ll hook up 700C to the FreeDV API, and get it running over the air. Release early and often – let’s find out if 700C works in the real world and provides a gain in performance on HF channels over FreeDV 1600. If it looks promising I’d like to do another lap around the 700C algorithm, investigating some of the issues mentioned above.
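One way to arrive at a figure close to 1.5dB is sketched in Python below. It assumes the transmit power and frame period stay the same, so the energy available per frame is shared across more bits. This is my reading of the estimate, not a calculation taken from the codec or modem source.

```python
import math

PAYLOAD_BITS = 28          # codec bits per 40 ms frame
GOLAY_N, GOLAY_K = 23, 12  # (23,12) Golay code
FRAME_MS = 40

parity_bits = GOLAY_N - GOLAY_K
total_bits = PAYLOAD_BITS + parity_bits

# Same power spread over more bits per frame means less energy per bit.
cost_db = 10 * math.log10(total_bits / PAYLOAD_BITS)
channel_bit_rate = total_bits * 1000 / FRAME_MS

print(parity_bits, total_bits, round(cost_db, 2), channel_bit_rate)
```

So the channel bit rate rises from 700 to 975 bit/s, and the energy per bit drops by a little over 1.4dB.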

SM2000 – Part 8 – Gippstech 2016 Presentation

Justin, VK7TW, has published a video of my SM2000 presentation at Gippstech, which was held in July 2016.

Brady O’Brien, KC9TPA, visited me in June. Together we brought the SM2000 up to the point where it is decoding FreeDV 2400A waveforms at 10.7MHz IF, which we demonstrate in this video. I’m currently busy with another project but will get back to the SM2000 (and other FreeDV projects) later this year.

Thanks Justin and Brady!

FreeDV and this video were also mentioned on this interesting Reddit post/debate from Gary KN4AQ on VHF/UHF Digital Voice – a peek into the future

Codec 2 Masking Model Part 5

In the last post in this series I was getting close to a fully quantised 700 bit/s codec. However as I pushed through I discovered a bug in the post-filter. I was accidentally cheating and using some of the encoder information in the decoder. When I corrected the bug the quality dropped significantly. I’ve hit these sorts of bugs before – the simulation code is complex and it’s easy to “declare victory” prematurely.

So I have abandoned the AbyS approach for now. Oh well, that’s “research and disappointment” for you. Plenty of new ideas though….

For the last few months I have been working on another solution that vector quantises a “fixed rate” version of the spectrum. The masking functions are still used to smooth the spectrum before sampling at the fixed rate. Much like we low pass filter time domain samples before sampling, the masking functions reduce the “bandwidth” and hence sample “rate” we need to represent the spectrum. Here is a block diagram of the current “700C” candidate codec:

The bit allocation is pitch (Wo) 6 bits, 1 bit for voicing, 16 bits for the amplitude VQ, 4 bits for energy and 1 bit spare. All updated every 40ms. The new work is in the “Decimate in Frequency” block, expanded here:

As the pitch of the speech varies, the number of harmonics used to represent the speech, L, varies. The goal is to take a vector of L amplitude samples, vector quantise, and send them over a channel. To vector quantise them we need fixed length vectors. So a Discrete Fourier Transform (DFT) is used to resample the L amplitude samples to fixed vectors of length 2k=20 (I have chosen k=10).

BTW a DFT is the generic form of a Fast Fourier Transform (FFT). A FFT is a computationally efficient (fast) way of computing a DFT.

The steps are similar to sampling a time domain signal. The bandwidth of the signal is limited by using the masking function to smooth the variations in the amplitude envelope. The use of masking functions means the smoothing matches the response of the ear, and no perceptually important information is lost.

I’ve recently been playing with OFDM modems, so I used a “cyclic suffix” to further smooth the DFT coefficients. DFTs like cyclic signals. If you have a DFT of an 8kHz signal, the sample at 3900Hz is “close” to the sample at 0 Hz. If there is a step jump in amplitude – you get a lot of high frequency information in the DFT coefficients which is harder to quantise. So I throw away the last 500Hz of the speech signal (3500-4000 Hz), and replace it with a curve that ensures a smooth match between 3500 Hz and 0 Hz.

Yeah, I don’t know how I dream this stuff up either …… do I use the Force? Too much red wine or espresso? Experience? A life misspent on computers? Subconscious innovation? Plagiarism?
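The effect of the cyclic suffix can be demonstrated with a toy example in Python. The assumptions here are mine: a simple ramp stands in for the amplitude envelope, it is sampled every 50 Hz, and the suffix is a straight line back to the 0 Hz value rather than whatever curve the codec actually uses.

```python
import cmath
import math

N = 80             # envelope samples, 50 Hz apart, covering 0 to 3950 Hz (assumed)
K_KEEP = 10        # number of low order DFT coefficients we keep
SUFFIX_START = 70  # index of the 3500 Hz sample


def dft(x):
    n = len(x)
    return [sum(x[i] * cmath.exp(-2j * math.pi * m * i / n) for i in range(n))
            for m in range(n)]


def low_energy_fraction(x, k=K_KEEP):
    """Fraction of non-DC energy in the k lowest coefficients (and their mirrors)."""
    energy = [abs(c) ** 2 for c in dft(x)]
    n = len(x)
    total = sum(energy[1:])
    low = sum(energy[m] for m in range(1, n) if m <= k or m >= n - k)
    return low / total


# Toy envelope: a ramp, so there is a big step between the last sample and 0 Hz.
envelope = [i / N for i in range(N)]

# Cyclic suffix: replace everything above 3500 Hz with a line back to the 0 Hz value.
with_suffix = envelope[:SUFFIX_START + 1] + [
    envelope[SUFFIX_START] * (N - i) / (N - SUFFIX_START)
    for i in range(SUFFIX_START + 1, N)
]

frac_plain = low_energy_fraction(envelope)
frac_suffix = low_energy_fraction(with_suffix)
print(round(frac_plain, 4), round(frac_suffix, 4))
```

With the step removed, nearly all of the energy ends up in the first k=10 coefficients, so little is lost by discarding the rest.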

In the past I’ve tried to resample and VQ the spectrum of sinusoidal codecs a few times, without much success. Jean Marc also suggested something similar a few posts back. Anyhoo, getting somewhere this time around.

Here are some plots that show the algorithm in action for a frame of female speech:

Here are the amplitude samples (red crosses). The blue line has the cyclic suffix, note how it meets the first amplitude sample near 0Hz.

This figure shows the difference in the DFT coefficients with (blue) and without (green) the cyclic suffix:

Here is the cumulative energy of DFT coefficients, note that with the cyclic suffix (blue) low frequency energy dominates:

This figure shows a typical 2k=20 length vector that we vector quantise. Note it has zero mean – we extract the DC coefficient and separately quantise this as the frame energy.


Sample 1300 700C Candidate
hts1a Listen Listen
hts2a Listen Listen
forig Listen Listen
ve9qrp_10s Listen Listen
mmt1 Listen Listen
vk5qi Listen Listen
cq_ref Listen Listen

Through a couple of years of on-air operation we have established that the 1300 bit/s codec (as used in FreeDV 1600 with 300 bit/s of FEC) has acceptable speech quality for HF. So the goal of this work is similar quality at 700 bit/s.

For some samples above (e.g. hts1a and mmt1), 1300 is superior to the current 700C candidate. For others (e.g. hts2a and vk5qi) 700C sounds a little better. So I think I’m in the ball park.

There’s a bit of clipping at the start of cq_ref, and some level variations between the two modes on some samples. The 700C candidate has a few problems with unvoiced sounds, e.g. the intake of breath on ve9qrp_10s, and the “ch” sound at the start of chicken in hts2a. Not sure why.

The cq_ref_1300 sample is a bit poor as the LPC technique used for spectral amplitudes falls over when the spectral dynamic range is high. In this sample the LF has much higher energy than the HF, i.e. a strong “Low Pass Filter” effect or spectral slope.

Next step is some refactoring – the Octave code is an untidy mess of 6 months of dead ends and false starts. A mirror of real world R&D I guess. Creating something new is not a tidy process. At least in my head. So many aspects of this algorithm that I could explore but I’d rather get this on the air and see if we really have something here. Would love to have some help with a port from Octave to C. Contact me if you’d like to work in this area.