ARM NEON Optimisation

I’ve been trying to optimise NEON DSP code on a Raspberry Pi. Using the intrinsics I managed to get a speed increase of about 3 times over vanilla C with just a few hours work. However the results are still significantly slower than the theoretical speed of the machine, which is 4 multiply-acculumates (8 float operations) per cycle. On a 1.2 GHz core that’s 9.6 GFLOPs.

Since then I’ve been looking at ARM manuals, Googling, and trying various ad-hoc ideas. There is a lack of working, fully optimised code examples, and I can’t find any data on cycle times and latency information for the Cortex A53 device used for the Rpi. The number of ARM devices and families is bewildering, and trying to find information in a series of thousand-page manuals daunting.

Fortunately the same NEON assembler seems to work (i.e. it assembles cleanly and you get the right results) on many ARM machines. It’s just unclear how fast it runs and why.

To get a handle on the problem I wrote a series of simple floating point dot product programs, and attempted to optimise them. Each program runs through a total of 1E9 dot product points, using an inner and outer loop. I made the inner loop pretty small (1000 floats) to try to avoid cache miss issues. Here are the results, using cycle counts measured with “perf”:

Program Test Theory cycles/loop Measured cycles/loop GFLOPS
dot1 Dot product no memory reads 1 4 1.2*8/4 = 1.2
dot2 Dot product no memory reads unrolled 1 1 1.2*8/1 = 9.6
dot3 Dot product with memory reads 3 9.6 1.2*8/9.6 = 1
dot4 Dot product with memory reads assembler 3 6.1 1.2*8/6.1 = 1.6
dotne10 Dot product with memory reads Ne10 3 11 1.2*8/11 = 0.87

Cycles/loop is how many cycles are executed for one iteration of the inner loop. The last column assumes a 1.2 GHz clock, and 8 floating point ops for every NEON vector multiply-accumulate (vmul.f32) instruction (a multiply, an add, 4 floats per vector processed in parallel).

The only real success I had was dot2, but that’s an unrealistic example as it doesn’t read memory in the inner loop. I guessed that the latencies in the NEON pipeline meant an unrolled loop would work better.

Assuming (as I can’t find any data on instruction timing) two cycles for the memory reads, and one for the multiply-accumulate, I was hoping at 3 cycles for dot3 and dot4. Maybe even better if there is some dual issue magic going on. Best I can do is 6 cycles.

I’d rather have enough information to “engineer” the system than have to rely on guesses. I’ve worked on many similar DSP optimisation projects in the past which have had data sheets and worked examples as a starting point.

Here is the neon-dot source code on GitLab. If you can make the code run faster – please send me a patch! The output looks something like:

$ make test
sum: 4e+09 FLOPS: 8e+09
sum: 4e+09 FLOPS: 8e+09
sum: 4.03116e+09 target cycles: 1e+09 FLOPS: 8e+09
sum: 4.03116e+09 target cycles: 1e+09 FLOPS: 8e+09
FLOPS: 4e+09
grep cycles dot_log.txt
     4,002,420,630      cycles:u    
     1,000,606,020      cycles:u    
     9,150,727,368      cycles:u
     6,361,410,330      cycles:u
    11,047,080,010      cycles:u

The dotne10 program requires the Ne10 library. There’s a bit of floating point round off in some of the program outputs (adding 1.0 to a big number), that’s not really a bug.

Some resources I did find useful:

  1. tterribe NEON tutorial. I’m not sure if the A53 has the same cycle timings as the Cortex-A discussed in this document.
  2. ARM docs, I looked at D0487 ARMv8 Arch Ref Manual, DDI500 Cortex A53 TRM, DDI502 Cortex A53 FPU TRM, which both reference the DEN013 ARM Cortex-A Series Programmer’s Guide. Couldn’t find any instruction cycle timing in any of them, but section 20.2 of DEN013 had some general tips.
  3. Linux perf was useful for cycle counts, and in record/report mode may help visualise pipeline stalls (but I’m unclear if that’s what I’m seeing due to my limited understanding).

Codec 2 and TWELP

DSP Innovations have recently published comparisons of Codec 2 with their TWELP codec at 2400 and 600 bit/s.

Along with some spirited rhetoric, they have published some TWELP 600 samples (including source). The comparison, especially in the 600 bit/s range, is very useful to my work.

I’ve extracted a random subset of the 600 bit/s a_eng.wav samples, broken up into a small chunks to make them easier to compare. Have a listen, and see what you think:

Sample Source MELP 600e Codec 2 700c TWELP 600
1 Listen Listen Listen Listen
2 Listen Listen Listen Listen
3 Listen Listen Listen Listen
4 Listen Listen Listen Listen
5 Listen Listen Listen Listen
6 Listen Listen Listen Listen

The samples do have quite a bit of background noise. The usual approach for noisy samples is to use a noise suppression algorithm first, e.g. we use the Speex noise suppression in FreeDV. However it’s also a test of the codecs robustness to background noise, so I didn’t perform any noise suppression for the Codec 2 samples.


I am broadly in agreement with their results. Using the samples provided, the TWELP codec appears to be comparable to MELP 2400, with Codec 2 2400 a little behind both. This is consistent with other Codec 2 versus MELP/AMBE comparisons at 2400 bits/s. That’s not a rate I have been focussing on, most of my work has been directed at lower rates required for HF Digital voice.

I think – for these samples – their 600 bit/s codec also does better than Codec 2 700C, but not by a huge margin. Their results support our previous findings that Codec 2 is as good as (or even a little better) than MELP 600e. It does depend on the samples used, as I will explain below.

DSP Innovations have done some fine work in handling non-speech signals, a common weakness with speech codecs in this range.

Technology Claims

As to claims of superior technology, and “30 year old technology”:

  1. MELP 2400 was developed in the 1990’s, and DSP Innovations own results show similar speech quality, especially at 2400 bits/s.
  2. AMBE is in widespread use, and uses a very similar harmonic sinusoidal model to Codec 2.
  3. The fundamental work on speech compression was done in the 1970s and 80’s, and much of what we use today (e.g. in your mobile phone) is based on incremental advances on that.
  4. As any reader of this blog will know, Codec 2 has been under continual development for the past decade. I haven’t finished, still plenty of “DSP Innovation” to come!

While a fine piece of engineering, TWELP isn’t in a class of it’s own – it’s still a communications quality speech codec in the MELP/AMBE/Codec 2 quality range. They have not released any details of their algorithms, so they cannot be evaluated objectively by peer review.

PESQ and Perceptual evaluation of speech quality

DSP Innovations makes extensive use of the PESQ measure, for both this study and for comparisons to other competitors.

Speech quality is notoriously hard to estimate. The best way is through controlled subjective testing but this is expensive and time consuming. A utility to accurately estimate fine differences in speech quality would be a wonderful research tool. However in my experience (and the speech coding R&D community in general), such a tool does not exist.

The problem is even worse for speech codecs beneath 4 kbit/s, as they distort the signal so significantly.

The P.862 standard acknowledges these limits, and explicitly states in Table 3 “Factors, technologies and applications for which PESQ has not currently been validated … CELP and hybrid codecs < 4 kbit/s". The standard they are quoting does not support use of PESQ for their tests.

PESQ is designed for phone networks, and much higher bit rate codecs. In section 2 of the standard they present best-case correlation results of +/- 0.5 MOS points (note on a scale of 1-5, this is +/- 10% error). That’s when it is used for speech codecs > 4 kbit/s that it is designed for.

So DSP Innovations statements like “Superiority of the TWELP 2400 and MELPe 2400 over CODEC2 2400 is on average 0.443 and 0.324 PESQ appropriately” are unlikely to be statistically valid.

The PESQ algorithm (Figure 4a of the standard) throws away all phase information, keeping just the FFT power spectrum. This means it cannot evaluate aspects of the speech signal that are very important for speech quality. For example PESQ could not tell the difference between voiced speech (like a vowel) an unvoiced (like a consonant) with the same power spectrum.

DSP Innovations haven’t shown any error bars or standard deviations on their results. Even the best subjective tests will have error bars wider than the PESQ results DSP Innovations are claiming as significant.

I do sympathise with them. This isn’t a huge market, they are a small company, and subjective testing is expensive. Numbers look good on a commercial web site from a marketing sense. However I suggest you disregard the PESQ numbers.

Speech Samples

Speech codecs tend to work well with some samples and fall over with others. It is natural to present the best examples of your product. DSP Innovations chose what speech material they would present in their evaluation of Codec 2. I have asked them to give me the same courtesy and code speech samples of my choice using TWELP. I have received no response to my request.

Support and Porting

An open source codec can be ported to another machine in seconds (rather than months that DSP Innovations quote) with a cross compiler. At no cost.

Having the source code makes minor problems easy to fix yourself. We have a community that can answer many questions. For tougher issues; well I’m available for paid support – just like DSP Innovations.

Also …. well open source is just plain cool. As a reminder, here are the reasons I started Codec 2, nearly 10 years ago.

To quote myself:

A free codec helps a large amount of people and promotes development and innovation. A closed codec helps a small number people make money at the expense of stifled business and technical development for the majority.

Reading Further

Open Source Low Rate Speech Codec Part 1, the post that started Codec 2.
P.862 PESQ standard.
CODEC2 vs TWELP on 2400 bps. DSP Innovations evaluate Codec 2, MELP, and TWELP at 2400 bits/s.
CODEC2 vs TWELP on 700 bps. DSP Innovations evaluate Codec 2, MELP, and TWELP at 600 (ish) bits/s.
AMBE+2 and MELPe 600 Compared to Codec 2. An earlier comparison, using samples from DSP Innovations.

How Inlets Generate Thrust on Supersonic aircraft

Some time ago I read Skunk Works, a very good “engineering” read.

In the section on the SR-71, the author Ben Rich made a statement that has puzzled me ever since, something like: “Most of the engines thrust is developed by the intake”. I didn’t get it – surely an intake is a source of drag rather than thrust? I have since read the same statement about the Concorde and it’s inlets.

Lately I’ve been watching a lot of AgentJayZ Gas Turbine videos. This guy services gas turbines for a living and is kind enough to present a lot of intricate detail and answer questions from people. I find his presentation style and personality really engaging, and get a buzz out of his enthusiasm, love for his work, and willingness to share all sorts of geeky, intricate details.

So inspired by AgentJayZ I did some furious Googling and finally worked out why supersonic planes develop thrust from their inlets. I don’t feel it’s well explained elsewhere so here is my attempt:

  1. Gas turbine jet engines only work if the air is moving into the compressor at subsonic speeds. So the job of the inlet is to slow the air down from say Mach 2 to Mach 0.5.
  2. When you slow down a stream of air, the pressure increases. Like when you feel the wind pushing on your face on a bike. Imagine (don’t try) the pressure on your arm hanging out of a car window at 100 km/hr. Now imagine the pressure at 3000 km/hr. Lots. Around a 40 times increase for the inlets used in supersonic aircraft.
  3. So now we have this big box (the inlet chamber) full of high pressure air. Like a balloon this pressure is pushing equally on all sides of the box. Net thrust is zero.
  4. If we untie the balloon neck, the air can escape, and the balloon shoots off in the opposite direction.
  5. Back to the inlet on the supersonic aircraft. It has a big vacuum cleaner at the back – the compressor inlet of the gas turbine. It is sucking air out of the inlet as fast as it can. So – the air can get out, just like the balloon, and the inlet and the aircraft attached to it is thrust in the opposite direction. That’s how an inlet generates thrust.
  6. While there is also thrust from the gas turbine and it’s afterburner, turns out that pressure release in the inlet contributes the majority of the thrust. I don’t know why it’s the majority. Guess I need to do some more reading and get my gas equations on.

Another important point – the aircraft really does experience that extra thrust from the inlet – e.g. it’s transmitted to the aircraft by the engine mounts on the inlet, and the mounts must be designed with those loads in mind. This helps me understand the definition of “thrust from the inlet”.

Cafe Dark Ages

Today, like most mornings, I biked to a cafe to hack on my laptop while slurping on iced coffee. Exercise, fresh air, sugar, caffeine and R&D. On this lovely sunny Autumn day I’m tapping away on my lappy, teasing bugs out of my latest digital radio system.

Creating new knowledge is a slow, tedious, business.

I test each small change by running an experiment several hundred thousand times, using simulation software on my laptop. R&D – Science by another name – is hard. One in ten of my ideas actually work, despite being at the peak of my career, having a PhD in the field, and help from many very intelligent peers.

A different process is going on at the table next to me. An “Integrative Health Consultant” is going about her business, speaking to a young client.

In an earnest yet authoritative Doctor-voice the “consultant” revs up with ill-informed dietary advice, moves on to over-priced under-performing products that the consultant just happens to sell, and ends up with a thinly disguised invitation to join her Multi-Level-Marketing (MLM) organisation. With a few side journeys through anti-vaxer land, conspiracy theories, organic food, anti-carbo and anti-gluten, sprinkled with disparaging remarks on Science, evidence based medicine and an inspired stab at dissing oncology (“I know this guy who had chemo and still died!”). All heavily backed by n=1 anecdotes.

A hobby of mine is critical thinking, so I am aware that most of their conversation is bullshit. I know how new knowledge is found (see above) and it’s not from Facebook.

But this post is not about the arguments of alt-med versus evidence based medicine. Been there, done that.

Here is what bothers me. These were both good people, who more or less believe in what they say. They are not stupid, they are intelligent and want to help people get and stay healthy. I have friends and family that I love who believe this crap. But they are hurting society and making people sicker.

Steering people away from modern, evidence based medicine kills people. Someone who is persuaded to see a naturopath rather than an oncologist will find out too late the price of well-meaning ignorance. Anti-vaxers hurt, maim, and kill for their beliefs. I shudder to think of the wasted lives and billions of dollars that could be spent on far better outcomes than lining the pockets of snake oil salespeople.

There is some encouraging news. The Australian Government has started removing social security benefits from people who don’t vaccinate. The Nursing and Midwifery Board is also threatening to take action against Nurses who push an anti-vaccination stance.

But this is beating people with a stick, where is the carrot?

Doctors in the Dark Ages were good people. They really believed leaches, blood letting and prayer where helping the patients they loved. But those beliefs sustained untold human misery. The difference with today?

Science, Education, and Policy.

AMBE+2 and MELPe 600 Compared to Codec 2

Yesterday I was chatting on the #freedv IRC channel, and a good question was asked: how close is Codec 2 to AMBE+2 ? Turns out – reasonably close. I also discovered, much to my surprise, that Codec 2 700C is better than MELPe 600!


Original AMBE+2 3000 AMBE+ 2400 Codec 2 3200 Codec 2 2400
Listen Listen Listen Listen Listen
Listen Listen Listen Listen Listen
Listen Listen Listen Listen Listen
Listen Listen Listen Listen Listen
Listen Listen Listen Listen Listen
Listen Listen Listen Listen Listen
Listen Listen Listen Listen Listen
Listen Listen Listen Listen Listen
Listen Listen Listen Listen Listen
Original MELPe 600 Codec 2 700C
Listen Listen Listen
Listen Listen Listen
Listen Listen Listen
Listen Listen Listen
Listen Listen Listen
Listen Listen Listen
Listen Listen Listen
Listen Listen Listen
Listen Listen Listen

Here are all the samples in one big tar ball.


I don’t have a AMBE or MELPe codec handy so I used the samples from the DVSI and DSP Innovations web sites. I passed the original “DAMA” speech samples found on these sites through Codec 2 (codec2-dev SVN revision 3053) at various bit rates. Turns out the DAMA samples were the same for the AMBE and MELPe samples which was handy.

These particular samples are “kind” to codecs – I consistently get good results with them when I test with Codec 2. I’m guessing they also allow other codecs to be favorably demonstrated. During Codec 2 development I make a point of using “pathological” samples such as hts1a, cg_ref, kristoff, mmt1 that tend to break Codec 2. Some samples of AMBE and MELP using my samples on the Codec 2 page.

I usually listen to samples through a laptop speaker, as I figure it’s close to the “use case” of a PTT radio. Small speakers do mask codec artifacts, making them sound better. I also tried a powered loud speaker with the samples above. Through the loudspeaker I can hear AMBE reproducing the pitch fundamental – a bass note that can be heard on some males (e.g. 7), whereas Codec 2 is filtering that out.

I feel AMBE is a little better, Codec 2 is a bit clicky or impulsive (e.g. on sample 1). However it’s not far behind. In a digital radio application, with a small speaker and some acoustic noise about – I feel the casual listener wouldn’t discern much difference. Try replaying these samples through your smart-phone’s browser at an airport and let me know if you can tell them apart!

On the other hand, I think Codec 2 700C sounds better than MELPe 600. Codec 2 700C is more natural. To my ear MELPe has very coarse quantisation of the pitch, hence the “Mr Roboto” sing-song pitch jumps. The 700C level is a bit low, an artifact/bug to do with the post filter. Must fix that some time. As a bonus Codec 2 700C also has lower algorithmic delay, around 40ms compared to MELPe 600’s 90ms.

Curiously, Codec 2 uses just 1 voicing bit which means either voiced or unvoiced excitation in each frame. xMBE’s claim to fame (and indeed MELP) over simpler vocoders is the use of mixed excitation. Some of the spectrum is voiced (regular pitch harmonics), some unvoiced (noise like). This suggests the benefits of mixed excitation need to be re-examined.

I haven’t finished developing Codec 2. In particular Codec 2 700C is very much a “first pass”. We’ve had a big breakthrough this year with 700C and development will continue, with benefits trickling up to other modes.

However the 1300, 2400, 3200 modes have been stable for years and will continue to be supported.

Next Steps

Here is the blog post that kicked off Codec 2 – way back in 2009. Here is a video of my 2012 Codec 2 talk that explains the motivations, IP issues around codecs, and a little about how Codec 2 works (slides here).

What I spoke about then is still true. Codec patents and license fees are a useless tax on business and stifle innovation. Proprietary codecs borrow as much as 95% of their algorithms from the public domain – which are then sold back to you. I have shown that open source codecs can meet and even exceed the performance of closed source codecs.

Wikipedia suggests that AMBE license fees range from USD$100k to USD$1M. For “one license fee” we can improve Codec 2 so it matches AMBE+2 in quality at 2400 and 3000 bit/s. The results will be released under the LGPL for anyone to use, modify, improve, and inspect at zero cost. Forever.

Maybe we should crowd source such a project?

Command Lines

This is how I generated the Codec 2 wave files:

~/codec2-dev/build_linux//src/c2enc 3200 9.wav - | ~/codec2-dev/build_linux/src/c2dec 3200 - - | sox -t raw -r 8000 -s -2 - 9_codec2_3200.wav


DVSI AMBE sample page

DSP Innovations, MELPe samples. Can anyone provide me with TWELP samples from these guys? I couldn’t find any on the web that includes the input, uncoded source samples.

Physics of Road Rage

A few days ago while riding my bike I was involved in a spirited exchange of opinions with a gentleman in a motor vehicle. After said exchange he attempted to run me off the road, and got out of his car, presumably with intent to assault me. Despite the surge of adrenaline I declined to engage in fisticuffs, dodged around him, and rode off into the sunset. I may have been laughing and communicating further with sign language. It’s hard to recall.

I thought I’d apply some year 11 physics to see what all the fuss was about. I was in the middle of the road, preparing to turn right at a T-junction (this is Australia remember). While his motivations were unclear, his vehicle didn’t look like an ambulance. I am assuming he as not an organ-courier, and that there probably wasn’t a live heart beating in an icebox on the front seat as he raced to the transplant recipient. Rather, I am guessing he objected to me being in that position, as that impeded his ability to travel at full speed.

The street in question is 140m long. Our paths crossed half way along at the 70m point, with him traveling at the legal limit of 14 m/s, and me a sedate 5 m/s.

Lets say he intended to brake sharply 10m before the T junction, so he could maintain 14 m/s for at most 60m. His optimal journey duration was therefore 4 seconds. My monopolization of the taxpayer funded side-street meant he was forced to endure a 12 second journey. The 8 second difference must have seemed like eternity, no wonder he was angry, prepared to risk physical injury and an assault charge!

Balloon Meets Gum Tree

Today I attended the launch of Horus 38, a high altitude ballon flight carrying 4 payloads, one of which was the latest version of the SSDV system Mark and I have been working on.

Since the last launch, Mark and I have put a lot of work into carefully integrating a rate 0.8 LDPC code developed by Bill, VK5DSP. The coded 115 kbit/s system is now working error free on the bench down to -112dBm, and can transfer a new hi-res image in just a few seconds. With a tx power of 50mW, we estimate a line of site range of 100km. We are now out-performing commercial FSK telemetry chip sets using our open source system.

However disaster struck soon after launch at Mt Barker High School oval. High winds blew the payloads into a tree and three of them were chopped off, leaving the balloon and a lone payload to continue into the stratosphere. One of the payloads that hit the tree was our SSDV, tumbling into a neighboring back yard. Oh well, we’ll have another try in December.

Now I’ve been playing a lot of Kerbal Space Program lately. It’s got me thinking about vectors, for example in Kerbal I learned how to land two space craft at exactly the same point on the Mun (Moon) using vectors and some high school equations of motion. I’ve also taken up sailing – more vectors involved in how sails propel a ship.

The high altitude balloon consists of a latex, helium filled weather balloon a few meters in diameters. Strung out beneath that on 50m of fishing line are a series of “payloads”, our electronic gizmos in little foam boxes. The physical distance helps avoid interference between the radios in each box.

While the balloon was held near the ground, it was keeled over at an angle:

It’s tethered, and not moving, but is acted on by the force of the lift from the helium and drag from the wind. These forces pivot the balloon around an arc with a radius of the tether. If these forces were equal the balloon would be at 45 degrees. Today it was lower, perhaps 30 degrees.

When the balloon is released, it is accelerated by the wind until it reaches a horizontal velocity that matches the wind speed. The payloads will also reach wind speed and eventually hang vertically under the balloon due to the force of gravity. Likewise the lift accelerates the balloon upwards. This is balanced by drag to reach a vertical velocity (the ascent rate). The horizontal and vertical velocity components will vary over time, but lets assume they are roughly constant over the duration of our launch.

Now today the wind speed was 40 km/hr, just over 10 m/s. Mark suggested a typical balloon ascent rate of 5 m/s. The high school oval was 100m wide, so the balloon would take 100/10 = 10s to traverse the oval from one side to the gum tree. In 10 seconds the balloon would rise 5×10 = 50m, approximately the length of the payload string. Our gum tree, however, rises to a height of 30m, and reached out to snag the lower 3 payloads…..

Organic Potato Chips Scam

I don’t keep much junk food in my pantry, as I don’t like my kids eating too much high calorie food. Also if I know it’s there I will invariably eat it and get fat. Fortunately, I’m generally too lazy to go shopping when an urge to eat junk food hits. So if it’s not here at home I won’t do anything about it.

Instead, every Tuesday at my house is “Junk Food Night”. My kids get to have anything they want, and I will go out and buy it. My 17 year old will choose something like a family size meat-lovers pizza with BBQ sauce. My 10 year old usually wants a “slushie”, frozen coke sugar laden thing, so last Tuesday off we went to the local all-night petrol (gas) station.

It was there I spied some “Organic” potato chips. My skeptical “spidey senses” started to tingle…….

Lets break it down from the information on the pack:

OK so they are made from organic grains. This means they are chemically and nutritionally equivalent to scientifically farmed grains but we need to cut down twice as much rain forest to grow them and they cost more. There is no scientifically proven health advantage to organic food. Just a profit advantage if you happen to sell it.

There is nothing wrong with Gluten. Nothing at all. It makes our bread have a nice texture. Humans have been consuming it from the dawn of agriculture. Like most marketing, the Gluten fad is just a way to make us feel bad and choose more expensive options.

And soy is suddenly evil? Please. Likewise dairy it’s a choice, not a question of nutrition. I’ve never met a cow I didn’t like. Especially served medium rare.

Whole grain is good, if the micro-nutrients survive deep frying in boiling oil.

There is nothing wrong with GMO. Another scam where scientifically proven benefits are being held back by fear, uncertainty, and doubt. We have been modifying the genetic material in everything we eat for centuries through selection.

Kosher is a religious choice and has nothing to do with nutrition.

Speaking of nutrition, lets compare the nutritional content per 100g to a Big Mac:

Item Big Mac Organic Chips
Energy 1030 kJ 1996 kJ
Protein 12.5 g 12.5 g
Carbohydrates 17.6 g 66 g
Fat 13.5 g 22.4 g
Sodium 427 mg 343 mg

This is very high energy food. It is exactly this sort of food that is responsible for first world health problems like cardio-vascular disease and diabetes. The link between high calorie snack food and harm is proven – unlike the perceived benefits of organic food. The organic label on these chips is dangerous, irresponsible marketing hype to make us pay more and encourage consumption of food that will hurt us.


Give Us Our Daily Bread – A visit to a modern wheat farm.

Energy Equivalents of a Krispy Kreme Factory – How many homes can you run on a donut?

Binary Telemetry Protocol

Last week I tagged along on a Project Horus balloon launch with Mark, VK5QI. The purpose of this launch was to test a new balloon release and telemetry system that uses the closed source Lora chipset. We had an enjoyable day driving about the Adelaide Hills tracking the balloon, then DF-ing the payload on the ground.

The balloon also flew the RTTY based telemetry system. To receive the RTTY telemetry I used the fsk_horus.m FSK modem developed in October. This modem has near ideal performance in converting radio signals to binary digits. However its performance is limited by the RTTY protocol.

On the way home Mark suggested we fly another balloon in a few days, and we decided to try a new, binary protocol with the ideal modem. A furious two days of coding and integration ensued, but we managed to develop a Horus Layer 2 protocol in C, get it running on the payload, and integrate with the HabHub tracking system.

On Saturday 2 Jan 2016 we launched and it worked really well! Here is a plot of the balloons path:

Even with a very weak signal (we could just hear it on the SSB radios), the binary protocol was pulling packets with valid checksums out of the noise. Here is the telemetry from this sample of the received signal we recorded from the Mt Barker home of VK5FJ:
HORUS,1204,02:57:53,-34.)0819,539.59149 95 6 72,9,-3;,1416 CRC BAD
HORUS,12 5,02858:05,,34.90794,139.59418,9601,71,1,-13,1613 CRC BAD
HORUS,1210,02:59:05,-34 CRC BAD
HORUS,1211,02:59:17,-34.90725,139.61046,9697,74,9,-12,1408 CRC OK
HORUS,1212,02:59:29,-34.90722,139.61319,9714,76,9,-12,1418 fixed
HORUS,1213,02:59:41,-34.90720,139.61592,9729,74,9,-12,1418 fixed

1,1202,02:57:29,-34.908298,139.984512,9556,67,203,-13,156,adbf CRC BAD
1,1204,02:57:53,-34.908089,139.591492,9586,72,9,-13,156,adb3 CRC OK
1,1205,02:58:05,-34.907940,139.594177,9601,71,9,-13,154,999c CRC OK
1,1207,02:58:29,-34.907639,139.599594,9633,74,9,-13,155,9b28 CRC OK
1,1210,02:59:05,-34.907299,139.607758,9683,73,9,-12,154,27eb CRC OK
1,1211,02:59:17,-34.907249,139.610458,9697,74,9,-12,154,8fb9 CRC OK
1,1212,02:59:29,-34.907219,139.613190,9714,76,9,-12,155,a2e8 CRC OK
1,1213,02:59:41,-34.907200,139.615921,9729,74,9,-12,156,b378 CRC OK

The payload is transmitting RTTY and binary packets. The lines starting with “HORUS” come from the RTTY protocol. The lower lines starting with “1” from the new binary protocol. The binary protocol was delivering packets at Eb/No as low as 6dB (SNR in 3000Hz of -9dB) at 100 bit/s.

Here are some packets from the very end of the flight, from a sample provided by VK5EI in Adelaide:
HORUS,2513,07:19:41,-35.12791,140.72295,7992,50,9,-14,1393 CRC OK
HORUS,2514,07:19:53,-35.12800,140.72472,7838,49,9,-14,1386 CRC OK
HORUS,2515,07:20:05,-35.12794,140.72639,7680,43,9,-13,1395 CRC OK
HOR-(SMJJIRKH IKANHS H )12780,140VHIHCN@HHH0,38,9,-13,1400 CRC BAD
HORUS,2517,07 MOEMBA LJ@N HIIS K !72926,738C I SD PLM#! (1 CRC BAD

1,2513,07:19:41,-35.127911,140.722946,7992,50,9,-14,151,f565 CRC OK
1,2514,07:19:53,-35.127998,140.724716,7838,49,9,-14,151,c1ac CRC OK
1,2515,07:20:05,-35.127941,140.726395,7698,43,9,-13,150,0634 CRC BAD
1,2517,07:20:29,-0.000000,26334306.000000,3671,108,1,84,128,a66f CRC BAD

The payload was 300km to the East, and disappearing behind the Mt Lofty ranges as it descended beneath 7700m. Once again RTTY at the top, binary at the bottom. In this case RTTY managed to decode the last packet. The following plot helps explain why:

This is a plot of the output energy from the two FSK filters inside the demodulator. The gap between them is a measure of signal quality or SNR. The x-axis is the time in “bits”, there are 100 bit/s. At the start of this sample, the signal is very clean. Then at about bit 25000 it disappears abruptly into the noise, and by bit 26000 it is gone. One thousand bits is about the time it takes to send one RTTY and one binary packet. Once the signal is gone completely, neither protocol can do much with it.

Overall, a very satisfying result, especially on top of the “ideal” FSK modem development from October. I feel like we are pushing the art of open source telemetry forward. While useful and fun for balloon work, this work has far wider applications, such as IoT.

Protocol Design

Mark provided a packed binary structure for the payload data. I put some thought into the protocol design, carefully considering the use case. This is a lesson I learned from FreeDV – where “voice is not like data”.

I realised that with balloon telemetry data, losing a few packets is OK. When floating along at high altitude the last packet is often very similar to the next one. However it is really important to get some packets through. We don’t want the link to fall over entirely, but can tolerate a high packet error rate.

However when the payload is descending rapidly and close to the ground, reliably receiving packets every few seconds is important. It gives you a good chance of finding the payload on the ground.

The new binary protocol consists of a 16-bit unique word for finding the start of the packet, 176 bits (22 bytes) of binary payload data, which are protected by a (23,11) Golay block code to give a total packet size of 360 bits.

Given we have a few 100 bits of payload data I estimated a Bit Error Rate (BER) of 1E-3 would give us a fair chance of getting a packet through. Add a rate 1/2 code and we can handle a few % BER. Here is the unit test output:
$ gcc horus_l2.c -o horus_l2 -Wall -DHORUS_L2_UNITTEST $ ./horus_l2 test 0: BER: 0.00 ...........: 0
test 1: BER: 0.01 ...........: 0
test 2: BER: 0.05 ...........: 0
test 3: BER: 0.10 ...........: 10

OK, so it’s correcting (0 bit errors after decode) at random BER up to 5%. The Golay (23,11) code can correct 3 errors in a 23 bit codeword which (IIRC) means it falls over at about BER=0.08. This channel is arguably perfect AWGN – a balloon 30km in the air with a line of sight path to our receivers. So random (rather than burst) errors is a reasonable channel model.

Looking at the Eb/No versus BER curves for ideal 2FSK we get a BER of 0.05 at an Eb/No of 6.5dB. So thats where we would expect our binary protocol to fall over. Which is exactly what happened in our tests.

A BER of 1E-3 after FEC decoding is rather high. Its a region where FEC codes don’t work too well. I compared a rate 1/2 convolutional code at the same operating point. It had the same coding gain as the Golay block code (which is just 1.5dB). So might as well use the much simpler block code.

In fact, I am wondering if FEC helps us at all. We may be better off just sending the binary data at half the bit rate, and getting a 3dB increase in our energy/bit. Or send it twice, then combining the received symbols (diversity). More research required.

As I discovered with FreeDV, FEC is not a panacea. Simply slapping FEC onto your system without considering the requirements is naive.

The RTTY protocol has long packets of about 600 hundred bits and almost no protection from bit errors. So we could argue it requires a BER of 1E-3, which is an Eb/No of 10.5dB. This means our binary protocol has a “gain” of around 4dB. I haven’t confirmed this, but suspect most of this gain is from simply having a shorter packet.

However the real world improvement with the binary protocol was significant. There were many times during the cruise phase of the flight where it reliable returned packets when the RTTY protocol experienced problems. So perhaps there are other sources of bit errors that mean a little FEC helps a lot.

The Horus Binary protocol is implemented in a single C file horus_l2.c. Using #defines, the encoder/tx side can be compiled down to a very small module that will run on a tiny 8-bit uC. It can also be compiled to run unit tests, or as the decoder/rx side for the ground station.

Towards Open Source Telemetry

I’m interested in developing an open source telemetry system in 2016, and think we can outperform closed source systems such as Lora because … open source. In October we developed an “ideal” FSK Modem, now we have experience and good results with a protocol. Here is a work flow diagram for the project:

The fsk_horus.m modem needs to be ported to C, converted to fixed point, and then run on a modest uC which will give us a complete, open source telemetry system.

One important step is some simple, low cost, radio hardware. Not a chipset, but our very own open radio hardware. I have prototyped some of the radio already – and received 440MHz signals using a Si5351, NE602, and a few transistors (block diagram above)

Open Source – When Experts Collide

As our balloon was wafting about South Australia I was admiring the HabHub software. Some web developers really know their stuff and now I enjoy the benefits of that. I have no idea how to make nice web sites.

It dawned on me that what Mark and I are doing is applying our expertise to the physical layer of the system – modems and radio hardware. The web developers, smart as they are, would be amazed by our skills in that area. However we can link our code to theirs in a few minutes – no NDAs, no permission required. No road blocks to our innovation.

I keep seeing (and then demonstrating) large gains in modems – HF digital voice, VHF digital voice, and now telemetry. As I explore assumptions (“you can’t violate OSI model layers”, “you must have FEC”, “Chipset XXX is the best”, “you can’t build your own codec/modem/radio hardware”, “DSP must run on custom hardware”) I find many of them misleading or plain wrong.

In real terms – performance – the incumbent closed source systems have been crippled by the fact they are closed source. Then they tell us “you can’t play there”. Wrong.

Even the RF hardware is now “opening up” – I managed to get a prototype telemetry Rx working in a few hours on my bench. It’s not scary when you know how. Just open up the black box and peer inside. Refuse to accept the black box is all there is. Don’t stop until you hit the laws of physics.

Just like nuclear fusion – push together a few domain experts and a great deal of energy is released.

Further Work

Mark pointed out we need the new modem and protocol into a form usable by end users. We are currently piping a bunch of scripts together written in GNU Octave, C, and python. Fantastic for rapid prototyping but the end users need a cross platform GUI application like fldigi or FreeDV.

The current system sends RTTY/Binary packets one after the other. The RTTY packets take 70% of the time. With just the binary protocol we could get 3 times the packet rate which would improve the likelihood of getting valid packets.

An interleaver may also help, for times when there are burst errors. I have tested an initial version but it doesn’t separate the bits enough. More work needed.

It was unclear if some long strings of 1’s and 0’s were upsetting the fsk_horus.m frequency and timing offset estimators. More work needed to determine if this is a real problem. The interleaver would help, and we could always use a scrambler if it turns out to be a real problem.

Halving the bit rate to 50 baud would give us 3dB, and still an acceptable a packet update rate. Using 4FSK rather than 2FSK gives us another 3dB, so in total that’s an easy 6dB gain. 4FSK is possible to generate using more or less the current payload hardware (you might need two GPIOs bits driving a VCO). In a line of sight channel 6dB is double the range. In terms of transmit power that’s like having 4 times the transmit power. That may be “enough”; at 30km altitude the curvature of the Earth may obscure the signal first before you run out of link budget!