WaveNet and Codec 2

Yesterday my friend and fellow open source speech coder Jean-Marc Valin (of Speex and Opus fame) emailed me with some exciting news. W. Bastiaan Kleijn and friends have published a paper called “Wavenet based low rate speech coding”. Basically they take the bit stream of Codec 2 running at 2400 bit/s, and replace the Codec 2 decoder with the WaveNet deep learning generative model.

What is amazing is the quality – it sounds as good as an 8000 bit/s wideband speech codec! They have generated wideband audio from the narrowband Codec 2 model parameters. Here are the samples – compare “Parametrics WaveNet” to Codec 2!

This is a game changer for low bit rate speech coding.

I’m also happy that Codec 2 has been useful for academic research (Yay open source), and that the MOS scores in the paper show it’s close to MELP at 2400 bit/s. Last year we discovered Codec 2 is better than MELP at 600 bit/s. Not bad for an open source codec written (more or less) by one person.

Now I need to do some reading on Deep Learning!

Reading Further

Wavenet based low rate speech coding
Wavenet Speech Samples
AMBE+2 and MELPe 600 Compared to Codec 2

Hamburgers versus Oncology

On a similar, but slightly lighter note, this blog was pointed out to me. The subject is high (saturated) fat versus carbohydrate based diets, which is an ongoing area of research, and may (may) be useful in managing diabetes. This gentleman is a citizen scientist (and engineer no less) like myself. Cool. I like the way he uses numbers, and in particular the way he presents data graphically.

However I tuned out when I saw claims of “using ketosis to fight cancer”, backed only by an anecdote. If you are interested, this claim is thoroughly debunked on www.sciencebasedmedicine.org.

Bullshit detection 101 – if you find a claim of curing cancer, it’s pseudo-science. If the evidence cited is one person’s story (an anecdote), it’s rubbish. You can safely move along. It shows a dangerous leaning towards dogma rather than science. Unfortunately, these magical claims can obscure useful research in the area, for example exploring a subtle, less sensational effect of a ketogenic diet on diabetes. That’s why people doing real science don’t make outrageous claims without very strong evidence – it kills their credibility.

We need short circuit methods for discovering pseudo-science. Otherwise you can waste a lot of time and energy investigating spurious claims. People can get hurt or even killed. It takes a lot less effort to make a stupid claim than to prove it’s stupid. These days I can make a call after reading about one paragraph; the tricks used to apply a scientific veneer to magical claims are pretty consistent.

A hobby of mine is critical thinking, so I enjoy exploring magical claims from that perspective. I am scientifically trained and do R&D myself, in the field I earned my PhD in. Even with that background, I know how hard it is to create new knowledge, and how easy it is to fool myself when I want to believe.

I’m not going to try bacon double cheeseburger (without the bun) therapy if I get cancer. I’ll be straight down to Oncology and take the best that modern, evidence based medicine can give, from lovely, dedicated people who have spent 20 years studying and treating it. Hit me with the radiation and chemotherapy Doc! And don’t spare the Sieverts!

wxWidgets Checkbox Tooltips

I need to post this so that no one else experiences the same pain with wxWidgets (2.9.4). Tooltips weren’t working for me when I hovered over checkboxes. This has bothered me for about 2 years and Google doesn’t seem to throw up a solution.

We are using wxWidgets for FreeDV, as we needed cross platform support. That has actually worked out pretty well, with Linux, Win32, OSX and recently FreeBSD working just fine.

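The tooltip problem seems to be related to the checkbox being inside a wxStaticBox. What works for me is keeping a pointer to the wxStaticBox and calling SetToolTip() on the box, rather than on the checkbox itself. Here is what works and what doesn’t for me: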

// Define WORKING_CHECKBOX_TOOLTIPS to build the version that works for me.
#define WORKING_CHECKBOX_TOOLTIPS
#ifdef WORKING_CHECKBOX_TOOLTIPS
// Works: keep a pointer to the wxStaticBox so the tooltip can be set on the box.
wxStaticBoxSizer* sbSizer_testFrames;
wxStaticBox *sb_testFrames = new wxStaticBox(this, wxID_ANY, _("Test Frames"));
sbSizer_testFrames = new wxStaticBoxSizer(sb_testFrames, wxVERTICAL);

m_ckboxTestFrame = new wxCheckBox(this, wxID_ANY, _("Enable"), wxDefaultPosition, wxDefaultSize, wxCHK_2STATE);
sb_testFrames->SetToolTip(_("Send frames of known bits instead of compressed voice"));
sbSizer_testFrames->Add(m_ckboxTestFrame, 0, wxALIGN_LEFT, 0);
#else
// Doesn't work: the tooltip set on the checkbox never appeared when hovering.
wxStaticBoxSizer* sbSizer_testFrames;
sbSizer_testFrames = new wxStaticBoxSizer(new wxStaticBox(this, wxID_ANY, _("Test Frames")), wxVERTICAL);

m_ckboxTestFrame = new wxCheckBox(this, wxID_ANY, _("Enable"), wxDefaultPosition, wxDefaultSize, wxCHK_2STATE);
m_ckboxTestFrame->SetToolTip(_("Send frames of known bits instead of compressed voice"));
sbSizer_testFrames->Add(m_ckboxTestFrame, 0, wxALIGN_LEFT, 0);
#endif

Fixing Broken Plastic Fridge Shelves

I have a small Samsung fridge that developed several cracks in the shelves on the door. After a while the cracks spread and one of the shelves became unusable. We couldn’t seem to buy a spare. This was really annoying. I felt like Samsung was telling me to buy a new fridge! So I tried gluing it with epoxy; that lasted a few days, then broke. I tried gluing again, this time with reinforcing plastic strips along the cracks. That also only lasted a few days.

Then, while rummaging through the garage one day, I found some fabric-backed tape that I had bought from an auto store to patch up some rips in the upholstery of my Electric Car. So I tried that on the fridge shelf. It’s not pretty, but it has worked well for over a month now. The shelf feels more robust than ever.

Fox Hunting

This weekend I have been attending the SERG Ham radio convention in Mount Gambier where I have enjoyed the company of some good friends and learnt a lot about fox hunting.

A group of us decided to go 3 weeks ago. The well-connected Mark VK5QI has been scouring Adelaide for fox hunting equipment, including beam mounts, beams, Doppler arrays, sniffers, and mapping software. We spent the last few weekends setting up Andy VK5AKH’s car as our fox hunt vehicle.

I really enjoyed the mechanical side, for example I spent a pleasant Friday afternoon machining an alloy plate with an angle grinder and drill to fit the beam mount. Here I am test fitting on my EV:

My daughter reacted in horror when she saw this – “what have you done to my EV!” (yes – it’s “her” EV now since she started driving).

Three weeks is not enough time to get everything to work properly, but after a lot of hard work by Mark and Andy we were ready to compete in the sniffer hunts and 2m/70cm/23cm fox hunts.

Sniffer Hunt

The sniffer hunt required us to find 10 hidden transmitters on foot in the area around the Valley Lake. I partnered with Mandy VK5FMOO for the hunt:

I really enjoyed the sniffer hunt, especially the outdoor, physical side of what can otherwise be a sedentary hobby. It reminded me of playing paintball or orienteering. I found myself running up and down scrub-covered hills to get cross bearings, and getting stuck behind fences after a long walk, only to find the wily fox was on the other side.

I also learnt about radio propagation. Reflections (multipath) can give false bearings that lead you off in the wrong direction. Long distance bearings are usually more reliable. Foxes located high up can be heard a long way away, those at ground level can be hard to find until you are very close. The signals come and go as you move up and down on the terrain. High points in the terrain are a good spot for bearings. Alternate the antenna position between vertical and horizontal to get the best signal and bearing indication. Note the change in signal strength to determine range. Continuously re-tune as you may stumble across one fox while looking for another.
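
Cross bearings are just geometry. Each bearing is a line from where you stood, and the fox should be where two lines cross. Here is a minimal C++ sketch with my own hypothetical names. It assumes a flat local grid in metres, which is reasonable over a few km, and compass bearings in degrees clockwise from north, with no magnetic declination correction. It refuses to give a fix when the bearings are nearly parallel or cross behind one of the stations. Either case suggests one of the bearings is bad, for example a multipath reflection.

```cpp
#include <cassert>
#include <cmath>
#include <optional>

// A position in a local flat-earth grid, in metres (x = east, y = north).
struct Point {
    double x;
    double y;
};

// 2D cross product (z component of a x b).
static double cross(Point a, Point b) { return a.x * b.y - a.y * b.x; }

// Unit direction vector for a compass bearing in degrees (0 = north, 90 = east).
static Point bearingToDirection(double bearingDeg) {
    const double kPi = 3.14159265358979323846;
    const double rad = bearingDeg * kPi / 180.0;
    return {std::sin(rad), std::cos(rad)};
}

// Hypothetical helper: intersect bearing lines taken from stations p1 and p2.
// Returns no fix if the bearings are (nearly) parallel, or if the crossing
// point lies behind either station.
std::optional<Point> crossBearingFix(Point p1, double bearing1Deg,
                                     Point p2, double bearing2Deg) {
    const Point d1 = bearingToDirection(bearing1Deg);
    const Point d2 = bearingToDirection(bearing2Deg);
    const double denom = cross(d1, d2);
    if (std::fabs(denom) < 1e-6) {
        return std::nullopt;  // parallel bearings never cross
    }
    const Point delta{p2.x - p1.x, p2.y - p1.y};
    const double t1 = cross(delta, d2) / denom;  // distance along bearing 1
    const double t2 = cross(delta, d1) / denom;  // distance along bearing 2
    if (t1 <= 0.0 || t2 <= 0.0) {
        return std::nullopt;  // crossing point is behind a station
    }
    return Point{p1.x + t1 * d1.x, p1.y + t1 * d1.y};
}
```

For example, bearings of 45° from (0, 0) and 315° from (100, 0) cross at (50, 50). Crossing angles near 90° are the most forgiving. With shallow crossing angles, a few degrees of bearing error moves the fix a long way, which is another reason to take bearings from well separated high points.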

Mandy and I found 5/10 foxes which was a good first effort.

Fox Hunt

I also participated in the car based fox hunt with Andy and Mark. Here they are messing with a Doppler Array:

The fox would torment us by transmitting for only a few seconds every minute, which made it very hard to get a reliable bearing.

Some lessons learned:

A bad bearing at the beginning of the hunt can mean driving off in the wrong direction and losing the fox entirely. This was our fate for the first 2m leg and we had to admit defeat. Taking multiple bearings from a few different high positions at the beginning of the hunt is a good idea.

As in the sniffer hunt, long distance bearings are more reliable; as you get closer, multipath confuses the situation. We spent an hour wandering around in circles in a pine plantation:

All our bearings pointed into a 100 square metre chunk of this forest. We drove around it and took multiple bearings into the forest. I thought we had that fox bracketed! So we hopped out with the sniffer but just got confusing, low level readings. We spent an hour in there. Eventually we found a strong signal in another direction and finally found the fox several km away! It was very well hidden, with the antenna taped to a small pine tree and the radio buried at the base of the tree:

The lessons I learned were – pine forest really confuses the bearings, use change in signal strength as an indication of range, and get more long distance bearings.
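
You can make the "change in signal strength as an indication of range" idea a little more quantitative. Under free-space propagation, received power falls as 1/d², so each 6 dB rise means you have roughly halved the distance. The C++ sketch below uses hypothetical helper names. It turns a dB change, measured over a known distance travelled straight towards the fox, into a range estimate. As the pine forest showed, terrain, foxes at ground level and multipath all break the free-space assumption. An uncalibrated S-meter won't read true dB either, so treat the answer as a rough guide only.

```cpp
#include <cassert>
#include <cmath>
#include <optional>

// Ratio new_distance / old_distance implied by a change in received signal
// strength, ASSUMING free-space (inverse-square) propagation, where power
// varies as 20*log10(d) dB. A positive deltaDb (signal rose) gives a ratio < 1.
double distanceRatioFromDb(double deltaDb) {
    return std::pow(10.0, -deltaDb / 20.0);
}

// Hypothetical helper: estimate remaining distance to the fox after moving
// travelledM metres straight towards it and seeing the signal rise by deltaDb.
// From d_new = r * d_old and d_old - d_new = travelledM:
//   d_new = r * travelledM / (1 - r)
// Returns no estimate if the signal did not rise (r >= 1).
std::optional<double> remainingDistanceM(double travelledM, double deltaDb) {
    const double r = distanceRatioFromDb(deltaDb);
    if (r >= 1.0) {
        return std::nullopt;
    }
    return r * travelledM / (1.0 - r);
}
```

For example, if the signal rises by about 6 dB after moving 500 m towards the fox, the model says you have halved the distance. That means the fox is roughly another 500 m away.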

linux.conf.au (LCA) 2013

Last week I attended linux.conf.au 2013, my sixth LCA. It was a very well organised and enjoyable conference for me. After a few days back I miss it. I have made some good friends at LCA over the years, and catching up with them is as important for me as the conference talks. It also leads to some fascinating “hallway track” talks and lots of bright ideas for new projects.

Codec 2 and FreeDV

This year because of my Codec 2 & FreeDV work I met many people who were involved with Amateur (Ham) Radio. Mark VK5QI, and Josh VK3XJM, set up portable antennas and worked some FreeDV from various sites around the conference. Although I am an author of FreeDV I don’t have an operational HF station to test it on, so it’s an eye opener for me to see it in action.

There is a lot in common between the Open Source and Ham Radio communities, for example experimentation, communication, sharing information, open hardware and software, and the way newcomers are welcomed and helped. There are also some contrasts – the average ages of Hams and Linux users are several decades apart, and the majority of Hams use Windows.

I can see a lot of benefit in bringing the two groups together. Linux users are fascinated with radio, and Hams can benefit from open source.

I spoke on Open Source Digital Radio (Open Office slides and OGV video and MP4 video). I had some good feedback on my explanation of Codec 2, which is based on this Octave Script which I run during the presentation. The script has buttons to allow flipping between the time domain, frequency domain, and harmonic sample views. It allows single stepping through frames to create an animated effect. Watch the video to see how it comes together.
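
The harmonic view in the script is the heart of the sinusoidal model. A frame of speech is represented as a sum of harmonics of the pitch: s(n) = Σ A_m cos(m ω0 n + φ_m) for m = 1..L. Here ω0 is the pitch in radians per sample, and L = ⌊π/ω0⌋ keeps every harmonic below half the sample rate. The C++ sketch below synthesises one frame directly from those parameters. It is a simplified illustration, not the Codec 2 decoder, which also has to quantise the parameters and smooth the transitions between frames. The function name is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Synthesise one frame from harmonic sinusoidal model parameters:
//   s(n) = sum over m = 1..L of A_m * cos(m * w0 * n + phi_m),  L = floor(pi / w0)
// w0 is the fundamental (pitch) frequency in radians per sample.
// amplitudes[m-1] and phases[m-1] describe harmonic m. Harmonics above half
// the sample rate, or beyond the end of the supplied vectors, are skipped.
std::vector<double> synthesiseFrame(double w0,
                                    const std::vector<double>& amplitudes,
                                    const std::vector<double>& phases,
                                    std::size_t frameLength) {
    const double kPi = 3.14159265358979323846;
    std::vector<double> frame(frameLength, 0.0);
    if (w0 <= 0.0) {
        return frame;  // no valid pitch: return silence
    }
    const double nyquistLimit = std::floor(kPi / w0);  // harmonics below fs/2
    std::size_t numHarmonics = std::min(amplitudes.size(), phases.size());
    if (static_cast<double>(numHarmonics) > nyquistLimit) {
        numHarmonics = static_cast<std::size_t>(nyquistLimit);
    }
    for (std::size_t n = 0; n < frameLength; ++n) {
        for (std::size_t m = 1; m <= numHarmonics; ++m) {
            const double angle = static_cast<double>(m) * w0 * static_cast<double>(n);
            frame[n] += amplitudes[m - 1] * std::cos(angle + phases[m - 1]);
        }
    }
    return frame;
}
```

With an 8 kHz sample rate, a 200 Hz pitch gives ω0 = 2π·200/8000 = π/20 radians per sample, with harmonics every 200 Hz up to 4 kHz. A frameLength of 80 samples is 10 ms at that rate.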

As I described last year there is an art to presenting a deeply technical topic (like speech coding) to an audience without specific domain knowledge. I want the talk to be interesting, comprehensible, and to send each member of the audience away with 3 pieces of new knowledge. So I vary each presentation, and take care to observe what works and what doesn’t.

I was also involved with a great presentation by Joel Stanley. Joel is running FreeDV on an Android phone, using a homebrew Double Sideband (DSB) receiver.

Several people approached me after Joel’s presentation and commented on how they enjoyed Joel’s simple explanation of how radio receivers work. I noticed that Linux users are naturally interested in radio, and how things work in general. So a good approach for an engaging talk at LCA is to explain how technology used on the periphery of Linux works. Demystify it, make it less of a black box for the smart but not domain-aware LCA audience.

I would like to say a special thank you to Mark, VK5QI, who operated a radio transmitter so Joel and I could demonstrate FreeDV at LCA. Mark has also been very helpful with FreeDV testing and the development of Joel’s FreeDV on Android project.

Given the strong interest in radio topics, a conversation with Tridge led to the idea of a radio miniconf for LCA 2014. Some possible topics:

  • Get your Foundation license at lca.conf.au
  • GNU radio tutorial
  • FreeDV
  • Build (solder) an SDR kit for the Ham, CB, or ISM bands.

Keynotes

A very good set of keynotes (available to view online). They showed some really tough problems where progress was unfortunately being blocked by human nature. For example:

  • Bdale Garbee covered (among other things) the lack of adoption of Linux on the desktop. One of the main reasons is that companies selling Windows PCs in high volume actually get income from Windows and demo-ware. Windows is a profit centre, not a cost. So selling a PC with Linux installed actually loses money!
  • Radia Perlman spoke about networking protocols and how they are standardised. A key takeaway was that standards we consider to be sacred are often handed down by committees behaving like “drunken sports fans”! A funny and engaging talk.

Here is a nice picture I took during the Speakers Dinner at the top of the Telstra Tower overlooking Canberra. LCA really does treat its speakers nicely:

HTML Bison Adventure

My 14 year old son William likes animals and his favourite is the North American Bison. He recently had to prepare a story for school. I showed him how he could use HTML to write the story, pull in images from the Internet, and put options on each page for “what do you want to do next” that linked to different narratives. It was a good way for him to learn HTML and he scored an “A”. Here is the story.

linux.conf.au (LCA) 2012

Well, it’s that time of year again – my annual geek-week at linux.conf.au. Every day there were many interesting talks to attend and many I had to miss. I am currently catching up on the ones I missed by watching the LCA 2012 videos.

Keynotes and Open Source DSP

The Bruce Perens keynote had many good points on why open source is becoming essential to our security and well-being in the 21st century. These themes were expanded by the other keynote speakers. Bruce stated that “open software is the only credible producer of software”, and that we can choose to be “slaves to tools or the people who control the tools”. Watch the talk for more information on these memes.

I am interested in “the art” of presentation (as distinct from the content) so was also interested in Bruce’s presentation style from that perspective. He appeared on stage in a suit, which is uncommon in a geeky crowd. It raised a few eyebrows. However Bruce then explained that “a suit at LCA” was a theatrical device to underscore the point that we, as an open source community, should be facing outward. Open source communities are good at talking to people within our community, but our image outside of that community is poor and misunderstood (e.g. not many people understand that their email is relayed via open source, or how open source helps with security and can help preserve democracy). Our external image must be improved.

Another great point by Bruce was how Open Source is now solving tough, previously opaque problems that were traditionally considered too hard due to patents or specialised knowledge. All you need is one guy to really get into and understand the problem. Suddenly the voodoo evaporates. People then know it’s possible – a problem that our peers have solved always appears easier. This one guy publishes source and shares what he has learnt. Others start to hack the code. Bruce, much to my embarrassment (!), actually cited my DSP work as an example, such as Oslec for echo cancellation and Codec 2. When Oslec started I had many people tell me “it can’t be done”, “you need a DSP chip to do it in hardware”, or “it’s all covered by patents”. The truth is that Open Source DSP algorithms are now outperforming closed source competitors. For example the Opus guys have developed a world-beating open source audio and voice codec. More on that below.

Actually I really enjoyed all of the keynotes. Paul Fenwick spoke on how our mind works, including topics like “decoy choices” and the “planning fallacy”, and how playing Tetris can help with traumatic experiences. I also recommend you watch the keynotes by Jacob Appelbaum and Karen Sandler.

It was great to see Jean-Marc Valin and Timothy B. Terriberry in person, presenting on the Opus Codec. I wonder if this will be the “last” audio codec. It’s open source, royalty and patent free, will be an IETF standard, codes speech and music signals from 6,000 bit/s up, and outperforms other codecs like MP3.

Codec 2 talk

As I mentioned above I am interested in how conference presentations work. Like a lot of my work, I am inclined to experiment. Try something different. Hack it. A presentation on Codec 2, like many presentations at LCA, has a strong technical component that is hard for the average LCA attendee to follow. My Codec 2 work is an extension of my PhD research in speech coding. It took me 3 years to get my head around speech coding for the PhD. So how do I communicate Codec 2 topics to a smart, but non-speech-coding aware audience?

One way to handle this is “tutorial” style. You spend about half the talk bringing people up to speed on your technical topic. That is enough to explain, in the second half, how you applied Linux or open source to this field. This is a common approach at LCA. It can work well, but it also means a lot for the audience to absorb. This can be a challenge after a day (let alone a week) of great ideas and intellectual stimulation at a conference like LCA.

Instead of the tutorial approach I hit on a different idea. Rather than confine my talk to Codec 2 and DSP theory, I tried approaching Speech Coding from a variety of tangential topics that matter to LCA attendees. I talked about codec patents, how Ham radio relates to Open Source, and finally a really easy to grasp graphical explanation of how the sinusoidal model used in Codec 2 works. I left out a bunch of DSP topics, and didn’t even put up a block diagram of the codec. I wanted the audience to walk away knowing 3 or 4 things about speech coding really well, rather than try to cover the entire, technically deep, acronym rich subject at a shallow level.

This worked well. Really well in fact – my Codec 2 talk was voted the best of the conference and I was asked to repeat it later in the week. Wow! This was especially amazing for me as the voting is done by the attendees. A nice way to start 2012 for me, after working through some tough personal issues in 2011. Here are the slides for my Codec 2 LCA 2012 talk in Open Office format.

How Good Was Your Conference Talk?

It is important to me that my talks are well received. For me it’s part of my job and I take it seriously. Here is what I look for. This applies mainly to conferences with multiple parallel tracks where people have a choice in what they attend:

  • Did I fill the room (or nearly so)?
  • Were people still asking questions at the end of your allocated time? Extra points if people come up to you afterwards and ask more questions. Even better if you get hustled off by the conference organisers because the next speaker is overdue to start.
  • Did some of the more popular speakers/major contributors to the conference attend the talk?
  • Was the applause loud and enthusiastic?

Oh, and I also like to make my talks short and leave more time than usual for questions. For example in a 50 minute slot I will time my talk to be 30 minutes rather than the nominal 40, allowing 20 minutes for questions. I feel strongly that the audience should drive a good chunk of the talk through their questions. This feels much better to me than running over time and not allowing enough time for questions.

21 Second Lightning Talk

Lightning talks are a fun part of LCA. These usually last 5 minutes. I had an idea for a lightning talk on my Electric Car that I wanted to try. I figured I could get my talk done in 10-15 seconds. Yes, I was experimenting with presentation styles again. I wanted to use lots of slides connected with just a few words (normally we are encouraged to do it the other way around). This year I managed to get a lightning talk slot and presented the talk. You can see it on the lightning talk video starting at 50:20. From when I start to when I stop talking is 21 seconds, not quite the sub-15 seconds I was aiming at. I always was a bit talkative. However the applause was pretty loud so I think the idea worked!

Codec 2 Hacking

While at LCA Jean-Marc did some great LSP vector quantiser work for Codec 2 and explained some of the techniques involved. This was very useful, and will be part of Codec 2 soon. Thanks Jean-Marc!

At the end of the conference Bruce Perens (left), Timothy and Jean-Marc (right), came to stay for a few days at Nuria’s (centre) house. It was really nice to have them all, we did some good work on Codec 2, and the dinner-time conversation (fuelled by Nuria’s fine lasagne and BBQ) was fascinating. As Bruce pointed out, there is great value in the small number of open source speech coding guys meeting face to face.

I was a bit nervous travelling in the same car with Jean-Marc and Timothy. People working on open source voice codecs are rare – so we figured we had 60% of the world’s open source codec guys in one car!

My IP04 hacked and SIP ALG

Well, this is a bit embarrassing. I make and sell embedded Asterisk boxes, and my own IP04 has been hacked! Someone made a bunch of calls to Guyana, among other places:

Call Summary

Destination                  Calls   Amount
Algeria – Mobile Orascom         1    $0.39
Algeria – Mobile Wataniya        1    $0.59
Australia – 13/1300              2    $0.52
Australia – Adelaide            14    $1.82
Australia – Mobile               2    $3.18
Cape Verde – Mobile              1    $0.34
Cayman Is – Mobile C&W           1    $0.21
Dominica                         1    $0.27
East Timor                       1    $0.50
Guyana – Mobile                 47   $16.91
Ireland – VOIP                   1    $0.11

I hadn’t taken any special security precautions with this IP04 as it’s behind a NAT broadband router. It’s used for development and testing on my LAN and mesh networks so I don’t want it too restricted. However it’s also used for VOIP calls to the outside world, so it has a SIP connection to Jazmin Communications, my ITSP.

Mike at Jazmin caught the attack early and disabled my account. I was interstate at the time and couldn’t reach the box remotely. So I asked my daughter to power down my entire home network, just in case my whole LAN was compromised. A few days later I returned home and started looking into the problem.

I poked around the GUI of my (nearly new) NetComm NB6Plus4W router. This was supplied by Internode, one of the most reputable ISPs here in Australia. One possibility was the “SIP ALG” option under the “Advanced” options:

A bit of Googling on SIP ALG didn’t suggest it was a huge security issue. However, several people suggested poor implementations of SIP ALG can break SIP. It was on by default when I set up the router, so I hadn’t touched it. I really needed to find, and if possible reproduce, the attack before re-enabling my account, to minimise the possibility of more abuse. I was running out of ideas, so I phoned Mike. He suggested using sipvicious to investigate the problem. I installed sipvicious on a Linux box on my LAN to get a feel for it, and tried a few commands from the getting started notes.

Then I ran sipvicious on a remote Linux box. It managed to detect my IP04, even though it was behind the firewall (note: 121.45.13.78 is a fictional IP):
ubuntu@ip:~/david/sipvicious$ ./svmap.py 121.45.13.78
| 121.45.13.78:5060 | Asterisk PBX | Asterisk |

The svwar and svcrack tools didn’t work for me (they couldn’t find and crack my SIP user accounts) but svmap told me enough: the SIP 5060 port on my Asterisk box was visible to anyone on the Internet!
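
svmap finds SIP devices by sending SIP requests to UDP port 5060 and seeing who answers. To make that concrete, here is a C++ sketch of a minimal RFC 3261 OPTIONS request of the kind such a scanner sends. I'm assuming OPTIONS is the probe method (I believe it is svmap's default). The helper is hypothetical: it only formats the text and doesn't send anything. Any SIP reply is enough to reveal the box. Replies usually carry a Server or User-Agent header naming the software, which is presumably how svmap could report "Asterisk PBX".

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: build a minimal SIP OPTIONS request (RFC 3261) of the
// kind a scanner might send to UDP port 5060. It only formats the message.
std::string buildOptionsRequest(const std::string& targetHost,
                                const std::string& localHost,
                                const std::string& callId,
                                const std::string& branch,
                                const std::string& fromTag) {
    const std::string crlf = "\r\n";  // SIP lines end in CRLF
    std::string msg;
    msg += "OPTIONS sip:" + targetHost + " SIP/2.0" + crlf;
    // The z9hG4bK prefix marks an RFC 3261 style branch parameter.
    msg += "Via: SIP/2.0/UDP " + localHost + ":5060;branch=z9hG4bK" + branch + crlf;
    msg += "Max-Forwards: 70" + crlf;
    msg += "To: <sip:" + targetHost + ">" + crlf;
    msg += "From: <sip:probe@" + localHost + ">;tag=" + fromTag + crlf;
    msg += "Call-ID: " + callId + crlf;
    msg += "CSeq: 1 OPTIONS" + crlf;
    msg += "Content-Length: 0" + crlf;
    msg += crlf;  // blank line ends the headers; there is no body
    return msg;
}
```

The message is just a handful of text headers. That is why a single stray UDP packet reaching port 5060 from the Internet is enough to fingerprint an exposed PBX.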

I tried disabling the router SIP ALG option and straight away svmap showed the security hole was gone:
ubuntu@ip:~/david/sipvicious$ ./svmap.py 121.45.13.78
WARNING:root:found nothing

If I re-enabled SIP ALG, nothing happened, but when I rebooted the router the problem returned.

I also had another (well actually several) security problems. In /etc/asterisk/sip.conf I had “allowguest=no” commented out. This meant that anonymous people could make “guest” SIP calls, with no authentication at all. Great when I am messing around with Mesh Potatoes and want to set something up fast but not so clever when my IP04 is wide open to the Internet.

But I wanted to find the “smoking gun” – could someone really make a call through the open 5060 port? I needed a command line tool to make calls from the remote Linux box. So I used what I know – another Asterisk instance running on the remote Linux box.

I added some dialplan to try to call my IP04 as the guest user:
[default]
exten => 4000,1,Dial(SIP/121.45.13.78/6004)

Then from the Asterisk CLI:

ip*CLI> console dial 4000

and “ring ring” went a phone connected to my IP04! Ouch! Uncommenting “allowguest=no” and a “SIP reload” stopped guest calls. However, I have to admit the main protection I am relying on is the firewall, now working properly since disabling SIP ALG.

Writing as Therapy

It’s been a long time since I last blogged. I have been dealing with some big personal problems since March and haven’t had time for technical work or the will to blog on anything. I have however been doing a lot of private writing or journalling. So I would like to talk a little about writing as therapy.

So far I have written 73,000 words in 5 months in a private journal. According to the Wikipedia entry on word count, that is nearly enough words for two full length novels.

Every day I open a text editor and write about what has been going on, what I have been worrying about, and in particular – how I have felt. Sometimes I combine the writing with exercise, like riding my bike to a cafe with my laptop to write and think. If I don’t have my laptop I jot down a few notes on paper as thoughts enter my mind, then type them up later.

I think it helps. Perhaps getting the thoughts down means getting them out of your head. It makes you express the ideas clearly, rather than half formed thoughts. It’s really interesting to go back a few months, read what you wrote, and see your thoughts and emotions evolving.

Another useful technique is writing an email that you don’t send. When there is a lot of tension it can be really difficult to write an email. It’s easy to tie yourself in knots trying to get the wording right. It’s hard to write while trying to avoid offense or unnecessary hurt. But the problem is you really need to express the bad stuff. Unfortunately if you say what you really feel, or even just mess up the wording, it can make the situation much worse.

So I have gotten into the habit of writing what I really feel, then not sending the email. Sometimes it helps to print it, or just save it as a draft. Tip: remove the intended recipient from the “To:” box – avoids embarrassing accidents.