Microphone Placement and Speech Codecs

This week I have been looking at the effect different speech samples have on the performance of Codec 2. One factor is microphone placement. In radio (from broadcast to two way HF/VHF) we tend to use microphones closely placed to our lips. In telephony, hands free, or more distance microphone placement has become common.

People trying FreeDV over the air have obtained poor results from using built-in laptop microphones, but good results from USB headsets.

So why does microphone placement matter?

Today I put this question to the codec2-dev and digital voice mailing lists, and received many fine ideas. I also chatted to such luminaries as Matt VK5ZM and Mark VK5QI on the morning drive time 70cm net. I’ve also been having an ongoing discussion with Glen, VK1XX, on this and other Codec 2 source audio conundrums.

The Model

A microphone is a bit like a radio front end:

We assume linearity (the microphone signal isn’t clipping).

Imagine we take exactly the same mic and try it 2cm and then 50cm away from the speakers lips. As we move it away the signal power drops and (given the same noise figure) SNR must decrease.

Adding extra gain after the microphone doesn’t help the SNR, just like adding gain down the track in a radio receiver doesn’t help the SNR.

When we are very close to a microphone, the low frequencies tend to be boosted, this is known as the proximity effect. This is where the analogy to radio signals falls over. Oh well.

A microphone 50cm away picks up multi-path reflections from the room, laptop case, and other surfaces that start to become significant compared to the direct path. Summing a delayed version of the original signal will have an impact on the frequency response and add reverb – just like a HF or VHF radio signal. These effects may be really hard to remove.

Science in my Lounge Room 1 – Proximity Effect

I couldn’t resist – I wanted to demonstrate this model in the real world. So I dreamed up some tests using a couple of laptops, a loudspeaker, and a microphone.

To test the proximity effect I constructed a wave file with two sine waves at 100Hz and 1000Hz, and played it through the speaker. I then sampled using the microphone at different distances from a speaker. The proximity effect predicts the 100Hz tone should fall off faster than the 1000Hz tone with distance. I measured each tone power using Audacity (spectrum feature).

This spreadsheet shows the results over a couple of runs (levels in dB).

So in Test 1, we can see the 100Hz tone falls off 4dB faster than the 1000Hz tone. That seems a bit small, could be experimental error. So I tried again with the mic just inside the speaker aperture (hence -1cm) and the difference increased to 8dB, just as expected. Yayyy, it worked!

Apparently this effect can be as large as 16dB for some microphones. Apparently radio announcers use this effect to add gravitas to their voice, e.g. leaning closer to the mic when they want to add drama.

Im my case it means unwanted extra low frequency energy messing with Codec 2 with some closely placed microphones.

Science in my Lounge Room 2 – Multipath

So how can I test the multipath component of my model above? Can I actually see the effects of reflections? I set up my loudspeaker on a coffee table and played a 300 to 3000 Hz swept sine wave through it. I sampled close up and with the mic 25cm away.

The idea is get a reflection off the coffee table. The direct and reflected wave will be half a wavelength out of phase at some frequency, which should cause a notch in the spectrum.

Lets take a look at the frequency response close up and at 25cm:

Hmm, they are both a bit of a mess. Apparently I don’t live in an anechoic chamber. Hmmm, that might be handy for kids parties. Anyway I can observe:

  1. The signal falls off a cliff at about 1000Hz. Well that will teach me to use a speaker with an active cross over for these sorts of tests. It’s part of a system that normally has two other little speakers plugged into the back.
  2. They both have a resonance around 500Hz.
  3. The close sample is about 18dB stronger. Given both have same noise level, that’s 18dB better SNR than the other sample. Any additional gain after the microphone will increase the noise as much as the signal, so the SNR won’t improve.

OK, lets look at the reflections:

A bit of Googling reveals reflections of acoustic waves from solid surfaces are in phase (not reversed 180 degrees). Also, the angle of incidence is the same as reflection. Just like light.

Now the microphone and speaker aperture is 16cm off the table, and the mic 25cm away. Couple of right angle triangles, bit of Pythagoras, and I make the reflected path length as 40.6cm. This means a path difference of 40.6 – 25 = 15.6cm. So when wavelength/2 = 15.6cm, we should get a notch in the spectrum, as the two waves will cancel. Now v=f(wavelength), and v=340m/s, so we expect a notch at f = 340*2/0.156 = 1090Hz.

Looking at a zoomed version of the 25cm spectrum:

I can see several notches: 460Hz, 1050Hz, 1120Hz, and 1300Hz. I’d like to think the 1050Hz notch is the one predicted above.

Can we explain the other notches? I looked around the room to see what else could be reflecting. The walls and ceiling are a bit far away (which means low freq notches). Hmm, what about the floor? It’s big, and it’s flat. I measured the path length directly under the table as 1.3m. This table summarises the possible notch frequencies:

Note that notches will occur at any frequency where the path difference is half a wavelength, so wavelength/2, 3(wavelength)/2, 5(wavelength)/2…..hence we get a comb effect along the frequency axis.

OK I can see the predicted notch at 486Hz, and 1133Hz, which means the 1050 Hz is probably the one off the table. I can’t explain the 1300Hz notch, and no sign of the predicted notch at 810Hz. With a little imagination we can see a notch around 1460Hz. Hey, that’s not bad at all for a first go!

If I was super keen I’d try a few variations like the height above the table and see if the 1050Hz notch moves. But it’s Friday, and nearly time to drink red wine and eat pizza with my friends. So that’s enough lounge room acoustics for now.

How to break a low bit rate speech codec

Low bit rate speech codecs make certain assumptions about the speech signal they compress. For example the time varying filter used to transmit the speech spectrum assumes the spectrum varies slowly in frequency, and doesn’t have any notches. In fact, as this filter is “all pole” (IIR), it can only model resonances (peaks) well, not zeros (notches). Codecs like mine tend to fall apart (the decoded speech sounds bad) when the input speech violates these assumptions.

This helps explain why clean speech from a nicely placed microphone is good for low bit rate speech codecs.

Now Skype and (mobile) phones do work quite well in “hands free” mode, with rather distance microphone placement. I often use Skype with my internal laptop microphone. Why is this OK?

Well the codecs used have a much higher bit rate, e.g. 10,000 bit/s rather than 1,000 bits/s. This gives them the luxury to employ codecs that can, to some extent, code arbitrary waveforms as well as speech. These employ algorithms like CELP that use a hybrid of model based (like Codec 2) and waveform based (like PCM). So they faithfully follow the crappy mic signal, and don’t fall over completely.

Self Driving Cars

I’m a believer in self driving car technology, and predict it will have enormous effects, for example:

Our cars currently spend most of the time doing nothing. They could be out making money for us as taxis while we are at work.
How much infrastructure and frustration (home garage, driveways, car parks, finding a park) do we . . . → Read More: Self Driving Cars

FreeDV Robustness Part 6 – Early Low SNR Results

Anyone who writes software should be sentenced to use it. So for the last few days I’ve been radiating FreeDV 700 signals from my home in Adelaide to this websdr in Melbourne, about 800km away. This has been very useful, as I can sample signals without having to bother other Hams. Thanks John! . . . → Read More: FreeDV Robustness Part 6 – Early Low SNR Results

8 Mega Watts in your bare hands

I recently went on a nice road trip to Gippstech, an interstate Ham radio conference, with Andrew, VK5XFG. On the way, we were chatting about Electric Cars, and how much of infernal combustion technology is really just a nasty hack. Andrew made the point that if petrol cars had been developed now, we would . . . → Read More: 8 Mega Watts in your bare hands

Trellis Decoding for Codec 2

OK, so FreeDV 700 was released a few weeks ago and I’m working on some ideas to improve it. Especially those annoying R2D2 noises due to bit errors at low SNRs.

I’m trying some ideas to improve the speech quality without the use of Forward Error Correction (FEC).

Speech coding is the art of “what can I . . . → Read More: Trellis Decoding for Codec 2

WTF Internal Combustion?

At the moment I’m teaching my son to drive in my Electric Car. Like my daughter before him it’s his first driving experience. Recently, he has started to drive his grandfathers pollution generator, which has a manual transmission. So I was trying to explain why the clutch is needed, and it occurred to . . . → Read More: WTF Internal Combustion?

FreeDV Robustness Part 5 – FreeDV 700

We’ve just released FreeDV v0.98 GUI software, which includes the new FreeDV 700 mode. This new mode has poorer speech quality than FreeDV 1600 but is far more robust, close to SSB on low SNR fading HF channels. Mel Whitten and the test team have made contacts over 1000 km using just 1 Watt!

You can . . . → Read More: FreeDV Robustness Part 5 – FreeDV 700

New Charger for my EV

On Sunday morning I returned home and plugged in my trusty EV to feed it some electrons. Hmm, something is wrong. No lights on one of the chargers. Oh, and the charger circuit breaker in the car has popped. Always out for adventure, and being totally incompetent at anything above 5V and 1 . . . → Read More: New Charger for my EV

Lower SNR limit of Digital Voice

I’m currently working on a Digital Voice (DV) mode that will work at negative SNRs. So I started thinking about where the theoretical limits are:

Lets assume we have a really good rate 0.5 FEC code that approaches the Shannon Limit of perfectly correcting random bit errors up to a channel BER of 12%
A real-world code . . . → Read More: Lower SNR limit of Digital Voice

SM1000 Part 13 – Shipping!

The enclosure has arrived from the new manufacturer! Edwin and team at Dragino are now assembling, testing, and shipping the first batch of 100 SM1000s. We plan to ship all Aliexpress pre-orders in week starting 3 May, Australian orders the week starting 10 May.

We have sold almost all of the first batch . . . → Read More: SM1000 Part 13 – Shipping!