Microphone Placement and Speech Codecs

This week I have been looking at the effect different speech samples have on the performance of Codec 2. One factor is microphone placement. In radio (from broadcast to two-way HF/VHF) we tend to use microphones placed close to our lips. In telephony, hands-free and more distant microphone placement has become common.

People trying FreeDV over the air have obtained poor results from using built-in laptop microphones, but good results from USB headsets.

So why does microphone placement matter?

Today I put this question to the codec2-dev and digital voice mailing lists, and received many fine ideas. I also chatted to such luminaries as Matt VK5ZM and Mark VK5QI on the morning drive time 70cm net. I’ve also been having an ongoing discussion with Glen, VK1XX, on this and other Codec 2 source audio conundrums.

The Model

A microphone is a bit like a radio front end:

We assume linearity (the microphone signal isn’t clipping).

Imagine we take exactly the same mic and try it 2cm and then 50cm away from the speaker's lips. As we move it away the signal power drops and (given the same noise figure) the SNR must decrease.

Adding extra gain after the microphone doesn’t help the SNR, just like adding gain down the track in a radio receiver doesn’t help the SNR.
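
Here's a little Python sketch of that argument. It assumes a free-field point source (inverse square law) and a fixed microphone noise floor; all the levels are made-up illustrative numbers:

```python
# A minimal sketch, assuming a free-field point source (inverse square
# law) and a fixed microphone noise floor. All levels are made up.
import numpy as np

def snr_db(signal_rms, noise_rms):
    """SNR in dB given signal and noise RMS levels."""
    return 20 * np.log10(signal_rms / noise_rms)

noise_rms = 1e-3     # assumed constant mic self-noise
sig_2cm = 1.0        # arbitrary reference level at 2cm

# Pressure falls as 1/distance, so 2cm -> 50cm (25x further away)
# costs 20*log10(25), about 28dB of signal
sig_50cm = sig_2cm * (2.0 / 50.0)

print("SNR at  2cm: %5.1f dB" % snr_db(sig_2cm, noise_rms))
print("SNR at 50cm: %5.1f dB" % snr_db(sig_50cm, noise_rms))

# Gain after the mic amplifies signal and noise equally: SNR unchanged
gain = 100.0
print("SNR at 50cm + 40dB gain: %5.1f dB"
      % snr_db(gain * sig_50cm, gain * noise_rms))
```

Moving the mic 25 times further away costs nearly 28dB of signal, and no amount of gain after the microphone buys it back.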

When we are very close to a microphone, the low frequencies tend to be boosted; this is known as the proximity effect. This is where the analogy to radio signals falls over. Oh well.

A microphone 50cm away picks up multipath reflections from the room, laptop case, and other surfaces that start to become significant compared to the direct path. Summing a delayed version of the original signal affects the frequency response and adds reverb – just like an HF or VHF radio signal. These effects may be really hard to remove.
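
Here's a quick Python sketch of that summing effect. The 34cm extra path and 0.5 reflection amplitude are made-up values, just to show the comb:

```python
# A minimal sketch of direct + reflected summing as a comb filter.
# The 34cm extra path and 0.5 reflection amplitude are made-up values.
import numpy as np

fs = 48000                                 # sample rate (Hz)
extra_path = 0.34                          # assumed extra path length (m)
D = int(round(extra_path / 340.0 * fs))    # delay in samples (1ms -> 48)
a = 0.5                                    # assumed reflection amplitude

# Impulse response of y[n] = x[n] + a*x[n-D]
h = np.zeros(D + 1)
h[0], h[D] = 1.0, a

# The magnitude response has minima (notches) at odd multiples of 500Hz,
# where the direct and reflected waves arrive half a wavelength apart
H = np.abs(np.fft.rfft(h, 1 << 16))
f = np.fft.rfftfreq(1 << 16, 1.0 / fs)
for k in range(3):
    band = (f > 1000 * k) & (f < 1000 * (k + 1))
    print("notch near %4.0f Hz" % f[band][np.argmin(H[band])])
```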

Science in my Lounge Room 1 – Proximity Effect

I couldn’t resist – I wanted to demonstrate this model in the real world. So I dreamed up some tests using a couple of laptops, a loudspeaker, and a microphone.

To test the proximity effect I constructed a wave file with two sine waves at 100Hz and 1000Hz, and played it through the speaker. I then sampled using the microphone at different distances from the speaker. The proximity effect predicts the 100Hz tone should fall off faster than the 1000Hz tone with distance. I measured each tone's power using Audacity (spectrum feature).
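
For anyone who wants to repeat the experiment, a two-tone test file like that could be generated along these lines (the 8kHz sample rate, 5 second duration, and file name are my choices):

```python
# A sketch of the two-tone test file: 100Hz + 1000Hz sine waves.
# The 8kHz sample rate, 5s duration and file name are my choices.
import numpy as np
from scipy.io import wavfile

fs = 8000
t = np.arange(5 * fs) / fs

# Equal amplitude tones, scaled back so the sum stays out of clipping
x = 0.4 * np.sin(2 * np.pi * 100 * t) + 0.4 * np.sin(2 * np.pi * 1000 * t)
wavfile.write("two_tone.wav", fs, (x * 32767).astype(np.int16))
```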

This spreadsheet shows the results over a couple of runs (levels in dB).

So in Test 1, we can see the 100Hz tone falls off 4dB faster than the 1000Hz tone. That seems a bit small, and could be experimental error. So I tried again with the mic just inside the speaker aperture (hence -1cm), and the difference increased to 8dB, just as expected. Yayyy, it worked!

Apparently this effect can be as large as 16dB for some microphones. Radio announcers use it to add gravitas to their voice, e.g. leaning closer to the mic when they want to add drama.

In my case it means unwanted extra low frequency energy messing with Codec 2 with some closely placed microphones.

Science in my Lounge Room 2 – Multipath

So how can I test the multipath component of my model above? Can I actually see the effects of reflections? I set up my loudspeaker on a coffee table and played a 300 to 3000 Hz swept sine wave through it. I sampled close up and with the mic 25cm away.
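
The swept sine could be generated along these lines (the 48kHz sample rate, 10 second duration, and file name are assumptions):

```python
# A sketch of the 300-3000Hz swept sine test signal.
# The 48kHz sample rate, 10s duration and file name are assumptions.
import numpy as np
from scipy.io import wavfile
from scipy.signal import chirp

fs = 48000
t = np.arange(10 * fs) / fs

# Linear sweep from 300Hz at t=0 to 3000Hz at t=10s
x = 0.5 * chirp(t, f0=300, t1=10.0, f1=3000, method="linear")
wavfile.write("sweep.wav", fs, (x * 32767).astype(np.int16))
```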

The idea is to get a reflection off the coffee table. The direct and reflected waves will be half a wavelength out of phase at some frequency, which should cause a notch in the spectrum.

Let's take a look at the frequency response close up and at 25cm:

Hmm, they are both a bit of a mess. Apparently I don't live in an anechoic chamber. Hmmm, that might be handy for kids' parties. Anyway, I can observe:

  1. The signal falls off a cliff at about 1000Hz. Well, that will teach me to use a speaker with an active crossover for these sorts of tests. It's part of a system that normally has two other little speakers plugged into the back.
  2. They both have a resonance around 500Hz.
  3. The close sample is about 18dB stronger. Given both have the same noise level, that's 18dB better SNR for the close sample. Any additional gain after the microphone will increase the noise as much as the signal, so the SNR won't improve.

OK, let's look at the reflections:

A bit of Googling reveals that reflections of acoustic waves from solid surfaces are in phase (not reversed 180 degrees). Also, the angle of incidence equals the angle of reflection. Just like light.

Now the microphone and speaker aperture are 16cm off the table, and the mic is 25cm away. A couple of right angle triangles and a bit of Pythagoras gives a reflected path length of 40.6cm. This means a path difference of 40.6 – 25 = 15.6cm. So when wavelength/2 = 15.6cm, i.e. wavelength = 31.2cm, the two waves will cancel and we should get a notch in the spectrum. Now v = f × wavelength and v = 340m/s, so we expect a notch at f = 340/0.312 = 1090Hz.
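
The same geometry works out like this in a little Python sketch (image-source method, using the heights and distances above):

```python
# The reflected path geometry above, via the image-source method.
import math

h = 0.16    # mic and speaker height above the table (m)
d = 0.25    # direct path, speaker to mic (m)
v = 340.0   # speed of sound (m/s)

# Reflecting the speaker in the table top puts an image source 2h below
# the mic level; the reflected path is the straight line from it
reflected = math.sqrt(d ** 2 + (2 * h) ** 2)    # ~0.406m
diff = reflected - d                            # ~0.156m

# First notch where the path difference is half a wavelength
f_notch = v / (2 * diff)
print("reflected %.3fm, difference %.3fm, notch at %.0fHz"
      % (reflected, diff, f_notch))             # ~1090Hz
```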

Looking at a zoomed version of the 25cm spectrum:

I can see several notches: 460Hz, 1050Hz, 1120Hz, and 1300Hz. I’d like to think the 1050Hz notch is the one predicted above.

Can we explain the other notches? I looked around the room to see what else could be reflecting. The walls and ceiling are a bit far away (which means low frequency notches). Hmm, what about the floor? It's big, and it's flat. I measured the path length directly under the table as 1.3m, for a path difference of 1.3 – 0.25 = 1.05m. This table summarises the possible notch frequencies:

  Reflection     Path difference   Predicted notches (Hz)
  Coffee table   0.156m            1090
  Floor          1.05m             162, 486, 810, 1133, 1457

Note that notches will occur at any frequency where the path difference is an odd number of half wavelengths (wavelength/2, 3 × wavelength/2, 5 × wavelength/2, …), hence we get a comb effect along the frequency axis.
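
A short Python sketch of that comb, using the path differences worked out above:

```python
# A sketch of the notch comb: notches at every odd multiple of
# v / (2 * path difference), listed up to 2kHz here.
v = 340.0

def notches(path_diff_m, f_max=2000.0):
    """Notch frequencies (Hz) for a given path difference in metres."""
    out, k = [], 0
    while v * (2 * k + 1) / (2 * path_diff_m) <= f_max:
        out.append(round(v * (2 * k + 1) / (2 * path_diff_m)))
        k += 1
    return out

print("table reflection:", notches(0.156))  # [1090]
print("floor reflection:", notches(1.05))   # [162, 486, 810, 1133, 1457, 1781]
```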

OK, I can see the predicted notches at 486Hz and 1133Hz, which means the 1050Hz notch is probably the one off the table. I can't explain the 1300Hz notch, and there's no sign of the predicted notch at 810Hz. With a little imagination we can see a notch around 1460Hz. Hey, that's not bad at all for a first go!

If I was super keen I’d try a few variations like the height above the table and see if the 1050Hz notch moves. But it’s Friday, and nearly time to drink red wine and eat pizza with my friends. So that’s enough lounge room acoustics for now.

How to break a low bit rate speech codec

Low bit rate speech codecs make certain assumptions about the speech signal they compress. For example, the time varying filter used to transmit the speech spectrum assumes the spectrum varies slowly in frequency and doesn't have any notches. In fact, as this filter is "all pole" (IIR), it can only model resonances (peaks) well, not zeros (notches). Codecs like mine tend to fall apart (the decoded speech sounds bad) when the input speech violates these assumptions.
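
Here's a toy second-order example of that limitation (not Codec 2 code, just an illustration): the same polynomial produces a peak when used as poles, but a notch when used as zeros, and an all-pole synthesis filter only gets the first option.

```python
# A toy illustration (not Codec 2 code): the same second-order
# polynomial is a peak as poles and a notch as zeros. An all-pole
# synthesis filter can only do the former.
import numpy as np
from scipy.signal import freqz

fs = 8000
f0, r = 1000.0, 0.95                  # resonance at 1kHz, radius 0.95
theta = 2 * np.pi * f0 / fs
poly = [1.0, -2 * r * np.cos(theta), r * r]

w, H_pole = freqz([1.0], poly, worN=4096, fs=fs)  # all-pole: peak at 1kHz
w, H_zero = freqz(poly, [1.0], worN=4096, fs=fs)  # all-zero: notch at 1kHz

i = np.argmin(np.abs(w - f0))
print("all-pole gain at 1kHz: %6.1f dB" % (20 * np.log10(abs(H_pole[i]))))
print("all-zero gain at 1kHz: %6.1f dB" % (20 * np.log10(abs(H_zero[i]))))
```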

This helps explain why clean speech from a nicely placed microphone is good for low bit rate speech codecs.

Now Skype and (mobile) phones do work quite well in "hands free" mode, with rather distant microphone placement. I often use Skype with my laptop's internal microphone. Why is this OK?

Well, the codecs used have a much higher bit rate, e.g. 10,000 bits/s rather than 1,000 bits/s. This gives them the luxury of coding arbitrary waveforms as well as speech, at least to some extent. They use algorithms like CELP, a hybrid of model based (like Codec 2) and waveform based (like PCM) coding. So they faithfully follow the crappy mic signal, and don't fall over completely.

Thanks

In Sep 2014 I had some interesting discussions about the effect of microphones, small speakers, and speech samples with Mike, OH2FCZ, who is an audio professional. Thanks Mike!

5 thoughts on “Microphone Placement and Speech Codecs”

  1. Remembering my community radio days, the rule of thumb was to have a hand's width between you and the mic, i.e. 10 cm-ish / 4 in-ish.
    From what I recall, the main rationale was that the curve was decently flat at that distance (i.e. easily resolvable with the mixing console EQ), but also, spit accumulation on the mic tended to be much less, leading to longer lasting mics.
    Maybe that rule could be applicable to speech codecs?

    1. Teodor, speak across the microphone rather than into it. That helps with spit accumulation and condensation. It also gives you a better audio spectrum to work with. An "Old Hand" taught me that one long, long ago and far, far away – or something like that.

      {^_^} Joanne/W6MKU

      1. I actually use both – mic element on side, roughly a hand's width away from the mouth – when I do ham radio. You do need to know where the mic is located in your PTT tho 😛
        Provided the mic gain of the transmitter is okay, I just about always come across well modulated.

  2. Hi David and all,

    First. I am working on a proper response to your earlier challenge :).

    Second. Microphones….

    Carbon microphones [old telephones] give a MUCH nicer communications experience. They do real-time compression, and so make weak sounds MUCH easier to understand and loud sounds MUCH less annoying. I have never measured the typical compression achieved, but this effect, whether applied before or after the speech has passed through the channel, is IMHO worth some study and will most probably make the channel seem much better.

    The speech captured from the side of the mouth is MUCH easier to understand than the speech from the front of the mouth.

    Thus a headset with a noise cancelling mike gives great sounding speech.

    We make special purpose telephones, so my personal telephone has noise cancelling in both ears and a noise cancelling mike.

    I made a simple adapter to allow a dual headset with noise cancelling mike to be plugged into two different cell phones. Even given the lousy GSM encoding, the result sounds pretty good. Not as good as G.711, but MUCH better than the built-in mike and earpiece. One of my cell phones even has a proper treatment of sidetone [hearing a little of your own speech in the headset].

    Lots of fun.

    John

  3. Thanks for those comments. Yes, I speak across PTT mics – can't remember why!

    I think all of you have suggested ways to reduce the proximity effect. Placing the mic to the side will result in unequal sound amplitude across the surface of the mic, which I understand is required for the proximity effect (see the Wikipedia article).

    Side placement will also avoid low frequency, high pressure air that might push the microphone into a non-linear region, and any aerosols like saliva.

    John – your experience with your company's products sounds pretty useful in this area :-)

    – David
