A common task in speech coding is to take a real (floating point) number and quantise it to a fixed number of bits for sending over the channel. For Codec 2 a good example is the energy of the speech signal. This is sampled at a rate of 25Hz (once every 40ms) and quantised to 5 bits.
Here is an example of a 3 bit quantiser that can be used to quantise a real number in the range 0 to 1.0:
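The post's quantiser figure isn't reproduced here, but the behaviour described is easy to sketch. A minimal illustration in Python (the post's own simulations are in Octave):

```python
# Minimal sketch of a 3-bit uniform quantiser for inputs in [0, 1.0].
# 8 levels, step size 0.125; decode to the centre of each cell.

def quantise(x, bits=3):
    levels = 2 ** bits                     # 8 levels for 3 bits
    step = 1.0 / levels                    # 0.125 between levels
    return min(int(x / step), levels - 1)  # index to send over the channel

def decode(index, bits=3):
    step = 1.0 / (2 ** bits)
    return (index + 0.5) * step            # centre of the quantiser cell

x = 0.3
idx = quantise(x)                          # -> index 2 (binary 010)
print(idx, decode(idx))                    # decoded value 0.3125, error 0.0125
```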
The quantiser has 8 levels and a step size of 0.125 between levels. This introduces some quantisation “noise”, as the quantiser can’t represent all input values exactly. The quantisation noise reduces as the number of bits, and hence number of quantiser levels, increases. Every additional bit doubles the number of levels, so halves the step size between each level. This means the signal to noise ratio of the quantiser increases by 6dB per bit.
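The 6dB-per-bit rule can be checked numerically. A quick sketch (Python rather than the post's Octave), quantising uniform random input to cell centres as above:

```python
import math, random

random.seed(1)
x = [random.random() for _ in range(50000)]   # uniform input on [0, 1.0]

def quantiser_snr_db(bits):
    """Signal to quantisation noise ratio of a uniform quantiser, in dB."""
    step = 1.0 / (2 ** bits)
    decoded = [(min(int(xi / step), 2 ** bits - 1) + 0.5) * step for xi in x]
    sig = sum(xi * xi for xi in x)
    noise = sum((xi - di) ** 2 for xi, di in zip(x, decoded))
    return 10 * math.log10(sig / noise)

for bits in (3, 4, 5):
    print(bits, round(quantiser_snr_db(bits), 1))  # each extra bit adds ~6dB
```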
We use a modem to send the bits over the channel. Each bit is usually allocated the same transmit power. In poor channels, we get bit errors when the noise overcomes the signal and a 1 turns into a 0 (or a 0 into a 1). These bit errors effectively increase the noise in the decoded value, and therefore reduce the SNR. We now have errors from the quantisation process and bit errors during transmission over the channel.
However not all bits are created equal. If the most significant bit is flipped due to an error (say 000 to 100), the decoded value will be changed by 0.5. If there is an error in the least significant bit, the change will be just 0.125. So I decided to see what would happen if I allocated a different transmit power to each bit. I chose the 5 bits used in Codec 2 to transmit the speech energy. I wrote some Octave code to simulate passing these 5 bits through a simple BPSK modem at different Eb/No values (Eb/No is proportional to the SNR of a radio channel, which is different to the SNR of the quantiser value).
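The effect of a single bit error on the decoded value is just the weight of the flipped bit, which a couple of lines of Python confirm for the 3-bit case:

```python
# A bit error shifts the decoded value by the weight of the flipped bit.
def decode3(index):
    return index / 8.0        # 3-bit decoder, step size 0.125

clean = 0b000
msb_err = clean ^ 0b100       # 000 -> 100, most significant bit flipped
lsb_err = clean ^ 0b001       # 000 -> 001, least significant bit flipped
print(abs(decode3(msb_err) - decode3(clean)))  # -> 0.5
print(abs(decode3(lsb_err) - decode3(clean)))  # -> 0.125
```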
I ran two simulations: first, a baseline simulation where all bits are transmitted with the same power; second, a scheme that allocates more power to the more significant bits. Here are the amplitudes used for the BPSK symbol representing each bit. The power of each bit is the amplitude squared:
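The post's amplitude table isn't reproduced here, and the values below are purely illustrative, not the ones actually used. They show how a set of per-bit amplitudes can be scaled so the total power matches the equal-power baseline:

```python
import math

# Hypothetical per-bit amplitudes, MSB first - NOT the values from the post.
raw = [2.0, 1.5, 1.0, 0.75, 0.5]
scale = math.sqrt(5.0 / sum(a * a for a in raw))  # baseline power = 5 x 1W
amps = [a * scale for a in raw]
print([round(a, 3) for a in amps])
print(round(sum(a * a for a in amps), 2))         # total power -> 5.0
```

Any amplitude profile can be normalised this way, which keeps the comparison between the two schemes fair: the variable power scheme spends no more total energy per quantised value than the baseline.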
Both simulations have the same total power for each 5 bit quantised value (e.g. 1*1 + 1*1 + 1*1 + 1*1 + 1*1 = 5W). Here are some graphs from the simulation. The first graph shows the Bit Error Rate (BER) of the BPSK modem. We are interested in the region on the left, where the BER is higher than 10%.
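For ideal BPSK over AWGN the BER curve is 0.5·erfc(√(Eb/No)), and a small Monte Carlo sketch (Python, not the post's Octave code) reproduces it:

```python
import math, random

random.seed(2)

def bpsk_ber(ebno_db, nbits=200000):
    """Monte Carlo BER of BPSK over an AWGN channel at a given Eb/No (dB)."""
    ebno = 10 ** (ebno_db / 10.0)
    sigma = math.sqrt(1.0 / (2.0 * ebno))  # noise std for unit-energy bits
    # By symmetry, always send +1 and count sign flips at the receiver.
    errors = sum(1 for _ in range(nbits)
                 if 1.0 + random.gauss(0.0, sigma) < 0.0)
    return errors / nbits

for ebno_db in (-2, 0, 2):
    theory = 0.5 * math.erfc(math.sqrt(10 ** (ebno_db / 10.0)))
    print(ebno_db, round(bpsk_ber(ebno_db), 3), round(theory, 3))
```

At Eb/No = -2dB this gives a BER of around 13%, i.e. the high-BER region on the left of the graph that we care about.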
The second graph shows the quantiser SNR performance for the baseline and variable power schemes. At high BER the variable power scheme is about 6dB better than the baseline.
The third figure shows the histograms of the quantiser errors for Eb/No = -2dB. The middle bar on both histograms is the quantisation noise, which is centred around zero. The baseline quantiser has lots of large errors (outliers) due to bit errors, however the variable power scheme has a larger number of smaller errors near the centre, where (hopefully) they have less impact on the decoded speech.
The final figure shows a time domain plot of the errors for the two schemes. The baseline quantiser has more large value errors, but only a small amount of noise when there are no errors. The variable power scheme looks a lot nicer, but you can see the amplitude of the smaller errors is higher than the baseline.
I used the errors from the simulation to corrupt the 5 bit Codec 2 energy parameter. Listen to the results for the baseline and variable power schemes. The baseline sample seems to “flutter” up and down as the energy bounces around due to bit errors. I can hear some “roughness” in the variable transmit power sample, but none of the flutter. However both are quite understandable, even though the bit error rates are 13.1% (baseline) and 18.7% (variable power)! Of course – this is just the BER of the energy parameters, in practice with all of the Codec bits subjected to that BER the speech quality would be significantly worse.
The simple modem simulation used here was a BPSK modem over an AWGN channel. For FreeDV we use a DQPSK modem over a HF channel, which will give somewhat poorer results at the same channel Eb/No. However it's the BER operating point that matters – we are aiming for intelligible speech at a BER between 10 and 20%, which is equivalent to a 1600 bit/s DQPSK modem on a "CCIR poor" HF channel at around 0dB average SNR.
codec2-dev/src$ ./c2enc 1300 ../raw/ve9qrp.raw - | ./insert_errors - - ../octave/energy_errors_baseline.bin 56 | ./c2dec 1300 - - | play -t raw -r 8000 -s -2 -
codec2-dev/src$ ./c2enc 1300 ../raw/ve9qrp.raw - | ./insert_errors - - ../octave/energy_errors_varpower.bin 56 | ./c2dec 1300 - - | play -t raw -r 8000 -s -2 -
Note the 1300 bit/s mode actually uses 52 bits per frame, but c2enc/c2dec work with an integer number of bytes, so for the purposes of simulating bit errors we round up to 7 bytes/frame (56 bits).
As I wrote this post I realised the experiments above used natural binary code, however Codec 2 uses Gray code. The next post looks into the difference in SNR performance between natural binary and Gray code.