Codec 2 at 1400 bits/s

I’m in the process of releasing a 1400 bit/s version of Codec 2. Through efficient quantisation of the LSPs I have reduced the bit rate from 2500 to 1400 bit/s with only a small change in quality. This bit rate makes the codec very useful for digital voice over HF radio channels.

Here are some samples:

Codec Male Female
Original male female
Codec 2 V0.1A 2550 bit/s male female
Codec 2 V0.2 1400 bit/s male female
GSM Full rate 13000 bit/s male female

GSM full rate is what you might have been using on your mobile phone a few years ago. It’s a good example of “communications quality” speech. Compared to GSM, Codec 2 does a reasonable job at just 10% of the bit rate. There are some more samples on the Codec 2 page.

I think it’s possible to eventually push Codec 2 beneath 1000 bit/s with the same quality level. Improvements in the speech quality of Codec 2 at 1400 and 2500 bit/s are also possible with further algorithm development.

Some factoids:

  • At 1400 bit/s you can send 45 phone calls in the same bandwidth required for a standard 64 kbit/s phone channel.
  • 1400 bit/s is 175 bytes/second.
  • A 30 second voice mail can be stored in 5250 bytes.
  • A 30 minute podcast can be stored in 308 kbytes.
  • At 1400 bit/s Codec 2 uses 56 bit (7 byte) packets, sent every 40ms. If used for VOIP the RTP+UDP+IP overhead is 40 bytes/packet. So the payload is just 15 % of the total VOIP packet.
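
As a rough check of the arithmetic in this list, here is a short Python sketch. The variable names are illustrative only, the 40 byte header is assumed to be IPv4 (20) + UDP (8) + RTP (12) with no header compression, and a kbyte is taken as 1024 bytes.

```python
BIT_RATE = 1400              # bit/s
PACKET_PERIOD_S = 0.040      # one packet every 40 ms
HEADER_BYTES = 20 + 8 + 12   # assumed IPv4 + UDP + RTP headers

bytes_per_second = BIT_RATE // 8
calls_per_64k = 64000 // BIT_RATE                     # whole calls in a 64 kbit/s channel
voicemail_bytes = 30 * bytes_per_second               # 30 second voice mail
podcast_kbytes = 30 * 60 * bytes_per_second / 1024    # 30 minute podcast

packet_bits = round(BIT_RATE * PACKET_PERIOD_S)
payload_bytes = packet_bits // 8
payload_fraction = payload_bytes / (payload_bytes + HEADER_BYTES)

print(bytes_per_second, calls_per_64k, voicemail_bytes, round(podcast_kbytes))
print(packet_bits, payload_bytes, f"{payload_fraction:.0%}")
```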

History

Building a 1400 bit/s communications quality speech codec is a highlight of my career.

My interest in speech coding started in 1989 just after I graduated from engineering. I was working with a team of researchers on Mobilesat – one of the first satellite-based mobile phone services. We were working mainly on the modems for the Mobilesat system. Just at that point in history, it became possible to use digital speech rather than analog systems such as FM or SSB. During the 1980s breakthrough speech coding algorithms were developed that could deliver communications quality speech at less than 10 kbit/s. At the same time, the invention of DSP chips meant we could (just) run these complex algorithms in real time. Prior to that it took 30 minutes to process 3 seconds of speech on the PCs or workstations of the day.

Although I was meant to be working on sat-com modems, I was fascinated by speech coding; first the DSP hardware, then the challenges of real time implementation, then the speech coding algorithms themselves.

The CELP based speech codecs I built in the early 90s could deliver communications quality speech at 9600 bit/s, or (with a significant drop in quality) run at 4800 bit/s.

So managing to get about the same quality at 1400 bit/s is a nice personal achievement for me. Giving it away to the world is even cooler.

2500 to 1400 bits/s

Here is the bit allocation of Codec 2 running at 2500 bit/s:

Parameter bits/frame
Spectral magnitudes (LSPs) 36
Energy 5
Voicing (updated each 10ms) 2
Fundamental Frequency (Wo) 7
Total 50

The Line Spectrum Pairs (LSPs) dominate the bit rate so I focused my attention there for a couple of weeks. Here is the bit allocation of Codec 2 running at 1400 bit/s:

Parameter                     bits (even sub-frame)  bits (odd sub-frame)  bits/frame
Spectral magnitudes (LSPs)    25                     7                     32
Energy                        5                      5                     10
Voicing (updated each 10ms)   2                      2                     4
Fundamental Frequency (Wo)    7                      3                     10
Total                         39                     17                    56
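
The link between these allocations and the quoted bit rates can be sketched in a few lines of Python. The dictionary names are made up for illustration. The 20 ms period for a 2500 bit/s frame is an assumption inferred from 50 bits at 2500 bit/s; the 40 ms period for an even/odd pair matches the 56 bit packets sent every 40ms mentioned earlier. Note that the even sub-frame entries sum to 39 bits.

```python
# Bits per parameter, copied from the two tables above.
ALLOC_2500 = {"LSPs": 36, "Energy": 5, "Voicing": 2, "Wo": 7}
ALLOC_1400_EVEN = {"LSPs": 25, "Energy": 5, "Voicing": 2, "Wo": 7}
ALLOC_1400_ODD = {"LSPs": 7, "Energy": 5, "Voicing": 2, "Wo": 3}


def bit_rate(bits, period_ms):
    """Bit rate in bit/s for `bits` sent once every `period_ms` milliseconds."""
    return bits * 1000 / period_ms


bits_2500 = sum(ALLOC_2500.values())
bits_even = sum(ALLOC_1400_EVEN.values())
bits_odd = sum(ALLOC_1400_ODD.values())

# Assumption: one 2500 bit/s frame covers 20 ms; an even/odd pair covers 40 ms.
rate_2500 = bit_rate(bits_2500, 20)
rate_1400 = bit_rate(bits_even + bits_odd, 40)

print(bits_2500, rate_2500)
print(bits_even, bits_odd, bits_even + bits_odd, rate_1400)
```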

A Graphical Explanation of LSPs

Rather than going into the math of LSPs let me explain graphically. Here is a plot of 10 LSPs over 400ms of male speech:

The LSPs were extracted from this section of male speech:

The segment is the word “force” from the male sample at the top of this post.

In our case there are 10 LSPs. They are spread over 0 to 3500Hz. Together they represent the spectrum of the speech signal at any given point in time. They evolve over time as the speech signal changes, so we have to keep sending updated versions every 20ms or so.

Speech coding is the art of “what can we throw away”. So the idea is to send each LSP frequency across the channel to the decoder with the minimum number of bits, but still maintain good speech quality.

Closely spaced LSPs represent peaks in the speech spectrum. Our ear is very sensitive to these peaks so we must take special care with closely spaced LSPs. In the example above, you can see LSPs 1 and 2 close together between frames 7 and 25, while we have a high energy vowel. You can also see LSPs 8 and 9, and 9 and 10, coming together during a consonant between frames 25 and 30. This indicates two peaks in the spectrum at high audio frequencies.

The ear is more sensitive to low frequencies, so it turns out we can use a coarser representation (fewer bits) for higher frequency LSPs.

There is another property that helps us. Perceptually important frames of voiced speech like vowels tend to change slowly. This suggests that coding the difference between frames will lead to coding efficiencies, as the frame-frame differences are very small.

Scalar and Vector Quantisation

The 2500 bit/s version of Codec 2 uses scalar quantisers. For example LSP 8 is “quantised” to one of 8 values represented by a table or array:

Index LSP 8
0 2300
1 2400
2 2500
3 2600
4 2700
5 2800
6 2900
7 3000

To quantise LSP 8, we find the closest value in the table, then send the index of that value. For LSP 8 this requires 3 bits/frame for the 8 possible values. The 2500 bit/s version of Codec 2 uses 36 bits total, with one table for each LSP. Because each LSP has its own quantiser table, it is known as scalar quantisation.
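
Here is a minimal Python sketch of that table lookup, using the LSP 8 table above. The function names are hypothetical and this is not the Codec 2 source, just an illustration of nearest-value scalar quantisation.

```python
# LSP 8 quantiser table from the text, in Hz; the index is what gets transmitted.
LSP8_TABLE_HZ = [2300, 2400, 2500, 2600, 2700, 2800, 2900, 3000]


def quantise_scalar(value_hz, table):
    """Return the index of the table entry closest to value_hz."""
    return min(range(len(table)), key=lambda i: abs(table[i] - value_hz))


def dequantise_scalar(index, table):
    """Decoder side: look the index back up in the same table."""
    return table[index]


index = quantise_scalar(2640.0, LSP8_TABLE_HZ)
bits = (len(LSP8_TABLE_HZ) - 1).bit_length()   # 8 entries need 3 bits
print(index, dequantise_scalar(index, LSP8_TABLE_HZ), bits)
```

Values outside the table range simply map to the first or last entry.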

A Vector Quantiser (VQ) can be more efficient as it quantises several values at once, which can be referred to with just one index:

Index LSP 1 LSP 2 LSP 3 LSP 4
0 325 425 700 1275
1 350 425 700 1225
.
.
4095 550 625 725 1100

This example vector quantises LSPs 1 to 4 using a 12 bit (4096 entry) table. VQs can be very efficient, as they quantise several values at once and can take into account correlations in the input data. The trade off is that VQs tend to be noisy, as the table entries may not quite match all of the values in the input vector. Also I have found designing a VQ that works well to be quite a challenge.
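
The search itself looks much like the scalar case, except that the error is measured over the whole vector. The Python sketch below uses only the three entries shown in the table as a toy codebook (a real 12 bit table would have 4096 entries, so the last entry here has index 2 rather than 4095). The squared-error distance and the function name are assumptions made for illustration.

```python
# Toy codebook: just the three entries shown above, values in Hz for LSPs 1 to 4.
TOY_CODEBOOK_HZ = [
    (325, 425, 700, 1275),
    (350, 425, 700, 1225),
    (550, 625, 725, 1100),
]


def quantise_vector(vec_hz, codebook):
    """Return the index of the codebook entry with the smallest squared error."""
    def sq_err(entry):
        return sum((a - b) ** 2 for a, b in zip(vec_hz, entry))
    return min(range(len(codebook)), key=lambda i: sq_err(codebook[i]))


lsps_1_to_4 = (340, 430, 690, 1240)
index = quantise_vector(lsps_1_to_4, TOY_CODEBOOK_HZ)
print(index, TOY_CODEBOOK_HZ[index])
```

No entry matches the input exactly, which is the quantisation noise described above: one index stands in for all four values.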

From 2500 to 1400 bit/s

Here are some highlights of a couple of weeks of trial and error. I am not claiming any of this is particularly original, just new or important to me so worth logging here:

I developed a 25 bit/frame quantiser using scalar quantisers for LSPs 1-4, then a 12 bit (4096 entry) Vector Quantiser (VQ) for LSPs 5-10. I used VQ for the higher LSPs, as they are less sensitive to quantisation noise. I freely admit I don’t completely understand VQ, so there may be room for improvement here.

I found that I could “pre-quantise” or bandwidth expand the LSPs without any drop in quality. For example if LSPs 5-10 are quantised to 100Hz steps there is no perceptual difference in the decoded speech. This suggests that quantising the LSPs to any finer resolution is a waste of bits – we can’t hear the difference. I used this effect to design the 12 bit Vector Quantiser (VQ) for LSPs 5-10, that (for me at least) worked better than the same size VQ I designed with traditional minimum mean square error (MSE) training methods.
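
A sketch of that pre-quantisation step in Python, assuming "quantised to 100Hz steps" means rounding each LSP to the nearest multiple of 100 Hz. The function name and the example frequencies are made up for illustration.

```python
import math


def prequantise(lsps_hz, step_hz=100):
    """Round each LSP frequency to the nearest multiple of step_hz (halves round up)."""
    return [step_hz * math.floor(f / step_hz + 0.5) for f in lsps_hz]


# Hypothetical values for LSPs 5-10, in Hz.
lsps_5_to_10 = [1710, 2049, 2450, 2760, 2951, 3310]
print(prequantise(lsps_5_to_10))
```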

Then I started playing with delta-time quantisation of LSPs. During high energy, strongly voiced speech, the LSPs change slowly from frame to frame. So I experimented with just transmitting this small change.

Perceptually important, closely spaced LSPs are really sensitive to quantisation noise. Small changes in closely spaced LSPs can have a big effect on the decoded speech quality. Fortunately, during this sort of speech the frame to frame changes are very small. So for coding delta changes in LSPs I designed a VQ codebook by hand with the properties I wanted. I took the approach of constraining the VQ codebook to very small changes.

Here are some of the delta-time codebook entries:

Index LSP 1 LSP 2 LSP 3 LSP 4
0 -25 -25 -50 -50
1 0 -25 -50 -50
2 25 -25 -50 -50
3 -25 0 -50 -50
4 0 0 -50 -50
.
.
79 0 25 50 50
80 25 25 50 50

This full codebook is here.

It’s a bit like counting in binary, except the base changes for each element of the vector. It’s probably not optimal, but it works. As there are 81 values in the codebook it can be transmitted with 7 bits (indexes 81 to 127 are not used).
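
The entries shown above are consistent with a codebook built from every combination of three steps per LSP, with LSP 1 changing fastest. The Python sketch below generates such a codebook. The step sets (including the 0 step for LSPs 3 and 4) are inferred from the visible entries and are an assumption; the full codebook linked above is the authority.

```python
from itertools import product

# Assumed per-LSP step sets, inferred from the entries shown above:
# LSPs 1 and 2 move by -25, 0 or +25 Hz; LSPs 3 and 4 by -50, 0 or +50 Hz.
STEPS_HZ = [(-25, 0, 25), (-25, 0, 25), (-50, 0, 50), (-50, 0, 50)]


def build_delta_codebook(steps=STEPS_HZ):
    """Enumerate every combination of steps, with LSP 1 changing fastest."""
    # product() varies its last argument fastest, so pass the LSPs in reverse
    # order and flip each entry back to (LSP 1, LSP 2, LSP 3, LSP 4).
    return [tuple(reversed(entry)) for entry in product(*reversed(steps))]


def quantise_delta(delta_hz, codebook):
    """Return the index of the entry closest to delta_hz (squared error)."""
    def sq_err(entry):
        return sum((a - b) ** 2 for a, b in zip(delta_hz, entry))
    return min(range(len(codebook)), key=lambda i: sq_err(codebook[i]))


codebook = build_delta_codebook()
bits_needed = (len(codebook) - 1).bit_length()
print(len(codebook), bits_needed)
print(codebook[0], codebook[4], codebook[80])
```

With three choices for each of four LSPs there are 3 x 3 x 3 x 3 = 81 entries, which is where the 7 bit index comes from.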

Another twist: I discovered that I could get away with just updating the lowest LSPs (1-4) on the delta frames. On the odd (delta) frames I just copy the previous values for LSPs 5-10. This means just 7 bits/frame are required on the odd frames for LSPs.
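
On the decoder side, an odd frame could then be rebuilt roughly as in the Python sketch below. It assumes the delta is added to the decoded LSPs of the previous (even) frame; the function name and the example values are hypothetical.

```python
def decode_odd_frame(prev_lsps_hz, delta_hz):
    """Rebuild 10 LSPs for an odd frame.

    prev_lsps_hz: the 10 decoded LSPs of the previous (even) frame, in Hz.
    delta_hz: the 4-element delta codebook entry selected by the 7 bit index.
    """
    low = [p + d for p, d in zip(prev_lsps_hz[:4], delta_hz)]   # LSPs 1-4 updated
    high = list(prev_lsps_hz[4:])                               # LSPs 5-10 copied
    return low + high


prev = [300, 450, 800, 1200, 1700, 2000, 2300, 2600, 2900, 3200]
print(decode_odd_frame(prev, (25, 0, -50, 50)))
```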

A further reduction came from delta-coding the pitch (fundamental frequency), as it also changes slowly during perceptually important voiced speech frames. Just 3 bits on odd frames resulted in no loss of quality.

Next Steps

Over the next few weeks I will release a separate encoder/decoder version of 1400 bit/s Codec 2. At the moment you can run the same algorithm using the “c2sim” simulation program:

$ svn -r 306 co https://freetel.svn.sourceforge.net/svnroot/freetel/codec2-dev
$ cd codec2-dev && ./configure && make && cd src
$ ./c2sim ../raw/hts1a.raw --1400 -o hts1a_1400.raw
$ ../script/playraw.sh hts1a_1400.raw

I’d really like to see Codec 2 combined with a modem and running over the HF bands. Some early experimentation to get real world user feedback and rapid development of an “open” digital voice mode would be great. This mode could be implemented as a Linux or Windows PC application that uses two sound cards to connect to an SSB radio and headset.

A key issue to explore is robustness to bit errors. I favour unequal error protection modes, for example just a small amount of FEC that protects just a few key bits.

There are many areas where Codec 2 could be improved. The LSP quantisation could be developed further to improve the quality or lower the bit rate. I’d also like to work on the model used to synthesise phases at the decoder, and track down some issues with different speakers.

Links

Codec 2 Project Page

49 comments to Codec 2 at 1400 bits/s

  • EXCELLENT! OUTSTANDING! Thanks for all your hard work on this project. HF DV will rise again on the hambands! I remember when there were over 500 members in the original HF DV group. They will be back in force to provide “real world” user feedback for CODEC 2. An FDM modem has shown to be quite robust with minimal latency for fast tx>rx and rx>tx similar to the SSB experience without QRN/QSB! Yes, just a small amount of FEC to fill in the “notches” created by adverse signal conditions (deep multipath and QRM) would help avoid the dropouts experienced with FDMDV’s excellent modem.

    Mel, K0PFX

  • Congratulations David! As you have said, this is a huge achievement and a massive contribution to the world. Thanks for sharing your progress with us.

    This is more motivation to get my licence – I look forward to talking to you over the ham bands with codec2.

  • Bob McGwier

    In order to mitigate HF channels, typically a combination of frequency and time diversity is needed to provide maximum benefit. This typically requires FEC that spreads the impact of each data bit out in the serial stream which is emitted from the FEC (whether or not it is a block code). The frequency diversity is easily achieved by using FDM (OFDM being one type) by taking the serial stream from the FEC and doing a serial to parallel translation and then encoding the bits across the frequencies after the parallel data is ready. If a large block code is used AND the data is rearranged by a permutation, then we achieve both frequency and time diversity. It is intuitively clear and mathematically provable that this is required to maximise the “distance” between right and wrong decoding paths. This takes typical HF errors (which come in bursts) and spreads them out in time while the energy of each data bit is spread out in time and frequency and thus allows the FEC to do a much better job. It doesn’t matter if we are protecting a few really important bits or all bits, these statements remain true. Some latency is REQUIRED therefore for really robust communications. It would be a shame to come up with a Codec that equals or improves upon MELP and then kill it with an inadequate physical layer to communicate it.

    • david

      What sort of delay did you have in mind Bob?

      • Bob McGwier

        David:

        It depends on the channel (the doppler spread and coherence time).

        This is such great work David. I really want to make sure it is presented on the air in a way that will encourage adoption and experimentation. So, we always want to use the least latency inducing permutation and code that will allow the errors to be nearly uncorrelated when they are presented to the FEC decoder.

  • Graham Bryce

    That really is a dramatic bitrate reduction – I can understand why you’re feeling so chuffed! I only heard about Codec2 over the summer, so I’m very encouraged that you’ve continued to make such dramatic progress. My cheque’s in the post, as it were… ;o)

  • Hi Dave,

    Congratulations on your achievement with codec-2. The 1400 bit voice samples sound very good.

    Thanks,

    Tony, K2MO

  • I was thinking of doing something similar on VHF. A radio ham from the local club and I were thinking of using something like this codec (initially the GSM codec) over 9K6 and using any spare bandwidth to send low resolution webcam frames too. Think you guys are on to something. I’d be very interested if there is an example program that I can use; I see you mention combining it with a modem. Thanks ever so much! Best Regards, George, M1GEO.

  • Oliver Goldenstein

    Hi David and All !

    Your codec works beautiful using the ghpsdr3 framework and QtRadio.

    I did the following recording transmitting with a Softrock SDR, QtRadio and codec2 (latest stable version).

    http://www.youtube.com/watch?v=vsIdDfSHsWA

    For more info on the project please see:

    http://napan.ca/ghpsdr3/index.php/Main_Page

    73 Oliver, DL6KBG

  • Oliver Goldenstein

    Thanks David !

    Credits go to Alex Lee 9V1Al and John Melton G0ORX/N6LYT.

    I am only one of the heavy beta testers.

    My recording in that video was done with a Samsung G-Track Studio Microfon. What is that clipping noise you see and hear on the Spectrum?

    While the recording was made, everything was totally quiet. Only my breath and voice.

    73 Oliver DL6KBG

  • david

    Rick has a few interesting comments on the issue of overhead for VOIP over Wifi here.

    Low protocol overheads are also of interest to me for HF mesh applications.

  • How does this work for tonal languages like Chinese? I know that original GSM codecs had problems. As codec2 is going to be used by people around the world, it would be good to verify non-English words?

  • Alex

    You should probably put a warning on the release about US export control regs; specifically, 5A001.b.6, which restricts codecs operating at less than 2.4kbps. Dunno whether any Australian ones apply.

  • Tony

    It cannot be exported to Cuba, Iran, and North Korea under 740.2(6) from the US.

  • Tony

    Alex – I believe Codec2 Does not apply to US export controls except as listed above, nor does it need reporting.

    However, you’re quite right that Codec2 does fall under 5A001.b.6.

    Here’s the thought process:
    1)relevant definition of 5A001.b.6:
    “Employing functions of digital “signal processing” to provide ‘voice coding’ output at rates of less than 2,400 bit/s.”
    Source here: http://goo.gl/xTm9c

    2)Relevant exemption to export control (same source as above):
    “LVS: N/A for 5A001.a, b.5, e, and h; $5000 for 5A001b.1, b.2, b.3, b.6, d, f, and g; $3000 for 5A001.c.”

    3)Export control exemption “LVS” is “shipments of limited value”. That means that for the above section in bold, anything less than $5000 is exempt.
    Source Here:http://goo.gl/VZPTc

    4)Export control exemption “LVS” also has reporting requirements per “§743.1 of the EAR”. This section can be
    Found Here:http://goo.gl/gjYn4

    The relevant section is 743.1(4)(c):
    “(c) Items for which reports are required. (1) You must submit reports to BIS under the provisions of this section only for exports of items controlled under the following ECCNs”

    The relevant section is 743.1(4)(c)(1)(v) “Category 5″ (same link as above).

    There is no mention of ECCN 5A001.b.6.

  • Tim,

    GSM 06.20 works OK for Chinese. You might be thinking of the original VSELP codec for the US TDM cellular standard. A high pitched woman’s voice speaking Cantonese sounds pretty bad with that codec.

  • Tony

    correction, the reporting section is 743.1(c) not (4)(c)

  • Alex

    Tony: Ah, that makes sense. Mind you, I wouldn’t put it past the US authorities to regard ‘posting to the internet’ as tantamount to ‘exporting to North Korea’.

  • Tony

    There’s hundreds (thousands?) of other open source projects such as PGP or OpenSSL that would be more concerning to any gungho US authority. A simple post about that issue is usually all it takes to CYA.

  • akib

    really amazing work.
    is there any port created for asterisk??
    can we use this in asterisk??
    if yes how can we??

  • akib

    please explain your words

  • charles t lester N5RWJ

    Why not test the codec with blind people?

  • Don Barr KA2YDX

    Have you done any PESQ scoring of the codec at various bitrates? Does it contain SID/VAD functionality to compress out silence?

      • so...

        As I was looking for some test results on the speech quality of Codec2 I got the impression (from the comments here and mailinglist thread “MOS quality contest for codec2″) that you haven’t been using quality tests at all so far. As far as I know, extensive testing is seen as being crucial in the development of new codec standards, for example. While real tests with real ears and brains are ridiculously expensive, tools like PESQ have been developed that give some of the benefits at practically no cost.
        So, if you’re not yet making massive use of such tools all the time you may be missing a very valuable aid in your development.
        Now I just came to realise that with the official ITU standards here like PESQ or POLQA those f***** b******* don’t give those tools out to the world for free like one would expect from being used to development processes of internet standards or Free Software and instead also try to thwart progress through license fees.
        Fair enough – they at least tell you how it works and give some pseudo-code…

        I think someone(tm) should donate a license to Dave to help this wonderful project. And in the long run someone(tm) should make us FLOSS versions of the POLQA and/or PExQ test suites…

        • david

          I wouldn’t say formal subjective testing is essential. It would be possible to crowd source subjective testing using a web site that plays pairs of samples to listeners which they rank A/B. The PESQ/POLQA guys do have eval licenses available for people like us.

  • Don Barr KA2YDX

    Ran your test files at the top of this page through a tool we have here. Above 3 for all your male files – pretty nice.

    My tool might not be scoring them 100% accurately – it normally uses a set reference file that it’s designed to score against (as your GSM ref should be higher) and it’s designed to use a Mu/ALaw reference.

    Talk to me privately if you want to run some samples through I’ll provide the info for you.

    Codec2-1400-mu.pcm: 3.101
    Codec2-2500-male-mu.pcm: 3.275
    Codec2-gsmref-male-mu.pcm: 3.553

    Codec2-1400-female.pcm: 2.811
    Codec2-2500-female.pcm: 3.051
    Codec2-female-gsmref.pcm: 3.561

    • david

      That’s interesting Don – the tool seems to be grading the samples in about the right order.

      • Don Barr KA2YDX

        FYI – for the product that I work on for a living the expected passing range for GSM_EFR would be from 3.6-3.7. AMR-NB at 12_2 would be 3.7-3.8.

        Given that, I think my tool is underestimating your PESQ in general – so it could even be better than what I posted before..

    • yo8rna

      Thanks Don for sharing your results.
      I’ve compiled in cygwin the PESQ sources from ITU:
      > http://www.itu.int/rec/T-REC-P.862-200511-I!Amd2/en and I’ve run it against some of ITU’s official samples (the ones for certification).
      Can you help with their interpretations, as from what I understand some of the values seem too low. Maybe I understand them wrong…
      ./pesq.exe +8000 FULL_8kHz_A_eng.wav FULL_8kHz_A_eng.wav_1200.wav P.862.2 Prediction (MOS-LQO): = 2.463
      ./pesq.exe +8000 FULL_8kHz_A_eng.wav FULL_8kHz_A_eng.wav_1400.wav P.862.2 Prediction (MOS-LQO): = 2.528
      ./pesq.exe +8000 FULL_8kHz_A_eng.wav FULL_8kHz_A_eng.wav_1500.wav P.862.2 Prediction (MOS-LQO): = 2.527
      ./pesq.exe +8000 FULL_8kHz_A_eng.wav FULL_8kHz_A_eng.wav_2500.wav P.862.2 Prediction (MOS-LQO): = 2.882
      ./pesq.exe +8000 FULL_8kHz_B_eng.wav FULL_8kHz_B_eng.wav_1200.wav P.862 Prediction (Raw MOS, MOS-LQO): = 2.282 1.890
      ./pesq.exe +8000 FULL_8kHz_B_eng.wav FULL_8kHz_B_eng.wav_1400.wav P.862 Prediction (Raw MOS, MOS-LQO): = 2.362 1.975
      ./pesq.exe +8000 FULL_8kHz_B_eng.wav FULL_8kHz_B_eng.wav_1500.wav P.862 Prediction (Raw MOS, MOS-LQO): = 2.310 1.919
      ./pesq.exe +8000 FULL_8kHz_B_eng.wav FULL_8kHz_B_eng.wav_2500.wav P.862 Prediction (Raw MOS, MOS-LQO): = 2.548 2.194
      ./pesq.exe +8000 FULL_8kHz_Ch.wav FULL_8kHz_Ch.wav_1200.wav P.862.2 Prediction (MOS-LQO): = 2.367
      ./pesq.exe +8000 FULL_8kHz_Ch.wav FULL_8kHz_Ch.wav_1400.wav P.862.2 Prediction (MOS-LQO): = 2.455
      ./pesq.exe +8000 FULL_8kHz_Ch.wav FULL_8kHz_Ch.wav_1500.wav P.862.2 Prediction (MOS-LQO): = 2.415
      ./pesq.exe +8000 FULL_8kHz_Ch.wav FULL_8kHz_Ch.wav_2500.wav P.862.2 Prediction (MOS-LQO): = 2.954
      ./pesq.exe +8000 FULL_8kHz_Fr.wav FULL_8kHz_Fr.wav_1200.wav P.862 Prediction (Raw MOS, MOS-LQO): = 0.853 1.130
      ./pesq.exe +8000 FULL_8kHz_Fr.wav FULL_8kHz_Fr.wav_1400.wav P.862.2 Prediction (MOS-LQO): = 2.504
      ./pesq.exe +8000 FULL_8kHz_Fr.wav FULL_8kHz_Fr.wav_1500.wav P.862.2 Prediction (MOS-LQO): = 2.473
      ./pesq.exe +8000 FULL_8kHz_Fr.wav FULL_8kHz_Fr.wav_2500.wav P.862.2 Prediction (MOS-LQO): = 2.908
      ./pesq.exe +8000 FULL_8kHz_Ger.wav FULL_8kHz_Ger.wav_1200.wav P.862 Prediction (Raw MOS, MOS-LQO): = 0.684 1.101
      ./pesq.exe +8000 FULL_8kHz_Ger.wav FULL_8kHz_Ger.wav_1400.wav P.862 Prediction (Raw MOS, MOS-LQO): = 2.332 1.942
      ./pesq.exe +8000 FULL_8kHz_Ger.wav FULL_8kHz_Ger.wav_1500.wav P.862 Prediction (Raw MOS, MOS-LQO): = 2.295 1.903
      ./pesq.exe +8000 FULL_8kHz_Ger.wav FULL_8kHz_Ger.wav_2500.wav P.862 Prediction (Raw MOS, MOS-LQO): = 2.528 2.170
      ./pesq.exe +8000 FULL_8kHz_Ru.wav FULL_8kHz_Ru.wav_1200.wav P.862 Prediction (Raw MOS, MOS-LQO): = 2.204 1.812
      ./pesq.exe +8000 FULL_8kHz_Ru.wav FULL_8kHz_Ru.wav_1400.wav P.862 Prediction (Raw MOS, MOS-LQO): = 2.352 1.964
      ./pesq.exe +8000 FULL_8kHz_Ru.wav FULL_8kHz_Ru.wav_1500.wav P.862 Prediction (Raw MOS, MOS-LQO): = 2.250 1.856
      ./pesq.exe +8000 FULL_8kHz_Ru.wav FULL_8kHz_Ru.wav_2500.wav P.862 Prediction (Raw MOS, MOS-LQO): = 2.586 2.243
      ./pesq.exe +8000 FULL_8kHz_Sp.wav FULL_8kHz_Sp.wav_1200.wav P.862.2 Prediction (MOS-LQO): = 2.504
      ./pesq.exe +8000 FULL_8kHz_Sp.wav FULL_8kHz_Sp.wav_1400.wav P.862.2 Prediction (MOS-LQO): = 2.591
      ./pesq.exe +8000 FULL_8kHz_Sp.wav FULL_8kHz_Sp.wav_1500.wav P.862.2 Prediction (MOS-LQO): = 2.563
      ./pesq.exe +8000 FULL_8kHz_Sp.wav FULL_8kHz_Sp.wav_2500.wav P.862.2 Prediction (MOS-LQO): = 3.030

      Your comments would be very valuable.

      • Shyam

        See the PESQ Document it is clearly mentioned that PESQ is not validated for codecs <4kbps :) u cannot trust above results better try new algo POLQA but damn crap thing is POLQA must be licensed all people wont have good thoughts as david has

  • wiz

    Hi David,

    Very nice :) . Do you have any idea about MIPS and RAM needed? I am wondering if I could port to PIC32? 80MIPS limited RAM, executable flash, big SD card?

    Wiz

    • david

      At this stage I haven’t considered MIPs and RAM and it’s in floating point. But eventually I would like to see it running on a 32 bit microcontroller.

  • wiz

    Hi David,

    I’ve got a PIC32 box running. I thought it would be a neat application:). Nice to see your progress. I am most impressed.

    I was involved in 202 modem in the early days and despite its 1200 baud reputation, depending upon implementation it could do up to 1800 baud. I did a version in software, so I was wondering whether codec 2 could fit in too and of course a ulaw codec.

    Sounds like a pretty big chore given my coding skills! Maybe someone smarter?

    Of course we could include the HF radio as well since that’s just software these days too :) .

    warm regards,

    Wiz

  • Ron Cook

    Hi David,
    Most interesting. Amazing really. I played around with some narrow band digital voice on HF a few years ago but quality was poor. We used two sound cards on a laptop. Audio was sent and received over SSB systems. Any thoughts on adapting the codec to run on such a system?

    Cheers,

    Ron

    • david

      Hello Ron,

      Yes that sort of system is what we are trying to build now, something like FDMDV. I’m working on a modem suitable for the codec right now.

      Cheers,

      David

  • Aon

    But this is high latency at 40ms per frame, in order to achieve 1400 bits/s. I wonder what the bits/s compression performance would be with a 20ms frame. Probably somewhere close to the 2550 bits/s performance of the original codec.

  • anon

    David, but this is high latency at 40ms per frame, in order to achieve 1400 bits/s, isn’t it?

    I wonder what the bits/s compression performance would be with a 20ms frame. Probably somewhere close to the 2550 bits/s performance of the original codec, no?

    • david

      Yes the additional latency helps get a lower bit rate.

      • anon

        Ok, I know telephone conversations or conference calls are not the use case for this codec. But what about low bit rate while still being useful for interactive audio applications?

        David how much lower can you improve the bit rate while maintaining a 20ms frame?

        (I’m thinking of applications where this audio data would piggy back onto GSM full rate data, like where the voice call is encoded then encrypted and inserted into the GSM data, and on the other end the GSM stream is decrypted and then decoded to yield the voice call. This way people could talk securely without being eavesdropped on.)

        Regards.

        • david

          Predictive coding (sending differences) would lower the bit rate while keeping the frame rate constant. This works well if bit errors are not an issue.

          However I’m not sure it’s a problem worth solving – over VOIP most people wouldn’t notice the difference in latency in a conversation between 20 and 40ms codec buffering. I’ve worked on VOIP systems where we concatenated 4 GSM frames (80ms) and no one noticed the latency. Other sources of delay (like the IP network, jitter buffer) are usually much greater than the codec buffering.

          • anon

            Well I didn’t have in mind VOIP, just regular digital GSM cell phone networks. From one cell phone to another. The frames are typically sent every 20ms, and quality of service guarantees usually mean jitter tends to not be an issue.

            (International phone calls are another matter.)

  • umanga

    I wish I could marry your brain :)

    • John

      That comment was very good. I couldn’t stop laughing.

      Seriously though, this was all very interesting reading. I was looking to buy the AOR ARD9000MKii Digital Voice Modem on ebay but got outbid today. That made me a little mad so I decided maybe I ought to try to build one instead and then I ran across this page. Very nice work David. I commend you. Now I have something interesting to spend my retirement time on.

  • Robert

    Have you considered making a wide band version (7kHz audio bandwidth)? This could help improve intelligibility of unvoiced sounds. Using an 8kHz sample rate relies on the premise that all speech frequencies are below 4kHz, which is certainly not true.

    • david

      Yes I have thought about wideband, with a small increase in bit rate (like extending the LP model to say 14th order) this should be possible. Although I am not convinced it would add a lot to intelligibility – we have been using 3kHz band limited speech for telephones for 100 years and that seems to work Ok. Still, something I would like to try in future.
