Codec 2 700C

My endeavor to produce a digital voice mode that competes with SSB continues. For a big chunk of 2016 I took a break from this work as I was gainfully employed on a commercial HF modem project. However since December I have once again been working on a 700 bit/s codec. The goal is voice quality roughly the same as the current 1300 bit/s mode. This can then be mated with the coherent PSK modem, and possibly the 4FSK modem for trials over HF channels.

I have diverged somewhat from the prototype I discussed in the last post in this saga. Lots of twists and turns in R&D, and sometimes you just have to forge ahead in one direction leaving other branches unexplored.

Samples

Sample 1300 700C
hts1a Listen Listen
hts2a Listen Listen
forig Listen Listen
ve9qrp_10s Listen Listen
mmt1 Listen Listen
vk5qi Listen Listen
vk5qi 1% BER Listen Listen
cq_ref Listen Listen

Note the 700C samples are a little lower in level, an artifact of the post filtering as discussed below. What I listen for is intelligibility: how easy is the sample to understand compared to the reference 1300 bit/s samples? Is it muffled? I feel that 700C is roughly the same as 1300. Some samples are a little better (cq_ref), some (ve9qrp_10s, mmt1) a little worse. The artifacts and frequency response are different. But close enough for now, and worth testing over air. And hey – it’s half the bit rate!

I threw in a vk5qi sample with 1% random errors, and it’s still usable. No squealing or ear damage, but perhaps more sensitive than 1300 to the same BER. Guess that’s expected, every bit means more at a lower bit rate.

Some of the samples like vk5qi and cq_ref are strongly low pass filtered, others like ve9qrp are “flat” spectrally, with the high frequencies at about the same level as the low frequencies. The spectral flatness doesn’t affect intelligibility much but can upset speech codecs. Might be worth trying some high pass (vk5qi, cq_ref) or low pass (ve9qrp_10s) filtering before encoding.

Design

Below is a block diagram of the signal processing. The resampling step is the key: it converts the time varying number of harmonic amplitudes to a fixed number (K=20) of samples. They are sampled using the “mel” scale, which means we take more finely spaced samples at low frequencies, with coarser steps at high frequencies. This matches the log frequency response of the ear. I arrived at K=20 by experiment.
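A minimal Python sketch of this kind of mel-spaced resampling follows. Only K=20 and the use of the mel scale come from the text above; the 200 to 3700 Hz band edges, the linear interpolation between harmonics, and all function names are assumptions made for illustration. This is not the project's Octave or C code.

```python
import math

def hz_to_mel(f_hz):
    # Common mel-scale formula (one of several in use).
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_sample_freqs(k=20, f_lo=200.0, f_hi=3700.0):
    """K frequencies spaced evenly in mel between f_lo and f_hi (assumed band edges)."""
    m_lo, m_hi = hz_to_mel(f_lo), hz_to_mel(f_hi)
    step = (m_hi - m_lo) / (k - 1)
    return [mel_to_hz(m_lo + i * step) for i in range(k)]

def resample_harmonics(amps_db, f0_hz, k=20, f_lo=200.0, f_hi=3700.0):
    """Map a variable number of harmonic amplitudes (dB) onto K mel-spaced samples.

    amps_db[m] is the amplitude of harmonic m+1, located at (m+1)*f0_hz.
    Linear interpolation is an assumption; values are held flat outside
    the span of the harmonics.
    """
    n = len(amps_db)
    out = []
    for f in mel_sample_freqs(k, f_lo, f_hi):
        if f <= f0_hz:
            out.append(amps_db[0])
        elif f >= n * f0_hz:
            out.append(amps_db[-1])
        else:
            # Index of the harmonic just below f.
            j = max(0, min(int(f // f0_hz) - 1, n - 2))
            t = (f - (j + 1) * f0_hz) / f0_hz
            out.append(amps_db[j] + t * (amps_db[j + 1] - amps_db[j]))
    return out
```

Whatever the pitch, and hence however many harmonics fit in the band, the output always has K entries, which is what lets a fixed-size VQ be applied to every frame.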

The amplitudes and even the Vector Quantiser (VQ) entries are in dB, which is very nice to work in and matches the ear’s logarithmic amplitude response. The VQ was trained on just 120 seconds of data from a training database that doesn’t include any of the samples above. More work required on the VQ design and training, but I’m encouraged that it works so well already.

Here is a 3D plot of amplitude in dB against time (300 frames) and the K=20 frequency vectors for hts1a. You can see the signal evolving over time, and the low levels at the high frequency end.

The post filter is another key step. It raises the spectral peaks (formants) and lowers the valleys (anti-formants), greatly improving the speech quality. When the peak/valley ratio is low, the speech takes on a muffled quality. This is an important area for further investigation. Gain normalisation after post filtering is why the 700C samples are lower in level than the 1300 samples. Need some more work here.
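To make the idea concrete, here is a small Python sketch of a post filter working on a vector of dB samples. It is only one plausible way to raise peaks and lower valleys: the gain of 1.5, the use of the vector mean as the pivot, and the energy-preserving normalisation are assumptions for illustration, not a description of the actual 700C post filter.

```python
import math

def post_filter_db(vec_db, gain=1.5):
    """Expand peaks and valleys about the mean, then restore total energy.

    vec_db: spectral amplitude samples in dB.
    gain:   hypothetical expansion factor (>1 sharpens formants).
    """
    mean = sum(vec_db) / len(vec_db)
    # Peaks (above the mean) go up, valleys (below the mean) go down.
    expanded = [mean + gain * (v - mean) for v in vec_db]

    # Gain normalisation: shift the whole vector so the linear energy
    # matches the input energy.
    e_in = sum(10.0 ** (v / 10.0) for v in vec_db)
    e_out = sum(10.0 ** (v / 10.0) for v in expanded)
    correction_db = 10.0 * math.log10(e_in / e_out)
    return [v + correction_db for v in expanded]
```

With this form the peak/valley spread in dB grows by exactly the gain factor, while the normalisation step decides the final output level, which is where a level difference between modes could creep in.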

The two stage VQ uses 18 bits, energy 4 bits, and pitch 6 bits for a total of 28 bits every 40ms frame, which gives 700 bit/s. Unvoiced frames are signalled by a zero value in the pitch quantiser, removing the need for a voicing bit. The codec doesn’t use differential-in-time encoding, which makes it more robust to bit errors.
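The bit budget can be checked with a few lines of Python. The field widths and the 40ms frame come from the paragraph above; the field order, the function names, and treating the VQ as a single 18 bit field are assumptions for illustration and do not reflect the real bit packing.

```python
FRAME_MS = 40
# (name, width in bits); the order is hypothetical.
FIELDS = (("vq", 18), ("energy", 4), ("pitch", 6))

def bits_per_frame():
    return sum(width for _, width in FIELDS)

def bit_rate_bps():
    # 28 bits every 40 ms
    return bits_per_frame() * 1000 / FRAME_MS

def pack_frame(values):
    """Pack the field values into one integer, first field in the top bits."""
    word = 0
    for name, width in FIELDS:
        v = values[name]
        if not 0 <= v < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        word = (word << width) | v
    return word

def unpack_frame(word):
    values = {}
    for name, width in reversed(FIELDS):
        values[name] = word & ((1 << width) - 1)
        word >>= width
    return values

def is_voiced(values):
    # A pitch index of zero signals an unvoiced frame, so no voicing bit is needed.
    return values["pitch"] != 0
```

For example, `bit_rate_bps()` evaluates 28 * 1000 / 40, and a frame with `pitch` set to 0 is treated as unvoiced.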

Days and days of very careful coding and checks at each development step. It’s so easy to make a mistake or declare victory early. I continually compared the output speech to a few Codec 2 1300 samples to make sure I was in the ball park. This reduced the subjective testing to a manageable load. I used automated testing to compare the reference Octave code to the C code, porting and testing one signal processing module at a time. Sometimes I would just printf rows of vectors from two versions and compare the two, old school but quite effective at spotting the step where the bug crept in.

Command line

The Octave simulation code can be driven by the scripts newamp1_batch.m and newamp1_fby.m, in combination with c2sim.

To try the C version of the new mode:

codec2-dev/build_linux/src$ ./c2enc 700C ../../raw/hts1a.raw - | ./c2dec 700C - - | play -t raw -r 8000 -s -2 -

Next Steps

Some thoughts on FEC. A (23,12) Golay code could protect the most significant bits of the 1st VQ index, pitch, and energy. The VQ could be organised to tolerate errors in a few of its bits by sorting to make an error jump to a ‘close’ entry. The extra 11 parity bits would cost 1.5dB in SNR, but might let us operate at significantly lower SNR on a HF channel.
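One way to arrive at the 1.5dB figure is to assume the transmit power stays fixed, so sending 39 bits per frame instead of 28 spreads the same energy over more bits. The Python sketch below works that out; the fixed-power assumption and the function names are illustrative and not taken from the project code.

```python
import math

PAYLOAD_BITS = 28          # codec bits per 40 ms frame
GOLAY_N, GOLAY_K = 23, 12  # (23,12) Golay: 12 data bits in, 23 bits out
FRAME_MS = 40

def fec_overhead_db(payload_bits, parity_bits):
    """Energy-per-bit penalty in dB from sending extra parity bits at fixed power."""
    return 10.0 * math.log10((payload_bits + parity_bits) / payload_bits)

def channel_bit_rate_bps(payload_bits, parity_bits, frame_ms=FRAME_MS):
    return (payload_bits + parity_bits) * 1000 / frame_ms

parity_bits = GOLAY_N - GOLAY_K  # 11 parity bits protect 12 of the 28 bits
overhead_db = fec_overhead_db(PAYLOAD_BITS, parity_bits)
rate_bps = channel_bit_rate_bps(PAYLOAD_BITS, parity_bits)
```

Here `overhead_db` is 10*log10(39/28), a little under 1.5dB, and the over-the-air rate rises from 700 to 975 bit/s. The code pays for itself only if the coding gain on the protected bits exceeds that penalty.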

Over the next few weeks we’ll hook up 700C to the FreeDV API, and get it running over the air. Release early and often – let’s find out if 700C works in the real world and provides a gain in performance on HF channels over FreeDV 1600. If it looks promising I’d like to do another lap around the 700C algorithm, investigating some of the issues mentioned above.

70 thoughts on “Codec 2 700C”

  1. sounds great … hopefully we can also add that mode to mchf firmware like it is done with the other mode :-)

  2. Thanks for sharing David. This is very interesting work. The 700C to me seems almost the same as the 1300. I’m looking forward to hearing how it works on HF.

    73,
    Rob
    KL7NA

  3. Amazing job. Some of the 700C samples sound even better than 1300, and none was noticeably worse. That’s really impressive.

    Jacek
    KW4EP

  4. Excellent work! The 700C samples are definitely in the ballpark of the 1300 samples. While I can’t agree that the “quality” of the 700C samples is better – I am sensitive to the artifacts of the lower bitrate codec. However, on about half of the samples, the readability was better than the corresponding 1300 sample, and to me, that’s the most important aspect. I’m looking forward to trying this one on air.

  5. Great work! David, you definitely are on the right track. I am looking forward to testing it over long distances. Thank you for your outstanding work.
    Gerhard OE3GBB

  6. My hearing is pretty good. The 1300 and 700C samples are remarkably similar in quality and in particular understandability.
    So, well done. Yes, add the FEC as this improves DMR quite a lot.
    Really impressive.

  7. Very clever approach! I developed something similar in SDR# for noise reduction where the high amplitude bins are “boosted” and the weak ones are damped.
    I propose adding some comfort noise in the decoder and shape it with heuristics.

  8. I have no previous experience listening to voice digitized at such a low data rate, but overall I’m favorably impressed. This is very good work.

    I listened to these samples alternating between each bit rate for the first listen. The 700 bit/s alternatives all seemed distinctly less intelligible to me. The most extreme example, to my ears, was the vk5qi 1% BER sample, where the 700 bit/s version appeared to turn the word “testing” into “toting.” That’s how I still hear it after multiple listens.

    I would suggest that it would be better to bias toward the higher data rates. The SDRs of the future will allow the receiver to replay, and even to re-process, the received signal, reducing the need for the sender to repeat a transmission and therefore substantially increasing the effective bandwidth available to each sender.

    I wrote an ultra-short sketch of how contesting can work with SDR technology in Relay, the Mike & Key club newsletter, at http://www.mikeandkey.org/newsletters/Relay_2014_10a.pdf (page 8). I invite everyone to take a fresh look at the potential of this technology rather than trying to fit it into existing regulations and operating traditions.

    1. SDR can’t improve on the laws of physics. To get through a low SNR channel (like the HF channel under extreme conditions) we need a low bit rate.

      1. Channel capacity is proportional to the bandwidth times the log of 1+S/N. Bandwidth isn’t fixed, and with MIMO and other techniques, S/N isn’t either.

        We know that communications systems work better when wider channels are shared than when narrower channels are used exclusively; this is a clear conclusion from all the work done over the last 20 years by the cellular telephone industry.

        Your work can help us show the FCC that we need to escape the old limits based on single-user channel widths, and it’ll be better for everyone to find the real minimum data rate for good intelligibility rather than to force-fit your results into limits that need to be made obsolete anyway.

        1. Yes the cohpsk modem used with the 700 bit/s codec uses diversity to help with frequency selective fading. The low codec source rate means we can fit this into a standard SSB radio channel, the default at present. We are now also able to use FSK with Class C PAs for voice over a similar channel, due to the low codec bit rate.

          Several years of on-air experience have shown the voice quality of Codec 2 at 1300 bit/s to be a reasonable alternative to SSB, and the balance of comments above indicates 700 is roughly the same. We shall soon see when we get it on the air. If the quality needs further improvement – it will happen.

          Your proposed approach has some merit – but wideband is not a free lunch and there is much work between where you are today and your vision. Our current approach also has merit, is getting incrementally better over time, and importantly is based on real world experience, working code, and can be tested in today’s regulatory environment.

          Alas I only have so much time, but I would be very happy to see others working in the wideband area in their own volunteer time, and welcome them to use my work as a starting point.

          1. Sounds good, and I appreciate the work you’re doing.

            I dropped the ball in my previous reply, omitting a concept I originally meant to include. SNR and channel capacity can also be improved by reducing range. This means the 1,300 bit/s codec won’t have the range of the 700 bit/s codec, all else being equal, but it may be of more net benefit to amateur radio to make that tradeoff. Reliable long-range digital communications should make HF substantially more popular, thereby increasing the opportunities for message relaying (I keep going back to the origins of the ARRL there :-) instead of pure point-to-point communication. The cellphone system relies on relaying; with the right technology, we can too.

            Just wanted to get that out there, but I think you’ve already addressed the basic issue. Thanks again.

    2. Peter,
      I agree with much of what you say, especially the part about how a wide shared channel is better than a bunch of narrow dedicated ones. Not only is this the lesson of mobile telephony, but even more so the Internet. I’d like to design a medium speed packetized VHF/UHF modem optimized for VoIP, but able to carry other traffic as well. This will make interworking with other terminals much easier (that’s what the Internet was all about). By “medium speed” I mean a few hundred kb/s, fast enough to minimize store-and-forward delay but slow enough to not use excessive amounts of spectrum or excessive peak transmitter power. The big question is how to do multipath mitigation at this rate. Our experience at Qualcomm way back in the early days of CDMA showed that at 900 MHz, the most significant multipath components were within 10 microseconds or so, and often within just a few microseconds. This is long enough to imply that some multipath mitigation might be necessary, perhaps OFDM with a minimal set of carriers. You only need to reduce the symbol rate of each carrier enough to mitigate multipath; going overboard with lots of very slow carriers makes it harder to track Doppler, and I want this thing to be usable by mobiles.

      You also say “The SDRs of the future will allow the receiver to replay, and even to re-process, the received signal, reducing the need for the sender to repeat a transmission”.

      On this I have to differ. Replaying a signal that has already been received does nothing to improve its SNR, and SNR limits channel capacity as you know. The problem is that the noise sequence is the same each time you replay. By requesting a retransmission of the same sequence and combining it with the one you’ve already received, the signal power will add coherently while the noise, being different each time, will add noncoherently, resulting in a 3 dB increase in SNR.

      The only time you can benefit from “replaying a received signal” is when processing limitations keep you from always taking full advantage the first time. This is becoming less and less common as CPUs get faster and especially more parallel. I write all my own demodulators to wring every last bit (!) of information from a given buffer of raw samples before proceeding to the next. If a signal is there I’ll get it, even if the modem hadn’t yet synchronized to it. Basically, I already iterate over each buffer as many times as it takes to detect synchronization or to conclude that there’s nothing there. Once I’ve given up, there’s no point to trying it all over again.

      The only drawback to this scheme is that my modems consume the most CPU time when fed pure noise, because they’re repeatedly searching for synchronization. Once a signal is acquired, it is rarely necessary to iterate anything except a Turbo decoder if one is in use. So I can generally use a squelch or some sort of out-of-band indication of when a signal is present to save CPU time; e.g., in a satellite modem, there’s no point in trying to decode a signal before the satellite is above the horizon.


  9. What’s the latency from speaker to listener, ie: time to convert audio to bitstream and back to audio?

    1. Unlike general purpose data compression this is lossy compression and can achieve much higher compression ratios and operate with a known latency.

  10. I know little about this tech, but it seems to me that, if you clean up the source signal, you would have less to encode. There seems to be a lot of “garbage” going on in the background. Can that be partially filtered out by level or some other simple means?

  11. Hi

    Great work. I actually prefer the 700C on several of the samples and the rest are more or less the same.

    Suggestion: Since you only have 18 bits total for the VQ, why use two stages? A single VQ of 18 bits should yield better results and be no problem to run on a decent DSP or cheap ARM SoC or even a high end MCU.

    Question: With the breakthroughs done in tasks like this using machine learning and access to easy to use tools like tensorflow, have you thought of investigating this technique for improving speech coding?

    Cheers

    1. 18 bits is 2^18 compares of K=20 vectors ….. could be significant CPU. Plus it is also more storage 2^18 * 20 * 4 bytes/float = 21Mbyte
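The numbers in this reply are easy to reproduce. The short Python sketch below compares a single 18 bit codebook with a two stage design; the 9 + 9 bit split between stages is an assumption (the post only gives the 18 bit total), as are the function names.

```python
K = 20               # samples per vector
BYTES_PER_FLOAT = 4

def codebook_entries(stage_bits):
    """Total codebook entries searched/stored for the given per-stage bit widths."""
    return sum(1 << bits for bits in stage_bits)

def codebook_bytes(stage_bits, k=K):
    return codebook_entries(stage_bits) * k * BYTES_PER_FLOAT

single_stage = (18,)
two_stage = (9, 9)   # assumed split of the 18 bits

single_bytes = codebook_bytes(single_stage)    # 2^18 * 20 * 4
two_stage_bytes = codebook_bytes(two_stage)    # 2 * 2^9 * 20 * 4
search_ratio = codebook_entries(single_stage) // codebook_entries(two_stage)
```

Under that assumed split, the single stage search needs 262144 vector compares per frame against 1024 for two stages, and roughly 21 Mbyte of table against about 82 kbyte.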

  12. As a non-native English speaker I have a hard time understanding the 700C samples.

    The cq_ref is the strangest one, it sounds better in 700C, but the second half of it is only understandable to me in 1300.

    1. Yes it does take some getting used to, even for English speakers. We are targeting the sort of quality experienced on low SNR SSB radio channels, which also takes some getting used to for lay people.

    1. Hi Phil – yes I found that on a page on your web site discussing DV. It has quite a “spectral slope” across it (low freq energy much higher than high freqs) that tends to upset vocoders, so I use it to stress vocoders I develop.

      1. Interesting. I must have recorded it well over 20 years ago, so I don’t remember what I used for equipment. Probably a Sun workstation.
        I never thought anybody looked at that page…

        Great work, btw. But to be honest, I find all the low bit rate codecs somewhat tiring to listen to. It sounds like the speaker cone has a tear in it. You certainly do better than anybody else with the few bits you have, but for local VHF/UHF use I’m thinking we need higher rates (maybe opus at 6-8 kb/s) and faster and more efficient modems to really compete with FM. Three incompatible DV schemes are all widely used here in southern California: Fusion, DMR and D-Star, and I’m not impressed by any of them. When I looked at C4FM, I was very surprised to find that the tones aren’t even orthogonally spaced. No wonder it doesn’t do any (or much) better than FM despite the low bit rate!

        1. I agree that we should shoot for FM or better quality at VHF/UHF. The FreeDV2400A/SM2000 work is aiming at that. We also discovered that the C4FM/DMR modems are intentionally crippled and are doing much better with our 4FSK modem (e.g. DV at -135dBm using FreeDV 2400A).

          My own experience is that after a few minutes FreeDV 1600 is nicer than SSB when the channel is good. While the codec sounds a little artificial, the background noise from even high SNR SSB is very fatiguing for me – and it’s just gone with DV.

          I’m also planning a “broadcast” HF mode, long latency so we can use an LDPC code and perhaps enough bit rate for the low end of Opus. Time for DV to exceed analog in quality.

          1. How would you say that the C4FM/DMR modems are “crippled”? Is it more than the tones being too close together to be optimal? They’re usually demodulated noncoherently with an FM discriminator, so I thought I could do a better job with something more purpose-built. But then I saw how badly the signal is designed, and got discouraged.

          2. The modem waveform appears to under-perform theory for 4FSK by >6dB, as you pointed out due to the tone spacings. My background is sat-com modems where fractions of a dB matter so I burst into tears when I simulated this waveform :-). I searched but couldn’t find any justification. I can only assume, as you suggested, it was to save bandwidth, and perhaps to re-use legacy analog FM radio architecture. We can do much better, especially with control over the codec … hence the SM2000 and FreeDV 2400A waveform.

          3. Yes I’d like to prototype a 7kHz bandwidth (e.g. Skype, AM Radio) PTT radio system and see what it feels like to use.

          4. I’m not convinced by the intelligibility argument with wider audio bandwidth. The POTS telephone network has worked just fine for 100 years with a 3kHz bandwidth. SSB/FM PTT radio the same. However it might “feel” better to use. And with a decent modem there is no down side in terms of SNR/MDS.

          5. Just read the link you made from “intentionally crippled”. I’m pretty sure the crappy performance you saw with C4FM is because the tones are too close together for the symbol rate to be orthogonal. I.e., much of the carrier power is simply wasted instead of helping you determine what bits were sent.

            Orthogonality requires that the tone spacing be k Rs Hz, where k is a non-zero positive integer and Rs is symbol rate in Hz.

            C4FM uses “tones” of -2700, -900, +900 and +2700 Hz at a 4800 Hz symbol rate. The tone spacing is therefore only 1800 Hz, less than the 2400 Hz (minimum) needed for orthogonality. One way to look at this is if you try to demodulate with a 4-point FFT over a 1/4800 sec interval, the four symbol values won’t get a different integral number of cycles in that interval, so there will necessarily be “leakage” between the bins. Not all of the transmitted energy is helping you figure out which symbol was sent.

            This is a good example of what happens when radio people make a fetish out of “saving” bandwidth. You end up with power-inefficient modulation schemes that require everyone to turn up the wick, which also creates more interference for distant co-channel users, requiring greater geographical reuse spacing and therefore worse spectral efficiency where it really counts.
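The “leakage” described in this comment can be put into numbers. The Python sketch below computes the normalised correlation between two complex tones over one symbol period, which has magnitude |sin(pi x) / (pi x)| with x = tone spacing / symbol rate. The tone spacing and symbol rate are the values quoted in the comment; the function names are made up for illustration, and the k Rs rule is the one stated above.

```python
import math

def tone_correlation(spacing_hz, symbol_rate_hz):
    """|correlation| between two equal-power complex tones over one symbol.

    0.0 means the tones are orthogonal; 1.0 means they are indistinguishable.
    """
    x = spacing_hz / symbol_rate_hz
    if x == 0:
        return 1.0
    return abs(math.sin(math.pi * x) / (math.pi * x))

def is_orthogonal(spacing_hz, symbol_rate_hz):
    # The k * Rs rule: spacing must be a non-zero integer multiple of Rs.
    return spacing_hz > 0 and spacing_hz % symbol_rate_hz == 0

SYMBOL_RATE_HZ = 4800
C4FM_SPACING_HZ = 1800   # adjacent tones at -2700, -900, +900, +2700 Hz

leak = tone_correlation(C4FM_SPACING_HZ, SYMBOL_RATE_HZ)
ideal = tone_correlation(SYMBOL_RATE_HZ, SYMBOL_RATE_HZ)
```

With these figures `leak` comes out near 0.78, so adjacent tones look much alike over one symbol period, while `ideal` is zero to within floating point rounding.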

          6. I agree – it’s the bizarre but intentional choice of non-Rs spaced tones. Or if we are really concerned about RF bandwidth go coherent FSK demod – harder to implement but you get to use Rs/2 spacing. FreeDV2400A is >10dB more power efficient than C4FM by using a decent modem and lower bit rate voice compression.

            Re co-channel interference – phase noise causing adjacent channel interference is becoming a limiting factor in DMR channel spacings, and that is made worse by needing high tx powers due to crappy modem waveforms and hence poor link budgets. (Source: a colleague who makes DMR radios.) You could also save battery, have greater range, etc etc.

          7. Correction: the 1800 Hz tone spacing in C4FM is less than the 4800 Hz (minimum) required for orthogonality.

            I’m going to see if I can analyze C4FM from first principles and see if the BER performance matches your simulation results.

          8. It would be great to have you simulate the waveforms and compare results – thanks. Could be we missed something and this waveform can be made to perform better. I hope so, although our results seem to match the MDS results of commercial DMR radios.

          9. I hadn’t known about that problem with phase noise, but it merely underscores the need to overthrow the conventional wisdom that “narrower” == “more spectrally efficient”. This is a battle we (Qualcomm) fought back in the early 1990s when we first demonstrated CDMA. It was difficult to get people to accept that a signal 1.25 MHz wide could be more efficient than one 15 kHz wide. But it was, by a lot.

            BTW, all those original CDMA patents have expired by now, and some of the techniques might be interesting to hams. The closed-loop power control scheme was (IMHO) one of the most elegant features. It was needed to resolve the near-far problem with direct sequence spread spectrum, but even without spreading we hams could use it on repeaters to minimize co-channel interference. (Not sure how it could be used on simplex since it requires a full duplex channel.) We also used a variable rate codec without any kind of explicit header indicating the rate; the receiver simply tried decoding all four and picked the one with the correct CRC (if present) or otherwise the best Viterbi decoder metric. This allowed the transmitter to vary its power with speech activity on a per-frame basis (20 ms) to reduce average transmit power and therefore average QRM to co-channel users.

            The initial codec was “QCELP” with maximum rate of 9.6 kb/s. The telcos (our customers) actually asked to increase that to 14.4 kb/s to improve voice quality even at the expense of system capacity. Speech codecs have gotten a lot better in the past 25 years but they’re your field not mine (mine is modems and protocols). You seem to own the very low bit rate realm in terms of quality if not deployed base, but as I said I think we still need something better (and probably faster) to match NBFM. What do you think of opus? I understand it’s a marriage between speex (at the low end) and Vorbis (at the high end). I’ve briefly played with it at 6-8 kb/s (its low end) and was pretty impressed, but I’d have to listen more carefully to see what bit rate would really equal or better NBFM quality.

          10. Yes I remember listening to the Qualcomm variable rate cellular system played through some PoC hardware at a speech coding conference in the early 1990s. Very cool. Also know one of the engineers who worked on that codec.

            I agree a higher speech quality is needed to match, or rather exceed the quality of NBFM (e.g. 7kHz audio bandwidth Skype type quality). With a decent modem the link budget for UHF/VHF PTT radio would support Opus type bit rates. I’d also like to explore good quality speech at the 2-5 kbit/s range using Codec 2. This range is not covered very well by open source. For the past few years I’ve been focused on low SNR/low bit rate work, with 700C being the latest output.

          11. Low bit rate codecs are neat, but they’re really for niche applications like slow, standalone point-to-point links.

            They’re not as useful in networked applications. Today that implies packetization, usually with an IPv4 or IPv6 header, and keeping the latency tolerable means keeping the packet size so small that, at these low bit rates, the header overwhelms the voice data. I.e., there’s rapidly diminishing returns as you further lower the bit rate.

            Still, there are tricks such as IP header compression that can be useful, and because we’re hams used to half duplex PTT operation latency isn’t as important as in commercial full duplex telephony. Also, low (especially variable) rate voice is fine in non-real-time voice applications like voice mail. I think we hams could do more with voicemail-like applications in, e.g., emergency communications since much of it consists of record traffic anyway.

  13. Hey man, good stuff.

    One small ask, can you please add Supercache or add some cache to make your site faster? There’s a lot of clicks to hear and do stuff and the response is slow.

  14. I’m curious if the testing you are doing involves listening for intelligibility of the 700C samples standalone. For me, a few have unintelligible parts unless I listen to the 1300 sample as well. After that, if I hear the 700C one again, it makes more sense. Maybe it’s just me though.

    1. Low bit rate codecs (and indeed SSB) take some getting used to. I’m one guy working part time and this is a new algorithm with the bits barely dry. So my approach is to get 700C on the air ASAP and get people trying it in real conversations. Then iterate if necessary.

      1. This is amazing work btw. I had assumed a long time ago that it would be impossible for an individual to produce a low-bandwidth audio codec like this. Thank you for sharing your expertise with the radio community. :)

        1. It’s hard, and there aren’t many people who can work in this area, but you only need one person who will release the source code and explain what they are doing. That opens it up a little for others ….. the wonder of open source.

  15. Just had a listen and made comparisons with 1300 and 700C.
    Initially found little difference between the two. Wife was understanding a little better.
    When we worked out what the phrase was, it was interesting to note that I found the 700C easier to resolve.

    Keep up the good work. Can’t understand how you can code for the very many different variations of human speech notwithstanding the difference between male and female.

    Rgds
    Richard

    1. Thanks Richard. Passing the wife test is an important step!

      There are some general features between all speakers – for example the only real difference between male, female, and children is the pitch. Different input filtering and acoustic noise are tougher problems.

  16. I have been following the development of CODEC 2 for some time now, and I must congratulate Dave on his tremendous achievements. I have spent a lot of time in my work listening to MELP transmissions at data rates from 2400 to 600 bit/s. At comparable data rates, CODEC 2 is, without a doubt, the closest competitor. That it could be used at 2400 bits/s for high quality audio is probably assured.

    However, its use for HF voice transmission has not been as successful as I think we, as digital voice proponents, had wanted and hoped it would be. Given the extreme difficulty of matching analog SSB voice intelligibility at useful SNRs, I understand the new goal is to achieve better than low SNR analog SSB performance by means of a low bit rate version of CODEC 2.

    Whether or not the 700 bit/s mode is preferable to low SNR analog SSB certainly does remain to be seen/heard. Without doing the actual math the risk appears quite high that it will not perform much better than CODEC 2 1600. With the data rate only 3.6dB lower than 1600, even if somehow it requires 6dB less SNR, that’s only one S-unit, not hardly enough to make it competitive with low SNR analog SSB based on experiences with 1600 bit/s performance.

    I do very much agree with Phil with respect to bandwidth. If the goal is to exceed the performance of low SNR analog SSB, then there is no benefit to trying to achieve less than 2.7kHz bandwidth. If performance can be optimized by use of a full “standard” SSB voice channel bandwidth, certainly take advantage of that.

    The other competitor to digital voice, when used as a method for better intelligibility, are the various noise reduction algorithms available for processing analog SSB audio. Using a high quality receiver, I can (not everyone can), with some ease, copy SSB down to say two to three S-units above the noise floor depending upon the quality (frequency content) of the source audio. This is extended to one to two S-units above the noise floor when using the “NR2” noise reduction implementation in PowerSDR mRX.

    If the performance of the 700 bit/s CODEC can be transmitted in a way that obtains reliable performance at 12dB SNR (say two S-units above the noise) then it might be a viable method, with the argument then falling to whether or not the artifacts inherent in the speech coding make it more or less intelligible (or fatiguing/pleasing) to listen to. Certainly many of you are familiar with PESQ, which is a measurement standard for determination of speech quality. Indeed, a Google search showed a relevant Rowetel blog discussion in 2012. In 2012, I did subjectively agree with the low PESQ scores obtained for the CODEC at its current state of evolution at that time.

    Listening to the initial samples, certainly CODEC 2 700 is quite good when compared to the “Gold Standard”, MELPe 600. Nevertheless it’s only usable if it can a) be transmitted with sufficiently low BER at useful SNRs (e.g. ~2 S-units above the noise floor) and b) if its PESQ score is sufficiently matched to that of analog SSB voice under the same conditions (with and without noise reduction). Both are extreme challenges facing quite a bit of risk to achieve.

    1. Hi Scott,

      Good to see we are in the ball park with MELP at 600 bit/s.

      Agreed there are risks in meeting the goal of surpassing SSB at low SNRs. This is R&D, the bleeding edge of experimental radio. It wouldn’t be fun otherwise. SSB has been around for 50 years for good reason. No one said it would be easy, or that we are there yet.

      There is some confusion in your units. SNR can’t be defined by s-points over an arbitrary, subjective noise floor. It is defined as signal power to noise power in a given bandwidth. An even better measure for modems is Eb/No. FreeDV 1600 already works at 2dB SNR on AWGN channels. A 6dB gain would get us to -4dB SNR, where analog SSB is a very difficult copy.

      The other factor is the cohpsk modem, that greatly outperforms the DQPSK modem used for FreeDV 1600. There are reports of earlier FreeDV 700 modes outperforming SSB at low SNRs. The weak link was the earlier 700/700B codecs, hence the latest work on 700C.

      There are precedents: commercial HF DV systems (e.g. Envoy from Codan) work at -3dB SNR. From a theoretical standpoint we have calculated the lower limits of SNR for Digital Voice, which lets us know there is still room for improvement. This means it is an engineering problem. It is not impossible. No laws of physics need to be broken.

      There are also niches – for example perhaps we can offer analog noise free communications at SNRs around 0dB where SSB is copyable but fatiguing to listen to. I prefer FreeDV 1600 at high SNRs (8dB) over SSB for that reason.

      Last of all – Codec 2 and FreeDV are not finished. This is not the best we can do. So we will keep evolving the components until the goal is met.

  17. Thanks for the link to the SNR calculation. Agreed that S-meter levels are not SNR, specifically, and that Eb/No is a good figure of merit. Nevertheless, the lower SNR limit of -8dB seems somewhat unrealistic in practice. Time will tell.

    From a more practical standpoint, hams measure things in terms of S-units above the noise. Since we regularly make difficult SSB contacts near the noise level then 700 needs to work at least that well. After that it becomes a question of PESQ. Leaving actual SNRs aside, the motivation for making contacts at a given PESQ score or difficulty level is directly related to the ham radio operation at hand: “important” DX, run of the mill DX, contesting, ragchew, nets, emcomm, etc. If the mission and message is of high value then clearly a poor PESQ score will be acceptable. Given the robotic nature of low bit rate CODEC speech, it is likely that even a completely successful implementation of CODEC 2 700, i.e. an implementation that achieves the -8dB SNR theoretical limit, with PESQ scores equivalent or even perhaps a bit better than MELPe 600, will nevertheless see low adoption rates. Not only do most operations eschew poor PESQ scores, but also the latency associated with digital voice. No doubt there will always be a small population of DV aficionados (myself included). And this might be just the thing for MARS operations, particularly since encryption can be applied easily. Although even for MARS there will be a fine line between digital voice and digital text.

    At any rate, I will certainly be among those experimenting with your remarkable achievements, even if it remains a small community.

    1. Hi Scott,

      I am not proposing 700C will replace all SSB communications. It’s possible to support higher quality codecs at higher SNR. It’s also possible to support higher quality in the same waveform given higher SNR.

      SSB is not a gold standard for subjective quality, it has significant issues even at high SNRs (that hiss, narrow audio bandwidth, pops and clicks). Many people object to the quality of SSB at any SNR. It’s just what we are used to.

      Low latency is possible with digital voice (e.g. the cell phone network), and the FreeDV modes are designed to sync up quite quickly; the SM1000 boots like an analog radio.

      I listened to the Codan Envoy 600 bit/s samples, and I agree – Codec 2 is competitive.

      1. One nice thing about SSB that’s difficult to emulate with digital voice is its linear, additive nature. If two stations transmit SSB at the same time, you simply hear the sum of their voice signals. With most digital schemes, you hear nothing unless one is sufficiently stronger than the other. Then you only hear the stronger one.

        But a digital system can be built to behave (to the user) just like SSB, by giving each transmitter its own channel, time slot or spreading code and having the receivers demodulate them all and sum the audio after decoding to PCM. The same system can do neat things like place each station at a specific point in the stereo spectrum, somewhat mimicking the experience of a “real” in-person round table. And it can show which station (or stations) are talking.
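
The "decode each station, then sum" idea can be sketched as below. This is a hypothetical illustration, not code from any existing system: it assumes each station has already been demodulated and decoded to a list of PCM samples, and it uses a constant-power pan law (one common choice) to place each station in the stereo field. All names are invented for the example.

```python
import math

def pan_gains(position):
    """Constant-power pan: position -1.0 is hard left, +1.0 is hard right."""
    angle = (position + 1.0) * math.pi / 4.0
    return math.cos(angle), math.sin(angle)

def mix_stations(stations):
    """stations: list of (pcm_samples, position) tuples, one per decoded station.

    Returns (left, right) sample lists holding the panned sum of all stations.
    """
    if not stations:
        return [], []
    n = max(len(pcm) for pcm, _ in stations)
    left = [0.0] * n
    right = [0.0] * n
    for pcm, position in stations:
        gain_l, gain_r = pan_gains(position)
        for i, sample in enumerate(pcm):
            left[i] += gain_l * sample
            right[i] += gain_r * sample
    return left, right

# Two hypothetical stations talking at once: one hard left, one hard right
left, right = mix_stations([([1.0, 2.0, 3.0], -1.0), ([10.0, 20.0], 1.0)])
```

Because the sum happens after decoding, two simultaneous talkers simply add, as they would on SSB.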

        Although SSB is now ancient as modulation methods go, it has still benefited from other technologies. Digital filters have certainly helped, but probably the most important development is the modern TCXO, especially when slaved to GPS. Since I wasn’t all that active on the bands until I retired about 5 years ago, I was actually rather surprised at how good modern SSB transceivers have become now that 1 Hz accuracies on HF are typical. What a difference from the tube rigs of the early 1970s when I became a ham!

  18. Latency on the cellular network ranges from bad to horrifically bad. Take two cell phones and call one from the other. You’ll experience 100-500 ms of latency. Most voice calls are still being completed with the 3G part of the network. Latency should be better with 4G, but not a lot of calls are going that way yet (e.g. VoLTE). Such latencies cause big problems with normal speech patterns. I notice this all of the time, and when we have work telecons, participants who are on cell phones can have great difficulty interjecting a comment properly.

    In the analog SSB world, when you start to exceed 200 ms of latency it can be hard to bust DX or contest pile-ups, and it can even be hard to take part in free-form (not round table) multi-station rag chews. 200 ms of latency is actually not unusual on first generation SDRs, or for those who might be using digital audio workstation software, or for those who are operating remotely over the internet.

    1. There’s no *inherent* reason why latency should be so bad.

      I don’t know about LTE (it’s after my time) but IS-95 CDMA used 20 ms vocoder frames. Processing latencies should be small in comparison, meaning on a local call you shouldn’t see much more than twice this delay on a round trip. Say 50 ms in round numbers.

      Long distance can of course be a different story. Internet round trip ping times from San Diego (where I am) to Baltimore MD (where my parents live) are about 85 ms minimum.

      Let’s look at each of the contributing causes to this minimum delay.

      About 9-10 ms is in the local cable TV plant, since cable modem upstream transmissions have to be polled by the head end.

      The next factor is the speed of light delay. Remember that the speed of light in fiber is inversely proportional to the refractive index, which is usually about 1.5, so the speed of light is only 2/3 c. Also, the network routes do not follow great circle paths; just about all of my traffic, for example, gets routed up to Los Angeles before going anywhere else.

      A third factor is store-and-forward latency. Even when the links are idle, a switch or router generally won’t begin to transmit a packet until it has been fully received, so the delay depends on the number of router hops, the size of the packet, and the transmission speed of each link. Most main Internet backbone routers have links running at 10 Gb/s, so for small ping packets the delay is almost negligible (80 ns for 100 bytes) even with 20 or so hops in the path. The router also takes time to decide how to route the packet, but this is usually even smaller than the store and forward delay. These delays can be larger for edge routers and switches operating at much slower link speeds, but even at 100 Mb/s it’s still only 8 microseconds.
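
The propagation and store-and-forward figures are easy to check with a little arithmetic. The sketch below uses the refractive index of 1.5 and the packet size and link speeds quoted above; the function names are just for illustration.

```python
C_VACUUM_M_PER_S = 299_792_458.0  # speed of light in vacuum

def fiber_delay_s(path_m, refractive_index=1.5):
    """One-way propagation delay over path_m metres of fiber."""
    return path_m * refractive_index / C_VACUUM_M_PER_S

def store_and_forward_delay_s(packet_bytes, link_bps, hops=1):
    """Time to clock a whole packet onto a link, repeated once per hop."""
    return hops * packet_bytes * 8 / link_bps

per_km = fiber_delay_s(1000.0)                        # about 5 microseconds per km
backbone = store_and_forward_delay_s(100, 10e9)       # 100 bytes at 10 Gb/s
edge = store_and_forward_delay_s(100, 100e6)          # 100 bytes at 100 Mb/s
twenty_hops = store_and_forward_delay_s(100, 10e9, hops=20)
```

A 100 byte packet takes 80 ns per hop at 10 Gb/s and 8 microseconds at 100 Mb/s, so even 20 backbone hops add under 2 microseconds. Fiber adds about 5 microseconds per kilometre, which is why path length dominates on long routes.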

      That brings us to the variable component of the delay. Links are “stochastically multiplexed”, which is a fancy way of saying that packets are queued up, usually in first-in, first-out order, so they can be delayed when the links are busy. This shows up as a variable delay from one packet to the next, and it can be substantial in a heavily loaded network, especially if it is not well engineered. Because interactive speech is a real-time application, the receiver often deliberately adds delay to allow data frames to be passed to the decoder at regular intervals even when some fraction are delayed in transmission by long queues. There’s an inherent tradeoff here; make the “playout” buffer too long, and you annoy the user with excessive latency; make it too short, and you annoy the user with a lot of dropped packets that in fact were received too late to be of use.
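
The playout buffer tradeoff can be illustrated with a toy model. It assumes each frame is scheduled for playback a fixed time after it was sent, so a frame is useless if its network delay exceeds that playout delay. The delay values are made-up numbers for the example, not measurements.

```python
def late_fraction(network_delays_ms, playout_delay_ms):
    """Fraction of frames that arrive after their scheduled playout time."""
    if not network_delays_ms:
        return 0.0
    late = sum(1 for d in network_delays_ms if d > playout_delay_ms)
    return late / len(network_delays_ms)

# Hypothetical per-frame network delays (ms): mostly ~40 ms with two queueing spikes
delays_ms = [40, 42, 45, 41, 60, 43, 90, 44, 41, 47]

short_buffer = late_fraction(delays_ms, 40)    # low latency, most frames late
medium_buffer = late_fraction(delays_ms, 50)   # only the two spikes are late
long_buffer = late_fraction(delays_ms, 100)    # nothing late, but 100 ms of delay
```

With these numbers a 40 ms playout delay drops 90% of frames, 50 ms drops 20%, and 100 ms drops none at the cost of the extra latency.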

      In ham radio where we build and operate our own transmission links and switches, we can control all these factors to keep delays from becoming excessive. Also, since most ham operation is half duplex, delays that would be annoying in full duplex operation are often just not a big problem.

    2. Regarding 200 ms latencies in first generation SDRs, this is not inherent; it’s due to poor buffering and/or scheduling control.

      This has gotten a lot better with current hardware and software. Even Linux, which is not designed as a real-time OS, has an audio system that lets you trade latency off against the inefficiencies of processing tiny buffers. I’ve been playing with the AMSAT UK Funcube Pro+ dongle, which I/Q samples at 192 kHz with 16 bits per sample. I typically use 4K sample blocks, which is about 21 ms, but I can easily work with smaller blocks without excessive processing overhead. The problem gets better at higher sampling rates, at least until you downfilter and decimate to smaller bandwidths.
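
The block size arithmetic is worth spelling out. This sketch assumes "4K" means 4096 samples; the smaller block size is just an example.

```python
def block_latency_ms(block_samples, sample_rate_hz):
    """Time needed to fill one buffer of block_samples at sample_rate_hz."""
    return 1000.0 * block_samples / sample_rate_hz

big_block = block_latency_ms(4096, 192_000)    # about 21.3 ms
small_block = block_latency_ms(512, 192_000)   # about 2.7 ms
```

The buffering delay scales linearly with block size, so cutting the block to 512 samples brings it under 3 ms at the same sample rate.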

      The only truly unavoidable delays are in filtering: the sharper the filter, the longer its latency must be; that’s the uncertainty principle. So if you really *must* have that near-brickwall filter, DSP will give it to you but you’re gonna have to wait to get the results.
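
For a linear-phase FIR filter the tradeoff can be put into rough numbers. The sketch below uses a widely quoted rule of thumb for the tap count (taps ≈ fs/Δf × attenuation in dB / 22, usually attributed to fred harris) and the fact that a linear-phase FIR delays the signal by (N-1)/2 samples. The 8 kHz sample rate, 60 dB attenuation, and transition widths are assumed values for illustration only.

```python
import math

def fir_taps_estimate(sample_rate_hz, transition_hz, atten_db):
    """Rule-of-thumb tap count for a given transition width and stopband attenuation."""
    return math.ceil(sample_rate_hz / transition_hz * atten_db / 22.0)

def linear_phase_delay_ms(num_taps, sample_rate_hz):
    """Delay of a linear-phase FIR filter: (N - 1) / 2 samples."""
    return 1000.0 * (num_taps - 1) / 2.0 / sample_rate_hz

fs = 8000.0
gentle_taps = fir_taps_estimate(fs, 100.0, 60.0)   # 100 Hz transition band
sharp_taps = fir_taps_estimate(fs, 20.0, 60.0)     # 20 Hz transition band
gentle_delay = linear_phase_delay_ms(gentle_taps, fs)
sharp_delay = linear_phase_delay_ms(sharp_taps, fs)
```

Under these assumptions, narrowing the transition band from 100 Hz to 20 Hz raises the estimate from 219 to 1091 taps and the delay from about 14 ms to about 68 ms. This applies to linear-phase designs only; other filter structures trade phase linearity for lower delay.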

  19. I don’t expect VoLTE to be better than legacy networks; after all, it’s VoIP.
    With the traditional circuit-switched mobile networks here in Europe I never faced any noticeable latency, since my first contact with GSM in 1999. All was kind of on par with POTS. However the likes of Sprint and Verizon were an insult to my ears: so much noise that intelligibility, to me as a non-native speaker, was kind of below zero. The more you get used to it, the more you understand. And from your comments it seems not to have improved since 2008…

  20. Sharp filter skirts do not necessarily equate with long filter delays. At some urging from me PowerSDR mRX now has minimum phase FFT filters. Combined with some improvements in buffer architecture, it is now capable of providing TX and RX latency in the sub 30 ms range with brick wall skirts. Given similar or better performance in the IF DSP radios (e.g. K3) and the 7300, I suspect those radios are also using minimum phase and/or IIR filters. The phase ripple in such filters has proven to be immaterial given what HF propagation does to the signal en route to the receiver.

  21. Something you may find interesting. I am an American and I am somewhat used to the British accent. Listening through the small speakers in my notebook, I found the samples at 700 bit/s more intelligible than the 1300 versions. It seemed to lessen the effect of the accent.

  22. I made another pass through my stripped C version (removing all the other bit rates), because I noticed some more dead code.

    Holy cow, there’s nothing left, ha. This mode does away with all that lpc/lsp stuff, and as far as I can tell performs some sort of voodoo.

    I just have to work it into the coherent PSK library now.

    I guess what the world needs now is a coherent OQPSK modem :-)

  23. Hello Steve,

    Good to see you working on the code, it’s also a great way to learn. I’m happy to blog further on the DSP magic – or try running the Octave newam1_fbf simulation which plots the internal waveforms of 700C and lets you single step through frames.

    I hadn’t actually thought about using OQPSK for the DV work, the OQPSK post was just me helping out a friend. However it’s not a bad idea, we may get a better PAPR using OQPSK, although having multiple tones may pare back that advantage of OQPSK. The other issue is the OQPSK sync algorithms I have used so far are too slow for PTT operation – more suited to continuous operation.

    I’d like to modify the cohpsk modem to be OFDM and try that. This would mean closer tone spacing so a higher bit rate (useful for adding FEC), and smaller buffers as we would not be filtering each carrier any more (OFDM uses simple integrators). Smaller buffers means it will fit on embedded platforms like the SM1000 easily.
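
To illustrate the "simple integrators" point, here is a toy complex-baseband OFDM sketch. It is not the cohpsk modem or any FreeDV code, and it ignores the channel, sync, and cyclic prefix; it only shows that when carriers are spaced at exactly one cycle per symbol period, integrating the received signal against each carrier over one symbol recovers that carrier's symbol with no per-carrier filter. The carrier count and symbol length are arbitrary.

```python
import cmath
import math

def ofdm_modulate(symbols, samples_per_symbol):
    """Sum of carriers; carrier k completes k + 1 whole cycles per symbol."""
    n = samples_per_symbol
    return [
        sum(s * cmath.exp(2j * math.pi * (k + 1) * t / n) for k, s in enumerate(symbols))
        for t in range(n)
    ]

def ofdm_demodulate(rx, num_carriers):
    """Integrate-and-dump: correlate with each carrier over one symbol period."""
    n = len(rx)
    return [
        sum(rx[t] * cmath.exp(-2j * math.pi * (k + 1) * t / n) for t in range(n)) / n
        for k in range(num_carriers)
    ]

tx_symbols = [1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j]   # one QPSK symbol per carrier
rx_symbols = ofdm_demodulate(ofdm_modulate(tx_symbols, 64), len(tx_symbols))
```

The integration window is a single symbol long, which is where the smaller buffers come from compared with filtering each carrier separately.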

    But first I want to get some feedback on 700C over the air.

    – David

    1. I’ll run the Octave newam1_fbf simulation, as I am interested in the algorithms you’ve designed. I just finished a Java translation of 700C, but no testing yet. I’m sure it is another 10 edits to go after running it. The Java is not very useful in the wild, but lets me experiment with networking and GUI ideas easier.

      The OFDM voice modem by AOR had a huge number of carriers. I think it was 36 at 50 Bd. But they were trying to do 2400 + FEC. They used a preamble, and a Kalman filter I believe. I seem to remember pushing a button if it lost sync, so maybe it reset the Kalman filter back to what the preamble measured, dumping the tracking coefficients.

      I think I asked Mr. Brain about this at the time, and he referred me to the 1982 Hsu Kalman paper: https://goo.gl/yInfqz

      I can see where these studies can provide a lifetime of study and experimentation. Too bad I spent so much time on tubes and relays…

      1. Thanks for the equaliser link – so far we have taken an approach that doesn’t require equalisers for our HF waveforms. Quite a challenge to implement fast syncing modems on HF.

        Yes I can see a few more years ahead of me in this work, and some of the Codec 2 source has date-stamps in the early 1990s.

        To verify your Java port consider using the tcohpsk.m Octave script. I used it to verify the C port. It compares test vectors from your port to the reference Octave version of the modem. Very powerful way of finding bugs and you will learn a lot as you chase down the bugs.
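
The general idea of test vector comparison can be sketched as below. This is not the tcohpsk.m script and does not reproduce its interface; it is a generic, hypothetical helper that assumes both implementations have dumped the same internal signal as a list of numbers, and the tolerance is an arbitrary example value.

```python
def compare_vectors(reference, candidate, tol=1e-3):
    """Compare a port's output against reference values.

    Returns (ok, first_bad_index, max_abs_error). A length mismatch fails
    at the index where the shorter vector ends.
    """
    if len(reference) != len(candidate):
        return False, min(len(reference), len(candidate)), float("inf")
    max_err = 0.0
    first_bad = None
    for i, (ref, cand) in enumerate(zip(reference, candidate)):
        err = abs(ref - cand)
        max_err = max(max_err, err)
        if err > tol and first_bad is None:
            first_bad = i
    return first_bad is None, first_bad, max_err

# Hypothetical dumps of the same internal signal from two implementations
ok, bad_index, worst = compare_vectors([1.0, 2.0, 3.0], [1.0, 2.5, 3.0])
```

Reporting the first index that diverges is the useful part: it points at the earliest stage where the port and the reference disagree, rather than at the downstream symptoms.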

        1. I’m not a real fan of preambles (brrrrreeeep), ha. All those formulas do sift down to some smallish code. The Greek complex math is kind of scary.

          I’ll check my output. I have checked that it plays through FreeDV OK, but that is only half the test, but an important one.

          The one thing I saw in military low-bit rate modems, is that some of them use a throat microphone as well as a regular noise cancelling microphone. The throat mike being used for noise cancelling. The examples were inside a helicopter, and sounded pretty good at 600 bps.

          I guess the throat mike would signal windows of voice, and then hard filter outside those windows.

          1. Ack, I forgot I was going to include a sample of my modem output signal. It reads into FreeDV OK.

            https://goo.gl/VgQZyQ

            This is hts1.raw through the Java 700C and then through the Java cohpsk modem.

            I’ll look at how to do these comparisons using the Octave tests.

  24. The reason for digital transmission is the possibility of near-perfect relay if the SNR at each of the relay stations is high enough. If you’re trying to compete with SSB in operating at extremely low SNR, introducing an error on the transmission side (by quantization) is a handicap. Why not have a mode with _analog_ transmission of analysis results (pitch and signal intensities in each spectral band)?
