Yeah, that makes sense. To get such high compression rates, we assume the input is a single voice and fit a model around that. Break that assumption and you break the codec.
Correct for VOIP; the main target for codec2 is digital radio, where the same overheads don't apply. For VOIP there are some tricks. If you concatenate many channels in a single IP packet (say, for trunking between sites), you could send 32 calls in the same bandwidth as a single 64 kbit/s channel. For Voice over 2.4 GHz Wifi you could consider breaking 802.11. For example, the minimum bit rate is currently 1 Mbit/s, which could carry 500 calls at 2000 bps each if the protocol were modified. As it's unlicensed spectrum, this is possible and legal, like running cordless phones and toys on the same spectrum. Alternatively, we could come up with a 2000 bps Wifi waveform and get a power advantage of about 27 dB (10 log10 of 1 Mbit/s divided by 2000 bit/s) for longer range, non-line-of-sight links, etc.
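The capacity and power figures above are simple ratios. A back-of-envelope sketch, assuming ideal packing with no IP/UDP/RTP or 802.11 framing overhead (in real VOIP that overhead is large, which is the point of the trunking trick); the function names are hypothetical:

```python
import math


def calls_per_channel(channel_bps: float, codec_bps: float) -> int:
    """Number of fixed-rate codec streams that fit in a channel, ignoring all framing overhead."""
    return int(channel_bps // codec_bps)


def power_advantage_db(wide_bps: float, narrow_bps: float) -> float:
    """Gain in energy per bit (dB) from sending at narrow_bps instead of wide_bps at equal transmit power."""
    return 10 * math.log10(wide_bps / narrow_bps)


if __name__ == "__main__":
    print(calls_per_channel(64_000, 2_000))         # 32 calls in one 64 kbit/s trunk channel
    print(calls_per_channel(1_000_000, 2_000))      # 500 calls at the 1 Mbit/s 802.11 minimum rate
    print(f"{power_advantage_db(1_000_000, 2_000):.1f} dB")  # 27.0 dB
```

The dB figure only says how much more energy each bit gets; how much extra range that buys depends on the propagation environment.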
Long range is possible. A low bit rate means more energy per bit, so less chance of a bit error over a given channel. With a similar power output, a 2400 bps codec has twice as much energy per bit as a 4800 bps codec (a 3 dB gain). This can translate to longer range.
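To make "less chance of a bit error" concrete, here is a sketch that assumes coherent BPSK over an additive white Gaussian noise channel (a textbook stand-in, not any particular radio's modem) and an illustrative 7 dB Eb/N0 for the 4800 bps link. Since Eb = P / R, halving the bit rate at fixed power and noise adds about 3 dB of Eb/N0:

```python
import math


def eb_n0_at_rate(ref_eb_n0_db: float, ref_rate_bps: float, rate_bps: float) -> float:
    """Eb/N0 (dB) at a new bit rate, holding transmit power and noise fixed.

    Eb = P / R, so halving R doubles the energy per bit (+3 dB)."""
    return ref_eb_n0_db + 10 * math.log10(ref_rate_bps / rate_bps)


def bpsk_ber(eb_n0_db: float) -> float:
    """Theoretical bit error rate of coherent BPSK on an AWGN channel: 0.5 * erfc(sqrt(Eb/N0))."""
    eb_n0 = 10 ** (eb_n0_db / 10)
    return 0.5 * math.erfc(math.sqrt(eb_n0))


if __name__ == "__main__":
    ref_db = 7.0  # assumed Eb/N0 of the 4800 bps link (illustrative only)
    for rate in (4800, 2400):
        db = eb_n0_at_rate(ref_db, 4800, rate)
        print(f"{rate} bps: Eb/N0 = {db:.1f} dB, BER = {bpsk_ber(db):.1e}")
```

Because the BER curve is steep, a 3 dB gain can cut the error rate by orders of magnitude, or equivalently let the link tolerate 3 dB more path loss at the same error rate.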
The tones in Chinese are short-term variations in pitch. It's not really that different from the way we use pitch to convey emotion and questions in English, although perhaps the variations are faster. Codec2 explicitly analyses and encodes pitch, so it should be fine. I'm learning Mandarin myself, so I will do some tests with "Wo de LaoShi" (my teacher) soon.
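As an illustration of what "analyses pitch" means frame by frame, below is a generic autocorrelation pitch estimator run on a crude rising tone. This is a textbook method chosen for brevity, not codec2's own pitch estimator, which is a different and more robust algorithm; the function names, the 60 to 400 Hz search range and the 0.3 voicing threshold are all assumptions:

```python
import math

FS = 8000  # sample rate in Hz (8 kHz narrowband speech, as used by codec2)


def normalised_autocorr(frame, lag):
    """Correlation between the frame and itself shifted by `lag` samples, scaled to [-1, 1]."""
    a, b = frame[:-lag], frame[lag:]
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return num / den if den > 0 else 0.0


def estimate_pitch(frame, fs=FS, f_min=60.0, f_max=400.0, rel_thresh=0.9):
    """Return an F0 estimate in Hz, or 0.0 if no clear periodicity is found.

    Picks the first local autocorrelation peak within rel_thresh of the strongest
    peak, which avoids reporting a multiple of the true period (an octave error)."""
    min_lag = int(fs / f_max)
    max_lag = min(int(fs / f_min), len(frame) - 2)
    lags = list(range(min_lag, max_lag + 1))
    r = [normalised_autocorr(frame, lag) for lag in lags]
    peaks = [i for i in range(1, len(r) - 1) if r[i] >= r[i - 1] and r[i] >= r[i + 1]]
    if not peaks:
        return 0.0
    best = max(r[i] for i in peaks)
    if best < 0.3:  # hypothetical voicing threshold: treat as unvoiced
        return 0.0
    for i in peaks:
        if r[i] >= rel_thresh * best:
            return fs / lags[i]
    return 0.0  # not reached in practice: the strongest peak always passes


if __name__ == "__main__":
    # A crude "rising tone" (like Mandarin tone 2): 40 ms frames of pure tones at increasing pitch.
    for f0 in (150, 180, 210, 240):
        frame = [math.sin(2 * math.pi * f0 * n / FS) for n in range(320)]
        print(f0, round(estimate_pitch(frame), 1))
```

Integer lags limit the resolution to a few Hz at these pitches; a real coder refines the estimate, but the rising contour is already tracked frame to frame.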
If we include an "erasure mode", this type of codec is pretty good at handling packet loss, as it is easy to interpolate between two adjacent frames. CELP-type codecs have a lot more memory (state carried from one frame to the next), so they tend to be less robust. Also, conversational speech has only about a 30% activity factor, so roughly 7 out of 10 packet losses will fall in silence or background noise frames.
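A minimal sketch of erasure concealment by interpolation, assuming a hypothetical frame that carries just a pitch, an energy and a voicing flag (codec2's real parameter set is different and richer). Lost frames are rebuilt by linear interpolation between the nearest received frames on either side:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Frame:
    """Hypothetical per-frame parameters for a parametric speech codec."""
    pitch_hz: float
    energy_db: float
    voiced: bool


def interpolate(prev: Frame, nxt: Frame, t: float) -> Frame:
    """Frame a fraction t (0..1) of the way from prev to nxt."""
    return Frame(
        pitch_hz=prev.pitch_hz + t * (nxt.pitch_hz - prev.pitch_hz),
        energy_db=prev.energy_db + t * (nxt.energy_db - prev.energy_db),
        voiced=prev.voiced if t < 0.5 else nxt.voiced,  # nearest neighbour for the flag
    )


def conceal_erasures(frames: List[Optional[Frame]]) -> List[Frame]:
    """Replace lost frames (None) by interpolating between the nearest received frames.

    Gaps at the start or end are filled by repeating the closest received frame."""
    good = [i for i, f in enumerate(frames) if f is not None]
    if not good:
        raise ValueError("no frames received")
    out: List[Frame] = []
    for i, f in enumerate(frames):
        if f is not None:
            out.append(f)
            continue
        prev_i = max((g for g in good if g < i), default=None)
        next_i = min((g for g in good if g > i), default=None)
        if prev_i is None:
            out.append(frames[next_i])
        elif next_i is None:
            out.append(frames[prev_i])
        else:
            t = (i - prev_i) / (next_i - prev_i)
            out.append(interpolate(frames[prev_i], frames[next_i], t))
    return out


if __name__ == "__main__":
    received = [Frame(100.0, 60.0, True), None, Frame(120.0, 50.0, True)]
    print(conceal_erasures(received)[1])  # Frame(pitch_hz=110.0, energy_db=55.0, voiced=True)
```

Interpolating needs the frame after the gap, so the receiver has to buffer one extra frame (20 ms) before it can play a concealed frame.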
I haven't even considered CPU load yet; remember, this is just the alpha 0.1 release. There is plenty of room for optimisation.
Latency won't change when it moves from an x86 to a DSP chip; it's defined by the algorithm. As I mentioned in another comment, it will remain at about 40 ms, similar to other speech codecs like Speex, GSM, G.723, etc.
David Rowe, the author, here. The latency is about 40 ms. The encoder accepts buffers of 20 ms (160 samples) and the decoder outputs buffers of 20 ms (160 samples). So, assuming zero transmission delay, you get your first output speech sample about 40 ms after the first input sample. It's comparable to cell phone codecs like GSM, and fine for real-time communications.
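The 40 ms figure is just frame arithmetic at the 8 kHz sample rate. A sketch of the simple model described above, one 20 ms frame of buffering at the encoder input plus one at the decoder output; it ignores any extra look-ahead inside the analysis, and the transmission and processing terms are hypothetical knobs that default to zero:

```python
FS_HZ = 8000          # narrowband sample rate
FRAME_SAMPLES = 160   # 20 ms per frame at 8 kHz


def frame_duration_ms(samples: int = FRAME_SAMPLES, fs_hz: int = FS_HZ) -> float:
    """Duration of one frame in milliseconds."""
    return 1000.0 * samples / fs_hz


def first_output_delay_ms(transmission_ms: float = 0.0, processing_ms: float = 0.0) -> float:
    """Rough delay from first input sample to first output sample.

    Counts one frame of buffering at the encoder input and one frame at the
    decoder output, plus optional transmission and processing time."""
    return 2 * frame_duration_ms() + transmission_ms + processing_ms


if __name__ == "__main__":
    print(frame_duration_ms())       # 20.0
    print(first_output_delay_ms())   # 40.0
```

Since the model depends only on frame size and sample rate, it gives the same answer on an x86 or a DSP; faster hardware shrinks only the processing term.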
Actually, license issues put me to sleep, so I just took the first random choice that came to mind, which was GPL, followed swiftly by LGPL when someone complained. But then again, I am the sort of guy who gets excited about frequency domain speech coding.
This is an area of codec2 I would like to work on. Can you please send me some sample files of firefighters' speech corrupted by background saw noise? This would be a good start. The good news is that with an open source codec this problem can be addressed; with a closed source codec you are stuck.
drowe67 writes: "Li Yuqian (Beijing, China) and I (Adelaide, South Australia) just made the first VOIP call using the IP04 Open Hardware IP-PBX. Unlike other PBX projects, the IP04 hardware is free (as in speech). Anyone is welcome to copy, modify, manufacture and hack the design. The hardware was designed using the open source gEDA CAD software, and it even runs uClinux and Asterisk. Could these be the freest phone calls ever? Even the inventor of the telephone, Alexander Graham Bell, was wrapped up in 19th century patent wars over his hardware!"
I am building a business based on Open Source Hardware: http://www.rowetel.com/blog/?p=14. Open hardware also has potential to help the developing world: http://www.nextbillion.net/blogs/2006/09/05/is-op