
VoIP Calls Double In Quality 116

anthm writes "From Newsforge and LinuxPR
FreeSWITCH, an open source soft-switch and IVR platform, has announced that it can support 16 kHz audio calls, thus doubling the potential voice quality. The team has had successful tests with a conference bridge, a pass-through SIP call and an IVR that reads RSS news feeds with the Cepstral Text-To-Speech Engine." has a good list of business VoIP providers.

This discussion has been archived. No new comments can be posted.

  • Everything else is stuck at 8khz, so unless your call uses this service end-to-end, there's going to be a downconversion if you're calling someone on a land line. And you'll be stuck with 8khz if you get any calls from someone not on this service.

    Still, it's a good piece of news, onward and upwards.

    *crosses fingers* Please nobody mention video phones. *crosses fingers*
  • Good Work (Score:3, Insightful)

    by kasgoku ( 988652 ) on Monday July 17, 2006 @12:14PM (#15731981) Journal
    Good work there, but all you need is to get the message across. It's not like you are singing on the phone and need good voice quality. Just do what's needed.
    • Re:Good Work (Score:2, Insightful)

      by Tychon ( 771855 )
      But for those of us with a bit of trouble hearing, or when speaking with a person who has a thick and/or foreign accent, that extra quality is the difference between a conversation and a stream of "What'd you say?"
    • Think of the hold music.

      Now imagine that it responds to button presses so you can change songs.

      "Operator... oh won't you help me make this call..."
  • So what? (Score:5, Insightful)

    by Spazmania ( 174582 ) on Monday July 17, 2006 @12:16PM (#15731996) Homepage
    So what? If you're going to up the sampling rate why not go directly to 44khz stereo (CD quality audio) and be done with it? Jumping from the telephony industry standard 8khz to 16 khz is thoroughly uninspired.
    • but it needs to be done in a way that won't 1) swamp the internet backbone with huge quantities of digitized voice telephony, and 2) give ISPs a good excuse to insist on multi-tiering the internet.

      That said, can video telephony and the kind of communication we've seen portrayed on Star Trek et al. be far behind?

    • Re:So what? (Score:3, Interesting)

      by xachen ( 967588 ) *
      If you find a codec that does 44kHz stereo, FreeSWITCH will do this. It has no hard limit in it and is variable to any rate! This is just awesome!
      • If you find a codec that does 44kHz stereo, FreeSWITCH will do this. It has no hard limit in it and is variable to any rate! This is just awesome!

        Get foobar 2K and grab the free mp4+SBR codec from nero. You can turn a CD-quality stereo signal (mp3, whatever) into a svelte 16 kbit/s, 44 kHz stereo (!) signal without much quality loss (well, at least compared to current telephony anyway...)
    • What would be the point of going to 44khz stereo for a mono signal?
    • . . .and I wouldn't call it a doubling of quality. The improvement of adding one octave of high frequency is, subjectively, less than a doubling. A human voice is quite intelligible with no frequencies above 4 kHz or so present.
      • Dude... have you never worked in a call center? I would kill to have a phone system that runs at 16 kHz, or better yet 32 kHz. Don't double the bitrate, maybe a 30% increase would be enough; just move the filter cutoff frequency higher, because not everyone's voice has intelligible transients in the low-kHz range. Oftentimes those voices get wrecked by the filtering and all you can hear is mumbling, as if the caller were talking with the mouthpiece in their armpit :P

        Higher frequency from the source, then less aggre
        • No, I haven't. Maybe that's my problem: I don't like talking on the phone to begin with, so I don't do it any more than I have to, but I don't think it's because of the audio quality. ;-)
          Thanks for your comment - whoever designs and buys phones and voice networks should obviously give more weight to your opinion than mine.
    • It's OK if you are not interested, but there are some who are. Here is an article about some of the benefits, as well as a wiki entry.

    • So what? If you're going to up the sampling rate why not go directly to 44khz stereo

      Because stereo would be a complete waste of bandwidth and processing power (one microphone, one speaker), and the human voice doesn't get anything near 22khz in frequency. Normal speaking voices have an even lower cutoff frequency. The CD standard is great for music, but complete overkill for sending voice.
    • YES! While we're at it why don't we all invest in stereo microphones and headsets. That way when we talk to our friends through the internets we can talk around the mic and make it sound like we're swirling in their heads... GENIUS!
    • Our voices don't have that wide a frequency range; there's little up in the high frequencies. A voice sample recorded at 22kHz (11kHz frequency range) is very hard to distinguish from one recorded at 44kHz (22kHz frequency range). In fact you'd need to be using a fairly good mic to really get much of the higher frequencies anyhow. 8kHz works since F1 and F2 (the frequencies of the first two peaks in the harmonic curve) fall under 4kHz for essentially all speakers. F1 and F2 are what we primarily use to deter
    • I wish some people here actually KNEW something about the telephone network. First off, there are still hundreds of thousands of miles of copper wire in the network. Much of it is connected to 'loading coils', which are essentially low-pass filters. Any frequencies over 4kHz are attenuated, so your 44kHz is just a dream. Telephone engineers knew that; that's why they picked the 8kHz sampling rate (Nyquist theory). Second, as someone else pointed out, there remains the question of getting every single telc
      • You missed the point: with internet connections rapidly reaching video speeds and the telephone network very much tied to 8khz there is no value in having a 16 khz VoIP. If you're going to up the sampling rate only for VoIP, go straight to 44khz and be done with it. Don't brag because you were dumb enough to select a median value.
    • So what? If you're going to up the sampling rate why not go directly to 44khz stereo (CD quality audio) and be done with it? Jumping from the telephony industry standard 8khz to 16 khz is thoroughly uninspired.

      16kHz is pretty similar to analog FM radio transmissions, and people have been listening to music on that medium quite satisfactorily for a long time. Besides, if you want high-fidelity transmission of music over the internet, there are already pretty decent solutions with streaming ogg/mp3.
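For a sense of scale in the 8 kHz versus 44 kHz argument above, here is a minimal sketch of the raw, uncompressed PCM bitrates involved. The 16-bit sample size for the 16 kHz case is an assumption (the thread does not state it), and `pcm_bitrate_bps` is an illustrative helper, not part of FreeSWITCH or any library.

```python
def pcm_bitrate_bps(sample_rate_hz: int, bits_per_sample: int, channels: int) -> int:
    """Raw PCM bitrate: samples per second x bits per sample x channels."""
    return sample_rate_hz * bits_per_sample * channels


# Classic telephony: 8 kHz, 8-bit, mono.
telephony = pcm_bitrate_bps(8_000, 8, 1)      # 64000 bit/s
# Wideband speech, assuming 16-bit linear samples (an assumption).
wideband = pcm_bitrate_bps(16_000, 16, 1)     # 256000 bit/s
# CD audio: 44.1 kHz, 16-bit, stereo.
cd_stereo = pcm_bitrate_bps(44_100, 16, 2)    # 1411200 bit/s

for name, bps in [("telephony", telephony), ("wideband", wideband), ("CD stereo", cd_stereo)]:
    print(f"{name:>10}: {bps / 1000:.1f} kbit/s ({bps / telephony:.2f}x telephony)")
```

These are uncompressed figures only; a codec such as Speex brings the wideband number down sharply, which is why the raw ratio overstates the real cost on the wire.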


  • Define: IVR (Score:4, Informative)

    by theGreater ( 596196 ) on Monday July 17, 2006 @12:18PM (#15732019) Homepage
    Google gives the definition of IVR as Interactive Voice Response.

    So I knew what one was, I just didn't know there was a TLA for them. This inane personal revelation brought to you by the captcha "accuse".

  • Doubling? hardly (Score:4, Insightful)

    by MacBoy ( 30701 ) on Monday July 17, 2006 @12:23PM (#15732060)
    I fail to see how adding one additional octave of frequency response to the 6 or 7 currently available, can be called "doubling" the quality.
    • Correct me if I'm wrong, but I'm assuming this means the digital to analog conversion rate. This means it's sampling the analog audio 16,000 times per second instead of 8,000 times per second. Which in theory is double the "quality".
    • I thought this number referred to the sampling rate...

      You're thinking 8bit audio to 16bit audio.

      • Re:Doubling? hardly (Score:2, Interesting)

        by slyvren ( 989423 )
        Actually 8 bit to 16 bit is far greater than double quality. The quality essentially doubles every time you add a bit.
      • The maximum frequency that can be carried is proportional to the sampling rate -- if I recall correctly, the highest frequency that can be carried is half the sample rate. Sample 8000 times per second and you can carry up to 4 kHz. At 16000, it's 8 kHz. People can hear up to about 20 kHz, so this does increase the frequency range. Since 'going up an octave' means doubling the frequency, the previous poster was correct. The end result is only to raise the maximum frequency by an Octave.

        The bigger proble
    • I've got a buddy who uses VOIP, and I can assure you: the quality of his phone calls to me has not doubled. It's all the same old "Dude, there's this chick on tv right now, I'm not sure which channel, who is like majorly hot. Turn it on!"
    • It won't. While there's a noticeable difference in quality when listening to 8KHz and 16KHz sampled speech, it certainly won't double the perceived quality. Even more so if it's in a VoIP context, where other factors such as the loss rate and distribution, forward error correction and the choice of codec (which tend to be of the non-PCM kind) play such big roles. Just my 2 cents...
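The sampling-rate and bit-depth claims in this sub-thread can be checked with a few lines of arithmetic. This is a rough sketch, not anything from FreeSWITCH: the function names are made up for illustration, and the SNR formula is the textbook figure for a full-scale sine wave with uniform (linear) quantization, so it does not describe the non-linear 8-bit encoding used on the phone network.

```python
import math


def nyquist_hz(sample_rate_hz: float) -> float:
    """Highest frequency a given sample rate can represent (half the rate)."""
    return sample_rate_hz / 2


def octaves_between(low_hz: float, high_hz: float) -> float:
    """Number of octaves between two frequencies (one octave = a doubling)."""
    return math.log2(high_hz / low_hz)


def quantization_snr_db(bits: int) -> float:
    """Ideal SNR of linear PCM for a full-scale sine: about 6.02 dB per bit."""
    return 6.02 * bits + 1.76


old_top = nyquist_hz(8_000)    # 4000.0 Hz
new_top = nyquist_hz(16_000)   # 8000.0 Hz
print(f"Bandwidth goes from {old_top:.0f} Hz to {new_top:.0f} Hz")
print(f"That is {octaves_between(old_top, new_top):.1f} extra octave(s)")
print(f"8-bit linear PCM:  {quantization_snr_db(8):.2f} dB")
print(f"16-bit linear PCM: {quantization_snr_db(16):.2f} dB")
```

So doubling the sample rate adds exactly one octave at the top, while each extra bit of linear resolution halves the quantization step, which is where the "doubles every time you add a bit" remark comes from.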
  • We're a Cisco VOIP shop and phone conversations sound fine. I'm not sure how going from 8->16 would make it any better.

    • It does make a difference. 44KHz would be ideal, but 16 is good. The original 8KHz is a carryover from the old telecom days. That's how much uncompressed voice data they could carry over a single copper line. So in essence voice quality really hasn't improved much on telephones since the '80s.

      It would make understanding people who mumble, have poor English skills, lisp, etc., significantly easier. 44KHz would be ideal, but 16 would be an improvement. I'm pretty sure however that many VoIP soft sw
  • by Rob T Firefly ( 844560 ) on Monday July 17, 2006 @12:30PM (#15732110) Homepage Journal
    This can only mean twice as much material filling up the tubes.
  • I wasn't aware that telephones even HAVE "definition", let alone that they are in HIGH DEFINITION now.

    4. a. The clarity of detail in an optically produced image, such as a photograph, effected by a combination of resolution and contrast.
    b. The degree of clarity with which a televised image or broadcast signal is received.

    Of course, what do I know... I didn't realize wireless networking equipment had fidelity, either (ie. WiFi).
  • by riflemann ( 190895 ) <riflemann@bb. c a c t i i . n et> on Monday July 17, 2006 @12:36PM (#15732153)
    Actually, I've used Asterisk to pass through 24KHz Speex encoded audio - very impressive sound quality, but only works when the SIP channel is client to client.

    In theory a SIP server doesn't need to know all of the codecs a client supports - the clients themselves negotiate any compatible protocol.

    Of course, if the sip server puts itself in the path (such as when it needs to pass through to PSTN or firewalled clients), then 8KHz is the (till now) maximum supported rate.

    • Actually, I've used Asterisk to pass through 24KHz Speex encoded audio - very impressive sound quality, but only works when the SIP channel is client to client.

      Care to provide more info on this. Speex is *not* optimized for 24 kHz so it would probably sound worse than 16 kHz or 32 kHz. If the devs are indeed using 24 kHz, it's probably a bad idea that would be fixed. (BTW, I know what I'm talking about -- I wrote Speex)
  • will be able to clearly understand me when I say, "I can't talk now, my leg is on fire."
  • the submitter is the author of the code.

    Move along, nothing to see here yet
  • Big Whoopie (Score:3, Insightful)

    by jmorris42 ( 1458 ) * <> on Monday July 17, 2006 @02:11PM (#15732317)
    The problem isn't making a software based IVR system or even a softswitch run at a better rate. Now find me a SIP phone that runs at anything other than 8Khz. No, I'm not talking about a F/OSS softphone, but a real hardphone. They have the minimum DSP power the manufacturers can get away with to support 8Khz. Now find me a PRI that can interface with it. For now that is still an issue.

    Skype has been running their softphones at higher than 8Khz/8bit so their softswitch obviously was the first widely deployed one to leave 64kbit max quality behind.

    Yes, someday all telephony (except legacy telco stuff that will never change, which will be a shrinking market) will offer higher quality audio and an option for video. But not for a few more years until the saturation of next gen telephony products gets better.
    • I mean, sure, I can route a call through Enum or DUNDi (well... my DUNDi peer group only has 2 nodes right now, so that's kind of pointless) and it could be pure digital. I've yet to find a softphone I/O solution that doesn't suck (maybe a Bluetooth headset would be OK if it could push that sort of quality), so it's still much easier to dump the call out to an old $10 wireless RadioShack special via the Digium FXS card.

      The VOIP to PSTN scene kind of sucks at the moment anyway. There are a lot of fly-by-night op

  • 8khz to 16Khz is fine, but that's not usually the problem we encounter with VOIP. It's latency and dropped packets, which this will just make worse. But if you're doing this on your own network only then I can see where this would be neat.
  • by Anonymous Coward
    They're just using a higher quality codec than G.711 (which is the standard for the back-end digital phone system).

    The phone people (probably AT&T) chose that standard since it gave pretty good voice quality given the limitations of current technology.

    People are generally happy with the voice quality of the phone system - which is different from the voice quality of the last mile - the analog copper loop to your house, or CDMA/GSM/TDMA to your cell phone.

    It's highly unlikely this new codec will catch
  • Marketing BS (Score:4, Insightful)

    by jheath314 ( 916607 ) on Monday July 17, 2006 @02:17PM (#15732390)
    This "improvement" is idiotic. The thing which most limits the quality of a VoIP call is delay and jitter, NOT the sampling rate. Guaranteeing the quality of a telephone conversation over the internet is tricky because the internet was originally designed for best-effort packet delivery, with no guarantees on packet delay, sequence, or even (at the network layer) delivery.

    If anything, this feature reduces end-to-end quality by doubling the amount of data being sent down the pipe, as you'd need to buffer more data at the same transmission speed to correct for jitter. Brillant!
    • Re:Marketing BS (Score:2, Informative)

      by anthm ( 894202 )
      FYI: 20ms of 16khz audio (the typical size of 1 RTP packet) encoded with the Speex codec is 43 bytes. 20ms of 8khz audio encoded with the Speex codec is 29 bytes, so the 16khz packet is only about 1.5 times as big as its 8khz counterpart. 20ms of 8khz g711 is 160 bytes, so with Speex at 16khz you can still fit 3 calls in the same amount of bandwidth that it takes for one 8khz call. The biggest overhead in VoIP is the various headers on each RTP packet per level of encapsulation,
      • Thanks. That's something a lot of people forget. Actually, the overhead of the headers is usually 16 kbps, i.e. about as much as the codec data itself. That's also why very low bit-rate ( 8kbps) codecs are (almost always) useless in VoIP.
    • Guaranteeing the quality of a telephone conversation over the internet is tricky because the internet was originally designed for best-effort packet delivery

      There's more to VoIP than the Internet, you know. Some of us work with lines which are guaranteed big enough or have QoS.
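The packet sizes quoted in this sub-thread can be turned into on-the-wire bitrates. The sketch below assumes plain IPv4 + UDP + RTP headers with no extensions (20 + 8 + 12 = 40 bytes) and ignores link-layer framing; the payload sizes are the ones given in the thread, and `stream_kbps` is a hypothetical helper written for this example.

```python
IPV4_HEADER_BYTES = 20   # assumes no IP options
UDP_HEADER_BYTES = 8
RTP_HEADER_BYTES = 12    # assumes no CSRC list or header extension
HEADER_BYTES = IPV4_HEADER_BYTES + UDP_HEADER_BYTES + RTP_HEADER_BYTES


def stream_kbps(payload_bytes: int, frame_ms: int = 20,
                header_bytes: int = HEADER_BYTES) -> float:
    """One-way bitrate of a stream sending one packet every frame_ms."""
    packets_per_second = 1000 / frame_ms
    return (payload_bytes + header_bytes) * 8 * packets_per_second / 1000


# Payload sizes per 20 ms frame, as quoted in the thread.
payloads = {"Speex 16 kHz": 43, "Speex 8 kHz": 29, "G.711 8 kHz": 160}

print(f"Header overhead alone: {stream_kbps(0):.1f} kbit/s")
for codec, size in payloads.items():
    print(f"{codec}: {stream_kbps(size, header_bytes=0):.1f} kbit/s payload, "
          f"{stream_kbps(size):.1f} kbit/s with headers")
```

Under these assumptions the headers cost 16 kbit/s per direction, which matches the figure mentioned above. Counting payload alone, three 43-byte Speex frames fit inside one 160-byte G.711 frame; once every packet carries its own 40 bytes of headers, the ratio is closer to 2.4 to 1.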
  • Is it even a difference human ear can notice? I mean, VoIP calls today are pretty good..
    • Try doing a Skype call to an international country and you'll see the difference in "reception" (quality). Probably not entirely Skype's fault, but any improvements will make a difference.
    • The more quality, the easier it is to perform detection algorithms for things like speech recognition, and yes you can notice the difference as long as the audio was generated digitally by a microphone+soundcard or with something like cepstral that defaults to 16khz for a reason.
  • It's more complicated than doubling the sampling rate. Standard PCM telephony uses 8 kHz sampling rate, 8-bit samples, non-linear encoding. It's fairly simple, resulting in 64 kbps.

    Speex is a CELP (code excited linear prediction) codec that is far more complex than the simple PCM system used by the telephone company. The resultant bit rate can be fixed or variable, and is not rigidly tied to the sampling rate used for data acquisition.
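The "non-linear encoding" mentioned for standard PCM telephony is companding. As a hedged illustration, here is the continuous mu-law curve with mu = 255; this assumes the mu-law variant (A-law is the other common one), and real G.711 encoders use a piecewise-linear approximation of this curve rather than the formula itself.

```python
import math

MU = 255  # mu-law parameter, assumed here for illustration


def mu_law_compress(x: float) -> float:
    """Map a linear sample in [-1, 1] to a companded value in [-1, 1]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mu_law_expand(y: float) -> float:
    """Inverse of mu_law_compress."""
    return math.copysign(((1 + MU) ** abs(y) - 1) / MU, y)


for sample in (0.001, 0.01, 0.1, 1.0):
    companded = mu_law_compress(sample)
    code = round(companded * 127)  # crude 8-bit signed quantization, illustration only
    print(f"linear {sample:>6}: companded {companded:.3f}, 8-bit code {code}")
```

The point of the curve is that quiet samples get a disproportionate share of the 8-bit code space, so speech keeps a usable signal-to-noise ratio at low levels while still fitting in 64 kbps.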

  • So it's 10 times better than the Evil (tm) telcos!

    And my software puts a green stripe around the edge of the data too... sucka!

  • OMG, at this rate, we'll have 64 kHz calls in 6 years, and 128 kHz in 12 years!!!!

    (Going from 8 kHz to 16 kHz isn't a "doubling of quality" :-P )
    • OMG, actually, we can actually operate at any sample rate we want! 16khz was just a logical test because the phone we tested it with supported it.
  • Theoretical maximum, may be as low as 3.

    Second, this is enough to capture most of a human voice. Can you hit a high "C"? That is about one kilohertz.

    Everything above 1kHz is being used to carry ever-diminishing harmonics that provide resolution for fast-rising sounds like "k" and "p". There's a slight loss of detail at 4kHz and very little at 8kHz. There is no honest way to refer to a move from 8 to 16 as "doubling the quality". Sycraft-fu's post has it right. In fact, if I were designing the system I'd put i
  • For those of you who are not IRC junkies, the IRC client KVirc has built-in support for 44.1 KHz "voice chat" (not sure if it qualifies for "VoIP", but is a simple direct connection between two computers supporting real-time audio transfer). Not only does it support 44.1 KHz, but it has for at least a year (when I started using it). What's the big deal with 16KHz?
  • by jmv ( 93421 )
    That's called wideband speech. It's been around for 10+ years and Speex supported it about 4 years ago. About time people actually use it (i.e. why people are still using narrowband in VoIP is beyond me).
  • That's a bit of a retarded demo for the technology: every techie's instincts are screaming "why not just transmit the RSS and convert to speech at the client?"
