Almost all leading ASR products now handle continuous speech with speaker independence (read: no need to teach them your voice). Dragon products suck to date (they led the pack for a short while, though) -- L&H and IBM are significant consumer/commercial providers with great products, and there are others, thought of as more R&D (not for Joe User's PC), that are even better.
USC has some fascinating research into this (was posted on /. a while back and is in this month's Wired magazine) which achieves better-than-human results for recognizing words when background noise is 400%+ louder than the speaker. The military is interested, as are commercial shops. Let's see if they can get it out of the "command and control" mode of listening for key words and into the continuous dictation realm.
Oh, I'm sure it will find a niche, but this is kind of like inventing the mouse before you bother to finish putting all the letters on the keyboard. Sure, most people will find it useful in a limited number of situations, but in most of these situations other options are available. In both your car and office it would be quite easy to put a larger, more powerful computer unit which can interface with your PDA.
And while I like the idea of it taking notes at a meeting, speech recognition is nowhere near this good yet. If it were, secretaries would never take dictation anymore. My guess is it uses a sort of Graffiti for the voice... only responding to a certain small pretrained set of words.
And then there's the supposedly true story of one of the early Windows demos of a speaker-independent, voice-enabled system, where various audience members shouted:
"Format C:"
"Return"
"Yes"
"Return"
There are some areas where it would be a nice addition to a standard PDA. Not to replace the handwriting recognition but as an alternate entry method. I could see myself using it in the car in a hands-free manner while driving. I know in previous jobs we did medical form/record entry and we would have liked to have recognition as part of it. Many of those customers use microcassette recorders that then need to be sent to a transcriber. Those areas with limited technical vocabularies would be well suited. In a cubicle environment it's crazy. The woman over the wall drives me crazy with listening to her voice mail with her speaker phone as it is now... then again, the ability to yell "format c: return Y return" is tempting 8^)
Voice is a step backwards in many environments. Staff density in businesses is increasing, so much so that even trying to hold a normal (short) phone conversation is getting difficult. There are times that staff near me are yelling down the line at someone on a mobile so loud that I can't even think to type. The place I work at has been so successful recently that if it doesn't start packing staff in like diners at a sushi bar we're going to run out of room. Imagine if every one of those spent most of the day talking at their computers. Meanwhile all you hear from me while I'm composing an e-mail (or a /. post) is the quiet tapping of keys. Sure, voice recognition is useful in some applications, but I'm more a fan of gestures - not writing on a pad, but hand movements in the air. Granted, waving your fingers around in public might make you look like you're involved in an indecent act, or trying to cast a spell, but it's a lot less intrusive than voice. Anyone have any followup info on those fingernail motion sensors posted here a month or so ago?
Does anybody remember Nathan Spring's "Box" in the short-lived BBC Series "Star Cops"? Now that's a useful PDA.
I don't think a speech interface is going to be much use until these devices are capable of understanding a substantial proportion of possible spoken phrases. That would require incorporating a large chunk of the CYC database in a decent real-time (therefore probably analogue) neural net.
It'd be nice to think that demand for this type of feature on PDA's might fund and therefore speed up development of that kind of sophisticated AI.
Consciousness is not what it thinks it is
Thought exists only as an abstraction
There are times when I wouldn't use the voice recognition and other times when it would be great to have (driving your car and needing a note to be taken, for instance). I use my PDA a lot; it's enabled me to carry just it and not a pen and paper. I 'post notes' and organize them from time to time, like at the end of the day or just before I leave work. Currently I don't use it to take lots of notes during a conversation (like at work). There I still use my notebook because it's just easier to write without thinking. I use a lot of symbols and such as a kind of shorthand. It's the one area where I have a hard time with Graffiti. Also I tend to add notes in the margins and drawings (a picture is worth a 1000 words), which is not possible with any computer I know of. If someone can overcome these shortcomings they'd truly have a product that should sell.
Yeah, I have a problem with talking to PDAs. Who is going to take notes in a crowded office, or even the supermarket, on a project that is under NDA?
Would you dictate your personal notes where anyone else can hear them? Let's face it, you don't want the rest of the world to hear what you're doing all the time, right?
Well I for sure don't.
The best way to enter data I've ever had was on a chord keyboard. The device was called an "Agenda" and it's about a decade old. If the memory card wasn't dead I'd be using it still. It was smaller than a Cassiopeia A-10 and I could enter text into it faster than I can type on a QWERTY keyboard, which is saying something.
I would love to use voice recognition in an appropriate capacity, but it's not going to happen soon, if ever.
(Note also that using VoRec correctly requires a near total redesign of user interfaces to accommodate correlation of recognized voice commands and gestural input - notice how often you point or make another meaningful motion when you talk to someone, especially in trying to convey information, and that those gestures mean very different things from the way interfaces use pointing today.)
It's really sad, but today's best voice recognition is only marginally more useful than the CoVox voice recognition I had on my Commodore 64 thirteen years ago. Sure, vocabularies are larger now, and we can get somewhat closer to true connected speech recognition, but by and large, the lack of progress proves that despite an improvement of several orders of magnitude in processing power, (and memory speed and capacity, and storage space, not to mention DSPs) voice recognition is not a problem that can be effectively addressed by any amount of brute force processing using current methods.
I'm going to really date myself here: I've got a copy of Interface Age magazine (anybody else remember that?) in the garage (the one with the floppy ROM: a BASIC ROM on one of those floppy vinyl records they used to put in magazines) that has ads for several voice recognition systems that didn't work all that much worse than the ones we have now - I had friends that had some, back when everybody was using the S-100 bus. Speech recognition has been "just around the corner" in the computer/hacker community for as long as I can remember, and the VoRec prognosticators have been hoovering money for years out of suckers of either the investor or customer ilk who fell for "all it needs is next year's hardware and we'll be fine!"
I would love to believe in VoRec, but I see little evidence that it will exist in a significantly useful form for another ten years or so. Somebody prove me wrong - please.
[Disclaimer: In the interest of full disclosure, I work in speech recognition for SpeechWorks International [speechworks.com]. You may enjoy testing the SpeechWorks Demo Line at 1.888.SAY.DEMO (1.888.729.3366).]
A number of users have posted comments questioning the benefit of speech recognition operating on a PDA. Before examining this topic, I would like to quickly review the technical state of the art (in an effort to satisfy your inner geek).
The Technology
Speech recognition operates by performing statistical matches of incoming sound against familiar words or phonemes (i.e. individual sounds; a word is composed of one or more phonemes). Traditionally, embedded speech recognition systems have featured small vocabularies (i.e. a limited set of recognizable phrases), but advances in processor speed are allowing larger and more complex vocabularies.
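As a rough illustration of what a "statistical match" can mean, here is a minimal Python sketch. It is a toy under loudly stated assumptions, not how any shipping recognizer works: the feature vectors, the `TEMPLATES` table, and the single shared variance are all made up for illustration, and a real system would score whole sequences of acoustic frames (typically with hidden Markov models) rather than one vector per word.

```python
import math

# Hypothetical per-word templates: mean feature vectors with made-up numbers.
TEMPLATES = {
    "yes": [1.0, 0.2, 0.1],
    "no": [0.1, 0.9, 0.3],
    "return": [0.4, 0.4, 1.0],
}


def log_likelihood(features, mean, variance=0.05):
    """Log-probability of `features` under an isotropic Gaussian centred on `mean`."""
    return sum(
        -0.5 * math.log(2 * math.pi * variance) - (x - m) ** 2 / (2 * variance)
        for x, m in zip(features, mean)
    )


def recognize(features):
    """Score the incoming vector against every template and return the best word."""
    scores = {word: log_likelihood(features, mean) for word, mean in TEMPLATES.items()}
    best = max(scores, key=scores.get)
    return best, scores


word, _ = recognize([0.9, 0.3, 0.2])
print(word)  # the template closest to the input wins
```

A larger vocabulary means more templates to score against each sound, which is one reason vocabulary size has been tied to processor speed.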
For applications in the telephony industry, a system running on a 500 MHz Pentium may support up to 50 lines and ten languages. These systems use a combination of directed dialog (asking for specific pieces of information - "what is your account number?") and limited natural language ("I'd like to fly from Boston to San Jose").
Returning to the PDA market, the task is to recognize a single user operating in a single language. This greatly reduces the memory and processor requirements. Further tradeoffs are possible by adapting to the speech patterns of a single user, as frequently occurs in dictation systems. But, as we will discuss below, the vocabularies are much more complex and word-spotting becomes vital.
Speech on a PDA
For simple tasks like navigation, the point and click interface works great. You get immediate feedback from the screen and you may peruse a page of information at a time. A speech-based interface, in contrast, is more serial than parallel. If you are walking through a list of 50 items, your eyes will locate the correct item far faster than if the list is being read aloud. Likewise, speech recognition will not replace the keyboard for data entry. It is, however, a valuable supplement which allows the user to jump to information not readily visible. While you're composing an email on the Palm Pilot, for instance, saying "Tell me the birthday of Jim Bob Jones" may be faster than navigating there yourself. Likewise, if you're navigating through a database of 20k companies, it may be easier to just say "Yoyodyne Propulsion Systems".
To make speech recognition useful on a PDA, the vocabularies must directly relate to the installed applications and information. Complex navigation using true natural language is a difficult and very much unsolved recognition task. But speech recognition on a PDA is even harder. Why?
Imagine that you're sitting in a cave and you hear "Dave, I'm sure that I've got it.. umm... that's not... no... Boston Sand & Gravel... come on...". You're the PDA. What does the user want? If you understood the context of the situation, you might recall the above example of company names in a database. You might say, "I've got that installed", and locate the entry for the 'Boston Sand & Gravel Company' for the user. But a PDA is not that smart. It needs to first pick out the allowed phrases from the noise and surrounding conversation. This is called 'word spotting'. Then it needs to decide how to interpret the phrase. Without a restricted application, the PDA must understand the context, frequently in human terms, of the speech.
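The word-spotting step can be sketched in a few lines of Python. This is a deliberately simplified illustration: it assumes the sound has already been turned into text, whereas real word spotting has to pick the phrase out of the audio itself. The `ALLOWED_PHRASES` list and the function names are hypothetical.

```python
import re

# Hypothetical vocabulary: phrases the installed applications know about.
ALLOWED_PHRASES = [
    "boston sand & gravel",
    "yoyodyne propulsion systems",
]


def tokenize(text):
    """Lowercase the text and split it into words, keeping '&' as its own token."""
    return re.findall(r"[a-z0-9']+|&", text.lower())


def spot_phrases(utterance, phrases=ALLOWED_PHRASES):
    """Return the allowed phrases that appear as contiguous word runs in the utterance."""
    words = tokenize(utterance)
    found = []
    for phrase in phrases:
        target = tokenize(phrase)
        n = len(target)
        if any(words[i:i + n] == target for i in range(len(words) - n + 1)):
            found.append(phrase)
    return found


heard = "Dave, I'm sure that I've got it.. umm... that's not... no... Boston Sand & Gravel... come on..."
print(spot_phrases(heard))
```

Everything outside the allowed phrases is simply ignored, which is what makes a restricted vocabulary tractable; deciding what the user actually wants done with the spotted phrase is the part that needs context.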
If this seems hopeless with today's technology, you are correct. We will see speech applied first to limited interactions and simple applications. Over time, the domain will grow. Think back to handwriting recognition on the early Newtons. We've come a long way in a few years. On the PDA, the same will be true for speech.
I remember that in the book Xenocide by Card, Ender used a method of speech called "subvocalizing" to communicate with his computer-friend without anyone else hearing it. Is this actually possible? If so, why aren't we doing that instead of speaking aloud? It seems to me that would solve the problem of talking loudly to a little black box where it was inappropriate. Someone tell me if this is just something Card made up.
With the current increase in voice-controlled appliances, some serious work should be done in the privacy and safety field. Most people probably know the exceedingly funny Dilbert strip in which Dilbert has been given a voice-controlled PC and Wally expresses his jealousy by "DELETING A FILE". In the Netherlands (where I live) we had a very funny commercial for a voice-controlled mobile phone. A guy is standing in the street with his boss and says "mother" to his phone to demonstrate the coolness of his new device. He talks to his mother for a sec and hangs up. After that, two people right next to them get in an argument and one of the guys shouts "asshole" at the other (the Dutch equivalent, actually), at which point the boss's mobile phone rings.....
This is of course funny and theoretical right now, but there will come a point at which people will want to be sure their device reacts to their voice commands and no one else's. I wonder how this problem is going to be tackled. In the meantime, I guess with the coming of voice-operated PCs to the workplace we will be getting separate offices again to prevent us from interfering with each other's work. That is IMHO at least a good thing(tm)
You can't use it in all places of course, but I know of a few places where voice control might be useful..
In a car for instance, especially if it's connected to an on-line traffic report or on-line maps and has built-in GPS. At a meeting. No more "who will be taking notes this time?" silence before every meeting. Just flop your PDA on the table and synchronize with everyone afterwards.
I think this, like all other things with "limited" use, will find a niche to operate in. A very large one too, as far as I'm concerned.
In a PalmSource session, Jeff Hawkins (now of Handspring, maker of the Visor) said that one of the concepts that he was considering implementing when he left Palm was speech recognition.
He had a prototype made out of wood (like he did for the original Pilot), and carried it around with him, pretending that it was a real unit, doing things like making appointments by voice, dictating memos, etc.
He said that "it felt wrong", both in terms of execution (it's much faster to click your way around the OS than try to short-cut through it using voice commands), and in terms of the social stigma that's attached to "talking to yourself". He didn't think that the latter would be overcome, but he also pointed out that a long time ago, even talking on a cell phone was out-of-the-ordinary enough for people to look at you funny.
I've used Dragon Dictate (on a P3/500 with gobs of RAM), and it was just too slow for my tastes even on my relatively powerful machine. I constantly had to "correct" it, but each time I did it would remember the way I said certain things for the next time. Now this is fine on a desktop with mucho storage, but what is going to happen on a PDA with limited storage? Is it not going to be possible for me to customize its vocabulary? Will I have to mold my voice to the device? If so, I see these devices having very limited appeal...
I'm a happy user of Dragon Systems NaturallySpeaking and I love it. I would like to take issue with your statements concerning VR software. The voice-to-text training has an option to read back what you said. It really is pretty amazing how lazy most people are when it comes to proper pronunciation. When I slur my words, Dragon slurs my text. GIGO (Garbage In, Garbage Out)! To illustrate this, use your challenge on yourself. Record someone's (relaxed) voice reading 4 chapters of text, take a week to convert it to text, and see what you get.
"Now think to yourself, in exactly how many of those places would talking aloud (esp to a little black box) be regularly tolerated?"
The same thing could be said before cell-phones became more widespread. At first it will seem a little strange, but once more people get used to it, I really don't see a social problem.
Even now, it's common for people to walk down the street "talking to a little [black box] cell-phone".
Apple's Newton 2000 was demoed with voice software from Dragon Systems.
It had 32 programmable words functioning for the demo. Full details used to be on the Newton FTP site, but Apple dropped that site. So you have to buy one of the compilation CDs that exist if you want to see it, or track down the PDF on your own.
As for Linux on a PDA - No one was talking at LinuxWorld. So no one is close enough to ship anytime soon.
I'll take the challenge, and I'll get 100% accuracy, guaranteed (assuming we're talking about normal speech, and not absurd levels of slurring).
It's always been the case that voice recognition works if you speak in a robotic, non-natural way. But I'm talking about speech to text for the masses, who will not (and have not in the past) be willing to learn to speak "the right way". Most are barely willing to go through training sessions. The reason is that for most people, they want to concentrate on what they're saying, rather than how they're saying it (which is not unreasonable).
Not to mention the environmental difficulties of using a PDA. Will this work in a noisy airport? Doubt it.
By "large" I mean conversational English, which is far less than 31,000.
Yes, I realize they claim they can handle more than that, but the rub is in the definition of "handle". My definition says that practical speech to text should be very close to the accuracy level of dictation to a human secretary (without requiring training, I might add). Their definition is -- far from that.
The reason I made the distinction of "large vocabulary" is that we actually have reasonable voice recognition of things like digits, which is very accurate and speaker independent, primarily because there is no context required.
I still have yet to use a voice recognition system that doesn't suck. I keep hearing that "we're almost there! Processors are almost fast enough!". Yeah right.
Here's an experiment that I would love to give any voice recognition company: I speak in my natural voice and read, say, 10 pages out of a novel. You can let your system process it for ONE WEEK. This is the equivalent of using a processor 4000 times faster. After that, I would expect 100% accuracy if the limitation is CPU power!
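The arithmetic behind the "one week" experiment can be worked out as below. This is a back-of-the-envelope sketch: it assumes a system that would otherwise keep up with speech in real time, and the 20-minute reading time is an assumption, not a figure from the post. The function names are made up for illustration.

```python
SECONDS_PER_WEEK = 7 * 24 * 60 * 60  # 604800


def equivalent_speedup(audio_seconds, processing_seconds=SECONDS_PER_WEEK):
    """How many times faster than real time a processor would have to be
    to do `processing_seconds` of work in the time the audio takes to speak."""
    return processing_seconds / audio_seconds


def audio_seconds_for(speedup, processing_seconds=SECONDS_PER_WEEK):
    """Length of audio for which `processing_seconds` equals the given speedup."""
    return processing_seconds / speedup


# Assumption: reading ten pages aloud takes about 20 minutes.
print(equivalent_speedup(20 * 60))  # 504.0

# Length of audio implied by the 4000x figure, in seconds.
print(audio_seconds_for(4000))  # 151.2
```

Under these assumptions the 4000x figure corresponds to roughly two and a half minutes of speech; for a longer reading the factor is smaller, but still hundreds of times real time, so the argument is not sensitive to the exact number.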
But you know 100% accuracy won't happen. The reason is that the problem with voice recognition is not lack of computer power, it's 1) lack of "world" knowledge, and 2) lack of understanding of how to apply world knowledge to language parsing (the AI problem, in other words). Even humans have trouble understanding speech, but we fill in the inaccuracies by knowing context.
I am doubtful that we will see true large vocabulary voice recognition (beyond the toy level) in the next 20, if not 50 years.
By the way, this is not to say that the toy level might not be useful for a device like this. But I have a feeling that they will oversell it like every other YAVRF and the public will get frustrated with its imperfections.
I somehow doubt that modern speech recognition technology is sufficient to recognize instructions at a quiet-whisper level.
Let's ignore for a moment that the most recent version of MacOS seems to be just fine at recognizing individual voices. This argument, and others like it in the discussion, miss the point I think-- Even if technology isn't good enough to do it now, there is a fantastic future for voice recognition in PDAs. I personally have held off on buying a Palm Pilot because I don't want to have to fuss with the pen all the time. If somebody manages to develop a PDA that can effectively save me the pain of having to write in everything, I will be all set to buy.
At least initially, I would be ok with a combination stylus/voice recognition interface... Saying that a crowd of people will mess up voice recognition and therefore it is lousy for PDAs seems the same argument as "trying to write on a bumpy car/bus will make it impossible for PDAs to recognize handwriting"... and yet people seem to be just fine with Palm's handwriting recognition.
Bottom line: there is a GREAT future for voice rec in PDAs...
He said that "it felt wrong", both in terms of execution (it's much faster to click your way around the OS than try to short-cut through it using voice commands), and in terms of the social stigma that's attached to "talking to yourself". He didn't think that the latter would be overcome, but he also pointed out that a long time ago, even talking on a cell phone was out-of-the-ordinary enough for people to look at you funny.
Why not just have both (stylus and voice) and allow the user to use whichever is most appropriate?
I agree that voice doesn't make sense for some OS-type tasks, but if I want to leave an interesting note for myself, or dictate an e-mail, I really don't want to deal with typing.
I think the social stigma of "talking to yourself" is a bad argument... Does anybody think it looks weird when police, etc.. use walkie-talkies? Personally, I think the idea of talking into my PDA would be kind of cool.
Besides, isn't talking into a cell phone "talking to yourself", as nobody is actually there with you? Seems like talking to your PDA would look a lot like talking to your cell phone! (OK, maybe it IS ridiculous looking...;) )
Little PDA, in the handheld aisle
4 buttons and a touch screen, it's very versatile
Listen to her synchin' up now, backing up yer fi-ee-eye-iles
C'mon flip it out, turn it on, write on it, PDA

Wa-wa, (Yeah, yeah, little PDA) wa, wa, wa, wa, wa, wa (Yeah, yeah, little PDA)
Wa-wa, (Yeah, yeah, little PDA) wa, wa, wa, wa, wa, wa (Yeah, yeah, little PDA)
Wa-wa (Ahhh, little PDA) wa, wa, wa, wa, wa, wa

I pull it out at the office, or in the train turnstile
This little handheld computer, is so worthwhile
I don't use no Win CE, I'm not seni-ee-eye-ile
C'mon flip it out, turn it on, write on it, PDA

Wa-wa, (Yeah, yeah, little PDA) wa, wa, wa, wa, wa, wa (Yeah, yeah, little PDA)
Wa-wa, (Yeah, yeah, little PDA) wa, wa, wa, wa, wa, wa (Yeah, yeah, little PDA)
Wa-wa (Ahhh, little PDA) wa, wa, wa, wa, wa, wa

Gonna go on the Net order a PDA
Get a case and a cradle I'll be running today
Show it at the conference table and then they'll say, yeah, yeah
That I'm a flagrant geek
I'll upgrade it in a week
And then flip it out, turn it on, write on it, PDA
true large vocabulary voice recognition (beyond the toy level) in the next 20, if not 50 years.
How do you define large vocabulary?
To give people an idea of what a large vocabulary is, Shakespeare used just over 31,000 words in ALL of his printed works counting proper names, words used once and mis-spellings. Any of the major speech recognition vendors (L&H, Dragon, IBM) can easily handle many more words than this. Certainly more words than are used by anyone in everyday life.
It does not say that it's open source, and there's no indication that it is. Why would you assume that it's open source just because it's designed to go on a Linux platform?
If you use a PalmPilot you will become pretty proficient at Graffiti; typically 20WPM minimum. Though a chording keyboard might be nice, I also use an onscreen keyboard layout called FITALY [fitaly.com]. This baby gets 40-50WPM, it really is pretty amazing once you get over the learning curve. Costs $25 though.
Continuous speech for a PDA would be excellent. Though just using it for switching apps or dialing numbers would not be as simple as pushing a button or swiping a shortcut IMO.
Those of you who own a PDA right now -- try to think of the last 5 places where you've used it. Thought of them? Good. Now think to yourself, in exactly how many of those places would talking aloud (esp to a little black box) be regularly tolerated? Maybe I'm seeing things from a distorted viewpoint, but I'd primarily have to use it in class, and, well, you can probably see what can come of that. I somehow doubt that modern speech recognition technology is sufficient to recognize instructions at a quiet-whisper level.
Yea, this has its applications for accessibility to people who can't use the stylus standard, but, as a mainstream item, I don't see this getting too far.
I don't think today's speech recognition is ready for text input yet. It will rather be used for controlling basic functions of your PDA.
Do you remember those voice-controlled ticket-selling computers you can ask via phone "when does the next train to XY leave?" And the computer reserves your tickets. Now to the PDA: with MP3s in your PDA (in the near future) you could ask your device "hey, which songs from Moby do you have?" and the device will present you with a list of titles (speech output?).
So Sense recognition is by far more important than speech recognition, and with limited applications such as ticket reservation (telephone hotline computer), Number dialing (cellphone) or music control (PDA) it is also possible with today's technology. And in your PDA it does not only make sense for music control...
So in the near future you won't be laying your device on the meeting table and having speech recognition take the minutes automatically, because the technology is not advanced enough. But it will let you perform certain tasks that require only a relatively small vocabulary without having to touch any buttons, e.g. while driving a car.
Re:Anyone else see a problem with this? (Score:1)
Speech won't be the exclusive way to input data into these devices, just an option that is potentially very useful to some.
I have a doctor friend who has been asking for that technology for years for taking patient notes.
Not new... (Score:1)
Not as pervasive but still...
How is this better than a MessagePad 2100? (Score:1)
Wake me up when someone does.
Re:The niche for speech (Score:1)
The niche for speech (Score:4)
[Disclaimer: In the interest of full disclosure, I work in speech recognition for SpeechWorks International [speechworks.com]. You may enjoy testing the SpeechWorks Demo Line at 1.888.SAY.DEMO (1.888.729.3366).]
A number of users have posted comments questioning the benefit of speech recognition operating on a PDA. Before examining this topic, I would like to quickly review the technical state of the art (in an effort to satify your inner-geek).
The Technology
Speech recognition operates by performing statistical matches of incoming sound against familiar words or phonemes (i.e. individual sounds; a works is composed of one or more phonemes). Traditionally, embedded speech recognition systems have featured small vocabularies (i.e. a limited set of recognizable phrases), but advances in processor speed are allowing larger and more complex vocabularies.
For applications in the telephony industry, a system running on a 500 MHz Pentium may support up to 50 lines and ten languages. These systems use a combination of directed dialog (asking for specific pieces of information - "what is your account number?") or limited natural language ("I'd like to fly from Boston to San Jose").
Returning to the PDA market, the task is to recognize a single user operating in a single language. This greatly reduces the memory and processor requirements. Further tradeoffs are possibly by adapting to the speech patterns of a single user as frequently occurs in dictation systems. But, as we will discuss below, the vocabularies are much more complex and word-spotting becomes vital.
Speech on a PDA
For simple tasks like navigation, the point and click interface works great. You get immediate feedback from the screen and you may peruse a page of information at a time. A speech based interface, in contrast, is more serial than parallel. If you are walking through a list of 50 items, your eyes will locate the correct item far faster than if the list is being read. Likewise speech recognition will not replace the keyboard for data entry. It is, however, a valuable supplement which allows the user to jump to information not readily visible. While you're composing an email on the Palm Pilot, for instance, saying "Tell me the birthday of Jim Bob Jones" may be faster than navigating there yourself. Likewise, if you're navigating through a database of 20k companies, it may be easier to just say "Yoyodyne Propulsion Systems".
To make speech recognition useful on a PDA, the vocabularies must directly relate to the installed applications and information. Complex navigation using true natural language is a difficult and very much unsolved recognition task. But speech recognition on a PDA is even harder. Why?
Imagine that you're sitting in a cave and you hear "Dave, I'm sure that I've got it.. umm... that's not... no... Boston Sand & Gravel... come on...". You're the PDA. What does the user want? If you understood the context of the situation, you might recall the above example of company names in a database. You might say, "I've got that installed," and locate the entry for the 'Boston Sand & Gravel Company' for the user. But a PDA is not that smart. It needs to first pick out the allowed phrases from the noise and surrounding conversation. This is called 'word spotting'. Then it needs to decide how to interpret the phrase. Without a restricted application, the PDA must understand the context, frequently in human terms, of the speech.
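The word-spotting step can be sketched in a few lines of Python. This is a toy under loud assumptions: it works on text that has already been roughly transcribed (a real word spotter works on audio), and the phrase list and function names are hypothetical. It only shows the shape of the task, which is to ignore the surrounding chatter and return the allowed phrases that were heard.

```python
import re

# Hypothetical list of phrases the installed applications know about.
ALLOWED_PHRASES = [
    "Boston Sand & Gravel",
    "Yoyodyne Propulsion Systems",
]

def normalize(text):
    """Lowercase and reduce text to a list of word tokens ('&' becomes 'and')."""
    text = text.lower().replace("&", " and ")
    return re.findall(r"[a-z0-9']+", text)

def spot_phrases(utterance, phrases=ALLOWED_PHRASES):
    """Return the allowed phrases whose words appear consecutively in the utterance."""
    words = normalize(utterance)
    found = []
    for phrase in phrases:
        target = normalize(phrase)
        n = len(target)
        if any(words[i:i + n] == target for i in range(len(words) - n + 1)):
            found.append(phrase)
    return found

heard = ("Dave, I'm sure that I've got it.. umm... that's not... no... "
         "Boston Sand & Gravel... come on...")
print(spot_phrases(heard))  # ['Boston Sand & Gravel']
```

Spotting the phrase is the easy half. Deciding what the user wants done with 'Boston Sand & Gravel' is the interpretation step, and the sketch does not attempt it.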
If this seems hopeless with today's technology, you are correct. We will see speech applied first to limited interactions and simple applications. Over time, the domain will grow. Think back to handwriting recognition on the early Newtons. We've come a long way in a few years. On the PDA, the same will be true for speech.
Re:Read the article (Score:1)
Summary: (Score:2)
Why actual voice? (Score:1)
My voice or your voice control? (Score:2)
Most people probably know the exceedingly funny Dilbert strip in which Dilbert has been given a voice controlled PC and Wally expresses his jealousy by "DELETING A FILE". In the Netherlands (where I live) we had a very funny commercial for a voice controlled mobile phone. A guy is standing in the street with his boss and says "mother" to his phone to demonstrate the coolness of his new device. He talks to his mother for a sec and hangs up. After that two people right next to them get in an argument and one of the guys shouts "asshole" to the other (Dutch equivalent actually) at which point the boss's mobile phone rings.....
This is of course funny and theoretical right now, but there will come a point at which people will want to be sure their device reacts to their voice commands and no one else's. I wonder how this problem is going to be tackled.
In the meantime I guess with the coming of voice operated PCs to the workplace we will be getting separate offices again to prevent us from interfering with each other's work. That is IMHO at least a good thing(tm)
Re:Anyone else see a problem with this? (Score:2)
In a car for instance, especially if it's connected to an on-line traffic report or on-line maps and has built in GPS.
At a meeting. No more "who will be taking notes this time?"-silence before every meeting. Just flop your PDA on the table and synchronize with everyone afterwards.
I think this, like all other things with "limited" use, will find a niche to operate in. A very large one too, as far as I'm concerned.
Re:Anyone else see a problem with this? (Score:2)
He had a prototype made out of wood (like he did for the original Pilot), and carried it around with him, pretending that it was a real unit, doing things like making appointments by voice, dictating memos, etc.
He said that "it felt wrong", both in terms of execution (it's much faster to click your way around the OS than try to short-cut through it using voice commands), and in terms of the social stigma that's attached to "talking to yourself". He didn't think that the latter would be overcome, but he also pointed out that a long time ago, even talking on a cell phone was out-of-the-ordinary enough for people to look at you funny.
I've used Dragon Dictate (on a P3/500 with gobs of RAM), and it was just too slow for my tastes even on my relatively powerful machine. I constantly had to "correct" it, but each time I did it would remember the way I said certain things for the next time. Now this is fine on a desktop with mucho storage, but what is going to happen on a PDA with limited storage? Is it not going to be possible for me to customize its vocabulary? Will I have to mold my voice to the device? If so, I see these devices having very limited appeal...
Read the article (Score:2)
I just hope it's not the 60MB monstrosity from Transmeta
Re:Yet another voice user failure (Score:1)
The voice to text training has an option to read back what you said. It really is pretty amazing how lazy most people are when it comes to proper pronunciation. When I slur my words, Dragon slurs my text. GIGO (Garbage In Garbage Out)!
To illustrate this, use your challenge on yourself. Record someone's (relaxed) voice reading 4 chapters of text and take a week to convert it to text and see what you get.
_________________________
Re:Anyone else see a problem with this? (Score:2)
The same thing could be said before cell-phones became more widespread. At first it will seem a little strange, but once more people get used to it, I really don't see a social problem.
Even now, it's common for people to walk down the street "talking to a little [black box] cell-phone".
_________________________
Been here, done this in beta...1996 (Score:1)
Had 32 programmable words functioning for the demo. Full details used to be on the Newton FTP site, but Apple dropped that site. So you have to buy one of the compilation CDs that exist if you want to see it/track down the
As for Linux on a PDA - No one was talking at LinuxWorld. So no one is close enough to ship anytime soon.
Re:Yet another voice user failure (Score:2)
I'll take the challenge, and I'll get 100% accuracy, guaranteed (assuming we're talking about normal speech, and not absurd levels of slurring).
It's always been the case that voice recognition works if you speak in a robotic, non-natural way. But I'm talking about speech to text for the masses, who will not be (and have not in the past been) willing to learn to speak "the right way". Most are barely willing to go through training sessions. The reason is that most people want to concentrate on what they're saying, rather than how they're saying it (which is not unreasonable).
Not to mention the environmental difficulties of using a PDA. Will this work in a noisy airport? Doubt it.
--
Re:Yet another voice recognition failure (Score:2)
By "large" I mean conversational English, which is far less than 31,000.
Yes, I realize they claim they can handle more than that, but the rub is in the definition of "handle". My definition says that practical speech to text should be very close to the accuracy level of dictation to a human secretary (without requiring training, I might add). Their definition is -- far from that.
The reason I made the distinction of "large vocabulary" is that we actually have reasonable voice recognition of things like digits, which are very accurate and speaker independent, primarily because there is no context required.
--
Yet another voice recognition failure (Score:3)
New acronym I just invented: YAVRF.
I still have yet to use a voice recognition system that doesn't suck. I keep hearing that "we're almost there! Processors are almost fast enough!". Yeah right.
Here's an experiment that I would love to give any voice recognition company: I speak in my natural voice and read, say, 10 pages out of a novel. You can let your system process it for ONE WEEK. This is the equivalent of using a processor 4000 times faster. After that, I would expect 100% accuracy if the limitation is CPU power!
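For anyone who wants to check the one-week arithmetic, here is a small Python sketch. The function names are made up for the example, and the 20-minute reading time is purely an assumption, not a figure from the challenge. The multiplier depends entirely on how long the recording is.

```python
SECONDS_PER_WEEK = 7 * 24 * 60 * 60  # 604800

def speedup_factor(reading_seconds, processing_seconds=SECONDS_PER_WEEK):
    """How many times slower than real time the recognizer is allowed to run."""
    return processing_seconds / reading_seconds

def reading_seconds_for(factor, processing_seconds=SECONDS_PER_WEEK):
    """Length of recording for which the allowance equals the given factor."""
    return processing_seconds / factor

print(reading_seconds_for(4000))  # 151.2 (seconds of audio for a 4000x allowance)
print(speedup_factor(20 * 60))    # 504.0 (allowance for an assumed 20-minute reading)
```

Either way the allowance is hundreds to thousands of times real time, which is the point of the experiment.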
But you know 100% accuracy won't happen. The reason is that the problem with voice recognition is not lack of computer power, it's 1) lack of "world" knowledge, and 2) lack of understanding of how to apply world knowledge to language parsing (the AI problem, in other words). Even humans have trouble understanding speech, but we fill in the inaccuracies by knowing context.
I am doubtful that we will see true large vocabulary voice recognition (beyond the toy level) in the next 20, if not 50 years.
By the way, this is not to say that the toy level might not be useful for a device like this. But I have a feeling that they will oversell it like every other YAVRF and the public will get frustrated with its imperfections.
--
Re:Open Source? (Score:1)
Re:Anyone else see a problem with this? (Score:1)
Let's ignore for a moment that the most recent version of MacOS seems to be just fine at recognizing individual voices. This argument, and others like it in the discussion, miss the point I think-- Even if technology isn't good enough to do it now, there is a fantastic future for voice-recognition in PDAs. I personally have held off on buying a Palm Pilot because I don't want to have to fuss with the pen all the time. If somebody manages to develop a PDA that can effectively save me the pain of having to write in everything, I will be all set to buy.
At least initially, I would be ok with a combination stylus/voice recognition interface... Saying that a crowd of people will mess up voice recognition and therefore it is lousy for PDAs seems the same argument as "trying to write on a bumpy car/bus will make it impossible for PDAs to recognize handwriting"... and yet people seem to be just fine with Palm's handwriting recognition.
Bottom line: there is a GREAT future for voice rec in PDAs...
Re:Anyone else see a problem with this? (Score:1)
Why not just have both (stylus and voice) and allow the user to use whichever is most appropriate?
I agree that voice doesn't make sense for some OS-type tasks, but if I want to leave an interesting note for myself, or dictate an e-mail, I really don't want to deal with typing.
I think the social stigma of "talking to yourself" is a bad argument... Does anybody think it looks weird when police, etc.. use walkie-talkies? Personally, I think the idea of talking into my PDA would be kind of cool.
Besides, isn't talking into a cell phone "talking to yourself", as nobody is actually there with you? Seems like talking to your PDA would look a lot like talking to your cell phone! (OK, maybe it IS ridiculous looking...)
Mood Music - Little PDA (Score:2)
by Justin Osborn
To the tune of: G.T.O. by Ronnie and the Daytonas
Little PDA, in the handheld aisle
4 buttons and a touch screen, it's very versatile
Listen to her synchin' up now, backing up yer fi-ee-eye-iles
C'mon flip it out, turn it on, write on it, PDA
Wa-wa, (Yeah, yeah, little PDA)
wa, wa, wa, wa, wa, wa (Yeah, yeah, little PDA)
Wa-wa, (Yeah, yeah, little PDA)
wa, wa, wa, wa, wa, wa (Yeah, yeah, little PDA)
Wa-wa (Ahhh, little PDA) wa, wa, wa, wa, wa, wa
I pull it out at the office, or in the train turnstile
This little handheld computer, is so worthwhile
I don't use no Win CE, I'm not seni-ee-eye-ile
C'mon flip it out, turn it on, write on it, PDA
Wa-wa, (Yeah, yeah, little PDA)
wa, wa, wa, wa, wa, wa (Yeah, yeah, little PDA)
Wa-wa, (Yeah, yeah, little PDA)
wa, wa, wa, wa, wa, wa (Yeah, yeah, little PDA)
Wa-wa (Ahhh, little PDA) wa, wa, wa, wa, wa, wa
Gonna go on the Net
order a PDA
Get a case and a cradle
I'll be running today
Show it at the conference table
and then they'll say, yeah, yeah
That I'm a flagrant geek
I'll upgrade it in a week
And then flip it out, turn it on, write on it, PDA
Re:Yet another voice recognition failure (Score:1)
To give people an idea of what a large vocabulary is, Shakespeare used just over 31,000 words in ALL of his printed works counting proper names, words used once and mis-spellings. Any of the major speech recognition vendors (L&H, Dragon, IBM) can easily handle many more words than this. Certainly more words than are used by anyone in everyday life.
So exactly how large is TRUE large vocabulary?
-Brad :)
Re:Open Source? (Score:1)
-Brad :)
Scratching & clawing (Score:1)
If you use a PalmPilot you will become pretty proficient at Graffiti; typically 20WPM minimum. Though a chording keyboard might be nice, I also use an onscreen keyboard layout called FITALY [fitaly.com]. This baby gets 40-50WPM, it really is pretty amazing once you get over the learning curve. Costs $25 though.
Continuous speech for a PDA would be excellent. Though just using it for switching apps or dialing numbers would not be as simple as pushing a button or swiping a shortcut IMO.
Anyone else see a problem with this? (Score:4)
Yea, this has its applications for accessibility to people who can't use the stylus standard, but, as a mainstream item, I don't see this getting too far.
Speech recognition for controlling, not for text i (Score:1)
Do you remember those voice-controlled ticket selling computers you can ask via phone "when does the next train to XY leave?" And the computer reserves your tickets. Now to the PDA: With MP3s in your PDA (in the nearest future) you could ask your device "hey, which songs from Moby do you have?" and the device will present you a list of titles (speech output?).
So sense recognition is far more important than speech recognition, and with limited applications such as ticket reservation (telephone hotline computer), number dialing (cellphone) or music control (PDA) it is also possible with today's technology. And in your PDA it does not only make sense for music control...
So you won't be laying your device on the meeting table to take automatic minutes in the near future, because the technology is not advanced enough. But it will let you perform certain tasks that require only a relatively small vocabulary without having to touch any buttons, e.g. while driving a car.
Re:Read the article (Score:1)
try
http://www.arm.uk.linux.org/~rmk/armlinux.html
http://www.linuxce.org/
http://embedded.adis.on.ca/