Lernout & Hauspie Going Into PDA Space

TNLNYC writes "Good article from InfoWorld about new prototypes of voice-driven PDAs."
This discussion has been archived. No new comments can be posted.

  • by Anonymous Coward
    Almost all leading ASR products now have the ability for continuous speech with speaker independence (read: no need to teach it your voice). Dragon products suck (to date; they led the pack for a short while, though) -- L&H and IBM are significant consumer/commercial providers with great products, and there are others thought of as more R&D (not for joe user's PC) that are even better.

    USC has some fascinating research into this (was posted on /. a while back and is in this month's Wired magazine) which achieves better-than-human results for recognizing words when background noise is 400%+ louder than the speaker. The military is interested as are commercial shops. Let's see if they can get it out of the "command and control" mode of listening for key words and into the continuous dictation realm.

    peace.

  • I get away with talking to the little black box that is my cell phone.


    Speech won't be the exclusive way to input data into these devices, just an option that is potentially very useful to some.


    I have a doctor friend who has been asking for that technology for years for taking patient notes.

  • Ohh, I'm sure it will find a niche, but this is kind of like inventing the mouse before you bother to finish putting all the letters on the keyboard. Sure, most people will find it useful in a limited number of situations, but in most of these situations other options are available. In both your car and office it would be quite easy to put a larger, more powerful computer unit which can interface with your PDA.

    And while I like the idea of it taking notes at a meeting, speech recognition is nowhere near this good yet. If it were, secretaries would never take dictation anymore. My guess is it uses a sort of Graffiti for the voice... only responding to a certain small pretrained set of words.
  • ...L&H already did this for the Newton MP 21xx.
    Not as pervasive but still...
  • Obviously you need a boy/girlfriend! I'd say spouse but I'm not sure pleading is "voice driven".
  • And the supposedly true story of one of the early Windows demos of a speaker-independent, voice-enabled system where various audience members shouted:
    "Format C:"
    "Return"
    "Yes"
    "Return"

    There are some areas where it would be a nice addition to a standard PDA. Not to replace the handwriting recognition but as an alternate entry method. I could see myself using it in the car in a hands-free manner while driving. I know in previous jobs we did medical form/record entry and we would have liked to have recognition as part of it. Many of those customers use microcassette recorders that then need to be sent to a transcriber. Those areas with limited technical vocabularies would be well suited. In a cubicle environment it's crazy. The woman over the wall drives me crazy with listening to her voice mail with her speaker phone as it is now... then again, the ability to yell "format c: return Y return" is tempting 8^)
  • Voice is a step backwards in many environments. Staff density in businesses is increasing, so much so that even trying to hold a normal (short) phone conversation is getting difficult. There are times that staff near me are yelling down the line at someone on a mobile so loudly that I can't even think to type. The place I work at has been so successful recently that if it doesn't start packing staff in like diners at a sushi bar we're going to run out of room. Imagine if every one of those staff spent most of the day talking at their computers. Meanwhile all you hear from me while I'm composing an e-mail (or a /. post) is the quiet tapping of keys. Sure, voice recognition is useful in some applications, but I'm more a fan of gestures - not writing on a pad, but hand movements in the air. Granted that waving your fingers around in public might make you look like you're involved in an indecent act, or trying to cast a spell, but it's a lot less intrusive than voice. Anyone have any followup info on those fingernail motion sensors posted here a month or so ago?
  • Does anybody remember Nathan Spring's "Box" in the short-lived BBC Series "Star Cops"? Now that's a useful PDA.

    I don't think a speech interface is going to be much use until these devices are capable of understanding a substantial proportion of possible spoken phrases. That would require incorporating a large chunk of the CYC database in a decent real-time (therefore probably analogue) neural net.

    It'd be nice to think that demand for this type of feature on PDA's might fund and therefore speed up development of that kind of sophisticated AI.

    Consciousness is not what it thinks it is
    Thought exists only as an abstraction
  • There are times when I wouldn't use the voice recognition and other times when it would be great to have (driving your car, needing a note to be taken for instance). I use my PDA a lot; it's enabled me to carry just it and not a pen and paper. I 'post notes' and organize them from time to time, like at the end of the day or just before I leave work. Currently I don't use it to take lots of notes during a conversation (like at work). There I still use my notebook because it's just easier to write without thinking. I use a lot of symbols and such as a kind of shorthand. It's the one area where I have a hard time with Graffiti. Also I tend to add notes in the margins and drawings (a picture is worth a 1000 words), which is not possible with any computer I know of. If someone can overcome these shortcomings they'd truly have a product that should sell.
  • Yeah, I have a problem with talking to PDAs. Who is going to take notes in a crowded office or even the supermarket of a project that is under NDA?

    Would you dictate your personal notes where anyone else can hear them? Let's face it, you don't want the rest of the world to hear what you're doing all the time, right?

    Well I for sure don't.

    The best way to enter data I've ever had was on a chord keyboard. The device was called an "Agenda" and it's about a decade old. If the memory card wasn't dead I'd be using it still. It was smaller than a Cassiopeia A-10 and I could enter text into it faster than I can type on a QWERTY keyboard, which is saying something.

    Vik :v)
  • They still have not come up with anything as good as the last Newton.

    Wake me up when someone does.

  • I couldn't even get to the stock quote option. It must be slashdotted.
  • I would love to use voice recognition in an appropriate capacity, but it's not going to happen soon, if ever.

    (Note also that using VoRec correctly requires a near total redesign of user interfaces to accommodate correlation of recognized voice commands and gestural input - notice how often you point or make another meaningful motion when you talk to someone, especially in trying to convey information, and that those gestures mean very different things from the way interfaces use pointing today.)

    It's really sad, but today's best voice recognition is only marginally more useful than the CoVox voice recognition I had on my Commodore 64 thirteen years ago. Sure, vocabularies are larger now, and we can get somewhat closer to true connected speech recognition, but by and large, the lack of progress proves that despite an improvement of several orders of magnitude in processing power, (and memory speed and capacity, and storage space, not to mention DSPs) voice recognition is not a problem that can be effectively addressed by any amount of brute force processing using current methods.

    I'm going to really date myself here: I've got a copy of Interface Age magazine (anybody else remember that?) in the garage (the one with the floppy ROM: a BASIC ROM on one of those floppy vinyl records they used to put in magazines) that has ads for several voice recognition systems that didn't work all that much worse than the ones we have now - I had friends that had some, back when everybody was using the S-100 bus. Speech recognition has been "just around the corner" in the computer/hacker community for as long as I can remember, and the VoRec prognosticators have been hoovering money for years out of suckers of either the investor or customer ilk who fell for "all it needs is next year's hardware and we'll be fine!"

    I would love to believe in VoRec, but I see little evidence that it will exist in a significantly useful form for another ten years or so. Somebody prove me wrong - please.
  • hmm..i tried the speechworks stock thingy and it failed on both tries of LNUX - va linux.
  • by First Person ( 51018 ) on Saturday February 05, 2000 @08:07AM (#1303271)

    [Disclaimer: In the interest of full disclosure, I work in speech recognition for SpeechWorks International [speechworks.com]. You may enjoy testing the SpeechWorks Demo Line at 1.888.SAY.DEMO (1.888.729.3366).]

    A number of users have posted comments questioning the benefit of speech recognition operating on a PDA. Before examining this topic, I would like to quickly review the technical state of the art (in an effort to satisfy your inner geek).

    The Technology

    Speech recognition operates by performing statistical matches of incoming sound against familiar words or phonemes (i.e. individual sounds; a word is composed of one or more phonemes). Traditionally, embedded speech recognition systems have featured small vocabularies (i.e. a limited set of recognizable phrases), but advances in processor speed are allowing larger and more complex vocabularies.
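
    A minimal sketch of what "statistical matching" can mean, in Python. Everything here is an illustrative assumption rather than how any shipping recognizer works: the two-number feature frames, the `PHONEME_MEANS` table, and the fixed-variance Gaussian score are made up for the example.

```python
import math

# Hypothetical templates: one mean feature vector per phoneme (made-up numbers).
PHONEME_MEANS = {
    "b": (1.0, 0.2),
    "ao": (0.1, 0.9),
    "s": (0.8, 0.8),
}


def log_likelihood(frame, mean, variance=0.05):
    """Log-density of a frame under an isotropic Gaussian centred on `mean`."""
    dims = len(frame)
    squared_distance = sum((x - m) ** 2 for x, m in zip(frame, mean))
    return -0.5 * (dims * math.log(2 * math.pi * variance)
                   + squared_distance / variance)


def best_phoneme(frame):
    """Return the phoneme whose template gives the frame the highest score."""
    return max(PHONEME_MEANS,
               key=lambda p: log_likelihood(frame, PHONEME_MEANS[p]))


def decode(frames):
    """Label each frame independently with its best-scoring phoneme."""
    return [best_phoneme(frame) for frame in frames]
```

    Labelling each frame on its own, as this sketch does, ignores the neighbouring sounds entirely. It is only meant to show the "score every candidate, keep the best" idea.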

    For applications in the telephony industry, a system running on a 500 MHz Pentium may support up to 50 lines and ten languages. These systems use a combination of directed dialog (asking for specific pieces of information - "what is your account number?") and limited natural language ("I'd like to fly from Boston to San Jose").

    Returning to the PDA market, the task is to recognize a single user operating in a single language. This greatly reduces the memory and processor requirements. Further tradeoffs are possible by adapting to the speech patterns of a single user, as frequently occurs in dictation systems. But, as we will discuss below, the vocabularies are much more complex and word-spotting becomes vital.

    Speech on a PDA

    For simple tasks like navigation, the point and click interface works great. You get immediate feedback from the screen and you may peruse a page of information at a time. A speech based interface, in contrast, is more serial than parallel. If you are walking through a list of 50 items, your eyes will locate the correct item far faster than if the list is being read. Likewise speech recognition will not replace the keyboard for data entry. It is, however, a valuable supplement which allows the user to jump to information not readily visible. While you're composing an email on the Palm Pilot, for instance, saying "Tell me the birthday of Jim Bob Jones" may be faster than navigating there yourself. Likewise, if you're navigating through a database of 20k companies, it may be easier to just say "Yoyodyne Propulsion Systems".

    To make speech recognition useful on a PDA, the vocabularies must directly relate to the installed applications and information. Complex navigation using true natural language is a difficult and very much unsolved recognition task. But speech recognition on a PDA is even harder. Why?

    Imagine that you're sitting in a cave and you hear "Dave, I'm sure that I've got it.. umm... that's not... no... Boston Sand & Gravel... come on...". You're the PDA. What does the user want? If you understood the context of the situation, you might recall the above example of company names in a database. You might say, "I've got that installed," and locate the entry for the 'Boston Sand & Gravel Company' for the user. But a PDA is not that smart. It needs to first pick out the allowed phrases from the noise and surrounding conversation. This is called 'word spotting'. Then it needs to decide how to interpret the phrase. Without a restricted application, the PDA must understand the context, frequently in human terms, of the speech.
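
    To make the word-spotting step concrete, here is a toy sketch in Python. It assumes, unrealistically, that the audio has already been turned into text, and the `ALLOWED_PHRASES` list and function names are hypothetical. A real word spotter works on the sound itself and has to cope with misrecognized words as well as surrounding chatter.

```python
import re

# Hypothetical phrase list, standing in for entries in an installed database.
ALLOWED_PHRASES = ["Boston Sand & Gravel", "Yoyodyne Propulsion Systems"]


def normalize(text):
    """Lower-case the text and split it into word-like tokens, keeping '&'."""
    return re.findall(r"[a-z0-9&]+", text.lower())


def spot_phrases(utterance, phrases=ALLOWED_PHRASES):
    """Return the allowed phrases found in the utterance, in spoken order."""
    tokens = normalize(utterance)
    hits = []
    for phrase in phrases:
        target = normalize(phrase)
        width = len(target)
        # Slide a window over the tokens and compare it with the phrase.
        for start in range(len(tokens) - width + 1):
            if tokens[start:start + width] == target:
                hits.append((start, phrase))
    return [phrase for _, phrase in sorted(hits)]
```

    Called on the cave utterance above, `spot_phrases` keeps only the company name and ignores everything around it. Deciding what the user wants done with that name is the separate interpretation step.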

    If this seems hopeless with today's technology, you are correct. We will see speech applied first to limited interactions and simple applications. Over time, the domain will grow. Think back to handwriting recognition on the early Newtons. We've come a long way in a few years. On the PDA, the same will be true for speech.

  • What exactly does that mean... Mobile Version of linux? I can't find this option anywhere in make menuconfig...
    • Yes it does run Linux (Lower power consumption, etc.).
    • The name "Nuk", is short for the Hawaiian word for "Echo" ("Nukulu").
    • "Wireless voice connection to e-commerce".
    • Not using Graffiti will open it up to more people.
    • Should ship in second half of 2000.
    • 400-500 MHz at launch.
    • $200-$600 estimated price.
  • I remember that in the book Xenocide by Card, Ender used a method of speech called "subvocalizing" to communicate with his computer-friend without anyone else hearing it. Is this actually possible? If so, why aren't we doing that instead of speaking aloud? It seems to me that would solve the problem of talking loudly to a little black box where it was inappropriate. Someone tell me if this is just something Card made up.

  • With the current increase in voice controlled appliances some serious work should be done on the privacy and safety field.
    Most people probably know the exceedingly funny Dilbert strip in which Dilbert has been given a voice-controlled PC and Wally expresses his jealousy by "DELETING A FILE". In the Netherlands (where I live) we had a very funny commercial for a voice-controlled mobile phone. A guy is standing in the street with his boss and says "mother" to his phone to demonstrate the coolness of his new device. He talks to his mother for a sec and hangs up. After that two people right next to them get in an argument and one of the guys shouts "asshole" to the other (Dutch equivalent actually), at which point the boss's mobile phone rings.....

    This is of course funny and theoretical right now, but there will come a point at which people will want to be sure their device reacts to their voice commands and no one else's. I wonder how this problem is going to be tackled.
    In the meantime I guess with the coming of voice-operated PCs to the workplace we will be getting separate offices again to prevent us from interfering with each other's work. That is IMHO at least a good thing(tm)
  • You can't use it in all places of course, but I know of a few places where voice control might be useful.

    In a car for instance, especially if it's connected to an on-line traffic report or on-line maps and has built in GPS.
    At a meeting. No more "who will be taking notes this time?"-silence before every meeting. Just flop your PDA on the table and synchronize with everyone afterwards.

    I think this, like all other things with "limited" use, will find a niche to operate in. A very large one too, as far as I'm concerned.
  • In a PalmSource session, Jeff Hawkins (now of Handspring, maker of the Visor) said that one of the concepts that he was considering implementing when he left Palm was speech recognition.

    He had a prototype made out of wood (like he did for the original Pilot), and carried it around with him, pretending that it was a real unit, doing things like making appointments by voice, dictating memos, etc.

    He said that "it felt wrong", both in terms of execution (it's much faster to click your way around the OS than try to short-cut through it using voice commands), and in terms of the social stigma that's attached to "talking to yourself". He didn't think that the latter would be overcome, but he also pointed out that a long time ago, even talking on a cell phone was out-of-the-ordinary enough for people to look at you funny.

    I've used Dragon Dictate (on a P3/500 with gobs of RAM), and it was just too slow for my tastes even on my relatively powerful machine. I constantly had to "correct" it, but each time I did it would remember the way I said certain things for the next time. Now this is fine on a desktop with mucho storage, but what is going to happen on a PDA with limited storage? Is it not going to be possible for me to customize its vocabulary? Will I have to mold my voice to the device? If so, I see these devices having very limited appeal...
  • These PDAs use a mobile version of Linux.

    I just hope it's not the 60MB monstrosity from Transmeta ;)
  • I'm the happy user of Dragon Systems NaturallySpeaking and I love it. I would like to take issue with your statements concerning VR software.
    The voice to text training has an option to read back what you said. It really is pretty amazing how lazy most people are when it comes to proper pronunciation. When I slur my words, Dragon slurs my text. GIGO (Garbage In Garbage Out)!
    To illustrate this, use your challenge on yourself. Record someone's (relaxed) voice reading 4 chapters of text and take a week to convert it to text and see what you get.
    _________________________
  • Now think to yourself, in exactly how many of those places would talking aloud (esp to a little black box)

    The same thing could be said before cell-phones became more widespread. At first it will seem a little strange, but once more people get used to it, I really don't see a social problem.

    Even now, it's common for people to walk down the street "talking to a little [black box] cell-phone".
    _________________________

  • Apple's Newton 2000 was demoed with voice software from Dragon Systems.

    It had a programmable 32-word vocabulary working for the demo. Full details used to be on the Newton FTP site, but Apple dropped that site. So you have to buy one of the compilation CDs that exist if you want to see it, or track down the .PDF on your own.

    As for Linux on a PDA - No one was talking at LinuxWorld. So no one is close enough to ship anytime soon.
  • I'll take the challenge, and I'll get 100% accuracy, guaranteed (assuming we're talking about normal speech, and not absurd levels of slurring).

    It's always been the case that voice recognition works if you speak in a robotic, non-natural way. But I'm talking about speech to text for the masses, who will not (and have not in the past) be willing to learn to speak "the right way". Most are barely willing to go through training sessions. The reason is that for most people, they want to concentrate on what they're saying, rather than how they're saying it (which is not unreasonable).

    Not to mention the environmental difficulties of using a PDA. Will this work in a noisy airport? Doubt it.


    --

  • by "large" I mean conversational English, which is far less than 31,000.

    Yes, I realize they claim they can handle more than that, but the rub is in the definition of "handle". My definition says that practical speech to text should be very close to the accuracy level of dictation to a human secretary (without requiring training, I might add). Their definition is -- far from that.

    The reason I made the distinction of "large vocabulary" is that we actually have reasonable voice recognition of things like digits, which is very accurate and speaker independent, primarily because there is no context required.


    --

  • by Tim Behrendsen ( 89573 ) on Saturday February 05, 2000 @07:42AM (#1303284)

    New acronym I just invented: YAVRF.

    I still have yet to use a voice recognition system that doesn't suck. I keep hearing that "we're almost there! Processors are almost fast enough!". Yeah right.

    Here's an experiment that I would love to give any voice recognition company: I speak in my natural voice and read, say, 10 pages out of a novel. You can let your system process it for ONE WEEK. This is the equivalent of using a processor 4000 times faster. After that, I would expect 100% accuracy if the limitation is CPU power!
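
    The "4000 times" figure depends on how long the reading takes and on how fast the recognizer normally runs, neither of which the post states. A rough check in Python, assuming a recognizer that normally keeps up with speech in real time; the function names and the 20-minute reading time are illustrative guesses, not numbers from the post.

```python
SECONDS_PER_WEEK = 7 * 24 * 60 * 60  # 604800


def speedup_factor(audio_seconds, processing_seconds=SECONDS_PER_WEEK):
    """How many times more processing time than a real-time pass would get."""
    return processing_seconds / audio_seconds


def audio_seconds_for(factor, processing_seconds=SECONDS_PER_WEEK):
    """Length of recording for which the time budget gives this factor."""
    return processing_seconds / factor
```

    Under these assumptions, `audio_seconds_for(4000)` comes to about 151 seconds of audio, while a 20-minute reading gives `speedup_factor(20 * 60)` of 504. Either way the budget is hundreds to thousands of times real time, which is the point of the experiment.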

    But you know 100% accuracy won't happen. The reason is that the problem with voice recognition is not lack of computer power, it's 1) lack of "world" knowledge, and 2) lack of understanding of how to apply world knowledge to language parsing (the AI problem, in other words). Even humans have trouble understanding speech, but we fill in the inaccuracies by knowing context.

    I am doubtful that we will see true large vocabulary voice recognition (beyond the toy level) in the next 20, if not 50 years.

    By the way, this is not to say that the toy level might not be useful for a device like this. But I have a feeling that they will oversell it like every other YAVRF and the public will get frustrated with its imperfections.


    --

  • Of course, they should probably investigate using Crusoe for one of these things.
  • I somehow doubt that modern speech recognition technology is sufficient to recognize instructions at a quiet-whisper level.

    Let's ignore for a moment that the most recent version of MacOS seems to be just fine at recognizing individual voices. This argument, and others like it in the discussion, miss the point I think -- even if technology isn't good enough to do it now, there is a fantastic future for voice recognition in PDAs. I personally have held off on buying a Palm Pilot because I don't want to have to fuss with the pen all the time. If somebody manages to develop a PDA that can effectively save me the pain of having to write in everything, I will be all set to buy.

    At least initially, I would be ok with a combination stylus/voice recognition interface... Saying that a crowd of people will mess up voice recognition and therefore it is lousy for PDAs seems the same argument as "trying to write on a bumpy car/bus will make it impossible for PDAs to recognize handwriting"... and yet people seem to be just fine with Palm's handwriting recognition.

    Bottom line: there is a GREAT future for voice rec in PDAs...
  • He said that "it felt wrong", both in terms of execution (it's much faster to click your way around the OS than try to short-cut through it using voice commands), and in terms of the social stigma that's attached to "talking to yourself". He didn't think that the latter would be overcome, but he also pointed out that a long time ago, even talking on a cell phone was out-of-the-ordinary enough for people to look at you funny.

    Why not just have both (stylus and voice) and allow the user to use whichever is most appropriate?

    I agree that voice doesn't make sense for some OS-type tasks, but if I want to leave an interesting note for myself, or dictate an e-mail, I really don't want to deal with typing.

    I think the social stigma of "talking to yourself" is a bad argument... Does anybody think it looks weird when police, etc.. use walkie-talkies? Personally, I think the idea of talking into my PDA would be kind of cool.

    Besides, isn't talking into a cell phone "talking to yourself", as nobody is actually there with you? Seems like talking to your PDA would look a lot like talking to your cell phone! (OK, maybe it IS ridiculous looking... ;) )
  • PDA
    by Justin Osborn

    To the tune of: G.T.O. by Ronnie and the Daytonas

    Little PDA, in the handheld aisle
    4 buttons and a touch screen, it's very versatile
    Listen to her synchin' up now, backing up yer fi-ee-eye-iles
    C'mon flip it out, turn it on, write on it, PDA

    Wa-wa, (Yeah, yeah, little PDA)
    wa, wa, wa, wa, wa, wa (Yeah, yeah, little PDA)
    Wa-wa, (Yeah, yeah, little PDA)
    wa, wa, wa, wa, wa, wa (Yeah, yeah, little PDA)
    Wa-wa (Ahhh, little PDA) wa, wa, wa, wa, wa, wa

    I pull it out at the office, or in the train turnstile
    This little handheld computer, is so worthwhile
    I don't use no Win CE, I'm not seni-ee-eye-ile
    C'mon flip it out, turn it on, write on it, PDA

    Wa-wa, (Yeah, yeah, little PDA)
    wa, wa, wa, wa, wa, wa (Yeah, yeah, little PDA)
    Wa-wa, (Yeah, yeah, little PDA)
    wa, wa, wa, wa, wa, wa (Yeah, yeah, little PDA)
    Wa-wa (Ahhh, little PDA) wa, wa, wa, wa, wa, wa

    Gonna go on the Net
    order a PDA
    Get a case and a cradle
    I'll be running today
    Show it at the conference table
    and then they'll say, yeah, yeah
    That I'm a flagrant geek
    I'll upgrade it in a week
    And then flip it out, turn it on, write on it, PDA

  • I am doubtful that we will see
    true large vocabulary voice recognition (beyond the toy level) in the next 20, if not 50 years.
    How do you define large vocabulary?

    To give people an idea of what a large vocabulary is, Shakespeare used just over 31,000 words in ALL of his printed works counting proper names, words used once and mis-spellings. Any of the major speech recognition vendors (L&H, Dragon, IBM) can easily handle many more words than this. Certainly more words than are used by anyone in everyday life.

    So exactly how large is TRUE large vocabulary?

    -Brad :)

  • It does not say that it's open source, there's no indication that it is. why would you assume that it's open source just because it's designed to go on a Linux platform?

    -Brad :)

  • If you use a PalmPilot you will become pretty proficient at Graffiti; typically 20WPM minimum. Though a chording keyboard might be nice, I also use an onscreen keyboard layout called FITALY [fitaly.com]. This baby gets 40-50WPM, it really is pretty amazing once you get over the learning curve. Costs $25 though.

    Continuous speech for a PDA would be excellent. Though just using it for switching apps or dialing numbers would not be as simple as pushing a button or swiping a shortcut IMO.

  • by zyqqh ( 137965 ) on Saturday February 05, 2000 @06:19AM (#1303292)
    Those of you who own a PDA right now -- try to think of the last 5 places where you've used it. Thought of them? Good. Now think to yourself, in exactly how many of those places would talking aloud (esp to a little black box) be regularly tolerated? Maybe I'm seeing things from a distorted viewpoint, but I'd primarily have to use it in class, and, well, you can probably see what can come of that. I somehow doubt that modern speech recognition technology is sufficient to recognize instructions at a quiet-whisper level.

    Yea, this has its applications for accessibility to people who can't use the stylus standard, but, as a mainstream item, I don't see this getting too far.

  • I don't think today's speech recognition is ready for text input yet. It will rather be used for controlling basic functions of your PDA.

    Do you remember those voice-controlled ticket selling computers you can ask via phone "when does the next train to XY leave?" And the computer reserves your tickets. Now to the PDA: With MP3s in your PDA (in the nearest future) you could ask your device "hey, which songs from Moby do you have?" and the device will present you a list of titles (speech output?).

    So Sense recognition is by far more important than speech recognition, and with limited applications such as ticket reservation (telephone hotline computer), Number dialing (cellphone) or music control (PDA) it is also possible with today's technology. And in your PDA it does not only make sense for music control...

    So you won't be laying your device on the meeting table and having speech recognition take automatic minutes in the near future, because the technology is not advanced enough. But it will make you able to perform certain tasks that require only a relatively small vocabulary without having to touch any buttons, e.g. while driving a car.

  • it isn't there in menuconfig (if i'm not wrong).

    try
    http://www.arm.uk.linux.org/~rmk/armlinux.html
    http://www.linuxce.org/
    http://embedded.adis.on.ca/
