KDE GUI

IBM, TrollTech Integrate Linux Voice Recognition

Paladin128 writes: "Talk about cool technology. Linux may get widespread voice recognition before Windows, as this article mentions that IBM's ViaVoice will be bundled with Qt, and allow the programmers to use BNF to create parsing rules, and bind voice input directly to Qt components via Qt signals and slots. This level of integration evidently wasn't possible with Win32, thus there were performance issues. And since Qt is open source, the GNOME people could easily find a way to integrate this technology into GTK+. Between adding voice to the handicapped accessibility list, offering KDE in more languages than Windows is available in (I don't use GNOME so I can't comment on how it's doing here), and more customization than Windows can ever hope to offer (such as choice of desktops), Linux could really make some waves this year." Just don't mention "rm -rf" when you're near the microphone ...
This discussion has been archived. No new comments can be posted.
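
The summary above describes writing BNF-style rules and binding recognized phrases to components through signals and slots. As a rough illustration of that idea only, here is a minimal Python sketch. It does not use the Qt or ViaVoice APIs: `VoiceDispatcher`, `RULES`, and the phrases in them are hypothetical names made up for this example, and regular expressions stand in for a real grammar.

```python
import re
from typing import Callable, Dict, List

# Hypothetical command grammar, written in a BNF-like spirit:
#   <command> ::= "bold" ("on" | "off") | "open" <window>
#   <window>  ::= "mail" | "editor"
# Each rule is approximated here with a regular expression.
RULES: Dict[str, str] = {
    "set_bold": r"bold (?P<state>on|off)",
    "open_window": r"open (?P<window>mail|editor)",
}


class VoiceDispatcher:
    """Toy stand-in for signal/slot wiring; not the Qt or ViaVoice API."""

    def __init__(self) -> None:
        self._slots: Dict[str, List[Callable[..., None]]] = {}

    def connect(self, rule: str, slot: Callable[..., None]) -> None:
        # Register a callback ("slot") for a named grammar rule ("signal").
        self._slots.setdefault(rule, []).append(slot)

    def recognize(self, utterance: str) -> bool:
        # Match already-transcribed text against each rule and call its slots.
        text = utterance.strip().lower()
        for rule, pattern in RULES.items():
            match = re.fullmatch(pattern, text)
            if match:
                for slot in self._slots.get(rule, []):
                    slot(**match.groupdict())
                return True
        return False


dispatcher = VoiceDispatcher()
events = []
dispatcher.connect("set_bold", lambda state: events.append(("bold", state)))
dispatcher.connect("open_window", lambda window: events.append(("open", window)))
dispatcher.recognize("Bold on")
dispatcher.recognize("open mail")
dispatcher.recognize("rm -rf")  # no rule matches, so nothing fires
```

Restricting recognition to a small grammar like this is one reason command-and-control tends to be easier than free dictation: anything outside the rules is simply ignored.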


  • Let's see, which QT event should I bind *sneeze* to?

    Reality checks:
    - it's going to take just a few more CPU cycles you have, no matter how many you have
    - it's going to take ages to get translated to all these fancy little European languages
    - it's going to take longer to tweak up to usability than all those Eterm background pictures
    - it's still slower than typing.

    Embedded devices, yeah, but they don't have the muscle. And nobody wants to spend hours to teach their coffee machines or garage doors to listen.
    We have each other to misunderstand ourselves, the machines don't really care anyway.

    Anyway, where's the tarball? This is going to be soo fun. I'll just have to clean up my place so I can convince someone to come over and witness my geeky coolness once it's running...
  • at least part of the problem was that for it to function well it needed a pretty hefty machine for those times.

    I remember differently. I had a M100 (w/ decent RAM) at the time and could use it, even though Voice Type was said to be floating-heavy and Cyrices were said to be floating-weak.
    It was simply awkward to use, headset and GUI and all.

    OK, with a wireless headset and only intended to support the GUI, not replace the mouse, it could be very useful.
  • Scene: Computer Lab
    Student: Excuse me, but what is the command for deleting a directory and all its subtrees?
    Lab Technician: rm -rf
    Student: Are you sure?
    Lab Technician: Yes

    On a serious note, as voice-enabled computer technology becomes more common, what standards are there to enable us to separate computer commands from normal conversation? Do we have to be like ST and prefix everything with "computer", or are there more intelligent ways?
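
One simple answer to the question above is the Star Trek approach itself: treat an utterance as a command only when it starts with an agreed prefix. The sketch below assumes the speech has already been transcribed to text; `WAKE_WORD` and `extract_command` are hypothetical names for illustration, not part of any real recognizer.

```python
from typing import Optional

WAKE_WORD = "computer"  # hypothetical prefix, as in the Star Trek example


def extract_command(utterance: str, wake_word: str = WAKE_WORD) -> Optional[str]:
    """Return the command text if the utterance is addressed to the machine."""
    words = utterance.strip().split()
    if not words:
        return None
    # Ignore trailing punctuation on the first word ("Computer, ...").
    first = words[0].rstrip(",.:!?").lower()
    if first != wake_word:
        return None  # ordinary conversation, not a command
    command = " ".join(words[1:]).strip()
    return command or None
```

With this filter, the lab technician's bare "rm -rf" is ignored, while "Computer, rm -rf" would still be passed on, so a prefix alone does not protect against deliberate shouting.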
  • KDE and Qt can *never* be forced back to non-free. They're both GPLed. There's no way to undo that. If they tried integrating non-GPL code with Qt/KDE, they'd be violating the GPL. Therefore they have no choice but to release their code under the GPL.

    I think you are wrong here. TrollTech is still the copyright holder of Qt. So nothing can stop them from releasing a Qt3 which is closed source again, BUT on the other hand everybody can work on Qt 2.2.x and add what he likes. I am sure they (TT) will not do it (release it under a closed license), but who knows, perhaps they will come up with two Qt3s: one closed with voice recognition, one open without.

  • Your example shows why it's not the absence of voice recognition software that is the problem, and why the addition will not make software for the average home user better. A query for the weather is by nature repetitive and fairly predictable. So why didn't the computer know in advance that you might be interested in the weather and have the information waiting for you? The reason, as Alan Cooper points out in his book About Face: The Essentials of User Interface Design, is that computers and software are stupid. The software is written to be stupid, it has no memory, and it has no intelligence. And that's not going to change anytime soon. What people are going to need for computers to become useful as assistants is for the computers to have intelligence approaching that of assistants. If you look at, say, the movie 2001, observe the sheer amount of information that HAL required for any of his functioning. Is there any effort for such information sharing in either the commercial or open source environments? No, such sharing is impossible under current licensing restrictions. So we aren't going to have a home computer that understands the simple concept of "weather"; any query about the weather has to be explicitly programmed. Sorry, but PCs capable of functioning as information assistants are the flying cars of this generation.
  • I'm a piano player, and a touch typing coder (~100-105wpm).

    I can piano-play a hell of a lot faster than I can type.

    How? Simple. PRACTICE! Yes, piano players practice the same piece over and over again when it is of said difficulty. Once you've practiced it enough, it is really easy to play it fast, because your motor memory kicks in.. it really just feels like your hands know where to go, and your brain decides how fast they do it.

    Typists (useful ones, anyhow), tend to type different things every day -- this means that motor memory for a particular passage will never, never kick in. When it does (passwords, etc), you can expect 300+ wpm (bursts) from the "average coder".

    Oh, and show me the "average piano player" (remember, we were talking about the "average coder") who can sight-read (no motor memory!) a piece at 220 in 4/4 that is solid 16th notes, with two-octave jumps in the right hand, and I'll eat my hat.

    I still have a hard time believing that the "average coder" types 170wpm. I'm a pretty friggin' good coder, and I know how to type -- been doing it for over 15 years -- and 100 wpm seems to be several standard deviations above the mean.

    --
  • by Fjord ( 99230 ) on Thursday February 01, 2001 @09:34AM (#464663) Homepage Journal
    Restaurants have this system called Squirrel that lets them input orders and it maintains the bills. Two of the common complaints with the touch screen interface are that the servers, who use the system, find it unintuitive, and often have their hands full. This sounds like you can make a system where they just have to say "Table 13, burger with fries, philly steak no onions, 2 cokes", and it would do alright. Servers tend to have their own lingo for the meals they serve, and the system could understand them, in list format, with the exceptions. You wouldn't expect the populace to deal with such a system, but a server would pick it up quickly, and they could do it while clearing away dishes.
  • I wonder if it is possible to get repetitive stress injuries from speaking to your computer all day?
    ---------
  • The company I work for sued someone for just that. It was destruction of company property - just because you worked on it doesn't mean you own it.
  • I hope you're not implying that it is useless though.

    Having voice as the sole or primary means of input and interaction is obviously a poor choice for power users, but having more means of input is usually a good thing. How about being able to send voice commands to apps not in your (typing) focus? Or not being required to sit at your computer, and being able to walk around with a (wireless) mic? You could get exercising in while still doing "work"!

    Not to mention my mother who types about 10 wpm. Not everyone is as leet and skilled as you and the bubble of people you know.
  • But you have to take your hands off the home row to do this. Sure, it's a lot faster for the hunt-n-peck type.


    No, you don't. That's why you should be using a real keyboard, like the Sun Microsystems Type 5c, which has the control key beside the A key, and the escape beside the one.

    And wrt emacs baiting: I guess you guys aren't smart enough to use meta-x global-set-key, huh?

    And no, you don't have to move your fingers off the home row to use the meta on a real keyboard, either -- that's what the escape key is for.

    Yeesh.

    --
  • First, I was wrong about the standard keyboard. She only used Dvorak.

    Second is this quote: "In 1938...Blackburn first laid hands on a Dvorak keyboard. In only a few years her speed was up to 138 words per minute."

    In 1938 this keyboard was clearly mechanical, thus 138 is a mechanical speed. From context it sounds like she never switched to electric so 150/170 are also mechanical speeds.

    As for your claim: "A good mechanical typewriter is a match for a good keyboard."

    Have you ever even used a mechanical typewriter? FULLY mechanical? As in it doesn't plug into the wall at all? You are probably thinking of "mechanical" electric keyboards (i.e. dedicated typewriters not attached to a computer, but nonetheless electrically powered). REAL mechanical typewriters are VERY hard to type with.

    An electric only requires you to close a relay and then the machine does the work of striking the paper. A mechanical requires you to impart enough force to strike the arm against the paper (a distance of several inches) AND lift the ribbon (and if you are shifting, you have to lift the weight of the carriage with your other pinky). The difference is incredible. I remember as a child trying to learn to type with my mother's mechanical typewriter. I had to quit because I couldn't press the keys hard enough. When I played with an electric in the store I was astounded at the tiny amount of force I needed to use to make the keys go. Also keep in mind that a mechanical requires you to wait for the arms to move out of the way (can also be a problem on the electric, but the distance is smaller) but a computer keyboard has no problem with that.

    Conclusion: You made a foolish initial statement that you can no longer sustain.
    --
    MailOne [openone.com]
  • That's why I bought a Soundblaster Live
  • They've been planning this for years, ever since IBM came out with speech recognition integrated with OS/2 V4 around 1995. They called it WHISPER, for "Windows Highly Intelligent Speech Recognizer". As usual, MS was far better at inventing marketing terms than creating software.
  • I agree - voice recognition opens lots of new possibilities when you start thinking beyond simple dictation. If the OSS community does get to play with this stuff, I think we can look forward to some really funky toys cropping up.

    Being able to use your computer without sitting in front of it would be great fun (especially if you do manage to rig the house for sound). You could do a whole bunch of stuff before you even get out of bed - call work (using your computer's voice-modem) to say you can't be bothered to go in today, get today's news headlines, play a few rounds of Internet VQuake (a new FPS based entirely on voice commands)...

    It would also be handy while you're at the computer, if you're using the keyboard and mouse for other things. Voice commands could be used to trigger a range of macro commands, and would be much easier to learn than cryptic keyboard shortcuts. Sure, it wouldn't replace common easy-to-learn shortcuts like Ctrl-C - but it could give the keyboard a run for its money with things like 'Font: Helvetica 14pt'.

    I bet it could work really well with the Gimp too - it always takes me ages to find which menu the script-fu effect I'm after is on, and being able to change brush tools without moving the mouse (or knowing the keyboard shortcut) would be handy.

    -- Andrem
  • Of course, there are CMU Sphinx, FestVox, and Festival available under truly open source licenses. http://www.speech.cs.cmu.edu/
  • of functionality that is.
    1) no training is included, you must pay for that
    2) none of the word recognition code is included

    You would be better off starting from scratch. Who knows if/when they will release these "extra" features without the large price tag.
    This is very similar to what Troll Tech did with the OpenGL widgets in QT, i.e. not release them in the GPL versions.
    So beware, and investigate the RELEASED features before wasting your time.
  • As I saw, there are lots of comments on how useless speech recognition is for coders. Of course it is.

    I would have followed this statement with "there are no computer languages and programming tools that are designed with voice recognition in mind". As VR becomes more commonplace, I'm sure we will see better tools and languages that will take full advantage of all this. Also, if you have VR, your code will change. You will no longer have "int c" but "int count" and other longer, more descriptive names (that I try to do, but int c just comes out).

  • Precisely. I already have my box set up to use festival to talk to me. I wrote a few python scripts and a little gui to create a distributed environment such that any machine I use can do something at a command line (e.g. 'gmake && say your build is nicely completed || say tough eggs the build failed') and have every machine that I'm running the say listener on speak the output. I even have a couple of elisp macros to allow for similar functionality from within the only editor/mail reader/ide one should ever need.

    My next step was to go wireless, but if I can go two way wireless even better!

    --
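
The 'gmake && say ... || say ...' pattern in the comment above can be sketched in a few lines of Python. This is not the poster's script: `announce_build` is a hypothetical helper, and the `say` callback is a placeholder for whatever actually speaks the text (a festival wrapper, a network broadcast to listeners, or just `print`).

```python
import subprocess
import sys
from typing import Callable, List


def announce_build(command: List[str], say: Callable[[str], None]) -> bool:
    """Run a build command and announce success or failure through `say`."""
    result = subprocess.run(command)
    ok = result.returncode == 0
    # Same messages as the shell one-liner quoted in the comment.
    say("your build is nicely completed" if ok else "tough eggs the build failed")
    return ok


if __name__ == "__main__":
    # Stand-in for 'gmake': a child Python process that exits successfully.
    announce_build([sys.executable, "-c", "pass"], print)
```

Passing the speaking function in as a parameter keeps the sketch independent of any particular speech synthesizer.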

  • Erm, I was simply suggesting that error penalties *should* exist, as 'Seizer's' post seemed to suggest that these penalties detracted from a typist's *raw* typing speed. My point was that if the typed text is full of errors, it is worthless.

    I have come to the conclusion that you are a classic /. troll. You are the weakest link...Goodbye!

  • Anyone who has tried to figgure out what I'm saying on /. can't wait for me to get VR. My spelling is horrid at best, VR solves that problem. Who knows, I might even have a point worth reading once in a while - I've been moderated up a couple times anyway.

    Spelling has always been my weak point. I like to think that I have good ideas despite that weakness. VR would solve the spelling and fat fingers problem.

  • Lovely. I can't wait to hire a bozo like you in the future...
  • Because Gimp is "free software" and photoshop costs over $300.

    And BitchX for windows is the same as BitchX for linux, only difference is that it runs in a window and not at a console.
    --

  • Partly on topic, partly off - I was wondering if there has been any progression in the last 10 or so years with computer generated speech.

    My amiga 500 back in 1988 came with a speech program. Now, 10 years down the road I have yet to hear ANY computer generated speech that is mildly better or more realistic.

    Frankly, all computer generated speech that I have heard sounds quite terrible. Is there any development going on in this area to make it a bit more human-like?
  • And wrt emacs baiting: I guess you guys aren't smart enough to use meta-x global-set-key, huh?

    Well the emacs-baiting was intended as a light-hearted quip, as I'm sure you'll appreciate. However, I've admired emacs' rich feature set from afar many times, and sat down to do the tutorial on three different occasions, and each time I ended up going back to vi, so I guess I'm *not* smart enough to use meta-x global-set-key ... ;)

    But, I don't like editor wars: I say, if you like it, use it.
    --
  • 160-170wpm... you must be mistaken, unless they are typing a string of 'a's or something. Assuming we take your upper quoted figure, 170/60 = 2.8 words per second (let's round that down to 2.5). Now let's say that you have an average 4-letter word length: that's 10 characters a second. Bottom line, you are making this up.
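
The arithmetic in the comment above is easy to check. The helper below is a small sketch with a hypothetical name, `chars_per_second`. Its default assumes the usual typing-test convention of five keystrokes per "word" (four letters plus a space), which is an assumption added here; the comment itself uses four.

```python
def chars_per_second(wpm: float, chars_per_word: float = 5.0) -> float:
    """Convert words per minute to keystrokes per second."""
    return wpm * chars_per_word / 60.0


# The comment's own numbers: 2.5 words/s (150 wpm) at 4 characters per word.
rounded_estimate = chars_per_second(150, chars_per_word=4)

# The unrounded 170 wpm figure with the five-keystroke convention.
upper_figure = chars_per_second(170)
```

Under those assumptions the rounded estimate is 10 keystrokes per second, and the unrounded 170 wpm figure comes to a little over 14.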

  • The comment we were following up to was that 'achieving 8 characters a second' is 'practically impossible'.

    If you're having a hard time believing that the 'average coder' types 170wpm, I suggest you stop trying because it's clearly not true.

    I agree that there's a bit of an apples and oranges comparison between piano music and typing. I don't think sight-reading should be compared with copy-typing though: it's much easier to process words as chunks than groups of notes in fast runs.
    -- Andrem

  • ^"Voice recognition is made for secretaries, writers, housewifes and handicap persons, not for coders"? Can none of these former be coders?

    I am a "handicap person" (a physically disabled person is what I assume you mean)--I can type at best five words per minute--but I am certainly a "coder" [a professional software developer]. I do not make a lot of money, nor do I manage to work incredibly quickly (significantly slower than most able-bodied workers), but I manage (due largely to the largesse of the Australian government admittedly, but I hope that will change).

    Please don't forget that large (-ish; let's be realistic!) group of people that I represent.

    (Incidentally, I am quite excited at the prospect of, once appropriate voice recognition software becomes available, returning to using a Unix-like OS like Linux!)

    [This comment may sound rather peeved; please note that I mean no disrespect to you personally! I understand how easy it is to forget about users like me... I did so for many years!]

    Cheers, Chris

  • Now I can just go lie down on my bed, and shout out commands to my computer across the room... i'm gonna be so lazy now! Thanks IBM! Thanks TrollTech!

  • by Pflipp ( 130638 ) on Thursday February 01, 2001 @05:37AM (#464686)
    ...does this new code integration mean that Debian will have to move KDE into non-free again? I never see the licence issues with this discussed (this stuff was announced before). ViaVoice is not Free Software. So, how do they plan to solve this?

    It's... It's...
  • In all seriousness, try Xemacs version 19.4 or better. The learning curve is pretty decent, because it uses a GUI menu with the shortcuts written in, and it will whine if you use a meta-x command when you've got a shortcut programmed for it. I've had recent-college-grads comfy enough with emacs to use it inside of a couple of hours.

    BTW, the meta x menu uses command-line completion, so you also have a hope in hell of guessing stuff. :)

    Finally, if you want to be really weird, you could run "meta-x viper", which will run a VI emulator inside of emacs... but don't type "meta-x dungeon" unless you have a lot of time on your hands.

    --
  • by Col. Klink (retired) ( 11632 ) on Thursday February 01, 2001 @05:38AM (#464688)
    Yes, VR is overrated... unless you don't have any hands. Or even if you've only got one hand. Or if you've got arthritis. And so on.

    It's true that VR is not much use for able-bodied coders, but it is useful for able-bodied letter writers who don't type so fast.
  • Voice *recognition* would be great for replacing the mouse as a selector tool. A lot of people only use the mouse to select menu items or highlight windows to accept input. It sure would be nice to not have to take my hands off the keyboard to do some of these things.

    *type type type*
    Bold on
    *type type type*
    Mail window
    *check mail*
    Read....Reply
    *type type type*
    --
    MailOne [openone.com]
  • I actually find it useful for two conditions - when I'm tired and when I've had a few drinks. Which isn't to imply I'm a drunk or anything :)

    Seriously, I find my typing goes noticeably downhill and becomes a serious effort under those conditions, but the dictation (Dragon NaturallySpeaking, FWIW) still works fine. If I need to send an e-mail or something, it's just easier.

    It's not something I use most of the time but it's something I'm very glad to have around.
  • Has anyone done any serious investigation into context-modifiable keyboards?

    I can give one fabulous example - AutoCAD - that uses modifiable mouse pads. You've got a central drawing area on your tablet, and then you've got a huge number of areas outside this, where you can stick functions, and assorted pre-drawn pieces. This 'template' area can be changed at will.

    Another (lesser) example is the good old MVS system. Your function keys vary depending where you are. Kinda nice, really, although I tend to drastically reprogram mine. Thus PF20 (Same as PF8, Page Down, by default) gets reprogrammed as 'Next' for one app (SDSF), 'Find not'blank' 8' in Edit (goes to next paragraph) and so on.

  • ....Score: -1 (TrollTech)

    --
  • Sorry to bother you with this but how the Hell do you focus on a given widget with a voice recognition system ?
    Do you just say "<TAB>" ?
    I don't really like it.
    If I type quicker than I speak and I want to replace my mouse with voice recognition, which is IMHO the only interesting way to exploit it, then I might want to focus on a different text zone using my voice but then, I might have problem deciding which word to connect to which action, isn't it?
    --
  • I've known people with 100+ wpm skills. Isaac Asimov claimed to type faster than he could talk and said this was about 100 wpm. And when I took typing in HS (for which I got a D, btw) our teacher (who was more excited about typing than I am about sex) went on and on about some championship she attended where the winner achieved 200+ wpm.
    --
    MailOne [openone.com]
  • by dj.dule ( 87188 ) on Thursday February 01, 2001 @05:42AM (#464717) Homepage
    As I saw, there are lots of comments on how useless speech recognition is for coders. Of course it is. Voice recognition is made for secretaries, writers, housewifes and handicap persons, not for coders. Of course it won't interpret your speech on the command line (imagine how you would look to someone else while dictating vi commands). The idea is to make computers easy to use for non-computer-literate persons.
  • [...]
    Linux still discriminates against a significant proportion of the population!

    But who do you mean, you ask. Well let me tell you, for the many, many people out there who suffer from dyslexia the stubborn insistance of the Linux crowd to stick with their command line interfaces is nothing short of discrimination!

    I'm dislexic. Not just a little either. Had to ride the short bus to school for six years. I have an extremely hard time spelling anything right. Spell checkers that don't have the correct spelling for me are worthless (thanks aspell!)

    I don't find Unix names all that hard. They are short. Easy to memorize. I don't have trouble with three letter words. I do when we hit five.

    So don't you go changing /etc to some damn thing I can't spell worth a damn in my name!

  • by slim ( 1652 ) <john.hartnup@net> on Thursday February 01, 2001 @05:43AM (#464719) Homepage
    Typing CTRL-B is surely quicker than saying "bold on" surely? Hitting alt-tab a couple of times is quicker than saying "mail window" surely?

    Maybe Emacs users would benefit, since it probably *is* quicker to say "bold" than it is to type "meta-x-embolden-text" or whatever ;)

    Emacs baiting aside, though, this is great news for a segment of the disabled market, but I really don't see the mainstream applications. Not to mention how awful a place the average open-plan office would become if voice-recognition took off...
    --
  • This is all very interesting, but I still wonder (as do many of the other threads here today) whether the best use of voice recognition technology isn't in dictation and interactive control, rather than actual window/widget control.

    For example, use the Star Trek test. They've got very powerful computers (never mind that they can be infected with weird space-borne contagions), but what do they use voice controls for? Asking questions, controlling their environment, etc. When they need to program a new subroutine for the deflector dish, though, they use the keyboard.

    Which brings up another question: Has anyone done any serious investigation into context-modifiable keyboards? My understanding on Star Trek is that their keyboards change their layouts depending on who's there, and what they're doing. I've always thought something like that would be fantastic, say, for switching into Quake or a flight sim -- make your keyboard LOOK like a control panel, so you don't have to remember that "." is strafe or whatever...

    As for voice control, I'd really like to be able to control house systems (see my ÜberTiVo posting under the Set Top Box thread). To say "Play 1812" and have the system start playing it for me. Or "Where's my dinner?" and have the computer tell me to cook it myself (hey, gotta be realistic here). Or to just start rambling on, stream-of-consciousness, in a rant or rave about what's really annoying or cool, so I can edit it down to a letter later. That is what I think we need, and it's more on the application side than on the OS side.

    Of course, we may already have good solutions for this, I just haven't been able to play with them yet... :(

  • let me clarify on my previous statement - to the bozos who were alarmed by my posting:

    I didn't want their creepy fingers prying into my files - allow me to add that these were my own personal files, and copies of remote CVS repositories. I destroyed nothing that was of any value to the employer, they were going to wipe the drive and install NT anyways.

  • Bold text is overrated.


    --
  • Replace all just replaces all occurrences. It assumes that you meant to do what you're doing.

    I only use R/A when I'm doing large regexp replaces. That happens often enough, though, that I learned the keyboard shortcuts to do it quickly. (Why, you ask, would any programmer do that, seeing as how it's rather dangerous? The particular piece of code I was working on when I learned the trick contained a number of rather large tables of the form

    {Name, CONST_, "Name", , ...}

    where it was easy to create a list of the Name fields. That's the kind of thing that old vi-warriors know calls for regexp replace all.)
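
As an illustration of the kind of regexp replace-all described above, the Python sketch below turns a plain list of names into table rows in a single substitution. The row layout `{Name, CONST_NAME, "Name"},` is a hypothetical simplification chosen for this example; the exact form of the poster's table is not fully recoverable from the comment.

```python
import re

# A plain list of Name fields, one per line (made-up sample data).
names = "Alpha\nBeta\nGamma"


def to_row(match: "re.Match[str]") -> str:
    # Build one table row from the captured name.
    name = match.group(1)
    return '{%s, CONST_%s, "%s"},' % (name, name.upper(), name)


# One "replace all": every line consisting of a single identifier becomes a row.
rows = re.sub(r"^(\w+)$", to_row, names, flags=re.MULTILINE)
```

The same effect in an editor is a single substitution over the whole buffer, which is why the shortcut is worth learning despite the risk of replacing more than intended.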
  • by Shotgun ( 30919 ) on Thursday February 01, 2001 @07:32AM (#464737)
    everyone here is knocking voice recognition as useless. Well, I'm here to say that it is not.

    I used OS/2. Version 4 came with a version of voice recognition, and I ran it on a 100MHz Pentium with only 32Meg RAM. It ruled in the proper place.

    First, the system is good for first drafts of text documents like long reports. Don't expect to get a perfect copy the first time through. The output from the voice input will require some cleanup. But guess what, so does anything I type.

    Very few people type anything close to 80wpm. I only get around 40. Voice type allowed around 100wpm. For those l337 haX0rz that can type and think that everyone should be able to...go out and see the sun every now and then!!

    I would write up my report in note form, basically just outlining what I wanted to say. Anything that I had to quote got a reference to the text I would quote from. Then I locked the bedroom door to keep out noise from wife and kids, gathered my notes and references around and started talking. An hour later I had the first draft of a ten page report. I've spent 4 doing it manually.

    You may not have a need for it, but if you're in school or any other place where you have to produce long reports and you don't type with flaming fingers, then voice input can be a real boost to productivity.

  • I'd keep a copy of Qt around if this happens. I just hope it runs on FreeBSD, I'm not going back to linux just so I can talk to my workstation.

    Now I will need to design a Qt-based WM which supports dual-focus, so that I can have keyboard focus on one window, and voice focus on another... Then I can talk while I type and get work done twice as fast.

    Oh, yeah, Windows has dragon, like some other poster said, but it isn't integrated.

    A new year calls for a new signature.

  • "Typing CTRL-B is surely quicker than saying "bold on" surely?

    But you have to take your hands off the home row to do this. Sure, it's a lot faster for the hunt-n-peck type. But for people who type relatively quickly AND work with several keyboards with different layouts, stopping to find ctrl and then hit "b" takes WAY too long. At least, compared to just saying it with no pause in my typing.
    --
    MailOne [openone.com]
  • If your concern is beating Windows, then you'd better hope they hurry really quickly, because Whistler will ship with native voice support built-in.... probably this year.

    As for what that means performance-wise, I have no idea at this point. We'll just have to wait and see.
    -
    The IHA Forums [ihateapple.com]
  • And along these lines, IIRC, the record speed (AFAIK) is about 240 wpm on a Dvorak keyboard. Of course, anyone is free to correct me if I'm wrong, here, but that's what I heard last. :)
  • Doesn't the newer versions of bash do this?
  • It's not useless for coders. Too many people seem to think the only thing that VR can do is let you dictate a memo into your computer, or replace any typing.

    As for use for a coder - you just got done rewriting a majorly troublesome routine. You sit back to relax for a second and say "Okay...build" and let it go. You go get some soda and your computer says "Build successful" so without moving you yell "Run it" and it runs.

    Either that, or you yell Run It and it hears "cat /dev/zero >/dev/hda1"

    VR is wonderful for things like this. Dictation sucks.
  • On an almost related note, does anyone else remember an IBM mainframe application called DWIM, standing for Do What I Mean (not what I say)?

    DWIM originally appeared in Interlisp for DEC 36-bit machines. It was a terrible idea. Despite claims to the contrary in the documentation, sometimes it would destroy work by misinterpreting something. I once typed "EDIT" while in a mode where "EDIT" wasn't permissible, and DWIM typed "=EXIT", throwing me out of Interlisp without saving. DWIM was too closely tuned to the typing error patterns of its originator.

    To do this right, a corrector needs information about the potential consequences of what it is doing. For example, knowing whether something is easily undo-able is crucial.
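
The point about consequences can be made concrete with a small sketch. This is not how DWIM worked; it is a hypothetical corrector (`correct` and the `COMMANDS` table are made up for illustration) that uses Python's `difflib` to guess the intended command and only applies the guess automatically when the command is marked as undoable.

```python
import difflib
from typing import Dict, Optional

# Hypothetical command table: name -> whether its effect can be undone.
COMMANDS: Dict[str, bool] = {"EXIT": False, "LIST": True, "LOAD": True}


def correct(typed: str, commands: Dict[str, bool] = COMMANDS) -> Optional[str]:
    """Return the command to run, or None if auto-correction is refused."""
    if typed in commands:
        return typed  # exact input is always honored
    guesses = difflib.get_close_matches(typed, list(commands), n=1, cutoff=0.6)
    if not guesses:
        return None  # nothing plausible, so do nothing
    guess = guesses[0]
    # Only auto-apply a guess whose consequences are reversible.
    return guess if commands[guess] else None
```

With this table, a mistyped "EDIT" is closest to "EXIT", which cannot be undone, so the corrector refuses rather than throwing the user out of the session.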

  • Although it's not very hard to type pretty fast in an alphabetic script, it's a lot harder in chinese. The fastest you can get at the moment basically requires a special type of keyboard, one with all 200-odd radical components of characters on, and you type all the components that are in the character you want, and the software works out which character that is.
    However, this technology is reportedly very hard to learn, and not at all widespread. Most people who type chinese characters use software where you type the sound of the character (in the roman alphabet) and it gives you a list of characters to pick from.

    Chinese speech recognition could be much better than this. It could pick up on the tones a speaker uses much better than the roman alphabet can, and it wouldn't require them to know a foreign script.


    This is all in theory though. I don't know how good the software out there is at the moment. There are over 1,000,000,000 people who can read chinese characters, but not many of them have a computer.
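
To show why tone information helps, here is a tiny sketch of the sound-based input method described above. The table is a hand-picked sample for the syllable "ma" only, and `candidates` is a hypothetical function; real input methods use large dictionaries and context, and many characters still share both syllable and tone.

```python
from typing import Dict, List, Optional, Tuple

# Sample candidates for one romanized syllable, with tone numbers
# (1-4; 0 is used here, by assumption, for the neutral tone).
CANDIDATES: Dict[str, List[Tuple[str, int]]] = {
    "ma": [("妈", 1), ("麻", 2), ("马", 3), ("骂", 4), ("吗", 0)],
}


def candidates(syllable: str, tone: Optional[int] = None) -> List[str]:
    """List characters for a syllable, optionally narrowed by tone."""
    entries = CANDIDATES.get(syllable, [])
    return [char for char, t in entries if tone is None or t == tone]
```

Typing "ma" without a tone leaves five choices to pick from in this sample, while knowing the third tone narrows it to one.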

  • Yup. M$ is so advanced they are incorporating stuff that was available in OS/2 in 1996.

    So when will they get their interface right?

  • Most coders I know don't even touch-type. I mean, think about it. How many geeks take typing classes in junior high school?

    People may *say* they type 140 wpm, but that's actually extremely fast, and when you're coding, you're using a lot of top-row special characters, which tend to slow even skilled typists down. And remember, actual typing speed = wpm - errors.

    Try this test, folks. Time yourself. Do a typing test and subtract the errors you make from your score. My guess is that most of us won't get anywhere near 140 wpm. I touch type and I still only max out at about 70 wpm - and that's when I'm typing notes like this, not coding.

    I still haven't decided how I feel about voice recognition, but 60-140 seems like a tremendously inflated speed range to me.

  • This may just be legend, but from what I understand, a while ago a company (I believe it was Dragon, not sure) was giving a demo of their voice-recognition software for the PC at a conference. At one point, the demonstrator dropped into a DOS shell to show how it worked with command-line interfaces. Unfortunately, someone in the audience yelled out "FORMAT C COLON ENTER"... someone else yelled out "YES ENTER"... and thus ended the demo...

    Can anyone confirm this? Even if it's not true though, it makes for a fun story :)
    -----

  • Most of the coders I know type anywhere from 60-140 words per minute. When coding, this measure of speed goes out the window, but it still is a fair shade faster than actually discussing what they are in the midst of coding.

    I had ViaVoice for a while. For coding it is pretty worthless because you have to switch to the military alphabet to do most variable names, and lots of keywords, and unix commands, and....

    It did OK for email messages, and was pretty all right for web browsing. Then again a good optical mouse with the scroll wheel is almost as good, and not that bad on my RSI.

    If integrated in the UI it would probably be useful for lots of things where you have to look over a button panel before deciding what to "click" on. I look forward to seeing it work, but doubt it will allow a whole lot more hands-off use. Er, let me clarify, it won't make hands-off as efficient as hands-on. People that can't use their hands, or shouldn't, may well get a lot out of it.

    Besides, it's just plain cool.

  • Okay, I'll take the trollbait...just in case:

    First off, I say this with no hostility or sarcasm, I am sorry that dyslexia has affected you or someone you know.

    However, why should I as a user type Configuration instead of etc?

    Feel free to write your own distro with aliases if you need to.

    If you NEED long friendly names, most of the world has heard of an alternative OS called Windows. It prob even came preinstalled on your computer!

    What you are asking is for the bloat to be removed, yet you insist that others suffer with the bloat of longer names that accommodate your disability.

    This is equivalent to saying "No I don't just want every building to have wheelchair ramps, I want EVERYONE to have to cruise around in a wheelchair, just like me, damn it! It's just not fair!"

  • If you don't pause, your text will probably end up something like this.

    Hitting a key-combo for format codes has the advantage of being synced with your typing -- you know exactly when and where it will occur.

    --

  • WPM? Geez, as an analyst, that's the least of my concerns - programmers should be hired for their brains, not their typing speed.

    That said, I've had the opportunity to work with code that was written based on the 'per line' model. My Gawd, I've never seen so much empty space!

    But yeah, keyboards rock, especially if you like macros. One combo, and 50 keystrokes get played out ...

    I did have one real problem when I tried to implement that kind of efficiency in DragonSpeak. Started subbing one and two syllable words for multi-syllabic, and for frequently occurring phrases. Terrible when that carries over into everyday conversation! The worst part is, I've discovered there are others who do the same thing, with the same consequences.

    The best general style for using speech recognition is still for word processing ... first draft dictated, then go through with keyboard to edit. (The cat prefers the first part, but the second interferes with lap time ;)

  • In 1981, Barbara Blackburn, from Salem, achieved 150wpm for 50 minutes (yes, minutes) on a mechanical typewriter. It's in the Guinness Book of Records.

    http://sominfo.syr.edu/facstaff/dvorak/blackburn.html
  • You hear (pun) a lot about how integrating speech with user interface toolkits will solve the Accessibility Problem. Screen Scrapers can dive through the widget hierarchy and present an audio form to a blind user. The problem is, that these apps really don't work well. Trying to make an application that was developed for a sighted user work for a blind user really can't be solved at the toolkit level.

    Solutions such as EmacsSpeak [cornell.edu] provide much more accessibility for blind technical users.

  • And *I* will say *this* again: I know for an absolute, 100% fact that the world champion typist in the late 80's was at or near 200 words per minute.

    200 cpm is pretty lame. Given an average 5 chars/word, that's only 40 wpm. Check the classifieds for typists/secretaries. Note the speeds being asked for. Minimum 40, usually 50, often 60 words per minute.
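
    For anyone checking the arithmetic, here is a quick sketch of the conversion in Python. The 5 chars/word figure is the average quoted above; the function names are made up for illustration.

```python
def cpm_to_wpm(cpm, chars_per_word=5):
    """Convert characters per minute to words per minute."""
    return cpm / chars_per_word

def wpm_to_cpm(wpm, chars_per_word=5):
    """Convert words per minute to characters per minute."""
    return wpm * chars_per_word

slow = cpm_to_wpm(200)    # 200 cpm expressed in wpm
champ = wpm_to_cpm(200)   # 200 wpm expressed in cpm
```

    So 200 cpm works out to 40 wpm, while 200 wpm would be 1000 characters per minute.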

    Instead of coming back with responses like "I just don't see..." or "I can't possibly believe..." why not go find a link that lists fast typing speeds as either wpm or cpm?
    --
    MailOne [openone.com]
  • by Nerds ( 126684 ) on Thursday February 01, 2001 @06:14AM (#464780) Homepage
    Use your imagination here, nobody said you have to use it for dictation. I'd like to set up my own HAL-style computer with a few microphones throughout the house and program it to open xmms and play a song or control lights and possibly other appliances. Even better, allow it to take remote voice commands, so I could call my computer from my cell phone and tell it to start making coffee or run apt-get update.

    In any case, there are a million and one cool things you can do with voice recognition (well, until your HAL-9001 tries to kill you and you end up dead or on another plane of evolution, whichever comes first), and I'm sure the ideas I have right now are just the tip of the iceberg.

  • by firewort ( 180062 ) on Thursday February 01, 2001 @11:49AM (#464783)
    I can speak faster than I can type.

    If I can have my speech recognized *accurately* then I can gain in productivity.

    Real-world proof: the articles of Charles W Moore at www.macopinion.com and www.applelinks.com.
    He creates all his articles using iListen for Macintosh.

    If I can do the same with my Linux boxen, then this is a dramatic leap forward for me.


    A host is a host from coast to coast, but no one uses a host that's close
  • by GregWebb ( 26123 ) on Thursday February 01, 2001 @08:16AM (#464786)
    You don't have to spend hours training your coffee machine.

    For that, all you need is command recognition. It's orders of magnitude simpler than dictation and can be done with little or no training.

    Listen to ViaVoice's recordings of what it thinks you've said when playing with its correction feature and you'll see just how hard a job transcribing complete, dictated continuous speech with a wide vocabulary really is. Even deciding where one word ends and another begins is far from simple - but that sort of problem is so much simpler with a limited vocabulary and no continuous speech requirement. Both of which can be done with that sort of device.

    I agree about coughing and, erm, well, thinking, er about what you, er, were trying to say. I always found I needed fair presence of mind to get something readable (especially if formal) down on the page. If you think the above is exaggerated, try dictation software and you'll see what I mean.
  • What standards are there to enable us to separate computer commands from normal conversation? Do we have to be like ST and prefix everything with "computer", or are there more intelligent ways?

    Easy, you just talk into the mouse with a Scottish accent.

    --

  • requiring any query about the weather to be explicitly programmed.

    So? There is still a relatively small set of similar queries to be done in Joe Average's life, so even if this has to be explicitly programmed, it would still be incredibly useful, and vastly more useful than actually walking over to the computer and clicking on an icon or writing something on the command line.

    If I had moderator points, I would have modded the original comment high.... :-)

  • I went to school with a guy who is *still* looking forward to the day when Windoze WP's can keep up with his typing. When I first met him, he typed about 140 words per minute. The usual 'clickety-click' of typing we all hear was more like a quiet buzz for him. He had to take breaks every few minutes so that the old 9in Mac he was using could catch up with him and he could look for errors.

    The keyboard was fast enough to keep up with him, but the WP software itself wasn't. Go fig...
  • Which brings up another question: Has anyone done any serious investigation into context-modifiable keyboards? My understanding on Star Trek is that their keyboards change their layouts depending on who's there, and what they're doing. I've always thought something like that would be fantastic, say, for switching into Quake or a flight sim -- make your keyboard LOOK like a control panel, so you don't have to remember that "." is strafe or whatever...

    Some years ago there was a programmable keyboard made by Siemens (and probably a bunch of other companies, but that's the one I saw) that had a little LCD display in every key. The LCD's resolution was pretty crappy (8x8 or so), but you could program every key differently.

    There just wasn't any software that used it, and pretty much nobody used any other than the standard layout, so it didn't pay off. Maybe nowadays you could try it again, for gamers. You would need a better resolution though.

    Another thing that would be interesting would be keyboards where you change the shape of the keys, too. You're just not gonna get force feedback on the keys, and that's a big problem. Would be cool, nonetheless. ;)

  • by kjz ( 26521 ) on Thursday February 01, 2001 @06:18AM (#464807)
    I see a number of postings here to the effect that voice recognition, especially for dictation, will be largely useless. The problem is that these postings are considering the use of voice recognition as a replacement for typing within the current crop of user interfaces.

    The true power of voice recognition is not in replacing the keyboard. It comes with allowing new forms of interaction with a computer. Consider the simple task of checking the weather. Pulling up a browser and heading to weather.com is no big task, but why would I want to sit at my computer and have to do that just so I can decide how heavy a sweater I'll need for the day? Why not just ask the computer to read me the forecast while I'm getting dressed?

    Many people would assume in this scenario that one would call out: "Computer, browse to h t t p colon slash slash w w w dot weather dot com. Read page." How about simply calling a script instead that does all the hard work behind the scenes? "Computer, what is the weather forecast for today?" The use of predefined grammars, as the article describes, will make such queries very reliable as they will be much easier to recognize.

    This may have been a simple example, but hopefully it gets the point across. Voice recognition is not going to replace typing. As many have said, some people can type much faster than they can dictate text. Once you start considering higher level interaction with the computer, however, the situation changes, and voice recognition systems will really show their colors.
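
    To make the "script behind the scenes" idea concrete, here is a rough sketch in Python. It assumes the recognizer has already turned speech into text; the rule table, the action names (read_forecast, play_song) and the use of regular expressions in place of a real BNF grammar are all my own invention, not anything from ViaVoice or Qt.

```python
import re

# Hypothetical grammar: each rule pairs a pattern over the recognized text
# with the name of the script or action it should trigger.
RULES = [
    (re.compile(r"^computer,? what is the weather forecast for "
                r"(?P<day>today|tomorrow)\??$"), "read_forecast"),
    (re.compile(r"^computer,? play (?P<title>.+)$"), "play_song"),
]

def dispatch(utterance):
    """Map recognized text to (action, arguments), or (None, {}) if no rule fits."""
    text = utterance.strip().lower()
    for pattern, action in RULES:
        match = pattern.match(text)
        if match:
            return action, match.groupdict()
    return None, {}
```

    Because only a handful of fixed sentence shapes are accepted, anything else (including ordinary conversation) simply falls through to no action.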

    -kris
  • However, why should I as a user type Configuration instead of etc?

    You wouldn't. You would type Co<TAB>.

    Of course, the simple solution is surely just to set up symlinks to these directories. That way you don't have to type them correctly more than once. It's easy enough to do the same for application names.
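
    A small sketch of the symlink approach, written in Python so it can be tried safely. It works in a throwaway temp directory rather than the real filesystem root; the add_alias helper and the "Configuration" name are just for illustration, and it assumes a platform where unprivileged symlinks are allowed.

```python
import os
import tempfile

def add_alias(root, short_name, long_name):
    """Create root/long_name as a symlink pointing at root/short_name."""
    link = os.path.join(root, long_name)
    os.symlink(short_name, link)  # relative target, resolved inside root
    return link

# Demo in a scratch directory instead of touching the real /etc.
root = tempfile.mkdtemp()
os.mkdir(os.path.join(root, "etc"))
link = add_alias(root, "etc", "Configuration")
```

    After this, both names lead to the same directory, so the short name keeps working for everyone who prefers it.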

    On an almost related note, does anyone else remember an IBM mainframe application called DWIM, standing for Do What I Mean (not what I say)? It did a great job of detecting typos and suggesting what you might actually have wanted to type. I'm sure someone could come up with a shell that worked on similar principles...

  • If you thought having some yuppie ensconced in a SUV barreling around while yakking on his/her cell phone was dangerous, just wait until you get some sysadmin trying to reconfigure his/her server using voice control while driving...
  • by Throw Away Account ( 240185 ) on Thursday February 01, 2001 @01:34PM (#464818)
    So, Windows 200(1|2|3) = OS/2 1996. Glad the Windows world is catching on. Maybe now they'll adopt a system-wide object model like OS/2 1992.
  • Speech recognition isn't perfect. It's not technology you can use casually, but it is usable with practice and training (for the software and the user). John Ousterhout, the creator of Tcl/Tk [scriptics.com], has been using it for years after he developed problems with his wrist [scriptics.com].

    Until a year ago, there were four leading speech recognition firms out there: Lernout & Hauspie, Dragon Systems, IBM, and Philips (barely). Dragon was near bankruptcy, and L&H bought them last year [cnet.com]. Now L&H is being rocked by financial scandals (see this list of articles on them in CNET [search.com]), and may go under as well.

    IBM, on the other hand, has supported their ViaVoice SDK for Linux [ibm.com] for a long time. They also sell their ViaVoice dictation software for Linux [ibm.com].

    Without IBM, speech recognition might die. I'm glad to see they're pushing it further, especially on Linux.

    P.S.: "Voice recognition" identifies people; "speech recognition" turns what they say into text.

    P.P.S. It's possible to bind sneezes, sniffles, coughs, etc., into "null text."
  • by Dr.Dubious DDQ ( 11968 ) on Thursday February 01, 2001 @08:57AM (#464828) Homepage
    Or even if you've only got one hand.

    Okay, somewhere in there is a wise-ass comment about the usefulness of voice-recognition for porn-surfing, but I won't stoop to that level... :-)


    ---
    "They have strategic air commands, nuclear submarines, and John Wayne. We have this"
  • by LordNimon ( 85072 ) on Thursday February 01, 2001 @08:57AM (#464829)
    The limitations in the Windows version don't exist in the OS/2 version (which was released in 1996), because the OS/2 API allows apps like VoiceType to integrate with existing applications. There are also a VoiceType API and a toolkit that let programmers provide specialized integration (see Papyrus Office [rom-logicware.com]) as well as define their own grammars.

    The only drawback with the OS/2 version is that it only supports discrete, not continuous, dictation. This means that you need to pause between each word. For voice navigation, that's not a problem. You also have to go through a three-hour "training" session if you want it to work well.

    So before you get all excited about how Linux might beat Windows, you should not forget that OS/2 is a real competitor here.
    --


  • Okay, time to clean up Emacs' auto-saves...

    % rm *~
    *~ not found. Assuming you meant "rm *"
    %


    ...hmm, I guess there weren't any auto-saves.

    % ls
    %


    Hey, where's my code?

    According to the story as told in the Jargon File [tuxedo.org], DWIM actually stands for "Damn Warren's Infernal Machine", at least in the opinion of the victim of such an accident, who then wanted to tie the author to his chair and enter the same command on his workstation. Twice.

    David Gould
  • Am I the only one who finds that a properly selected task specific MP3 playlist can add that critical 10%+ performance boost that spells the difference between "adequate" and r-r-rippin'?

    I especially value it when I am doing literary stuff ("Mood music" if you will) or need to pick up the pace for a deadline. And then there's the selection of "We kick the world's ass" anthems that I save for emergencies when you have to either go into ultra-arrogance or lapse into despair (Warning: not for use when you may have to deal with actual humans intermittently!)
  • Just because you can't imagine it, doesn't make it false.

    I have an electronic keyboard that plays a certain piece of music at 220 beats/minute. That's 220/60 = about 3.7 beats/second. Since all of the notes in the music are quarter (or shorter) that's 12-16 notes/second. I've seen this piece performed by an actual human at roughly the same speed my keyboard plays it. Amazing, yes. Impossible, clearly not.

    Piano and typing use a lot of the same hand movements--in fact, typing is EASIER since the keys are closer together AND don't require much force to press down.
    --
    MailOne [openone.com]
  • It is oss so it will probably be adopted by gtk

    ViaVoice is Open Source???? There was nothing, either in the linked article or in its comments, that would imply that. Can anyone provide a link to confirm?

  • Well,
    :%s/stupidity/intelligence/

    is much quicker to type, than using the mouse to:

    Edit|Search and Replace
    Search "stupidity", Replace "intelligence"


    But it isn't faster to type than ctrl-H, stupidity, TAB, intelligence, Alt-A. That's the key sequence in MSVC.

  • Getting it more tolerant would allow you to have some pizza in one hand and a beer in the other and say:

    "ftutec buoyd fpaat"

    getting

    "static void start()"

    on the screen. Or

    "brmmpf fbhurrrgle"

    becoming

    ". profile"

  • We're talking about *integrated* VR here, not just a VR application.

    Both Dragon's Naturally Speaking and IBM's ViaVoice run under Windows. Dragon is *excellent* at taking dictation and writing letters. I've found it far superior to ViaVoice. On the other hand, it's not so good at controlling the rest of the computer. When it comes to moving windows, opening menus and applications and browsing the web, ViaVoice reigns supreme.

    What IBM was never able to do was to tightly integrate that sort of GUI control into the system. This is what IBM is doing with Qt.
  • just wait until you get some sysadmin trying to reconfigure his/her server using voice control while driving...


    You think that's bad? Check out VOCP [sourceforge.net], which is a voice mail system that allows you to get a shell. Yep, it decodes DTMF and allows you to type in commands. The whole thing is extremely cool.

    So I'd rather have the voice recognition for the driving admin, thank you.
  • How about:
    • tv (guide) (on||off) (channel) (n)
      e.g. "tv on sky1"
    • play (media) (by||like||match||move) (query)
      e.g. "play music by britney"
    Turning the tv on and off and changing the channel, all without even having to find a control....throw in the full power of a decent set-top box. I think this is a killer app!
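
    A rough Python sketch of how a parser for those two command shapes might look. It assumes the speech has already been recognized as text, and it reads the notation loosely: parenthesized parts after "tv" are treated as optional, and "||" as a choice. The dictionary layout is invented for the example.

```python
RELATIONS = ("by", "like", "match", "move")

def parse(utterance):
    """Parse a recognized phrase into a command dict, or None if it doesn't fit."""
    words = utterance.lower().split()
    if not words:
        return None
    if words[0] == "tv":
        cmd = {"device": "tv", "guide": False, "power": None, "channel": None}
        for word in words[1:]:
            if word == "guide":
                cmd["guide"] = True
            elif word in ("on", "off"):
                cmd["power"] = word
            else:
                cmd["channel"] = word  # anything else is taken as a channel name
        return cmd
    if words[0] == "play" and len(words) >= 4 and words[2] in RELATIONS:
        return {"device": "player", "media": words[1],
                "relation": words[2], "query": " ".join(words[3:])}
    return None
```

    For "tv on sky1" this yields power "on" and channel "sky1"; "play music by britney" yields media "music", relation "by" and query "britney".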
  • by Bonker ( 243350 ) on Thursday February 01, 2001 @05:28AM (#464857)
    Most of the coders I know type anywhere from 60-140 words per minute. When coding, this measure of speed goes out the window, but it still is a fair shade faster than actually discussing what they are in the midst of coding.

    Most writers I know type anywhere from 60-170 wpm. I type on the lower end of this scale, about 80-90 wpm. Again, this is significantly faster than I can comfortably speak.

    When *editing* code or text, however, voice commands cannot hold a candle to a combination of mouse and keyboard commands, especially with newer trackballs and 'wheel' mice.

    "Page up. Page up. Page up. Stop. No, go up. Stop! Not delete! Damnit!"


  • There are a million ways around this "problem". First, Qt is released under a *dual* license, so it's perfectly LEGAL to distribute and use a ViaVoice/Qt library. Second, the only parts of KDE interfacing with the ViaVoice module will be the LGPL parts. Third, ViaVoice could very well be a runtime module with runtime linkage and all that.

    And finally, if you're still too worried, try a less retentive distribution. I got a kick out of Debian calling me a criminal for giving my friend a copy of KDE1 (it would have been wrong to say no), and I can't wait for them to do it again.
  • I also used the OS/2 voice recognition (which was the precursor to ViaVoice) on a 100MHz box. I only had 16 megs, so it was a bit slow, but I am still amazed at how accurate it was.

    I didn't do much dictation with it, but it was great when you were trying to play solitaire with your hands full eating a greasy double cheeseburger...
  • Pretty much as soon as IBM's ViaVoice toolkit was available GVoice came out. From memory it lets you bind voice accelerators in the same way you bind keyboard accelerators. Check out this article about it in the GNOME summary [gnome.org] from June 1999.
  • Actually, I think VR is really kewl! I remember using it years ago when the sound card came with some simple VR software and we had a hand held scanner, and giving commands to the scanner by voice was far better than by keyboard or mouse, because I wouldn't have to take my hand off the scanner. It worked very well!

    Anyway, I think we will see mainstream use of VR before too long, in connection with applications used on living-room machines and things like that. Take the Nokia Media Terminal that was recently shown on /., for instance: I would much rather talk to that machine while sitting comfortably in my best chair than have a keyboard or a remote control in my hand. I think that's going to be a very important application of VR.

    Also, in the kitchen, I mean "damn, I've got my hands full turn off that hot plate, will ya? thanks". Nice, eh? :-)

  • 1) No, she was using Dvorak only at the 170 speed. The 150 was a standard keyboard. At least, that's how I read your quote.

    2) The previous poster said this was a MECHANICAL typewriter. Electronic is MUCH MUCH faster.
    --
    MailOne [openone.com]
  • "Well to be fair, if people make errors there should be a penalty."

    You obviously have no clue what you are talking about. Errors ARE subtracted from wpm rates.
    --
    MailOne [openone.com]
  • Just don't mention "rm -rf" when you're near the microphone ...

    Hah - I actually tried this the day I left my former employer. It was only my desktop workstation, but I didn't want their creepy fingers prying into my files, so I did su -l, rm -rf / - the command returned an error claiming I didn't have a lock on a certain process, and it couldn't complete the command.

    If I hadn't been lazy, it would have been nice to code up some wipe tool that was used in Cryptonomicon...

  • Well, I think one thing that you are forgetting here is that when you are typing code, you are for the most part not typing at your highest speed, unless you have a much faster mind than I do. Chances are you are typing out a few lines, then thinking.... so on and so forth.... for writers, this is different, as they can go a bit more stream of consciousness than a coder can.

    If you wanted to use it for dictation, you could also do that. My father uses speech recognition for dictation and finds it a lot easier to do than writing things out by hand. Admittedly, he can type much faster than he speaks, however it tends to drain him less than typing it out does, and he is able to do more (though it takes a little extra time) than he would if he was just typing it out (less brain drain).

    Where I see the big advantages coming through in this is in the overall OS control. I don't see this controlling everything (unless you so wanted it that way, in which case be my guest), but I can see this helping in my "multitasking" of many different things at once. You could for instance be typing away at a letter and realize that you need to bring one up to reference it, but you don't want to click through and find it because you're on a roll. You simply tell the computer "bring up letter X in the background".

    One of the big problems that happens with new technologies is that everyone says that this is going to replace some device that is used every day (remember people have declared that the PC was dead for about 10 years now if I recall the first time I heard that). In reality, what will happen is that people will discover a way to use it to work a bit faster than they did before, using their everyday means + the new technology rather than just the new technology itself.

  • Ah yes, OS/2. I loved it. I still do. If there was any support for it I would still be using it.

    The OO desktop beats anything else I have ever seen. Add on ObjectDesktop and it got even better. Over the past few years using Linux/BSD and KDE, I've been thinking how to do the same thing under Unix. I've come to the conclusion that you can make it LOOK like Warp easily enough, but it will take a hell of a lot of work to make it act like it.
  • The point of integrating voice recog with the user interface toolkit seems to be that the applications themselves need not be made voice-aware. Just as an application today doesn't care if a widget is activated by direct mouse click or keyboard shortcut (um, at least in sensible toolkits it doesn't, I assume Qt is sensible in this regard), this makes voice a transparent input method. Nice.
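
    A toy illustration of that transparency, in plain Python rather than real Qt. The Signal, Button and VoiceBinder classes below are hypothetical stand-ins I made up to show the pattern; they are not the Qt or ViaVoice API. The application only connects a slot to the button's signal, and never learns whether the activation came from a click, a shortcut or a spoken phrase.

```python
class Signal:
    """Bare-bones signal: calls every connected slot when emitted."""
    def __init__(self):
        self._slots = []

    def connect(self, slot):
        self._slots.append(slot)

    def emit(self, *args):
        for slot in self._slots:
            slot(*args)

class Button:
    def __init__(self, label):
        self.label = label
        self.clicked = Signal()

    def activate(self):
        # Same entry point for mouse, keyboard and voice.
        self.clicked.emit()

class VoiceBinder:
    """Hypothetical toolkit-level layer mapping spoken phrases to widgets."""
    def __init__(self):
        self._phrases = {}

    def bind(self, phrase, widget):
        self._phrases[phrase.lower()] = widget

    def heard(self, phrase):
        widget = self._phrases.get(phrase.lower())
        if widget is None:
            return False
        widget.activate()
        return True

log = []
save = Button("Save")
save.clicked.connect(lambda: log.append("saved"))  # application code ends here

save.activate()              # a mouse click or keyboard shortcut
voice = VoiceBinder()
voice.bind("save", save)
handled = voice.heard("Save")  # a spoken command reaches the same slot
```

    The slot runs once per activation either way, which is the whole appeal: the voice layer lives beside the widgets, not inside each application.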
  • When we have wearables, I'll see my pointer more often on my female colleague's ass than on my source code window ;-)
    --
