Opera Promises Voice-Operated Web Browser
unassimilatible writes "Opera's latest browser talks and listens, according to AP.
The new browser incorporates IBM's ViaVoice technology, enabling the computer to ask what the user wants and "listen" to the request. "Hi. I am your browser. What can I do for you?" asked a laptop with the demonstration versions of the browser. The message can be personalized, such as greeting users by name. The computer learns to recognize users' voices, accents and inflections by having them read a list of words into a microphone. Opera plans to first launch an English version of the voice browser for computers running the Windows operating system. Versions for other systems, including handhelds, will follow. Opera's press release has more details, including Opera's hopes that people will adopt this technology for presentations - and to replace PowerPoint."
a few things to say... (Score:5, Interesting)
as for their statement about it being a replacement for PowerPoint, I don't think that this will fly either unless they either: a) find a company to make a PowerPoint alternative which saves to HTML files, or b) make the aforementioned software themselves. Even if they accomplished that, people's stupidity and ignorance have proven time and time again that whether Microsoft's software is better, worse, or just as good as its competitors', people will buy Microsoft's software instead. Look at OpenOffice.org, Mozilla (most people use IE)/Opera/Konqueror/Galeon/Netscape/etc., Linux, and a bunch of other superior software. Maybe a couple of cases could be explained (Linux often involves use of the command-line interface; Netscape is slower to load, even though IE cheats by loading part of the program at startup time), but most of it is due to a problem which exists somewhere between the keyboard and the chair. Besides, I would find a remote control a better option than speech, since a remote control wouldn't force me to scream "NEXT SLIDE" across the room like an idiot before it recognizes what I'm saying. It would also be much smoother to just press a button on a remote control.
Browsing with people is a pain (Score:5, Interesting)
The point here is: if it's hard to instruct intelligent people in how to browse the web, how well can a computer do it? I have my doubts.
Re:a few things to say... (Score:5, Interesting)
And as for presentations, who says you have to stop your speech to scream "NEXT SLIDE"? Imagine a presentation package capable of picking up from your speech exactly when you'd like the next "slide" (a word that's almost useless now, since you could do much more than PowerPoint constrains you to).
Imagine, during a presentation, being able to say "If you look at the sales figures for the year..." and have your presentation automatically display those figures.
It may have a niche. (Score:5, Interesting)
I can't see this having wide acceptance in the corporate world. Cube farms are noisy enough. I can't imagine what it must sound like for everyone to be browsing by voice.
I also can't imagine some of my co-workers saying the addresses of what they browse out loud. *shudder*
English is the problem. (Score:2, Interesting)
Maybe when AI is done (Score:4, Interesting)
Once we have a computer that can do this, we'll have great interfaces - it will be like robo-butler. But we're not there yet, and robo-idiot-child - "I thought you said Quick Bananas, so I googled and we're at the Dole website" - is only going to make things annoying.
It will be a boon to those who can't use point and click for whatever reason, and ignored by everyone else.
Re:voice operated? (Score:2, Interesting)
I've tried this with Dragon (Score:5, Interesting)
I got a free copy of Dragon Dictate once, so I trained it as much as possible.
I got Mozilla working quite happily: 'down', 'up', 'slow' (that was a good one, it slowly scrolled down), 'back', etc., etc.
The thing I found after weeks of training was that it was just so tiring talking all the time.
Re:Voice activated Powerpoint? Uhm, no... (Score:3, Interesting)
But personally, I think this has great potential for presentations, without disrupting them - especially if you could control the commands used to advance each slide. For example, if you could program the transition to a sales figures slide to be triggered by the words "sales figures for 2002", then it would automatically pull up the right slide when you say "Now let's look at the sales figures for 2002". Properly scripted, it could be pretty slick.
I once got paid good money just to launch PowerPoint presentations and click the "next" button all day. These people might have been ok with running the presentations by voice - but a two button device connected to a computer (wired or otherwise) was too intimidating.
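The cue-triggered transition described above is easy to sketch: just match trigger phrases against the recognizer's running transcript. A rough illustration in Python, where the phrases and slide numbers are made up for the example (a real system would get the transcript from a speech engine):

```python
# Hypothetical cue table: trigger phrase -> slide number.
SLIDE_CUES = {
    "sales figures for 2002": 7,
    "regional breakdown": 8,
}

def match_cue(transcript, cues):
    """Return the slide number whose trigger phrase appears in the
    recognized speech, or None if no cue phrase was spoken."""
    text = transcript.lower()
    for phrase, slide in cues.items():
        if phrase in text:
            return slide
    return None
```

Saying "Now let's look at the sales figures for 2002" would pull up slide 7, while ordinary speech containing no cue phrase leaves the presentation where it is.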
Re:a few things to say... (Score:5, Interesting)
OpenOffice [openoffice.org] can save to HTML and Flash files from Presentations.
Even if they accomplished that, people's stupidity and ignorance have proven time and time again that whether Microsoft's software is better, worse, or just as good as its competitors', people will buy Microsoft's software instead. Look at OpenOffice.org, Mozilla (most people use IE)/Opera/Konqueror/Galeon/Netscape/etc., Linux, and a bunch of other superior software.
People buy Microsoft software because they are
a.) not familiar with the competitors
b.) worried about compatibility with the rest of their microsoft software
c.) do not want to retrain staff
d.) need feature X which competition lacks
e.) work for Microsoft or are otherwise affiliated with them.
f.) do not trust an unproven product (in their eyes) and don't want to be the guinea pigs
Point being, as other software matures it will be harder and harder for Microsoft to release subpar software and expect a solid buy-in. If you look at Mozilla, it's growing very fast now; I know a number of Windows users who aren't even very technical who use Firefox and/or Mozilla. Look at OpenOffice: Microsoft is killing themselves with their own .doc standard. They can't move future iterations of Office to abandon or morph the compatibility of the format without breaking their own installed base.
As far as Opera's voice-operated browser goes, I think this is great, especially for disabled people. I also think there's a certain appeal to standing in front of a board, saying "Next slide" to your OpenOffice HTML/Flash presentation, and having it progress. I mean, what a way to impress.
Re:Homophones... (Score:3, Interesting)
One potential workaround is to have a short period of 'sensitivity' after common homophones.
For example, the speaker says 'Final 4' but the browser types 'Final for'. The software recognizes that 'for' is a common homophone and waits a *very* short time (a second or two) after the utterance of 'for' for *another* occurrence of 'for', which would imply a correction. Also, an occurrence of a special word, e.g. 'no', followed by 'for' in that short period would imply that the alternative, i.e. '4', is correct.
To override the 'quick correction', the person speaking can simply pause after homophones that are to be repeated in dictation or followed by control phrases.
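The quick-correction window could be sketched roughly like this, assuming the recognizer hands us (word, timestamp) pairs. The homophone table and the two-second window are illustrative assumptions, not any real engine's behavior:

```python
# Hypothetical table of common homophones and their alternatives.
HOMOPHONES = {"for": "4", "to": "2", "won": "1"}
WINDOW = 2.0  # seconds of 'sensitivity' after a common homophone

def correct(tokens):
    """Replace a homophone with its alternative when the speaker
    immediately repeats it (or says 'no' and then repeats it)
    inside the sensitivity window."""
    out = []
    i = 0
    while i < len(tokens):
        word, t = tokens[i]
        if word in HOMOPHONES:
            j = i + 1
            # An optional 'no' inside the window also signals a correction.
            if j < len(tokens) and tokens[j][0] == "no" and tokens[j][1] - t <= WINDOW:
                j += 1
            if j < len(tokens) and tokens[j][0] == word and tokens[j][1] - t <= WINDOW:
                out.append(HOMOPHONES[word])  # swap in the alternative
                i = j + 1
                continue
        out.append(word)
        i += 1
    return out
```

So 'final', 'for', 'for' spoken in quick succession comes out as "final 4", while a lone 'for' (or one followed by a pause) stays as dictated.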
Re:Useful for mobile users (Score:1, Interesting)
I don't believe they're going for the desktop here. (not for capable users anyway)
Bye Bye sore thumbs.
Re:a few things to say... (Score:3, Interesting)
So many times while putzing around the house or driving I've wanted to bark out a command a la Star Trek and have Google answer me. Very cool.
Although if it chimes in with - "It sounds like you are trying to browse the internet, would you like me to help you?", then someone will surely have to die.
Voice internet... (Score:4, Interesting)
Imagine a simple voice interface for limited internet functionality. Place microphones and speakers around the house. Now, when I'm sitting on the couch reading a book and I come across a word I haven't seen before, I can say "Hey Frank, look up the word '...'." Need the weather? "Hey Frank, what's the weather report?" Etc., etc.
It should be fairly simple to tie a speech recognition engine to some python scripts to perform simple queries and return a parsed result ready for text-to-speech conversion. One big problem the dictionary feature brings out is how the speech recognition would handle unfamiliar words. Even leaving that feature out, it would be nice to have a limited set of features I could use anywhere in the house.
Use some sort of unique gating phrase('Hey Frank!') and look for the nouns and verbs to give it some flexibility.
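As the parent says, tying a recognition engine to some Python scripts could be fairly simple. A minimal sketch of the 'Hey Frank' gating idea, where the wake phrase and the verb table are made-up assumptions and the speech-to-text step is assumed to have already happened:

```python
WAKE = "hey frank"  # hypothetical gating phrase

def handle(utterance, handlers):
    """Ignore speech that doesn't start with the gating phrase;
    otherwise dispatch on the first matching verb."""
    text = utterance.lower().strip()
    if not text.startswith(WAKE):
        return None  # not addressed to us; stay silent
    rest = text[len(WAKE):].strip(" ,!?")
    for verb, fn in handlers.items():
        if rest.startswith(verb):
            return fn(rest[len(verb):].strip())
    return "Sorry, I didn't catch that."

# Hypothetical handlers: a real one would query a dictionary site or a
# weather feed and hand the parsed result to text-to-speech.
handlers = {
    "lookup": lambda arg: ("lookup", arg),
    "what's the weather": lambda arg: ("weather", arg),
}
```

Everything not prefixed by the gating phrase is ignored, which is what keeps Frank from answering the television.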
Re:a few things to say... (Score:2, Interesting)
-Jim
An idea looking for a market... (Score:2, Interesting)
I'm kind of in that boat myself too. While I think that anyone would readily play with such technology, there haven't been a lot of people willing to stick with it, and I think that's largely due to the "Who am I talking to? It's just a piece of furniture" mentality.
Someday, when we're all oil for some future earth-mining civilization, people will talk to their PCs and be able to hold conversations with them, I envision.
Something like:
"PC, Can you tell me when my next meeting with Mr. SoAndSo is? Oh! And bring up CNN for me would you? I want to check the headlines"
And the computer would respond with something like "Your next meeting with Mr. SoAndSo is currently scheduled for May 18. Would you like me to change that?"
And the user would say "No, just go on with the headlines please", to which the computer would start telling the user about the headlines of the day. It would interject little things like "CNN is reporting that 30 people died in a plane crash in Switzerland, but MSNBC's saying that only 24 died, so I'm not really sure which is accurate right now."
It'd be much more of a conversation than today's "PC, go to CNN", "PC, open Word", and so on. I would imagine that eventually productivity usage of the computer could be entirely verbally driven, from dictation to simply helping a user through his day... Something you could "chat with" while getting dressed, working on something else, exercising and so on. The PC would be our informer, figuring out what we want, and offering opinions and information based on discussions we would have with it, as well as prior conversations, and expressed interests. In short, it would do what a computer's always been designed to do: it'd make our lives easier, but in ways which simply are not possible today.
Right now such technology is very clunky compared to what I've described... kind of a silky-smooth "invisible friend" of the future. I understand that there are obviously going to be a lot of "in-between" stages for such technology, but I'd rather see today's developers focusing on making my PC more productive as opposed to sticking an auditory interface over a point-and-click technology. When my computer can surprise me with its knowledge and vocabulary, as opposed to repeating phrases I've programmed into it and translating text into speech, I'll be impressed.
Simply converting the on-screen text and reading it to me in a monotone voice is not what I want. I want my PC to know the types of news I frequently look for, and I want it to be able to paraphrase and provide it to me in a meaningful, well-articulated manner. And I want it to feel like someone's there personally telling me of the day's events. I want to be able to interrupt and request greater detail on a specific bit of news. In short, I want my computer to work for me, and I want it to grow with me as my needs and interests change.
But that's so far down the line... 8(
For now this is a neat technology, but I'd imagine it will only appeal to the true geeks out there. Most will play with it and then go back to the more "private" methods of interfacing, such as mouse and keyboard.
Re:Voice internet... (Score:3, Interesting)
Next generation (Score:3, Interesting)
That system was simpler, since it couldn't rely on special voice-HTML markup tags. It took advantage of the fact that any UI element (menu item, button, etc.) in the system can be activated by speaking its text. So they added a quick hack to Netscape so that every link's text (or ALT text) visible on a screen would be present on a "Links" menu, thus turning the links into speakable keywords.
It worked very well for browsing, much less well when you wanted to enter new URLs. The dictation mode left a bit to be desired, but that was to be expected from the hardware of the time. Voice recognition on OS/2 required a minimum of a 150MHz Pentium, IIRC. (It would work, with much latency, on my 80MHz 486, however.)
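The links-menu hack is straightforward to sketch: harvest every link's visible text (or ALT text) from the page and treat those strings as the speakable vocabulary. This is an illustrative reconstruction using Python's stdlib HTML parser, not the original OS/2 code:

```python
from html.parser import HTMLParser

class LinkHarvester(HTMLParser):
    """Collect visible link text and image ALT text inside <a> tags."""
    def __init__(self):
        super().__init__()
        self.links = []
        self._in_a = False
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_a = True
            self._buf = []
        elif tag == "img" and self._in_a:
            # An image link contributes its ALT text instead.
            alt = dict(attrs).get("alt")
            if alt:
                self._buf.append(alt)

    def handle_data(self, data):
        if self._in_a:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_a = False
            text = " ".join(" ".join(self._buf).split())
            if text:
                self.links.append(text)

def speakable_links(html):
    """Return the list of link texts to feed the recognizer as keywords."""
    h = LinkHarvester()
    h.feed(html)
    return h.links
```

Each returned string becomes one entry on the "Links" menu, so speaking it activates the corresponding link.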
This and NASA subvocalization. (Score:3, Interesting)
After NASA announced their subvocalization project (I'm too lazy to find the slashdot URL... someone earn karma for it!) I became excited again.
The problem is, if you're in an office you can't just start talking. Right now there are 10 people around me and most are silently working on their computers. If they all started barking commands it would be loud as hell in here. It just doesn't scale.
If you add the subvocalization work this totally changes the equation. Now I can silently tell my computer to do things while my hands type away.
This is going to ROCK. Talk about multitasking... I could be typing out this Slashdot post and, without stopping, launch gaim, ymessenger, make sure I'm on IRC, start up Emacs in the background, etc.
w00t!
Gimme gimme! $100 says the Mac has this next year and Linux has it sometime around 2015.
motivation (Score:1, Interesting)
Here's Hoping... (Score:5, Interesting)
That said, I think the most crippling thing about PowerPoint is its linearity. Not all presentations "want" to be laid out into a preset order of points. If a college professor or a businessperson gets asked a question during a presentation, all too often it is diverted by saying "well, that's coming up in a few slides", or the presentation is interrupted as tangential data is introduced.
Using voice recognition instead of click-through navigation opens up some great possibilities for non-linear presentations, though. Imagine that, instead of organizing your presentation into a linear timeline, you group slides and other media into "points", each of which represents a different idea relevant to your talk. You can arrange these points into a web, indicating what information depends on prior knowledge from other slides, etc. You then associate each point with an audio "cue", say a phrase like "projected profit margins" or "the three kingdoms period". You'll note that these phrases are things you're likely to naturally utter in your presentation anyway. This has the advantage of enabling you to speak totally naturally without interrupting your presentation. To avoid accidental jumping, we would have, say, a little translucent blue arrow fade into being every time a cue is recognized, disappearing a few seconds later. If you actually want to jump to a new point, it's just a quick click of a button when you see the blue arrow.
So, imagine you're giving a sales presentation to a group of executives. You notice this particular group is getting bored with your standard sales pitch. No problem: you just drop a key phrase into your speech and instantly change your presentation to include information you think will appeal to the business interests of your audience, or simply to their personality. Or, imagine a professor is giving a lecture on a piece of literature. A student asks a question about the author's background, and the professor can easily insert some information on the author's country, historical circumstances, and life.
Of course, organizing this type of presentation requires a greater investment in planning, and certainly requires a little more cognitive ability than your standard PowerPoint fare. However, those who work with these new presentation systems will be giving themselves an undeniable competitive advantage over presenters using linear methods. And those in the audience will be grateful, I'm sure.
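One way to picture the "web of points" idea: each point carries a cue phrase, a spoken cue only proposes a jump, and the presenter's button click (the blue arrow) confirms it. The point names, cues, and dependency links below are invented for illustration:

```python
# Hypothetical web of points; "follows" records which points this one builds on.
points = {
    "intro":   {"cue": "let's get started", "follows": []},
    "profits": {"cue": "projected profit margins", "follows": ["intro"]},
    "history": {"cue": "the three kingdoms period", "follows": ["intro"]},
}

def propose_jump(speech, points):
    """Return the point whose cue phrase occurs in the speech, if any."""
    text = speech.lower()
    for name, p in points.items():
        if p["cue"] in text:
            return name
    return None

def jump(current, proposal, confirmed):
    """Only move when the presenter confirms the proposed jump
    (the click on the translucent blue arrow)."""
    return proposal if (proposal and confirmed) else current
```

Because an unconfirmed proposal leaves the presentation where it is, naturally uttering a cue phrase in passing never yanks the audience to the wrong slide.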
Re:OS/2 v4 sort of had this... (Score:3, Interesting)
The WorkplaceShell was, and still is, the most incredible desktop I've ever seen or developed for.
LoB
Re:i can hear see it now (Score:1, Interesting)
Man, I miss those old Apple technologies. Apple had synthesized speech down pat years ago, and Microsoft's default TTS in current Windows versions sounds downright primitive compared to even the most basic of the MacinTalk voices from System 7.x, for goodness' sake, not to mention "MacinTalk Pro English Victoria." And Apple's voice recognition was the best free implementation for any OS. Oh well. I'll have to pick me up a new Mac one of these days to see if these technologies still exist.
Try mouse gestures. You might be suprised. (Score:3, Interesting)
Personally I did it because I didn't like how much space the icon toolbar was taking. I also have Opera open most pages in new windows.
So, for instance, to reply to you I right-clicked the link and moved the mouse down a bit, opening this reply window in a new tab. When I'm finished with this reply I'll hold the right mouse button down, do a down-and-to-the-right movement (another gesture is also available for this), close it, and be instantly back where I was reading. I notice this seems faster, as some pages seem to insist on reloading if you go back; also, my move is one close rather than two backs.
I am not saying it is for everyone, but once I was determined to use it I was amazed how easy it was to pick up and get totally used to. Of course it means that when I am on an IE box I am totally out of my depth.
Am I working faster or better with mouse gestures? It certainly seems more relaxed to me. Will I like voice commands? Well, I've got music on constantly in the background, so perhaps not, unless they've got that sorted out.
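The gestures described here (a plain "down" versus "down and to the right" while the right button is held) amount to classifying the pointer's net displacement while the button is down. A toy sketch, with the threshold and gesture names invented for illustration:

```python
THRESH = 30  # pixels of movement needed to count as a stroke (assumed)

def classify(dx, dy):
    """Map the net pointer displacement while the right button was held
    to a gesture name, or None if the movement was too small.
    Screen coordinates: positive dy is downward."""
    strokes = []
    if abs(dy) > THRESH:
        strokes.append("down" if dy > 0 else "up")
    if abs(dx) > THRESH:
        strokes.append("right" if dx > 0 else "left")
    gestures = {
        ("down",): "open in new tab",
        ("down", "right"): "close window",
    }
    return gestures.get(tuple(strokes))
```

A tiny wiggle classifies as nothing, which is what keeps an ordinary right-click from firing a gesture by accident.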
Re:Word Processing is clunky, will this be better? (Score:2, Interesting)
And I, and many others, think it makes a lot of sense. Presentations are a good example for helping people understand the problems multimodal interfaces are meant to solve. Obviously we were interested in the fact that devices got smaller with each passing year and, no matter how we tried, there were still 26 characters in the alphabet.
Multimodal is still a very new technique and a lot of work has to be done to define how it should work. Just like on phones, where you expect the other person to stop when you start speaking, these interfaces evolved over a period of time; they are in many ways so subtle you won't notice them until you do them wrong and say, "Hmmm, that isn't right, let's try this."
I know some of the earliest multimodal interfaces we had were tied to the broadband TV stuff that Motorola's recently purchased General Instrument group did. The idea was: pick up your Nextel phone and, using PTT, tell the TV to list all the shows currently playing with Cary Grant in them. These kinds of queries are easy to write for voice and are quite powerful.
Obviously the Nextel phone was the wrong input for it, but it shows the strength of multimodal. I could fill out voice dialogs using email or SMS pages if I wanted.
The first version of the Motorola Multimodal Fusion Server worked on the Nextel network and not only was able to combine modalities on different machines, but was the first example of Distributed Speech Recognition on a public network. I am positive a lot of the stuff we did two years ago in our labs will find its way onto your PDA and cell phone soon. Opera is giving you a first crack at it.
Jer,