I love the thinking behind this: technology deployed so ubiquitously that it's everywhere, available literally instantaneously. The article mentions that we (and I'm assuming "we" means the average American) spend an hour in the bathroom every day. And The New York Times intends for us to use some part of that time to read their newspaper. Hmmm...
People spending that hour in the bathroom are busy with their daily grooming, which pretty much ties up both their eyes and their attention.
My very first thought about the product was: a mirror that lets you surf the web while you're grooming? The idea is good geeky fun, but is it really useful and usable?
Now that I have read the article and watched the first video, I can see that some good thought has been put into how people may want to use it.
Yes, people's hands are busy while they're in the bathroom (and they should wash their hands before they touch anything...), so a mouse interface is not a good idea. The innards of a mouse would soon be gunked up, rendering it useless. Between hairspray, steamy water vapor, and makeup powder, even an optical mouse would be inoperable. And a men's bathroom would be just as bad. :-)
The way to make this product successful is to put a much better voice-response interface on it. The user should be able to have a conversation with the product and have it act as their agent. In the demo, the system responds by changing what it displays on the screen. You should be able to "chat" with it. If you're shaving, or if you're putting on false eyelashes, you should be able to ask the system a question and have it respond by voice rather than forcing you to go look at something. Any time you're waving a sharp blade near your face, or gluing anything onto your eyelid, you don't want to take your eyes off what you're doing...
And the read-back voice has to be natural and pleasant sounding and, naturally, customized to the user's preferences. Of course, the Kinect system should recognize each individual it has been set up for and switch to that person's user profile as soon as it recognizes them.
For example, suppose the user says "Where is the movie The Way You Were showing today?" Notice that the user doesn't put the movie title in quotes or verbally delineate it in any special way. The user has to be able to rely on the system to successfully parse the utterance when it is phrased the normal, natural way people speak.
The system needs to respond with "That movie is showing at 4 different theaters that are near you." (If the movie were only showing at one or perhaps two theaters, the system would immediately say the names of those theaters, as in "That movie is showing at the Strand and the Cinemax.")
The system will need to know which theaters the user is already familiar with. That is the kind of information it learns over time from interacting with the user. As further training, as the system interacts with users from all over, it should tokenize the interaction sequences, remove all the personally identifying information, and upload the tokenized sequences to a central repository. That way the central repository can learn all the different ways users have of interacting with different named things, and expand the number of things the agent side knows how to do.

For example, an object named "movie" has its own name, [title], and each movie can be associated with one or more theaters. Each theater is associated with a physical address, a phone number, and of course its own name. The relationship between a movie and a theater carries multiple showtime/sub-theater pairs: the first property of each pair is the time the movie will be shown, and the second is which sub-theater at that theater it will be shown in at that time. Since the system understands all these relationships, when the user asks "What time is it showing at the Strand?", it understands, from tracking context, that "it" refers to the movie, and that the query means to present the information in the movie-theater relationship, which is made up of the showtime/sub-theater pairs. By the way, in addition to the showtime/sub-theater pairs, the movie-theater relationship also has a start-date/end-date pair associated with it, so if the user asks "How long will it be at that theater?", the system knows to respond by reading off the end date of that pair.
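To make that object model concrete, here is a minimal sketch in Python of how those relationships might be represented. All the class names, field names, and sample data are my own invention for illustration, not anything from the article:

```python
from dataclasses import dataclass, field
from datetime import date, time

@dataclass
class Theater:
    name: str
    address: str      # physical address
    phone: str

@dataclass
class Showing:
    show_time: time   # 1st property: when the movie is shown
    sub_theater: str  # 2nd property: which screen at that theater

@dataclass
class Engagement:
    """The movie-theater relationship: its showtime/sub-theater
    pairs, plus the run's start-date/end-date pair."""
    theater: Theater
    showings: list
    start_date: date
    end_date: date

@dataclass
class Movie:
    title: str
    engagements: list = field(default_factory=list)

def showtimes_at(movie, theater_name):
    """Answer 'what time is it showing at the Strand?'"""
    for e in movie.engagements:
        if e.theater.name == theater_name:
            return [s.show_time for s in e.showings]
    return []

# Hypothetical sample data:
strand = Theater("The Strand", "123 Main St", "555-0100")
movie = Movie("The Way You Were")
movie.engagements.append(Engagement(
    theater=strand,
    showings=[Showing(time(14, 0), "Screen 2"),
              Showing(time(19, 30), "Screen 2")],
    start_date=date(2011, 6, 3),
    end_date=date(2011, 6, 30),
))
```

With this structure, "how long will it be at that theater" is just a read of `movie.engagements[0].end_date`.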
Clearly, all this fancy capability implies that the system knows how to go out on the Internet, successfully find that information about the movie, and parse it into the appropriate relationship fields. And that may be almost as hard as doing good speech recognition.
Of course, the system should not respond only by voice. When it makes sense to, it should also display information. For example, when it has to list out the showtimes of the movie, it might generate a vertical numbered list of the showtimes on the display while it is reading them off. That gives the user the information in two different formats, and makes it unnecessary for the user to try to memorize the times as they are read off. They can just glance at the display.

"Send Bill an e-mail asking him if he wants to go to the 2nd showing with me."

In response to this command, the system should be able to format an e-mail containing all of the implied-but-unspoken information as well as the spoken information, and dispatch it to Bill, a person the system knows the user is familiar with. (If there is more than one "Bill", the system should ask the user which one they mean.) In this particular instance the spoken information was the selection of a showtime, and the system needs to format the e-mail correctly using both the spoken information and the unspoken-but-implied information. So the content of the e-mail would be: "Bill, Susan wants to know if you would like to go to [the movie title, implied but unspoken] at the [theater name, from the assumed context] at [the showtime the user selected]." Then the system includes Susan's default contact information, which might be an e-mail address, a Twitter handle, etc., and sends the e-mail out. Alternatively, the system might use one of the more immediate messaging tools instead of e-mail, but the information sent over is the same.
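The merge of spoken and implied information could be sketched like this. Everything here (the context dictionary, the function name, the sample showtimes) is hypothetical, just to show the bracketed slots in the message being filled from conversational context:

```python
def compose_invite(context, recipient, showing_number):
    """Fill the implied-but-unspoken details (movie, theater) and the
    spoken selection ('the 2nd showing') into one message."""
    show = context["showtimes"][showing_number - 1]  # "2nd" -> index 1
    return (
        f"{recipient}, {context['user']} wants to know if you would like "
        f"to go to {context['movie']} at the {context['theater']} at {show}."
    )

# Conversational context the agent has been tracking:
ctx = {
    "user": "Susan",
    "movie": "The Way You Were",
    "theater": "Strand",
    "showtimes": ["2:00 PM", "4:30 PM", "7:00 PM"],
}
msg = compose_invite(ctx, "Bill", 2)
# msg: "Bill, Susan wants to know if you would like to go to
#       The Way You Were at the Strand at 4:30 PM."
```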
The videos imply a great deal of detailed knowledge about the user's environment and daily routines being captured and properly structured within the system's database. If this system were used in a home with a family, it would need to be configured in a way that prevents children from accessing information or privileges that must be restricted to the adults. So on this system, security is going to be a major issue, especially since humans who are not registered users of the system could enter the bathroom and be misrecognized as valid users with adult access privileges. Imagine the snoopy neighbor, or someone from work, asking to review your medications!
"Oh look, Susan is taking a an anti-psychotic medication!"
This idea clearly has enormous potential to improve people's lives and make them more productive. The idea of a software "personal agent" is something those of us with weaker organizational skills would kill to get a hold of.
The way the agent manipulates data has to be extremely open and flexible. In contrast, the closed design of Microsoft's e-mail product, Exchange, is much too restricted and limited. Part of making this product successful is making sure the personal agent can interoperate smoothly with any outside data source or resource-management utility. A resource-management utility could be someone else's calendaring system: the personal agent needs to be able to negotiate and agree on a date and time with that external calendar system on behalf of its user. A software tool which can only interoperate with software from the same manufacturer can't make it in a world where the most successful smartphone platforms come from other manufacturers: the iPhone, and the "Android" platform from many companies. This means a significant change to Microsoft's "lock everyone else out!" design approach. If Microsoft can make that philosophical change, it forces the rest of the market to also make sure their software will interoperate with other "personal agent" type utilities. No personal-agent software product is likely to be successful unless *all companies'* personal-agent products can interoperate with each other.
Because the hardware platforms available as personal computing devices (smartphones, iPads, and the iPad's competitors) are diversifying explosively, Microsoft's traditional integrated software-silo approach is going to start causing it more economic harm than advantage. Yes, Microsoft is still the big player, but the way people use computing is changing, and in order to stay the big player that it is, Microsoft will need to change its approach to be more interoperable and play nicely with other systems.
Here is one interaction scenario: Susan is in the bathroom and wants to review her schedule.
"Mirror mirror on the wall, which Prince is going to take me to the Ball?"... Oh never mind, that's from the script I sent to Disneyâ¦
Ok, let us return to Susan.
"Mirror what's coming up?"
"You have one appointment tomorrow, and you are meeting Bill on Saturday at 2 o'clock for the movie "The Way You Were".
"Next week you have a doctors appointment with Dr. Williams on Wednesday at 10 AM." You also have a meeting with Karen Johnson on Wednesday at 9:30 AM."
Susan realizes that the 9:30 meeting with her boss definitely conflicts with her doctor's appointment.
"Mirror, cancel my appointment with Dr. Williams and reschedule it for the earliest time they have open, that I'm available for."
The personal agent contacts the doctor's office's scheduling agent over the Internet and sends a request to move Susan's appointment to the next available time with Dr. Williams. The doctor's office agent cancels Susan's existing appointment and sends back a list of times that Dr. Williams is available. Susan's personal agent parses the list and finds the first slot where Susan is also available. Since the personal agent is aware of travel-time issues, it examines Susan's schedule both before and after the time of the doctor's appointment to make sure there is time for Susan to travel to the doctor's office from her workplace and to travel back after the appointment. If the agent detects there may not be enough time, it can ask Susan about the situation, and Susan can direct the agent as to what she wants done.
"Susan there is a 10 o'clock appointment with Dr. Williams available on Thursday, but the travel time from your office to the doctor's office is 35 min. and you only have 30 min. available after the morning staff meeting to get to the appointment."
"Mirror, okay book me that time with Dr. Williams" Susan knows that the staff meetings tend to break up earlier than the time they are scheduled to end at. And besides if the meeting goes long, she'll have an excuse to leave! :-)
Of course, this level of intelligent object manipulation and interaction with the human has been the Holy Grail of personal-productivity systems forever. In a sense, what's being described here is the perfect administrative assistant: one who is there 24x7, is never tired, and never forgets anything. Unfortunately, this can never happen unless all of the software vendors who build resource-management software (calendars, personal schedulers, bill-paying systems, etc.) support the same interoperability protocol in their tools.
As you can see from the length of this posting, and from the probable embedded mis-phrasings, I not only want a speech interface to a personal agent, I am using speech-recognition software to generate this post.
[aside]: One of the issues with speech-recognition software is that using it changes the way you compose written work. When I am using speech recognition, I tend to generate rather long run-on sentences whose logical structure is difficult to parse. This is just one of the side effects of using it.