Seriously: whose interest does the state imagine it is "protecting" with this license? It isn't there for any practical purpose; it's there for intimidation and control.
The license is required to make sure dancers receive proper training and are pleasing to the eye! Imagine walking into a strip bar only to see some naked guy wobbling his man breasts around.
That's funny, because I have Windows 7 in a VM here on my MacBook Pro and can't stop cussing about text navigation on Windows. On a Mac every text entry behaves the same: arrow keys move the cursor one character, option + arrow key moves one word, cmd + arrow key moves to the start or end of a line. It's pretty much hard-coded into my muscle memory by now, and I can rely upon every text entry behaving this way. Same with text selection via mouse: click once & keep pressed places the cursor and selects characters; double click & keep pressed selects the word under the mouse and whole words after that; triple click selects paragraphs.
It's not so much the actual interface combo as the inconsistency that drives me mad on Windows. E.g. triple clicking in Visual Studio deselects the current selection rather than selecting the whole paragraph or line, and it's different in every other application - you just cannot rely upon it!
Meant to say SISR, not SSML.
It would be perfectly reasonable to send the SRGS grammar along with the voice data. It would even help Google with speech recognition, as the search graph (assuming HMMs here) would be way smaller than one built from a full-blown dictionary grammar. Not accepting a grammar and only returning an N-Best list makes it pretty much unusable for anything non-trivial. What happened to all those concepts developed as part of EMMA/VoiceXML? It seems like the Web Speech API ignores everything that came before and went for the most naive approach.
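Roughly, the interaction looks like this (a sketch against the Web Speech API draft; the SpeechGrammarList/JSGF hook does exist in the spec, but Google's backend is widely reported to ignore it and fall back to the full dictation model):

    // A sketch of talking to the Web Speech API, which only hands back an
    // N-Best list of transcripts. The JSGF grammar hook below exists in the
    // spec, but implementations reportedly ignore it.
    const recognition = new (window as any).webkitSpeechRecognition();
    recognition.lang = 'en-US';
    recognition.maxAlternatives = 5; // ask for an N-Best list

    // Spec-level grammar support: a tiny JSGF command grammar.
    const grammars = new (window as any).webkitSpeechGrammarList();
    grammars.addFromString(
      '#JSGF V1.0; grammar cmd; public <cmd> = open | close | save;', 1.0);
    recognition.grammars = grammars;

    recognition.onresult = (event: any) => {
      // All we get back is ranked transcript strings plus confidences;
      // mapping them to application semantics is left entirely to us.
      const alternatives = event.results[0];
      for (let i = 0; i < alternatives.length; i++) {
        console.log(alternatives[i].transcript, alternatives[i].confidence);
      }
    };
    recognition.start();

Compare that with VoiceXML, where the grammar constrains the recognizer itself instead of being bolted on after the fact.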
I don't know whether this has been mentioned before, but the big problem with Google's approach is that it won't allow me to define a formal grammar as the "set of things the user might reasonably say". Dictionary recognition, as employed here and on Android phones, has the big disadvantage that I would need some kind of natural language understanding on the (already error-prone) result for anything but dictating text.
It is in essence a projection of voice onto an N-Best list of recognition results. Now, if I could specify a grammar (e.g. per SRGS), I could have semantic annotation per SSML and use voice to actually control an application.
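As it stands, both the grammar and the semantic mapping have to be re-implemented client-side over the N-Best list. A minimal sketch of that workaround (the Command type and the rule table are invented for illustration; with real SRGS/SISR support the recognizer would never return anything outside the grammar in the first place):

    // Faking SRGS + SISR on the client over an N-Best list.
    type Command = { action: string; target: string };

    // Poor man's grammar: each rule pairs a pattern with a semantic mapping,
    // roughly what an SISR tag attaches to an SRGS rule.
    const rules: Array<[RegExp, (m: RegExpMatchArray) => Command]> = [
      [/^(open|close) the (door|window)$/i,
       m => ({ action: m[1].toLowerCase(), target: m[2].toLowerCase() })],
    ];

    // Walk the N-Best list and take the first alternative the grammar accepts.
    function interpret(nBest: string[]): Command | null {
      for (const utterance of nBest) {
        for (const [pattern, toSemantics] of rules) {
          const match = utterance.match(pattern);
          if (match) return toSemantics(match);
        }
      }
      return null; // the recognizer heard nothing the application understands
    }

    console.log(interpret(['opened the door', 'open the door']));
    // -> { action: 'open', target: 'door' }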
"Virtual" means never knowing where your next byte is coming from.