I've tried the vlingo application a couple of times, and its speech recognition is surprisingly good. The company trained the system on a vast number of business names and addresses (easily over a million), so the version of vlingo I used was built for "point of interest" queries in mobile search. When their CTO said "find me a Starbucks in " and it worked, I naturally wanted to test it on some odder queries. Even though the server-based recognizer had adapted itself to the CTO's voice (identified by his phone's caller ID), I tried "find me Caribou Coffee in Wheaton Illinois" and it got it word for word. I tried a couple more place queries, including one that was fictitious but plausible, and those worked fine too. Their system is not based on a fixed speech grammar that enumerates every expected utterance; it uses a much more flexible statistical approach based on phoneme lattices.
Voice input looks very appealing for mobile search when you compare it with keypad entry. This study of a million Google Local Mobile queries showed that it took 56-63 seconds (a full minute!) to enter an average query on a 12-key keypad, and about half that to enter one on a PDA with a stylus and virtual keypad. So a speech recognition interface that handles a query in 2-3 seconds would be a huge win, as long as its accuracy is high enough for most users. I feel vlingo is at least tantalizingly close to that level of accuracy.
You can get a feel for a similar system by trying out Google's free 1.800.GOOG411, to see how it works for you.