As far as voice commands go, people are inconsistent too! Someone from the Deep South is going to have a bit of trouble talking to someone from the UK, especially when slang and idioms are involved. People from different age groups say things differently as well!
Hell, even people from the same area and of the same age sometimes need to hear something more than once before they can make sense of what's being said.
It goes well beyond just accents, though. Phrasing, tone, cadence, context, and a bunch of other things all play important roles in voice communication.
I believe this means there can't be a "one size fits all" speech recognition system that accepts input from everyone with equal accuracy. Speech recognition will need to take all of these factors into account to /some/ degree in order to approach the accuracy of people. Pure statistical modeling of sounds matched to commands won't cut it, because even with perfect microphones in silent rooms, dissimilar phrases can sound alike unless you know who is speaking.
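
To make that last point a bit more concrete, here's a toy sketch in Python. Everything in it is made up for illustration: the phrases, the numbers, the speaker names, and the `best_guess` helper are all hypothetical, and no real recognizer is this simple. It just shows the idea that when the audio alone fits two phrases equally well, knowing who is talking can tip the decision.

```python
# Toy illustration only: every number and name here is invented.
# How well the audio matches each candidate phrase (acoustics alone).
acoustic_likelihood = {"call mom": 0.50, "call tom": 0.50}

# Hypothetical per-speaker habits: how often each person says each phrase.
speaker_prior = {
    "alice": {"call mom": 0.9, "call tom": 0.1},
    "bob": {"call mom": 0.2, "call tom": 0.8},
}


def best_guess(likelihood, prior=None):
    """Return (phrase, posterior probability) for the most likely phrase.

    With no prior, every phrase is assumed equally likely up front,
    which is the "sounds only" situation.
    """
    if prior is None:
        prior = {phrase: 1.0 / len(likelihood) for phrase in likelihood}
    # Bayes' rule: posterior is proportional to likelihood * prior.
    scores = {phrase: likelihood[phrase] * prior[phrase] for phrase in likelihood}
    total = sum(scores.values())
    posterior = {phrase: score / total for phrase, score in scores.items()}
    winner = max(posterior, key=posterior.get)
    return winner, posterior[winner]


if __name__ == "__main__":
    print("sounds only:", best_guess(acoustic_likelihood))
    print("alice:", best_guess(acoustic_likelihood, speaker_prior["alice"]))
    print("bob:", best_guess(acoustic_likelihood, speaker_prior["bob"]))
```

With sounds only, the two phrases are a coin flip. Once you plug in who is speaking, the same audio resolves to a different phrase for each person. A real system would have to learn those per-speaker habits somehow, which is the hard part.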