This is just current enterprise tech finally making its way into the consumer world.
I've done a lot of work developing technology for language schools that requires the recognition & reproduction of speech. This is nothing new: it's just the output of a speech recognition engine being passed through a translator & then spat back out by a text-to-speech engine. Heck, I even have something like this running on my home Media Centre.
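To make the three stages concrete, here's a rough Python sketch of that chain. Everything in it is made up for illustration: the class name, the `fake_*` functions & the tiny phrasebook are toy stand-ins, not any real product's API. In a real system each stage would be an actual speech recognition, machine translation or text-to-speech engine.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stage signatures: each stage is just a function.
Recognizer = Callable[[bytes], str]    # audio in  -> source-language text
Translator = Callable[[str], str]      # source text -> target-language text
Synthesizer = Callable[[str], bytes]   # target text -> audio out


@dataclass
class SpeechTranslationPipeline:
    """Chains recognition, translation & synthesis in that order."""
    recognize: Recognizer
    translate: Translator
    synthesize: Synthesizer

    def run(self, audio: bytes) -> bytes:
        text = self.recognize(audio)        # speech recognition
        translated = self.translate(text)   # translation
        return self.synthesize(translated)  # text-to-speech


# Toy stand-ins (assumptions): "audio" here is just UTF-8 encoded text.
def fake_recognize(audio: bytes) -> str:
    return audio.decode("utf-8")


PHRASEBOOK = {"hello": "konnichiwa", "thank you": "arigatou"}


def fake_translate(text: str) -> str:
    # Fall back to the untranslated text when the phrase is unknown.
    return PHRASEBOOK.get(text.lower().strip(), text)


def fake_synthesize(text: str) -> bytes:
    return text.encode("utf-8")


pipeline = SpeechTranslationPipeline(fake_recognize, fake_translate, fake_synthesize)
```

Calling `pipeline.run(b"Hello")` pushes the input through all three stages in order. Because each stage is just a function, any one of them could be swapped out without touching the other two.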
The groundwork has been done by universities & is being improved by both the public (the CIA comes to mind) & private sectors. Unsurprisingly, it's big business in the teleconferencing market.
It's not perfect, but the challenges are very different from those facing the likes of YouTube. A telephone conversation doesn't have the same problems with background noise, & the people using this technology are aware they need to speak more slowly & clearly - a benefit not afforded to movies & cat videos.
The Japanese telecoms company NTT Docomo has been offering this technology to its customers since 2012!