A couple of things recently really highlighted why machine translation is going to be awkward and clumsy for years to come, and why even human translation is so damned difficult when you get into colloquialisms and jokes.
A couple of weeks ago, I was with a friend in Wichita and we were at a Mexican restaurant. He mentioned he'd seen a Mexican movie (¿Y tu mamá también?) - a movie subtitled in English, and that some of the audience was getting a laugh out of *something*. He wondered why - was it just a bad translation? Probably not, I answered. Probably a play on words or a double meaning that just doesn't translate to English, or perhaps something cultural (for instance, there are jokes that are funny in Britain but would leave Americans thinking "uh?" due to cultural differences, and vice versa - despite the shared language).
Today I came across one of these. There's this geek comic strip made in Spain called TiraEcol. It's translated into many languages (and I don't know how the attempts to translate it worked in others) - but the English translation just didn't work, and I can't think of any way of actually translating the play on words so it works in English. The original Spanish is here:
And the English here.
The last frame will leave the English speaker thinking "Uh?"
But in Spanish, you say your computer crashed by saying it's hung. Furthermore, in Spanish, the personal pronoun is almost always dropped - so it could be "it hung", "he hung" or "she hung". In Spanish, if you want to say "It crashed", you say "Se ha colgado". If you want to say "she hung herself", you say "Se ha colgado". So the joke rests on a double meaning that exists in Spanish but is lost in the English translation - Nano responds "What, the program or the girl?", which doesn't really work as a reply to "Uh oh, crash".
Indeed, the dropping of pronouns means that machine translation from Spanish to English generally results in something ugly. A human being knows whether someone's talking about "he", "she" or "it" from context. With the verb conjugation in Spanish, a human doesn't need the pronoun to understand what's going on, because we already grasp the context from what happened earlier. But this is highly problematic for a computer, and quite often the machine translation will guess completely wrong whether the thing in the sentence is a "he", a "she" or an "it". The pronoun for the indirect object is also the same for "him", "her" and "it", and again, machine translation frequently picks the wrong one when translating to English. (I can only imagine how tough it will be for languages which come from cultural bases significantly different from ours, such as Japanese or Chinese.) Translations have been getting better, especially for things written formally, such as news or technical items, but they will continue to suck for a long time yet for informal writing or speech.
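For the programmers reading, here's a toy sketch of the problem in Python. It is not how any real translation system works - the `LEXICON` table and the `candidate_translations` function are made up purely for illustration. The point is just that a single Spanish sentence with a dropped pronoun fans out into several equally valid English candidates, and nothing in the sentence itself says which one is right.

```python
# Toy illustration only: a hypothetical lookup table, not a real translator.
# Third-person singular Spanish verb forms don't encode the subject's gender,
# so a sentence with a dropped pronoun has several English candidates.

SUBJECTS = ["he", "she", "it"]

# Hypothetical mini-lexicon: Spanish phrase -> English rendering per subject.
LEXICON = {
    "se ha colgado": {
        "he": ["he has hanged himself"],
        "she": ["she has hanged herself"],
        "it": ["it has crashed"],
    },
}


def candidate_translations(spanish):
    """Return every English candidate for a pronoun-less Spanish sentence."""
    key = spanish.strip().lower().rstrip(".")
    entry = LEXICON.get(key)
    if entry is None:
        return []  # phrase not in our tiny lexicon
    # Without context, all subjects are equally plausible.
    return [text for subject in SUBJECTS for text in entry[subject]]


if __name__ == "__main__":
    for candidate in candidate_translations("Se ha colgado."):
        print(candidate)
```

A human picks the right candidate from what was said earlier in the conversation; the sketch has no context at all, so the best it can do is list them.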
So don't use "oh, we'll have good machine translation soon" as an excuse for not learning a language, at least not for the next three or four decades.