Comment Re:Data dominates methods (Score 1) 387
The big name in MT at IBM who fired his linguists may have hired them for the wrong purpose.
Your assertion omits a fundamental point: machine translation systems ONLY work well within uniform, specialized fields of knowledge.
This is because specialized human languages behave much like programming languages, so they are easier for computers to process.
Moreover, technical writers have learned how to write for the machine: they try to isolate particular, recurrent actions and always express them with the same self-contained sentence.
An example of that kind of sentence: "Click with the right button of the mouse". In a normal text, this sentence could easily be embedded in a longer sentence or varied in some way. But technical writers know that they must write a single invariant sentence, ending in a period, to suit the machine's abilities.
They try to make natural languages behave the way programming languages do, that is, like context-independent languages.
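To make the point concrete, here is a rough Python sketch of why the invariant sentence matters. It assumes a toy exact-match lookup table (the names TRANSLATION_MEMORY, split_sentences and lookup are made up for illustration, and so is the Spanish entry); it is not how any particular MT product works. The invariant sentence is found, and the same words embedded in a longer sentence are not.

```python
import re

# Hypothetical translation memory: one invariant source sentence mapped
# to one stored translation. Both entries are invented for illustration.
TRANSLATION_MEMORY = {
    "click with the right button of the mouse.":
        "Haga clic con el botón derecho del ratón.",
}


def split_sentences(text):
    """Naively split text after each period that is followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=\.)\s+", text) if s.strip()]


def lookup(sentence):
    """Return the stored translation for an exact match, else None."""
    # Normalize case and whitespace only; any other variation is a miss.
    key = " ".join(sentence.lower().split())
    return TRANSLATION_MEMORY.get(key)


def translate(text):
    """Pair each sentence with its stored translation (None if no match)."""
    return [(s, lookup(s)) for s in split_sentences(text)]


if __name__ == "__main__":
    manual = (
        "Click with the right button of the mouse. "
        "Then, after you click with the right button of the mouse, a menu opens."
    )
    for source, target in translate(manual):
        print(source, "->", target if target else "NO MATCH, needs a human")
```

The second sentence contains exactly the same instruction, but because it is no longer a separate sentence ending in a period, the lookup fails.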
But don't forget that even in specialized fields, MT must be used carefully because of neologisms, which can make up a large share of the lexicon (almost half) in high-tech fields.
Right?
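A quick way to see the neologism problem for a given text is to measure how many of its distinct words the system has never seen. The sketch below is only an illustration: the function names are my own, the "known vocabulary" is a toy stand-in for whatever the training corpus covers, and the sample words are invented.

```python
import re


def tokenize(text):
    """Lowercase the text and extract alphabetic words (hyphens allowed inside)."""
    return re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())


def oov_rate(text, known_vocabulary):
    """Return (share of distinct words not in known_vocabulary, those words)."""
    types = set(tokenize(text))
    if not types:
        return 0.0, set()
    unknown = types - set(known_vocabulary)
    return len(unknown) / len(types), unknown


if __name__ == "__main__":
    # Toy vocabulary standing in for the words seen in a training corpus.
    known = {"click", "the", "button", "to", "open"}
    rate, unknown = oov_rate("Click the hyperbutton to open the nanopanel", known)
    print(f"{rate:.0%} of distinct words are unknown: {sorted(unknown)}")
```

If the share of unknown words gets anywhere near half, the statistics learned from the corpus have little to say about a large part of the text.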
So, do corpus-based statistics work here?
Believe me, if you translate manuals for equipment that can threaten its user's life (the only ones that MUST be translated under international law), and you keep thinking only in terms of corpus-based statistics, you should be fired immediately! Otherwise you'll kill people!
MT does perform well when used with restrictions, on texts that behave much like the formal languages used to program it.
Corpus-based statistics apply in some specialized contexts, to the part of a natural language that can be reduced to a formal language, which is rather narrow.
Anyway, I think the point here is somewhat different:
http://science.slashdot.org/comments.pl?sid=594853&cid=23940075
Regards,
Netpolyglot