An anonymous reader writes: Imagine it’s the 1950s and you’re in charge of one of the world’s first electronic computers. A company approaches you and says: “We have 10 million words of French text that we’d like to translate into English. We could hire translators, but is there some way your computer could do the translation automatically?”
At this time, computers are still a novelty, and no one has ever done automated translation. But you decide to attempt it. You write a program that examines each sentence and tries to understand the grammatical structure. It looks for verbs, the nouns that go with the verbs, the adjectives modifying nouns, and so on. With the grammatical structure understood, your program converts the sentence structure into English and uses a French-English dictionary to translate individual words.
For several decades, most computer translation systems used ideas along these lines — long lists of rules expressing linguistic structure. But in the late 1980s, a team from IBM’s Thomas J. Watson Research Center in Yorktown Heights, N.Y., tried a radically different approach. They threw out almost everything we know about language — all the rules about verb tenses and noun placement — and instead created a statistical model.