Romancing The Rosetta Stone

Catch up on stories from the past week (and beyond) at the Slashdot story archive

Romancing The Rosetta Stone 486

Posted by Hemos on Monday July 28, 2003 @11:48AM from the cool-story dept.

Roland Piquepaille writes "Not only this news release from the University of Southern California has a fantastic title, it also has a great content. This story is about one of their scientists, Franz Josef Och, whose software ranks very high among translation systems. "Give me enough parallel data, and you can have a translation system for any two languages in a matter of hours," said Dr. Och, paraphrasing Archimedes. His approach relies on two concepts, gathering huge amounts of data, and applying statistical models to this data. It completely ignores grammar rules and dictionaries. "Och's method uses matched bilingual texts, the computer-encoded equivalents of the famous Rosetta Stone inscriptions. Or, rather, gigabytes and gigabytes of Rosetta Stones." Read my summary for more details."

This discussion has been archived. No new comments can be posted.

Romancing The Rosetta Stone

Load All Comments

Search 486 Comments Log In/Create an Account

Comments Filter:

Article text (Score:4, Informative)

by Anonymous Coward writes: on Monday July 28, 2003 @11:51AM (#6551246)

Romancing the Rosetta Stone

'Give me enough parallel data, and you can have a translation system in hours'

University of Southern California computer scientist Franz Josef Och echoed one of the most famous boasts in the history of engineering after his software scored highest among 23 Arabic- and Chinese-to-English translatio systems, commercial and experimental, tested in in recently concluded Department of Commerce trials.

"Give me a place to stand on, and I will move the world," said the great Greek scientist Archimedes, after providing a mathematical explanation for the lever.

"Give me enough parallel data, and you can have a translation system for any two languages in a matter of hours," said Dr. Och, a computer scientist in the USC School of Engineering's Information Sciences Institute.

Och spoke after the 2003 Benchmark Tests for machine translation carried out in May and June of this year by the U.S. Commerce Department's National Institute of Standards and Technology.

Och's translations proved best in the 2003 head-to-head tests against 7 Arabic systems (5 research and 2 commercial-off-the-shelf products) and 14 Chinese systems (9 research and 5 off-the-shelf). In the previous, 2002 evaluations they had proved similarly superior.

The researcher discussed his methods at a NIST post-mortem workshop on the benchmarking held July 22-23 at Johns Hopkins University in Baltimore, Maryland.

Och is a standout exponent of a newer method of using computers to translate one language into another that has become more successful in recent years as the ability of computers to handle large bodies of information has grown, and the volume of text and matched translations in digital form has exploded, on (for example) multilingual newspaper or government web sites.

Och's method uses matched bilingual texts, the computer-encoded equivalents of the famous Rosetta Stone inscriptions. Or, rather, gigabytes and gigabytes of Rosetta Stones.

"Our approach uses statistical models to find the most likely translation for a given input," Och explained

"It is quite different from the older, symbolic approaches to machine translation used in most existing commercial systems, which try to encode the grammar and the lexicon of a foreign language in a computer program that analyzes the grammatical structure of the foreign text, and then produces English based on hard rules," he continued.

"Instead of telling the computer how to translate, we let it figure it out by itself. First, we feed the system it with a parallel corpus, that is, a collection of texts in the foreign language and their translations into English.

"The computer uses this information to tune the parameters of a statistical model of the translation process. During the translation of new text, the system tries to find the English sentence that is the most likely translation of the foreign input sentence, based on these statistical models."

This method ignores, or rather rolls over, explicit grammatical rules and even traditional dictionary lists of vocabulary in favor of letting the computer itself find matchup patterns between a given Chinese or Arabic (or any other language) texts and English translations.

Such abilities have grown, as computers have improved, by enabling them to move from using individual words as the basic unit to using groups of words -- phrases.

Different human translators' versions of the same text will often vary considerably. Another key improvement has been the use of multiple English human translations to allow the computer to more freely and widely check its rendering by a scoring system.

This not coincidentally allows researchers to quantitatively measure improvement in translation on a sensitive and useful scale.

The original work along these lines dates back to the late 1980s and early 1990s and was done by Peter F. Brown and his colleagues at IBM's Watson Research Center.

Much of the improvement and
Read the rest of this comment...

Share
twitter facebook
- - Article text (in Babel-German-back-to-English) (Score:5, Funny)
    
    by Wraithlyn ( 133796 ) writes: on Monday July 28, 2003 @04:17PM (#6553414)
    
    I just had to. Besides, I think it's proving a point, or something.
    
    --
    
    Romancing of the Rosetta stone
    
    ' you give me sufficient parallel data, and you can have translation a system in the hours '
    
    University southern California of the computer scientist Franz Josef, which Och of most famous against-resounded, praises itself in the history of the technology, after its software counted the Arab strongly under 23 and Chinese English translatio systems, commercially and experimentally, examined inside in recently concluded Ministry of Trade of attempts.
    
    "you indicate a place to me to the location, and I shift the world,", after to to order a mathematical explanation for the lever said the large Greek scientist Archimedes place.
    
    "you give me sufficient parallel data, and you can have translation a system for all possible two languages in an affair of hours,", said Dr. Och, a computer scientist in the USC school of the institute for information science of the technology.
    
    Och spoke after the benchmark tests 2003 for the machine translation, which was accomplished in the May and June of this yearly by the National Institute of Standards and Technology United States of the trade department.
    
    Translations Ochs examined well into the 2003 head ton head tests against 7 Arab systems (5 research and 2 commercial away dregal products) and 14 Chinese systems (9 research and 5 from stock). In preceding 2002 evaluations had examined it similarly superior.
    
    The researcher discussed his methods held at a NIST Postmortemseminar over the Benchmarking July 22-23 of John Hopkins at the university in Baltimore, Maryland.
    
    Och is an outstanding exponent of a newer method of using the computers to touch in order to translate a language into other one, which became more successful in the last years, while the ability of the computers grew, large bodies of the information, and the volume of the text and the brought together translations in the digital form has, on (for example) multilingual newspaper or government net places of assembly explodes.
    
    Method Ochs uses brought together bilingual texts, the computer-coded equivalents of the famous Rosetta descriptions of stone. Or rather gigabytes and gigabyte Rosetta of stones.
    
    "our approximation uses statistic models, in order to find the most probable translation for a given entrance," Och avowedly
    
    "it is rather different to the older, symbolic approximations for the machine translation, which in most existing the commercial systems is used, which try, to code the grammar and the encyclopedia of a foreign language in a computer program the grammatical structure of the strange text analyzed, and produced then English, which on hard guidelines," it is based, continued.
    
    "employs, explaining from the computer, how one, we left it it out explains translated. First we draw the system it with a parallel korpus i.e. an accumulation of texts in the foreign language and their translations into English.
    
    "the computer uses these information, in order to co-ordinate the parameters of a statistic model translation of the process. During the translation of the new text, the system tries to find English sentence which is the most probable translation strange entrance of the sentence, be based in these statistic models."
    
    This method ignores or rolls over rather, finds express grammatical guidelines and even traditional dictionary lists of the vocabulary in favor of leaving the computer matchup samples between given Chinese or Arab (or any another language) texts and English translations.
    
    Such abilities grew, while computers improved, by making possible for them, from using the individual words as the fundamental unit on using the groups of words to move -- cliches.
    
    Versions of the different human translators of the same text change frequently considerably. Another key improvement was the use of repeated English human translations to permit the computer too its transmission by an ana
    Read the rest of this comment...
    
    Parent Share
    twitter facebook
Let me know (Score:5, Funny)

by gazuga ( 128955 ) writes: on Monday July 28, 2003 @11:51AM (#6551249) Homepage

when it's in the form of a fish, and can fit in my ear...

Share
twitter facebook
- - Re:Let me know (Score:3, Funny)
    
    by Cruciform ( 42896 ) writes:
    
    That's not a real Babelfish though, it's just a Beta [aol.com]
Great summary (Score:3, Insightful)

by spectasaurus ( 415658 ) writes: on Monday July 28, 2003 @11:53AM (#6551266)

You know, it's not really a summary when you just delete half the article.

Share
twitter facebook
DARPA (Score:2, Insightful)

by BlackHawk-666 ( 560896 ) writes:

That reference to DARPA has me a little worried about the sort of uses this technology will be put to. I wonder, are the CIA trying to shore up holes in their translation abilities (particularly for Arabic/etc) by using software. What happens when you pair this technology up with the Echelon project? Are we going to see a dramatic rise in the ability of the government to spy on nationals and particularly foreign nationals now?
- Re:DARPA (Score:5, Insightful)
  
  by Abcd1234 ( 188840 ) writes: on Monday July 28, 2003 @12:04PM (#6551367) Homepage
  
  Oh please... so many conspiracy theories. You do realize that the *internet* was originally developed by DARPA, right? My point: DARPA does a lot of work... not all of it revolves around spying on or otherwise taking away the rights of American citizens.
  
  Parent Share
  twitter facebook
  - Re:DARPA (Score:5, Insightful)
    
    by wwest4 ( 183559 ) writes: on Monday July 28, 2003 @12:30PM (#6551571)
    
    well, not EVERY bottle of beer at the duff plant has a nose or hitler's head in it, but i'm glad the inspector is tasked to look at every single bottle.
    
    just because government abuse isn't guaranteed doesn't mean we shouldn't vigilantly examine the possibilities when we see them.
    
    it's all boils down to balancing powers of government and freedom of individuals, and this country (USA) was founded upon principles intended to favor the rights of individuals. i'll go out on a limb and make a value statement - that's the way to go. power to the people, man!
    
    Parent Share
    twitter facebook
- Re:DARPA (Score:2, Interesting)
  
  by kmac06 ( 608921 ) writes:
  
  Kneejerk /. response: its a government conspiracy to take away more of our rights.
  
  Kneejerk /. mod response: he's right.
- Re:DARPA (Score:2)
  
  by cybercuzco ( 100904 ) writes:
  
  Well I dont have to worry, I dont speak arabic OR chinese!
Imagine a beowulf cluster of these... (Score:2, Funny)

by mjmalone ( 677326 ) * writes:

No really... what if it used a shared database and there were hundreds, or thousands, of the systems around the world... Seems like it could become a pretty sophisticated system. And maybe one day it will be available in the form of a small fish which you place in your ear?
Oh god... (Score:4, Funny)

by gerf ( 532474 ) writes: on Monday July 28, 2003 @11:55AM (#6551278) Journal

The uber-geeks are going to have a field day with Klingon...

Share
twitter facebook
- Re:Oh god... (Score:4, Funny)
  
  by laughing_badger ( 628416 ) writes: on Monday July 28, 2003 @12:04PM (#6551369) Homepage
  
  Yay! We can finally finish translating all of Shakespear into English.
  
  Parent Share
  twitter facebook
  - - Re:Oh god... (Score:5, Funny)
      
      by Jeremi ( 14640 ) writes: on Monday July 28, 2003 @01:43PM (#6552270) Homepage
      
      I'm pretty sure you can have your throat slit for saying "Yay!" near a Klingon. Do be careful. ;)
      
      Having your throat slit is nothing compared to what Klingons do to people who put smiley-faces in their text messages...
      
      Parent Share
      twitter facebook
      - Re:Oh god... (Score:5, Funny)
        
        by daeley ( 126313 ) * writes: on Monday July 28, 2003 @01:51PM (#6552346) Homepage
        
        Having your throat slit is nothing compared to what Klingons do to people who put smiley-faces in their text messages...
        
        You're telling me! My emoticons used to have noses! Now look:
        
        :(
        
        Such a tragedy.
        
        Parent Share
        twitter facebook
The Law of Eventuality (Score:3, Insightful)

by Speare ( 84249 ) writes: on Monday July 28, 2003 @11:56AM (#6551290) Homepage Journal

"Give me enough" is a key element of the Law of Eventuality. Give me enough money, and I'll solve the Microsoft monopoly threat with a hostile takeover. Give me enough time and I'll clean up almost any unnatural disaster site by leveraging nature's own methods.
Give me enough simulated neurons and enough truisms and I'll make a sentient machine.
Eventually, with enough resources, anything is possible. Throwing more time and resources to a problem is rarely exciting science. Reducing the inconveniently large values of 'eventually' and 'enough' are the real problem.

Share
twitter facebook
- Re:The Law of Eventuality (Score:3, Insightful)
  
  by Abcd1234 ( 188840 ) writes:
  
  Err... how is this interesting or insightful? It's barely related to the discussion! If what you're is referring to is the large corpus of paired texts they inject into the system, you've completely missed the point.
  
  The cool science here is in the advancements in their statistical model and new techniques they've developed for "scoring" translations in order to improve their output. In addition, they've also demonstrated the ability to statistically translate whole phrases effectively, rather than indiv
  - - Re:The Law of Eventuality (Score:3, Funny)
      
      by Abcd1234 ( 188840 ) writes:
      
      And that's not what's being done, which is why there is interesting science going on here, hence the poster not understanding what the press release is actually about.
- Re:The Law of Eventuality (Score:2)
  
  by TopShelf ( 92521 ) * writes:
  
  OT, I know, but how would a hostile takeover solve the "Microsoft monopoly threat"? Sounds like one giant replacing another...
- Give me enough Slashdot antries... (Score:5, Funny)
  
  by Pac ( 9516 ) writes: <paulo...candido@@@gmail...com> on Monday July 28, 2003 @12:45PM (#6551702)
  
  ...and I will make pseudo-insightful comments based on the headline text without reading any of the source articles, until my karma is excellent?
  
  Parent Share
  twitter facebook
The vodka is strong but the meat is rotten (Score:5, Interesting)

by zptdooda ( 28851 ) writes: <deanpjm@gm a i l . com> on Monday July 28, 2003 @11:59AM (#6551322) Journal

That's an example from a few years' back of an attempt to translate "the spirit is willing but the flesh is weak" from English to Russian and back to English using a different translator.

Can anyone try this on the new (or some other recent) algorithm?

BTW here's Doc Och's most recent website:

Franz Josef Och [isi.edu]

Share
twitter facebook
- Re:The vodka is strong but the meat is rotten (Score:2)
  
  by mjmalone ( 677326 ) * writes:
  
  translated to russian using systran [systranbox.com] and back using babelfish [altavista.com] I got "spirit is willingly ready but flesh it is weak"
- Re:The vodka is strong but the meat is rotten (Score:5, Insightful)
  
  by rossz ( 67331 ) writes: <ogre@@@geekbiker...net> on Monday July 28, 2003 @12:13PM (#6551427) Journal
  
  That particular phrase translated badly because they used a word-for-word translation program. You simply can't do that, especially when dealing with euphenisms. This new system is the only possible way that could properly translate text.
  
  My wife is a professional translator and has absolutely no respect for machine translatations.
  
  Parent Share
  twitter facebook
  - - Re:The vodka is strong but the meat is rotten (Score:3, Insightful)
      
      by bogado ( 25959 ) writes:
      
      I doubt computers will ever get near a good translator, shure it can make some people lose their jobs translating math thesis, but a book, play, movies or even conversation have to use humans. Humans are the only thing that can realy understand what is going on, human translator (good ones) knows about the culture of both countries that it is translating. It can understand the subtext and change the words so they have the same subtext in the other language.
      
      A good book has many things to be learned that are
    - Re:The vodka is strong but the meat is rotten (Score:4, Insightful)
      
      by rossz ( 67331 ) writes: <ogre@@@geekbiker...net> on Monday July 28, 2003 @02:20PM (#6552581) Journal
      
      Because they suck, of course. She uses computers to assist her. It's just a tool. Just as you can't expect a wrench to rebuild your transmission, you can't (currently) expect a computer to create a proper translation. That will change in the future (as this article shows).
      
      Currently, computer translations work the best in technical documents and the worse in prose (stinking turd horribly bad quality translations).
      
      BTW, computer translations has never been any kind of competition for work. These days, competition is from untrained college students in Central Europe. All too often a Romanian student who "knows Hungarian" bids a couple of pennies per word, far under the going rate and far too little for my wife to consider as reasonable pay. The resultant translation sucks, but that's to be expected from someone who not only isn't trained as a translator, but also doesn't not have a good command of either languages in question (Hungarian and English).
      
      Oops, I started ranting.
      
      Parent Share
      twitter facebook
    - - A plan for translation? (Score:4, Insightful)
        
        by Jeremi ( 14640 ) writes: on Monday July 28, 2003 @01:50PM (#6552334) Homepage
        
        Actually this system reminds me a lot of the good old Bayesian Spam detector algorithms... but instead of trying to determine what category of content an email contains, the statistical classifier is trying to determine (e.g.) what English phrase a Russian phrase most closely matches.
        
        Given the impressive progress made by Bayesian algorithms in spam detection, I wouldn't be surprised to see impressive results from this method either.
        
        So bravo for Franz Och! He's taken what appeared to be an intractible problem requiring magic AI to solve, and perhaps found a way to solve it effectively using the stupid brute force methods computers are so good at.
        
        Parent Share
        twitter facebook
- "The vodka is strong, but the meat is rotten" (Score:5, Funny)
  
  by quantum bit ( 225091 ) writes: on Monday July 28, 2003 @12:14PM (#6551444) Journal
  
  You know, that actually does sound like something that would be a Russian aphorism...
  
  Parent Share
  twitter facebook
- Re:The vodka is strong but the meat is rotten (Score:3, Insightful)
  
  by Abcd1234 ( 188840 ) writes:
  
  Heh, given this is a not-uncommon phrase in the English language, it very well may be in their English-to-target-language corpus, meaning it could end up being a straight lookup-and-translate operation. Which is, of course, one of the advantages of a system like this (you can translation common idioms without having to analyze the text itself).
- Re:The vodka is strong but the meat is rotten (Score:3, Funny)
  
  by iastor ( 302585 ) writes:
  
  Let's see what google has to say:
  
  English: The spirit is willing but the flesh is weak.
  
  German: Der Geist ist bereit, aber das Fleisch ist schwach.
  back: The spirit is ready, but the flesh is weak.
  
  French: L'esprit est disposé mais la chair est faible.
  back: The spirit is laid out but the flesh is weak.
  
  Italian: Lo spirito è disposto ma la carne è debole.
  back: The spirit is arranged but the meat is weak person.
  
  Portugese: O espírito é disposto mas a carne é fraca.
  back
- Re:The vodka is strong but the meat is rotten (Score:5, Interesting)
  
  by JJ ( 29711 ) writes: on Monday July 28, 2003 @06:31PM (#6554347) Homepage Journal
  
  This actually is a myth. That particular text and translation was taken as anecdotal in a 1964 report. I did a masters thesis on MT at the University of Chicago and my advisor (once a major figure in MT) refused to approve my thesis until I got that statement correct.
  
  Parent Share
  twitter facebook
Finally, the correct approach (Score:4, Interesting)

by tuxlove ( 316502 ) writes: on Monday July 28, 2003 @11:59AM (#6551328)

I believe that using a statistical approach like this is a step in the right direction. Manually building sets of rules, dictionaries, etc., is a waste of time and hard to do. And manuall-built systems become stale as languages evolve, unless a lot of continuing work is done.

For me the holy grail is when I can converse with a computer meaningfully. I believe a similar approach will be required for the computer to "understand" language, and to be able to formulate a coherent and appropriate response.

Share
twitter facebook
Doc Och? (Score:2)

by securitas ( 411694 ) writes:

Isn't the Doc [mit.edu] supposed to be in the next Spiderman movie?
Am I the only one who thought Star Trek? (Score:2)

by Alkarismi ( 48631 ) writes:

Universal translator anyone?

Er, aging geek embarrasing self again, mutter...
- If you want a universal translator... (Score:4, Interesting)
  
  by flicken ( 182650 ) writes: <flicken-slashdot ... t ['ken' in gap]> on Monday July 28, 2003 @12:20PM (#6551499) Homepage
  
  ...here is a link to the Universal Networking Language [unu.edu] (UNL). UNL is a computer markup language that allows the author of the text to specify how exactly the text should be translated (i.e. what the precise definition of the words in the text are). Taking this specification, a machine is able to produce a readable version of the text in a variety of languages.
  It's not quite done yet, but the system does show promise. Dictionaries have already been created in Spanish, English, German, Japanese, Italian, French and several other languages.
  
  Parent Share
  twitter facebook
Was this article translated? (Score:3, Funny)

by Alton_Brown ( 577453 ) writes: on Monday July 28, 2003 @12:02PM (#6551348)

From the article: his software scored highest among 23 Arabic- and Chinese-to-English translatio systems

Oops - guess we need some more parallel data (or a few more gigs of rosetta stones).

Share
twitter facebook
Damn Babelfish! (Score:5, Funny)

by Zog The Undeniable ( 632031 ) writes: on Monday July 28, 2003 @12:03PM (#6551357)

"Most the bay only of news of the college of southern extremity California it knows an all big contents all there is this emission annular subject, it also there is a RolandPiquepaille and it writes. The Franz taxes where his software height one lyel with lines up between the translation system quite phu the Och and this history are the summary thing their scientist. The Och "it gave the data which is parallel is sufficient in me, it spread out," inside questioning the hour 2 specialties the language which it does not do of the multi Archimedes which is the possibility which there will be a hazard translation system the doctor repulsively it talked. It approach collects the sheep which data is enormous, apply the statistical model in this data a foundation in 2 concepts which it puts. It is complete and the wool of rule lu the dictionary of grammar "the m3ethode of the Och the duplex language original and the Rosetta which agree one equivalent with computer password of noble and wise pebble epitaph adopts. Or, rather, the gigaoctets and pebble gigaoctets of the Rosetta." Detail fact compared to read the hazard my synopsis.
English --> French --> English --> Korean --> English. Of course, it helps that the first sentence is munged anyway ;-)

Share
twitter facebook
Integration (Score:3, Interesting)

by slusich ( 684826 ) writes: <slusich@gm[ ].com ['ail' in gap]> on Monday July 28, 2003 @12:03PM (#6551358)

Sounds like a brilliant idea. Hopefully this is something that could eventually be compacted enough to fit into consumer electronics. It would be great to be able to watch TV from every country without any language barrier!

Share
twitter facebook
- Re:Integration (Score:4, Interesting)
  
  by ahfoo ( 223186 ) writes: on Monday July 28, 2003 @12:39PM (#6551653) Journal
  
  Not to sound arrogant, but I find actually learning another language by watching foreign TV with subtitles in the original language to be even more interesting than watching the dubbed or English subtitled version. It involves commitment to get to the point where you can understand the basics, but there are rewards to making a commitment to learn something new.
  I like the idea of translating sentence by sentence as opposed to grammatically and word for word. I'm sure this guy is right that at some point this will produce reasonably acurate translations in many cases, but multiple languages are one of our greatest treasures.
  I have read that the single most important factor in preventing senile dementia is the difference between those who continue to create novel memories throughout their lives and those who stick to what they have already learned. Learning multiple languages is a wonderful thing and once you get well into it, it is a lot of fun. It certainly increases your options for punning and rhyming and you end up with lots of aliases.
  
  Parent Share
  twitter facebook
Dialects? (Score:3, Interesting)

by dethl ( 626353 ) writes: on Monday July 28, 2003 @12:06PM (#6551381)

How can this system compensate for the different dialects of all of the different languages?

Share
twitter facebook
Well, so? (Score:4, Funny)

by k98sven ( 324383 ) writes: on Monday July 28, 2003 @12:09PM (#6551408) Journal

What is the novelty of this?

It's hardly news that you can always find correlations in two sufficiently large sets of data.

Reminds me of the Steve Martin joke:

"Chicks go for the intellectual types. I figured the best way to impress 'em was to read a lot of books. But hey, do you know how many books there are? Why, there must be, hundreds of them. But I was already a pretty smart guy. I didn't waste my time reading all those books. Heck no.
I read, the dictionary. Hey--I figure it's got all the other books in it."

Share
twitter facebook
Oh, please no... (Score:2)

by Noryungi ( 70322 ) writes:
Another IT masters thinks he can invent a perfect translation system, simply based on 0s and 1s.

I have said it before, on /. and elsewhere, machine translation does not work.

A good translation is based on several non-quantifiable parameters:
1. Context.
2. Grammar.
3. Vocabulary.
4. Nuance.
Example:

"My controller has failed. He is going to be replaced" can mean:
- My HDD controller is dead. I need to replace it, so that my computer can access its hard disks (For the slashdot crowd).
- The financial controller of m
- Re:Oh, please no... (Score:4, Insightful)
  
  by radish ( 98371 ) writes: on Monday July 28, 2003 @12:43PM (#6551682) Homepage
  
  You're right, traditional machine translation is difficult, primarily due to context. However, you're also right that the example you gave is a bad one - in english it only has one meaning (the second one you give). A HDD controller would never have an assigned gender. Of course in German for example, it would (not sure which though - neuter?).
  
  However you're missing what I think is the most important point. If an example is so ambiguous as to confuse an "ideal" machine, it would confuse us too. What you're really saying is "it is possible to write sentences with ambiguous meaning in most languages" - which is of course true. That doesn't however make it impossible to create a machine which is at least as good as a human at translating (and wouldn't that be good enough?). When you read something you interpret it according to a set of learned rules. Obviously there's the basic syntax and vocab, but then you add context like the other clauses in the prose, the identity of the author, the subject matter. We're a long way off getting those concepts into a machine reader, but I would be very hesitant to say we'll never get there.
  
  Besides, the artical is about taking a different approach to the problem - one which should be quite happy with ambiguity. They're looking at essentially pattern matching, so provided your sample data sets include enough info to describe the ambiguity it should have a decent enough chance of working it out.
  
  Parent Share
  twitter facebook
Statistical approach looks promising (Score:5, Insightful)

by TwistedGreen ( 80055 ) writes: on Monday July 28, 2003 @12:11PM (#6551423)

"One of the great advantages of the statistical approach," Och explained, "is that most of the work goes into components that are language-independent. As long as you give me enough parallel data to train the system on, you can have a new system in a matter of days, if not hours."

This statistical method is probably the best approach to computerized translation. It seems to approximate how the human mind will translate a give sentence most efficiently. Language can get awfully complex, and individual words often have, at best, an ambiguous meaning when interpreted alone. One must take into account the context of that word to specify and refine its meaning. This obviously leads to a huge number of permutations to represent a huge variety of thoughts, but the relative size of this number is diminishing as computers become more powerful.

Therefore, instead of playing with messy grammars and sentence structures, we can simply have a catalogue of thoughts as represented by words, and correlate that catalogue with a different set of words to facilitate translation. This software would operate on a deeper level than it would if it operated with the words and symbols themselves. It would utilize a map of the deep structures of language, instead of a map of the less-meaningful words and grammars.

I really like this method, and while it may seem like a brute-force hack applied to translation, the simple fact that languages do not contain elegant patterns must be accepted. It also appears to be a most efficient method, as the simple comparisons involved would bring the speed of translation into realtime.

Share
twitter facebook
- Actually, it operates on a *shallower* level... (Score:4, Informative)
  
  by Jerf ( 17166 ) writes: on Monday July 28, 2003 @04:53PM (#6553647) Journal
  
  This software would operate on a deeper level than it would if it operated with the words and symbols themselves. It would utilize a map of the deep structures of language, instead of a map of the less-meaningful words and grammars.
  
  Actually, as a result, it operates on a shallower level. In fact, it's almost like you wrote this comment for an article in a parallel universe where statistical translation was the norm, and somebody was just now proposing symbolic translation, so much so that it's almost spooky.
  
  This translation technique is so shallow it doesn't even particularly care what languages it works with. In a way, it can't really be said to be "translating" in the traditional sense; it's just correlating phrases with no clue what they are.
  
  Traditional symbolic translation is better described by what you said:
  
  Therefore, instead of playing with messy grammars and sentence structures, we can simply have a catalogue of thoughts as represented by words, and correlate that catalogue with a different set of words to facilitate translation.
  
  Word(/phrase) -> symbol -> word(/phrase) is traditional tranlation. This is word -> word translation.
  
  It's working better because we've had little or no success creating the middle part of the symbolic translation; matching the symbology used in our head has proven impossible to date. This works better by skipping that step, which introduces horrible distortions by forcing the words to fit into an incredibly poor symbology (compared to what we're actually using).
  
  However, in theory, traditional translation should still have a brighter future; this is a hack around our ignorance, perhaps even a good one, but eventually we will want to extract the symbols.
  
  (Incidentally, it's also why this same technique can't be used to match words -> symbols; we don't know how to represent the symbols yet! This kind of technique could eventually potentially be hybridized with something else to attack that problem, but simple, direct application can't result in the complicated relationships between symbols that exist, and we'd want a computer to "understand" those relations before we'd say it was truly translating or understanding English.)
  
  Anyways, just flip your comments around 180 degrees and you're pretty close.
  
  Parent Share
  twitter facebook
translatio? (Score:2, Funny)

by Lady Jazzica ( 689768 ) writes:

University of Southern California computer scientist Franz Josef Och echoed one of the most famous boasts in the history of engineering after his software scored highest among 23 Arabic- and Chinese-to-English translatio systems, commercial and experimental, tested in in recently concluded Department of Commerce trials.

Maybe what Dr. Och should do next is write some software to double-check the work of whoever translates his press releases from the original Latin. The translator seems to have missed a fe
Copyright issues (Score:2, Insightful)

by PhilHibbs ( 4537 ) writes:

I wonder if the resultant translation engine could be considered a derivative work of the texts that populated it. This system is standing on the shoulders of all the translation efforts that went in to it. I think it's a great idea, but in the current IP climate, could well be shot down in flames. How much dual-language text is available in the PD or on open content licence?
A poor analogy, and a poor method (Score:3, Informative)

by jd ( 1658 ) writes: <imipak@ y a hoo.com> on Monday July 28, 2003 @12:15PM (#6551447) Homepage Journal

The Rosetta stone encoded three languages, not two, where two were known in advance. Indeed, there have been many three-way translations of treaties found, now.

The use of three languages is critical. Grammar isn't consistant, and words have multiple meanings. By using two known languages, you can eliminate many of the errors thus introduced, because the chances of some error fitting both known languages in the same way is much smaller.

If you double the number of known languages, you more than quarter the number of errors, because although errors can occur in either or both, they're unlikely to be the same error. Once more information exists, you can re-scan the same text and fill in the blanks.

Me, personally - I'd require four languages, three of which were known. The number of texts required would be considerably smaller and the number of residual errors would be practically non-existant.

They chose two languages for the obvious reason: It's simple. It's easy to find a student who knows two languages. At least, easier than finding one who knows four.

However, the price of simplicity is bad science. The volume of information they require makes their system little better than an infinite number of very smart monkeys with text editors and a grep function. That they're being paid signficant money on such stuff is a joke.

If they offered me the same money (and one of those Linux NetworX clusters) I could have a superior system in a month, although (as stated above) it would require more than one known language.

Share
twitter facebook
- Not to mention.. (Score:4, Interesting)
  
  by k98sven ( 324383 ) writes: on Monday July 28, 2003 @12:47PM (#6551727) Journal
  
  The Rosetta stone itself did not do much in the way of our knowledge of the egyptian language.
  What it did do, was provide insight into their method of writing.
  It was the latter discovery of the the relation between Coptic and Egyptian that revealed most of the actual language.
  
  (IIRC)
  
  Parent Share
  twitter facebook
  - Re:Not to mention.. (Score:4, Interesting)
    
    by LenE ( 29922 ) writes: on Monday July 28, 2003 @02:44PM (#6552765) Homepage
    
    For those who don't know, Coptic is Egyptian written in Greek, or at least the Greek alphabet. It would be similar to transcribing a language that uses glyphs for words by recording them with the phonemes and alphabet of another language.
    A more modern example is what happened with the slavic Croatian language. The original speakers had a glyph based alphabet called Glagolitic [www.hr], through the middle ages. This would be as foreign as Egyptian hieroglyphs to people today, and could stand in nicely for an alien text in any sci-fi movie.
    Through falling under different feudal states (Venice, Austro-Hungary) the language was cast under both the Cyrillic and Roman alphabets. Today Croatian uses an accented Roman alphabet (like French), but each letter has only one pronunciation, like Russian.
    -- Len
    
    Parent Share
    twitter facebook
- Re:A poor analogy, and a poor method (Score:5, Insightful)
  
  by Abcd1234 ( 188840 ) writes: on Monday July 28, 2003 @12:51PM (#6551764) Homepage
  
  If they offered me the same money (and one of those Linux NetworX clusters) I could have a superior system in a month, although (as stated above) it would require more than one known language.
  
  LOL! If this problem was so friggin' easy, why are these researchers the first to demonstrate a working system using this technique (which blows away all existing systems, BTW)? Hell, if it's as easy as you say, this whole "translating text" thing must be a breeze. I wonder why so much money is spent every year on R&D in this area? Hell, why didn't they just hire you to whip up a system in a month?
  
  Why? Because it ain't that easy and you have no idea what you're talking about. Given these are world-class researchers, I'm sure they've considered the multiple-translation route, and subsequently rejected it for very good reasons (likely far more complex than your simplistic "it's easier" excuse). Moreover, the really hard work in this area is the statistical modelling necessary to generate a working system, something which would, I suspect, be far more complex if a multiple-translation route were taken. But, hey, that's just some number crunching, right? What's so hard about that?
  
  Parent Share
  twitter facebook
- Re:A poor analogy, and a poor method (Score:5, Insightful)
  
  by William Tanksley ( 1752 ) writes: on Monday July 28, 2003 @12:59PM (#6551864)
  
  If you double the number of known languages, you more than quarter the number of errors
  
  Your post is reasonable and interesting (using three-way parallelism would give better translations), but you're missing something important here.
  
  First, none of these languages are "known" to this interpreter program. The program reads parallel texts, and when you feed it a text without a parallel, it generates the parallel for you. In other words, it can translate either way. So you don't have two known languages and one unknown; all you have is three text corpuses. (Well, in this case you have two, but you know what I mean.)
  
  Second, yes; three would be FAR better than two; but two is also useful, and in more situations. You don't always have a Rosetta stone.
  
  They're doing well here. Yes, there's an obvious next step to take; but no, the existance of a "next step" doesn't destroy the usefulness of this step.
  
  -Billy
  
  Parent Share
  twitter facebook
- Re:A poor analogy, and a poor method (Score:3, Insightful)
  
  by Draxinusom ( 82930 ) writes:
  
  RTFA. The method described in the article is a purely statistical method, NOT a semantic one; it has zero "knowledge" of grammar, syntax, or meaning. So having more than one "known" language to start with would not help in the slightest, because the advantages that you describe are only applicable to semantic methods.
  
  I agree though that the analogy to the Rosetta Stone is a poor one.
Ranking System (Score:3, Interesting)

by freeze128 ( 544774 ) writes: on Monday July 28, 2003 @12:18PM (#6551480)

Even existing translation programs could benefit from a ranking system. Wouldn't it be helpful if you could tell just how confident the translator is about a certain phrase or word? That way, you could rephrase your sentence before you foolishly ask someone to "taste" you....

Share
twitter facebook
ignoring grammar seems strange (Score:3, Insightful)

by meshko ( 413657 ) writes: on Monday July 28, 2003 @12:20PM (#6551500) Homepage

I understand that this is a cool idea for building automatic translators, but is it practical? Basically what they are doing is taking a well-researched domain of languages and trying to make something new and cool in it by completely ignoring the domain knowledge. My intuition tells me that "always use as much domain knowledge as posssible" is an engineering axiom.

Share
twitter facebook
Several Missing Details (Score:5, Interesting)

by Flwyd ( 607088 ) writes: on Monday July 28, 2003 @12:21PM (#6551512) Homepage

As press releases tend to do, this leaves much to be desired for folks who are familiar with the discipline. As I read it, it seems to imply that the main driver is phrase-matching. What does it do with phrases it hasn't seen before? The problem is solved by throwing lots of data at it -- how much data is needed for a reasonable system? How well does it generalize to text outside the domains of the training data?

Incidentally, had my brother been a girl, he was in serious danger of being named Rosetta Stone.

-- Trevor Stone, aka Flwyd

Share
twitter facebook
Wordrank (Score:2, Interesting)

by chronos2266 ( 514349 ) writes:

I always thought it would be interesting if google applied its page rank algorithm to provide a translation service. Like poll the top 5 translation service sites for a translated sentence and then based on what each of them return, generate a 'average' or best possible result for that sentence.
Translate Pascal To C and Such (Score:4, Interesting)

by Potpatriot ( 684373 ) writes: on Monday July 28, 2003 @12:28PM (#6551564)

How about piping in various algorirhtms encoded in Pascal and C into the thing and seeing what it does to convert arbitrary sources. Where Can I get the soource? Pawel

Share
twitter facebook
What about C++? (Score:5, Funny)

by MobyDisk ( 75490 ) writes: on Monday July 28, 2003 @12:30PM (#6551579) Homepage

So, can I train this program with a bunch of requirements documents, and a bunch of implementations, and have it learn how to code? :-) If so, I think I am obsolete. *poof*

Share
twitter facebook
Programming Languages? (Score:5, Interesting)

by The Raven ( 30575 ) * writes: on Monday July 28, 2003 @12:40PM (#6551659) Homepage

I wonder how this would fare putting two computer languages side by side? I mean... take a few thousand programs, coded using the same algorithms but different computer languages... would his language translation software translate between them? Would it be able to differentiate between languages that manually allocate memory and those that use garbage collection? How about between procedural langauages like C, and more esoteric and oddly structured languages like LISP?

An interesting challenge, eh?

Would there be any benefit to this?

Share
twitter facebook
Scientific Papers (Score:4, Informative)

by acoustiq ( 543261 ) writes: <acoustiq@NoSPAm.softhome.net> on Monday July 28, 2003 @12:50PM (#6551751) Homepage
Being an undergrad hoping to do research in this area in the next few years, I've already read a few of Och's papers and others in the field. Some of the best that I remember are:
- Improved Statistical Alignment Models (2000) - Franz Josef Och, Hermann Ney [nec.com], which investigates and compares several models
- A Syntax-based Statistical Translation Model - Yamada, Knight (2001) [nec.com], which tries to treat sentences structurally instead of just a stream of words
- A Finite-State Approach to Machine Translation - Bangalore, Riccardi (2001) [nec.com], which uses a different way of looking at the problem than usual
Kevin Knight prepared an excellent (if now somewhat outdated) introduction to statistical machine translation that you can see in HTML [jhu.edu] or RTF [jhu.edu] (the formatting was corrupted when the RTF was converted to HTML - I recommend the RTF).
Share
twitter facebook
statistics is the key (Score:5, Interesting)

by gemseele ( 172754 ) writes: on Monday July 28, 2003 @12:53PM (#6551786)

Time for inflamatory reasoning. The statistical approach will beat out the grammar and rule based ones, at least for English, is for the simple reason:

English is not a language

Or rather, it resembles one but is more not than is, IMO. It is a large collection of idiomatic expressions that changes quite rapidly (and not only in colloquial forms, just look at what the political-correctness movement has done to phraseology). You know the story... more exceptions than rules, things that are legitimate to say language-wise are considered incorrect anyways, and vice versa, etc. etc.

That's not to say it doesn't have advantages; it's relatively easy to learn the basics of communication since it's weakly conjugated, has genderless articles, fairly simple uncased sentence structure. But, it is monstrous to master and I suspect most native speakers aren't true masters (not to mention the orthographical nightmare; is English the only language with spelling bee contests?)

The reason it's the new lingua franca (or should it be lingua angla now?) is techno-socio-political as is always the case. Stop harping on Americans for being largely mono-lingual. "Why didn't the Romans learn the local languages when they controlled Europe? Because they didn't have to." If every state spoke a different language, which would be more akin to Europe, then there would be need.

Share
twitter facebook
- Re:statistics is the key (Score:3, Interesting)
  
  by The Cydonian ( 603441 ) writes:
  
  English is not a language... [because it]... is a large collection of idiomatic expressions that changes quite rapidly
  
  Fair enough, English changes rapidly alright, but how would you define a language? A set of logical syntactic and semantic rules that haven't changed for the past few thousand years? I can think of only two languages like that, Latin and Sanskrit.
  Nope, I can't agree with your assertion; language is much more than mere (unchanging) grammar. In many multi-cultural places, it is a strong
- Re:statistics is the key (Score:5, Insightful)
  
  by Jeremi ( 14640 ) writes: on Monday July 28, 2003 @03:20PM (#6553055) Homepage
  
  English is not a language. Or rather, it resembles one but is more not than is, IMO. It is a large collection of idiomatic expressions that changes quite rapidly
  
  You are actually arguing that English is not a dead language. Every language that is actually in use by large numbers of people is as you describe.
  
  Parent Share
  twitter facebook
How's that news? (Score:3, Interesting)

by Yurka ( 468420 ) writes: on Monday July 28, 2003 @02:14PM (#6552532) Homepage

This has already been done some years ago in Canada, where the translation system was fed the complete text of parliamentary debates for umpteen years (required by law to be translated by humans into French, if originally in English, and vice versa). I don't know how it fares when presented with a sample of parliament-speak (I concede, this is not a fair approximation of human language), but it fails miserably on a simple rhyme. Read your Hofstadter, guys.

Share
twitter facebook
Similar to natural learning? (Score:3, Interesting)

by Bodrius ( 191265 ) writes: on Monday July 28, 2003 @04:49PM (#6553617) Homepage

Interesting method.

It seems to me this is more similar to natural learning of a language (usually at a young age) by exposure and immersion, as opposed to scholar learning of a language in classrooms, etcetera.

It shouldn't be surprising that in humans, the first method also works best at acquiring fluency in multiple languages. As a matter of fact, it's the only method through which we come to understand our FIRST language, which is in almost every case the one we command the best.

I think most people get, by consuming huge amounts of information, a feeling of "what sounds right" and "what sounds wrong" that is more effective for them at predicting the unwritten rules and exceptions, both in translations and in original sentence-creation, than memorizing a set of grammar rules which, in the end, are just codifications of the current state of the language.

I don't think the success of the approach means the symbolic methods are pointless for this endeavor, any more than the formal study of languages and their grammars is for human translators.

Professional writers and translators do study such rules to dramatically improve their command of the different languages, and do get much better results.

But it seems to me they are more successful going from "statistical matching with massive real-use data" to "optimized grammar rules matching the data" than going backward, from "scholastic grammar rules" to "consumption of massive data to acquire exceptions, and correct and complement the rules".

What would be interesting, I think, is if one can study the state of the system after it's performing well and extract/deduct grammar rules, algorithmically.

It would be interesting to see the results of a program doing that, collecting (and correcting) the grammar using the data, and using the grammar rules when no match in the dictionaries is found to, say, apply a greater weight to the gramatically-correct choice among the alternatives.

If the results were good with this approach, one could consider decreasing the size of the database as the grammar gains stability. Use that memory for other processes, other languages, or new sample data that could not be examined before.

Share
twitter facebook
- Re:oh oh... (Score:4, Interesting)
  
  by Anonymous Coward writes: on Monday July 28, 2003 @11:53AM (#6551264)
  
  This is exactly NOT a universal translator as it uses matched bilingual texts. You need an already translated text for his system to work.
  
  Parent Share
  twitter facebook
  - Re:oh oh... (Score:2)
    
    by Abcd1234 ( 188840 ) writes:
    
    You need an already translated text for his system to work.
    
    Well, you need a pool of already matched texts. Once you have this for a given language pair, you can immediately start translating (presumably). So, it is "universal" in that, once the system is primed, it can immediately begin translating (ie, you don't have to build grammar rules, dictionaries, etc, etc).
  - Re:oh oh... (Score:2)
    
    by Have Blue ( 616 ) writes:
    
    Sure it is... It's not a magic psychic UT like the one on Star Trek that allows you to instantly converse with an unknown species, but it's "universal" in that it has no internal dependencies on specific languages and could be used to go between any two.
  - - Re:oh oh... (Score:2)
      
      by wwest4 ( 183559 ) writes:
      
      but i think the training only applies to a system for translating between the languages of the datasets used.
      
      so "training" it using parallel texts of japanese and english would produce routines for translating between japanese and english, but not french and english.
- Re:Obsolete? (Score:5, Insightful)
  
  by Surak ( 18578 ) * writes: <surak&mailblocks,com> on Monday July 28, 2003 @11:57AM (#6551304) Homepage Journal
  
  'Almost everyone'? What *are* you talking about? You must be an American. From a recent online Harris poll [harrisinteractive.com], most Americans think at least half the world speaks English. This is just plain wrong. The truth of the matter is that it's more like 20%. That's it. Most people on the NET might speak English, but most people in the world? Hardly.
  
  Parent Share
  twitter facebook
  - Re:Obsolete? (Score:5, Funny)
    
    by DG ( 989 ) writes: on Monday July 28, 2003 @12:02PM (#6551350) Homepage Journal
    
    A man who speaks three languages is trilingual.
    
    A man who speaks two languages is bilingual.
    
    A man who speaks one language is American.
    
    DG
    
    Parent Share
    twitter facebook
    - Re:Obsolete? (Score:2)
      
      by Surak ( 18578 ) * writes:
      
      Heh. A friend of mine from India who worked for GM was fond of that joke. :)
    - Re:Obsolete? (Score:2, Funny)
      
      by notcreative ( 623238 ) writes:
      
      A man who speaks no known language is Dubya.
      I don't think this translation program would be able to deal with his Texan affectations.
    - Re:Obsolete? (Score:2)
      
      by panda ( 10044 ) writes:
      
      I'm American, yet I speak four languages including my native language.
      
      What does that make me a quadrilateral? :-)
    - - Re:Or a "culturally superior" American. (Score:3, Insightful)
        
        by raehl ( 609729 ) * writes:
        
        For starters, we specifically target young people when asking questions where a non-native language will be required. 3-4 of the people were employees, indicating at least a passing knowlege of "What track is this train on?" in a few European languages might be a job-relevant talent. Additionally, the sneer. Attitude is attitude regardless of what country you're in.
        
        We don't expect people to know foreign languages. We *DO* find it amusing when people who are razzing *US* for not knowing THEIR language d
  - Re:Obsolete? (Score:3, Interesting)
    
    by lildogie ( 54998 ) writes:
    
    > Americans think at least half the world speaks English.
    
    Better-informed Americans (a small miniority of the class) would be aware that Spanish is well on the way to becoming the predominant language in the USA.
    
    But, IMHO, English could become the next Latin: the dead language that everybody has to learn if they're going to try and influence the world.
    
    BTW, every "% of humanity" statistic has to consider that most humans are Chinese.
    - Re:Obsolete? (Score:2)
      
      by Verteiron ( 224042 ) * writes:
      
      BTW, every "% of humanity" statistic has to consider that most humans are Chinese.
      
      I've been thinking about this a lot lately. As in, "Should I be learning Mandarin Chinese?" China is rapidly becoming a high-tech nation. There are a lot more of them than there are of anyone else. Frankly, I think the only thing that keeps China from being the single largest world influence is the fact is its government, and that can't last forever. Sooner or later (maybe after a bloody revolution) China is going to become
    - Re:Obsolete? (Score:2)
      
      by cdrudge ( 68377 ) writes:
      
      Right...215.4 million Americans speak only English. 28.1 million Americans speak some form of Spanish (with or without also speaking English). I wouldn't exactly say 13% is "well on the way". English definitely isn't going to be the next Latin. The majority of science, technology, aviation, and computers is English based. It isn't going anywhere soon.
    - Re:Obsolete? (Score:3, Insightful)
      
      by JohnsonJohnson ( 524590 ) writes:
      
      BTW, every "% of humanity" statistic has to consider that most humans are Chinese.
      
      If you want to be even remotely close to statistically significant you have to include citizens of India as well most of whom are very different from those of Chinese descent. . In fact most people will probably be an Indian citizen within the next 20 years [iiasa.ac.at]. However citizens of India are a more heterogeneous population than that of China. Then again, Chinese of the diaspora (eg. in Malaysia, Indonesia, the Philipines,
  - Re:Obsolete? (Score:2)
    
    by kmac06 ( 608921 ) writes:
    
    I think the poll is misleading. More accurate would be:
    Most Americans think at least half the world that matters speaks English.
    American attitude is if you don't speak English, you don't matter :)
- Re:Obsolete? (Score:2, Insightful)
  
  by Anonymous Coward writes:
  
  English may be the closest thing we have to a universally-spoken language, but it certainly isn't going to become the -only- language any time soon, if ever. If all other languages disappeared, though, we would definitely need translation for all the literature we have that isn't written in English.
- Re:Obsolete? (Score:3, Informative)
  
  by ShadeARG ( 306487 ) writes:
  
  Here is Japanese Slashdot [slashdot.jp], and I'm sure there are others.
  - Re:Obsolete? (Score:4, Informative)
    
    by red_dragon ( 1761 ) writes: on Monday July 28, 2003 @12:25PM (#6551530) Homepage
    
    Spanish Slashdot: Barrapunto. [barrapunto.com] It's been around for almost as long as Slashdot itself.
    
    Parent Share
    twitter facebook
- Re:Obsolete? (Score:2, Flamebait)
  
  by timftbf ( 48204 ) writes:
  
  If email, IRC/"chat rooms" *spit* and SMS are anything to go by, a great number of young and not-so-young people who *do* have English as a first language are barely capable of forming even simple sentences in it correctly.
  
  Regards,
  Tim. (Grumpy old man day)
  - unlike you (Score:2)
    
    by DrSkwid ( 118965 ) writes:
    
    English : correctly forming sentences in it I can.
- Re:Obsolete? (Score:2, Funny)
  
  by Anonymous Coward writes:
  
  DARPA actually proposed that a forced conversion to English policy would be more cost effective for the defense department to implement through military invasion than some complicated translation scheme. Hence congress's support for the translation project.
- Old Texts (Score:5, Insightful)
  
  by holygoat ( 564732 ) writes: on Monday July 28, 2003 @12:04PM (#6551363)
  
  Firstly we could consider the enormous body of work currently available in other languages.
  Having this able to be translated into English or other languages could be very valuable for scholars.
  
  Secondly, English is not the primary tongue for the majority of people on the planet - to suggest that because a lot of people can manage to converse in it that the ability to translate between other languages isn't valuable is foolish.
  
  Also note that the article specifically mentions Arabic and Chinese, which I don't think crossed your mind. China has the largest population on the planet, remember.
  
  Translation is far from obsolete, especially given that the majority of the Western world, and especially America, is piss poor at being bilingual.
  
  Parent Share
  twitter facebook
  - Re:Old Texts (Score:3, Insightful)
    
    by OmniVector ( 569062 ) writes:
    
    A friend of mine, Hani, who is from Egypt told me a joke once.
    "What do you call a person that only speaks one language?" A: An american
    
    It's quite true when you think about it. He said in when he was growing up he had a choice between going to a french school or an english school where the given language was tought just as much as arabic. Americans really need to be tought french or spanish at a MUCH younger age (say 5 right as they start kindergarden).
- Re:Obsolete? (Score:2)
  
  by jpkunst ( 612360 ) writes:
  
  Am I the only one who thinks that translation is quickly becoming obsolete?
  
  Almost everyone can speak, read and write at least tolerable english and most young people can have full fledged discussions in it.
  That isn't much help if you want to read (say) De Uitvreter by Nescio [nescio.info] and you don't know Dutch, does it? Or for a slightly more geeky angle, if you want to read Edsger Dijkstra's Dutch texts [utexas.edu]?
  JP
- Re:Obsolete? (Score:3, Insightful)
  
  by GoofyBoy ( 44399 ) writes:
  
  http://www.britishcouncil.org/english/engfaqs.htm# howmany [britishcouncil.org]
  
  Translators are needed for 3/4ths of the world. Not what I would call close to obsolete.
- You don't get it, do you? (Score:2, Interesting)
  
  by mossr ( 72445 ) writes:
  
  ***WHAT THE FUCK ARE YOU THINKING?***
  
  Look, seriously, even if everyone did speak English, there are still tonnes of literary works in other languages - the original texts of the Ancient Greek classics, for example. To read in the original language is often a much more rewarding experience. Besiders, relying on past translations of non-english material can lead to errors. And consider how many different English translations of the Bible there are.
  
  Almost everyone can speak, read and write at least tolerabl
  - Re:You don't get it, do you? (Score:2)
    
    by Shads ( 4567 ) writes:
    
    > Young people are increasingly using SMS
    > and online chat and are actually losing
    > their ability to correctly spell words
    > or write grammatically correct sentences.
    
    This is called language evolution. It's always frowned on and it always happens in the end d00d. There are some factors slowing it down right now, most specifically a lack of teritorial conquest with assimilation of populace.
- - Re:Obsolete? (Score:2)
    
    by Shads ( 4567 ) writes:
    
    True but you could argue that even the english and americans can't speak it well.
- Re:Could help (Score:5, Interesting)
  
  by Abcd1234 ( 188840 ) writes: on Monday July 28, 2003 @12:14PM (#6551438) Homepage
  
  I'm not sure this is really applicable to translating literary works. These kinds of translations require an understanding of the native culture of both the source and target languages, as well as the intent of the writer, in order to generate an understandable translation that the target group can appreciate. A computer translation system like this one is incapable of performing these sorts of analysis.
  
  What this is really good for is on-the-fly translation of material where the reader simply wants to comprehend what was written (think the old babelfish engine). This has obvious applications on the web, as well as many other areas (on-the-fly server-side translation for IM systems, etc, etc).
  
  Parent Share
  twitter facebook
  - Re:Could help (Score:2)
    
    by gerf ( 532474 ) writes:
    
    Hey, i never said it would make a good translation. Just that it could be used.
- Re:A bit of a worry for privacy (Score:5, Insightful)
  
  by bigjocker ( 113512 ) * writes: on Monday July 28, 2003 @12:17PM (#6551472) Homepage
  
  This is a bit of a worry for privacy concerns, given that if I want to keep something secret from the world and private just between me and my intended recipient I have one less option.
  
  If you are using foreign languages or even lexically analyzable scemes to do your encription, you deserve what you get
  
  Parent Share
  twitter facebook
- Re:A bit of a worry for privacy (Score:3, Insightful)
  
  by nanojath ( 265940 ) writes:
  
  It's time for us all to get over the fact that technology is going to end practical privacy. It's a done deal. Cameras and microphones will get smaller and smaller. Translation, electronic selectivity (i.e. snoop anybody transferring bombmaking directions) and tapping of all forms of electronic conversation will get more and more sophisticated. I've no doubt the NSA made PGP its bitch a long time ago. IF they hadn't it would be getting fought a lot harder. Assuming you can get real privacy from someth
- Re:I expect they used many Bible versions (Score:4, Insightful)
  
  by ejdmoo ( 193585 ) writes: on Monday July 28, 2003 @12:28PM (#6551554)
  
  Actually, I think that this may be an interesting way to translate the Bible (assuming you didn't use the Bible itself as a reference...that would skew the translation).
  
  Think about it: every translation of the Bible is always criticized for some reason. If the Bible were translated this way it could be like the Google news of Bible translations: completely independent of human bias and editing.
  
  Parent Share
  twitter facebook
- Re:Where can I download his software? (Score:3, Informative)
  
  by Anonymous Coward writes:
  
  Franz Josef Och homepage is at:
  
  http://www.isi.edu/~och/
  
  There are links to 3 software packages for download.
- - How dare you ask (Score:4, Funny)
    
    by Anonymous Coward writes: on Monday July 28, 2003 @12:57PM (#6551832)
    
    But Can It Do Klingon?
    
    How dare you question the honor of this program! I should kill you where you stand!
    
    Parent Share
    twitter facebook

There may be more comments in this discussion. Without JavaScript enabled, you might want to turn on Classic Discussion System in your preferences instead.

Article text (Score:4, Informative)

Article text (in Babel-German-back-to-English) (Score:5, Funny)

Let me know (Score:5, Funny)

Re:Let me know (Score:3, Funny)

Great summary (Score:3, Insightful)

DARPA (Score:2, Insightful)

Re:DARPA (Score:5, Insightful)

Re:DARPA (Score:5, Insightful)

Re:DARPA (Score:2, Interesting)

Re:DARPA (Score:2)

Imagine a beowulf cluster of these... (Score:2, Funny)

Oh god... (Score:4, Funny)

Re:Oh god... (Score:4, Funny)

Re:Oh god... (Score:5, Funny)

Re:Oh god... (Score:5, Funny)

The Law of Eventuality (Score:3, Insightful)

Re:The Law of Eventuality (Score:3, Insightful)

Re:The Law of Eventuality (Score:3, Funny)

Re:The Law of Eventuality (Score:2)

Give me enough Slashdot antries... (Score:5, Funny)

The vodka is strong but the meat is rotten (Score:5, Interesting)

Re:The vodka is strong but the meat is rotten (Score:2)

Re:The vodka is strong but the meat is rotten (Score:5, Insightful)

Re:The vodka is strong but the meat is rotten (Score:3, Insightful)

Re:The vodka is strong but the meat is rotten (Score:4, Insightful)

A plan for translation? (Score:4, Insightful)

"The vodka is strong, but the meat is rotten" (Score:5, Funny)

Re:The vodka is strong but the meat is rotten (Score:3, Insightful)

Re:The vodka is strong but the meat is rotten (Score:3, Funny)

Re:The vodka is strong but the meat is rotten (Score:5, Interesting)

Finally, the correct approach (Score:4, Interesting)

Doc Och? (Score:2)

Am I the only one who thought Star Trek? (Score:2)

If you want a universal translator... (Score:4, Interesting)

Was this article translated? (Score:3, Funny)

Damn Babelfish! (Score:5, Funny)

Integration (Score:3, Interesting)

Re:Integration (Score:4, Interesting)

Dialects? (Score:3, Interesting)

Well, so? (Score:4, Funny)

Oh, please no... (Score:2)

Re:Oh, please no... (Score:4, Insightful)

Statistical approach looks promising (Score:5, Insightful)

Actually, it operates on a *shallower* level... (Score:4, Informative)

translatio? (Score:2, Funny)

Copyright issues (Score:2, Insightful)

A poor analogy, and a poor method (Score:3, Informative)

Not to mention.. (Score:4, Interesting)

Re:Not to mention.. (Score:4, Interesting)

Re:A poor analogy, and a poor method (Score:5, Insightful)

Re:A poor analogy, and a poor method (Score:5, Insightful)

Re:A poor analogy, and a poor method (Score:3, Insightful)

Ranking System (Score:3, Interesting)

ignoring grammar seems strange (Score:3, Insightful)

Several Missing Details (Score:5, Interesting)

Wordrank (Score:2, Interesting)

Translate Pascal To C and Such (Score:4, Interesting)

What about C++? (Score:5, Funny)

Programming Languages? (Score:5, Interesting)

Scientific Papers (Score:4, Informative)

statistics is the key (Score:5, Interesting)

Re:statistics is the key (Score:3, Interesting)

Re:statistics is the key (Score:5, Insightful)

How's that news? (Score:3, Interesting)

Similar to natural learning? (Score:3, Interesting)

Re:oh oh... (Score:4, Interesting)

Re:oh oh... (Score:2)

Re:oh oh... (Score:2)

Re:oh oh... (Score:2)

Re:Obsolete? (Score:5, Insightful)

Re:Obsolete? (Score:5, Funny)

Re:Obsolete? (Score:2)

Re:Obsolete? (Score:2, Funny)

Re:Obsolete? (Score:2)

Re:Or a "culturally superior" American. (Score:3, Insightful)

Re:Obsolete? (Score:3, Interesting)

Re:Obsolete? (Score:2)

Re:Obsolete? (Score:2)

Re:Obsolete? (Score:3, Insightful)

Re:Obsolete? (Score:2)

Actually, it operates on a shallower level... (Score:4, Informative)