Open-Source Language Translator Opens For Beta
mind21_98 writes "A new machine translator has officially opened for public testing. GPLTrans is a translator similar to Babelfish. Pre-alpha testing has shown that it is the most accurate of the major Web-based machine translators. More information can be found here. "
/.'ed? (Score:1)
Cool! (Score:2)
And maybe we'll be able to add on some custom vocabulary, that would be really nice for computer journals (or chemistry, medicine, whatever...)
...at least the article wasn't in German, or something.
---
pb Reply or e-mail rather than vaguely moderate [152.7.41.11].
Premier Stick in ground... (Score:1)
Re:/.'ed? (Score:1)
I can ping it, but port 80 is pretty non-responsive...
Swear Words (Score:2)
Forward Progress (Score:1)
"I know what you say, but I don't know what you say. You funny American!"
How do you say.. (Score:2)
I was just thinking about this.... (Score:1)
skript kiddie (Score:1)
Machine translators (Score:3)
I need to try this at work (Score:3)
Perhaps I can use it to translate my words to the customer, so when I say "OK, click on My Computer" they don't hear "restart the computer and click on the first icon you see while hitting the esc key and pulling on the power cord".
AI&Babelfish (Score:3)
Re:skript kiddie (Score:1)
IRC? Never! (Score:1)
Sorry, but I don't believe it's possible, even if a perfect translator for normal speech existed.
Make it a standard desktop component! (Score:4)
It's the Stamp Collector syndrome (Score:5)
Those pundits are wrong: there is no genre of software that the open-source model will never absorb, simply because the open-source model results in better software, for reasons that are well known. And no, there is no software application so uninteresting that no volunteer anywhere in the world will touch it. On the contrary: the more an application area remains untouched, the more interesting it becomes to open-source programmers, simply because it's virgin territory.
This is the "stamp collector" syndrome: when you already have a goodly number of stamps in your collection, adding the missing ones becomes an obsession.
Re:Premier Stick in ground... (Score:1)
Re:IRC? Never! (Score:1)
User submissions (Score:2)
Will users be able to add, update, or correct translations or modify dictionaries, à la the APT bot in #debian on irc.openprojects.net?
It seems to me the growth would be incredible if users could modify the dictionary (or at least add suggestions that could later be added by someone with the appropriate power).
Re:I was just thinking about this.... (Score:2)
/mylang
Sets your default language (put in your startup)
/de,
Translates your typing into the language of choice and funnels it to the current dialog. With all of the translation commands, if $mylang is set to a non-English language, translation to English is done before translating to another language, due to Babelfish.
/mde,
Sends a message to in the specified language
/flag
Sets autotranslation of a person or a channel. This was really the coolest command. If some Spanish-speaking person came in, you can just
/trans
Self-explanatory. Output for your eyes only.
Additionally, there were some new functions that people could use to implement their own fun foreign language commands..
I have heard there is a babelfish library out there that provides a standard way to interface a program with babelfish. Plus, only one thing has to be updated for all of your babelfish-ized programs to work. With a client like BitchX this would be very easy to simply load and use!
This GPLTrans thing sounds very exciting and I'd very much like to see about building a new (better) IRC script on top of it!
~GoRK
PS. Since the site is slashdotted, could anyone who knows please tell us a little more about it? Can we do the translation on our own hardware or is it central-server based? Can it directly translate between languages where English is neither the source nor the target? Does it provide a standard (e.g.
I would very much like some day to see all of my basic network communications apps (mail client, newsreader, web browser, instant messenger, IRC client, etc.) have the ability to machine-translate both incoming and outgoing stuff. Everyone seems to be so bent on how "good" the translation is. If a machine can translate something so that I have a basic grasp of what is going on, then the translation has been a monumental success! I would like to see machine translation people focus on getting the technology more widespread before they go trying to make their software translate everything perfectly!!
~GoRK (again)
Re:Machine translators (Score:1)
The problem is that we're all expecting a "Universal Translator" à la Star Trek.
Re:Premier Stick in ground... (Score:2)
"first wave of the pallet of the beginning"
Distributed AI...? (Score:2)
For that matter, you could even have the users refine the system's grammar.
How hard would that be to implement? Is it totally far-fetched?
Better Context Analysis (Score:5)
What most of these language translation programs need is a better understanding of context. I was surprised to find that Altavista's Babelfish utility has very poor analysis of context (possibly none at all). For example, when translating from English to French, "run" always translates to "exécute". For a sentence like you get which is reasonable, but if you translate you get which doesn't make any sense. More incredibly, "store" always translates to "mémoire". You would think that, if they were going to force every word to be interpreted in one sense, they would choose the most common meaning. But this choice leads to insanity where translates to
With knowledge of context, a more advanced system could notice situations in which it was more reasonable for "run" to have a particular meaning. In the last example, "run" is followed by a prepositional phrase indicating a direction, which would imply that the meaning involving physical movement is appropriate, and so on.
Even more revealing is the fact that the confusion of meaning happens differently for different languages. If you translate
into Spanish, you get the hilarious result: For translation software that has multiple language targets, I would have expected it to first resolve the meaning of the English sentence into an internal semantic representation before using it to emit Spanish or French. The above would be evidence that the Systran software has no such representation -- or at least that their representation is too weak to indicate the difference between "store" as in "memory" and "store" as in a warehouse.
-- ?!ng
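A minimal Python sketch of the "resolve the sense first, then emit" idea described above. Everything in it is an illustrative assumption: the sense labels, the tiny `LEXICON` table, and the `resolve_run` direction-preposition rule are made up for the example and say nothing about how Systran or Babelfish work internally.

```python
# Hypothetical sense inventory: (English word, sense label) -> word per target language.
LEXICON = {
    ("run", "move"): {"fr": "courir", "es": "correr"},
    ("run", "execute"): {"fr": "exécuter", "es": "ejecutar"},
    ("store", "shop"): {"fr": "magasin", "es": "tienda"},
    ("store", "memory"): {"fr": "mémoire", "es": "memoria"},
}

# Assumed cue: a directional preposition right after "run" suggests physical movement.
DIRECTION_PREPOSITIONS = {"to", "toward", "towards", "into", "from"}


def resolve_run(next_word):
    """Pick a sense for 'run' from the word that follows it."""
    return "move" if next_word.lower() in DIRECTION_PREPOSITIONS else "execute"


def emit(word, sense, target):
    """Look up the target-language word for an already-resolved sense."""
    return LEXICON[(word, sense)][target]


if __name__ == "__main__":
    sense = resolve_run("to")          # e.g. the "run" in "run to the store"
    print(emit("run", sense, "fr"))    # the same resolved sense drives both outputs
    print(emit("run", sense, "es"))
    print(emit("store", "shop", "es"))
```

The point of the split is that the sense is decided once, on the English side, so the French and Spanish outputs cannot disagree about which "run" or "store" was meant.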
Re:AI&Babelfish (Score:2)
I wonder how current translators solve this problem, or if they even bother. That is, where one word means different things in different contexts, but in another language, there are two different words for it, when the context can be so ambiguous that both contexts can be the same statement.
Man's unique agony as a species consists in his perpetual conflict between the desire to stand out and the need to blend in.
Re:Machine translators (Score:1)
Someone doing something right! (Score:1)
'I like to soak my feet in gallons of whipped vanilla pudding'
and having it finally come out as
'I appreciate to impregnate my feet in the gallons of the pudding that I have exposed to the flash of the vaniglia.'
Translation methods (Score:4)
I would be inclined to say that if it is based on grammar rules, the project won't make much headway - machine translation has been butting its head against this brick wall for forty years. The problem with hard-and-fast grammar rules, e.g.,
S = NP VP
NP = Det (Adj)* N
VP = V (Adv)
is that they don't account for rapid linguistic change, and people have this nasty habit of twisting grammar to express themselves in new and creative ways.
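To make the brittleness concrete, here is a rough Python sketch of a recognizer for exactly those three rules. The toy `LEXICON` and the trick of matching a regular expression over part-of-speech tags are assumptions chosen for brevity, not a description of any real MT system.

```python
import re

# Toy lexicon; every entry is an illustrative assumption.
LEXICON = {
    "the": "Det", "a": "Det",
    "big": "Adj", "lazy": "Adj",
    "cat": "N", "dog": "N",
    "sleeps": "V", "runs": "V",
    "quickly": "Adv",
}

# S = NP VP, NP = Det (Adj)* N, VP = V (Adv), flattened into one pattern over tags.
SENTENCE = re.compile(r"Det( Adj)* N V( Adv)?")


def tag(sentence):
    """Map each word to its part of speech, or '?' if the lexicon lacks it."""
    words = sentence.lower().rstrip(".").split()
    return [LEXICON.get(word, "?") for word in words]


def is_grammatical(sentence):
    """True only if the tag sequence matches the three rules exactly."""
    return SENTENCE.fullmatch(" ".join(tag(sentence))) is not None


if __name__ == "__main__":
    for text in ["The big lazy cat sleeps quickly.", "The cat googles."]:
        print(text, "->", is_grammatical(text))
```

Any word or construction the rule writer did not anticipate (a freshly verbed noun, say) simply fails to parse, which is the weakness being described.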
I imagine GPLTrans would probably be using some sort of probability frame of phrases and words occurring together, but one can't be sure without looking at the source. I think the best way to do translation software would be to convert the text into syntax, then into a more abstract semantic form, and from the semantic form, translate back into the target language's syntax, and then into the target language's text. Of course, the trick is to figure out just exactly how to do this.
My 2 cents/Pfennig/lire/pesos,
Y
Re:Cool! (Score:2)
Now I can write back to the Mexicana Chica who works here and explain CLEARLY and CONCISELY, that whilst very attractive and nice, I can't respond to her advances because I am already "with woman".
The last time I tried, Babelfish somehow made me inform her that I'd love to "kiss her making angry other woman".
Muy Bien!
Mong.
* Paul Madley
Can Open Source improve the design of this thing? (Score:2)
GPLTrans may be quite good, but imagine it's not (I still can't access it). Let's suppose that its translation strategy is not very sophisticated and this system ends up being only marginally better than the others. Now, if somebody comes up with a great idea to improve the design of a machine translation system and wants it to be free, what is (s)he supposed to do?
If they are closed to design improvements contributed by others, is their project truly Open?
The _real_ question is ... (Score:1)
More ambiguity (Score:1)
The great irony, of course, was that no machine natural language system in the world - even today - can deal with the sentence "Finally, a machine that understands you as well as your mother." (think about the possible shades of meaning)
The book (was Re:Machine translators) (Score:2)
Do you mean Philip K. Dick's novel "Galactic Pot-Healer" [barnesandnoble.com]? (Stupid title, I know). In it, bored office workers send a book title or folk saying through multiple translator machines and challenge their friends to guess the original title.
It's just called "The Game" in the book.
Re:It's the Stamp Collector syndrome (Score:2)
Moderate this down, citizen.
--
Re:It's the Stamp Collector syndrome (Score:1)
so linux is BETTER than windows?
I see it as different. More stable, less ram hungry, much harder to use, many missing features.
these things are improving, sure, but it's taking a damn long time too.
remember not everyone is a techie, and the goodness of software can't be evaluated by techies alone.
My guess is that whether an open-source translator is written better depends not so much on whether it's open source, but on how talented the main contributors and/or designer(s) are.
Re:Better Context Analysis (Score:1)
Re:Machine translators (Score:1)
Re:AI&Babelfish (Score:2)
Babelfish yields some really funny stuff when English creeps into other languages. For instance, the English word "teenager" has crept into German; Babelfish translates it as "tea rodent". Reading this in a movie review, a room full of my friends nearly died laughing.
Re:More ambiguity (Score:1)
Rice flies like sand.
This could mean that the noun "Rice flies" enjoy sand or that the noun "Rice" flies in the same manner as sand.
Mirror, Please? (Score:2)
I hope the word databases and algorithm are easily separable from the implementation. I'm sure they can't have bound it too tightly to PHP and MySQL - the presentation layer should be determined by the user, and use of other databases should be possible.
Bruce
According to Bill Gates, this isn't possible! (Score:1)
Re:Better Context Analysis (Score:1)
--
Paul Gillingwater
Finally (Score:2)
Re:How do you say.. (Score:1)
Re:AI&Babelfish (Score:1)
Even then, you couldn't be sure, because it could easily just be sarcasm.
TurboTax: The Final Frontier! (Score:2)
I don't think it's tenable under the Open Source paradigm. I'm sure there are other, similar examples. So, there's room for proprietary software, coexisting with free software and running on a free infrastructure. I'd just rather keep the proprietary stuff in the leaf nodes of the software "tree", where nothing else depends on it.
Bruce
Re:Machine translators (Score:1)
Learning translators (Score:1)
--
Re:Can Open Source improve the design of this thin (Score:1)
Does anyone have an URL they can send that explains these issues in more detail? The question is just too broad to answer in a
How does it work? (Score:2)
Re:Machine translators (Score:1)
"Out of sight, out of mind" [English-Russian-English] = "Invisible; insane"
Re:How do you say.. (Score:1)
My bad spelling...
The effect on Slashdot (Score:1)
read the c't articles.
Re:TurboTax: The Final Frontier! (Score:1)
In America, at least, everything depends on taxes. Thus, what you wish for is impossible.
Re:TurboTax: The Final Frontier! (Score:1)
It's probably safe to say that most systems that require more domain knowledge than programming knowledge will remain difficult to open source. Can anyone come up with an example of such a system that is an open source success?
Walt
Re:TurboTax: The Final Frontier! (Score:2)
I'd argue that we have already created this software as open source: the web browser or other UI toolkits.
Service will always sell.
Re:More ambiguity (Score:2)
english to french to english on babel. Not as bad as it could have been... :)
does it work at all? (Score:1)
And now their server looks like it's down...
Re:AI&Babelfish (Score:1)
Well, even if this word has crept into the German language (we do use it), this mistake has a different origin. If you take the word "teenager" apart you have "Tee" = tea and "Nager" (short form of "Nagetier") = rodent.
Re:TurboTax: The Final Frontier! (Score:2)
Re:TurboTax: The Final Frontier! (Score:1)
When I worked in the US, I couldn't believe that employees would need to pay an accountant to file their taxes. I mean, I know of no other country like that... In all the European countries I know, you fill in some numbers in a form, you sign and that's it!
Re:does it work at all? (Score:1)
I wish I could look at the source, if anyone has it, post a link or something.
Someone moderate this up, along with the (real) first post unfairly marked as redundant, and then spank the moderators for me.
Re:I need to try this at work (Score:1)
"I clicked on the thing but it didn't work so I clicked on the other thing and it gave me some message"
Would it be smart enough to translate "I didn't do anything" to "I didn't do anything except replace half the software on the system in a lame attempt at fixing it."
Re:Better Context Analysis (Score:3)
This stuff is hard (Score:3)
The problime is that most if not all of
these systomes know nothing about meaning at all.
All that do is try to match one set of strings to
a difrent set of strings.
GPL Trans works by the substuation methoud.
>from: Mooneer Salem
>
> It is a system where words in a phrase that
> can be substituted are
> marked by %phrase%
> For example:
>
> English: My name is %phrase1%.
> Spanish: Me llamo %phrase1%.
>
This genreal systome can be extended in to a
phrase sturcture grammer with pares of rules for
each language. ex:
english: S -> NP1 V NP2
irish: S -> V NP1 NP2
these rules would modal sentences like:
english: the cat chased the dog.
irish: chased the cat the dog.
All this is oversimplifyed but you get the poin.
The real problime is that you need to be trained
as a linguist to understand what the structer of
many seantences are and even linguestes aruge a
LOT. The phrase structal aprouch is probly what
altavista a such do. All thoe I rilly like the
idea to GPL Trans I do not thik there aproch will
get them to far; but it will be fun to see what
thay can do.
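For the curious, a small Python sketch of the two ideas in this comment: `%phraseN%` slot substitution, and paired word-order rules. This is only a guess at the general shape. The `PHRASES` table, the function names, and the regex approach are all assumptions for illustration, since the actual GPLTrans source was not reachable.

```python
import re

# Hypothetical phrase table in the %phraseN% style quoted above.
PHRASES = [
    ("My name is %phrase1%.", "Me llamo %phrase1%."),
]


def compile_pattern(source):
    """Turn 'My name is %phrase1%.' into a regex with one named group per slot."""
    # re.split with a capture group alternates literal text and slot names.
    parts = re.split(r"%(\w+)%", source)
    regex = ""
    for i, part in enumerate(parts):
        regex += f"(?P<{part}>.+?)" if i % 2 else re.escape(part)
    return re.compile(regex, re.IGNORECASE)


def translate(text):
    """Return the first matching target phrase with its slots filled, else None."""
    for source, target in PHRASES:
        match = compile_pattern(source).fullmatch(text)
        if match:
            return re.sub(r"%(\w+)%", lambda m: match.group(m.group(1)), target)
    return None


def reorder(constituents, source_rule, target_rule):
    """Rearrange labelled constituents from one rule's order into another's."""
    by_label = dict(zip(source_rule, constituents))
    return [by_label[label] for label in target_rule]


if __name__ == "__main__":
    print(translate("My name is Bruce."))
    # english: S -> NP1 V NP2, irish: S -> V NP1 NP2
    print(reorder(["the cat", "chased", "the dog"],
                  ["NP1", "V", "NP2"], ["V", "NP1", "NP2"]))
```

Note that nothing here knows what any of the words mean: an input that matches no stored pattern just returns `None`, and the slot contents are copied across untranslated.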
Re:TurboTax: The Final Frontier! (Score:1)
And you couldn't even get started if you wanted to. Trademarks are so tightly entwined with the software in that field, that it's just about impossible to Open Source anything.
So, yes, there's plenty of room for proprietary software in the leaf nodes. It's funny, folks talk about the "desktop" as if the home market and the business workplace were similar markets. They're very different in many ways, but luckily much of the traditional home apps are moving to the web, where we can use them on decent operating systems.
As far as TurboTax goes, an open sourced Tax program would be a great thing, since stability and lack of error is one of the major goals. I don't think it will happen though. Accountants don't rush home after work to work on personal accounting projects in the way many programmers do.
Re:How do you say.. (Score:1)
Re:AI&Babelfish (Score:1)
More Babelfish abuse! (Score:1)
Woo-ee, babelfish is smoking crack tonight. It's starting to sound like a religious prophet. The Bible, by Babelfish, anyone?
Re:Better Context Analysis (Score:1)
Re:Machine translators (Score:4)
The `spirit is willing' story is amusing, and it really is a pity that it is not true. However, like most MT `howlers' it is a fabrication. In fact, for the most part, they were in circulation long before any MT system could have produced them (variants of the `spirit is willing' example can be found in the American press as early as 1956, but sadly, there does not seem to have been an MT system in America which could translate from English into Russian until much more recently --- for sound strategic reasons, work in the USA had concentrated on the translation of Russian into English, not the other way round). Of course, there are real MT howlers. Two of the nicest are the translation of French avocat (`advocate', `lawyer' or `barrister') as avocado, and the translation of Les soldats sont dans le café as The soldiers are in the coffee. However, they are not as easy to find as the reader might think, and they certainly do not show that MT is useless.
BTW, since this book is no longer available in the stores, its entire contents have been placed online [essex.ac.uk]. I recommend this book to anyone who is interested in the subject of MT. It really is a nice introduction to the subject.
Re:Machine translators (Score:2)
For example, when Nova (the car) was brought to Spain, it didn't sell very well since Nova (no va) translates into "doesn't go". Ford Pinto didn't fare much better; who would drive a car named "small male appendage"? Nike caught on fire (literally!) when an angry mob informed them that "air" on one of their products was strikingly similar to the Arabic "Allah". Braniff translated its airline slogan, "Fly in leather", into Spanish as "Fly naked", and the most horrible error was probably some random baby food manufacturer who began selling their product in South Africa. What they didn't think of was that most products in South Africa are labeled with a picture of the food inside the container (due to illiteracy). Their product was of course labeled with a baby, since that was whom the product was meant for. Imagine the horror -- tinned babies!?
Re:Translation methods (Score:1)
However, my conclusion was that each method (and there are more than two) had both its strengths and weaknesses, and no one of them was "better" than any other in general.
I then went on to propose that the best solution would be to have a "blackboard" system, whereby you allow each parsing methodology to do what it does best and you don't try to twist each of them to handle everything, and they each contribute their own part to the mapping and parsing of the input.
The result being that you can have multiple feedback loops, and the total output should be better than the sum of individual outputs of the various subsystems.
It wasn't exactly the paper that had originally been envisioned, and my adviser only gave me a "B" for it. I wish I had a copy of it online, so that I could provide an URL to it. Hopefully, I've still got a floppy disk around somewhere that I could pull up that has a copy of it. If I ever manage to get a copy and put it up, I'll let you folks know.
Anyway, it seems to me that the sort of systems that Systran and GPLTrans have created would be ideal applications of this methodology -- take what they have now (strict sentence/phrase/word substitution, or whatever), and combine that with a system that could tag and direct the substitution based on contextual clues.
Implemented properly, you should be able to continue to extend and improve this sort of a system pretty much indefinitely.
Re:TurboTax: The Final Frontier! (Score:2)
No... the hard part about tax software isn't the code... it's the legalese... who is Joe SixPack going to sue when GnuTax-1040 causes him to be audited? Can we get an addendum to the GPL that says if you use the results without verifying them then you use them at your own risk (oh... wait the wording on that reminded me... isn't there already a "use at your own risk" clause in the GPL?)
Another sticky situation is the trust aspect... are people going to trust us to not collect their personal info? Lately I'm not so sure they're going to trust anyone... OSS or not. ('cause even if they *can* read the source it doesn't mean they'll understand it.)
Being OSS also brings up another point... let's say you and I put out GnuTax and have correctly translated all the tables and formulary... then some 'leet haxor goes and patches it for something (say performance... or "privacy") and breaks the math... who's to blame? (I hate to think this way... but with something like this the blame game is going to be important... just ask Intuit's legal department).
well... just to be on topic I was going to translate this to French or something... but the poor server is slashdotted....
Re:More Babelfish abuse! (Score:1)
Re:How does it work? (Score:1)
Re:It's the Stamp Collector syndrome (Score:1)
"less ram hungry"???
If you are comparing Windows and Linux... then you are probably comparing the desktopedness of Linux to Windows (running X, a wm, a net browser, etc.), and Linux is very RAM hungry... Netscape/Mozilla both devour RAM, and so does X itself.
Now, if you are comparing Linux the server to Windows the server, then yeah, sure, Linux doesn't use half as much RAM... (assuming that you aren't running X)
sorry to pick nits.
PS try out Corel Linux... it should be pretty nice for non-techie people.
Context and internal semantic representations (Score:3)
While contextual knowledge can increase the quality of a translation, the amount of world knowledge [essex.ac.uk] necessary to translate a typical web page is simply astounding. Most users of a translation system simply do not want to wait for hours to translate a simple sentence.
And, there is the problem of linguistic knowledge. Most web pages are not written in "proper" English, but in some Web-speak-lingo. This requires the system to be very robust.
The most successful uses of MT in corporations today are situations where a very simple grammar and lexicon are used, and very little world knowledge is required. For instance, the Xerox corporation has its own translation system that translates component manuals. The technical writers who write the original version of the manual are required to use very simple English only, without any ambiguities and with very simple constructions.
For translation software that has multiple language targets, i would have expected it to first resolve the meaning of the English sentence into an internal semantic representation before using it to emit Spanish or French.
This "internal semantic representation" is called an Interlingua [essex.ac.uk]. It has been used in various MT systems, with varying degrees of success.
The most important advantage of an Interlingua-based MT system is that it does not require a translation engine for each language pair. For instance, if you create a system for English, French, Dutch and German texts, you only need to create four analysis engines:
Clearly, it is easier to integrate new languages into an interlingua system than into a transfer system.
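A back-of-the-envelope Python sketch of that scaling argument. The counting convention is an assumption made for the illustration: an interlingua system is taken to need one analysis and one generation module per language, and a pairwise system one engine per ordered language pair.

```python
def interlingua_modules(n_languages):
    """One analysis plus one generation module per language (assumed convention)."""
    return 2 * n_languages


def pairwise_engines(n_languages):
    """One engine per ordered (source, target) language pair (assumed convention)."""
    return n_languages * (n_languages - 1)


def cost_of_adding_language(n_existing):
    """Extra modules needed for one more language: (interlingua, pairwise)."""
    return (
        interlingua_modules(n_existing + 1) - interlingua_modules(n_existing),
        pairwise_engines(n_existing + 1) - pairwise_engines(n_existing),
    )


if __name__ == "__main__":
    # English, French, Dutch and German, as in the example above.
    print(interlingua_modules(4), pairwise_engines(4))
    print(cost_of_adding_language(4))
```

Under these assumptions the interlingua cost of a new language stays constant, while the pairwise cost grows with the number of languages already supported.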
Re:It's the Stamp Collector syndrome (Score:1)
this will fail ! (Score:1)
I've a master's degree in computational linguistics, and I predict this effort will totally fail. Research on automatic translation has been going on for about 40 years now and a lot of money has been spent.
However, there are still no working solutions, as the problems are still far too big. I'd suggest everybody participating in this discussion should read a good book on linguistics.
It only does English-Spanish so far! (Score:1)
Re:Translation methods (Score:1)
It seems to have a basic ability to correctly position proper nouns using wildcard characters within phrases.
No clever grammar rules etc., which is probably a good thing. Sticking on a 'did this translate properly?' button and letting users add to the vocabulary is probably a better long-term approach, given enough users, than a clever grammatical algorithm.
POP3 (Score:1)
I ran the English->Spanish translation on my homepage and, although I don't speak Spanish, it is quite clear that it sucked! Much development work to be done I think. A VERY good idea in principle though.
Weak translation, funny note (Score:2)
English: "I am a small fish who wants to live in your ear."
German: "Ich bin a small fish who wants to live in your ear."
Astounding. I couldn't have done it better myself, and it was 6 years since I last took a German class... Wow. Also, I find this part of the Note at the bottom of each page particularly qualitative, too:
Note: this computer-automated translation is not guranteed. It'll screw up with some text. If it does in fact screw up, first make sure you spelt everything properely.
My note: I have mucho respect and understanding for alpha releases. It's just that I'm a nitpicking bastard, and this was quite funny.
Re:Mirror, Please? (Score:2)
Of course, it would have helped had the author (who apparently had hours of advance notice) emailed me or my associate, who agreed to host his site, to let us know he was going to be on Slashdot; then arrangements could have been made much earlier. He posted a notice on his site that it was happening, but failed to notify either one of us. (Can you tell I'm not real happy with him right now?).
So if anyone has the resources to mirror this, contact me and I'll arrange it with the author, or contact him directly and arrange it. Either way works.
--
William X. Walsh
william@dso.net
Re:It's the Stamp Collector syndrome (Score:1)
Moderate this down, citizen.
--
Re:Premier Stick in ground... (Score:1)
Try it with "slash dot":
There went my karma...
--
Re:How do you say.. (Score:2)
circus". It was about a Hungarian phrasebook translating (if I recall it correctly) a Hungarian phrase with the meaning "How can I get to the train station" to "My hovercraft is full of eels" (and other such nonsense).
Re:AI&Babelfish (Score:2)
Of course, after I saw this, I remembered from my high school German that "to speak" is "reden" and "moon" is "Mond," so I can understand how Babelfish got confused.
Re:Better Context Analysis (Score:2)
The real problime is that you need to be trained as a linguist to understand what the structer of many seantences are and even linguestes aruge a LOT.
IMO, linguistics is just as woolly as psychology. That's why they argue; because many of the more subtle assertions about grammar that have been published aren't much more than unsubstantiated opinion.
The human brain uses grammar up to a point, and then dispenses with it. There is no reason to expect that the grammar that has evolved in every language has to be completely regular. So you can formulate a consistent set of grammatical rules to deal with basic usage, but the more complex things get the more often the rules will be broken.
The difference between linguistics and zoology or botany is that the latter subjects only attempt to catalogue a finite number of real living species. But when grammatical rules are flexible or disposable, the number of potential structures is almost as limitless as the number of potential utterances (which Chomsky put a number to, I seem to remember).
In this case, beyond a small core of prescriptive grammar everything else is purely descriptive. To catalogue the resulting infinity of possible verbal blunders and call this zoo a formal grammar is pointless.
Also, even with simple phrases you can have two different interpretations (and two complete but mutually exclusive superimposed structures) whose meaning cannot be resolved without context.
Because of all this, a phrase structural approach, or any other rule-based method, is ultimately doomed. However, insofar as the linguistics community utilises Artificial Intelligence concepts (as in natural language processing studies), they are, it appears, still dominated by those who swear by symbolic logic.
I'm inclined to believe that the most effective natural language parsers will always be connectionist rather than rule-based. Connection machines (such as neural nets) can encompass rule-based logic but also have the flexibility to make an "educated guess". Thus they are much more capable of parsing ungrammatical language.
After all, our brains work the very same way when we speak or listen.
Consciousness is not what it thinks it is
Thought exists only as an abstraction
Be sceptical (Score:2)
1 - testing: They claim to be the most accurate of the web-based translators. Based on what corpus and measured in what way? This isn't a trivial question, there are no benchmarks for translation programmes.
2 - parsing. If this program uses American style phrase grammar, it will inevitably break down. Phrase grammar is counterintuitive and for AI purposes pretty unproductive. It is computationally simple - see Charniak's last book for good parsing algorithms - but almost certainly isn't the way humans process language.
All of the most successful natural language translation systems are, in one way or another, dependency grammar based. Dependency based systems are also generally more portable to other languages.
3 - morphology. English is very morphology poor. If morphology is only minimally accounted for (as it is in a lot of poorly thought out, English-based NLP systems), I don't see how it can hope to work in Russian, or Turkish or dozens of other major languages with rich morphology. Furthermore, what kinds of morphological rules can it accept? There are languages that use prefix, postfix and infix morphology. The kinds of simple rules that can account for English will not go very far with other languages.
I haven't seen this program, and I don't know how seriously these issues have been considered, but they are the kinds of things to keep in mind when looking at machine translation programs.
Re:It's the Stamp Collector syndrome (Score:2)
--
Re:This stuff is hard (Score:2)
aproch -> approach
problime -> problem
systome -> system
difrent -> different
all thoe -> although
linguestes -> linguists
aruge -> argue
substuation -> substitution
rilly -> really
This is obviously not complete, but hey, it's the first version :)
The interesting thing about Moore's spelling is that he's consistent. More consistent (to bring it back on topic) than translating from German to English.
--
Not real yet. (Score:2)
Bruce
Context of translation (& meta-moderation on /.) (Score:2)
I don't see an easy way to get out of this -- the needed 'world knowledge' that people have pointed out as necessary for this really is huge.
But (and this is why I mention slashdot's metamoderation), there is a certain amount of brute-forcing which could serve as a useful basis for creating improved context interpretation. For instance, let's say you visit this translation engine and choose some text for it to translate ("Mein Hund ist in dein Aktentasche," say). At the same time, there might be a few selections of recent translations requested by others, and the resultant translations, which could be shown to you based on the languages you know. (Not telepathically
The resultant translations could be joined with alternate translations / permutations, and each reader could (say) rank-order them, or choose the best one, as far as they can determine by context, etc.
And hopefully, the program can then be taught (wrong word, but I'm being figurative and anthropomorphic) something like "OK, if there are several computer-related terms in the translated text, like megabyte and power-supply, 'run' is likely to mean 'execute.' If 'run' however appears in a context which does not indicate computer use, and / or directly before the paired words 'away from,' it should probably be the bipedal-movement one. And if it's in front of a business-type name, like 'bank,' 'lemonade stand' or 'brothel,' then it is likely to mean 'manage' or 'administer.'"
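For what it's worth, the 'run' rules above could be hand-coded roughly as follows. This is only a sketch under my own assumptions: the word lists, the threshold of two computer terms, the rule order and the function name sense_of_run are all made up for illustration, and a real system would presumably learn such cues from reader feedback rather than have them typed in.

```python
import re

# Illustrative cue lists (assumptions, not taken from any real translator).
COMPUTER_TERMS = {"megabyte", "power-supply", "program", "disk", "kernel"}
BUSINESS_NOUNS = ["bank", "lemonade stand", "brothel"]

def sense_of_run(sentence):
    """Guess which sense of 'run' a sentence uses: 'execute', 'manage' or 'move'."""
    text = sentence.lower()
    words = re.findall(r"[a-z\-]+", text)

    # 'run' directly before 'away from' is the bipedal-movement sense.
    if re.search(r"\bruns?\s+away\s+from\b", text):
        return "move"

    # 'run' in front of a business-type noun means manage / administer.
    for noun in BUSINESS_NOUNS:
        pattern = r"\bruns?\s+(?:(?:a|an|the)\s+)?" + re.escape(noun) + r"\b"
        if re.search(pattern, text):
            return "manage"

    # Several computer-related terms nearby suggest the 'execute' sense.
    if sum(word in COMPUTER_TERMS for word in words) >= 2:
        return "execute"

    # No computer context at all: fall back to the movement sense.
    return "move"

print(sense_of_run("Run the program from disk after checking the power-supply"))
print(sense_of_run("The dog will run away from the bank"))
print(sense_of_run("She runs a lemonade stand"))
```

Note that the sketch only matches 'run' and 'runs' (not 'ran' or 'running'), and that the order of the rules matters: the second example mentions a bank but is still classified as movement because the 'away from' rule fires first.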
In my (interested but ignorant layman's) understanding of AI translators, this is the kind of discrimination that they try to make, nothing out of the ordinary. But, because words can fit into so many categories, I think this sort of gradual, piecemeal accumulation holds hope of making it work better over the long haul. It would take too many linguists to account for all the wacky ways that words get used.
Just thoughts,
timothy
Re:It's the Stamp Collector syndrome (Score:2)
Vovida, OS VoIP
Beer recipe: free! #Source
Cold pints: $2 #Product
The Matrix has us. (Score:2)
I'll take a stab at your puzzle: "I toss my cookies down the toilet." Just a guess, highly dependent on humorous context. ;)
Vovida, OS VoIP
Beer recipe: free! #Source
Cold pints: $2 #Product
Using search engines to determine context (Score:2)
Anyhow, no conflict here -- I think translation engines are going to have to use a number of strategies on every input text and see which ones make the most sense in the end, then record that for text-chunk X, translation X-prime (or whichever) was the best translation. That way when phrasings similar / identical to ones in text-chunk X appear again, there is at least a reference to check against.
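A rough sketch of that "reference to check against" idea, using Python's standard difflib module for the similarity test. The class name TranslationMemory, its methods and the 0.8 threshold are my own assumptions for illustration; nothing here describes how any existing translator actually stores its results.

```python
from difflib import SequenceMatcher

class TranslationMemory:
    """Remember the best translation chosen for each text chunk."""

    def __init__(self, threshold=0.8):
        self.entries = {}           # source chunk -> best known translation
        self.threshold = threshold  # minimum similarity to count as a match

    def remember(self, source, best_translation):
        """Record that best_translation won out for this source chunk."""
        self.entries[source] = best_translation

    def lookup(self, source):
        """Return the stored translation of the most similar chunk, or None."""
        best_score, best = 0.0, None
        for known, translation in self.entries.items():
            score = SequenceMatcher(None, source.lower(), known.lower()).ratio()
            if score > best_score:
                best_score, best = score, translation
        return best if best_score >= self.threshold else None

memory = TranslationMemory()
memory.remember("Guten Morgen", "Good morning")

print(memory.lookup("Guten Morgen!"))        # near-identical phrasing
print(memory.lookup("Wo ist der Bahnhof?"))  # nothing similar stored
```

Comparing raw characters is the crudest possible notion of "similar phrasing"; it is just enough to show the lookup step, and anything real would want to compare words or structure instead.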
timothy
antonyms (Score:2)
Couldn't one also use antonyms in this case? I.e. a word/phrase can be a replacement, if it is synonymous, and
Re:Better Context Analysis (Score:2)
The result is that there are some phrase structures which you want to add to in order to complete the sentence but you can't do it without breaking the rules or generating a sentence of incomprehensible drivel.
Most people would rather break the rules than spout drivel, so for complex sentences in the real world, grammar often breaks down.
BTW, It's obvious that there is an innate potential for grammar in the human brain but I don't agree with Chomsky that we are all born with the same basic grammatical structures hardwired. If you wonder how it is that so many of us end up sharing a similar meta-grammar (to coin a phrase) then you ought to read William H Calvin's book The Cerebral Code [washington.edu] (yes, the whole thing is online, thanks Prof!). He shows at the end precisely how neural structures to support basic grammar could form spontaneously to enable thoughts about who did what to whom, and with what. The same structures are probably used to generate the word order when the thought is spoken.
You may have noticed that the higher apes (principally chimps and gorillas) used in language experiments have demonstrated the ability to form simple grammatical structures too. There were also reputedly some experiments with an African Grey parrot which demonstrated similar ability (but I've not often heard the work cited and don't know how reliable it is).
PS. If you like Calvin's book, his latest one Lingua ex Machina [washington.edu] is all about the evolutionary development of language. Like all of his books this one's online too.
Consciousness is not what it thinks it is
Thought exists only as an abstraction