A Universal Networking Language for the Internet?
Anonymous Coward writes: "The United Nations University is developing a Universal Networking Language for the Internet, which is designed to allow effective communication between people writing in their native languages, with automatic conversion through an intermediate meta-language (perhaps a precursor to Star Trek's Universal Translator).
They will be holding a symposium on the technology on 18 November in Brussels, Belgium, where they will publicly announce their achievement. They claim that the initial stage of UNL will support 16 languages: Arabic, Chinese, English, French, Russian, Spanish, German, Hindi, Italian, Indonesian, Japanese, Latvian, Mongol, Portuguese, Swahili and Thai." An interesting idea, but this is one of those "the devil is in the details" things. It'll be interesting to see how, or if, this can work.
Re:sounds difficult - not as you say (Score:1)
OTOH, a transformational grammar has not yet been shown to be powerful enough (at least I haven't heard that it has). I think that one would require a complete ATN network with recursion. Bounded recursion would probably be sufficient, as I don't feel that folk understand more than about three layers. Certainly it only goes deeper as a stylistic perversion of normal syntax (but fashion can do strange things).
A worse problem is divergent mappings. No language uses an atomic view of the world, so each concept in each language is a set of items selected from the universe of possible concepts. This can be noticed even within a single language when moving from one dialect to another. It is most easily noticed when discussing things that map readily onto sensory images, e.g., "What is your name for the color of the object?", but it exists in all aspects of language. (What is the difference between "dog" and "hound"?) When one translates the term "black hole" into Russian, I am told, one must use a different term, because in Russian a "black hole" is something specific which is not astronomical (not sure what, but it was taboo).
Now this is mainly something that can be handled by a lot of detail work. But I mean a lot of detail work. To get a very mild idea of a part of what I am talking about, pull out an unabridged dictionary and open it to a random page of definitions. Each meaning listed will probably need to be a separate term in the meta-language. And that's just the distinctions that an English speaker would notice.
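To make the divergent-mapping point concrete, here is a minimal Python sketch, assuming a hypothetical sense inventory in which every dictionary sense gets its own concept ID. The concept IDs and the `candidates` helper are invented for illustration and are not part of UNL. An English word with two senses yields two French candidates until something disambiguates it:

```python
# Hypothetical sense inventory: every dictionary sense is its own concept ID.
SENSES = {
    ("en", "bank"): ["BANK_FINANCIAL", "BANK_RIVER"],
    ("fr", "banque"): ["BANK_FINANCIAL"],
    ("fr", "rive"): ["BANK_RIVER"],
}


def candidates(word, src, dst):
    """Return every dst-language word that shares at least one sense with word."""
    concepts = set(SENSES.get((src, word), []))
    return sorted(
        w for (lang, w), ids in SENSES.items() if lang == dst and concepts & set(ids)
    )


print(candidates("bank", "en", "fr"))  # ['banque', 'rive']
print(candidates("rive", "fr", "en"))  # ['bank']
```

Each extra sense is another row in a table like this, for every language, which gives some feel for how much detail work is involved.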
Re:Who needs it? (Score:1)
1) Financial Markets
2) Aviation
3) Scientific Publication
4) Popular Culture
5) The computer industry (along with English)
6) Everything else that matters
I hate to be critical... (Score:1)
Actually, the only real data they offer suggests that they are recreating the work Anna Wierzbicka was doing in the 80's with her ad-hoc theory of semantics. She ultimately showed why it wouldn't work, and now criticises the idea of using controlled language at all for machine understanding.
No, these people don't seem to have any idea what they've gotten themselves into. This kind of thing was what I did graduate work on. Controlled language is a useful idea, but a very limited one, and using pivot languages for translation will only take you about as far as Systran's system (the one used in Babelfish).
There are much more sophisticated efforts going on elsewhere, and even those are getting bogged down in the ugly reality of natural language. This will languish and go nowhere. With some luck, some more realistic project, like one of the automatic text summarization projects or natural-language-to-knowledge-base projects, will eventually produce a usable product, but this UN University effort sounds like a waste of time.
DLT project did this with Esperanto (Score:2)
What's evil about these projects, of course, is that they don't let people just talk to one another. It would be neat to be able to have access to the literature of other countries, but that pales in comparison to having access to the people in other countries. If you just learn Esperanto you can really converse with people without needing technology or anything. It just works.
Re:this is great (Score:1)
Babelfish crap (Score:2)
Machine translation will improve, but the best organization is still going to be browser- or proxy-based translation. If that translation package internally uses an intermediate semantic representation, then fine, but the day
You have to admire the democratic thinking though (NOT!) - rather than just foreigners seeing your web page as crap, you can (must) see it that way too! Designed by politicians, no doubt.
Scary (Score:1)
Idns.org Internationalized Domain System (Score:1)
IDNS.org [idns.org] has a spec for non-ASCII domain names. They have a modified version of BIND available for download.
Getting this adopted universally is nontrivial.
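For a feel of what non-ASCII domain names involve on the wire, here is a sketch using Python's built-in `idna` codec. Note that this codec implements IDNA (RFC 3490), the ASCII-compatible "xn--" scheme that was standardized later; it is not IDNS.org's own spec or their modified BIND:

```python
# Python's built-in "idna" codec (IDNA 2003, RFC 3490) maps each non-ASCII
# label to an ASCII-compatible "xn--" label that existing DNS software accepts.
unicode_name = "bücher.example"
ascii_name = unicode_name.encode("idna")
print(ascii_name)                 # b'xn--bcher-kva.example'
print(ascii_name.decode("idna"))  # bücher.example
```

The design keeps the DNS itself ASCII-only and pushes the conversion into client software, which is one way around the adoption problem.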
Re:this is great (Score:1)
> get smitten or something?
Well, "smited", perhaps.
I think "smitten" has a _slightly_ different meaning there...
Re:Language support (Score:1)
I know, I know, people are going to come up with reasons not to use Esperanto. But it seems like if a solution that will work exists, why not use it?
(Note: Even though I like and occasionally use Esperanto, I would welcome use of a similar language like Interlingua or Latino sine Flexione that would be equally easy to learn and do the job just as well.)
Iun vi konfidas, kun ni li alig^as.
Re:Noam Chomsky and the Universal Grammar (Score:1)
Chomsky claims (despite evidence to the contrary) that syntax can be analysed apart from semantics, implying that if we could agree to a universal word list and definitions, it might be possible to devise an equally neutral grammar to use for machine translation. However, it is quite clear that words, even pseudosynonyms, don't mean the same thing in different languages.
My inclination is that Chomsky is just plain wrong about it in the first place: that there is no universal underlying order of constituents, but rather that human language structures are restricted to a subset of all valid ways of organising information linearly, and that those constraints are biological.
This means that any real machine translation requires us first to make real progress in understanding how humans process and store linguistic information. This field is in its infancy.
NewSpeak (Score:1)
Re:It won't work (not as you may expect) (Score:1)
No way (Score:1)
Not only is point one completely and utterly impossible for reasons already well discussed here (slang, local expressions, the evolution of languages, etc.), but point two actually contradicts point one! They want UNL to be an exact representation of the meaning expressed in the native language, while simultaneously requiring it to be generic enough that everybody (or at least every "enconverter" developer) can understand what is being said. Assuming the average "enconverter" developer is as technically (il)literate as the authors of this document, there's no way they will understand what technical people are talking about even in their own native language. No way is UNL going to help with that. So how, then, are they going to understand that very same conversation translated from a language they don't understand in the first place? Forget it!
Nice idea. Store it in the bin with all the other equally nice ideas: "Health and food for all" and "Can't we all just get along?".
Re:lost in the translation (Score:1)
Akatosh dun said:
It's very interesting that you bring that up. Idioms can be a bear to translate at times, let alone cultural references (even from English to Spanish and back; in many fansubbed animes, the fansubbers have to include a section at the beginning for cultural references and idioms that Americans wouldn't necessarily get but Japanese audiences would). Not only that, some concepts do not translate clearly across languages (I actually find it easier to think of the Japanese concept of honour in terms of the Tao or the Diné, or Navajo, concept of the Path of Beauty than in English!).
A really good example of how translation can require translating idioms and noting cultural references is the discussion of the upcoming American release of "Mononoke Hime"/"Princess Mononoke" (see [princess-mononoke.com] for the gory details :). Neil Gaiman is translating for the dub, and apparently there were multiple major issues in translating it, including:
The fact that the entire dialogue in the movie is not in modern Japanese but in an archaic form (roughly akin to Middle English or the old form of English used in the King James Bible)
A mess of cultural references that Americans would not be aware of (such as one of the main characters cutting his hair, which in Japan is recognised as a sign that a warrior is leaving forever and will be counted among the dead)
A number of idiomatic phrases that had to be translated into American idioms (such as a comment that a character's soup tasted like water--which is about as low as one can go to insult one's cooking...this ended up being retranslated into "Your soup tastes like piss" which is more understandable to silly gaijin :).
Needless to say, it was quite illuminating...especially since some cultural references were noted that I didn't pick up on the first time I saw it (I've seen the fansubbed version) and I'm an otaku. Apparently Gaiman has rewritten the script explaining some stuff that American audiences wouldn't catch, either...and to be honest (IMHO) Gaiman is probably one of the few people who could've pulled it off.
Another really good example of this is the first tape of the anime "Compiler", which was dubbed, but they STILL had to explain at the end why a giant Colonel Sanders turned into a Japanese baseball player and defeated a mad statue :) (Basically... Randy Bass won the Japanese equivalent of the World Series for the Tigers... the celebrating fans grabbed a statue of Colonel Sanders from a KFC, it being the only Anglo-looking statue that could be found, and threw it into the sea... they have not won the pennant since, and legend has it that the team will not win the pennant until the statue of Colonel Sanders is retrieved, because the sea gods are pissed. :) Neat story, but not one most Americans would get... then again, the Japanese wouldn't get why octopi are often thrown onto the ice at Detroit games when the Red Wings are in the Stanley Cup playoffs :)
Re:Interesting, but... (Score:1)
Hmm, I assume that there was an implied 'only' in there. I have read a few Chinese authors and poets who would very strongly disagree with you, my Eurocentric friend. In fact I would venture to suggest that the body of literature in Chinese is substantially greater than in either Sanskrit or Greek, although I freely admit that I have absolutely no facts whatsoever.
Anyone got any figures?
Hey, something hard! (Score:1)
In any case, sounds like a worthy effort.
Re:Interesting, but... (Score:1)
(And many people would argue that the body of English literature is no greater than that of, for example, German or Japanese.)
Re:The "meta-language" (Score:1)
I'm not sure English is the proper starting point for this type of a machine-read hyper-language. English is primarily a spoken language, with all the fuzziness that implies.
What may be more appropriate would be to start with written Chinese. From what I understand, "Chinese" is already something of a hyper-language, with one written language expressing several spoken languages. Modify the set of ideograms to include some phonetic symbols (to properly represent the many names that are best represented as sounds). Ideally the syntax would allow for defining custom linguistic symbols, much like XML's ability to define custom tags. Tweak the hell out of this until you have a machine-readable language (do fewer than 2^16 standard "words" seem adequate? Should this blow Unicode out of the water and use 32-bit "words"?)
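The 16-bit versus 32-bit question can be sketched in Python, assuming each standard "word" of such a hyper-language is simply an integer concept ID. The `pack_word_ids` helper is hypothetical, invented for illustration. For reference, Unicode itself outgrew 16 bits: its code space runs to U+10FFFF.

```python
import struct


def pack_word_ids(ids):
    """Pack non-negative concept IDs as fixed-width big-endian integers.

    Uses 16-bit codes ("H") when every ID is below 2**16 and
    32-bit codes ("I") otherwise.
    """
    width = "H" if all(0 <= i < 2**16 for i in ids) else "I"
    return width, struct.pack(f">{len(ids)}{width}", *ids)


print(pack_word_ids([1, 2, 65535]))  # ('H', b'\x00\x01\x00\x02\xff\xff')
print(pack_word_ids([70000]))        # ('I', b'\x00\x01\x11p')
```

The trade-off is that every word costs twice the space at 32 bits, even though most text would only ever use the first 65,536 IDs.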
What shape will it be? (Score:3)
Language support (Score:2)
Re:Language support (Score:1)
Re: (Score:1)
Re:Lojban, not Esperanto (Score:1)
I guess the problem is that it's difficult to adapt a computer-friendly language to humans, or a human-friendly language to computers. But like teaching a computer to play chess, that doesn't mean it isn't worthwhile.
Sounds possible.. (Score:1)
Is this what happens inside the head of a bi-lingual person? (This is posed as a question to any readers who might be)
Esperanto just works (Score:1)
Excellent post. Wish I had moderator points today, so I could move it up! Esperanto just works.
I have come to believe that, in the human brain, the language center is tied somehow to the emotions, because people start acting irrationally whenever you start suggesting language alternatives. It's like asking them to change sexual orientation or something; their language is too strongly tied into their concept of personal identity for them to consider alternatives. So in an open forum, I seldom see anyone who is not already an Esperantist discuss the language objectively. Sad, really.
But hope springs eternal. I post this URL every time, in hopes that it may someday be of interest to someone: If you are interested in Esperanto, the world's most popular constructed language, try the Esperanto.net web site [esperanto.net] for starters.
As for the UNL, most Esperantists have been aware of it for some time. We wish them well, most of us, really we do. But most people who know more than one human language hold limited hope for such a project's success.
Re:It won't work. (Score:1)
Other types of tech will probably steer just as clear of it when they realise how frustrating it is to compose for an artificial semantically unambiguous language.
Re:The "meta-language" (Score:1)
this is great (Score:1)
And by eliminating all communication restraints... (Score:2)
(with a nod to Douglas Adams)
But still a very cool idea!
Re:Interesting, but... (Score:1)
Of course, this would probably ruin the entire project, but I'm not very confident that it will succeed anyway.
lost in the translation (Score:1)
Re:sounds difficult - not as you say (Score:1)
(1) Not every language has every tense. German has fewer tenses than English, and another poster said that Chinese has none.
(2) Language can't be described by a BNF grammar: BNF isn't sophisticated enough to capture singular vs. plural, gender, case, verb conjugations, etc. Phrase structure grammars extend BNF grammars with parameters to capture these, and Chomsky showed that these are sufficient to capture all of natural language.
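The difference between plain BNF and a parameterized phrase structure rule can be shown with a toy Python recognizer for the single rule S → Det N[num] V[num], where the `num` feature must agree. The lexicon, the rule and the `is_sentence` helper are invented for illustration; plain BNF would need one copy of the rule per feature value (S → Det N_sg V_sg | Det N_pl V_pl).

```python
# Toy lexicon: word -> (category, number feature); None means "unmarked".
LEXICON = {
    "the": ("Det", None),
    "dog": ("N", "sg"),
    "dogs": ("N", "pl"),
    "barks": ("V", "sg"),
    "bark": ("V", "pl"),
}


def is_sentence(words):
    """Recognize S -> Det N[num] V[num]: the noun and verb must share num."""
    if len(words) != 3 or any(w not in LEXICON for w in words):
        return False
    (det, _), (noun, n_num), (verb, v_num) = (LEXICON[w] for w in words)
    return (det, noun, verb) == ("Det", "N", "V") and n_num == v_num


for s in ["the dog barks", "the dogs bark", "the dogs barks"]:
    print(s, is_sentence(s.split()))
```

Adding gender and case as further parameters multiplies the plain-BNF rule count, while the parameterized rule stays a single line.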
I would guess that the meta-language design is based upon transformational grammar, which exposes the essential similarities between sentences like `The door is closed' and `Close the door!'. This would allow it to express subtleties like different ways of representing the same sentence.
Re:Sounds possible.. (Score:2)
Uh, no. Typically, I believe bi-lingual people internally switch back and forth, or represent some concepts in one language and others in the other. That is to say, when they actually are using internal linguistic representations of things at all.
Re:enlessly [sic] (Score:1)
Re:Sounds possible.. (Score:1)
Re:Allow unicode in email addrs and domain names! (Score:1)
Au contraire, mon frère (Score:1)
If I remember my German, you're talking about something like
Mein Hut, der hat drei Ecken
where "der" refers to "Hut". That's something that will have to be covered in the rules both for translation into and translation out of German, no matter what language you're using to go into or out of German. Otherwise you end up with the English translation being
My hat, the has three corners
where a proper English translation would of course be
My hat has three corners
Of course this is a very simplified example, but I think you get the idea.
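A minimal Python sketch of such a rule, assuming a toy word-by-word glossary and a hypothetical regex that treats "der/die/das" right after a comma-separated two-word noun phrase as a resumptive pronoun to be dropped. It is deliberately naive: it would also wrongly delete the relative pronoun in a verb-final relative clause such as "Mein Hut, der drei Ecken hat", which is exactly the kind of case real rules must tell apart.

```python
import re

# Toy German -> English glossary (lowercase keys); "Hut" = hat, "hat" = has.
GLOSS = {"mein": "my", "hut": "hat", "der": "the", "hat": "has",
         "drei": "three", "ecken": "corners"}

# Hypothetical rule: "<word> <word>, der|die|das " marks a resumptive pronoun.
RESUMPTIVE = re.compile(r"^(\w+ \w+), (der|die|das) ")


def translate(sentence, drop_resumptive=True):
    """Gloss word by word, optionally applying the resumptive-pronoun rule."""
    if drop_resumptive:
        sentence = RESUMPTIVE.sub(r"\1 ", sentence)
    tokens = re.findall(r"\w+|,", sentence)
    english = " ".join(GLOSS.get(t.lower(), t) for t in tokens).replace(" ,", ",")
    return english[0].upper() + english[1:]


print(translate("Mein Hut, der hat drei Ecken", drop_resumptive=False))
print(translate("Mein Hut, der hat drei Ecken"))
```

Without the rule this prints "My hat, the has three corners"; with it, "My hat has three corners".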
I just think that for the foreseeable future (and, since this is computing, that could be, oh, say, six months) the best computers on the planet are the ones we carry around in our skulls. To me it would make more sense to have a single language that everyone would agree on, but then the problem is to agree on the language. All of the "evolved" languages carry their own cultural baggage, and few people seem to think that a "constructed" language is up to the task, even though certainly Esperanto and possibly Interlingua and a couple of others have proven that hypothesis wrong.
Of course just outside the foreseeable future everybody will be speaking Bocci anyway, so what the heck.
Belgium, man, Belgium! (Score:1)
All of Europe really needs these kinds of technologies, but Belgium is one of the more multilingual countries within Europe.
Open source Babelfish anyone? (Score:1)
If anybody is interested in starting such a project, please reply in this thread.
dos/tres equis
Just in case anyone's curious... (Score:1)
I did.
English -> French
French -> English
English -> German
German -> English
English -> Italian
Italian -> English
English -> Spanish
Spanish -> English
English -> Portuguese
Portuguese -> English
The end result???
In the first place of all, crío to really distant distant of interest who the O.N.U all can produce and not point out technician. I have the taste of the modernity of the IETF:" we create in the agreement approached and the operation bases it ". the champions money^H^H^H^H^Hcode in the beginning, please. Which is therefore extreme special UNL? The theoretical translation of the language to the inside to a universal language and with of the language B is nearly therefore old here how much the translation " of the machine ". Like the memory, IT CREDITS the first jambs of capturing in the automatic translation it has been based on a similar idea -- and has been isolated like the landslide. For entire a landslide and it good slaughter houses of the example that the automatic translation, manages, in
Re:it will fail (Score:2)
Use FRENCH (Score:1)
Plus, all those frogs like those nifty blue berets.
So... ribbit?
Re:What shape will it be? (Score:1)
Well, if it is, I think it died at some point while it was stuck in someone's lughole...
Reason being? Well... reading the English info about the project (which I can only assume was run through their "enconverter" and "deconverter"):
Multi-lingual network aims to enable people to communicate in their mother language with peoples of different language. UNL is a common language shared by people over the world in multi-lingual network. UNL system basically consists of network and conversion program between UNL and native languages.
A conversion system from native languages into UNL is called "enconverter", and that from UNL into native languages is called "deconverter". Information in each language, being "enconverted", is exchanged via network in the form of UNL. Information represented in UNL is "deconverted" into each native language on the terminal of network.
In transmission of information, the information which is expressed in a mother language is enconverted into UNL. Preciseness of conversion can be verified by deconverting the UNL representation into the language from which the UNL is obtained.
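The verification step described in the quote (deconvert the UNL back into the source language and compare) can be sketched in Python. Everything here is a toy stand-in: the concept labels, lexicons and function names are hypothetical and are not the project's actual enconverter or deconverter. Because "dog" and "hound" share one concept, the check catches the lost distinction:

```python
# Hypothetical toy lexicons; "dog" and "hound" collapse into one concept.
EN_TO_CONCEPT = {"the": "DEF", "dog": "DOG", "hound": "DOG", "barked": "BARK_PAST"}
CONCEPT_TO_EN = {"DEF": "the", "DOG": "dog", "BARK_PAST": "barked"}


def enconvert(text):
    """English -> list of concept labels (word by word, toy only)."""
    return [EN_TO_CONCEPT[w] for w in text.lower().split()]


def deconvert(concepts):
    """Concept labels -> English."""
    return " ".join(CONCEPT_TO_EN[c] for c in concepts)


def verify(text):
    """True if the round trip reproduces the (lowercased) input."""
    return deconvert(enconvert(text)) == text.lower()


print(verify("The dog barked"))    # True
print(verify("The hound barked"))  # False
```

A passing check only shows that the round trip is stable in one language; it says nothing about whether the other fifteen deconverters got it right.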
I'm just hoping that they get people who can actually read and write their target languages fluently to do the testing...
An example (Score:1)
Here is an example so you can have a better feeling of what it's like:
So, this is a two-sentence, one-paragraph text.
The first sentence has an agent (the team) who won something in the past, and an object (the match) which was won: "The team won the match".
The second sentence has an agent (the player, who is male) who broke something, an object (the leg) which was broken, and modifiers which specify that this leg is that player's own left leg: "The player broke his left leg."
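The relations described for these two sentences can be sketched as plain Python data: each sentence becomes a set of binary relations between concepts plus attributes such as past tense. This is only a rough illustration, not UNL's actual notation; the relation names (`agent`, `object`, `modifier`, `possessor`), the attribute names and the `dependents` helper are hypothetical.

```python
# Hypothetical encoding: (relation, head, dependent) triples plus attributes.
SENTENCE_1 = {
    "relations": [("agent", "win", "team"), ("object", "win", "match")],
    "attributes": {"win": ["past"], "team": ["definite"], "match": ["definite"]},
}
SENTENCE_2 = {
    "relations": [
        ("agent", "break", "player"),
        ("object", "break", "leg"),
        ("modifier", "leg", "left"),
        ("possessor", "leg", "player"),
    ],
    "attributes": {"break": ["past"], "player": ["definite", "male"]},
}


def dependents(sentence, head):
    """Map each relation label to the concept attached to the given head."""
    return {rel: dep for rel, h, dep in sentence["relations"] if h == head}


print(dependents(SENTENCE_1, "win"))  # {'agent': 'team', 'object': 'match'}
print(dependents(SENTENCE_2, "leg"))  # {'modifier': 'left', 'possessor': 'player'}
```

Because the structure is language-neutral, each deconverter has to decide how to express it, e.g. English uses "his" for the possessor relation.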
"Show me the code" -- Linus.
Compiler construction (Score:1)
This problem maps onto a modern compiler architecture. Such a compiler has multiple front-ends (one per source language) and multiple back-ends (one per machine architecture), joined in the middle by a heavily simplified intermediate language. The idea is that a front-end parses and type-checks the input and then outputs the intermediate language, which can then be fed into any back-end built for a particular architecture.
For example, if you have front-ends for C and Fortran and back-ends for PPC and i386, then you can compile Fortran programs for PPC or i386 and also C programs for PPC or i386: any combination. Add another back-end, say MIPS, and with no extra work both C and Fortran programs can be compiled for it.
When dealing with natural languages, you would need a front-end and a back-end for each language.
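The payoff of the compiler-style design is easy to put in numbers. Direct pairwise translation needs a separate system for every ordered language pair, n(n − 1) of them, while a pivot design needs one front-end and one back-end per language, 2n components. For UNL's initial 16 languages that is 240 versus 32. A quick Python sketch (the function names are illustrative):

```python
from itertools import product


def pairwise_systems(n):
    """Direct translators needed for every ordered pair of n languages."""
    return n * (n - 1)


def pivot_components(n):
    """Front-ends plus back-ends needed when everything goes through a pivot."""
    return 2 * n


print(pairwise_systems(16), pivot_components(16))  # 240 32

# The compiler analogy: 2 front-ends + 3 back-ends give 6 compilers.
front_ends = ["C", "Fortran"]
back_ends = ["PPC", "i386", "MIPS"]
print(len(list(product(front_ends, back_ends))))  # 6
```

The savings only hold if the intermediate language really is neutral; any meaning it cannot carry is lost for every pair at once.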
There are a number of catches; here are a few:
Bottom line: of course a universal translator is possible, but until we discover the Babel fish or a brainwave-reading equivalent (would reading brainwaves be enough? would all species "think" alike?), there will be plenty of input restrictions. After all, some things just don't translate. Because of these restrictions, it will be infuriating and impractical to use.
Re:Bahasa Indonesia (Score:1)
Re:Noam Chomsky and the Universal Grammar (Score:1)
Re:sounds difficult (Score:1)
Ideas like run, walk, buy, sell, etc. would easily translate... however, things like "glark", "glob" and "grep" may not translate accurately. That is, the six Russian verb forms you mentioned may all be mapped directly onto only three verbal meanings.
i.e., grep to "look", glob to "list", and glark to "understand"... and so on.
The resulting word elements could then be arranged by a simple pattern-matching AI into an acceptable form. The result is a valid natural-language sentence which has some shadow of the original meaning. In practice this could allow for useful business communication but prevent discussions of abstract ideas.
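A minimal Python sketch of that lossy mapping, assuming a hypothetical core vocabulary and a hand-made table that collapses rarer verbs into coarse meanings (both invented for illustration). Anything unknown is flagged rather than guessed:

```python
# Hypothetical core vocabulary that every target language can express.
CORE = {"i", "you", "the", "files", "run", "walk", "buy", "sell",
        "look", "list", "understand"}

# Rarer words collapsed onto coarse core meanings (information is lost here).
COARSE = {"grep": "look", "glob": "list", "glark": "understand"}


def simplify(sentence):
    """Rewrite a sentence into the core vocabulary, flagging unknown words."""
    out = []
    for word in sentence.lower().split():
        if word in CORE:
            out.append(word)
        elif word in COARSE:
            out.append(COARSE[word])
        else:
            out.append(f"<{word}?>")
    return " ".join(out)


print(simplify("I glob the files"))  # i list the files
print(simplify("I grok the files"))  # i <grok?> the files
```

The output is easy to translate but, as noted above, only a shadow of the original: "glob" and "list" are no longer distinguishable.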
Yet another fine example of how problem-domain-"scoping" affects over-all software functionality.
Re:Confused (Score:1)
Re:Interesting, but... (Score:1)
Chinese is derived from Sanskrit.
Thank you
Re:lost in the translation (Score:1)
They'd have to go with the "norm" of the language, like what is used in spelling dictionaries and school texts, I guess.
I can see it now: some really silly-looking English sentences translated from Spanish or Italian.
Could work in the following context: (Score:1)
Here's how it might work:
This has the disadvantage that you lose some flexibility, subtlety and art in your writing, but you decided to give that up when you decided to go multilingual, right?
The point is that if you write text specifically so that it can go to one foreign language and back smoothly, it's probably pretty translatable into many languages, I'm guessing.
You can try this now with Babelfish. Take a passage of text you wrote in English, convert it to something (e.g. French) and back. Then edit the original until the English that comes back is decent. This will force you to remove colloquialisms and force you to work around deficiencies in the translation program, but isn't this worth it for a good translatable piece of text?
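The edit-until-it-round-trips loop can be sketched in Python. There is no real translation service here: `EN_FR` and `FR_EN` are hypothetical word-by-word dictionaries standing in for Babelfish, and the score is simply `difflib.SequenceMatcher` similarity between the original and round-tripped word lists. Rewriting the idiom raises the score to 1.0:

```python
from difflib import SequenceMatcher

# Hypothetical word-by-word dictionaries standing in for a real translator.
EN_FR = {"the": "le", "cat": "chat", "died": "mourut",
         "kicked": "frappa", "bucket": "seau"}
FR_EN = {"le": "the", "chat": "cat", "mourut": "died",
         "frappa": "hit", "seau": "bucket"}


def round_trip(text):
    """English -> French -> English, word by word."""
    french = [EN_FR.get(w, w) for w in text.split()]
    return " ".join(FR_EN.get(w, w) for w in french)


def round_trip_score(text):
    """Similarity (0.0 to 1.0) between the original and round-tripped words."""
    return SequenceMatcher(None, text.split(), round_trip(text).split()).ratio()


for draft in ["the cat kicked the bucket", "the cat died"]:
    print(f"{round_trip(draft)!r}: {round_trip_score(draft):.2f}")
```

With a real service the score is only a rough signal, so a human still has to judge whether the text that comes back is decent.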
Final note: We have all seen Babelfish make funny translations. There will always be some words/phrases that software cannot translate perfectly without AI. But certainly, we are all smart enough to craft text that software can translate well! As the software gets better, we can put less and less effort into this.
Re:Esperanto ?? (Score:1)
I'm all for the idea of composite metalanguages, for computers, but I don't see why anyone should cripple a metalanguage so people can use it too.
Re:The "meta-language" (Score:1)
Re:sounds difficult - not as you say (Score:1)
Mongol? (Score:1)
Re:Sounds like Esperanto - take 2 (Score:1)
All it takes is learning a lot of words and working on your pronunciation, but English grammar is absurdly easy compared to that of most Romance languages, and can be learnt in an afternoon.
I'd say French is harder for Spanish speakers than English, precisely because it has a complex Romance grammar, and a fucked up pronunciation to boot.
This Won't Work (Score:1)
The Boob Factor.
That's right. The Boob Factor hasn't been addressed. None of these meta-languages or intermediate languages have addressed this important topic.
What is the Boob Factor, you ask? Quite simply, the Boob Factor is you, it is us. We are the Boobs.
Meta-languages or intermediate languages, we will assume, work on well-known grammatical and linguistic rules. In order to function correctly, these rules must be adhered to flawlessly.
Let us examine the following statement:
I like red meet.
You the reader have been blessed to have a couple of ounces of grey matter resting on your more than likely underdeveloped shoulders. You have the ability to infer the offending Boob's meaning in this sentence. Do you place faith in a meta-language or intermediate language to do the same? I don't think so. The Boob Factor has reared its ugly little head.
We should all sit back and wait until God has reversed his Babel of Confusion mayhem that he inflicted upon us in a drunken stupor. We can then all go back to speaking tongues in the master language of Sumerian. Oh, the joy for that day.
Is this necessary? (Score:1)
However... Do we really need a new metalanguage? Couldn't we just as easily use an existing language as the intermediate form? It would be just as easy to translate, and you wouldn't have to learn a new language to understand the system.
There's an idea in linguistics which is similar to the Church-Turing thesis in computability theory, although it's not as well established: every modern language is assumed to have equivalent expressivity. If you wanted, you could translate from English to Chinese using an Aboriginal language as your intermediate without any problems (except deficiencies in vocabulary, but it's easy to make up new words).
I suspect the real need for this meta-language has to do with this project's association with the UN: they don't want to offend any ethnic group by choosing an existing language as the standard.
why esperanto sucks. (Score:1)
anyway, the real reason that esperanto is not successful is that it still has stupid rules -- for example, nouns still have gender which means that there are still too many pronouns and you still can not complete a sentence without knowing the gender of the subject.
not to say that esperanto is bad, but we all know that esperanto is just spanish V2.0 and no one will admit it.
a truly universal language must be written from scratch with all of the "fluff" removed. people say that you will lose the poetic qualities and you will lose the innuendo and colloquialism; i say that these people are pathetic whiners who are trained to be cynical of anything which could be considered progress. "poetic quality" and innuendo have _nothing_ to do with the language they are written in. you might prefer german opera to italian, but it does not make one any "better" than the other. either way, you could still write your poetry in the language of your choice, and, thanks to the UNL, people will still know what you're talking about.
the fact of the matter is, effective worldwide communication is a much more serious matter than an old-fashioned idea of what is "good" poetry. poetry will persist so long as there are good poets; we do not need to accommodate them with prissy, "romantic" languages. this just makes it easier for unskilled drunks to make more sappy, bad poetry.
as they say, you have to crack some eggs to make an omelette. i say thanks to UNU for cracking some eggs, and to everyone that thinks they can improve upon any of the current languages, please stop picking eggs out of the trash.
Bad example. (Score:1)
Re:Bahasa Indonesia (Score:1)
2. "One tense" is not necessarily true. We have other ways to express tense in our sentences.
I don't want to explain much about Bahasa Indonesia, since this is not a linguistics site.
But, as a person who has been involved in several computational linguistics projects in Bahasa Indonesia... we do have our own difficulties to deal with.
One of them is verb formation, which is very flexible. This makes the stemming algorithm work harder for Bahasa Indonesia than for other languages.
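To see why, here is a deliberately naive affix-stripping stemmer in Python. The prefix and suffix lists are a small, incomplete sample, and the minimum stem length of 4 is an arbitrary assumption; real Indonesian stemmers handle far more. It copes with "dimakan" and "berjalan" but turns "menulis" (from "tulis", to write) into "ulis", because the prefix me- absorbs the root's initial t:

```python
# Small, incomplete sample of Indonesian affixes (longest prefixes first).
PREFIXES = ("meng", "mem", "men", "me", "ber", "di", "ter")
SUFFIXES = ("nya", "kan", "an", "i")
MIN_STEM = 4  # arbitrary guard so that "makan" is not cut down to "ma"


def strip_one(word, affixes, at_start):
    """Remove the first matching affix if the remainder stays long enough."""
    for affix in affixes:
        if at_start and word.startswith(affix):
            rest = word[len(affix):]
        elif not at_start and word.endswith(affix):
            rest = word[: -len(affix)]
        else:
            continue
        if len(rest) >= MIN_STEM:
            return rest
    return word


def stem(word):
    """Strip at most one prefix, then at most one suffix."""
    return strip_one(strip_one(word.lower(), PREFIXES, True), SUFFIXES, False)


for w in ["dimakan", "makanan", "berjalan", "menulis"]:
    print(w, "->", stem(w))
```

Fixing "menulis" means restoring the deleted consonant, which requires a root dictionary or sound-change rules on top of plain stripping.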
The Doc
Re:Um..is Enconverter even a word? (Score:1)
I, for one, find I can get my idea across in Spanish a lot more often than I can understand what a native Spanish speaker is trying to tell me.
dos/tres equis
UNU Appears to be lacking in linguistic knowledge (Score:1)
The first major problem that they will have is defining a syntax for the language. That's not so tough if you just define an arbitrary syntax and leave it at that. But I suspect that they will try hard to design a syntax that distills the most popular aspects of each of the languages that they're translating from, thus getting stuck in a linguistic tar-pit from which they will never escape. I hope.
You see, there have been many attempts at discussing the "universal syntax", that is the base syntax for the language that the brain uses. In most flavors of Chomskyan syntax theory this is termed something like "deep structure" (lately it's been "D-Structure" to avoid any implied but inaccurate meanings of the word 'deep'). DStruct is in essence the most general syntax needed for accurate expression of any sentence structure in any human language. It's supposed to be general, not differentiating between different languages on the syntax level (notice that I haven't mentioned meaning yet -- that's something completely different, the
Another theory, Head-driven Phrase Structure Grammar, says that every word projects its own dependent structure, and that the structure projected from a word in the lexicon must adjoin properly to other words projected from the lexicon to form grammatical sentences. This theory also takes into account some semantic issues, and is very popular amongst the Computational Linguistics and Natural Language Processing crowds, but isn't too popular amongst the older ranks of theoretical linguists. It too is language-dependent in its structure of syntax, although very comprehensive syntaxes of certain languages have been developed with some success.
That's just syntax. It's not easy. It's not very regular. It's very context-sensitive. Anyone who has written a compiler for a programming language knows how complex a language gets if you allow it to be context-sensitive (instead of context-free).
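The classic example from C compilers is the statement `a * b;`: it declares `b` as a pointer if `a` was earlier declared as a typedef name, and is a multiplication otherwise, so the parser needs the symbol table to decide. A minimal Python sketch of that decision; the `classify` helper and its regex are illustrative and handle only this one statement shape:

```python
import re


def classify(statement, typedef_names):
    """Classify a C statement of the form 'X * Y;' using known typedef names.

    The same tokens parse differently depending on earlier declarations,
    which is context the grammar alone cannot supply.
    """
    match = re.fullmatch(r"\s*(\w+)\s*\*\s*(\w+)\s*;\s*", statement)
    if match is None:
        raise ValueError(f"unsupported statement: {statement!r}")
    return "declaration" if match.group(1) in typedef_names else "multiplication"


print(classify("a * b;", typedef_names={"a"}))  # declaration
print(classify("a * b;", typedef_names=set()))  # multiplication
```

C needs this one bit of context; natural language needs context almost everywhere.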
Semantics, the meaning behind a particular word or phrase, is a ridiculously complicated problem in linguistic research. People have spent their entire lives researching it with little success, and at various times in the history of linguistics certain well-known demagogues have denounced the study of semantics in its entirety because it appeared to them to be too unfounded or scientifically unsound. Chomsky to this day makes nasty comments about semanticians and is well-known for denouncing research into semantics because most work is not provably consistent in even restricted domains.
Semantics is gnarly. It's weird. Researchers who work in semantics are said to get their more successful ideas from hallucinogenic chemicals. Semantics is a subdiscipline in which any random researcher can overturn the field with one paper, tossing out all of the research done previously -- and get away with it successfully. I don't mean to degrade the work of semanticians, and I'd love to join their ranks some day, but it must be admitted that much of semantic research and theory has a hard time standing up because it's in its infancy.
Look carefully at the construction of a programming language compiler. It deals with what's known as a 'formal language'. This is a language that is known to follow certain rules consistently, and all special cases are well-defined (for most languages anyway
To put all of this into perspective, consider a universal translator for computer languages -- what's it called? It's called a computer. So what do we call a universal translator for human languages? Surprise -- a human.
Re:Sounds possible.. (Score:1)
This brings up a problem with translations and learning new languages. IMO you don't really KNOW a language unless you THINK in that language. Doing a BabelFish in your head is not the same. This is exactly the problem I (and probably most adults) have learning new languages; we babelfish a new language instead of learning it.
Re:UNL? Yeah, right! (Score:1)
The Problems with Meta Languages (Score:1)
When you look at existing technology, like Babelfish at Altavista, you see that the 'devil in the details' might be more of a 'great satan' than one might think. I'm not sure you can have any kind of accurate translation without a human acting as a filter for meaning. Its easy to apply some rules to a metta language interpreter, but using it in discourse would probably create quite a bit of ambiguity. Just look at this translation if you don't believe me.
English to German and Back
If you regard available technology, like Babelfish with Alta Vista, you see you that the ' devil in the power of the details could think much more from a large satan than one. I am not safe you can type exact translation without human serve as a filter for meaning to have. Its easy, some guidelines to more mettasprachinterpreter to apply but at using it in the statement would probably create much ambiguity. Straight lines view of this translation, if you do not believe me.
English to French and Back
When you look at existing technology, like Babelfish at Altavista, you see that the ' devil in the force of the details much more than one great Satan which one A could think. I am not sure you then not to have any kind of precise translation without acting human as a filter for the significance. Its easy to apply some rules to an interpreter of language of metta, but to use it in the speech would probably create ambiguity much. Glance right with this translation if you do not believe me.
and my personal favorite....
English to Portuguese and Back
When you look at it existing technology, as Babelfish in Altavista, sees that ' the devil in the power of the one details much more satan great of that one could think. I am not certain you I can have no type of the accurate translation without acting human as a filter for meaning. Its easy one to apply some rulers to an interpreter of the language of metta, but to use it in the speech would create probably the ambiguity sufficient. To look at just in this translation if you not to believe me.
Need I say more?
Inconsistency + BS (Score:1)
The UNL will be inconsistent, as a few messages have already pointed out.
Moreover, is this supposed to be some freshman's project? The web page is messed up; there are lots of errors. One of the lines on http://www.unl.ias.unu.edu/eng/unlhp-e.html [unu.edu] says "How to joint the UNL Community". I found a few just by looking at it. I think the people responsible for this do not even care. The pages are poorly coded (made by some Win9x program) and the pictures look distorted. They did not even explain how it will be done.
Re:It won't work. (Score:1)
So it only works for boring documents. They're plenty happy with that.
Re:Do they have a clue? (Score:1)
My apologies to all you fifth-graders out there.
Better than Babel? (Score:1)
And if/when its use becomes widespread, people might start writing for the meta-language: not writing in it, necessarily, but, for example, being explicit about things that would confuse it. If that happens, then it really would work.
sounds difficult (Score:4)
For example, in the case of inflected languages, how do you get the declensional case information into the metalanguage? In many languages, several grammatical cases share the same endings, so it is ambiguous which meaning is intended. And mapping between languages would be really tough.
Verbs would be really tough. In Russian, for example, you have three tenses (past, present, and future) as well as two verb aspects. So you have pairs of verbs: one (perfective) expressing a single, completed action, the other (imperfective) expressing ongoing or habitual activity.
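To make the aspect problem concrete, here is a minimal Python sketch assuming a hypothetical two-entry lexicon (`ASPECT_PAIRS` and `choose_verb` are made up for illustration, not part of UNL). It encodes one real constraint: Russian perfective verbs have no present tense, because their present-tense conjugation expresses the future. An intermediate language therefore has to record aspect explicitly, since English source text such as "I read" often does not mark it.

```python
# Hypothetical mini-lexicon: concept -> (imperfective, perfective) infinitives.
ASPECT_PAIRS = {
    "read": ("читать", "прочитать"),
    "write": ("писать", "написать"),
}


def choose_verb(concept, aspect, tense):
    """Pick the Russian infinitive for a concept, given aspect and tense.

    Perfective verbs have no present tense, so that combination is rejected.
    """
    if tense not in ("past", "present", "future"):
        raise ValueError(f"unknown tense: {tense}")
    imperfective, perfective = ASPECT_PAIRS[concept]
    if aspect == "imperfective":
        return imperfective
    if aspect == "perfective":
        if tense == "present":
            raise ValueError("perfective verbs have no present tense")
        return perfective
    raise ValueError(f"unknown aspect: {aspect}")
```

If the source sentence leaves aspect unmarked, something (a heuristic or a human reviewer) has to supply it before this choice can be made.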
Sounds like the project would be lots of fun to work on, though. It's a really neat idea, linguistically.
The "meta-language" (Score:2)
Overall, I agree strongly with the idea. From a testing standpoint, once there is an effective meta-language, all one would need to do to test the translation, for the most part, is go from language x -> meta-language -> language x. If that works, then presumably the meta-language did not slaughter language x.
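A minimal Python sketch of that round-trip check, assuming hypothetical toy lexicons that map English words to made-up concept IDs and back (none of these names come from UNL):

```python
# Hypothetical toy lexicons: surface word -> concept ID, and concept ID -> word.
TO_META = {"en": {"dog": "C_DOG", "hound": "C_DOG", "barks": "C_BARK"}}
FROM_META = {"en": {"C_DOG": "dog", "C_BARK": "barks"}}


def to_meta(words, lang):
    """Map each word to a language-neutral concept ID."""
    return [TO_META[lang][w] for w in words]


def from_meta(concepts, lang):
    """Map each concept ID back to its preferred surface word."""
    return [FROM_META[lang][c] for c in concepts]


def round_trip_ok(sentence, lang):
    """True if language x -> meta -> language x reproduces the input exactly."""
    words = sentence.split()
    return from_meta(to_meta(words, lang), lang) == words
```

Here "dog barks" survives the trip, but "hound barks" comes back as "dog barks", flagging a distinction the meta-language dropped. A passing round trip is weaker evidence: an encoding can reproduce language x perfectly and still be ambiguous when generating some other language.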
One question I have is how the language engine will handle words it does not know, or, more likely, abbreviations, misspellings, and slang. From what I've gathered, this is where other translators fail. If the translator doesn't understand half the sentence, then it generally has too much trouble finding context for the rest for anything to make sense. Just a thought.
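One simple policy for unknown words, sketched in Python as an assumption rather than anything UNL is known to do: never guess. Pass unknown tokens through in visible markers and report coverage, so a low-coverage sentence can be routed to a human instead of being mangled. The function name and the `<<...>>` marker are hypothetical.

```python
def translate_tokens(tokens, lexicon):
    """Map known tokens via lexicon; wrap unknown ones in <<...>> markers.

    Returns (output tokens, list of unknown tokens, fraction understood).
    """
    out, unknown = [], []
    for tok in tokens:
        key = tok.lower()
        if key in lexicon:
            out.append(lexicon[key])
        else:
            out.append(f"<<{tok}>>")  # keep the original so a reviewer sees it
            unknown.append(tok)
    coverage = 1 - len(unknown) / len(tokens) if tokens else 1.0
    return out, unknown, coverage


out, unknown, coverage = translate_tokens(
    ["the", "kitteh", "sleeps"],
    {"the": "DET", "cat": "C_CAT", "sleeps": "C_SLEEP"},
)
```

With one slang token out of three, coverage is 2/3; a threshold on that number is one crude way to decide when the "half the sentence" problem has set in.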
We already have a "Universal Language" (Score:2)
We already have a "Universal Language." It's called English.
I'm not trying to be facetious; I'm not saying English is better than other languages; and I'm not saying that English will serve you best, or even tolerably well in all places; but it is an inevitable conclusion you must come to after spending any reasonable length of time abroad: if there is anything resembling a universal language in this world, it's English.
English is already a lingua franca in technical and many academic fields. Many universities in non-English-speaking countries actually demand that graduate students write their theses in English, because that is the best way to ensure its diffusion. Some such schools even conduct their classes themselves in English.
The Hollywood movie industry has also no doubt played a large part in making English (not to mention Western culture) palatable and popular the world over. Dubbed versions of films are hardly ever as popular as subtitled ones (exception: kiddie films).
Is English the best choice for a universal language? Definitely not from the point of view of being easy to learn; Esperanto would be much better. But realistically, Esperanto doesn't have a chance. If English ever encounters a contender, it will probably be Chinese, if only because a fifth of the planet speaks it.
Confused (Score:3)
Re: (Score:2)
Re:UNL? Yeah, right! (Score:3)
Watch out this is very, very long...
Don't think about it as "automatic" translation, it's much more likely to work out as semi-automatic. I expect that the process would be something like this:
1. Run automatic converter from natural language to intermediate.
2. Have an expert in the intermediate language review the translation.
3. Run automatic converters to the target natural languages.
4. Have linguists review the output.
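The four steps above can be sketched as a Python function that takes each stage as a pluggable callable. Everything here is hypothetical (the representation of the intermediate form as a list of concept IDs included); the point is the shape: steps 1 and 2 run once per document, while steps 3 and 4 run once per target language.

```python
from typing import Callable, Dict, List

Meta = List[str]  # hypothetical intermediate form: a list of concept IDs


def run_pipeline(text: str,
                 analyze: Callable[[str], Meta],
                 review_meta: Callable[[Meta], Meta],
                 generators: Dict[str, Callable[[Meta], str]],
                 review_output: Callable[[str, str], str]) -> Dict[str, str]:
    """Analyze and review the intermediate form once, then generate
    and review separately for every target language."""
    meta = analyze(text)            # 1. natural language -> intermediate
    meta = review_meta(meta)        # 2. one expert review of the intermediate
    results = {}
    for lang, generate in generators.items():
        draft = generate(meta)      # 3. intermediate -> target language
        results[lang] = review_output(lang, draft)  # 4. per-language review
    return results
```

Whether this saves effort depends on how much work step 4 still takes per language, which is exactly what the reply below disputes.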
Compare and contrast with a "traditional" translation process:
1. Ask a translator to translate from language "A" to target "B". Ideally, the person in charge of the translation should be fluent in language A, a native speaker of B and have at least basic knowledge of the subject at hand (for instance: Open Source).
2. Ask a linguist, (ideally fluent in language A, native speaker of B, etc.) to review the translation produced at step 1.
The point is that the intermediate language should be designed to be free of the ambiguities that plague language translation.
And how exactly can you do this? Either your intermediate language is "limited" (that is to say, it misses many of the subtleties of the original language), which eases step #1 but certainly introduces many errors down the line. Or it is an "advanced" language that is able to translate many of the finer points of your "start" language, but then the interesting thing is the translation engine itself, not the intermediate language. If your translation engine is good enough to translate, say, Spanish into UNL with little or no loss of meaning, it is also good enough to translate Spanish into English with no intermediate step! If this is true, what's the point of UNL?
Another point is: how can you be an expert in an "intermediate language"? Either the language is "human-readable", but probably produces output comparable to sludge, and correcting this sludge may introduce additional errors (not to mention the pain of checking something that borders on the unreadable). Or it is machine-readable, but in that case, who is going to read it?
Final point is productivity: going through UNL and machine translation may take longer than a simple translation done "by hand" with human grey matter. A Windoze95 machine with MS Word and some good paper or digital dictionaries is, in many cases, more efficient and cheaper than going through the pain of machine translation.
The hope is to minimize or eliminate step (4).
Good luck! Frankly, this has been the "Holy Grail" of machine translation ever since it started, and I do not think we are any closer. So far, every large international institution that I am aware of (UN, UNESCO, EU Commission, EU Parliament, NATO, IMF, etc.) either uses tons of translators or has standardized on a couple of languages at most (English being, of course, the "lingua franca"). All the large international institutions mentioned above that use machine translation have discovered that, even on simple subjects, the 4th step you describe above is the one that consumes the most time.
It would be a big win if you could get to the point where all the hard stuff is done just *once* instead of repeated over and over again for all of your target languages.
Again, this is the "Holy Grail" of machine translation. I don't believe that we are any closer to it than we were, say, 30 years ago. At least not judging from the output of some of the software available out there...
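The "do the hard stuff just once" argument is really a counting argument. A minimal Python sketch, assuming every language pair must be supported in both directions and that a direct system is needed per ordered pair, while an interlingua needs one analyzer and one generator per language:

```python
def direct_systems(n: int) -> int:
    """One system per ordered pair of languages (A->B differs from B->A)."""
    return n * (n - 1)


def interlingua_systems(n: int) -> int:
    """One analyzer (language -> meta) plus one generator (meta -> language) each."""
    return 2 * n


for n in (2, 3, 16):
    print(n, direct_systems(n), interlingua_systems(n))
```

For the 16 languages UNL initially targets, that is 240 direct systems versus 32; the two approaches break even at 3 languages. The count says nothing about quality, though: it only pays off if each analyzer and generator is good enough that per-language review (step 4) does not eat the savings.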
And no, this will not work for poetry or humor, but there's no good way to translate poetry and humor in any case. The idea would be to get it to work with technical, legal, and business language.
Sorry to say this, but this does not work very well for legal or technical language either. It may work for business, since PHBs are so limited intellectually =). Legal translation can be horrendous: I have translated many legal documents in the past and I can tell you there is nothing worse, because legal terms are incredibly complicated and old-fashioned, and legal trivia has to be rendered in a very exact manner. Legal terminology (in almost every language) is among the most confusing and complicated of all. Plus, lawyers and legal people are a major pain in the neck when it comes to... Once you get the terminology right, I agree the rest of a legal document is usually a matter of "filling in the blanks". But getting the legal terms right is enough to drive you nuts.
Technical translation is another problem. I think some technical areas may be the best bet for machine translation yet. The problem, as far as the technical field is concerned, is that in fast-moving areas (computer science is one) the technical vocabulary changes and evolves so fast that it's hard to keep pace. I read up to 5 computer magazines a week (not to mention a daily dose of Slashdot =) just to keep up to date with the latest evolution in language and technology. Keeping a UNL database of terms and translations up to date could prove to be a daunting task...
>What's so special about UNL? Theoretical translation of language A into a universal language and from there to language B is almost as old as "machine" translation itself.
The fundamental argument, that it hasn't worked before so it isn't going to work now, is stupid. It has been demonstrated how difficult it is to do this, but not that it is impossible.
Please note that I never said (in the sentence you quote above) that this is not going to work. I just said that, as far as I am concerned, using an "intermediate" language is old news. This may be a new and interesting idea to you, but, frankly, for someone who has worked in translation, you could very well trace this concept all the way back to Volapük and Esperanto. And these two were invented in the 19th century.
As far as I am concerned, I think you could prove that correct translation is impossible. All you would have to prove is that a "human" language is a chaotic complex system, which usually follows unpredictable rules and has several strange attractors, inducing a runaway complexity.
Case in point: English. Roots: Saxon dialects, Norman dialects, Old English and Old French. Latin. A little bit of Greek. Maybe German and Old Dutch. Evolution influenced by French and a myriad of other languages. Now divided into several branches (US English, British English, Irish English, Australian English, Indian English, International English), all of them influencing each other and countless other languages. Reducing the English language to a set of neat little equations and computer routines is left as an exercise to the reader... =)
Please understand me: computer translation of "basic" English into UNL and from there into Chinese, French, Spanish, Italian, Japanese, etc... is no big deal. Computer translation of highly technical/scientific papers may be achieved. But even then, due to the inherent complexity of English (or any other human language), a human will have to review the machine translation and correct it.
I therefore suppose that perfect translation does not exist (or is impossible). Translation (like programming) is an art, not a science. You can have a certain number of "artistic" rules, but you cannot have a "perfect", scientifically proven, solution.
Example: give a problem to be solved to two good programmers, and they'll probably come up with two different and equally valid solutions. Which solution you pick has to be determined by other factors (speed of implementation, maintenance and evolution of the system, optimization, resources used, etc).
Give a translation to be done to two good translators and they will probably come up with two rather different and equally valid translations. Which one you pick is then determined by other factors (length of translation, speed of said translators, price of translation, style, etc). Complex systems, like languages, cannot be reduced or predicted. They can be analyzed and more or less "solved" -- the quality of the solution being dependent on many factors, such as the experience of the specialist, his choice of tools, etc. This is true even in reductive or limited systems where, for instance, the vocabulary used is small (see technical translation above).
Remember the butterfly in Brazil that creates a storm at the other end of the world? I suspect translation (especially multiple language translation) may well be the kind of complex system that is so hard to solve using computers.
Perfect translation, like perfect programming, is only possible in a very limited scope. A "DO
>For a good example of the total and dismal failure of machine translation,
>try translating this text into French (or Spanish, or Italian, or whatever)
>with Babelfish and back to English. Then do it a few times. Then try
>English to Chinese and back a few times. Case closed.
Hardly. Here's why that is not a valid test:
1. Babelfish doesn't use an intermediate language.
2. Babelfish doesn't even achieve lossless translation from language A to B and back to A. This is the simplest case, and one which can be improved the most with a good definition for UNL.
1. An intermediate language would introduce even more bugs into Babelfish translation. See above.
2. "Lossless" translation is impossible. See above. Complex systems, such as human languages, cannot be reduced easily to a set of equations.
>It is, in fact, an even better AI test than the Turing test.
They do not claim perfect translation, but yes, a computer that could translate between languages and do it perfectly would pass the test. Do you really argue that it is impossible for computer programs to ever pass the Turing test? It is only a matter of time until this happens. The only way to stop it is to stop making computers.
Actually, I thought a computer had recently managed to pass the Turing test, or some limited version of it. Could anyone out there supply information on this one?
But I don't think the Turing test is actually a very good AI test. There is a huge difference between a program that is able to "talk" to you (parrot back what you said) and one that is able to understand you. A computer able to understand human language would probably be the first real AI on this planet. Most Turing-test software is based on some variation of Eliza, and that has been around for ages.
Here we are reading
Well, this may be surprising to you, but the work of a professional translator has not evolved very much. Computer and communication technologies have eased their task a lot. Like many other professionals, translators are now able to work from home, access the Internet and its wealth of information, send documents to clients by e-mail, and even use some very clever software that eases the translation process (TM/2, Trados, etc.).
Word processing, in particular, certainly is the best thing to happen to translators since sliced bread =). Also, I agree that many new translation fields have been added in the past century: biology, computer science, aerospace, etc.
But the central fact remains this: to be a translator you have to be fluent in (at least) one language, a native speaker of another, and have good expertise in one or more fields of human activity. That's it. Oh, and you have to have a certain "talent" with languages, just like you need to have a certain "talent" for programming. It's an art, remember? Even the best-trained translator is worth 0 if he/she does not have that special "talent". Exactly like a lot of people work on Linux, but there is only one Linus Torvalds. =)
We may translate faster, have more tools and information at our disposal, and produce better-looking documents, but the core skills remain the same and the work process is exactly the same. You could train a translator today in the exact same way they were trained 100 years ago: with a pen and a piece of paper. Sorry to disappoint you, but computer technology is not always the perfect solution it claims to be...
That's All Folks!
Poor website (Score:3)
Interesting, but maybe off the mark (Score:2)
Although this is an interesting idea, it assumes that all languages are based on one abstract "map". IMHO different languages have different maps. Having spent a fair amount of time learning ancient Greek in high school and college, I can say that the map for that language is quite different from that of English, and those are both Indo-European languages.
The concepts that exist in one language may not exist in many other languages, which is often very problematic. Eventually, to learn any language, you must actually just start thinking in it instead of translating into your native language. Contemplating the 3 voices in Greek (active, passive, and middle) is something I rather enjoy doing, as it is very foreign to English.
I am just afraid that they will have to produce a Least Common Denominator language which won't be useful for anything beyond technical specifications and instructions. I will have to admit that that would be useful on many fronts, but may not be the dream that we were all hoping for.
Who is gonna patent this first? (Score:2)
Amazon does it with e-commerce 1-click. Microsoft does it with style sheets. Hell, if it's a good, interesting technology, why not? Let's take it and patent it to death! Then we can charge everyone for it and make a zillion-and-one dollars. Perhaps I should send in my application today!
There needs to be limits on patents. Yes, I believe they do foster invention, but they also can stop community work on a really-good-thing.
Perhaps we need a community patent agency: an easy, low-cost way to set up patents held by some sort of group for the explicit purpose of keeping some basic ideas *free* for us geeks and the rest of the world.
Really, it shouldn't have to come down to this tho. But someone will patent the implementation of this and we will all be screwed.
My $0.02
It won't work. (Score:2)
Yorkshire, UK, for example, still uses "thee" and "thou". If you translate this into some kind of meta-language, it's either going to barf, or lose details. Those details may be important to meaning. God only knows how it'd cope with Cockney slang, or even common phrases (eg: "from the horse's mouth", "a sticky wicket")
As I see it, this can ONLY work for formalised documents, using a formalised subset of the various languages. eg: Legal papers, UN treaties, etc. It'll NEVER work with informal, written language.
Re:sounds difficult - not as you say (Score:2)
Verb tenses are not the problem. Every language can express every tense, just in a different way. Hard yes, impossible no.
Additionally, approximations work well enough. Ex. Most English readers couldn't tell you the difference between past tense and preterite(sp?) tense.
Grammar is easily defined. 90% of language could be described in a BNF. adv-adj-noun in one, noun-adj-adv in another. So what. That is probably the simplest part.
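The word-order half of that claim is easy to sketch. Assuming two hypothetical languages "A" and "B" whose noun phrases follow `<np> ::= <adv> <adj> <noun>` and `<np> ::= <noun> <adj> <adv>` respectively, a role-tagged phrase in the meta-language can be linearized per language from a template (the names `ORDER` and `linearize` are made up for illustration):

```python
# Hypothetical word-order templates, one per target language:
#   A: <np> ::= <adv> <adj> <noun>
#   B: <np> ::= <noun> <adj> <adv>
ORDER = {
    "A": ["adv", "adj", "noun"],
    "B": ["noun", "adj", "adv"],
}


def linearize(phrase, lang):
    """Emit the words of a role-tagged phrase in lang's order,
    skipping roles the phrase does not fill."""
    return " ".join(phrase[role] for role in ORDER[lang] if role in phrase)


phrase = {"adv": "very", "adj": "big", "noun": "house"}
```

This is the easy part only because the roles were already assigned; deciding from raw text which word fills which role is where parsing gets hard.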
My interest would be in the meta-language design. Words by number? By string? Grammar by parsing into a standard format, or by classifying each word? Are there multiple ways to organize a statement? What about this "word hierarchy" they talk about? Quite cool there.
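One possible answer to those questions, sketched in Python purely as an assumption (this is not the UNL specification): identify concepts by number, keep an English gloss only for human reviewers, link concepts into an is-a hierarchy, and express a statement as a set of labelled relations, so word order never appears in the meta-language. All names and IDs here are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    cid: int                          # language-neutral number, not a word
    gloss: str                        # English gloss for human reviewers only
    parent: "Concept | None" = None   # link in a hypothetical word hierarchy

    def ancestors(self):
        """Walk up the hierarchy, e.g. dog -> animal -> thing."""
        node, chain = self.parent, []
        while node is not None:
            chain.append(node.gloss)
            node = node.parent
        return chain


THING = Concept(1, "thing")
ANIMAL = Concept(2, "animal", THING)
DOG = Concept(3, "dog", ANIMAL)
BARK = Concept(10, "bark")

# A statement is a set of (relation, head, dependent) triples.
STATEMENT = {("agent", BARK, DOG)}


def related(statement, label, head):
    """All dependents attached to head by the given relation label."""
    return {dep for (lab, h, dep) in statement if lab == label and h == head}
```

The hierarchy gives a generator a fallback: a language with no word for "dog" could still say "animal".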
This is a very good idea (Score:2)
Okay, so some of the idioms and convoluted sentences will be improperly converted, and will need some manual tweaking. Hopefully this system will allow this tweaking to take place. By providing multiple different conversions back into the author's native tongue, they may be able to see some of the translational oversights, and fix them.
This won't be good for poetry, but will allow people who only know one language (English speakers seem more likely to fit this category than other people) to publish documents readable by people who do not speak English - that's a substantial breakthrough.
It would be nice if this standard would allow segments to be set to fixed translations, so that if I really wanted the English to read a particular way, I could enforce that particular idiom, without loss of generality. ("Normally translate 'it has a low probability' but if you ARE translating to english, substitute the literal string 'fat chance'")
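A minimal Python sketch of such an override table, assuming a hypothetical segment ID scheme and a stand-in generator (neither comes from UNL): the author's literal wins for the named target language, and every other language falls back to normal generation.

```python
def render(segment_id, meta, target_lang, generate, overrides):
    """Use an author-supplied literal for (segment, language) if present,
    otherwise fall back to the normal generator."""
    key = (segment_id, target_lang)
    if key in overrides:
        return overrides[key]
    return generate(meta, target_lang)


# Stand-in generator that just tags the concept with the language.
def fake_generate(meta, lang):
    return f"[{lang}] {meta}"


OVERRIDES = {("s1", "en"): "fat chance"}
```

Keying on (segment, language) rather than on the phrase itself keeps the override from leaking into other sentences that happen to contain the same concept.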
it will fail (Score:3)
Automatic translation between languages in the EU is something that could save a lot of money. So there have been a lot of research projects funded with loads of EU money to accomplish this. All of these projects have failed (as far as I know).
This seems to be a similar effort, this time by the UN, which is an equally bureaucratic organization. I think the goal of this project is probably too ambitious to work. Even translations between two related languages (English and German) are troublesome (Babelfish, for example, is not exactly perfect), so I can't see why translating to an intermediate language would change things (ever tried doing that in Babelfish? The result is not pretty).
So, it will probably fail and loads of money will be wasted on it.
Spelling errors (Score:2)
A holy war could be started because of the sentence:
If it's spelled incorrectly.
Use your imagination.
UNL? Yeah, right! (Score:3)
Several points -- for full disclosure, let me just state that I am a localization engineer with 5+ years of experience in software localization (read: adaptation into different languages) and 7+ years of experience in translation. If that does not make me qualified to comment on this, I don't know what does.
Of course, I may be completely wrong and UNL may be the next best thing since sliced bread. But I doubt it.
And here's an antecedent... (Score:2)
Wow. I've been on
Do they have a clue? (Score:2)
Are there any actual computer scientists or linguists involved with this project? Their web site looks like it's either a team of bureaucrats or fifth-graders.
Re:Language support - Esperanto? (Score:2)
In German it is possible to use the definite article to refer back to something used in the previous sentence, rather like `it' in English, but with the crucial distinction that what we refer back to must be of matching gender. So if a masculine, a feminine and a neuter word occur in the sentence, it is possible to refer to any of them with the `it'. This ability to refer on the basis of gender must be captured in our syntactic model. Similarly, the case system allows one to have multiple objects (one accusative, one dative and one genitive, for example) directly attached to a verb, where in English one would use a preposition.
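A minimal Python sketch of gender-based reference, assuming a three-noun sample lexicon and nominative singular articles only. That is a simplification: "die" is also the plural article and "der" is also feminine dative/genitive, so real resolution needs case and number as well as gender. The names are hypothetical.

```python
# Grammatical gender of a few German nouns (small hand-made sample).
GENDER = {"Hund": "m", "Katze": "f", "Haus": "n"}

# Nominative singular definite articles used as pronouns, by gender.
ARTICLE_GENDER = {"der": "m", "die": "f", "das": "n"}


def resolve(article, previous_nouns):
    """Return the candidate antecedents whose gender matches the article."""
    g = ARTICLE_GENDER[article]
    return [n for n in previous_nouns if GENDER.get(n) == g]


previous = ["Hund", "Katze", "Haus"]
```

With one noun of each gender in the previous sentence, each article picks out exactly one antecedent. English "it" carries no such signal, so an English generator has to rely on context, and a meta-language has to store the resolved reference rather than the article.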
one big bland language coming right up (Score:2)
On the other hand, they just might be able to come up with a way to map a small subset of natural language, computer-speak for example, for the purpose of easing the creation of internationalized apps and making web sites more navigable. But I don't see how this could be successful in the general case.
Noam Chomsky and the Universal Grammar (Score:3)
Universal grammar is defined by Chomsky as ``the system of principles, conditions, and rules that are elements or properties of all human languages... the essence of human language'' [Chomsky, 1978].
Thus, all the languages we are accustomed to (English, Arabic, Malay, Japanese, and Chinese) are special cases of a universal grammar. Chomsky and subsequent linguists are looking for the elements common to all languages.
Universal grammar and the innateness hypothesis
Universal Grammar in Prolog [nyu.edu]
There is lots of discussion about this... see google [google.com].
Key differences from Esperanto (Score:2)
1) It's not Romance-based and thus won't be as Euro-centric and will thus probably translate Eastern languages better.
2) It's designed as an intermediate language and not as a final end-user language. As far as I can tell, it could even be machine-readable and not speakable. In any case, it will not have as many constraints as a language like Esperanto that is designed for human speech.
These are just my guesses. I don't know what kind of language they're actually trying to implement. (The website is skimpy on those details.)
Interesting, but... (Score:4)
Re:Interesting, but maybe off the mark (Score:2)
Re:Interesting, but... (Score:2)
In actuality, English is perhaps the most complex language in modern use, for a number of reasons. It has by far the largest vocabulary; it takes root words from many, many other languages; its rules of grammar are highly irregular; because printing was introduced before the Great Vowel Shift, the spoken form of English does not agree at all with the written form; and its geographic spread is so large that several dialects, pidgins and patois forms of English now exist. If you propose that we adopt English as the base language, you are going to have to be very specific about WHICH English, using what local idioms and rules.
The richness and complexity of English is perhaps best exemplified by the richness of its body of great literature and poetry, where expression and levels of meaning are best brought to form by a language that has a great richness of vocabulary and the ability to express multiple levels of ideas in a single word. Of all the languages of the world, there are three that clearly have great bodies of literature: Sanskrit, Greek, and yes, English.
Linkages (Score:2)
Re:Noam Chomsky and the Universal Grammar (Score:2)
Bahasa Indonesia (Score:2)
has, to my knowledge, only one tense. And no articles. And plurals are marked by saying the noun twice ("orang" is person, "orang-orang" is people).
Needless to say, there isn't much poetry in Indonesian...
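The reduplication rule is simple enough to sketch in Python (function names hypothetical). Two caveats keep it from being a complete generator: Indonesian often leaves plurality unmarked when a number or context makes it clear ("dua orang", two people), and some reduplicated words are not plurals at all ("kura-kura" means turtle). So a meta-language carrying a plural flag still needs a rule for when to reduplicate.

```python
def plural(noun):
    """Mark plurality by full reduplication, as in orang -> orang-orang."""
    return f"{noun}-{noun}"


def is_reduplicated(word):
    """True for words of the form X-X. This is not proof of a plural,
    since lexicalized forms like kura-kura ('turtle') also match."""
    head, sep, tail = word.partition("-")
    return bool(sep) and head == tail
```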
Re:It won't work. (Score:2)