The Internet

A Universal Networking Language for the Internet? 291

Anonymous Coward writes: "The United Nations University is developing a Universal Networking Language for the Internet, which is designed to allow effective communication between people writing in their native languages, with automatic conversion through an intermediate Meta-language (perhaps a precursor to Star Trek's Universal Translator.) They will be holding a symposium on the technology on 18 November in Brussels, Belgium, where they will publicly announce their achievement. They claim that the initial stage of UNL will support 16 languages: Arabic, Chinese, English, French, Russian, Spanish, German, Hindi, Italian, Indonesian, Japanese, Latvian, Mongol, Portuguese, Swahili and Thai." An interesting idea, but this is one of those "the devil is in the details" things. It'll be interesting to see how/if this can work.
This discussion has been archived. No new comments can be posted.

  • Actually, as long as the final interpretation is done by a human rather than by a computer, some parts of the understanding can be let fly.

    OTOH, a transformational grammar has not yet been shown to be powerful enough (at least I haven't heard that it has). I think that one would require a complete ATN network with recursion. Bounded recursion would probably be sufficient, as I don't feel that folk understand more than about three layers. Certainly it only goes deeper as a stylistic perversion of normal syntax (but fashion can do strange things).

    A worse problem is divergent mappings. No language uses an atomic view of the world, so each concept in each language is a set of items selected from the universe of possible concepts. This can be noticed even within a single language when moving from one dialect to another. It is most easily noticed when discussing things that map readily onto sensory images, e.g., "What is your name for the color of the object?", but it exists in all aspects of language. (What is the difference between "dog" and "hound"?) When one translates the term "black hole" into Russian, I am told, one must use a different term, because in Russian a "black hole" is something specific which is not astronomical (not sure what, but it was taboo).

    Now this is mainly something that can be handled by a lot of detail work. But I mean a lot of detail work. To get a very mild idea of a part of what I am talking about, pull out an unabridged dictionary and open it to a random page of definitions. Each meaning listed will probably need to be a separate term in the meta-language. And that's just the distinctions that an English speaker would notice.
  • Ok. In Russia, Russian is used for:

    1) Financial Markets
    2) Aviation
    3) Scientific Publication
    4) Popular Culture
    5) The computer industry ( along with English)
    6) Everything else that matters

    :)
  • There's no real technical information on the website, and no evidence at all that a linguist is actually participating in this project. It sounds like a bunch of computer scientists who think they understand language.

    Actually, the only real data they offer suggests that they are recreating the work Anna Wierzbicka was doing in the 80's with her ad-hoc theory of semantics. She ultimately showed why it wouldn't work, and now criticises the idea of using controlled language at all for machine understanding.

    No, these people don't seem to have any idea what they've gotten themselves into. This kind of thing was what I did graduate work on. Controlled language is a useful idea, but a very limited one, and using pivot languages for translation will only take you about as far as Systran's system (the one used in Babelfish).

    There are much more sophisticated efforts going on elsewhere, and even those are getting bogged down in the ugly reality of natural language. This will languish and go nowhere. With some luck, some more realistic project, like some of the automatic text summary projects and natural language to knowledge base projects, will eventually produce a usable product, but this UN university effort sounds like a waste of time.
  • The Distributed Language Translator (DLT) was a project in the Netherlands that made a first pass at this 10 years ago. They started with Esperanto and then made some changes to disambiguate words (even more than Esperanto already does). It worked pretty well, but suffered from the same kinds of problems anyone who's used translating software has seen before. Here's a nice article about it [geocities.com] -- in Esperanto [esperanto.net], of course.

    What's evil about these projects, of course, is that they don't let people just talk to one another. It would be neat to be able to have access to the literature of other countries, but that pales in comparison to having access to the people in other countries. If you just learn Esperanto you can really converse with people without needing technology or anything. It just works.

  • Uh, does that mean the end of the world? Didn't the creators of the tower of babel get smitten or something? I remember something about god not being happy so he did something and destroyed the tower...
  • Why would anyone want their web page to read as if it's been run through Babelfish? A translation from netspeak into, say, English is always going to suffer some mangling, and most likely is not going to allow idiom, metaphor, etc.

    Machine translation will improve, but the best organization is still going to be browser or proxy based translation. If that translation package internally uses an intermediate semantic representation, then fine, but the day /. reads like Babelfish crap is the day I find myself an English web site.

    You have to admire the democratic thinking though (NOT!) - rather than just foreigners seeing your web page as crap, you can (must) see it that way too! Designed by politicians, no doubt.
  • While I don't believe myths and such, it is rather scary how it matches the story of babel. Since we can't be scattered to the corners of the earth, what'll happen?

    http://logos.uoregon.edu/polyphonia/babel.html
  • IDNS.org [idns.org] has a spec for non-ASCII domain names. They have a modified version of Bind available for download.

    Getting this adopted universally is nontrivial.

  • > Didn't the creators of the tower of babel
    > get smitten or something?

    Well, "smited", perhaps.

    I think "smitten" has a _slightly_ different meaning there...


    Cheers,
  • Forget support for Esperanto -- just use Esperanto as the intermediary language it was designed to be. Somehow I don't think encouraging people to include support for ISO 8859-3 in operating systems, browsers, etc. is going to be any less difficult than making allowances for bi-directional text in any of a number of character sets, to say nothing of language nuances (quick, how would you translate "Gemütlichkeit" into anything but German?). Esperanto is not that hard to learn, even for non-Indo-European-language speakers (there have been, and presumably still are, significant Esperanto movements in Japan and China, for example). The grammar can be grasped in about 30 minutes and you can carry the essential vocabulary around in your wallet.

    I know, I know, people are going to come up with reasons not to use Esperanto. But it seems like if a solution that will work exists, why not use it?

    (Note: Even though I like and occasionally use Esperanto, I would welcome use of a similar language like Interlingua or Latino sine Flexione that would be equally easy to learn and do the job just as well.)
    --
    Iun vi konfidas, kun ni li alig^as.
    --
  • Chomsky revises everything he thinks every 10 years or so. The existence of a universal grammar of the type Chomsky currently advocates (and it is by no means clear that this is true) still doesn't necessarily mean that we can construct a common, useable language for everyone. Remember, every language used in the world is one of those "special cases."

    Chomsky claims (despite evidence to the contrary) that syntax can be analysed apart from semantics, implying that if we could agree to a universal word list and definitions, it might be possible to devise an equally neutral grammar to use for machine translation. However, it is quite clear that words, even pseudosynonyms, don't mean the same thing in different languages.

    My inclination is that Chomsky is just plain wrong about it in the first place: that there is no universal underlying order of constituents, but rather that human language structures are constrained to a subset of all valid ways of organising information linearly, and that those constraints are biological.

    This means that any real machine translation requires us first to make real progress in understanding how humans process and store linguistic information. This field is in its infancy.
  • This is all well and good until someone in the UN declares that isn't a word anymore...
  • I'm surprised nobody's mentioned Hofstadter yet; he had a pretty good translation of Jabberwocky into German and French. Should you translate "Campbell's Soup" to "Borscht"? "Jakobstrasse" to "Jacob Street"? Why bother translating Dickens; just read Dostoyevsky!
  • Quote from their document [unu.edu]:
    UNL is designed with the following aims:
    (1) UNL is to be capable of exactly representing all the information expressed in any language.
    (2) UNL expression must be defined not only as rigorous but also as general as possible in order to be understood by any people who are engaged in the development of "enconverters" and "deconverters" in each language.

    Not only is point one completely and utterly impossible for reasons well discussed here already (slang, local expressions, evolution of languages etc.), point two actually contradicts point one! They want UNL to be an exact representation of the meaning expressed in the native language, while simultaneously having it be generic enough that everybody (or at least all "enconverter" developers) can understand what is being said. Assuming the average "enconverter" developer will be as technically (il)literate as the authors of this document, there's no way they are going to understand what technical people are talking about even in their own native language. No way is UNL going to help with that. So how, then, are they going to understand that very same conversation translated from a language they don't understand in the first place? Forget it!

    Nice idea. Store it in the bin with all the other equally nice ideas: "Health and food for all" and "Can't we all just get along?".

  • Akatosh dun said:

    The concept is nice, but you're still stuck with the problem that most languages are based on anecdotal references as well as actual words. You can translate the words, but the concepts will still frequently be lost.

    It's very interesting that you bring that up. Idioms can be a bear to translate at times, much less cultural references (even from English to Spanish and back--in many fansubbed animes, the fansubbers have to include a section at the beginning for cultural references and idioms that Americans wouldn't necessarily get but Japanese audiences would). Not only that, some concepts do not translate clearly across languages (I actually find it easier to think of the Japanese concept of honour in terms of the Tao or the Dine' {Navaho} concept of the Path of Beauty than in English!).

    A really good shot of how translation can require translating idioms and noting cultural reference is the discussion of the upcoming American release of "Mononoke Hime"/"Princess Mononoke" (click here [princess-mononoke.com] for the gory details :). Neil Gaiman is translating for the dub, and apparently there were multiple major issues in translating it including:

    The fact the entire dialogue in the movie is not in modern Japanese but in an archaic form (roughly akin to Middle English or the old form of English used in the King James Bible)

    A mess of cultural references that Americans would not be aware of (such as one of the main characters cutting his hair--in Japan this is recognised as a sign that a warrior is leaving forever and is to be counted among the dead)

    A number of idiomatic phrases that had to be translated into American idioms (such as a comment that a character's soup tasted like water--which is about as low as one can go to insult one's cooking...this ended up being retranslated into "Your soup tastes like piss" which is more understandable to silly gaijin :).

    Needless to say, it was quite illuminating...especially since some cultural references were noted that I didn't pick up on the first time I saw it (I've seen the fansubbed version) and I'm an otaku. Apparently Gaiman has rewritten the script explaining some stuff that American audiences wouldn't catch, either...and to be honest (IMHO) Gaiman is probably one of the few people who could've pulled it off.

    Another really good example of this is the first tape of the anime "Compiler"--which was dubbed, but they STILL had to explain at the end why a giant Colonel Sanders turned into a Japanese baseball player and defeated a mad statue :) (Basically...Randy Bass won the Japanese equivalent of the World Series for the Tigers...the celebrating fans grabbed a statue of Colonel Sanders from a KFC, it being the only Anglo-looking statue that could be found, and threw it into the sea...they have not won the pennant since, and legend goes that the town will not win the pennant until the statue of Colonel Sanders is retrieved because the sea gods are pissed. :) Neat story, but not one most Americans would get...then again, the Japanese wouldn't get why octopi are often thrown at Detroit games if they get in the Stanley Cup :)

  • Of all the languages of the world there are three that clearly have great bodies of literature - Sanskrit, Greek, and yes, English.

    Hmm, I assume that there was an implied 'only' in there. I have read a few Chinese authors and poets who would very strongly disagree with you, my Eurocentric friend. In fact I would venture to suggest that the body of literature in Chinese is substantially greater than in either Sanskrit or Greek, although I freely admit that I have absolutely no facts whatsoever.

    Anyone got any figures?
  • And it will probably take the UN 42 years to provide the first draft specs.

    In any case, sounds like a worthy effort.
  • It's definitely complex and inconsistent, but the point was that it's explicit--eg, instead of inflection you use prepositions and auxiliary verbs--and to that extent superior to many languages for scientific purposes. (Incidentally, Whitehead was a logician/mathematician, arguably the greatest of this century; I feel inclined to believe him when he says English offers an advantage in his own field.) You're absolutely correct, but I don't think that invalidates his statement.

    (And many people would argue that the body of English literature is no greater than that of, for example, German or Japanese.)
  • (I'm just thinking online here. I don't even know many spoken languages, but many of my Asian friends have spent long hours telling me how terrible English is.)
    I'm not sure English is the proper starting point for this type of a machine-read hyper-language. English is primarily a spoken language, with all the fuzziness that implies.
    What may be more appropriate would be to start with written Chinese. From what I understand, "Chinese" is already something of a hyper-language, with one written language expressing several spoken languages. Modify the set of ideograms to include some phonetic symbols (to properly represent the many names that are best represented as sounds). Ideally the syntax would allow for defining custom linguistic symbols, much like XML's ability to define custom tags. Tweak the hell out of this until you have a machine-readable language (do fewer than 2^16 standard "words" seem adequate? Should this blow Unicode out of the water and use 32-bit "words"?)
  • by stx23 (14942) on Wednesday October 13, 1999 @06:24AM (#1616409)
    I'm hedging my bets it will be fish shaped, and will fit into the inner ear.
  • What, no support for Esperanto?!

    -=-=-=-=-

  • Close, but not exactly. About the closest you might get in English would be "the coziness you feel when you're together with family and friends, or at a pub you're quite fond of." And even that isn't quite on the mark.
    --
  • We've already got English.

    Now, the Académie française may not like it, but English is already the language of:

    1) Financial Markets
    2) Aviation
    3) Scientific Publication
    4) Popular culture
    5) The computer industry
    6) Everything else that matters.

    English is the new Latin: Deal with it.

    -jcr
  • Well, part of the problem is that lojban really doesn't resemble anything so much as an explosion in a type factory. Esperanto and Interlingua at least have the occasional Latin or Greek root that's worked itself into worldwide usage.

    I guess the problem is that it's difficult to adapt a computer-friendly language to humans, or a human-friendly language to computers. But like teaching a computer to play chess, that doesn't mean it isn't worthwhile.
    --
  • Seems to me that it will be easier to write a translator from your native language to a very well defined and documented intermediate language than trying to understand the fine details of a non-native language.


    Is this what happens inside the head of a bi-lingual person? (This is posed as a question to any readers who might be)

  • Excellent post. Wish I had moderator points today, so I could move it up! Esperanto just works.

    I have come to believe that, in the human brain, the language center is tied somehow to the emotions, because people start acting irrationally whenever you start suggesting language alternatives. It's like asking them to change sexual orientation or something--their language is too strongly tied into their concept of personal identity to permit approach. So in an open forum, I seldom see anyone who is not already an Esperantist discuss the language objectively. Sad, really.

    But hope springs eternal. I post this URL every time, in hopes that it may someday be of interest to someone: If you are interested in Esperanto, the world's most popular constructed language, try the Esperanto.net web site [esperanto.net] for starters.

    As for the UNL, most Esperantists have been aware of it for some time. We wish them well, most of us, really we do. But most people who know more than one human language hold limited hope for such a project's success.

  • Trust me, no linguist will use this. It would be like getting a perl user to switch to TCL - they would carp for years about all the things they can't do the way they want to, assuming they can even do all the things they want.

    Other types of tech will probably steer just as clear of it when they realise how frustrating it is to compose for an artificial semantically unambiguous language.
  • I think I would be more inclined to agree with you if you were arguing that we should use English because other languages have a large number of words borrowed from it.

    Hamish
  • Now we can start work on that Tower of Babel again. :-)

    human://billy.j.mabray/
  • it will allow more and longer flamewars than anything else since the invention of SNTP!
    (with a nod to Douglas Adams)

    But still a very cool idea!
  • I agree, but I'd like to toss this goody in: AN Whitehead's Science and the Modern World included a section about language (as an analogy for mathematics, IIRC). One of the points he made was that while English is a shallow language, even compared to other Germanic languages, it makes up for it, in some ways, by being utterly explicit. Nothing is implied or masked by, e.g., inflection; the entire language is open, simple, and, to some extent, precise. (That's a bit of an exaggeration, of course.) I believe his argument was that that made English a superior language for science, where ambiguity is a Bad Thing, but I can see how it could be extended to this, in the form of using English as the lowest level of the metalanguage, then building protocols for the other languages on top, in a hierarchy of language features.

    Of course, this would probably ruin the entire project, but I'm not very confident that it will succeed anyway.
  • The concept is nice, but you're still stuck with the problem that most languages are based on anecdotal references as well as actual words. You can translate the words, but the concepts will still frequently be lost.
  • Two mistakes in the above:

    (1) Not every language has every tense. German has fewer tenses than English, and another poster said that Chinese has none.

    (2) Language can't be described in a BNF grammar: it isn't sophisticated enough to capture singular vs. plural, gender, case, verb conjugations etc. Phrase structure grammars extend BNF grammars with parameters to capture these, and Chomsky showed that these are sufficient to capture all of natural language.

    I would guess that the meta-language design is based upon transformational grammar, which exposes the essential similarities between sentences like `The door is closed' and `Close the door!'. This would allow it to express subtleties like different ways of representing the same sentence.
  • Is this what happens inside the head of a bi-lingual person?

    Uh, no. Typically, I believe bi-lingual people internally switch back and forth, or represent some concepts in one language and others in the other. That is to say, when they actually are using internal linguistic representations of things at all.
  • how old are you? 3, 4? this kind of comment is sad, you must have alot of personal problems to be so lame as to take a normal conversation and just start insulting it... lemme guess... your first /. post?
  • I am bilingual, speaking English and French. When speaking French I think in French. This continues until I come upon an idea/concept I don't know in French. I then switch to English thinking and try to translate another way of saying the idea/word into French. If I used a kind of non-linguistic referencing system it would be harder for just two languages. Any multi(4+)-linguists out there? Do you use a non-linguistic referencing system or do you reference everything back to your mother tongue?
  • Yeah, but then how are we supposed to access these sites with a 'western' keyboard? It's not that I've got anything against the idea. If it cost no extra to, say, register a domain using roman/asian/russian characters, then sure, no problem.
  • to quote George Carlin :)

    If I remember my German, you're talking about something like

    Mein Hut, der hat drei Ecken

    where "der" refers to "Hut". That's something that will have to be covered in the rules both for translation into and translation out of German, no matter what language you're using to go into or out of German. Otherwise you end up with the English translation being

    My hat, the has three corners

    where a proper English translation would of course be

    My hat has three corners

    Of course this is a very simplified example, but I think you get the idea.

    I just think that for the foreseeable future (and, since this is computing, that could be, oh, say, six months) the best computers on the planet are the ones we carry around in our skulls. To me it would make more sense to have a single language that everyone would agree on, but then the problem is to agree on the language. All of the "evolved" languages carry their own cultural baggage, and few people seem to think that a "constructed" language is up to the task, even though certainly Esperanto and possibly Interlingua and a couple of others have proven that hypothesis wrong.

    Of course just outside the foreseeable future everybody will be speaking Bocci anyway, so what the heck. :)
    --
  • Belgium is fast becoming the Mecca for speech and language technology with players like Lernout & Hauspie [lhsl.com] and projects like Flanders Language Valley [www.flv.be].

    All of Europe really needs these kinds of technologies, but Belgium is one of the more multilingual countries within Europe.
  • Since it seems related, I've had a dream open source project in mind for some time. Not so much along the lines of UNL as Babelfish. I think this is the perfect project for the open source model because people from around the world could contribute work relevant to their own languages. A proprietary project would have to employ many specialists.

    If anybody is interested in starting such a project, please reply in this thread.

    dos/tres equis
  • "First of all, I do not really believe the UN can produce anything remotely interesting, technically speaking. I like the IETF motto: "we believe in rough consensus, and working code". Show me the money^H^H^H^H^Hcode first, please. What's so special about UNL? Theoretical translation of language A into a universal language and from there to language B is almost as old as "machine" translation itself. As far as I remember, early EU research into machine translation were based on a similar idea -- and they were dismissed as a failure. For a good example of the total and dismal failure of machine translation, try translating this text into"

    I did.
    English -> French
    French -> English
    English -> German
    German -> English
    English -> Italian
    Italian -> English
    English -> Spanish
    Spanish -> English
    English -> Portuguese
    Portuguese -> English

    The end result???

    In the first place of all, crío to really distant distant of interest who the O.N.U all can produce and not point out technician. I have the taste of the modernity of the IETF:" we create in the agreement approached and the operation bases it ". the champions money^H^H^H^H^Hcode in the beginning, please. Which is therefore extreme special UNL? The theoretical translation of the language to the inside to a universal language and with of the language B is nearly therefore old here how much the translation " of the machine ". Like the memory, IT CREDITS the first jambs of capturing in the automatic translation it has been based on a similar idea -- and has been isolated like the landslide. For entire a landslide and it good slaughter houses of the example that the automatic translation, manages, in
  • You're right, but any decent "universal translator" will not stop at translating individual words. Its dictionary would extend to phrases of the sort you mention. Perhaps it would define "stand at window" as "stand .1m - 1m away from window, while normal vector from plane of body intersects window." Regardless, it would be quite a chore to accomplish this. Context is everything. The more subtle the meaning, the more context you need. For example, if I said "That's really smart", you don't know if I'm being complimentary, self-deprecating, ironic, or insulting.
  • French is a dead language, one whose speakers stubbornly refuse to admit it, and one whose primary nation tightly controls it.

    Plus, all those frogs like those nifty blue berets.

    So... ribbit?

    --Corey
  • I'm hedging my bets it will be fish shaped, and will fit into the inner ear.

    Well, if it is, I think it died at some point while it was stuck in someone's lughole...

    Reason being? Well... reading the English info about the project (which I can only assume was run through their "enconverter" and "deconverter"):

    Multi-lingual network aims to enable people to communicate in their mother language with peoples of different language. UNL is a common language shared by people over the world in multi-lingual network. UNL system basically consists of network and conversion program between UNL and native languages.
    A conversion system from native languages into UNL is called "enconverter", and that from UNL into native languages is called "deconverter". Information in each language, being "enconverted", is exchanged via network in the form of UNL. Information represented in UNL is "deconverted" into each native language on the terminal of network.
    In transmission of information, the information which is expressed in a mother language is enconverted into UNL. Preciseness of conversion can be verified by deconverting the UNL representation into the language from which the UNL is obtained.


    1. Why not call it "encoding" and "decoding" like the rest of the world?
    2. Internet is spelt "Internet", not "Inter-Net"
    3. The grammar is terrible. "Information represented in UNL is "deconverted" into each native language on the terminal of network."... er... anyone else see words missing from that sentence?


    I'm just hoping that they get people who can actually read and write their target languages fluently to do the testing...

    Simon
  • Here is an example so you can have a better feeling of what it's like:


    [unl-t]
    [unl-p]
    [unl-s]
    agt(win(icl>event).@past.@entry.@73,team(icl>collective).@def.@topic)
    obj(win(icl>event).@past.@entry.@73,match(icl>entity).@def)
    [/unl-s]
    [unl-s]
    agt(break(icl>event).@past,player(icl>male).@def.@140)
    obj(break(icl>event).@past,leg(icl>body).@372)
    mod(leg(icl>body).@372,player(icl>male).@def.@140)
    mod(leg(icl>body).@372,left(icl>state).@140)
    [/unl-s]
    [/unl-p]
    [/unl-t]

    So, this is a two-sentence, one-paragraph text.

    The first sentence has an agent (the team) who won something in the past, and an object (the match) which was won: "The team won the match".

    The second sentence has an agent (the player, who is male) who broke something, an object (the leg) which was broken, and modifiers which specify that this leg is that player's own left leg: "The player broke his left leg."
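To make the shape of those lines concrete, here is a rough Python sketch that pulls one relation line apart into its relation name and two nodes. It is only a guess at the structure based on the listing above, not the official UNL grammar: the `UW` pattern and the names `parse_uw` and `parse_relation` are made up for illustration, and real UNL expressions may use forms this does not handle.

```python
import re

# Assumed shape of one node: headword, an optional (icl>...) constraint,
# then zero or more ".@attribute" tags. Inferred from the sample only.
UW = re.compile(r"(?P<head>\w+)(?:\((?P<constraint>[^()]*)\))?(?P<attrs>(?:\.@\w+)*)")


def parse_uw(text):
    """Split one node such as 'win(icl>event).@past' into its parts."""
    m = UW.fullmatch(text.strip())
    if m is None:
        raise ValueError(f"unrecognised node: {text!r}")
    attrs = [a for a in m.group("attrs").split(".@") if a]
    return {"head": m.group("head"), "constraint": m.group("constraint"), "attrs": attrs}


def parse_relation(line):
    """Parse 'rel(node1,node2)' into (rel, node1, node2)."""
    name, _, rest = line.strip().partition("(")
    if not name or not rest.endswith(")"):
        raise ValueError(f"unrecognised relation: {line!r}")
    body = rest[:-1]
    depth = 0
    for i, ch in enumerate(body):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:
            # The first comma outside parentheses separates the two nodes.
            return name, parse_uw(body[:i]), parse_uw(body[i + 1:])
    raise ValueError(f"expected two nodes in: {line!r}")


rel, first, second = parse_relation(
    "agt(win(icl>event).@past.@entry.@73,team(icl>collective).@def.@topic)"
)
print(rel, first["head"], second["head"])
```

Read this way, each sentence is a small labelled graph: the relation names are the edges, and the headwords with their attributes are the nodes.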

    --
    "Show me the code" -- Linus.

  • This problem would map into a modern compiler architecture. The compiler architecture has multiple front-ends (languages) and multiple back-ends (machine architectures), bound in the middle by an intermediate, but heavily simplified, language. The idea is that a front-end parses and type checks the input and then outputs intermediate language. This can then be fed into any back-end built for a particular architecture.

    For example, if you have front ends for C and Fortran and backends for PPC and i386, then you can compile Fortran programs for PPC or i386 and also C programs for PPC or i386. Any combination. Add another backend, say MIPS, and with no extra work, C and Fortran compiling are possible for it too.

    When dealing with natural languages, you would need a front-end and a back-end for each language.
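A toy Python sketch of that front-end/back-end split, just to show why the pivot design scales. Everything here is invented for illustration: the "languages" are trivial arithmetic notations, the intermediate form is a plain tuple, and none of it is how UNL or any real compiler is implemented.

```python
# Front ends: each parses its own notation into one shared intermediate
# form, here simply the tuple (operator, left, right).
def front_infix(src):
    left, op, right = src.split()   # e.g. "1 + 2"
    return (op, left, right)


def front_prefix(src):
    op, left, right = src.split()   # e.g. "+ 1 2"
    return (op, left, right)


# Back ends: each renders the intermediate form in its own notation.
def back_rpn(ir):
    op, left, right = ir
    return f"{left} {right} {op}"


def back_lisp(ir):
    op, left, right = ir
    return f"({op} {left} {right})"


FRONT_ENDS = {"infix": front_infix, "prefix": front_prefix}
BACK_ENDS = {"rpn": back_rpn, "lisp": back_lisp}


def translate(src, source, target):
    """Any front end can feed any back end through the shared form."""
    return BACK_ENDS[target](FRONT_ENDS[source](src))


def converters_needed(n_languages):
    """Return (pivot design, direct pairwise design) converter counts."""
    return 2 * n_languages, n_languages * (n_languages - 1)


print(translate("1 + 2", "infix", "rpn"))    # 1 2 +
print(translate("+ 1 2", "prefix", "lisp"))  # (+ 1 2)
print(converters_needed(16))                 # (32, 240)
```

For the 16 languages named in the story, that is 32 converters through a pivot against 240 one-way direct translators, which is the whole appeal. The catch is that the toy intermediate form only works because both notations carry exactly the same information.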

    There are a number of catches, here are a few:

    • Finding the intermediate language. It should be possible, but a pain in the ass. After all, it has happened for computer languages and they vary widely.
    • Computer languages are confined to a certain syntax. To make this work, the input would have to be checked for valid syntax and type checked. In other words, poor grammar, incorrect use of words, etc. would simply not be allowed to get past first base. After tons of research, some AI might be introduced here to make the rules more flexible.
    • There will be a learning curve to using the system. Users will have to figure out what is valid. I think this goes for every system. Slang is going to send almost any solution guessing.

    Bottom line: of course a universal translator is possible, but until we discover the Babel fish or the brainwave-reading equivalent (would reading brainwaves be enough? would all species "think" alike?), there will be plenty of input restrictions. After all, some things just don't translate. Because of these restrictions, it will be infuriating and impractical to use.

  • Well, how do they connive to impress their girlfriends then? Or should I say girlfriend-girlfriend?
  • But note that Chomsky's UG hypothesis is about grammar, i.e. syntax. Semantics (which is what you're trying to preserve in translation) is another thing entirely, and something that Chomsky has always been pessimistic about being able to formalize.

    --Seen

  • It is most likely that the Meta-Language would only be able to handle a small subset of the meanings available in any of its natural-language counterparts. The subset of "Meta-meanings" would be the set of all meanings common to all the languages.

    Ideas like run, walk, buy, sell, etc. would easily translate... however things like "glark", "glob", "grep" may not translate accurately. That is, the 6 Russian verb forms you mentioned may all be mapped directly to only 3 verbal meanings.

    i.e.: grep to look, glob to list, and glark to understand... and so on.

    The resulting word elements could then be arranged by a simple pattern-matching AI into an acceptable form. The result is a valid natural-language sentence which has some shadow of the original meaning. In practice this could allow for useful business communication but prevent discussions of abstract ideas.

    Yet another fine example of how problem-domain-"scoping" affects over-all software functionality.




    - // Zarf //
  • If you run Universal Networking Language through their coder-decoder thing, it comes out as "Colossal waste of resources and money". In other words, Microsoft.
  • by Anonymous Coward
    Dear friend,

    Chinese is derived from Sanskrit.

    Thank you
  • Good point, especially when you try to deal with dialects or regional meanings. I'm Sicilian, and the Italian spoken in Sicily is a different beast from the one spoken in mainland Italy.

    They'd have to go with the "norm" of the language, like what is used in spelling dictionaries and school texts, I guess.

    I can see it now: some really silly-looking English sentences translated from Spanish or Italian.
  • Details aside, I think an idea like this could work well in the right context. Namely for a document which you don't want to contain nuance, for example, international law or multilingual web pages, etc.

    Here's how it might work:
    • Write your document in your native language
    • Translate it out to "Universal"
    • Translate back to native
    • Look for mistranslations and change the original to avoid this
    • Repeat until the out-and-back translation conveys the same meaning as the original
    • When you are happy, post the Universal version on your web site (and maybe ask a friend who speaks another language to read it once in her language)
    • Hope that the other deconverters are as good as yours
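
    The out-and-back loop in the steps above could be sketched roughly as follows. The converter, deconverter and revision step are toy stand-ins invented for this example (a real reviser would be you, editing by hand), and exact string equality is used as a crude proxy for "conveys the same meaning".

```python
from typing import Callable, List, Tuple

def refine_until_stable(
    text: str,
    to_universal: Callable[[str], List[str]],
    from_universal: Callable[[List[str]], str],
    revise: Callable[[str, str], str],
    max_rounds: int = 10,
) -> Tuple[str, List[str]]:
    """Revise the text until the out-and-back translation returns it unchanged."""
    for _ in range(max_rounds):
        universal = to_universal(text)
        round_trip = from_universal(universal)
        if round_trip == text:          # crude stand-in for "same meaning"
            return text, universal
        text = revise(text, round_trip)
    raise RuntimeError("no stable wording found")

# Toy stand-ins, assumed purely for illustration.
TO_CONCEPT = {"the": "DET", "team": "TEAM", "won": "WIN.past", "match": "MATCH"}
TO_WORD = {concept: word for word, concept in TO_CONCEPT.items()}
PLAIN_WORDING = {"squad": "team"}       # plainer synonyms the author might pick

def toy_to_universal(text: str) -> List[str]:
    return [TO_CONCEPT.get(word, "?") for word in text.split()]

def toy_from_universal(concepts: List[str]) -> str:
    return " ".join(TO_WORD.get(concept, "???") for concept in concepts)

def toy_revise(text: str, round_trip: str) -> str:
    """Swap any word lost in the round trip for a plainer synonym."""
    fixed = []
    for original, returned in zip(text.split(), round_trip.split()):
        if returned == "???":
            fixed.append(PLAIN_WORDING.get(original, original))
        else:
            fixed.append(original)
    return " ".join(fixed)

final_text, universal = refine_until_stable(
    "the squad won the match", toy_to_universal, toy_from_universal, toy_revise
)
print(final_text)
print(universal)
```

    Here the slang word "squad" does not survive the first round trip, so it gets replaced by "team" and the second round trip comes back clean.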

    This has the disadvantage that you lose some flexibility, subtlety and art in your writing, but you decided to give that up when you decided to go multilingual, right?

    The point is that if you write text specifically so that it can go to one foreign language and back smoothly, it's probably pretty translatable to many languages, I'm guessing.

    You can try this now with Babelfish. Take a passage of text you wrote in English, convert it to something (e.g. French) and back. Then edit the original until the English that comes back is decent. This will force you to remove colloquialisms and force you to work around deficiencies in the translation program, but isn't this worth it for a good translatable piece of text?

    Final note: We have all seen Babelfish make funny translations. There will always be some words/phrases that software cannot translate perfectly without AI. But certainly, we are all smart enough to craft text that software can translate well! As the software gets better, we can put less and less effort into this.
  • I'm not sure the internet is going to do Esperanto any favours, considering that anyone who uses the Net on a regular basis is likely to already speak English (which, to all intents and purposes, is the Lingua Franca for at least the next century).

    I'm all for the idea of composite metalanguages, for computers, but I don't see why anyone should cripple a metalanguage so people can use it too.
  • OK, according to the web site, the UNL, aka "the meta language", will be based on English, with a means for defining new words as long as you can provide a word in your original language AND locate it in the conceptual hierarchy. The most effective step they could take at this point to increase propagation would be to come up with an XML DTD for UNL dictionary entries and conversion/deconversion mappings. BTW, it looks like the web site was produced using UNL technology, and it's not too bad: not as good as a native speaker with strong rhetorical skills, but sufficient to carry technical and commercial traffic. The one thing it probably won't be very good at is translating persuasive text meant to convince people. Not such a great loss.
  • Is that the same tribe whose numbering system consists of "one", "two" and "many"? :)


  • Since when did Mongol become one of the world's major languages? Half the people in Mongolia are nomads, besides! That's like Al Gore's suggestion to bring the internet to Africa, to help the people who don't have electricity & running water. Weird.
  • Nah, English is piss-easy for Spanish speakers too, if they put half a brain cell to work on it.

    All it takes is learning a lot of words and working on your pronunciation, but English grammar is absurdly easy compared to that of most Romance languages, and can be learnt in an afternoon.

    I'd say French is harder for Spanish speakers than English, precisely because it has a complex Romance grammar, and a fucked up pronunciation to boot.
  • This project is doomed to hell. I will tell you why if you listen intently.

    The Boob Factor.

    That's right. The Boob Factor hasn't been addressed. None of these meta-languages or intermediate languages have addressed this important topic.

    What is the Boob Factor, you ask? Quite simply, the Boob Factor is you, it is us. We are the Boobs.

    Meta-languages or intermediate languages, we will assume, work on well-known grammatical and linguistic rules. In order to function correctly, these rules must be adhered to flawlessly.

    Let us examine the following statement:

    I like red meet.

    You the reader have been blessed to have a couple of ounces of grey matter resting on your more than likely underdeveloped shoulders. You have the ability to infer the offending Boob's meaning in this sentence. Do you place faith in a meta-language or intermediate language to do the same? I don't think so. The Boob Factor has reared its ugly little head.

    We should all sit back and wait until God has reversed his Babel of Confusion mayhem that he inflicted upon us in a drunken stupor. We can then all go back to speaking tongues in the master language of Sumerian. Oh, the joy for that day.
  • My first reaction on seeing this story was, "Wow, what a cool idea!" I'd love to work on designing this language. (And it is possible. All of the objections that I have seen in these threads can be resolved if you understand modern linguistics. Take a course, it's worth it.)

    However... Do we really need a new metalanguage? Couldn't we just as easily use an existing language as the intermediate form? It would be just as easy to translate, and you wouldn't have to learn a new language to understand the system.

    There's an idea in linguistics which is similar to the Church-Turing thesis in computer science, although it's not as well established: Every modern language is assumed to have equivalent expressivity. If you wanted, you could translate from English to Chinese using an Aborigine language as your intermediate without any problems. (Except deficiencies in vocabulary, but it's easy to make up new words.)

    I suspect the real need for this meta-language has to do with this project's association with the UN: They don't want to offend any ethnic group by choosing an existing language as the standard.
  • i have wanted to make a "universal language" for as long as i can remember, but i still have not found the time ...

    anyway, the real reason that esperanto is not successful is that it still has stupid rules -- for example, nouns still have gender which means that there are still too many pronouns and you still can not complete a sentence without knowing the gender of the subject.

    not to say that esperanto is bad, but we all know that esperanto is just spanish V2.0 and no one will admit it.

    a truly universal language must be written from scratch with all of the "fluff" removed. people say that you will lose the poetic qualities and you will lose the innuendo and colloquialism -- i say that these people are pathetic whiners who are trained to be cynical of anything which could be considered progress. "poetic quality" and innuendo have _nothing_ to do with the language in which they are written. you might prefer german opera to italian, but it does not make one any "better" than the other. either way, you could still write your poetry in the language of your choice -- and, thanks to the UNL, people will still know what you're talking about.

    the fact of the matter is, effective worldwide communication is a much more serious matter than an old-fashioned idea of what is "good" poetry. poetry will persist so long as there are good poets; we do not need to accommodate them with prissy, "romantic" languages. this just makes it easier for unskilled drunks to make more sappy, bad poetry.

    as they say, you have to crack some eggs to make an omelette. i say thanks to UNU for cracking some eggs, and to everyone that thinks they can improve upon any of the current languages, please stop picking eggs out of the trash.

    -abf.
  • Babelfish isn't great. We all know this. However babelfish doesn't use an intermediate language, and no, French is not an intermediate language. The idea behind an intermediate language is that you can have groups working very hard to get their language to translate into the intermediate language. Then, by doing that, their language can be translated into every other language that the intermediate language supports. If you worked for as long as it would take to translate your language into 10 different languages on only one language, you'd come out with a pretty good translation. An intermediate language would also have the advantage of being able to be optimized to be translated into and especially translated from.
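
    The saving from an intermediate language is mostly arithmetic. A small sketch, with function names made up for the example and assuming one one-way translator per direction:

```python
def direct_translators(languages: int) -> int:
    """One translator per ordered pair (A -> B), with no shared middle step."""
    return languages * (languages - 1)

def pivot_translators(languages: int) -> int:
    """One converter into and one out of the intermediate language, per language."""
    return 2 * languages

for n in (2, 3, 16):
    print(n, direct_translators(n), pivot_translators(n))
```

    For the 16 languages named in the story that is 240 direct translators against 32 converters. The two counts tie at three languages, and the intermediate language only pays off beyond that.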
  • 1. No articles? You gotta be kidding!
    2. One tense is not necessarily true. We have other ways to express tense in our sentences.

    I don't want to explain much about Bahasa Indonesia, since this is not a linguistics site.
    But, as a person who has been involved in several computational linguistics projects in Bahasa Indonesia, I can say we do have our own difficulties to deal with.
    One of them is verb formation, which is very flexible. This makes a stemming algorithm work harder for Bahasa Indonesia than for other languages.

    regards,
    The Doc

  • As a person who speaks only English, but resides in a country where the official languages are French, Flemish (Dutch by any other name), and German, I can tell you from personal experience that it is *much* easier to understand someone speaking another language than it is to try to make yourself understood in another language.

    In my experience, this is universal.

    I, for one, find I can get my idea across in Spanish a lot more often than I can understand what a native Spanish speaker is trying to tell me.

    dos/tres equis
  • It appears that the people at UNU who discussed this idea of UNL didn't bother to talk to anyone trained in modern linguistic theory. (I don't claim to be trained, but I have more than a passing familiarity with linguistics, *and* I read /., so...)

    The first major problem that they will have is defining a syntax for the language. That's not so tough if you just define an arbitrary syntax and leave it at that. But I suspect that they will try hard to design a syntax that distills the most popular aspects of each of the languages that they're translating from, thus getting stuck in a linguistic tar-pit from which they will never escape. I hope.

    You see, there have been many attempts at discussing the "universal syntax", that is, the base syntax for the language that the brain uses. In most flavors of Chomskyan syntax theory this is termed something like "deep structure" (lately it's been "D-Structure" to avoid any implied but inaccurate meanings of the word 'deep'). DStruct is in essence the most general syntax needed for accurate expression of any sentence structure in any human language. It's supposed to be general, not differentiating between different languages on the syntax level (notice that I haven't mentioned meaning yet -- that's something completely different, the /semantics/ of a language).

    This unfortunately isn't usually the case, since many languages like to put different sentence constituents in different locations within a typical sentence structure. English, f'rinstance, is said to have an SVO (Subject Verb Object) word order. That is, the Subject of a sentence will be the first major constituent in a sentence, followed by the Verb, then the Object constituent. This isn't general, however. In the case of a question, the word order of an English sentence often (but not always!) changes to a VSO (Verb Subject Object) form. Other languages use completely different word orders. Japanese, IIRC, has a word order approximating SOV (Subject Object Verb).

    That doesn't even consider the lower levels of syntax, where one discusses what's known as "X-bar theory". X-bar theory uses representations of constituent phrases connected in various manners to develop a phrase structure tree that represents the syntax of a particular sentence or phrase. Thus a noun phrase (NP) has two branches from it, one being a specifier (Spec), the other being the intermediate projection of the NP (N', read "n-bar" for hysterical raisins). N' in turn projects a complement (Comp) and a noun (N). Syntax in one language, say English, will project the Spec on the left within NP, and the Comp on the right within N'. Thus the noun N is in the middle of the NP. This isn't true for all languages; they are free to choose whatever branching order they wish to have (dependent on certain Parameters which define particular instances of Principles, which I won't get into).
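
    The branching-order point can be shown with a toy. The sketch below flattens one NP (Spec, N, Comp) into a word sequence under two ordering switches; the class, the function and the two boolean "parameters" are all invented for illustration and are a cartoon of Principles and Parameters, not an implementation of it.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class NounPhrase:
    """NP -> Spec N'; N' -> N Comp (order decided at linearization time)."""
    spec: str
    head: str
    comp: str

def linearize(np: NounPhrase, spec_first: bool = True,
              head_first: bool = True) -> List[str]:
    """Flatten the tree; the two flags stand in for per-language parameters."""
    n_bar = [np.head, np.comp] if head_first else [np.comp, np.head]
    return [np.spec] + n_bar if spec_first else n_bar + [np.spec]

np_tree = NounPhrase(spec="the", head="destruction", comp="of the city")

# English-like setting: Spec on the left, Comp on the right, N in the middle.
print(linearize(np_tree))
# A hypothetical language with the complement before the noun.
print(linearize(np_tree, head_first=False))
```

    The tree is the same in both calls; only the parameter settings change the surface order, which is why a single fixed word order cannot serve as the general structure.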

    Another theory, Head-driven Phrase Structure Grammar says that every word projects its own dependent structure, and that the structure projected from a word in the lexicon must adjoin properly to other words projected from the lexicon to form grammatical sentences. This theory also takes into account some semantics issues as well, and is very popular amongst the Computational Linguistics and Natural Language Programming crowds, but isn't too popular amongst the older ranks of theoretical linguists. It too is language dependent in its structure of syntax, although very comprehensive syntaxes of certain languages have been developed with some success.

    That's just syntax. It's not easy. It's not very regular. It's very context sensitive. If anyone has written a compiler for any programming language they know how complex a language will get if you allow it to be context sensitive (instead of context free).

    Semantics, the meaning behind a particular word or phrase, is a ridiculously complicated problem in linguistic research. People have spent their entire lives researching it with little success, and at various times in the history of linguistics certain well-known demagogues have denounced the study of semantics in its entirety because it appeared to them to be too unfounded or scientifically unreasonable. Chomsky to this day makes nasty comments about semanticians and is well-known for denouncing research into semantics because most work is not provably consistent in even restricted domains.

    Semantics is gnarly. It's weird. Researchers who work in semantics are said to get their more successful ideas from hallucinogenic chemicals. Semantics is a subdiscipline in which any random researcher can overturn the field with one paper, tossing out all of the research done previously -- and get away with it successfully. I don't mean to degrade the work of semanticians, and I'd love to join their ranks some day, but it must be admitted that much of semantic research and theory has a hard time standing up because it's in its infancy.

    Look carefully at the construction of a programming language compiler. It deals with what's known as a 'formal language'. This is a language that is known to follow certain rules consistently, and all special cases are well-defined (for most languages anyway ;-). The syntax of the language (the part you wrote in lex and yacc) is usually a bit simpler than the semantics (the type checking and code generation you hung off the grammar), if you examine the respective sources for complexity. Now consider the fact that for *any* human language both of these tasks are exponentially (perhaps even factorially) more difficult. Since semantics is at least an order of magnitude more complex than syntax with respect to computer languages, one could imagine how bloody awful complex this is for a human language. Now consider that to make a translation requires *complete* semantic comprehension of both the source and target languages -- translation is not a simple word-for-word lookup table (and I'm glad it isn't -- we wouldn't have much expressibility and I wouldn't be able to write this if it were).

    To put all of this into perspective, consider a universal translator for computer languages -- what's it called? It's called a computer. So what do we call a universal translator for human languages? Surprise -- a human.
  • I usually switch back and forth, and sometimes even in the middle of a sentence without thinking about it. It's not a big deal when your friends understand all 3 languages, but if you only spoke one of the 3 or 4 languages you'd probably be lost.

    This brings up a problem with translations and learning new languages. IMO you don't really KNOW the language unless you THINK in that language. Doing a BabelFish in your head is not the same. This is exactly the problem I (and probably most adults) have learning new languages; we babelfish a new language instead of learning it.

  • What's so special about UNL? Theoretical translation of language A into a universal language and from there to language B is almost as old as "machine" translation itself. The fundamental argument, that it hasn't worked before so it isn't going to work now, is stupid. It has been demonstrated how difficult it is to do this, but not that it is impossible.
    For a good example of the total and dismal failure of machine translation, try translating this text into French (or Spanish, or Italian, or whatever) with Babelfish and back to English. Then do it a few times. Then try English to Chinese and back a few times. Case closed.
    Hardly. Here's why that is not a valid test:
    1. Babelfish doesn't use an intermediate language.
    2. Babelfish doesn't even achieve lossless translation from language A to B and back to A. This is the simplest case, and the one which can be improved the most with a good definition for UNL.
    It is, in fact, an even better AI test than the Turing test.
    They do not claim perfect translation, but yes, a computer which could translate between languages and do it perfectly would pass the test. Do you really argue that it is impossible for computer programs to ever pass the Turing test? It is only a matter of time until this happens. The only way to stop it is to stop making computers.
    Frankly, would you trust something as big, bureaucratic and inefficient as the UN to determine the next standard in machine translation?
    This could be a real concern. You have to hope that once the UNL is defined you could extend it for your own purposes and still have every thing work.
    Finally, I have some friends who work at the UN as official translators, and they are doing perfectly fine, thank you very much (and, I should add, making some serious money). Why? Because, AFAIK, no machine has ever been able to translate perfectly.
    Here we are reading /. At the very heart of the cutting edge. Isn't it obvious to all of us that the only thing we know we can expect in the next few decades is massive amounts of change? I wouldn't expect your friends to be out of work any time soon. But isn't the job of a professional translator radically different now than it would have been 100 yrs ago? Political change was not the only thing that caused this change... communication technology has had a big role.
    Machine translation has its place, but only on documents of a very limited scope/vocabulary and of a very repetitive and technical nature. Even then, a human translator is needed to correct the multiple mistakes made by the machine.
    Do you honestly believe this is the best possible solution? That machines can't get better?
  • Original

    When you look at existing technology, like Babelfish at Altavista, you see that the 'devil in the details' might be more of a 'great satan' than one might think. I'm not sure you can have any kind of accurate translation without a human acting as a filter for meaning. Its easy to apply some rules to a metta language interpreter, but using it in discourse would probably create quite a bit of ambiguity. Just look at this translation if you don't believe me.

    English to German and Back

    If you regard available technology, like Babelfish with Alta Vista, you see you that the ' devil in the power of the details could think much more from a large satan than one. I am not safe you can type exact translation without human serve as a filter for meaning to have. Its easy, some guidelines to more mettasprachinterpreter to apply but at using it in the statement would probably create much ambiguity. Straight lines view of this translation, if you do not believe me.

    English to French and Back

    When you look at existing technology, like Babelfish at Altavista, you see that the ' devil in the force of the details much more than one great Satan which one A could think. I am not sure you then not to have any kind of precise translation without acting human as a filter for the significance. Its easy to apply some rules to an interpreter of language of metta, but to use it in the speech would probably create ambiguity much. Glance right with this translation if you do not believe me.

    and my personal favorite....

    English to Portuguese and Back

    When you look at it existing technology, as Babelfish in Altavista, sees that ' the devil in the power of the one details much more satan great of that one could think. I am not certain you I can have no type of the accurate translation without acting human as a filter for meaning. Its easy one to apply some rulers to an interpreter of the language of metta, but to use it in the speech would create probably the ambiguity sufficient. To look at just in this translation if you not to believe me.

    Need I say more?

  • The UNL will be inconsistent, as a few messages have already pointed out.

    Moreover, is this supposed to be the project of some freshman? The web page is messed up; there are lots of errors. One of the lines says "How to joint the UNL Community" on page http://www.unl.ias.unu.edu/eng/unlhp-e.html [unu.edu]. I found a few just by looking at it. I think the people who are responsible for this do not even care. The pages are poorly coded (made by some win9x program) and the pictures look distorted. They did not even give an explanation of how it will be done.

    <!--#include virtual="disclaimer"-->
  • So what? They're not trying to translate television. They're *trying* to translate "Legal papers [and] UN treaties".

    So it only works for boring documents. They're plenty happy with that.
  • What's the difference?

    My apologies to all you fifth-graders out there, sorry.
  • The question is simple -- will it work better than babel? Babel sucks, sure, but it's sure useful when I know something has the information I want, but I don't speak the language. For that sort of thing, I suspect that this would be very useful.

    And if/when its use becomes widespread people might start writing to the meta-language. Not writing in it, necessarily, but, for example, being explicit on things that would confuse it. If that happens, then it really would work.
  • by TheCodeMaster ( 101307 ) on Wednesday October 13, 1999 @06:30AM (#1616489)
    I'd think it would be difficult to make an abstracted meta-language out of human languages. There are lots of grammatical issues which would be particularly difficult to deal with well.

    For example, in the case of inflected languages, how do you get the declensional case information into the metalanguage? In many languages, there are grammatical cases that have overlapping declensions, so there's ambiguity about what meaning was intended. And mapping between languages would be really tough.

    Verbs would be really tough. Like in Russian, you have three tenses (past, present, and future) as well as two verb aspects. So you have pairs of verbs, one expressing action that occurs once, the other expressing habitual activity.

    Sounds like the project would be lots of fun to work on, though. It's a really neat idea, linguistically.


  • It seems to me that the strength of the meta-language will be the entire strength of this system. The question is, will the meta-language be skewed towards one language (*cough* English *cough*) or will they manage to create a language that does not impose biases toward one language?

    Overall, I agree strongly with the idea. From a testing standpoint, with the development of an effective meta-language, all one would need to do to test the translation for the most part is go from language x->meta language->language x. If that works, then presumably the meta language did not slaughter language x.

    One question I have is how the language engine will handle words it does not know--or, more likely, abbreviations, misspellings, and slang. From what I've gathered, this is where other translators fail. If the translator doesn't understand half the sentence, then it generally has too much trouble finding context for the rest for anything to make sense. Just a thought.

    -Keelor

  • We already have a "Universal Language." It's called English.

    I'm not trying to be facetious; I'm not saying English is better than other languages; and I'm not saying that English will serve you best, or even tolerably well in all places; but it is an inevitable conclusion you must come to after spending any reasonable length of time abroad: if there is anything resembling a universal language in this world, it's English.

    English is already a lingua franca in technical and many academic fields. Many universities in non-English-speaking countries actually demand that graduate students write their theses in English, because that is the best way to ensure its diffusion. Some such schools even conduct their classes themselves in English.

    The Hollywood movie industry has also no doubt played a large part in helping to make English (not to mention Western culture) palatable and popular the world over. Dubbed versions of films are hardly ever as popular as subtitled ones (exception: kiddie films).

    Is English the best choice for a universal language? Definitely not from the point of being easy to learn. Esperanto would be much better. But realistically Esperanto doesn't have a chance. If English ever encounters a contender, it will probably be Chinese, if only because 1/5 of the planet speaks the language.

    BH

  • by Urmane ( 2213 ) on Wednesday October 13, 1999 @06:33AM (#1616520) Homepage
    I'm a little confused ... does "Universal Networking Language" mean Esperanto or TCP/IP?
  • Seems to me that it will be easier to write a translator from your native language to a very well defined and documented intermediate language than trying to understand the fine details of a non-native language.

    Though I know nothing about natural language parsing and translation, it seems to me that, from a software engineering point of view, translating from a spoken language into the metalanguage will be the hardest part of the exercise. This will be especially true with languages such as English that have inconsistent grammar and more than one way to do everything (makes you wonder if English is the spoken equivalent of Perl :-)

    Once you've got your text translated into the regular, simple metalanguage, it should be an easier task to convert it into a natural language than conversion to the metalanguage was.

    (Incidentally, the parallel with Star Trek's universal translator was a good one. In Star Trek, outgoing communications from Starfleet vessels are translated into a metalanguage called Linguacode which is supposedly easier for the alien's translation computer to process.)

    -Stephen

  • by Noryungi ( 70322 ) on Thursday October 14, 1999 @01:49AM (#1616529) Homepage Journal
    OK, here are some more answers.

    Watch out this is very, very long...

    Don't think about it as "automatic" translation, it's much more likely to work out as semi-automatic. I expect that the process would be something like this:

    1.Run automatic converter from natural language to intermediate.
    2.Have an expert in the intermediate language review the translation.
    3.Run automatic converters to the target natural languages.
    4.Have linguists review the output.

    Compare and contrast with a "traditional" translation process:

    1. Ask a translator to translate from language "A" to target "B". Ideally, the person in charge of the translation should be fluent in language A, a native speaker of B and have at least basic knowledge of the subject at hand (for instance: Open Source).

    2. Ask a linguist, (ideally fluent in language A, native speaker of B, etc.) to review the translation produced at step 1.

    The point is that the intermediate language should be designed to be free of the ambiguities that plague language translation.

    And how exactly can you do this? Either your intermediate language is "limited" (that is to say: misses many of the subtleties of the original language), which eases step #1 but certainly introduces many errors down the line. Or it is an "advanced" language that is able to translate many of the finer points of your "start" language -- but then, the interesting thing is the translation engine itself. Not the intermediate language. If your translation engine is good enough to translate, say, Spanish into UNL with little/no loss of meaning, it is also good enough to translate Spanish to English with no intermediate step!! If this is true, what's the point of UNL?

    Another point is, how can you be an expert in an "intermediate language"? Either the language is "human-readable", but probably produces an output compared to sludge and correcting this sludge may introduce additional errors. Not to mention the pain it represents to check on something that borders on the unreadable. Or it is machine readable -- but in that case, who is going to read it?

    Final point is productivity: using UNL, computers and machine translation may take longer than a simple translation "by hand" with human grey matter. A Windoze95 machine with MS Word and some good "paper" or digital dictionaries is, in many cases, more efficient and cheaper than going through the pain of machine translation.

    The hope is to minimize or eliminate step (4).

    Good luck! Frankly, this has been the "Holy Grail" of machine translation ever since it started, and I do not think we are any closer. So far, every large international institution that I am aware of (UN, UNESCO, EU Commission, EU Parliament, NATO, IMF, etc.) either uses tons of translators or has standardized on a couple of languages at most (English being, of course, the "Lingua Franca"). All the large international institutions mentioned above that use machine translation have discovered that, even on simple subjects, the 4th step you describe above is the one that consumes the most time.

    It would be a big win if you could get to the point where all the hard stuff is done just *once* instead of repeated over and over again for all of your target languages.

    Again, this is the "Holy Grail" of machine translation. I don't believe that we are any closer to it than we were, say, 30 years ago. At least not judging from the output of some of the software available out there...
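
    For scale, here is a back-of-the-envelope count of what "done just once" would buy. This is only a sketch of the arithmetic, not anything taken from the UNL documentation; the function names are made up for illustration. Direct translation needs one converter per ordered pair of languages, while an intermediate language needs one converter in and one out per language.

```python
def direct_converters(n_languages: int) -> int:
    """One converter for every ordered (source, target) pair of languages."""
    return n_languages * (n_languages - 1)


def pivot_converters(n_languages: int) -> int:
    """One converter into the intermediate language and one out of it, per language."""
    return 2 * n_languages


n = 16  # number of languages announced for the initial stage of UNL
print(direct_converters(n), pivot_converters(n))  # 240 32
```

    The count says nothing about quality: every one of those converters still has to be good, which is where the real difficulty lies.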

    And no, this will not work for poetry or humor, but there's no good way to translate poetry and humor in any case. The idea would be to get it to work with technical, legal, and business language.

    Sorry to say this, but this does not work very well for legal or technical language either. It may work with Business, since PHBs are so limited intellectually =). Legal translation can be horrendous: I have translated many legal documents in the past and I can tell you there is nothing worse, because legal terms are incredibly complicated and old-fashioned, and because legal trivia has to be rendered in a very exact manner. Legal terminology (in almost every language) is among the most confusing and complicated there is. Plus, lawyers and legal people are a major pain in the neck when it comes to [...] Once you get the terminology right, I agree the rest of a legal document is usually a matter of "filling the blanks". But getting the legal terms right is enough to drive you nuts.

    Technical translation is another problem: I think some technical areas may be the best bet for machine translation yet. The problem, as far as the technical field is concerned, is that in fast-moving areas (computer science is one) the technical vocabulary is changing and evolving so fast it's hard to keep pace. I read up to 5 computer magazines a week (not to mention a daily dose of Slashdot =) just to keep up-to-date with the latest evolution in language and technology. Keeping a UNL database of terms and translations could prove to be a daunting task...

    >What's so special about UNL? Theoretical translation of language A into a universal language and from there to language B is almost as old as "machine" translation itself.

    The fundamental argument, that it hasn't worked before so it isn't going to work now, is stupid. It has been demonstrated how difficult it is to do this, but not that it is impossible.

    Please note that I never said (in the sentence you quote above) that this is not going to work. I just said that, as far as I am concerned, using an "intermediate" language is old news. This may be a new and interesting idea to you, but, frankly, for someone who has worked in translation, you could very well trace this concept all the way back to Volapük and Esperanto. And these two were invented in the 19th century.

    As far as I am concerned, I think you could prove that correct translation is impossible. All you would have to prove is that a "human" language is a chaotic complex system, which usually follows unpredictable rules and has several strange attractors, inducing a runaway complexity.

    Case in point: English. Roots: Saxon dialects, Norman dialects, Old English and Old French. Latin. A little bit of Greek. Maybe German and Old Dutch. Evolution influenced by French and a myriad of other languages. Now divided into several branches (US English, British English, Irish English, Australian English, Indian English, International English), all of them influencing each other and countless other languages. Reducing the English language to a set of neat little equations and computer routines is left as an exercise to the reader... =)

    Please understand me: computer translation of "basic" English into UNL and from there into Chinese, French, Spanish, Italian, Japanese, etc... is no big deal. Computer translation of highly technical/scientific papers may be achieved. But even then, due to the inherent complexity of English (or any other human language), a human will have to review the machine translation and correct it.

    I therefore suppose that perfect translation does not exist (or is impossible). Translation (like programming) is an art, not a science. You can have a certain number of "artistic" rules, but you cannot have a "perfect", scientifically proven, solution.

    Example: give a problem to be solved to two good programmers, and they'll probably come up with two different and equally valid solutions. Which solution you pick has to be determined by other factors (speed of implementation, maintenance and evolution of the system, optimization, resources used, etc).

    Give a translation to be done to two good translators and they will probably come up with two rather different and equally valid translations. Which one you pick is then determined by other factors (length of translation, speed of said translators, price of translation, style, etc). Complex systems, like languages, cannot be reduced or predicted. They can be analyzed and more or less "solved" -- the quality of the solution being dependent on many factors, such as the experience of the specialist, his choice of tools, etc. This is true even in reductive or limited systems, where, for instance, the vocabulary to be used is small (see technical translation above).

    Remember the butterfly in Brazil that creates a storm at the other end of the world? I suspect translation (especially multiple language translation) may well be the kind of complex system that is so hard to solve using computers.

    Perfect translation, like perfect programming, is only possible in a very limited scope. A "DO ... UNTIL" loop is the perfect solution for certain problems, and "dinero" is the perfect translation of "money" into Spanish. A TCP/IP stack, no matter which OS it is running on, will always have some sort of ACK/NACK test. But these are all very limited examples.

    >For a good example of the total and dismal failure of machine translation,
    >try translating this text into French (or Spanish, or Italian, or whatever)
    >with Babelfish and back to English. Then do it a few times. Then try
    >English to Chinese and back a few times. Case closed.

    Hardly. Here's why that is not a valid test:

    1. Babelfish doesn't use an intermediate language.
    2. Babelfish doesn't even achieve lossless translation from language A to B and back to A. This is the simplest case and one which can be improved the most with a good definition for UNL.


    Answers:

    1. An intermediate language should introduce even more bugs into Babelfish translation. See above.
    2. "Lossless" translation is impossible. See above. Complex systems, such as human languages, cannot be reduced easily to a set of equations.
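
    A toy illustration of answer 2, under heavy assumptions: the two word lists below are invented, and real systems do far more than word-for-word lookup. The point is only that as soon as two source words share one target word, the trip back cannot recover the distinction.

```python
# Toy word-for-word dictionaries, invented for illustration only.
EN_TO_FR = {"dog": "chien", "hound": "chien", "house": "maison", "home": "maison"}
FR_TO_EN = {"chien": "dog", "maison": "house"}


def translate(words, table):
    """Translate word by word; words missing from the table pass through unchanged."""
    return [table.get(word, word) for word in words]


def round_trip(words, hops=1):
    """Send the words to French and back to English, `hops` times."""
    for _ in range(hops):
        words = translate(translate(words, EN_TO_FR), FR_TO_EN)
    return words


original = ["the", "hound", "came", "home"]
print(round_trip(original))  # ['the', 'dog', 'came', 'house']
```

    In this toy the loss happens on the first hop and then stays put; it does not model the compounding drift you see with a real engine.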

    >It is, in fact, an even better AI test than the Turing test.

    They do not claim perfect translation, but yes, a computer which could translate between languages and do it perfectly would pass the test. Do you really argue that it is impossible for computer programs to ever pass the Turing test? It is only a matter of time till this happens. The only way to stop it is to stop making computers.

    Actually, I thought a computer had recently managed to pass the Turing Test, or some limited version of it. Could anyone out there supply information on this one?

    But: I don't think the Turing test is actually a very good AI test. There is a huge difference between a program that is able to "talk" to you (parrot back what you said) and one which is able to understand you. A computer able to understand human language would probably be the first real AI on this planet. Most Turing test programs are based on some variation of Eliza, and this has been around for ages.

    Here we are reading /. At the very heart of the cutting edge. (some text removed) I wouldn't expect your friends to be out of work any time soon. But isn't the job of a professional translator radically different now than it would have been 100 yrs ago? Political change was not the only thing that caused this change... communication technology has had a big role.

    Well, this may be surprising to you, but the work of a professional translator has not evolved very much. Computer and communication technologies have eased their task a lot. Like many other professions, translators are now able to work from home, access the Internet and its wealth of information, send documents to clients by e-mail, and even use some very clever software that eases the translation process (TM/2, Trados, etc).

    Word processing, in particular, certainly is the best thing to happen to translators since sliced bread =). Also, I agree that many new translation fields have been added in the past century: biology, computer science, aerospace, etc.

    But the central fact remains this: to be a translator you have to be fluent in (at least) one language, a native speaker of another, and have good expertise in one or more fields of human activity. That's it. Oh, and you have to have a certain "talent" with languages, just like you need to have a certain "talent" for programming. It's an art, remember? Even the best-trained translator is worth 0 if he/she does not have that special "talent". Exactly like a lot of people work on Linux -- but there is only one Linus Torvalds. =)

    We may translate faster, have more tools and information at our disposal, and produce better-looking documents -- but the core skills remain the same and the work process is exactly the same. You could train a translator today in the exact same way they were trained 100 years ago: with a pen and a piece of paper. Sorry to disappoint you, but computer technology is not always the perfect solution it prides itself on being...

    That's All Folks!
  • by Tet ( 2721 ) <(ku.oc.enydartsa) (ta) (todhsals)> on Wednesday October 13, 1999 @06:37AM (#1616540) Homepage Journal
    For a project that's supposed to allow effective communication, they could at least have designed a web site that works well in all browsers. No alt attributes for images... Sigh. Those of us using lynx just have to guess, based on the image names :-(
  • I will admit to not having read all of the UN documentation, but from what I can tell from what I have read, they are attempting to create an abstraction of language in general.

    Although this is an interesting idea, it makes an assumption that all language is based off of one abstract "map". IMHO different languages have different maps. Having spent a fair amount of time learning Ancient Greek in high school and college, I can say that the map for that language is quite different from English, and those are both Indo-European languages.

    The concepts that exist in one language may not exist in many other languages, which is often very problematic. Eventually, to learn any language, you must actually just start thinking in it, and not translating to your native language. Contemplating the 3 voices in Greek (active, passive, and middle) is something I rather enjoy doing, as it is very foreign to English.

    I am just afraid that they will have to produce a Least Common Denominator language which won't be useful for anything beyond technical specifications and instructions. I will have to admit that that would be useful on many fronts, but may not be the dream that we were all hoping for.
  • I think this is a very neat idea. My worry is, who will patent the technology first and screw the world.

    Amazon does it with ecommerce 1-click. Microsoft does it with style sheets. Hell, if it's a good, interesting technology, why not? Let's take it and patent it to death! Then we can charge everyone for it and make a zillion-and-one dollars. Perhaps I should send in my application today!

    There need to be limits on patents. Yes, I believe they do foster invention, but they also can stop community work on a really-good-thing.

    Perhaps what we need is a community patent agency and an easy, low-cost way to set up patents that are held by some sort of group for the explicit purpose of keeping some basic ideas *free* for us geeks and the rest of the world.

    Really, it shouldn't have to come down to this tho. But someone will patent the implementation of this and we will all be screwed.


    My $0.02
  • At least, not in general. Regional expressions, local terminology, written accents, cultural mannerisms, and all sorts of other fiddly details, might not HAVE a direct translation, into the meta-language OR into any other language.

    Yorkshire, UK, for example, still uses "thee" and "thou". If you translate this into some kind of meta-language, it's either going to barf, or lose details. Those details may be important to meaning. God only knows how it'd cope with Cockney slang, or even common phrases (eg: "from the horse's mouth", "a sticky wicket")

    As I see it, this can ONLY work for formalised documents, using a formalised subset of the various languages. eg: Legal papers, UN treaties, etc. It'll NEVER work with informal, written language.

  • It will be difficult, but I don't think for those reasons:

    Verb tenses are not the problem. Every language can express every tense, just in a different way. Hard yes, impossible no.

    Additionally, approximations work well enough. Ex. Most English readers couldn't tell you the difference between past tense and preterite(sp?) tense.

    Grammar is easily defined. 90% of language could be described in a BNF. adv-adj-noun in one, noun-adj-adv in another. So what. That is probably the simplest part.
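
    To make the word-order point concrete, here is a minimal sketch. It assumes the phrase has already been parsed into labelled slots (which is the hard part), and the two language names and their orderings are placeholders, not real grammars.

```python
# Hypothetical slot orderings for a three-slot phrase in two made-up languages.
PHRASE_ORDER = {
    "lang_a": ("adv", "adj", "noun"),
    "lang_b": ("noun", "adj", "adv"),
}


def linearize(phrase, language):
    """Emit the slots of an already-parsed phrase in the order the language wants."""
    return " ".join(phrase[slot] for slot in PHRASE_ORDER[language])


phrase = {"adv": "very", "adj": "red", "noun": "car"}
print(linearize(phrase, "lang_a"))  # very red car
print(linearize(phrase, "lang_b"))  # car red very
```

    Reordering is a table lookup once the slots are known; nothing here addresses how the slots get filled or what the words mean.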


    My interest would be in the meta-language design. Words by number? string? Grammar by parsing into a std format, or classifying each word? Are there multiple ways to organize a statement? What about this "word hierarchy" they talk about. Quite cool there.
  • This allows the semantic extraction to be MUCH more computationally intensive than systems like babelfish can afford. When you make a document, it's okay to spend an extra 15 seconds to extract a pretty good representation of the gist of it, so long as it doesn't need to happen every time the page is viewed. (babelfish doesn't even cache translations, does it?)

    Okay, so some of the idioms and convoluted sentences will be improperly converted, and will need some manual tweaking. Hopefully this system will allow this tweaking to take place. By providing multiple different conversions back into the author's native tongue, they may be able to see some of the translational oversights, and fix them.

    This won't be good for poetry, but will allow people who only know one language (English speakers seem more likely to fit this category than other people) to publish documents readable by people who do not speak English - that's a substantial breakthrough.

    It would be nice if this standard would allow segments to be set to fixed translations, so that if I really wanted the English to read a particular way, I could enforce that particular idiom, without loss of generality. ("Normally translate 'it has a low probability' but if you ARE translating to english, substitute the literal string 'fat chance'")
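
    Something like the following is what I have in mind. It is only a sketch: the `Segment` structure, the concept label, and the table of generic renderings are all invented here, not part of any published UNL format.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    concept: str                                   # language-neutral meaning label
    overrides: dict = field(default_factory=dict)  # language code -> literal string


# Hypothetical generic renderings a converter might produce for each language.
GENERIC = {
    ("LOW_PROBABILITY", "en"): "it has a low probability",
    ("LOW_PROBABILITY", "fr"): "c'est peu probable",
}


def render(segment: Segment, language: str) -> str:
    """Use the author's fixed wording if one exists, else the generic rendering."""
    if language in segment.overrides:
        return segment.overrides[language]
    return GENERIC[(segment.concept, language)]


seg = Segment("LOW_PROBABILITY", overrides={"en": "fat chance"})
print(render(seg, "en"))  # fat chance
print(render(seg, "fr"))  # c'est peu probable
```

    The override only applies to the language it names, so every other target still gets the general meaning.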

  • by jilles ( 20976 ) on Wednesday October 13, 1999 @06:51AM (#1616575) Homepage
    Though at first sight the idea of translating to an intermediate language seems interesting, I can't help but note that similar projects in Europe have all failed so far.

    Automatic translation between languages in the EU is something that could save a lot of money. So there have been a lot of research projects funded with loads of EU money to accomplish this. All of these projects have failed (as far as I know).

    This seems to be a similar effort, this time by the UN, which is an equally bureaucratic organization. I think the goal of this project is probably too ambitious to work. Even translations between two related languages (English and German) are troublesome (babelfish for example is not exactly perfect), so I can't see why translations to an intermediate language would change things (ever tried to do that in babelfish? the result is not pretty).

    So, it will probably fail and loads of money will be wasted on it.
  • Imagine the translation errors when people spell words like "you're" "lose" "its" and "too" wrong.
    A holy war could be started because of the sentence:
    Your wife is going to lose, but you will win.

    If it's spelled incorrectly.
    Use your imagination.
  • by Noryungi ( 70322 ) on Wednesday October 13, 1999 @07:01AM (#1616592) Homepage Journal
    All right, all right, all right...

    Several points -- for full disclosure, let me just state that I am a localization engineer, with 5+ years of experience in software localization (read: adaptation into different languages) and 7+ years of experience in translation. If that does not make me qualified to comment on this, I don't know what does.

    • First of all, I do not really believe the UN can produce anything remotely interesting, technically speaking. I like the IETF motto: "we believe in rough consensus, and working code". Show me the money^H^H^H^H^Hcode first, please.
    • What's so special about UNL? Theoretical translation of language A into a universal language and from there to language B is almost as old as "machine" translation itself. As far as I remember, early EU research into machine translation was based on a similar idea -- and it was dismissed as a failure.
    • For a good example of the total and dismal failure of machine translation, try translating this text into French (or Spanish, or Italian, or whatever) with Babelfish and back to English. Then do it a few times. Then try English to Chinese and back a few times. Case closed.
    • People, Star Trek is nothing but TV! Don't misunderstand me: I love spending an evening with Capt. Kirk and Mr Spock, but this is not reality! The Universal Translator is, in my opinion, a perfect (read: impossible) dream. It is, in fact, an even better AI test than the Turing test. The day a computer can perfectly translate a text from language A to language B is (a) the day I'll be out of a job and (b) the day I'll begin to seriously worry about that glowing red camera and calm voice saying: "Would you like a nice game of chess, Dave?".
    • Frankly, would you trust something as big, bureaucratic and inefficient as the UN to determine the next standard in machine translation?
    • Finally, I have some friends who work at the UN as official translators, and they are doing perfectly fine, thank you very much (and, I should add, making some serious money). Why? Because, AFAIK, no machine has ever been able to translate perfectly the multiple meanings, subtle changes in context, double-entendre, puns, cultural and historical framework, regionalisms, etc. that exist in every language on this Earth. Call it the "Curse of Babel" if you will, but a human brain is, and will remain for a long time, the best translation machine there is. Machine translation has its place, but only on documents of a very limited scope/vocabulary and of a very repetitive and technical nature. Even then, a human translator is needed to correct the multiple mistakes made by the machine.


    Of course, I may be completely wrong and UNL may be the next best thing since sliced bread. But I doubt it.

  • This [slashdot.org] discusses a similar project...

    Wow. I've been on /. longer than I thought, to have remembered to look for that...
  • "enconverter software"? "Inter-Net"?

    Are there any actual computer scientists or linguists involved with this project? Their web site looks like it's either a team of bureaucrats or fifth-graders.
  • Let me just add something to the above, since I haven't made myself clear there.

    In German it is possible to use the definite article to refer back to something used in the previous sentence, rather like `it' in English, but with the crucial distinction that what we refer back to must be of matching gender. So if a masculine, a feminine and a neuter word occur in the sentence, it is possible to refer to any of them with the `it'. This ability to refer on the basis of gender must be captured in our syntactic model. Similarly, the case system allows one to have multiple indirect objects (one accusative, one dative and one genitive, for example) directly attached to a verb, where in English one would use a preposition.
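
    A minimal sketch of what "captured in our syntactic model" could mean for the gender case. It assumes the nouns of the previous sentence have already been tagged with their gender, and it only handles nominative singular forms; the function name and table are made up for illustration.

```python
# Nominative singular articles/pronouns mapped to grammatical gender.
GENDER_OF_PRONOUN = {
    "der": "m", "er": "m",
    "die": "f", "sie": "f",
    "das": "n", "es": "n",
}


def resolve(pronoun, antecedents):
    """Return the nouns from the previous sentence whose gender matches the pronoun."""
    gender = GENDER_OF_PRONOUN[pronoun.lower()]
    return [noun for noun, noun_gender in antecedents if noun_gender == gender]


# Previous sentence mentioned a spoon (m), a fork (f) and a knife (n).
previous = [("Löffel", "m"), ("Gabel", "f"), ("Messer", "n")]
print(resolve("die", previous))  # ['Gabel']
```

    An English `it' would match all three nouns, so an intermediate representation has to carry the link to the antecedent explicitly or the information is lost.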
  • Assuming this scheme can work and they can map some subset of all languages to one another, the result won't be terribly pretty to use. The whole idea behind having different languages is to express cultural diversity - like the old adage that Eskimos have 11 different words for snow. There's no way a universal language could capture that level of subtle differentiation.

    On the other hand, they just might be able to come up with a way to map a small subset of natural language, computer-speak for example, for the purpose of easing the creation of internationalized apps and making web sites more navigable. But I don't see how this could be successful in the general case.

  • by Hasdi Hashim ( 17383 ) on Wednesday October 13, 1999 @07:26AM (#1616625) Homepage
    For those of you who think this is impossible because of the variations between languages, Noam Chomsky has something to say to you. I was exposed to his idea back in formal languages and automata class. Basically, his argument is that we have a universal grammar (UG) parser built into us when we are born. We 'harden' the parameters of the UG to conform to our preferred language. Sort of like guile and perl, where guile is a very expressive language but perl, while expressing less, can express the same thing in a more concise manner.

    Universal grammar is defined by Chomsky as ``the system of principles, conditions, and rules that are elements or properties of all human languages... the essence of human language'' [Chomsky, 1978].

    Thus, all languages that we are accustomed to, English, Arabic, Malay, Japanese, and Chinese, are special cases of a universal grammar. Chomsky and subsequent linguists are looking for those common elements of all languages.
    [www.hj.se]
    Universal grammar and the innateness hypothesis

    Universal Grammar in Prolog [nyu.edu]

    There is lots of discussion about this... see google [google.com].

    Hasdi
  • As far as I can tell (just guessing, though) there are 2 key differences between UNL and Esperanto:

    1) It's not Romance-based and thus won't be as Euro-centric and will thus probably translate Eastern languages better.

    2) It's designed as an intermediate language and not as a final end-user language. As far as I can tell, it could even be machine-readable and not speakable. In any case, it will not have as many constraints as a language like Esperanto that is designed for human speech.

    These are just my guesses. I don't know what kind of language they're actually trying to implement. (The website is skimpy on those details.)
  • by Millennium ( 2451 ) on Wednesday October 13, 1999 @07:10AM (#1616633) Homepage
    It's not going to work very well. The problem is that each language has its own nuances, and in many cases these don't translate very well into other languages. I'll use Japanese honorifics as an example. The list of them is relatively long ( -san, -sama, -kun, -chan, -sensei, -wa, and others). Simply by attaching one to the end of a person's name, I can make the same sentence express immoderate flattery or extreme derision. This can be translated in an extremely limited fashion into Romance languages such as Spanish or French (by using the familiar vs. formal form of address, but it's still limited). It doesn't translate into English at all (this is why I prefer subtitled anime; get the general meaning from the subtitles, and actually listen to the Japanese for the nuances). And, of course, you still have the problem of inflection not translating very well into written words. This makes English particularly unsuitable for network communications, actually, since so much meaning is left to inflection. What's the solution? I don't know. There probably isn't one. Even Esperanto isn't immune to this problem of losing meanings in translation. I don't think a "universal meta-language" is going to work, though.
  • Although this is an interesting idea, it makes an assumption that all language is based off of one abstract "map". IMHO different languages have different maps.

    Actually, the dominant paradigm in formal linguistics, generative grammar, implies that all languages are generated from one abstract "map", the so-called Universal Grammar. Now, actual grammars vary a lot, but the idea is that they can be generated from the Universal Grammar by tweaking various parameters. The main evidence for this is the specialized language-learning ability of human children, and particular evidence about how that ability works and doesn't.

    Now, as to whether this will make universal automated translation via a metalanguage possible, that depends a lot on the metalanguage. I envision the metalanguage looking a lot like "glosses" in syntax papers, rather than an actual language, so that you preserve all of the language features of the original in the metalanguage. The more languages the metalanguage is supposed to accommodate, the larger it will be.

    Even if the metalanguage is perfect for all the supported languages, there will be problems with idioms, probably with slang, and certainly with cultural concepts. But in general, how important those failings are will probably vary depending on the conversation. On the whole, I think that both the most enthusiastic and most critical posts I've seen in the comments to this article are underinformed.


    --
    The scalloped tatters of the King in Yellow must cover
    Yhtill forever. (R. W. Chambers, the King in Yellow)
  • AN Whitehead was obviously not a linguist at all.

    In actuality English is perhaps the most complex language in modern use, for a number of reasons. It has by far the largest vocabulary; it takes root words from many, many other languages; its rules of grammar are highly irregular; because of the introduction of printing before the great vowel shift, the spoken form of English does not agree at all with the written form; the geographic spread is so large that several dialects, pidgins and patois forms of English now exist. If you propose that we adopt English as the base language you are going to have to be very specific about WHAT English, using what local idioms and rules.

    The richness and complexity of English is perhaps best exemplified by the richness of its body of great literature and poetry, where expression and level of meaning are best brought to form by a language that has a great richness of vocabulary and ability to express multiple levels of ideas in a single word. Of all the languages of the world there are three that clearly have great bodies of literature - Sanskrit, Greek, and yes, English.

  • Linkwa, pink dama, arf muzheek. Rintintinambulation. Alla da peepholes enda voold, enda looniverse, cargo a schlong ender hertz. Epp, dat schlog arf Unamunda.

    -Chris
  • Perhaps there is a universal grammar that is innate to humans. But what makes you think that you can implement it in a Turing Machine?

  • has, to my knowledge, only one tense. And no articles. And plural noted by saying the noun twice ("orang" is person, "orang-orang" is people).

    Needless to say, there isn't much poetry in Indonesian...

    -
    /. is like a steer's horns, a point here, a point there and a lot of bull in between.
  • I shall continue to believe in Compo, no matter what thee says, lad!
