ICANN Mulling Multilingual URLs
griffjon writes "The Washington Post is reporting that ICANN is testing out fully multilingual domain names. These won't just be [non-western-language].com, but would have TLDs translated into other scripts, fixing annoyances for non-English speaking audiences. An example: 'Speakers of Hebrew, Arabic and any other language written from right to left must type half of the URL in one direction and the other half — the .com, .net or .org postscript — the opposite way.' Let's hope it goes better this time around: 'Next week's experiments use the domain name "example.test" translated into 11 languages. A previous model, however, used "hippopotamus" instead of "test." These plans went awry when an Israeli registrar realized the Hebrew word ICANN thought meant "hippopotamus" was an expletive and threatened to involve the Israeli government.'"
Multilingual URLs... (Score:5, Funny)
Re:Multilingual URLs... (Score:5, Funny)
Re:Multilingual URLs... (Score:5, Informative)
Re: (Score:2, Informative)
RFC 2606, Section 3. It's referenced at (where else) example.com [example.com].
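RFC 2606 reserves four top-level names (.test, .example, .invalid, .localhost) and three second-level names (example.com, example.net, example.org), which is why they are safe for experiments like "example.test". Here is a minimal sketch of a check against that list. The function name is hypothetical and the input is assumed to be a plain ASCII hostname.

```python
# Reserved names from RFC 2606: four TLDs plus three second-level domains.
RESERVED_TLDS = {"test", "example", "invalid", "localhost"}
RESERVED_SLDS = {"example.com", "example.net", "example.org"}


def is_rfc2606_reserved(hostname: str) -> bool:
    """Return True if hostname falls under an RFC 2606 reserved name."""
    labels = hostname.lower().rstrip(".").split(".")
    if labels[-1] in RESERVED_TLDS:
        return True
    # Compare the last two labels against the reserved second-level names.
    return ".".join(labels[-2:]) in RESERVED_SLDS


if __name__ == "__main__":
    for name in ["example.test", "www.example.com", "slashdot.org"]:
        print(name, is_rfc2606_reserved(name))
```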
Re: (Score:2)
I really wonder what they mistook it for....
Re: (Score:2)
Some actual facts (Score:2)
"The response was basically, 'I'm too busy. Go learn English."
That's about right. Back in the day, ICANN was concentrating on trademark issues, and the reasons it got to exist in the first place (new TLDs, international domain names) were put on the back burner. It's not like we didn't have laws against trademark infringement, but the trademark lobby wanted greater rights in cyberspace than it has in the real world,
Re:Some actual facts (Score:4, Interesting)
Well, lately I've been testing a lot of my old code in various UTF-8 environments, and I've been duly impressed by the fact that, as Ken intended, almost all the code "just works" with Arabic, Chinese, Japanese, etc.
It turns out that there's a simple explanation. If the code doesn't examine chars with bit 8 turned on, but just treats them as unexamined "data" (or as letters, if the code is trying to distinguish that way), then everything works right. The only time the code needs to actually look at non-ASCII characters' values is when the text is being rendered in physical form. And hardly any code ever actually does that. Almost all my code reads data from files and writes data to other files, but never does anything with the physical representation of the data. It passes the data to other programs for that.
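The reason this works is a property of UTF-8 itself: every byte of a multibyte sequence has its high bit set (value 0x80 or above), so an ASCII delimiter byte can never appear inside a non-ASCII character. A minimal sketch, using a hypothetical helper that splits on a comma without ever decoding the bytes:

```python
def split_fields(data: bytes, sep: bytes = b",") -> list[bytes]:
    """Split on an ASCII separator without decoding the bytes.

    Every byte of a multibyte UTF-8 sequence is >= 0x80, so an ASCII byte
    such as b"," can never occur in the middle of a non-ASCII character.
    """
    return data.split(sep)


if __name__ == "__main__":
    line = "Москва,北京,القاهرة".encode("utf-8")
    fields = split_fields(line)
    # Each field is still valid UTF-8 and decodes back to the original word.
    print([f.decode("utf-8") for f in fields])
    # Every byte in these (all non-ASCII) fields has the high bit set.
    print(all(b >= 0x80 for f in fields for b in f))
```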
A case in point: I was recently working on some multi-language HTML files, and I decided to try a fun test with CSS: I defined a whole lot of classes whose names were in Chinese. This made sense, since these classes were being used for pieces of the text that contained mostly Chinese characters, not counting things like spaces and punctuation. I tested the CSS using more than a dozen browsers that I have installed on my Linux and OS X test machines. I was unable to find a single case where it didn't work. I even hunted down some Windows boxes and tested the files on IE6 and IE7; they worked fine (despite the well-known CSS incompatibilities in IE
Now, I don't think for a second that the writers of all those browsers spent time making sure that their code could handle UTF-8-encoded Chinese identifiers in CSS. I suspect that most of them never even considered the possibility. I'd bet that the code just takes anything that's not a significant character in CSS syntax, and tacitly treats it as a "letter". This is all it takes to make UTF-8 work correctly in this case.
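A toy scanner shows that guess in action. This is written for illustration only: the function names are hypothetical, and it is not any browser's actual CSS tokenizer. It never inspects what a non-ASCII character is; it simply counts it as part of a name.

```python
def is_name_char(ch: str) -> bool:
    """Toy rule: ASCII letters, digits, '-' and '_' are name characters, and so
    is every non-ASCII character, whose value is never inspected."""
    if not ch.isascii():
        return True
    return ch.isalnum() or ch in "-_"


def class_names(css: str) -> list[str]:
    """Collect the names that follow '.' in a CSS-like string.

    Toy scanner only: it ignores strings, comments and numbers such as 1.5em.
    """
    names = []
    i = 0
    while i < len(css):
        if css[i] == ".":
            j = i + 1
            while j < len(css) and is_name_char(css[j]):
                j += 1
            if j > i + 1:
                names.append(css[i + 1 : j])
            i = j
        else:
            i += 1
    return names


print(class_names(".標題 { color: red } .note-2 { }"))  # ['標題', 'note-2']
```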
I did mention this in a couple of browsers' newsgroups. The responses were basically of the form "Well, of course it works. Why wouldn't it? You don't need special code to handle charset=UTF-8, except for the rendering. You'd have to be a fairly incompetent programmer to write code that doesn't work correctly with UTF-8. Except for rendering."
I can hear people saying "but those browsers all need to render the text." Yeah, but the CSS routines don't render text. They parse the CSS input, and fill in fields in data structures that tell the rendering code how to position and color the text. But the charset-handling code is probably not called anywhere in the CSS modules; it's only called in the few places that actually need to color pixels on the screen.
Lots of people have suggested declaring UTF-8 to be the only encoding for URLs. If this is done, there's probably very little URL-handling code anywhere that needs to be changed; it'll mostly "just work", because char codes 0x80 to 0xFF are treated as "letters". The only question is whether the final step of rendering the text's pixels will produce the right glyph, and the URL-handling code doesn't care about that.
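Something close to "UTF-8 only" already exists for the path part of a URL: RFC 3987 maps an IRI to a URI by converting non-ASCII characters to UTF-8 and percent-encoding each resulting byte. Python's standard `urllib.parse.quote` performs the same byte-level step (UTF-8 is its default encoding). The path below is a made-up example.

```python
from urllib.parse import quote, unquote

path = "/wiki/日本"
encoded = quote(path)  # UTF-8 bytes, each percent-encoded; '/' left as-is
print(encoded)  # /wiki/%E6%97%A5%E6%9C%AC
print(unquote(encoded) == path)  # True
```

Each of those escaped bytes lies in the 0x80 to 0xFF range, which is exactly the range that byte-oriented code passes through untouched.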
I happen to have a DNS server handy. Maybe I'll try a little test: in one of the domains, I'll add hostnames in Russian, Chinese, Arabic, and maybe a few other non-Roman scripts. I'll wait a while, and see if I can access the machines via those names from a few other machines. I predict that it'll also "just work".
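Whether raw UTF-8 labels resolve depends on each resolver and application along the way. The internationalized-domain-name scheme (IDNA) sidesteps that question: the application converts each non-ASCII label to an ASCII-compatible "xn--" label (Punycode) before it ever reaches DNS. Python's built-in `idna` codec implements IDNA 2003 (RFC 3490); the third-party `idna` package covers the newer IDNA 2008 rules. A minimal sketch with made-up names:

```python
# Python's built-in "idna" codec implements IDNA 2003 (RFC 3490): each
# non-ASCII label becomes an ASCII label starting with "xn--" (Punycode).
names = ["bücher.example", "пример.испытание"]

for name in names:
    ascii_form = name.encode("idna")
    print(name, "->", ascii_form.decode("ascii"))
    # Decoding with the same codec recovers the original Unicode name.
    assert ascii_form.decode("idna") == name
```

For the first name this prints `bücher.example -> xn--bcher-kva.example`; only the ASCII form is stored in the zone file.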
Re:Multilingual URLs... (Score:4, Informative)
Japanese writing has pretty much been converted to the Western left-to-right style. Formal government documents and newspapers are written that way, and in day-to-day life in Japan one will rarely encounter top-to-bottom writing except in traditional restaurants, certain stylized ads and museums. You actually encounter it less than outright English (English is very popular in ads; see http://www.engrish.com/ [engrish.com]), which few people read.
My brief trip to China seemed to indicate that they've done the same thing there.
It's not an issue.
Re: (Score:2, Informative)
Chinese text can also be seen written horizontally from right to left on some old signs and buildings. This comes from before horizontal writing was common and is actually a special case of vertical printing where there is room for only on
Domain name != URL (Score:5, Informative)
A URL is an entire address, including the protocol, local path and fragment identifier. This is a URL:
A domain name does not include the protocol, the local path or the fragment identifier. This is a domain name:
This is talking about domain names, not URLs. If anybody would talk about multilingual URLs, it would be the IETF, not ICANN, and they already have, they are called IRIs [ietf.org].
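To make the distinction concrete, Python's standard `urllib.parse.urlsplit` pulls a URL apart into its components. The URL below is a made-up example; only its `hostname` component is a domain name.

```python
from urllib.parse import urlsplit

# A made-up URL, used only to show which part is the domain name.
parts = urlsplit("http://www.example.com/docs/page.html#intro")
print(parts.scheme)    # http
print(parts.hostname)  # www.example.com  <- the domain name
print(parts.path)      # /docs/page.html
print(parts.fragment)  # intro
```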
Re: (Score:2)
Re: (Score:2)
Re: (Score:3, Interesting)
Re: (Score:3, Funny)
Ah, but that's where you're wrong my friend. Like it or not, "1234.html" can be expa
Seriously (Score:3, Interesting)
Re: (Score:3, Interesting)
How to accommodate those?
Re: (Score:2, Funny)
How to accommodate those?
Rotate your screen 90 degrees...
Re: (Score:2)
Re: (Score:2)
For now, one line works fine for everyone.
Re: (Score:2)
Re: (Score:2)
Yes, it's flippant, but nothing compared to my solution for the article's dilemma:
ATTENTION ARAB- AND HEBREW- SPEAKING PEOPLES. We have fixed the internets. Please use the following protocols in all communications:
The ".net" domain is now "ten." ; ".com" is "moc." and so on.
The proper procedure for forming a URL is: subpage, top-level-domain, domain, subdomain, then "//:ptth".
PLEASE USE THIS PROTOCOL AND ONLY THIS PROTOCOL IN YOUR FUTURE USE OF ALL OF THE INTE
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
To type a Ukrainian URL with of course. Hopefully the webpage behind the URL will be in Gujarati so his friend in California can get one of his co-workers to translate it for him.
Re: (Score:2)
If by "Japanese keyboard" you mean a keyboard with Japanese characters on the keycaps... There are two types, there's the huge monster kanji typewriter with something like a thousand keys that I don't think anyone knows how to use and kana keyboards which
Re: (Score:2)
It's a hard problem indeed, but you have to consider the foreigner's view. What they're essentially forced to do now is learn a second way of writing things in their own language and it is pretty annoying.
For example I went the other direction and learned some Japanese. I can read hiragana, katakana, and a few kanjis. The hiragana and katakana are equivalent writing systems but for different purposes. Katakana is usually reserved for foreign words or emphasis (sorta like how people sometimes use all capit
Re: (Score:2)
Seriously, multilingual domain names are a pain (for all of humanity). Visiting Japan last year, I saw a lot of servers with names in Japanese. As a foreigner, I hadn't the slightest idea what the site was (without clicking on it). Clicking on it didn't help either. Yes, a lot of Japanese have the same problem with English domain names, but adding multilingual names adds more complexity to the whole thing. I would like to see the face of a Chinese guy trying to decipher some URL using Ukrainian characters... or trying to type it on his Japanese keyboard...
English domain names will stay forever, there are way too many references to them.
The international domain name will be purchased in addition. I would buy 3 domains right now if they finally decide what to do.
Japanese and Chinese would be hard to understand. Just imagine your nick (El Lobo) is in some local language and, without special characters, it reads as "Lobotomised Moron" in that language. Sounds extreme? There are WORSE situations than that, which I'd better not tell.
These people pay the same price for domain
Re: (Score:2)
Re: (Score:2)
Well, uh, we could click (Score:5, Funny)
Maybe if I did a search for something, and the answer is in one of those "other" languages written by those "other" people, maybe I could somehow click some kind of--I don't know--maybe a representation of that site, using my rat or squirrel or whatever these new-fangled devices are called. Then of course I'd like to be able to save this transportation capability for future use; if only there were a way to save some kind of cyber-bookmark in my browser, to keep my place without having to type in all those funny characters ever again. I think I have some ideas, but I need to contact my patent attorney first.
Oh, no. Wait. I just thought of something bad. You know, when I actually get to this site, it's probably going to be really hard to understand what's written on the page. Funny squiggles and such. I suppose there's really just no reason for me to go to such a page, if I can't read it anyway, so why even bother? Plus "they" probably don't know anything good anyway, but there's always a chance that "they" might be more intelligent than we thought. If only there were some site that provided a service that could help me translate this page, then maybe, just maybe, I'd be Ok with allowing these foreign-speaking visitors to spread their native language like some kind of disease all over "my" Internet. If only...
Re: (Score:2)
The fellow above (humorously) mentioned that we should allow full i18n and let th
What word? (Score:2, Funny)
Re: (Score:2)
Re: (Score:2)
All this time, I thought a Water Buffalo was a horrible 1970's-vintage Suzuki motorcycle.
Re: (Score:2, Funny)
Re: (Score:2)
Winner: Dog (Score:2)
Re: (Score:3, Interesting)
Re: (Score:2)
I'm curious too, but, just a guess, I wonder if the word in question is "behema", which is actually "water buffalo", but is used to refer to loud, rowdy, uncouth people?
Re: (Score:2)
Though, to take offense at this seems to be crazy.
Re: (Score:2)
I share your curiosity. Sounds like someone, somewhere, had a hippo-potty-mouth.
Re: (Score:3, Funny)
Or at least, that's what I recall from 4th grade biology class.
More info here (Score:5, Funny)
!knil taht kcilc t'noD !GMO (Score:2, Funny)
Silly ICANN, don't insult the Israelis (Score:2)
I wonder what test translates to... I hope they hired a translator who doesn't like practical jokes.
I am registering (Score:5, Funny)
Re: (Score:2)
I cannot find where I've seen it...
Re: (Score:2, Funny)
I cannot find where I've seen it...
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
In United England, you kay dot!! (Score:2)
Bad Title (Score:2)
Re: (Score:2)
True. It should be "ICANN Hippopotamus Multilingual URLs".
About Time (Score:2)
One request I would have of ICANN is to limit the use of accented characters to help prevent phishing scams.
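A minimal sketch of one way to act on this suggestion, assuming a registry simply refuses a new name whose accent-stripped "skeleton" collides with a protected one. The protected set and both function names are hypothetical, and the check only catches accent tricks, not lookalike letters from other scripts.

```python
import unicodedata

# Hypothetical set of names a registry wants to protect.
PROTECTED = {"ebay", "paypal", "bankofamerica"}


def ascii_skeleton(label: str) -> str:
    """Drop accents: decompose (NFKD) and discard combining marks."""
    decomposed = unicodedata.normalize("NFKD", label.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def looks_like_protected(label: str) -> bool:
    """Flag a label that is not itself protected but folds to one that is."""
    return label not in PROTECTED and ascii_skeleton(label) in PROTECTED


print(looks_like_protected("päypal"))  # True
print(looks_like_protected("paypal"))  # False
```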
Palindrome Domains (Score:2)
Analogy with the official language of Aviation (Score:2, Insightful)
Internationalization in software and operating systems is in a horrible state of excess complexity right now. When everything, top to bottom, runs Unicode UTF-8 as its default mode, then MAYBE.
But even then, there is a single language for aviation communications (it happens to be English), and that is done so that there is some hope that everyone will know what everyone is talking about, because everyone can learn the aviation sub
Damn them all... (Score:2)
overlap problem? (Score:2)
The original mistake was the address bar (Score:2)
Re: (Score:2)
LMAO (Score:2)
Re: (Score:2)
Of course, the RTL/LTR issue adds one new dimension, as do the different Asian scripts, among them the hiragana, katakana and kanji character sets.
And just figure: it'll be all the easier for a lot of the obscure sites on the net when they can use some really odd characters combined with running IPv6 only. The RIAA will probably get really worried...
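The RTL/LTR dimension shows up at the character level: Unicode assigns every character a bidirectional class, which Python exposes through `unicodedata.bidirectional`. The sketch below (helper names are hypothetical) flags names containing both strong left-to-right letters ("L") and strong right-to-left letters ("R" for Hebrew, "AL" for Arabic), the kind of name that the story summary describes being typed half in one direction and half in the other.

```python
import unicodedata


def strong_directions(name: str) -> set[str]:
    """Return the strong bidi classes present: 'L' (left-to-right),
    'R' (Hebrew and other right-to-left), 'AL' (Arabic letters)."""
    strong = {"L", "R", "AL"}
    return {unicodedata.bidirectional(ch) for ch in name} & strong


def mixes_directions(name: str) -> bool:
    """True if the name contains both left-to-right and right-to-left letters."""
    classes = strong_directions(name)
    return "L" in classes and bool(classes & {"R", "AL"})


print(mixes_directions("example.com"))  # False
print(mixes_directions("דוגמה.com"))    # True
```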
Re: (Score:2, Insightful)
Re: (Score:2)
And don't forget what a boon this will be for phishers. Won't they be able to register domains like "ebaý.com" or "banköfamerica.com" or even names with characters that would look, to users of English-language browsers, identical to the real domains?
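Lookalike letters from other scripts are the harder case, because "ebaý" at least looks different, while a Cyrillic "а" does not. A crude sketch flags labels that mix scripts. It uses the first word of each letter's Unicode character name as a stand-in for its script, which is an assumption; a production check would follow the more careful mixed-script and confusable rules in Unicode Technical Standard #39. Japanese legitimately mixes kanji, hiragana and katakana, so a real check would also need a list of allowed combinations.

```python
import unicodedata


def scripts_in(label: str) -> set[str]:
    """Crude script guess: first word of each letter's Unicode name."""
    return {unicodedata.name(ch).split()[0] for ch in label if ch.isalpha()}


def is_mixed_script(label: str) -> bool:
    """True if the letters in the label come from more than one script."""
    return len(scripts_in(label)) > 1


spoof = "p\u0430ypal"  # U+0430 is CYRILLIC SMALL LETTER A, not Latin "a"
print(spoof, is_mixed_script(spoof))  # True
print(is_mixed_script("paypal"))      # False
```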
Re: (Score:2)
Browser writers might want to consider shipping with 8 bit character domain names as the default for a start.
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
And remember, "vibrant local traditions" almost always means "this is what we did to segregate ourselves from others in the old days".
And why can't new things be vibrant? Why can't there be new cuisine? Hell, we lose 200 languages a year. Big whoop-de-doo.
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
Why, when my name is written one way in my identity documents, do I have to use another name on the Internet in my own country?
Re: (Score:2)
Re:This negates the entire purpose of DNS (Score:4, Insightful)
Now, of course, most of these countries have their own issues about Internet connectivity and interoperability, but this at least is one less acceptable reason they behave that way.
That has nothing to do with language (Score:2)
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
There's no other set of characters (and their binary representations) that can be typed in by virtually anyone, anywhere. If you use anything else, you start locking people out based on the equipment they have.
Re: (Score:2)
If you're going to visit a web site written in Chinese, chances are you have a Chinese keyboard or the Chinese IME installed. Or you're following a link.
Not necessarily, though (Score:2)
Re: (Score:2)
"Normal" keyboards depend on where you live. What may be "normal" in your region is most definitely not in others. My keyboard contains äöüß without shift or breaking a finger. Yours likely does not (but your backslash does not need you to press Alt Gr, you lucky SOB). Cyrillic keyboards contain mostly non-Latin characters in their default setting. Don't get me started on Japanese and Chinese ones.
The point of DNS was never to make it easy for Y
"normal" keyboards (Score:2)
Really, is there a pressing need for änderungsaufträge.de to be distinct from anderungsauftrage.de?
That's not practical (Score:2)
ASCII is available to them (Score:2)
Re: (Score:2)
You've just proven my point (Score:2)
It has nothing to do with reading them (Score:2)
How are they going to use them... (Score:2)
Re: (Score:2)
You can type ASCII (Score:2)
No, they won't (Score:2)
I'm supporting a standard which can be easily implemented on all available hardware.
Re: (Score:2)
I am Brazilian and the language I speak (Portuguese, not Spanish) has lots of accented letters, namely á à ã â é ê í ó ô õ ú ü ç. We have no problem dropping the accents in most cases, it RARELY causes ambiguity. For example, consider a supermarket called "pão de açúcar". In the current system, they just registered www.paodeacucar.com.br, and this is good enough. Now, with this new system, how many permutations they
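The permutation count can be made concrete. Assuming each accented letter could be typed either with or without its accent, and that the owner wants to hold every mix, the number of variants is 2 raised to the number of accented letters: 8 for "pão de açúcar" (spaces removed). A minimal sketch; the fallback table is hypothetical and covers only the letters in this name.

```python
import unicodedata
from itertools import product

# NFC so the accented letters are single precomposed code points.
name = unicodedata.normalize("NFC", "pão de açúcar").replace(" ", "")
# Hypothetical ASCII fallback for each accented letter in this name.
fallback = {"\u00e3": "a", "\u00e7": "c", "\u00fa": "u"}  # ã, ç, ú

# For every accented letter, allow either the accented or the plain form.
choices = [(ch, fallback[ch]) if ch in fallback else (ch,) for ch in name]
variants = {"".join(combo) for combo in product(*choices)}

print(len(variants))              # 8 = 2 ** 3
print("paodeacucar" in variants)  # True
```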
Re: (Score:2)
2nd+ level IDNs have existed for over 7 years - they are not new.
What is new is the TLD part, such as
Re: (Score:2)
Re:The "Balkanisation" of the Internet (Score:4, Insightful)
I agree that segregating the Internet into separate "internets" for particular countries is a bad idea; however, if other people want to have networks that operate in their native languages, who are we to tell them that they should stop that and be forced to use English instead? Wouldn't it be better to just make the Internet (the one that we have now, predominantly English) capable of supporting multiple languages, so that if and when people want to build networks in other languages, they're at least connectable to our internet, even if we can't type the domain names directly from our English keyboards? The alternatives are either making everyone build their networks in English, which WOULD be cultural imperialism, or ignoring the pressure for multilingual networks to the point that completely incompatible non-English alternatives spring up.
The world is already largely divided up by language. I doubt you (presumably a native English speaker in a predominantly English-speaking country) visit many Chinese websites written entirely in Chinese languages for Chinese speakers in China right now, even though their domains are written in 7-bit ASCII script like every other site on the Internet. This proposition won't make that any better, but it won't make it any worse either; and it holds the possibility of staving off the even worse alternative of completely separate, incompatible, non-ASCII "internets" springing up to meet the demands of these other peoples. At least with this multilingual system, an English site (with an ASCII domain) can link to a Chinese site (with a Hanzi domain). If China were to invent their own Hanzi-based DNS protocol, separate from our existing DNS protocol, not even that would be possible. Making our network multilingual actually prevents Balkanization more than it induces it.