Internationalized Domain Names Coming Soon
rduke15 writes "You think you know how to parse a domain name for validity? Well, in case you haven't noticed, things are getting tougher as registrars keep adopting IDN (Internationalized Domain Names), which uses a weird encoding named Punycode to enable accented characters in domain names. The Register reports about Switzerland, Germany and Austria's joint move to enable IDN. See the overview in English from Switch. But I guess it would be difficult to talk about this on /., since it does not even support basic Latin-1 ... :-)"
Ah great... (Score:5, Insightful)
Re:Ah great... (Score:3, Insightful)
Re:Ah great... (Score:4, Insightful)
Re:Ah great... (Score:3, Insightful)
Re:Ah great... (Score:3, Funny)
sounds like (Score:3, Insightful)
Unicode.org [unicode.org]
Re:sounds like (Score:2)
You RTFA (Score:5, Insightful)
This means that it can't possibly include ALL of the unicode spectrum, as Unicode supports far more than just 92 extra characters.
Also, the way the coding is going to work, you still can't register a name with B.
Re:You RTFA (Score:3, Informative)
I am aware that the German scharf s is not a capital B. I had it correctly in my submission, but someone who was working on the slashcode thought it would be a good idea to eliminate accents, rather than to possibly HTMLize them.
Try it yourself: put a scharf s into a Slashdot comment, and see what happens.
I notice that you DIDN'T complain about the missing accent on the French e, or the missing slash thr
Isn't there a better way? (Score:5, Interesting)
Why not extend DNS to support Unicode? That way there'd be no translation or other crap to go through.
Granted, software would need changing, but that would be the case with the mangled crap that's mentioned in the article.
What am I not understanding here? Or is this just an implementation dreamed up to make life complicated?
Re:Isn't there a better way? (Score:2, Funny)
Accented characters are so Old World and passe, anyway.
Re:Isn't there a better way? (Score:2)
All this I18N and L10N is anti-globalization.
Re:Isn't there a better way? (Score:2, Interesting)
djbdns doesn't support unicode either, although it doesn't rely on standard c-libraries, so unicode support might only take a few weeks to add.
Unicode would be better than punycode, but punycode works with existing DNS client and server software.
It's not for trolls. (Score:4, Insightful)
djbdns is 8-bit clean. Use UTF-8 all you want right now.
Re:It's not for trolls. (Score:3, Insightful)
That's correct - no unicode codepoint apart from [FULL STOP] will cause a \x2E to appear in a UTF-8 stream. UTF-8 encodes the first 128 code points of Unicode using the identical ASCII values (which all have the eighth bit set to 0), and then only using combinations of the other 128 byte values (which all have the eighth bit set to 1) to encode every other character. It's very co
Re:It's not for trolls. (Score:3, Insightful)
UTF-8 only uses non-ASCII byte values to produce non-ASCII characters. That's one of the things that make it really neat, and easy to convert to. It also means that you can jump into a UTF-8 stream at any point without getting out of sync and receiving trash. This makes it more powerful than UTF-16.
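For anyone who wants to check the two UTF-8 claims above rather than take them on faith, here is a rough Python sketch. The sample string and the helper names (`non_ascii_bytes_only`, `resync`) are my own illustrative choices, not anything from the thread or from a DNS library.

```python
# Sketch: check two UTF-8 properties claimed above.
# The sample text and helper names are illustrative assumptions.

def non_ascii_bytes_only(ch):
    """True if every byte of this character's UTF-8 form has the high bit set."""
    return all(b >= 0x80 for b in ch.encode("utf-8"))

def resync(data, start):
    """Skip continuation bytes (10xxxxxx) to reach the next character boundary."""
    while start < len(data) and (data[start] & 0xC0) == 0x80:
        start += 1
    return start

sample = "räksmörgås.se"
encoded = sample.encode("utf-8")

# Byte 0x2E should appear only where there is a real full stop.
dot_positions = [i for i, b in enumerate(encoded) if b == 0x2E]

# Jump into the middle of a two-byte character and recover from there.
mid = encoded.index("ä".encode("utf-8")) + 1   # second byte of the a-umlaut
tail = encoded[resync(encoded, mid):].decode("utf-8")

print(dot_positions, tail)
```

The resynchronization loop works because UTF-8 continuation bytes always start with the bits `10`, so a reader landing mid-character only has to skip forward until it sees a byte that does not.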
Re:Isn't there a better way? (Score:3)
Re:Isn't there a better way? (Score:4, Insightful)
If Paul Vixie did say that, it would kinda argue for choosing that route rather than trying to get the IETF to agree to anything; so far it has been over five years since the start of this effort, and counting.
The real problem is not fixing BIND, that is easy. Deploying BIND updates and deploying compatible client updates is the real problem. It just isn't feasible.
Re:Isn't there a better way? (Score:4, Interesting)
The registries using UTF-8 (most notably
The Swedish registry is only using IDN. The reason for that is that UTF-8 in DNS is not an internet supported standard at all.
http://www.xn--rksmrgs-5wao1o.se/ [xn--rksmrgs-5wao1o.se] will work if you are using a recent Mozilla. (Slashdot should upgrade to at least ISO-8859-1 or UTF-8... I couldn't write raksmorgas.se correctly.)
Microsoft are extremely slow in supporting IDN, and will probably not launch it until the next OS release, which is in 2006... There are plugins from Verisign.
Do a good thing, release an open source plugin for MSIE.
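Since Slashdot can't display the real name, here is a minimal Python sketch that decodes the `xn--` label from the URL above using the standard library's `punycode` codec. The helper name `label_to_unicode` is made up for illustration, and a real IDNA implementation does more (nameprep, validity checks) than this does.

```python
# Sketch: decode the ACE label from the URL above with Python's built-in codec.
# `label_to_unicode` is a hypothetical helper, not part of any library.
ACE_PREFIX = "xn--"

def label_to_unicode(label):
    """Decode one DNS label; labels without the ACE prefix are returned unchanged."""
    if label.lower().startswith(ACE_PREFIX):
        return label[len(ACE_PREFIX):].encode("ascii").decode("punycode")
    return label

host = "www.xn--rksmrgs-5wao1o.se"
readable = ".".join(label_to_unicode(part) for part in host.split("."))
print(readable)
```

Run on a terminal that handles UTF-8, this shows the Swedish word for "shrimp sandwich" with its three accented letters intact.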
Backwards compatibility (Score:4, Informative)
Sounds like a great idea.... If you're willing to re-implement the DNS code in my Win-95 box.... or on my Amiga-4000. How about my 10 year old Apollo workstation or the SUN-3 that's still working just fine, thank you. etc. etc.
A lot of old DNS implementations would choke (and properly so) on UTF-8 encoded DNS names. We probably could have seeded the needs of the future by saying that IP-6 DNS servers should support unicode, but I think that even that boat has been missed. (or is quickly leaving dock).
In the meantime the old DNS and its anglo-centric presumptions and restrictions are with us for the next few years (or decades, as the case may be). Clearly some people feel the need to live within those restrictions.
Re:Isn't there a better way? (Score:3, Insightful)
It's not as simple as you may think. I am all for Unicode, but to use it for domain names can lead to unwanted consequences.
There already exist some internationalized domain names in Chinese, so instead of having chopsticks.com we can have [insert chinese characters for chopsticks
I for one... (Score:3, Redundant)
I don't care what the issues are. I have had it up to HERE with charset issues! ENOUGH ALREADY!
If you can't do it using UTF-8, don't do it at all!
Dammit.
Re:Isn't there a better way? (Score:5, Insightful)
DNS should never get Unicode support, or any form of "internationalization" for that matter!
DNS is supposed to be a way for humans to communicate with computers about internet hosts. The intent is not for some human to be able to read it, but for all humans. This has worked until now because hostnames were limited to only ~37 characters. Regardless of native language, any computer operator can quickly learn to handle the [a-z][0-9] glyphs. Basically anyone literate in one language can copy ASCII characters from a signpost onto a notepad, and then punch those into a keyboard. Even if her culture doesn't use the ASCII set in normal daily activities (which about everyone in America, Europe, and Japan does), then the shapes are at least simple enough to copy geometrically.
But if 16-bit charsets are allowed in DNS, we could get hostnames composed of 3 Chinese characters and two Arabic ones, and which a Russian or Briton will be incapable of processing without tremendous pain.
DNS is something that should be left in a "lowest common denominator" form, so that it's accessible to all of humanity (if they meet the low hurdle of operating a normal PC)
Internationalized host identifiers in URLs will be important, of course. But they should be a separate layer implemented on top of DNS. DNS is a standard that already exists. Rather than changing the standard and breaking every single internet-using computer (the "flag day" scenario), a new system should be rolled out for people who want host identifiers in funny-looking squiggles.
Re:Isn't there a better way? (Score:5, Insightful)
So, to sum it up, you are right that current Unicode encodings will not meet current DNS RFCs, but the reason you gave wasn't quite right. Punycode does solve the problem, but ugh, punycode is an awful hack of a character encoding system. I'd hate to see it live on forever, but it might be useful getting us started on i18n-ified DNS.
Re:Isn't there a better way? (Score:5, Informative)
No. The problem that punycode solves is that the encoded DNS names are themselves valid RFC1034 DNS names. That is, even when encoded, standard DNS validity checkers will accept the name.
UTF-8 does not have this property
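To make the "standard validity checkers will accept the name" point concrete, here is a hedged sketch of such a checker in Python. It follows the letter-digit-hyphen rule from RFC 1034 as relaxed by RFC 1123 (labels may start with a digit); the function name and the exact regular expression are my own, and the example name is the Bucher one quoted elsewhere in this thread.

```python
import re

# Sketch of an "old-style" hostname validator: each label is 1-63 letters,
# digits or hyphens, and does not start or end with a hyphen.
LABEL = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")

def is_ldh_hostname(name):
    """Return True if every label passes the letter-digit-hyphen rule."""
    labels = name.rstrip(".").split(".")
    return len(name) <= 253 and all(LABEL.fullmatch(label) for label in labels)

ace_name = "xn--bcher-kva.ch"    # Punycode form quoted in this thread
raw_name = "bücher.ch"           # the name with the umlaut on the u
# The same name as raw UTF-8 bytes, viewed as 8-bit text by legacy software.
utf8_name = raw_name.encode("utf-8").decode("latin-1")

for candidate in (ace_name, raw_name, utf8_name):
    print(repr(candidate), is_ldh_hostname(candidate))
```

Only the first candidate passes, which is the whole argument: the encoded form sails through software that has never heard of IDN, while the UTF-8 form gets rejected.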
Server support (Score:2)
really dumb sounding (Score:5, Interesting)
Re:really dumb sounding (Score:3, Informative)
I think the *real* solution here is to reimplement ALL top level DNS servers to support unicode. But the overhead in doing this, when you really think about it, seems difficult (ICANN approval, unicode related bugs, getting everyone to use new DNS server, etc). At least, since the ASCII text supported by DNS are exactly the same in Unicode, backwards compatibility should not be a problem.
Why not (Score:4, Funny)
Just say the ascii number?
Re:Why not (Score:2, Informative)
Hint: ascii is 7-bit.
Re:Why not (Score:2, Funny)
Accented a: 0x61
"gze" sound: 0x670x7a0x65
That was easy!
Because (Score:2, Funny)
r&#
ic
o ty
like &#
is
Re:Because (Score:3, Funny)
r 21; diff
icu
o ty&; #112;e 
like & #1 16;h
is
An opportunity to quote one of my favorite bits of .sigfodder of all time:
Re:Because (Score:2)
Useful? Naw. (Score:4, Interesting)
I'm not sure what all the accents are on the alphabet; will I have to know how to type them to access a simple website? Sorry, this doesn't make using the net easier.
Re:Useful? Naw. (Score:5, Insightful)
Who types URLs? (Score:3, Insightful)
Most users don't even *know* that you can type stuff in the Address field.
Re:Useful? Naw. (Score:3, Insightful)
What was your point again?
Re:Useful? Naw. (Score:2, Insightful)
Just as the moderator guideline says "focus on promoting instead of modding down
Re:Useful? Naw. (Score:4, Insightful)
Never fear, oh monolingual one, I found this very handy site [google.com] that will help solve this pesky problem for you. Try it some time and let us know what you think!
Re:Useful? Naw. (Score:3, Interesting)
Punycode *is* a Unicode encoding. (Score:5, Informative)
Punycode *is* a Unicode encoding.
Unicode has many encodings; UTF-8 is one encoding and Punycode is another. UTF-8 aims for efficiency when the majority of the text is ASCII, and Punycode aims for completeness when you must fit in the 63 characters of a DNS label and use only the ASCII characters to do it.
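A small Python sketch of that trade-off, measuring the same label under both encodings. The two sample labels come from names mentioned in this thread; the helper `label_sizes` is hypothetical, and it skips the nameprep step a real IDNA conversion would apply first.

```python
MAX_LABEL_OCTETS = 63  # DNS limit for a single label

def label_sizes(label):
    """Return the size in octets of one label under each encoding."""
    ace = b"xn--" + label.encode("punycode")
    return {
        "utf-8": len(label.encode("utf-8")),
        "punycode": len(ace),
        "fits": len(ace) <= MAX_LABEL_OCTETS,
    }

for label in ["bücher", "räksmörgås"]:
    print(label, label_sizes(label))
```

For these two Latin examples the Punycode form is actually the longer one. What it buys is not compactness but the guarantee that every octet is plain ASCII.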
Oh great... (Score:2, Funny)
The French will demand that "bandwidth exceeded" errors be renamed to "(web page) surrenders"
The Germans will try to take over the internet.
In a sneak attack, the Iraqis will launch a massive DDOS attack, but accidentally hard-code localhost in the trojan. The Iraqi information guru will deny everything.
Re:Oh great... (Score:2, Funny)
Re:Oh great... (Score:2)
I kinda wonder when Germany will stop getting shit for the wars. We'll probably have to wait for the grandchildren of the combatants to die.
On the other hand, most american folks are OK with Japan these days... odd (nothing against the japanese... it's just that they did lotsa nasty stuff too. I guess it's because they aren't a military threat).
Americans are just tweaked at france cuz it didn't fall in line like Britain did. It'll pass.
Re:Oh great... (Score:2)
1) much of the world (and many americans) still look on germany with suspicion, despite all the efforts by germany to move away from its past. Hell germany is one of the U.S.'s strong allies.
2) Most of the world (excluding SE asia, which was most affected) doesn't harbor such feelings towards japan. Japan did really nasty things in
Taco, why did you remove the accents from slashdot (Score:5, Funny)
Taco est un mechant garcon.
Maybe not as useful as one might believe (Score:5, Interesting)
Unfortunate but true, if a company has a Chinese domain name, it would probably be only used within China, Taiwan, Hong Kong, Singapore, Japan (since it's unicode), and maybe South Korea. The company would be pretty much limited to the East Asia market.
However, I suppose the company could get both a Chinese domain and an English, or rather Pinyin, domain so they could make their Chinese, or maybe other Asian clients feel "closer" while also being able to reach clients outside of East Asia.
I also think that it'd be great to give people the option of having a native-language email address. It's not too hard to set up a romanized email alias for it. An SMTP "X-Roman-Address" header could even be added to outgoing messages in case a recipient can't read the default "From" line.
Re:Maybe not as useful as one might believe (Score:3, Interesting)
Yeah, they would "limit" themselves to the fastest growing economy in the world and a market of about 2 billion people...who'd want that?
P.S. Why can't that company have a Chinese domain name and a roman-character domain name? Is there a law I don't know
Re:Maybe not as useful as one might believe (Score:2)
You solved the problem in your post; get two domains. Not too hard.
URLs that you cannot type (Score:3, Insightful)
I sure hope this harebrained idea doesn't take off.
Re:URLs that you cannot type (Score:4, Insightful)
URLs that you cannot type. But why would they want your hits if you can't even type their domain name? It's not like you'll be able to read the content if you get there, or understand their ads.
Re:URLs that you cannot type (Score:3, Interesting)
Or how about URLs you have to spell differently than you spell the name of the company in question? That's a pretty harebrained idea, but one that very many* people online live with today. Take for instance Norwegians (as I happen to be one myself). The Norwegian alphabet consists of 29 letters: the old 26 from Latin (a-z) as well as three I can't show you here on /., since the site for some bizarre reason doesn't support them**. Therefore we're forced to use 'ae', 'oe' and 'aa'*** instead, opening for plenty more misundersta
Companies will shell out more to registrars now (Score:5, Insightful)
Microsoft.com
Microsoft.net
Microsoft.org
Mi
etc..
But also
Microsoft.com
Microsoft.com
Well, you get the picture.
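Since Slashdot flattened the accented variants in the list above into identical-looking lines, here is a rough Python sketch of what the poster presumably meant. The set of accented letters and the helper `lookalikes` are illustrative assumptions; the conversion uses Python's built-in `idna` codec.

```python
# Sketch: a handful of accented look-alikes of one name and the ASCII
# form each would take in DNS. The variant list is illustrative, not exhaustive.
BASE = "microsoft"
ACCENTED_O = ["ó", "ò", "ô", "ö"]

def lookalikes(name, target, substitutes):
    """Replace each occurrence of `target`, one at a time, with each substitute."""
    variants = []
    for pos, ch in enumerate(name):
        if ch == target:
            for sub in substitutes:
                variants.append(name[:pos] + sub + name[pos + 1:])
    return variants

variants = lookalikes(BASE, "o", ACCENTED_O)
ace_forms = [(v + ".com").encode("idna").decode("ascii") for v in variants]

for v, ace in zip(variants, ace_forms):
    print(v, "->", ace)
```

Each variant maps to its own distinct `xn--` name, so each one is a separate registration a trademark holder might feel obliged to buy.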
Re:Companies will shell out more to registrars now (Score:2)
Re:Companies will shell out more to registrars now (Score:2)
Not as bad as DNS hijacking, but very similar outcome for the untrained eye.
Every time I have to click on a link that takes me to Paypal, American Express, my bank's web site, my broker, etc ..., I make sure the URL is not spoofed. Unless there is a referer id you need to reuse in the link that is presented to you, it is always better to go to your famili
Babylon 5 (Score:4, Funny)
Reminds me of that Babylon 5 episode when they find a person named Zathras down on this planet. Ivanova thought she had been talking to Zathras:
"No, that was not Zathras, that was Zathras. There are 10 of us, all of family Zathras, each one named Zathras. Slight differences in how you pronounce. Zathras, Zathras, Zathras.. You are seeing now?" - Zathras, Babylon 5: Conflicts of Interest
IDN? Mozilla supports it (Score:3, Informative)
Mixed feelings (Score:5, Informative)
On the other hand, this is also quite convenient. I live in the US now, and I travel around quite a bit. I often surf on Swedish Internet sites, typically without access to a Swedish keyboard. It would not be very convenient if the domain names used non-English symbols.
Sometimes I go to Japanese sites also, and I am really glad that I don't have to install a Japanese word processor to do this...
Tor
Re:Mixed feelings (Score:2, Insightful)
And that's also why registrars love it.
Super Monkeys! (Score:5, Funny)
Any Internet RFC which includes the phrase, -with-SUPER-MONKEYS, has GOT to be good. (And in case you think I'm trolling, check the link.)
USA! (Score:2, Funny)
If it wasn't for us we'd all be speaking German. Wait.
[ducks]
Why so late? (Score:2)
Subject to Approval (Score:3, Funny)
Taking 1337-speek to a new level (Score:3, Informative)
That last one would be doubly good, because if I understand the Punycode [faqs.org] spec correctly, it'll get translated to ASCII as dixiehicks-XXXX.com. Not my opinion of the group, but maybe it would attract hits from the Toby Keith crowd.
it works fine on /. (Score:5, Funny)
I, for one, welcome our new European overlords.
Same character in different character sets (Score:2)
No change needed... (Score:5, Informative)
Yes, I do, and if you _read_ the RFC you'll see that nothing changes, these domain names are encoded into the same character set as the current DNS system. And hence if you give me a URL I can validate it with existing scripts. There's an example which shows that Bucher.ch (with an umlaut on the u) would be translated to: xn--bcher-kva.ch which looks totally parseable to me.
John.
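For the curious, the translation described above can be reproduced with Python's built-in `idna` codec, which implements the 2003 IDNA rules. This is only a sketch; the wrapper names `to_ascii` and `to_unicode` are mine.

```python
# Sketch: convert a typed name to the ASCII form that existing DNS software
# sees, and back again. Wrapper names are illustrative.
def to_ascii(hostname):
    """Unicode hostname -> ASCII-compatible form."""
    return hostname.encode("idna").decode("ascii")

def to_unicode(hostname):
    """ASCII-compatible form -> Unicode hostname."""
    return hostname.encode("ascii").decode("idna")

wire = to_ascii("Bücher.ch")
print(wire)
print(to_unicode(wire))
```

Note that the capital B does not survive the round trip: the conversion case-folds non-ASCII labels before encoding them, which fits with DNS names being case-insensitive.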
I can't wait (Score:5, Funny)
Reason (Score:3, Insightful)
Since this solution doesn't break any old implementation, just the countries that need it will have to modify their software, and not wait for the slow and expensive process of changing all of DNS, which a large part of the 'net isn't motivated to pay for.
Re:Reason (Score:2)
Plus the guy uses Amuro Namie in one of his examples, so he's gotta be cool.
Just use Google (Score:3, Insightful)
Often-used URLs I have as bookmarks, and when I need some other site, it is much easier to make a guess via Google. What I am looking for is almost always on page one of Google's choices.
Sure, Google could find a way to handle the special characters and make an intelligent suggestion, if nothing else based on the IP address of the request. If it is from Burundi, chances of needing a German umlaut are slim
similar domain registrations... (Score:2)
Re:similar domain registrations... (Score:2)
Wrong way on a one-way track... (Score:2, Insightful)
In the interest of fostering the best method to communicate your ideas, products, services, etc., would you not want to use the characters that most everybody can t
Sorry, but this is really stupid... (Score:2, Insightful)
This isn't going to be abused, AT ALL. Worst idea ever.
The Internet (domain names, top-tier nameservers, nameserver software, web and e-mail server software, all markup documents) runs on english, there's no way to i18n it without opening up a world of hurt. Sorry, but I don't want to have to upgrade BIND
Re:Sorry, but this is really stupid... (Score:2, Informative)
Re:Sorry, but this is really stupid... (Score:5, Insightful)
Anyway, the current infrastructure DOES NOT have to be updated and this change is NOT intended to be "some jagoff's playground", but rather for the non-English speaking people - there are quite a few of them.
Well... it's still not perfect (Score:4, Interesting)
Well, what about the
This is the reasoning I've heard, as to why IBM is ai-bi-emu in Japan. And maikurosofuto, souni, etc. (roomaji transliteration there, sorry if you don't get why ai=I)
So what do you do in this case? Unless they can enter Shift-JIS or Unicode URLs, then you're stuck having people enter roomaji versions of your name, which remember, aren't technically trademarkable.
I'd love to hear I'm wrong on some point here, could anyone with more info clue me in?
Well, it had to happen sometime...I guess (Score:2, Insightful)
am I the only one who sees this (Score:2)
This is a step in a direction I don't think we want to go. Imagine if this goes through, if you will. What will follow?
Next you're going to hear about programming languages being developed in other languages. Think outsourcing to India is so great? Wait til your next batch of outsourced code cannot be read, because it's not in English anymore!
One of the things about computing has been the language standardization. Sure, you can do things in other languages, bu
programs programmed in foreign languages (Score:2)
You've obviously never seen a program coded in Perl. *cringe*
Use utf-8 instead of 'punycode'. (Score:2)
Why not put English as IETF standard? (Score:2)
If http can be a standard, xml can be a standard, posix can be a standard, why stop there? Why not have English be the standard too? If developers have to wade through the confused babble that is the W3C recommendations, then certainly the rest of the world can drop their own native languages just as surely as we drop our own native implementations of rendering and networking engines.
English as the world language is surely as efficient as a single standards based unix as a world operating system.
Can we have punctuation while we are at it? (Score:2)
Is there really any reason to continue to disallow things like:
10%.com
10off.com
#dot.org
and most importantly Andy_R.com
while allowing motorhead.com to have their umlauts?
Re:Can we have punctuation while we are at it? (Score:3, Informative)
There are a couple others, but I don't remember them offhand... So in other words, these characters are unusable for a reason.
It's very simple to divide the world (Score:2)
*.Dar-al-Kufr
*.Dar-al-Harb
What more do you need? :-)
Compatibility question (Score:3, Interesting)
I'm asking because today, I've tried out the Netsol [verisign.com] way of doing umlauts and they don't work at all with my Mac OS X and Safari: None of the listed domains work. The page lists a "plugin" that every web user is supposed to install, but it's Win only (of course...) and it's quite silly to have a domain with umlauts if you have to tell all your customers "before visiting me, please install this plugin"...
Any idea if this new way works in all circumstances where the user has an international keyboard? Thanks!
This is important.. (Score:5, Interesting)
In most of the languages with 'funny accents' like umlauts, these characters often have a completely different pronunciation, and are often considered to be a completely different letter than without the 'accent'.
Simply 'brushing off the dirt' and removing the 'accent' thus changes the word. Sometimes with weird results.
Just ask someone from the town of Moensteraas, Sweden [monsteras.se].
Their website contains mostly municipal information intended for Swedes, but due to the restrictions of DNS, the name is instead spelt 'monsteras', which means 'monster-carcass' in Swedish.
Obviously, these people would be happier spelling it with umlauts on the o, and a ring over the a.
Transliteration (Score:3, Informative)
Good question. Basically people don't think, or are too lazy, to transliterate the letters properly.
Some places have the forethought to register both:
Munich in Germany has registered both "munchen.de" and "muenchen.de".
(But it's really a u with an umlaut)
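The difference between the two registrations is easy to show in code. Below is a minimal Python sketch contrasting accent stripping with digraph transliteration; the `DIGRAPHS` table is a small illustrative assumption covering only a few German and Scandinavian letters, not a complete standard.

```python
import unicodedata

# Conventional digraphs for a few German and Scandinavian letters.
# This table is a small illustrative assumption, not a complete standard.
DIGRAPHS = {"ä": "ae", "ö": "oe", "ü": "ue", "å": "aa", "ß": "ss"}

def strip_accents(text):
    """Drop combining marks after decomposition: u-umlaut becomes 'u'."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    return "".join(c for c in decomposed if not unicodedata.combining(c))

def transliterate(text):
    """Replace each special letter with its digraph: u-umlaut becomes 'ue'."""
    return "".join(DIGRAPHS.get(c, c) for c in text.lower())

for name in ["München", "Mönsterås"]:
    print(name, strip_accents(name), transliterate(name))
```

The stripped forms are the "brushed off" spellings complained about in this thread; the transliterated ones are what a native speaker would expect to type when the real letters are unavailable.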
Accents like case? (Score:4, Insightful)
I know there are times when different accents sometimes indicate different words -- but I'm under the impression that it is unlikely that more than one of them would be a "good" domain name. (Am I wrong about that?)
This won't work for non-latin characters, obviously. But UTF-8 seems like a better solution to that. (I understand that most Chinese words are 2-3 characters of 2-3 bytes (unified is U-430 to U-9fa and up to U-7ff is 2 bytes) for 4-9 bytes -- clearly less than 63 bytes) The obvious downside is that it means that all DNS servers and resolvers must (at least!) be 8-bit clean.
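The byte-count estimate above is easy to verify. Here is a short Python sketch of how many bytes UTF-8 needs per code point; the two-character sample word is my own choice for illustration.

```python
def utf8_length(code_point):
    """Number of bytes UTF-8 needs for one code point."""
    if code_point < 0x80:
        return 1
    if code_point < 0x800:
        return 2
    if code_point < 0x10000:
        return 3
    return 4

word = "中文"  # a two-character example chosen for illustration
sizes = [utf8_length(ord(ch)) for ch in word]
total = sum(sizes)
print(sizes, total, total == len(word.encode("utf-8")))
```

Anything above U+07FF needs at least 3 bytes, so common Chinese ideographs come out at 3 bytes each and a 2-3 character word lands at 6-9 bytes. The conclusion in the comment still holds: that is far below the 63-byte label limit.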
Re:FINALLY! (Score:5, Funny)
I am glad to see others than the Mesopotamians using the wheel which was originally invented for use in Mesopotamia.
Re:FINALLY! (Score:3, Interesting)
For the Maya, zero was not just a placeholder. It signified the concept of an absence of value, a.k.a. an empty set.
http://en.wikipedia.org/wiki/Zero [wikipedia.org]
History
The numeral or digit zero is used in numeral systems, where the position of a digit signifies its value, with successive positions having higher values, and the digit zero is used to skip a posi
Re:FINALLY! (Score:4, Funny)
Re:FINALLY! (Score:3, Insightful)
I'm glad to see that people other than the Swiss are being recognized on the web. Which originally started as a Swiss scientific project...
Without the rest of the world, the Internet would have been obsolete and irrelevant by now. Deal.
Re:Bad idea but bound to happen with todays thinki (Score:2, Funny)
ludacris
femail
curce
mentaly
Re:Bad idea but bound to happen with todays thinki (Score:2, Insightful)
if you can't handle a little learning curce to access the info, IMO you aren't capable mentaly of doing anything with the info once you access it.
Next time you go to a country the native language of which you can't understand, try planning your whole trip wit
Re:Bad idea but bound to happen with todays thinki (Score:2, Interesting)
Good day to answer to a troll, here goes...
26 letters and 0-9 are not the best way to communicate with computer if your native language has more than 26 letters in its alphabet. It's not about being insulted or offended, it's about being understood. The computer speaks all natural languages equally badly, after all.
Let's think about average nordic webshop owner who sells beds online for a minute, operating for example in Finland or Sweden. He wants to sell stuff to the native dwellers and hence needs a d
Yes, as a matter of fact (Score:2)
Re:Not to be Overly American... (Score:5, Insightful)
Yes, it is. Because it's not just a few "umlauts". When you're talking about Asian or other non-Romanized languages then the Romanization may be totally incomprehensible to even some speakers of that language. It's one thing to lose a few accent marks and such but it's quite another to translate your language into a totally incomprehensible and unrelated format. In fact in kanji based languages at the very least Romanization actually LOSES information. It's not just a matter of transcribing the sounds into another format because the kanji carry additional meaning not present in just the phonetic language. If you've ever seen two native Chinese or Japanese speakers talk to each other they frequently will "write" kanji in the air or on the palm of the other person's hand with their fingers because their spoken language is imprecise. These changes are very necessary for the Internet to become a truly international phenomenon