Internationalized Domain Names Coming Soon
rduke15 writes "You think you know how to parse a domain name for validity? Well, in case you haven't noticed, things are getting tougher as registrars keep adopting IDN (Internationalized Domain Names), which uses a weird encoding named Punycode to enable accented characters in domain names. The Register reports on Switzerland, Germany and Austria's joint move to enable IDN. See the overview in English from Switch. But I guess it would be difficult to talk about this on /., since it does not even support basic Latin-1 ... :-)"
IDN? Mozilla supports it (Score:3, Informative)
Mixed feelings (Score:5, Informative)
On the other hand, this is also quite convenient. I live in the US now, and I travel around quite a bit. I often surf on Swedish Internet sites, typically without access to a Swedish keyboard. It would not be very convenient if the domain names used non-English symbols.
Sometimes I go to Japanese sites also, and I am really glad that I don't have to install a Japanese word processor to do this...
Tor
Punycode *is* a Unicode encoding. (Score:5, Informative)
Punycode *is* a Unicode encoding.
Unicode has many encodings; UTF-8 is one and Punycode is another. UTF-8 aims for efficiency when most of the text is ASCII, while Punycode aims for completeness when you must fit into a 63-character DNS label using only ASCII letters, digits, and hyphens.
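To make the comparison concrete, here is a short Python sketch (an illustration, not part of the original comment) that encodes the same label both ways with the standard library's built-in `utf-8` and `punycode` codecs. The label `bücher` is the one from the Bücher.ch example quoted later in the thread.

```python
# Encode one label two ways with Python's built-in codecs.
label = "bücher"

utf8 = label.encode("utf-8")      # compact, but contains bytes >= 0x80
puny = label.encode("punycode")   # pure ASCII: basic chars, '-', then a delta code

print(utf8)   # b'b\xc3\xbccher'
print(puny)   # b'bcher-kva'

# In DNS, a Punycode label is marked with the ACE prefix "xn--".
print(b"xn--" + puny)  # b'xn--bcher-kva'
```

Only the Punycode form stays within the byte range an old resolver expects.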
Re:Isn't there a better way? (Score:1, Informative)
I would imagine it probably attempts to query with the Unicode name first and, upon failure, tries the munged address. Since both versions are in the whois database, this would naturally be phased out as DNS servers become Unicode-compliant.
However, it means that any accent-containing domain would actually have two entries. I wonder: would you actually have to register twice (i.e., pay twice)?
One good thing is that the conversion does seem to produce sufficiently undesirable names, so I don't think there would be much overlap between existing domains and the converted form of new accent-containing domains...
Taking 1337-speek to a new level (Score:3, Informative)
That last one would be doubly good, because if I understand the Punycode [faqs.org] spec correctly, it'll get translated to ASCII as something like xn--dixiehicks-XXXX.com. Not my opinion of the group, but maybe it would attract hits from the Toby Keith crowd.
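The way Punycode rearranges a label can be seen with Python's `punycode` codec: all ASCII ("basic") characters are copied to the front in their original order, followed by a hyphen and a suffix that encodes which non-ASCII characters go where. The input below is a hypothetical stand-in for the name in the parent comment (which did not survive here), with the `c` of "chicks" replaced by `ç`; the exact suffix depends on the input, so it is printed rather than predicted.

```python
# Hypothetical label: "dixiechicks" with one letter swapped for an accented one.
label = "dixieçhicks"

encoded = label.encode("punycode").decode("ascii")
basic, _, suffix = encoded.rpartition("-")  # the suffix itself never contains '-'

print(basic)             # ASCII characters in order, with the ç dropped: dixiehicks
print(suffix)            # delta code recording the ç and its position
print("xn--" + encoded)  # the form that would actually appear in DNS
```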
Re:Why not (Score:2, Informative)
Hint: ASCII is 7-bit.
Re:Taco, why did you remove the accents from slash (Score:1, Informative)
No change needed... (Score:5, Informative)
Yes, I do, and if you _read_ the RFC you'll see that nothing changes: these domain names are encoded into the same character set the current DNS system already uses, so if you give me a URL I can validate it with existing scripts. There's an example showing that Bucher.ch (with an umlaut on the u) would be translated to xn--bcher-kva.ch, which looks perfectly parseable to me.
John.
Re:FINALLY! (Score:1, Informative)
I don't think the Mayans even used a base-ten system like the rest of the world, so attributing zero to them seems odd to me.
Re:Isn't there a better way? (Score:5, Informative)
No. Punycode solves the problem by ensuring that the encoded names are themselves valid RFC 1034 DNS names. That is, even when encoded, standard DNS validity checkers will accept the name.
UTF-8 does not have this property.
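A rough illustration of that property: the Python sketch below implements the traditional hostname check (letters, digits and hyphens; each label 1 to 63 characters, not starting or ending with a hyphen, per RFC 1034 as relaxed by RFC 1123 to allow a leading digit). The function name `is_ldh_hostname` and the 253-character overall limit are assumptions for illustration. The encoded form from the Bücher.ch example passes, while the unencoded spelling does not.

```python
import re

# One LDH label: 1-63 chars, alphanumeric at both ends, hyphens allowed inside.
LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")

def is_ldh_hostname(name: str) -> bool:
    """Hypothetical validator: accept only dot-separated LDH labels."""
    if len(name) > 253:
        return False
    # An empty label (e.g. "a..b") fails fullmatch, so it is rejected too.
    return all(LABEL.fullmatch(label) for label in name.split("."))

print(is_ldh_hostname("xn--bcher-kva.ch"))  # True
print(is_ldh_hostname("bücher.ch"))         # False: 'ü' is not an LDH character
```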
Re:Sorry, but this is really stupid... (Score:2, Informative)
Backwards compatibility (Score:4, Informative)
Sounds like a great idea... if you're willing to re-implement the DNS code in my Win-95 box... or on my Amiga 4000. How about my 10-year-old Apollo workstation, or the Sun-3 that's still working just fine, thank you? Etc., etc.
A lot of old DNS implementations would choke (and properly so) on UTF-8-encoded DNS names. We probably could have anticipated the needs of the future by saying that IPv6 DNS servers should support Unicode, but I think even that boat has been missed (or is quickly leaving the dock).
In the meantime, the old DNS and its Anglo-centric presumptions and restrictions will be with us for the next few years (or decades, as the case may be). Clearly some people feel the need to live within those restrictions.
Spoof with accented characters (Score:2, Informative)
Unless the registries all implement some sort of canonicalization, owners of domain names containing the letter "o" are going to face a combinatorial explosion of look-alike spellings (ò, ó, ô, ö, ...) that they would have to register to protect themselves!
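To put a number on that explosion, the sketch below counts the spellings of a name when each "o" may be replaced by a look-alike. The list of look-alikes and the example name are assumptions chosen for illustration (the plain o plus five accented forms); a real spoofer could draw on many more characters.

```python
from itertools import product

# Assumed set of characters that could stand in for "o" (illustrative only).
O_VARIANTS = ["o", "ò", "ó", "ô", "õ", "ö"]

def spellings(name: str) -> list[str]:
    """Every spelling of `name` with each 'o' replaced by one of O_VARIANTS."""
    choices = [O_VARIANTS if ch == "o" else [ch] for ch in name]
    return ["".join(combo) for combo in product(*choices)]

variants = spellings("google")
print(len(variants))   # 36 = 6 ** 2, since "google" contains two o's
print(variants[:3])    # ['google', 'goògle', 'goógle']
```

The count grows as (number of look-alikes + 1) raised to the number of o's, which is why canonicalization at the registry is the only practical defense.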
Re:really dumb sounding (Score:3, Informative)
I think the *real* solution here is to reimplement ALL top-level DNS servers to support Unicode. But when you really think about it, doing this seems difficult (ICANN approval, Unicode-related bugs, getting everyone to use a new DNS server, etc.). At least backwards compatibility should not be a problem, since the ASCII characters supported by DNS are exactly the same in Unicode.
This solution is a workaround: it uses Unicode at the client level, encodes it to Punycode (which contains only characters supported by DNS, unlike, say, Base64 or Quoted-Printable), and sends the request to the DNS server. It is a quick and easy solution to a messy problem, but its hackiness makes me doubt it will be supported by whatever governing body influences this stuff (IETF, ICANN, etc.).
-Mani
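The point about Base64 and Quoted-Printable can be checked with Python's standard `base64` and `quopri` modules and the built-in `punycode` codec: the first two emit characters such as `=` that are not allowed in a DNS label, while Punycode stays within letters, digits and hyphens. (Base64 output is also case-sensitive, which would not survive DNS's case-insensitive name matching.) The sample label is the Bücher example from elsewhere in the thread.

```python
import base64
import quopri
import string

# Characters permitted in a traditional DNS label.
LDH = set(string.ascii_letters + string.digits + "-")

raw = "bücher".encode("utf-8")

encodings = {
    "base64": base64.b64encode(raw).decode("ascii"),
    "quoted-printable": quopri.encodestring(raw).decode("ascii"),
    "punycode": "bücher".encode("punycode").decode("ascii"),
}

for name, text in encodings.items():
    disallowed = sorted(set(text) - LDH)
    print(name, repr(text), "disallowed:", disallowed)
```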
Re:Can we have punctuation while we are at it? (Score:3, Informative)
There are a couple of others, but I don't remember them offhand... So in other words, these characters are unusable for a reason.
Re:You RTFA (Score:3, Informative)
I am aware that the German scharfes s (ß) is not a capital B. I had it correctly in my submission, but someone working on the Slashcode thought it would be a good idea to eliminate accents rather than convert them to HTML entities.
Try it yourself: put a scharfes s into a Slashdot comment and see what happens.
I notice that you DIDN'T complain about the missing accent on the French e, or the missing slash through the Swedish o.
Now, as a speaker of German for 10 years, I'm going to leave it at that.
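For the record, converting a character such as ß to an HTML entity instead of stripping it is short work in most languages. A minimal Python sketch, assuming plain text input: `html.escape` handles `&`, `<` and `>`, and encoding to ASCII with the standard `xmlcharrefreplace` error handler turns each remaining non-ASCII character into a numeric character reference. The helper name `htmlize` is hypothetical, and this is not how Slashcode itself processes comments.

```python
import html

def htmlize(text: str) -> str:
    """Escape HTML metacharacters, then turn non-ASCII chars into &#NNN; references."""
    escaped = html.escape(text)
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")

print(htmlize("Straße"))   # Stra&#223;e
print(htmlize("Zürich"))   # Z&#252;rich
```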
Transliteration (Score:3, Informative)
Good question. Basically, people don't think to transliterate the letters properly, or are too lazy to.
Some places have the forethought to register both:
Munich in Germany has registered both "munchen.de" and "muenchen.de".
(But it's really a u with an umlaut)
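The two spellings Munich registered correspond to two common ways of flattening "münchen" to ASCII: the German convention of writing ü as ue, and simply dropping the diacritic. A minimal Python sketch of both follows; the helper names and the small German mapping table are assumptions for illustration, and real transliteration rules vary by language.

```python
import unicodedata

# German convention: umlauts gain a following 'e', sharp s becomes "ss".
GERMAN = str.maketrans({"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss",
                        "Ä": "Ae", "Ö": "Oe", "Ü": "Ue"})

def german_translit(name: str) -> str:
    return name.translate(GERMAN)

def strip_accents(name: str) -> str:
    """Decompose (NFD), then drop combining marks: 'ü' -> 'u' + U+0308 -> 'u'."""
    decomposed = unicodedata.normalize("NFD", name)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(german_translit("münchen") + ".de")  # muenchen.de
print(strip_accents("münchen") + ".de")    # munchen.de
```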
Re:Example (Score:2, Informative)
Thanks for the example. Let's do a few quick tests.
The encoded version always works, and leads to a page where you have an unencoded link (normal spelling with the accents).
I copied the unencoded version and tried:
On WinXP:
- Mozilla 1.4 : OK
- MSIE 6, Opera 6.2 : NO
On Linux - Red Hat 6.2 (of course, that's a pretty old system):
- lynx, ping, host, dig,
(cannot test Mozilla, since this server has no GUI.)
Well, I guess we'll have to live with that horrible Punycode.
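Clients that handled the unencoded link, such as Mozilla in the test above, do the conversion themselves: the hostname is converted to its ASCII ("xn--") form before the lookup, and can be converted back for display. Python's built-in `idna` codec implements the 2003 IDNA rules (RFC 3490, including nameprep lowercasing), so it can sketch that round trip without any network access; the helper names are hypothetical.

```python
def to_ascii(hostname: str) -> str:
    """Unicode hostname -> ASCII-compatible form that old resolvers accept."""
    return hostname.encode("idna").decode("ascii")

def to_unicode(hostname: str) -> str:
    """ASCII-compatible form -> Unicode hostname for display."""
    return hostname.encode("ascii").decode("idna")

ace = to_ascii("Bücher.ch")
print(ace)              # xn--bcher-kva.ch
print(to_unicode(ace))  # bücher.ch
```

Tools like ping, host and dig that skip this step hand the raw bytes to the resolver, which is consistent with the failures listed above.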