
 



The Internet

Internationalized Domain Names Coming Soon 526

rduke15 writes "You think you know how to parse a domain name for validity? Well, in case you haven't noticed, things are getting tougher as registrars keep adopting IDN (Internationalized Domain Names), which uses a weird encoding named Punycode to enable accented characters in domain names. The Register reports on Switzerland, Germany and Austria's joint move to enable IDN. See the overview in English from Switch. But I guess it would be difficult to talk about this on /., since it does not even support basic Latin-1 ... :-)"
This discussion has been archived. No new comments can be posted.

  • by ospirata ( 565063 ) on Tuesday November 25, 2003 @03:13PM (#7560746)
    I'm delighted to report that Mozilla is one step ahead again: it has supported IDN since version 0.9.5. http://www.mozilla.org/projects/intl/idn_mozilla.html [mozilla.org]
  • Mixed feelings (Score:5, Informative)

    by f97tosc ( 578893 ) on Tuesday November 25, 2003 @03:14PM (#7560761)
    I have mixed feelings about this. I am from Sweden, and it always looks kind of ugly when names lose their dots and circles in the domain name.

    On the other hand, this is also quite convenient. I live in the US now, and I travel around quite a bit. I often surf on Swedish Internet sites, typically without access to a Swedish keyboard. It would not be very convenient if the domain names used non-English symbols.

    Sometimes I go to Japanese sites also, and I am really glad that I don't have to install a Japanese word processor to do this...

    Tor
  • by Speare ( 84249 ) on Tuesday November 25, 2003 @03:20PM (#7560838) Homepage Journal

    Punycode *is* a Unicode encoding.

    Unicode has many encodings; UTF-8 is one encoding and Punycode is another. UTF-8 aims for efficiency when the majority of the text is ASCII, and Punycode aims for completeness when you must fit in 63 characters (the DNS label limit) and use only the ASCII characters to do it.
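
A minimal sketch of that point in Python, using the standard library's built-in `utf-8` and `punycode` codecs. The sample label "bücher" is an assumption chosen for illustration; it does not come from the comment above.

```python
# The same code points under two Unicode encodings.
# "bücher" is an assumed sample label, used only for illustration.
label = "b\u00fccher"  # "bücher"

utf8_bytes = label.encode("utf-8")         # general-purpose encoding
punycode_bytes = label.encode("punycode")  # ASCII-only encoding used by IDN

# UTF-8 needs bytes outside the ASCII range for the u-umlaut...
utf8_is_ascii = all(b < 0x80 for b in utf8_bytes)
# ...while Punycode stays within ASCII letters, digits and hyphens.
punycode_is_ascii = all(b < 0x80 for b in punycode_bytes)

# Both encodings are lossless: decoding gives back the original string.
roundtrip_ok = (
    utf8_bytes.decode("utf-8") == label
    and punycode_bytes.decode("punycode") == label
)

print(utf8_bytes, punycode_bytes, utf8_is_ascii, punycode_is_ascii, roundtrip_ok)
```

Note that the bare `punycode` codec produces only the encoded label; the `xn--` prefix seen in IDN hostnames is added by the IDNA layer on top of it.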

  • by Moebius Loop ( 135536 ) on Tuesday November 25, 2003 @03:20PM (#7560845) Homepage
    It looks as if the goal is to implement this without breaking existing implementations. I did RTFA, although I might be missing something, but it seems that the translation is done by the client/local nameserver.

    I would imagine it probably attempts to query with the Unicode name first, and upon failure tries the munged address. Since both versions are in the whois db, this would be naturally phased out as DNS servers become Unicode compliant.

    However, it means that any accent-containing domain would actually have two entries. I wonder, would you have to actually register twice (i.e., pay twice)?

    One good thing is that the conversion does seem to produce sufficiently undesirable names, so I don't think there would be much overlap between existing domains and the converted form of new accent-containing domains...
  • by RobertB-DC ( 622190 ) * on Tuesday November 25, 2003 @03:22PM (#7560859) Homepage Journal
    Now I won't have to be limited to using a hyphen! I can register d[i-circ]xiechicks.com, or dixi[e-grave]chicks.com, or maybe dixie[c-cedil]hicks.com!

    That last one would be doubly good, because if I understand the Punycode [faqs.org] spec correctly, it'll get translated to ASCII as dixiehicks-XXXX.com. Not my opinion of the group, but maybe it would attract hits from the Toby Keith crowd.
  • Re:Why not (Score:2, Informative)

    by lokedhs ( 672255 ) on Tuesday November 25, 2003 @03:22PM (#7560861)
    Right... What's the ASCII code for the Euro sign? Or even an accented "a"? How about the Russian Gze?

    Hint: ascii is 7-bit.

  • by Anonymous Coward on Tuesday November 25, 2003 @03:23PM (#7560872)
    don't understand a lick of french? 'taco is a mean man'
  • No change needed... (Score:5, Informative)

    by JohnGrahamCumming ( 684871 ) * <slashdot@jgc.oERDOSrg minus math_god> on Tuesday November 25, 2003 @03:24PM (#7560895) Homepage Journal
    > You think you know how to parse a domain name for validity?

    Yes, I do, and if you _read_ the RFC you'll see that nothing changes: these domain names are encoded into the same character set that the current DNS system uses. Hence, if you give me a URL, I can validate it with existing scripts. There's an example showing that Bucher.ch (with an umlaut on the u) would be translated to xn--bcher-kva.ch, which looks totally parseable to me.

    John.
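
A rough Python sketch of that claim. The `is_valid_hostname` function is a hypothetical stand-in for an "existing script" (a plain letters-digits-hyphens check), and `to_ace` is a hypothetical helper built on Python's built-in `idna` codec, which implements the older IDNA 2003 rules. Neither name comes from the RFC or the article.

```python
import re

# One DNS label: letters, digits and hyphens, 1-63 characters,
# with no leading or trailing hyphen.
LABEL_RE = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")


def is_valid_hostname(name: str) -> bool:
    """Hypothetical 'existing script': a plain ASCII hostname check."""
    if not name.isascii() or len(name) > 253:
        return False
    return all(LABEL_RE.fullmatch(label) for label in name.split("."))


def to_ace(name: str) -> str:
    """Convert a Unicode hostname to its ASCII-compatible form.

    Uses Python's built-in 'idna' codec (IDNA 2003 rules).
    """
    return name.encode("idna").decode("ascii")


unicode_name = "b\u00fccher.ch"  # "bücher.ch", the example from the comment
ace = to_ace(unicode_name)

print(ace, is_valid_hostname(ace), is_valid_hostname(unicode_name))
```

The encoded form passes the unchanged ASCII check, while the raw Unicode form does not, so a validator only keeps working if the name is converted before it is checked.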
  • Re:FINALLY! (Score:1, Informative)

    by Anonymous Coward on Tuesday November 25, 2003 @03:38PM (#7561063)
    Elsewhere in the world, the Arabic numeral system (0123456789) had zero, and before that, so did ancient India.

    I don't think the Mayans even used a base-ten system like the rest of the world, so attributing zero to them seems odd to me.
  • by Rob Riggs ( 6418 ) on Tuesday November 25, 2003 @03:53PM (#7561209) Homepage Journal
    wouldn't UTF-8 have worked just as well?

    No. The problem that punycode solves is that the encoded DNS names are themselves valid RFC1034 DNS names. That is, even when encoded, standard DNS validity checkers will accept the name.

    UTF-8 does not have this property

  • by pawal ( 6862 ) on Tuesday November 25, 2003 @03:59PM (#7561281)
    Nothing in the DNS infrastructure needs to be upgraded. There is only us-ascii in the zones. BUT, you have to upgrade your applications in order to display the names the way they are supposed to read, otherwise you will end up with www.xn--rksmrgs-5wao1o.se instead of "www.raksmorgas.se" (with its accents).
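
What that application upgrade amounts to can be sketched in a few lines of Python. The helper name `to_display` is hypothetical, and the sketch assumes Python's built-in `idna` codec (IDNA 2003) is an acceptable stand-in for whatever a browser really does.

```python
def to_display(hostname: str) -> str:
    """Turn an ASCII hostname from DNS into its human-readable Unicode form.

    Labels that do not start with 'xn--' pass through unchanged.
    """
    return hostname.encode("ascii").decode("idna")


raw = "www.xn--rksmrgs-5wao1o.se"  # what the zone file actually contains
pretty = to_display(raw)           # what an IDN-aware application shows

print(raw, "->", pretty)
```

An application without this step still resolves the name correctly; it simply shows the user the `xn--` form.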
  • by Stephen Samuel ( 106962 ) <samuel@bcgre e n . com> on Tuesday November 25, 2003 @04:07PM (#7561385) Homepage Journal
    Why not extend DNS to support Unicode? That way there'd be no translation or other crap to go through.

    Sounds like a great idea.... If you're willing to re-implement the DNS code in my Win-95 box.... or on my Amiga-4000. How about my 10 year old Apollo workstation or the SUN-3 that's still working just fine, thank you. etc. etc.

    A lot of old DNS implementations would choke (and properly so) on UTF-8 encoded DNS names. We probably could have seeded the needs of the future by saying that IPv6 DNS servers should support Unicode, but I think that even that boat has been missed (or is quickly leaving dock).

    In the meantime the old DNS and its anglo-centric presumptions and restrictions are with us for the next few years (or decades, as the case may be). Clearly some people feel the need to live within those restrictions.

  • by Cardbox ( 165383 ) on Tuesday November 25, 2003 @04:17PM (#7561507) Homepage
    There's no need to put accents on things, you can spoof just as well without. For example: the Greek omicron, Russian lowercase o, and Latin lowercase o all look identical... but they are all different Unicode characters!
    Unless the registries all implement some sort of canonicalization, owners of domain names containing the letter "o" are going to have a combinatorial explosion!
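
A small Python sketch of that explosion. The label "foo" and the helper `spoof_variants` are made up for illustration, and only the three look-alike letters named above are considered.

```python
import unicodedata
from itertools import product

# Three visually similar letters that are distinct Unicode code points.
LOOKALIKES = ["o", "\u03bf", "\u043e"]  # Latin o, Greek omicron, Cyrillic o


def spoof_variants(label: str) -> list[str]:
    """Return every spelling of `label` with each 'o' swapped for a look-alike."""
    choices = [LOOKALIKES if ch == "o" else [ch] for ch in label]
    return ["".join(combo) for combo in product(*choices)]


names = [unicodedata.name(ch) for ch in LOOKALIKES]
variants = spoof_variants("foo")  # 3 ** 2 = 9 spellings that look the same

# Each spelling maps to a different ASCII form, so each would be a
# separately registrable name (built-in 'idna' codec, IDNA 2003 rules).
ace_forms = {v.encode("idna").decode("ascii") for v in variants}

print(names)
print(len(variants), sorted(ace_forms))
```

A label with n occurrences of "o" has 3**n such spellings under this assumption, which is the combinatorial growth the comment warns about.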
  • by x mani x ( 21412 ) <mghase.cs@mcgill@ca> on Tuesday November 25, 2003 @04:49PM (#7561823) Homepage
    They did obviously consider unicode, perhaps you did not RTFA. However their solution uses unicode at a different layer.

    I think the *real* solution here is to reimplement ALL top level DNS servers to support Unicode. But doing this, when you really think about it, seems difficult (ICANN approval, Unicode-related bugs, getting everyone to use a new DNS server, etc). At least, since the ASCII characters supported by DNS are exactly the same in Unicode, backwards compatibility should not be a problem.

    This solution is a workaround that uses Unicode at the client level, encodes it to Punycode (which only contains characters supported by DNS, unlike, say, BASE-64 or Quoted-Printable), and sends the request to the DNS server. It is a quick and easy solution to a messy problem. But its hacky-ness makes me doubt it will be supported by whatever governing body influences this stuff (IETF, ICANN, etc).

    -Mani
  • by Tazzy531 ( 456079 ) on Tuesday November 25, 2003 @04:58PM (#7561924) Homepage
    There are technical reasons for disallowing certain characters. They are "reserved characters" in URLs.
    • The ? signifies the end of the path and the beginning of the parameters.
    • The & delimits the parameters.
    • The % is used for escapes [i.e. %20 is a space in URL parameters].
    • The = is the assignment operator in URL parameters.
    • The # introduces link anchors.


    There are a couple others, but I don't remember them offhand... So in other words, these characters are unusable for a reason.
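
The roles listed above can be seen with Python's standard `urllib.parse` module. The URL below is a made-up example, not one from the discussion.

```python
from urllib.parse import parse_qs, urlsplit

# A made-up URL that uses each reserved character from the list above.
url = "http://www.example.com/search?q=idn%20test&lang=sv#results"

parts = urlsplit(url)
# '?' ends the path and starts the query; '#' starts the anchor (fragment).
host, path, query, anchor = parts.netloc, parts.path, parts.query, parts.fragment

# '&' separates parameters, '=' assigns values, '%20' is an escaped space.
params = parse_qs(query)

print(host, path, anchor)
print(params)
```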
  • Re:You RTFA (Score:3, Informative)

    by Krach42 ( 227798 ) on Tuesday November 25, 2003 @06:06PM (#7562682) Homepage Journal
    Actually, I'm aware of that, but Slashdot seems to have stripped out the accents from my stuff...

    I am aware that the German scharfes S is not a capital B. I had it correctly in my submission, but someone who was working on the slashcode thought it would be a good idea to eliminate accents, rather than to possibly HTMLize them.

    Try it yourself: put a scharfes S into a Slashdot comment, and see what happens.

    I notice that you DIDN'T complain about the missing accent on the French e, or the missing slash through the Swedish o.

    Now, as a speaker of German for 10 years, I'm going to leave it at that.
  • Transliteration (Score:3, Informative)

    by k98sven ( 324383 ) on Tuesday November 25, 2003 @08:03PM (#7563832) Journal
    Why monsteras instead of moensteraas?

    Good question. Basically, people don't think, or are too lazy, to transliterate the letters properly.

    Some places have the forethought to register both:
    Munich in Germany has registered both "munchen.de" and "muenchen.de".
    (But it's really a u with an umlaut)
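
The difference between the two spellings can be sketched in Python. The mapping table covers only the letters mentioned in this comment (ä, ö, ü, å) and is an assumption for illustration, not a complete transliteration standard; the function names are hypothetical.

```python
import unicodedata

# Conventional ASCII spellings for the letters discussed above (assumed table).
TRANSLIT = str.maketrans({"ä": "ae", "ö": "oe", "ü": "ue", "å": "aa"})


def transliterate(name: str) -> str:
    """Spell accented letters out, e.g. u-umlaut becomes 'ue'."""
    return name.lower().translate(TRANSLIT)


def strip_accents(name: str) -> str:
    """The 'lazy' variant: decompose each letter and drop the accent marks."""
    decomposed = unicodedata.normalize("NFD", name.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


for city in ["m\u00fcnchen", "m\u00f6nster\u00e5s"]:  # münchen, mönsterås
    print(city, "->", transliterate(city), "or", strip_accents(city))
```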

  • Re:Example (Score:2, Informative)

    by rduke15 ( 721841 ) <rduke15@gm[ ].com ['ail' in gap]> on Tuesday November 25, 2003 @08:08PM (#7563897)
    http://www.xn--rksmrgs-5wao1o.se/ will work if you are using a recent Mozilla

    Thanks for the example. Let's do a few quick tests.

    The encoded version always works, and leads to a page where you have an unencoded link (normal spelling with the accents).

    Copied the unencoded version, and tried:

    On WinXP:

    - Mozilla 1.4 : OK
    - MSIE 6, Opera 6.2 : NO

    On Linux - Red Hat 6.2 (of course, that's a pretty old system):

    - lynx, ping, host, dig, ... : NO
    (cannot test Mozilla, since this server has no GUI.)

    Well, I guess we'll have to live with that horrible Punycode.
