The Internet

ICANN Mulling Multilingual URLs 213

griffjon writes "The Washington Post is reporting that ICANN is testing out fully multilingual domain names. These won't just be [non-western-language].com, but would have TLDs translated into other scripts, fixing annoyances for non-English speaking audiences. An example: 'Speakers of Hebrew, Arabic and any other language written from right to left must type half of the URL in one direction and the other half — the .com, .net or .org postscript — the opposite way.' Let's hope it goes better this time around: 'Next week's experiments use the domain name "example.test" translated into 11 languages. A previous model, however, used "hippopotamus" instead of "test." These plans went awry when an Israeli registrar realized the Hebrew word ICANN thought meant "hippopotamus" was an expletive and threatened to involve the Israeli government.'"
This discussion has been archived. No new comments can be posted.

  • by It doesn't come easy ( 695416 ) * on Thursday October 11, 2007 @03:13PM (#20943977) Journal
    Well hippopotamus me, what will they think of next?
    • by Rob T Firefly ( 844560 ) on Thursday October 11, 2007 @03:17PM (#20944055) Homepage Journal
      Meh, they can all go example themselves.
    • Well, you can find out how to talk about hippos here [hippos.com].

      I really wonder what they mistook it for....
    • by sconeu ( 64226 )
      But were they Hungry Hungry Hippos?
    • "I'm glad to say I was there to see the day the US government sold out the Internet in Berlin" - Don Telage.

      "The response was basically, 'I'm too busy. Go learn English.'"

      That's about right. Back in the day ICANN was concentrating on trademark issues, and the reasons it got to exist in the first place (new TLDs, international domain names) were back-benched. It's not like we didn't have laws against trademark infringement, but the trademark lobby wanted greater rights in cyberspace than it has in the real world,
      • Re:Some actual facts (Score:4, Interesting)

        by jc42 ( 318812 ) on Friday October 12, 2007 @02:35PM (#20957961) Homepage Journal
        Before they rush on with alphabets that read right to left and use alternative character sets they really should try English words with greater than 8 bit characters. Are they gonna actually work?

        Well, lately I've been testing a lot of my old code in various UTF-8 environments, and I've been duly impressed by the fact that, as Ken intended, almost all the code "just works" with Arabic, Chinese, Japanese, etc.

        It turns out that there's a simple explanation. If the code doesn't examine chars with bit 8 turned on, but just treats them as unexamined "data" (or letters if the code is trying to distinguish that way), then everything works right. The only time the code needs to actually look at non-ASCII characters' values is when the text is being rendered in physical form. And hardly any code ever actually does that. Almost all my code reads data from files and writes data to other files, but never does anything with the physical representation of the data. It passes the data to other programs for that.

        A case in point: I was recently working on some multi-language HTML files, and I decided to try a fun test with CSS: I defined a whole lot of classes whose names were in Chinese. This made sense, since these classes were being used for pieces of the text that contained mostly Chinese characters, not counting things like spaces and punctuation. I tested the CSS using more than a dozen browsers that I have installed on my Linux and OSX test machines. I was unable to find a single case where it didn't work. I even hunted down some Windows boxes and tested the files on IE6 and IE7; they worked fine (despite the well-known CSS incompatibilities in IE ;-). I also tried a few CSS class names with Arabic and Hebrew names, and they worked fine, too.

        Now, I don't think for a second that the writers of all those browsers spent time making sure that their code could handle UTF-8-encoded Chinese identifiers in CSS. I suspect that most of them never even considered the possibility. I'd bet that the code just takes anything that's not a significant character in CSS syntax, and tacitly treats it as a "letter". This is all it takes to make UTF-8 work correctly in this case.

        I did mention this in a couple of browsers' newsgroups. The responses were basically of the form "Well, of course it works. Why wouldn't it? You don't need special code to handle charset=UTF-8, except for the rendering. You'd have to be a fairly incompetent programmer to write code that doesn't work correctly with UTF-8. Except for rendering."

        I can hear people saying "but those browsers all need to render the text." Yeah, but the CSS routines don't render text. They parse the CSS input, and fill in fields in data structures that tell the rendering code how to position and color the text. But the charset-handling code is probably not called anywhere in the CSS modules; it's only called in the few places that actually need to color pixels on the screen.

        Lots of people have suggested declaring UTF-8 to be the only encoding for URLs. If this is done, there's probably very little URL-handling code anywhere that needs to be changed; it'll mostly "just work", because char codes 0x80 to 0xFF are treated as "letters". The only question is whether the final step of rendering the text's pixels will produce the right glyph, and the URL-handling code doesn't care about that.

        I happen to have a DNS server handy. Maybe I'll try a little test: In one of the domains, I'll add hostnames in Russian, Chinese, Arabic, and maybe a few other non-Roman alphabets. I'll wait a while, and see if I can access the machines via those names from a few other machines. I'll predict that it'll also "just work".
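The "bytes 0x80 to 0xFF are just letters" idea in the comment above can be sketched in a few lines. This is only an illustration under stated assumptions: `is_ident_byte`, `scan_identifiers`, and the sample stylesheet are hypothetical names made up for this sketch, and no real browser's CSS parser is claimed to work this way.

```python
# Hypothetical byte-level scanner; the names and rules are illustrative only.
ASCII_IDENT = set(b"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_-")

def is_ident_byte(b):
    # Every byte of a multi-byte UTF-8 sequence has the high bit set
    # (0x80-0xFF), so treating those bytes as opaque "letters" keeps
    # each non-ASCII character intact without decoding it.
    return b in ASCII_IDENT or b >= 0x80

def scan_identifiers(data):
    """Return the runs of identifier bytes found in data, in order."""
    idents, start = [], None
    for i, b in enumerate(data):
        if is_ident_byte(b):
            if start is None:
                start = i
        elif start is not None:
            idents.append(data[start:i])
            start = None
    if start is not None:
        idents.append(data[start:])
    return idents

# Assumed sample input: class names in Chinese and Russian.
css = ".标题 { color: red } .заголовок { color: blue }".encode("utf-8")
names = [tok.decode("utf-8") for tok in scan_identifiers(css)]
print(names)
```

The scanner never inspects what a high byte means. It only cuts tokens at ASCII punctuation and whitespace, which is why each token decodes cleanly afterwards.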

  • Domain name != URL (Score:5, Informative)

    by Anonymous Coward on Thursday October 11, 2007 @03:19PM (#20944091)

    A URL is an entire address, including the protocol, local path and fragment identifier. This is a URL:

    http://slashdot.org/foo?bar=baz#qux

    A domain name does not include the protocol, the local path or the fragment identifier. This is a domain name:

    slashdot.org

    This is talking about domain names, not URLs. If anybody were to talk about multilingual URLs, it would be the IETF, not ICANN, and they already have: they are called IRIs [ietf.org].
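The distinction drawn above can be checked with Python's standard `urllib.parse` module. This is a small sketch using the example URL from the comment; the variable name `parts` is arbitrary.

```python
from urllib.parse import urlsplit

# Split the example URL into its components.
parts = urlsplit("http://slashdot.org/foo?bar=baz#qux")

print(parts.scheme)    # http          (the protocol)
print(parts.hostname)  # slashdot.org  (the domain name)
print(parts.path)      # /foo          (the local path)
print(parts.query)     # bar=baz
print(parts.fragment)  # qux           (the fragment identifier)
```

Only the `hostname` component is a domain name; everything else belongs to the URL.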

    • The domain name is part of the URL. Therefore, multilingual domains will result in multilingual URLs. </pedant>
      • But if the domain part of the URL was multilingual and the rest of the URL wasn't, it wouldn't be a "fully multilingual" URL, would it?
    • Re: (Score:3, Interesting)

      by CastrTroy ( 595695 )
      Sounds like something that the Canadian government would embrace. There are rules for government websites that the URL must be bilingual, so the directory path and file names must be mirrored to create the same structure in both French and English. The loophole in the rules is that you don't have to provide multiple directories and folders where the name isn't linguistic, such as calling your file 1243.html, or ESADOFE.html. So you can either mirror your directory structure in French and English, or have a
      • Re: (Score:3, Funny)

        by Phisbut ( 761268 )

        Sounds like something that the Canadian government would embrace. There are rules for government websites that the URL must be bilingual, so the directory path and file names must be mirrored to create the same structure in both French and English. The loophole in the rules is that you don't have to provide multiple directories and folders where the name isn't linguistic, such as calling your file 1243.html, or ESADOFE.html.

        Ah, but that's where you're wrong my friend. Like it or not, "1234.html" can be expa

  • Seriously (Score:3, Interesting)

    by El Lobo ( 994537 ) on Thursday October 11, 2007 @03:22PM (#20944171)
    Seriously, multilingual domain names are a pain (for all of humanity). Visiting Japan last year, I saw a lot of servers with names in simplified Japanese. As a foreigner, I hadn't the slightest idea what a site was about (without clicking on it). Clicking on it didn't help either. Yes, a lot of Japanese have the same problem with English domain names, but adding multilanguage names adds more complexity to the whole thing. I would like to see the face of a Chinese guy trying to decipher some URL using Ukrainian characters... or... trying to write it on his Japanese keyboard...
    • Re: (Score:3, Interesting)

      Speaking of Asian (written) languages, don't a lot of them read top to bottom?

      How to accommodate those?
      • Re: (Score:2, Funny)

        by gregoryb ( 306233 )
        Speaking of Asian (written) languages, don't a lot of them read top to bottom?

        How to accommodate those?


        Rotate your screen 90 degrees...
      • ...but is it still top down in Australia?
      • That's for when we get multiline domain names, in 2020
        For now, one line works fine for everyone.
      • by griffjon ( 14945 )
        You carefully place your monitor on its side. C'mon man, at least try to keep up!
      • Get a browser with a vertical input column?

        Yes, it's flippant, but nothing compared to my solution for the article's dilemma:

        ATTENTION ARAB- AND HEBREW- SPEAKING PEOPLES. We have fixed the internets. Please use the following protocols in all communications:

        The ".net" domain is now "ten." ; ".com" is "moc." and so on.

        The proper procedure for forming a URL is: subpage, top-level-domain, domain, subdomain, then "//:ptth".

        PLEASE USE THIS PROTOCOL AND ONLY THIS PROTOCOL IN YOUR FUTURE USE OF ALL OF THE INTE
    • Comment removed based on user account deletion
      • by jrumney ( 197329 )

        Why would a Chinese guy be using a Japanese keyboard?

        To type a Ukrainian URL with, of course. Hopefully the webpage behind the URL will be in Gujarati so his friend in California can get one of his co-workers to translate it for him.

      • by SL Baur ( 19540 )
        With great difficulty. Many of the same characters are used, but they sound different and mean different things now. I suspect there are similar difficulties between the Taiwanese and Red Chinese since they now use different characters for writing as well.

        If by "Japanese keyboard" you mean a keyboard with Japanese characters on the keycaps... There are two types, there's the huge monster kanji typewriter with something like a thousand keys that I don't think anyone knows how to use and kana keyboards which
    • by tknd ( 979052 )

      It's a hard problem indeed, but you have to consider the foreigner's view. What they're essentially forced to do now is learn a second way of writing things in their own language and it is pretty annoying.

      For example I went the other direction and learned some Japanese. I can read hiragana, katakana, and a few kanjis. The hiragana and katakana are equivalent writing systems but for different purposes. Katakana is usually reserved for foreign words or emphasis (sorta like how people sometimes use all capit

    • by Ilgaz ( 86384 ) *

      Seriously, multilingual domain names are a pain (for all of humanity). Visiting Japan last year, I saw a lot of servers with names in simplified Japanese. As a foreigner, I hadn't the slightest idea what a site was about (without clicking on it). Clicking on it didn't help either. Yes, a lot of Japanese have the same problem with English domain names, but adding multilanguage names adds more complexity to the whole thing. I would like to see the face of a Chinese guy trying to decipher some URL using Ukrainian characters... or... trying to write it on his Japanese keyboard...

      English domain names will stay forever, there are way too many references to them.

      The international domain name will be purchased in addition. I would buy 3 domains right now if they finally decide what to do.

      Japanese and Chinese would be hard to understand. Just imagine your nick (El Lobo) is a word in some local language and, without special chars, it reads as "Lobotimised Moron" in that language. Sounds extreme? There are WORSE situations than that, which I'd better not describe.

      These people pay the same price for domain

  • I'd love to know what Hebrew word for hippo is an expletive. All my life I've only ever heard "hipopotam" in Hebrew for hippo, not a very dirty word. In any case, Hebrew URLs have been the norm at the Hebrew Wikipedia for as long as I've been using it. Hebrew domain names, on the other hand, would be interesting (even though I'm sure this is what the poster meant).
    • At least they didn't use waterbuffalo.com [wikipedia.org]
      • That's funny.

        All this time, I thought a Water Buffalo was a horrible 1970's-vintage Suzuki motorcycle.
        • Re: (Score:2, Funny)

          by blinx_ ( 16376 )
          It was a 750cc water cooled 2 stroke triple, that is sweet vintage sex on wheels, not horrible :)
      • Actually, I think that's the issue. There's apparently a (mis)association of the Hebrew "behemot" with hippopotamus (mentioned here [wikipedia.org], coming from the Oxford Companion to the Bible), and "behemot" can be considered the plural of "behema" (which can be translated as "water buffalo").
        • Oh, and I was thinking it was something like, "I seem to be having tremendous difficulty with my lifestyle," a phrase that is known to have started at least two wars.
        • Re: (Score:3, Interesting)

          by zunger ( 17731 )
          Behemot is the plural of behema; the word literally means (roughly) "large, mindless quadruped." In the plural it's often used as an equivalent to "livestock," and in Biblical Hebrew it was used as the (only) word for hippopotamus. In more modern Hebrew, the borrowed word "hipopotam" is used for hippo, and "behema" has a slightly more literary feel to it -- except when it's used to refer to a person, which is probably its most common use today. And not polite. :)
    • I'm curious too, but, just a guess, I wonder if the word in question is "behema", which is actually "water buffalo", but is used to refer to loud, rowdy, uncouth people?

      • Yep, I think you got it. After much scouring, I managed to come up with the (mis)association of "behemot" with hippopotamus (mentioned here [wikipedia.org], coming from the Oxford Companion to the Bible), which seems to be the issue.

        Though, to take offense at this seems to be crazy.
    • I share your curiosity. Sounds like someone, somewhere, had a hippo-potty-mouth.

    • Re: (Score:3, Funny)

      by Red Flayer ( 890720 )
      As we all know, hippopotamos means river whores.

      Or at least, that's what I recall from 4th grade biology class.
  • by Anonymous Coward on Thursday October 11, 2007 @03:29PM (#20944279)
    xc.estaog//:ptth
  • These plans went awry when an Israeli registrar realized the Hebrew word ICANN thought meant "hippopotamus" was an expletive and threatened to involve the Israeli government.

    I wonder what test translates to... I hope they hired a translator who doesn't like practical jokes.

  • by dotpavan ( 829804 ) on Thursday October 11, 2007 @03:40PM (#20944459) Homepage
    http://org.slashdot/ [org.slashdot] or is it org.dotslash://http or org.dotslashcolon://http or.... ah, hippo it!
    • Actually, I read some article about it. The creators of the 'Net made a mistake with domain names, but by the time they realised it, it was too late. Logically, URLs should have been ordered top to bottom: Protocol://TLD.domain name/rest of URI.
      I cannot find where I've seen it...
      • Re: (Score:2, Funny)

        by tighr ( 793277 )

        Actually, I read some article about it. The creators of the 'Net made a mistake with domain names, but by the time they realised it, it was too late. Logically, URLs should have been ordered top to bottom: Protocol://TLD.domain name/rest of URI.
        I cannot find where I've seen it...
        So does that mean that in a few years after this change, we'll have the com-dot boom? Will we be living in the age of com-dot? That doesn't even roll off the tongue...
    • Actually, I think it will use the new ptth:// protocol....err, maybe the //:ptth, or \\:ptth? Hmmmm.....
    • In the British JANET [wikipedia.org], machine names looked like UK.AC.HATFIELD.STAR .
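The "top to bottom" ordering discussed in this sub-thread is just the DNS labels in reverse. As a rough sketch, assuming a hypothetical helper named `reverse_labels` and the lowercase DNS-order form of the JANET example above:

```python
def reverse_labels(name):
    """Reverse the dot-separated labels of a host name."""
    return ".".join(reversed(name.split(".")))

# DNS order (most specific first) converted to big-endian, JANET-style order.
janet_style = reverse_labels("star.hatfield.ac.uk").upper()
print(janet_style)  # UK.AC.HATFIELD.STAR
```

Applying the function twice returns the original name, so the two orderings carry exactly the same information.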
  • Mulling != Testing.
  • All I have to say is about time. Yes, I'm a native English speaker, and yes I see some technical problems with this, but I'm also fairly cosmopolitan (not the magazine) and do think that multi-lingual domains are the way to go.

    One request I would have of ICANN is to limit the use of accented characters to help prevent phishing scams.
  • hmmmm.... How much would, xxx.madamiamadam.xxx, sell for?
  • I don't think the computing world is ready for this yet, and it may never be a good idea.

    Internationalization in software and operating systems is in a horrible state of excess complexity right now. When everything top to bottom runs unicode UTF8 as its default mode, then MAYBE.

    But even then, there is a single language for Aviation communications (happens to be English) but that is done so that there is some hope that everyone will know what everyone is talking about, because everyone can learn the aviation sub
  • This is the worst idea ever invented. There are a lot of languages like Russian where some letters look the same as ordinary English letters. So if they manage to roll out multilingual domains, it means they will roll out a lot of possibilities to spoof domains. For example, Russian 'R' looks like English 'P'. So suddenly any jerk could register a well-known domain, substituting one letter for another, and capture a lot of passwords...
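The look-alike problem described above is easy to demonstrate. What follows is a rough sketch, not a real anti-spoofing check: `scripts_in_label` and `looks_mixed` are hypothetical helpers, and guessing a character's script from the first word of its Unicode name is a shortcut assumed here for illustration only.

```python
import unicodedata

def scripts_in_label(label):
    """Guess each letter's script from the first word of its Unicode name."""
    scripts = set()
    for ch in label:
        if ch.isalpha():
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts

def looks_mixed(label):
    # A label that draws its letters from more than one script is suspicious.
    return len(scripts_in_label(label)) > 1

genuine = "example"
# U+0440 is CYRILLIC SMALL LETTER ER, which looks like Latin "p".
spoofed = "exam\u0440le"

print(genuine == spoofed)     # False, although the two usually render alike
print(looks_mixed(genuine))   # False
print(looks_mixed(spoofed))   # True
```

A check like this only catches labels that mix scripts. A label written entirely in look-alike letters from a single script would pass it.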
  • Did they solve the problem where the same or indistinguishably similar character appears multiple times in the character set?
  • URLs weren't meant to be exposed. They weren't meant to be branded. It was a mistake for the first browsers to have an address bar displaying the URL, once they left alpha-testing. Most people navigate by selecting bookmarks or clicking links to other sites. Seldom do people type in a URL, and if they need to, a pop-up dialog, which went away after taking input, would have been enough. There never was a need to show all the URLs for a page's elements flying by in the status bar, or to show the site'
    • by Rakishi ( 759894 )
      And when your bookmarks got deleted what would you do? Well? Remember, no search engines and all you can access is some default web page (likely one you never used before).
  • There was an article here on Slashdot last week on multilingualism, and all the Anglo slashdotters were talking about some wonderful "universal" language (which of course, meant English because none of them ever bothered to even try to learn anything else), and then they get seriously confused when they go to another country and discover that, shock, gasp, the locals not only speak another language, but they also use another script entirely, and this, especially in countries with big populations where there

