ICANN Under Pressure Over Non-Latin Characters
RidcullyTheBrown writes "A story from the Sydney Morning Herald is reporting that ICANN is under pressure to introduce non-Latin characters into DNS names sooner rather than later. The effort is being spearheaded by nations in the Middle East and Asia. Currently there are only 37 characters usable in DNS entries, out of an estimated 50,000 that would be usable if ICANN changed naming restrictions. Given that some bind implementations still barf on an underscore, is this really premature?" From the article: "Plans to fast-track the introduction of non-English characters in website domain names could 'break the whole internet', warns ICANN chief executive Paul Twomey ... Twomey refuses to rush the process, and is currently conducting 'laboratory testing' to ensure that nothing can go wrong. 'The internet is like a fifteen story building, and with international domain names what we're trying to do is change the bricks in the basement,' he said. 'If we change the bricks there's all these layers of code above the DNS ... we have to make sure that if we change the system, the rest is all going to work.'" Given that some societies have used non-Latin characters for thousands of years, is this a bit late in coming?
Changing a system (Score:5, Insightful)
Won't this open up the system to many more phishing attacks involving addresses that include non-Latin characters which look similar to Latin ones?
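To make the concern concrete, here is a minimal Python sketch of a lookalike name. The spoofed label is a made-up example (two Cyrillic letters swapped in for their Latin twins), and the helper name `non_ascii_chars` is purely illustrative, not part of any DNS tooling.

```python
import unicodedata

# Genuine label, all Latin letters.
latin = "microsoft"
# Hypothetical spoof: the Latin "c" and the first "o" are replaced by
# Cyrillic U+0441 and U+043E, which render almost identically.
spoof = "mi\u0441r\u043Esoft"

def non_ascii_chars(label):
    """Return (character, Unicode name) pairs for every non-ASCII character."""
    return [(ch, unicodedata.name(ch, "UNKNOWN")) for ch in label if ord(ch) > 127]

# Equality compares code points, not glyphs, so these are different names.
print(latin == spoof)
for ch, name in non_ascii_chars(spoof):
    print(repr(ch), name)
```

The two strings have the same length and look alike on screen, but a resolver would treat them as unrelated names.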
Yes and No (Score:5, Insightful)
Late in coming? (Score:3, Insightful)
If the fault lies with anyone, it's with the individual contributors of the tech. Or better, with the non-Latin countries' apparent lack of interest in some of the core projects needed to push this through ICANN (specifically DNS, httpd).
When you've built on a foundation of straw- (Score:4, Insightful)
DNS upheaval has been a long time coming, and the current anti-American sentiment worldwide isn't exactly helping to stabilize it. We're already seeing all sorts of ad hoc routing setups that deal with shortcomings of an Ameri-centric DNS. My guess is that within the next few years, ICANN's 'control' of the internet will be in name only as everyone else in the world will have moved on to alternative routing and domain systems.
Stupid question (Score:4, Insightful)
No.
Zonk either knows zero about the histories of the Internet or DNS, or is so enamored of finishing stories with questions that he'll tack on the truly ridiculous.
Re:Changing a system (Score:5, Insightful)
And, of course, you need to make sure when someone types this into a browser some major DNS server someplace won't crash.
I'm all for adding non-latin characters. But I do recognize that it should be a slow process.
Watch out for attacks (Score:5, Insightful)
Sure, go 'head (Score:4, Insightful)
I'd be in favor of the change just because anything that undermines the Unix Tower of Babel -- the dependency on ASCII which complicates text handling sooooo much even when Windows solved the problem soooo long ago -- is good. Even Java gets it. Even Apple (finally) gets it. Unix Is Teh Problem.
And the ASCII problem isn't just bad because it forces people to use inefficient encodings like UTF-8 (THREE bytes per character?). It's bad because it allows people to write code like:
if(string[index] == '.' || string[index] == '?' || string[index] == '!') sentenceEnd = true;
(a line repeated, with subtle variations, several hundred times in the code of a certain ubiquitous editor).
And, lo and behold, the above does not work, but once it appears in a few thousand places it's impossible to fix, and a vast towering structure of fixes made by people who don't really understand why it's an issue is built.
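One way to see why the byte-by-byte check falls over: a rough Python sketch, assuming UTF-8 input and a small, deliberately incomplete set of sentence terminators. The function names and the sample sentence are made up for illustration; this is not the code of any particular editor.

```python
# ASCII terminators plus a few CJK/fullwidth ones (ideographic full stop,
# fullwidth question mark, fullwidth exclamation mark). Not exhaustive.
SENTENCE_END = {".", "?", "!", "\u3002", "\uFF1F", "\uFF01"}

def ends_sentence_bytes(data: bytes) -> bool:
    """Byte-array view of text: only ever sees the last single byte."""
    return data[-1:] in (b".", b"?", b"!")

def ends_sentence_text(text: str) -> bool:
    """Code-point view of text: compares whole characters."""
    return text[-1:] in SENTENCE_END

sample = "これはペンです\u3002"        # ends with the ideographic full stop
encoded = sample.encode("utf-8")

print(len("\u3002".encode("utf-8")))   # number of bytes used for one character
print(ends_sentence_bytes(encoded))    # byte check on the UTF-8 data
print(ends_sentence_text(sample))      # character check on the decoded text
```

The byte version still works for plain ASCII input, which is exactly why the bug can survive for years in code that is only tested on English text.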
So, even though the proposed change would be hugely inconvenient for a huge number of people, I'm in favor, because I want the world to grow the fork up and understand that text != byte array some time while I'm still alive.
URL goldmine. (Score:4, Insightful)
Nice for localising, sure, but how usable will Japanese, Indian, or Arabic script URLs -- for example -- be for those who do not have access to the respective sets or keyboard layouts?
compounding one mistake with another (Score:3, Insightful)
But that doesn't mean it should be done hastily and badly.
English, not latin languages (Score:3, Insightful)
Let's be clear. The domain name system only uses English characters. There are lots of languages in Europe (Italian, Spanish, French...) which are closer to Latin than English (which isn't really a Latin language at all) which are not currently represented, because you can't use accents in domain names, or other letters such as the Spanish eñe (ñ, an n with a tilde, actually a distinct letter). English speakers often think accents aren't important but they can completely change a word's meaning.
Not a trivial job (Score:4, Insightful)
Changing all the DNS servers in the world to switch from ASCII to Unicode is NOT trivial. The fact that some societies have used non-latin characters for thousands of years is completely and utterly irrelevant. THEY didn't make the internet. They simply bolted themselves on to an existing infrastructure.
I agree that progress needs to be made to accommodate non-Latin characters, but to have people whining about "how they want it, and want it now"... That's just ridiculous. It's like waltzing into a house that was built 40 years ago and having a tantrum because the stairs are too steep and the house is too squished. Major structural renovations take time, effort, and careful planning. And there is nothing you can do to avoid that, short of implementing cheap stop-gap measures that are virtually guaranteed to cause even bigger unintended headaches later on.
Re:Changing a system (Score:5, Insightful)
thousands of years? (Score:4, Insightful)
Those societies did not build an entire economic and social infrastructure using all 50,000 of those characters in a few decades, though.
Re:Can't trust your browser's address bar anymore. (Score:5, Insightful)
Cyrillic users would see www.**c******.com, latin users would see www.mi*rosoft.com?
Or better yet, put up a big warning that it's using mixed alphabets?
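A rough sketch of both ideas in Python, assuming the first word of a character's Unicode name ("LATIN", "CYRILLIC", ...) is a good enough stand-in for its script. That is a heuristic: the standard library does not expose the real Unicode Script property, and the function names here are hypothetical.

```python
import unicodedata

def script_of(ch):
    """Heuristic script label: first word of the character's Unicode name."""
    return unicodedata.name(ch, "UNKNOWN").split()[0]

def mask_foreign(label, user_script):
    """Replace letters outside the user's script with '*'; keep digits and hyphens."""
    return "".join(
        ch if (not ch.isalpha() or script_of(ch) == user_script) else "*"
        for ch in label
    )

def is_mixed(label):
    """True if the letters in the label come from more than one script."""
    return len({script_of(ch) for ch in label if ch.isalpha()}) > 1

spoof = "mi\u0441rosoft"                 # hypothetical label with a Cyrillic U+0441
print(mask_foreign(spoof, "LATIN"))      # what a Latin-script user would be shown
print(mask_foreign(spoof, "CYRILLIC"))   # what a Cyrillic-script user would be shown
print(is_mixed(spoof))                   # could drive the "mixed alphabets" warning
```

Masking makes the odd character stand out for either audience, while the mixed-script test is the cheaper check a browser could run before deciding whether to warn at all.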
Re:When you've built on a foundation of straw- (Score:3, Insightful)
Re:Changing a system (Score:5, Insightful)
Even worse, although your problem is reason enough to postpone this change: it will break the very idea of the Internet as a commons when URLs can't even be typed in on all keyboards. There are good reasons why DNS didn't even include the whole ASCII set. Least common denominator is a good design decision. Every character currently allowed is easy to generate on ALL keyboards, can be printed in an unambiguous way by EVERY printing system, etc. Remember that a lot of wire services aren't even 7-bit ASCII clean; email addresses on a lot of news wires have to use (at) instead of @.
More bluntly, of what use is the parts of the Internet I can't even type the domain name for? As things now stand I CAN, and have, snarfed firmware directly from
At a minimum, unicode DNS should be restricted to IPv6 ONLY. No sense wasting scarce IPv4 resources on supporting walled off ghettos.
Re:Changing a system (Score:2, Insightful)
Re:Changing a system (Score:5, Insightful)
Let's say for instance we have an online shop for tea called "Sólo Té" (Tea Only). Both accents are due to irregular rules ("Sólo" = "Only" and "Solo" = "Alone", "Te" is a personal pronoun and "Té" = Tea). Some people would try the current www.solote.com, others would try the correct www.sóloté.com, some would try www.sólote.com and yet others www.soloté.com depending on their spelling capabilities.
What this basically means is that in order to make sure everybody finds your domain and to avoid phishing you have to register four different domains.
A solution to this problem could be what Google does right now with accents: map them to the unaccented vowel. Thus "Solo Te" and "Sólo Té" would both find the "Sólo Té" store.
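A minimal sketch of that kind of accent folding in Python, assuming Unicode decomposition (NFD) followed by dropping the combining marks. The function name `fold_accents` is illustrative; nothing here is claimed to be how Google or any registry actually does it.

```python
import unicodedata

def fold_accents(text):
    """Decompose accented letters and drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

# The four spellings a customer might try for the tea shop.
variants = ["solote", "sóloté", "sólote", "soloté"]
print({fold_accents(v) for v in variants})
```

The catch is that the same folding also turns ñ into n, which is wrong for languages where the accented form is a distinct letter, so a real policy would need per-language rules.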
Re:Changing a system (Score:3, Insightful)
Potentially, yes. But I'm not too bothered about that. Protecting people from their own stupidity is rarely a good long term strategy. However, i18n for DNS is a particularly bad idea for purely pragmatic reasons. Currently, anyone anywhere in the world can go to any URL in the world in their web browser. If we allow the full range of unicode characters, that simply ceases to be true. When URLs start containing unicode characters, many people are simply not going to be able to enter them into their computer (with current input methods, anyway). True, many of those sites will not be of interest to the average person that doesn't have a convenient way to enter the URL anyway. But there will always be those that need to grab a data sheet from a Taiwanese electronics manufacturer, or look at live results from a sporting event in the middle east. That will cease to be possible with i18n. As you say, the system currently works. Changing it for political reasons is just stupid.
Re:English, not latin languages (Score:3, Insightful)
Why does this matter? Well, one argument is that it doesn't, much: if I want to view a Chinese website I'm probably in China and can input Chinese characters on my computer. But what about a Chinese person visiting an English-speaking country and surfing at a public computer (e.g. in a web cafe)? If the computer isn't set up for input of Chinese, he/she won't be able to view certain sites if they can only be accessed by inputting a non-latin URI. Thus to serve all possible customers, the computer would need dozens of input systems installed. That simply isn't going to happen. The alternative of just inputting Unicode codes is unworkable.
Hence it makes more sense to have a requirement that any non-Latin DNS registration ALSO be accompanied by a pure ASCII one, so that any computer will be able to access it. This also helps people who don't know a given language very well: if you don't know Chinese well, and are just learning it, you may find it hard to type in a web address with unfamiliar characters, even if your computer has Chinese input enabled. That shouldn't keep you from visiting a site.
In fact, there are some Chinese systems that do this, by creating a registry of Chinese names for websites. But they involve kludgy workarounds like browser bars that are not universal and are otherwise evil.
Re:Can't trust your browser's address bar anymore. (Score:3, Insightful)
In general, browsers ought to make users more aware of the parts of their current URL, and maybe also of link destinations (the same goes for mail clients).
For example, separate the URL into its parts (scheme, host, path). Display some of the WHOIS info below the hostname, and some info from the SSL certificate if it has one.
This would help people spot phishing scams or other suspicious activity.
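The splitting step is the easy part. A small Python sketch using the standard library's `urlsplit`, with a hypothetical URL and a made-up helper name; the WHOIS and certificate lookups are left out because they need network access.

```python
from urllib.parse import urlsplit

def url_parts(url):
    """Split a URL into the pieces a browser could display separately."""
    parts = urlsplit(url)
    return {"scheme": parts.scheme, "host": parts.hostname, "path": parts.path}

# Hypothetical link destination.
print(url_parts("http://cnn.newsnetwork.com/world/story.html"))
```

Showing the host on its own line would at least put the part that matters for phishing in front of the user instead of burying it in the middle of a long string.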
Reed
Re:Watch out for attacks (Score:2, Insightful)
Re:Changing a system (Score:5, Insightful)
As far as Japanese goes, there are very usable technologies that let you type kanji using a standard Latin keyboard. It works pretty well, and I'm not sure what other languages have such options available, but since most of Asia uses the same kanji system I'm pretty sure that at least Asia has viable typing options.
'of what use is the parts of the Internet I can't even type the domain name for?'
It's of no use... to you. But then again, can you read Japanese, Korean, Arabic, Sanskrit or any other non-Latin language? No? Then your usability isn't in question here.
Re:Changing a system (Score:5, Insightful)
Just because the letters aren't printed on your keyboard doesn't mean it won't type them. Have a look at the list of keyboard layouts in your OS. Sure, it's an inconvenience for you, but less of an inconvenience than it is to the people for whom it is a barrier to entry. Or you could use Google - a lot of people don't even bother typing in domain names any more, they just search.
The whole point about this is that it avoids walled gardens, because the DNS records are still held by ICANN. The alternative is that China decides it's had enough, and creates its own root servers, causing a very real split.
Re:When you've built on a foundation of straw- (Score:3, Insightful)
Re:Can't trust your browser's address bar anymore. (Score:3, Insightful)
Registrars shouldn't accept such names in the first place though: Is there a valid reason to ever have a domain name with stray characters mixed in from different languages?
If a standard were to specify that a domain name must use a subset of Unicode that is self-consistent, and that browsers should turn the address bar red as a warning any time a domain uses characters not in the subsets for the user's selected languages, that would go a long way towards minimizing the phishing problem.
There would still be issues between users of the same orthography, but in general there is no way to prevent phishing style attacks completely, which fundamentally rely upon people to be careless. Even the current DNS system is vulnerable:
spoofing "cnn.com" with "cnn-news.com" or "cnn.newsnetwork.com" doesn't need i18n support to work at all.
Re:Changing a system (Score:2, Insightful)
I'm all for people using and having websites and domains in their native language and alphabet, however it would be very difficult for me to find traditional Persian music (which I happen to be fond of) if the domain were
On the other hand, I suppose that's how I do it now.
Re:Changing a system (Score:4, Insightful)
I wonder how you got +4 mod points... this makes no sense at all!!
Let's suppose you are a Japanese person and you travel to Brazil. Never mind whether you can speak Portuguese or not, but then you need to send an e-mail using your company's webmail server from a computer at the hotel. And suppose this webmail server has kanji characters in its URL. How are you going to type them? Believe me, Brazilian Portuguese Windows has no support for Asian languages (at least not by default, and actually I don't know if it's even possible with a regular Brazilian Windows XP). What now?
Re:Changing a system (Score:5, Insightful)
I must have missed where Japan conquered 51%+ of the area east of the Ural mountains.
AFAIK (and I'm not an expert), China, Japan, Korea and Vietnam used very similar writing systems descended from Chinese hanzi characters. Vietnam and Korea (South Korea at least) later adopted other alphabets. So really, only China and Japan commonly use hanzi/kanji, and even then, the CJK unification of hanzi/hanja/kanji characters really annoyed a few purists when similar hanzi/hanja/kanji were merged in Unicode.
So, other than hanzi/kanji, there is hangul (S. Korea), kana (hiragana and katakana in Japan -- yes, they have more than one writing system!), the Thai alphabet, the Cyrillic alphabet (former USSR), the Arabic alphabet (Middle East), Hebrew (Israel), the Brahmic scripts (India) and the Georgian alphabet. (And this is just off the top of my head, I wouldn't be surprised if there were a few more writing systems in use in Asia!).
And then, just to confuse the problem, there are the various forms of encoding. Admittedly, unicode would probably be one of the better methods, but there are a lot of pre-unicode encodings in common use.
When you expand the problem to be worldwide, there's also the Ethiopian and Greek alphabets that are used in their respective regions. There's also a ton of latin-based alphabets, which introduces many more characters than are currently used in the DNS system. (Including characters that look a lot like existing characters!)
And then you have the problem of alphabets used only by very small groups, such as Cherokee (Oh, I'm going to get flamed!). There are very few people who can write in Cherokee, but does that mean that the Cherokee language shouldn't be part of the DNS system?
Now, can you see why this is a mess?
Re:Changing a system (Score:4, Insightful)
You know, when one sees comments like that, it's not strange that non-7bit ascii countries find themselves rather exasperated with the rate of progress. If you take a few seconds to actually research the issue you'll find both a suggestive lack of multi-thousand key keyboards, as well as a whole host of solutions to that problem.
I mean, I can cut'n'paste chinese and japanese into vi, save the file with a unicode filename, and it'll just work. Earlier valid technical reasons are gone, everyone else has solved this; now the excuses start sounding really hollow.
It's time to drag DNS kicking and screaming out of the dark ages.
Re:Changing a system (Score:3, Insightful)
I think the whole idea is a mistake (Score:5, Insightful)
We should not be changing the fundamental DNS, which is a programmer's and administrator's tool, not an advertising medium. It is founded, like programming languages, on a fundamental 7-bit ASCII character set, and is not intended to be used for NLS text.
A far better solution is some form of VDNS that translates NLS text names into the proper domain name at the system level. That also allows the same domain to have multiple language translations to reflect localized product and service names.
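For what it's worth, the general shape of such a translation layer can be sketched with Python's built-in `idna` codec, which implements the ASCII-compatible encoding from RFC 3490 (IDNA 2003). This is an assumption about what a "VDNS" layer might look like, not the design proposed above; the helper names and the hostname are hypothetical.

```python
def to_ascii(hostname: str) -> bytes:
    """Translate a Unicode hostname into plain ASCII labels for the DNS."""
    return hostname.encode("idna")

def to_unicode(wire: bytes) -> str:
    """Translate the ASCII form back into the name the user typed."""
    return wire.decode("idna")

name = "www.sóloté.com"     # hypothetical accented hostname
wire = to_ascii(name)

print(wire)                 # ASCII-only form that the existing DNS can carry
print(to_unicode(wire))     # round-trips back to the accented name
```

The design point is that the servers keep seeing nothing but the characters they already accept, and the translation happens at the edge, in the application or resolver library.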
We seriously need to kick the general political community in the arse. They keep trying to impose technical decisions, and it fails as miserably as any corporate PHB's uninformed decisions. ASK the techies to propose solutions instead of shoving ill-conceived ideas down our throats.
For example -- once you mandate multibyte domains, you implicitly mandate multibyte URL components. Goodbye direct mapping of names to the directories, file systems, and servers.
Bad idea. Very bad idea.
Re:Changing a system (Score:4, Insightful)
DNS is *precisely* for NLS text (Score:3, Insightful)
It's been obvious since the Europeans got DNS for their ftp and email that there was a problem, even before they invented the web, and even aside from myopic silliness like having
DNS has a couple of restrictions that may have made sense in 1985, long before Unicode was invented. Some of them are easy to fix, especially since most DNS servers in the world use versions of one of three or four server programs, but there's a lot more resolver software out there that deliberately casefolds (though you could fix most of that in two or three generations of Microsoft releases, if you knew what you wanted it to do), and you can fix some of it administratively, by having the people who register UPPERCASE-EXAMPLE.COM also register uppercase-example.com and maybe Uppercase-example.com and do a few similar things for munged Unicode.
Re:Changing a system (Score:3, Insightful)
Why do you need to type in a character for a domain name written in a language you don't even understand the alphabet of, and certainly can't read or write?
And how do you think a chinese keyboard looks? Do you think they have hundreds of thousands of keys? There are three reasons why you can't enter chinese characters into your keyboard, and none of them has to do with hardware:
Yes, you are one of them. The people who want non-latin characters are not wanting them because they want to communicate with other english-speaking people on the Internet. They want them because they want to communicate between themselves, in their own native language. Imagine that you only had hebrew letters available for domain names in the US. The hebraic alphabet is relatively easy to learn, and most english words can be written in it. But it's cumbersome for english-speaking people to communicate with the hebrew alphabet. And that's why people speaking different languages than english, want to be able to write their domain names in different alphabets than english.
Sorry, you are not making sense.
Re:Anything's possible. (Score:2, Insightful)