ICANN Under Pressure Over Non-Latin Characters
RidcullyTheBrown writes "A story from the Sydney Morning Herald is reporting that ICANN is under pressure to introduce non-Latin characters into DNS names sooner rather than later. The effort is being spearheaded by nations in the Middle East and Asia. Currently there are only 37 characters usable in DNS entries, out of an estimated 50,000 that would be usable if ICANN changed naming restrictions. Given that some BIND implementations still barf on an underscore, is this really premature?" From the article: "Plans to fast-track the introduction of non-English characters in website domain names could 'break the whole internet', warns ICANN chief executive Paul Twomey ... Twomey refuses to rush the process, and is currently conducting 'laboratory testing' to ensure that nothing can go wrong. 'The internet is like a fifteen story building, and with international domain names what we're trying to do is change the bricks in the basement,' he said. 'If we change the bricks there's all these layers of code above the DNS ... we have to make sure that if we change the system, the rest is all going to work.'" Given that some societies have used non-Latin characters for thousands of years, is this a bit late in coming?
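For reference, the "37 characters" figure matches the traditional host-name rule usually called LDH: 26 letters (compared case-insensitively), 10 digits and the hyphen. Below is a minimal Python sketch of that check, assuming RFC 1123-style host names (labels of 1 to 63 characters, no leading or trailing hyphen). It does not model what any particular BIND build accepts, and the DNS protocol itself does allow underscore labels such as _sip._tcp in non-host records.

```python
import string

# Traditional host-name alphabet ("LDH": letters, digits, hyphen).
# Letters are case-insensitive in DNS, so only lowercase is stored here.
LDH = set(string.ascii_lowercase) | set(string.digits) | {"-"}


def is_ldh_label(label: str) -> bool:
    """True if label is a conventional LDH host-name label."""
    if not 1 <= len(label) <= 63:             # labels are 1..63 characters
        return False
    if label.startswith("-") or label.endswith("-"):
        return False                           # no leading or trailing hyphen
    return all(ch in LDH for ch in label.lower())


def is_ldh_hostname(name: str) -> bool:
    if name.endswith("."):                     # allow one trailing dot (FQDN form)
        name = name[:-1]
    return all(is_ldh_label(label) for label in name.split("."))


print(len(LDH))                                # 37
print(is_ldh_hostname("slashdot.org"))         # True
print(is_ldh_hostname("my_host.example"))      # False: underscore is not LDH
print(is_ldh_hostname("sóloté.com"))           # False: accented letters are not LDH
```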
Changing a system (Score:5, Insightful)
Won't this open the system up to many more phishing attacks involving addresses that include non-Latin characters which look similar to Latin ones?
Re:Changing a system (Score:5, Insightful)
And, of course, you need to make sure when someone types this into a browser some major DNS server someplace won't crash.
I'm all for adding non-Latin characters. But I do recognize that it should be a slow process.
Re: (Score:2)
Re:Changing a system (Score:4, Informative)
If your keyboard has a compose key, then you can often compose a glyph from two similar-looking glyphs. For example, for an o with an umlaut, Compose " o -> ö (though I expect Slashdot will filter that character out).
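What the compose key does can be modelled as "base letter + combining mark", normalized into a single precomposed character. A sketch in Python, using a small hypothetical compose table (real tables such as X11's are much larger); the standard unicodedata module does the normalization. The last lines show why normalization matters for names: the two spellings of ö look identical but are different strings until normalized.

```python
import unicodedata

# Hypothetical mini compose table: first key -> combining mark to attach.
COMPOSE_MARKS = {
    '"': "\u0308",  # COMBINING DIAERESIS (umlaut)
    "'": "\u0301",  # COMBINING ACUTE ACCENT
    "`": "\u0300",  # COMBINING GRAVE ACCENT
    "~": "\u0303",  # COMBINING TILDE
}


def compose(mark_key: str, base: str) -> str:
    """Attach the mark to the base letter, then normalize to one code point (NFC)."""
    return unicodedata.normalize("NFC", base + COMPOSE_MARKS[mark_key])


o_umlaut = compose('"', "o")
print(o_umlaut, hex(ord(o_umlaut)))   # ö 0xf6
print(len("o\u0308"), len(o_umlaut))  # 2 1
print("o\u0308" == o_umlaut)          # False: same glyph, different strings
```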
Macintosh users have an Option key that they can use to make weird glyphs (option-5 for the infinity symbol, option-g for the copyright symbol, etc.). On most operating systems, various other combinations of the Ctrl/Shift/Meta/Alt/AltGr modifier keys and regular keys will allow you to type more glyphs. Most desktop environments also have an on-screen keyboard program that eases experimentation in this area.
Users of complex (e.g., Asian) scripts have a host of input methods to choose from and configure.
Finally, if all else fails, create a text file full of your favourite non-ASCII characters and resort to the tried and tested method of copying and pasting!
Re: (Score:3, Insightful)
Why do you need to type in a character for a domain name written in a language you don't even understand the alphabet of, and certainly can't read or write?
And how do you think a Chinese keyboard looks? Do you think they have hundreds of thousands of keys? There are three
Re:Changing a system (Score:5, Insightful)
Let's say, for instance, we have an online shop for tea called "Sólo Té" (Tea Only). Both accents are diacritical marks that exist only to tell apart otherwise identical words ("Sólo" = "Only" and "Solo" = "Alone"; "Te" is a personal pronoun and "Té" = "Tea"). Some people would try the current www.solote.com, others would try the correct www.sóloté.com, some would try www.sólote.com and yet others www.soloté.com, depending on their spelling capabilities.
What this basically means is that in order to make sure everybody finds your domain and to avoid phishing you have to register four different domains.
A solution to this problem could be what Google does right now with accents: map them to the unaccented vowel. Thus "Solo Te" and "Sólo Té" would both find the "Sólo Té" store.
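The accent-folding idea can be sketched in a few lines of Python: decompose each character (NFD), drop the combining marks, and lowercase. This only illustrates the technique described above; it is not a claim about how Google actually implements it. Note that it also folds ñ to n, which Spanish treats as a separate letter.

```python
import unicodedata


def fold_accents(text: str) -> str:
    """Strip combining marks after canonical decomposition (NFD), then lowercase."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch)).lower()


spellings = ["Sólo Té", "Solo Te", "Sólo Te", "Solo Té"]
keys = {fold_accents(s) for s in spellings}
print(keys)  # {'solo te'}: all four spellings reach the same store
```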
Re: (Score:2)
Re:Changing a system (Score:4, Interesting)
I don't have mediocre English spelling, and I would use the correct accented characters in English words like "naive" - except I don't know how to type those characters. Like many people, I know how to type the characters that are on the keyboard. Additionally, because there's no need for me to type characters outside the ones printed on the keys on my keyboard to make the internets come down my tubes, I have no incentive to learn how to type any differently than I already do.
It's not necessarily a matter of spelling ability.
Re: (Score:3, Informative)
In Argentina some people have keyboards with a Spanish-language layout (that is, with extra letters) and some learn the ASCII codes and use the ALT key (along with the code typed on the numpad) to place accents and the letters Ñ and ñ (which are mandatory
The GNS System? (Score:5, Interesting)
Actually, DNS arguably is a giant search engine, which simply works on a 1:1 relationship and uses a distributed database (you input one piece of information, and it gives you some corresponding piece of information back). Replacing it with a 'fuzzier' search engine that would give you back a number of results, ranked by relevance, isn't that huge a leap.
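The difference between the two models can be sketched as an exact 1:1 lookup versus a ranked "fuzzy" match. The names and addresses below are invented (192.0.2.0/24 is reserved for documentation), and difflib's string-similarity ratio merely stands in for whatever relevance ranking a real "GNS" would use.

```python
import difflib

# Hypothetical zone data: exact name -> address.
RECORDS = {
    "solote.com": "192.0.2.10",
    "soloalone.com": "192.0.2.20",
    "teashop.com": "192.0.2.30",
}


def exact_lookup(name: str):
    """DNS-style: one key in, at most one answer out."""
    return RECORDS.get(name)


def fuzzy_lookup(query: str, n: int = 3):
    """Search-style: up to n known names, best string similarity first."""
    return difflib.get_close_matches(query, RECORDS.keys(), n=n, cutoff=0.6)


print(exact_lookup("solote.com"))   # 192.0.2.10
print(exact_lookup("sólote.com"))   # None
print(fuzzy_lookup("sólote.com"))   # ranked candidates, solote.com first
```

The exact lookup misses the accented spelling entirely, while the fuzzy version still ranks solote.com first; the price is that the caller now gets candidates rather than one authoritative answer.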
Re: (Score:3, Insightful)
I think the whole idea is a mistake (Score:5, Insightful)
Don't change the fundamental DNS, which is a programmer's and administrator's tool, not an advertising medium. It is founded, like programming languages, on the 7-bit ASCII character set, and is not intended to be used for NLS text.
A far better solution is some form of VDNS that translates NLS text names into the proper domain name at the system level. That also allows the same domain to have multiple language translations to reflect localized product and service names.
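A minimal sketch of such a translation layer, assuming a simple lookup table (VDNS_TABLE and vdns_resolve are hypothetical names; no standard service like this exists). Several localized names map to one ASCII domain, which then goes through ordinary DNS untouched. For comparison, IDNA performs a similar conversion in the application, but algorithmically rather than from a table.

```python
# Hypothetical VDNS table: localized display names -> canonical ASCII domain.
# Entries are invented for illustration; keys are stored casefolded.
VDNS_TABLE = {
    "sólo té": "solote.com",
    "tea only": "solote.com",
}


def vdns_resolve(localized_name: str):
    """Return the canonical ASCII domain for a localized name, or None."""
    # Collapse runs of whitespace and ignore case before looking the name up.
    key = " ".join(localized_name.split()).casefold()
    return VDNS_TABLE.get(key)


print(vdns_resolve("Sólo Té"))    # solote.com
print(vdns_resolve("Tea  Only"))  # solote.com
print(vdns_resolve("Té"))         # None
```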
We seriously need to kick the general political community in the arse. They keep trying to impose technical decisions, and it fails as miserably as any corporate PHB's uninformed decisions. ASK the techies to propose solutions instead of shoving ill-conceived ideas down our throats.
For example -- once you mandate multibyte domains, you implicitly mandate multibyte URL components. Goodbye direct mapping of names to the directories, file systems, and servers.
Bad idea. Very bad idea.
DNS is *precisely* for NLS text (Score:3, Insightful)
Internet != Web, and other IDN technical issues (Score:3, Interesting)
The reason ICANN wants to do lots of testing (after having dragged their feet for years before getting started) is that IDNs fundamentally change how DNS works, and it's really important not to break too much when you do that (not that ICANN traditionally worried about that.) It's *not* simple, and
Re: (Score:2)
Perhaps the whole idea of encoding alphabets is a relic (and biased toward phonetic alphabets, as well)? Today computers have enough power to operate on pictures as UI, so why don't we switch to shape-based data processing?
That would instantly break down the digital divide between the present and history (think digitization of ancient documents) and between various cultures.
As a bonus, we get to ditch keyboard-induced RSI, that feeling of being constrain
Re:Changing a system (Score:5, Insightful)
Re:Changing a system (Score:4, Insightful)
Re:Changing a system (Score:5, Insightful)
Even worse (although your problem is reason enough to postpone this change), it will break the very idea of the Internet as a commons when URLs can't even be typed in on all keyboards. There are good reasons why DNS didn't even include the whole ASCII set. Least common denominator is a good design decision. Every character currently allowed is easy to generate on ALL keyboards, can be printed in an unambiguous way by EVERY printing system, etc. Remember that a lot of wire services aren't even 7-bit ASCII clean; email addresses on a lot of news wires have to use (at) instead of @.
More bluntly, of what use is the parts of the Internet I can't even type the domain name for? As things now stand I CAN, and have, snarfed firmware directly from
At a minimum, Unicode DNS should be restricted to IPv6 ONLY. No sense wasting scarce IPv4 resources on supporting walled-off ghettos.
Re: (Score:2)
There might be some places that would like to block going to sites that don't have certain character sets as their name.
Re:Changing a system (Score:5, Insightful)
As far as Japanese goes, there are very usable technologies that let you type kanji using a standard Latin keyboard. They work pretty well. I'm not sure what other languages have such options available, but since most of Asia uses the same kanji system, I'm pretty sure that at least Asia has viable typing options.
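Roughly, such input methods first turn romaji typed on a Latin keyboard into kana, then offer kanji candidates from a dictionary. A minimal Python sketch of the first step only, assuming a tiny hypothetical table and greedy longest-match; a real IME has complete tables, the dictionary step, and rules for ambiguities such as "n" before a vowel.

```python
# Hypothetical mini table; real IMEs use complete tables plus a kanji dictionary.
ROMAJI = {
    "a": "あ", "i": "い", "u": "う", "e": "え", "o": "お",
    "ka": "か", "ki": "き", "ku": "く", "ke": "け", "ko": "こ",
    "sa": "さ", "shi": "し", "su": "す", "se": "せ", "so": "そ",
    "ta": "た", "chi": "ち", "tsu": "つ", "te": "て", "to": "と",
    "ni": "に", "ho": "ほ", "n": "ん",
}
LONGEST = max(len(key) for key in ROMAJI)


def romaji_to_hiragana(text: str) -> str:
    """Greedy longest-match conversion; unknown characters pass through."""
    out, i = [], 0
    while i < len(text):
        for size in range(LONGEST, 0, -1):
            chunk = text[i:i + size]
            if chunk in ROMAJI:
                out.append(ROMAJI[chunk])
                i += len(chunk)      # chunk may be shorter near the end
                break
        else:                        # no table entry starts here
            out.append(text[i])
            i += 1
    return "".join(out)


print(romaji_to_hiragana("nihon"))   # にほん
print(romaji_to_hiragana("sushi"))   # すし
```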
'of what use is the parts of the Internet I can't even type the domain name for?'
It's of no use... to you. But then again, can you read Japanese, Korean, Arabic, Sanskrit or any other non-Latin language? No? Then your usability isn't in question here.
Re:Changing a system (Score:4, Insightful)
I wonder how you got +4 mod points... this makes no sense at all!!
Let's suppose you are a Japanese person and you travel to Brazil. Never mind whether you can speak Portuguese or not; you need to send an e-mail using your company's webmail server from a computer at the hotel. And suppose this webmail server has kanji characters in its URL. How are you going to type them? Believe me, Brazilian Portuguese Windows has no support for Asian languages (at least not by default, and actually I don't know if it's even possible with a regular Brazilian Windows XP). What now?
Re:Changing a system (Score:5, Insightful)
I must have missed where Japan conquered 51%+ of the area east of the Ural mountains.
AFAIK (and I'm not an expert), China, Japan, Korea and Vietnam used very similar writing systems descended from Chinese Hanzi characters. Vietnam and Korea (South Korea at least) later adopted other alphabets. So really, only China and Japan commonly use Hanzi/Kanji, and even then, the CJK unification of hanzi/hanja/kanji characters really annoyed a few purists when similar hanzi/hanja/kanji were merged in Unicode.
So, other than hanzi/kanji, there is hangul (S. Korea), hiragana/katakana (Japan -- yes, they have more than one writing system!), the Thai alphabet, the Cyrillic alphabet (former USSR), the Arabic alphabet (Middle East), Hebrew (Israel), the Brahmic scripts (India) and the Georgian alphabet. (And this is just off the top of my head, I wouldn't be surprised if there were a few more writing systems in use in Asia!)
And then, just to confuse the problem, there are the various forms of encoding. Admittedly, Unicode would probably be one of the better methods, but there are a lot of pre-Unicode encodings in common use.
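To make the encoding point concrete: the same two characters (日本, "Japan") produce different bytes in UTF-8 and in the older Shift_JIS and EUC-JP encodings, so software has to know which one it is looking at. A quick check with Python's built-in codecs:

```python
text = "日本"  # "Japan"

# Same characters, three encodings, three different byte sequences.
for codec in ("utf-8", "shift_jis", "euc_jp"):
    data = text.encode(codec)
    print(f"{codec:10} {data.hex(' ')}")
```

This prints e6 97 a5 e6 9c ac for UTF-8, 93 fa 96 7b for Shift_JIS and c6 fc cb dc for EUC-JP.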
When you expand the problem to be worldwide, there are also the Ethiopian and Greek alphabets that are used in their respective regions. There are also a ton of Latin-based alphabets, which introduce many more characters than are currently used in the DNS system. (Including characters that look a lot like existing characters!)
And then you have the problem of alphabets used only by very small groups, such as Cherokee (Oh, I'm going to get flamed!). There are very few people who can write in Cherokee, but does that mean that the Cherokee language shouldn't be part of the DNS system?
Now, can you see why this is a mess?
Re:Changing a system (Score:5, Insightful)
Just because the letters aren't printed on your keyboard doesn't mean it won't type them. Have a look at the list of keyboard layouts in your OS. Sure, it's an inconvenience for you, but less of an inconvenience than it is to the people for whom it is a barrier to entry. Or you could use Google - a lot of people don't even bother typing in domain names any more, they just search.
The whole point about this is that it avoids walled gardens, because the DNS records are still held by ICANN. The alternative is that China decides it's had enough, and creates its own root servers, causing a very real split.
Re:Changing a system (Score:4, Insightful)
You know, when one sees comments like that, it's not strange that non-7-bit-ASCII countries find themselves rather exasperated with the rate of progress. If you take a few seconds to actually research the issue, you'll find both a suggestive lack of multi-thousand-key keyboards and a whole host of solutions to that problem.
I mean, I can cut'n'paste Chinese and Japanese into vi, save the file with a Unicode filename, and it'll just work. The earlier valid technical reasons are gone and everyone else has solved this; now the excuses start sounding really hollow.
It's time to drag DNS kicking and screaming out of the dark ages.
Re: (Score:2, Insightful)
Re: (Score:2, Funny)
"...in a row??"
Re: (Score:3, Insightful)
Potentially, yes. But I'm not too bothered about that. Protecting people from their own stupidity is rarely a good long-term strategy. However, i18n for DNS is a particularly bad idea for purely pragmatic reasons. Currently, anyone anywhere in the world can go to any URL in the world in their web browser. If we allow the full range of Unicode characters, that s
Re: (Score:2)
Re: (Score:2)
It's just a fact of life that the encoding scheme implemented has a limited set of characters that is readable by the technically adept people who built the thing.
It's a great idea to enable lookup by character strings using alphabets from other languages, but if it takes time to implement a global standard, that's just too bad.
If they are desperate to implement lookup in their own character sets then let them get on with it - b
Re:Changing a system (Score:5, Informative)
What? (Score:5, Funny)
Anyone else getting more lost every day?
Re: (Score:3, Funny)
Re: (Score:2, Funny)
Re: (Score:3, Funny)
Re: (Score:2)
Internet Layers (Score:3, Funny)
1. Physical
2. DataLink
3. Network
4. Transport
5. Session
6. Presentation
7. Application
8. Tubes
9. Bricks
10. Porn
11. Google
12. YouTube
13. ??
16. Profit
It was hard enough remembering them all back when there were only 7.
Comment removed (Score:5, Funny)
Re: (Score:2, Interesting)
Seriously... How many mail servers are going to freak out because they can't handle Unicode?
Base-Ten BIGOTRY, I say!!! (Score:4, Funny)
Now if you'll excuse me, I need to finish reading all the new posts on 66.35.250.150.
Base-Ten CHAUVINIST!!!
What about societies that use Base 2 [binary], or Base 8 [octal], or Base 16 [hexadecimal]?
Or entire societies, like the British empire, which use no base at all?
12 inches in a foot. 3 feet in a yard. 1760 yards in a mile...
60 seconds in a minute. 60 minutes in an hour. 24 hours in a day. 7 days in a week. 52 weeks in a year [give or take]...
Or how about base 12?
12 keys in a chromatic scale: A 440, then, logarithmically [give or take a little well-tempering [amazon.com]]: A#, B, B# == C [kinda sorta], C#, D, D#, E, E# == F [kinda sorta], F#, G, G#, and finally A 880.
Except that on the continent, things are often just a little sharper - say A 443/444/445 & A 886/888/890...
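The "logarithmic" spacing is easy to compute for the equal-tempered case (not the well-tempered tunings mentioned above): each of the 12 semitones multiplies the frequency by 2^(1/12), so the octave lands on exactly double the reference. A small Python sketch; the reference pitch is a parameter, so the continental A 443 works too.

```python
# Twelve-tone equal temperament: each semitone step multiplies frequency by 2**(1/12).
NOTE_NAMES = ["A", "A#", "B", "C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A"]


def equal_tempered(reference_hz: float = 440.0):
    """(name, frequency in Hz) for the 13 notes from the reference A up to its octave."""
    return [(name, reference_hz * 2 ** (step / 12))
            for step, name in enumerate(NOTE_NAMES)]


for name, freq in equal_tempered():
    print(f"{name:2} {freq:7.2f} Hz")
```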
And let's not even get into water freezing & boiling at 32 & 212 versus 0 & 100...
Re: (Score:2)
So nyeeer.
(Unfortunately, SlashCode mangles IPv6 addresses, so don't bother clicking.)
Yes and No (Score:5, Insightful)
Re:Yes and No - thinking long term (Score:2)
Um... why? (Score:3, Informative)
Why... No really. You speak as if this is a good thing. Why should they be able to use their natural language rather than English? Why shouldn't they be restricted to a limited area of local language speaking people?
The reason the Internet is useful is because everyone speaks TCP/IP. Incompatible protocols are to be actively discouraged because they balkanise the network. Langu
Comment removed (Score:5, Interesting)
Comment removed (Score:4, Interesting)
Break the whole Internet? (Score:2, Funny)
Plans to fast-track the introduction of non-English characters in website domain names could 'break the whole internet', warns ICANN chief executive Paul Twomey
Luckily for us, GWB knows that we have some redundancy with the Internets, so if one breaks we can just use another.
Re: (Score:2)
Maybe it's time to get rid of Bind? (Score:2)
Re: (Score:2)
That would be a good reason to get the UN in (Score:2, Funny)
Late in coming? (Score:3, Insightful)
If the fault lies with anyone, it's with the individual contributors of the tech. Or better, with the non-Latin countries' apparent lack of interest in some of the core projects needed to push this through ICANN (specifically DNS, httpd).
Re: (Score:2)
Re: (Score:2)
Yes, the Vatican State, back in the MCMLX's.
When you've built on a foundation of straw- (Score:4, Insightful)
DNS upheaval has been a long time coming, and the current anti-American sentiment worldwide isn't exactly helping to stabilize it. We're already seeing all sorts of ad hoc routing setups that deal with the shortcomings of an Ameri-centric DNS. My guess is that within the next few years, ICANN's 'control' of the internet will be in name only, as everyone else in the world will have moved on to alternative routing and domain systems.
Re: (Score:2)
Re: (Score:2)
Re: (Score:3, Insightful)
Re: (Score:3, Insightful)
Stupid question (Score:4, Insightful)
No.
Zonk either knows zero about the histories of the Internet or DNS, or is so enamored of finishing stories with questions that he'll tack on the truly ridiculous.
Re: (Score:2, Offtopic)
I like your sig... it's just not accurate. You've focused too much on a particular component of the larger problem and have failed to recognize the actual whole of the issue. Here's a correct understanding of the problem.
Read the news and some history. Is organized humanity currently a net win, or a dead loss?
Watch out for attacks (Score:5, Insightful)
Re:Watch out for attacks (Score:5, Informative)
Re: (Score:2)
Re: (Score:2, Insightful)
Re: (Score:3, Informative)
Let's take Japanese as an example, and I will give you two reasons why it won't work.
Perhaps if you assume I am Japanese, you will assume that my "default unicode section" is the section containing the Japanese characters. So this works fine if I go to
Sure, go 'head (Score:4, Insightful)
I'd be in favor of the change just because anything that undermines the Unix Tower of Babel -- the dependency on ASCII which complicates text handling sooooo much even when Windows solved the problem soooo long ago -- is good. Even Java gets it. Even Apple (finally) get it. Unix Is Teh Problem.
And the ASCII problem isn't just bad because it forces people to use inefficient encodings like UTF-8 (THREE bytes per character?) It's bad because it allows people to write code like:
if(string[index] == '.' || string[index] == '?' || string[index] == '!') sentenceEnd = true;
(a line repeated, with subtle variations, several hundred times in the code of a certain ubiquitous editor).
And, lo and behold, the above does not work, but once it appears in a few thousand places it's impossible to fix, and a vast towering structure of fixes made by people who don't really understand why it's an issue is built.
So, even though the proposed change would be hugely inconvenient for a huge number of people, I'm in favor, because I want the world to grow the fork up and understand that text != byte array some time while I'm still alive.
Re: (Score:2)
What the hell are you talking about??
you couldnt be more wrong (Score:5, Informative)
i18n on windows is far from "solved".
I do admit that MS had a huge benefit when they started pushing unicode.
(It takes a company with Microsoft's level of clout to push around national governments.)
And the ASCII problem isn't just bad because it forces people to use inefficient encodings like UTF-8 (THREE bytes per character?)
Perhaps you don't realize that UTF-8 is on its way to becoming the dominant character encoding, and legacy cruft such as UTF-16 (the surrogate-pair extension of the old 16-bit UCS-2 that Windows adopted early) is being phased out. Even languages that would end up as mostly 3-byte characters tend to benefit from the savings on single-byte characters for control and formatting markup.
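The size trade-off is easy to check. In UTF-8, ASCII stays at one byte and most CJK characters take three; in UTF-16, both take two bytes (for characters in the Basic Multilingual Plane). A quick Python comparison on a few made-up sample strings:

```python
samples = {
    "ASCII markup": "<p>Hello</p>",
    "CJK in markup": "<p>日本語</p>",
    "CJK only": "日本語",
}

for label, text in samples.items():
    utf8 = len(text.encode("utf-8"))
    utf16 = len(text.encode("utf-16-le"))  # "-le" so no byte-order mark is added
    print(f"{label:14} chars={len(text):2} utf-8={utf8:2} utf-16={utf16:2}")
```

Pure CJK text is smaller in UTF-16 here (6 bytes versus 9), but once markup is mixed in, UTF-8 catches up or wins (16 bytes versus 20 for the tagged version).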
I'm not going to harp on about it, but a few basic web searches could enlighten you here.
if(string[index] == '.' || string[index] == '?' || string[index] == '!') sentenceEnd = true;
Code like that *works* in UTF-8, which is one of the things that makes it beautiful (among many others).
It allows you to deal with world characters sets when it matters, and allows you to ignore them when it does not.
(for example, a lexical analyzer that specifies its tokens does not want to support punctuation from every language ever conceived)
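The claim can be checked directly. In UTF-8, every byte of a multi-byte character is 0x80 or above, so a byte-by-byte comparison against ASCII '.', '?' or '!' can never fire inside a CJK character, and the ideographic full stop 。 is simply not matched. Older multi-byte encodings lack that guarantee: in Shift_JIS, the second byte of 表 is 0x5C, the ASCII backslash. A Python sketch (sentence_end_offsets is a hypothetical helper):

```python
SENTENCE_END = {ord("."), ord("?"), ord("!")}


def sentence_end_offsets(data: bytes):
    """Byte offsets of ASCII '.', '?' or '!' found by a naive byte-by-byte scan."""
    return [i for i, b in enumerate(data) if b in SENTENCE_END]


text = "日本語です。Yes! OK?"
utf8 = text.encode("utf-8")

# Only the two ASCII marks are found; no false hits inside CJK characters.
print(sentence_end_offsets(utf8))            # [21, 25]

# Shift_JIS can put an ASCII-range byte inside a two-byte character.
print(b"\\" in "表".encode("shift_jis"))     # True
print(b"\\" in "表".encode("utf-8"))         # False
```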
And if you think code like that doesn't exist in the Windows world, you are sadly quite naive.
In my experience internationalizing applications, it's typically far easier to update Unix applications, which on occasion need nearly no changes at all, compared to the laborious grind and near-total rewrite often needed for MS Windows applications.
Can't trust your browser's address bar anymore. (Score:2, Interesting)
Unicode has many characters that look almost exactly like characters in Latin-1.
For example, if "www.microsoft.com" is shown in your browser's address bar, how would you know for sure that the "c" is not from the Cyrillic alphabet, or the "o" is not from the Greek alphabet?
You simply won't be able to trust your browser's address bar anymore. The possibilities for phishing attacks are endless.
Re:Can't trust your browser's address bar anymore. (Score:5, Insightful)
Cyrillic users would see www.**c******.com, Latin users would see www.mi*rosoft.com?
Or better yet, put up a big warning that it's using mixed alphabets?
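A minimal sketch of the mixed-alphabet warning in Python. It guesses each letter's script from the first word of its Unicode character name (LATIN, CYRILLIC, GREEK, ...), which is a rough heuristic rather than the real Unicode Script property, and flags any label that mixes scripts. It would not catch a name spelled entirely in look-alike Cyrillic, and it would wrongly flag legitimate mixes such as Japanese kanji with kana, so treat it as an illustration only.

```python
import unicodedata


def letter_scripts(label: str) -> set:
    """Rough script guess per letter, from the first word of its Unicode name.

    e.g. 'LATIN SMALL LETTER C' -> 'LATIN', 'CYRILLIC SMALL LETTER ES' -> 'CYRILLIC'.
    Heuristic only; not the Unicode Script property."""
    return {unicodedata.name(ch, "UNKNOWN").split()[0] for ch in label if ch.isalpha()}


def is_mixed_script(hostname: str) -> bool:
    """Flag a host name if any single label mixes letters from several scripts."""
    return any(len(letter_scripts(label)) > 1 for label in hostname.split("."))


genuine = "www.microsoft.com"
spoofed = "www.mi\u0441rosoft.com"     # U+0441 CYRILLIC SMALL LETTER ES

print(genuine == spoofed)              # False, though they may render identically
print(is_mixed_script(genuine))        # False
print(is_mixed_script(spoofed))        # True
print(is_mixed_script("café.com"))     # False: é is a Latin letter
```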
Re: (Score:3, Insightful)
In general, browsers ought to make users more aware of the parts of their current URL, and maybe also of link destinations (also mail client).
For example, separate the URL into its parts (scheme, host, path). Display some of the WHOIS info below the hostname, and some info from the SSL certificate if it has one.
This would help people spot phishing scams or other suspicious activity.
Reed
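Splitting the URL into its parts is straightforward with Python's standard urllib.parse. The sketch below (describe_url is a hypothetical helper) shows scheme, host and path separately and flags a host containing non-ASCII characters; the WHOIS and certificate details suggested above would need network lookups and are left out.

```python
from urllib.parse import urlsplit


def describe_url(url: str) -> dict:
    """Split a URL into the parts a browser could display separately."""
    parts = urlsplit(url)
    host = parts.hostname or ""          # hostname is lowercased, port removed
    return {
        "scheme": parts.scheme,
        "host": host,
        "non-ASCII host": not host.isascii(),
        "path": parts.path,
        "query": parts.query,
    }


for key, value in describe_url("https://www.mi\u0441rosoft.com/login?next=/").items():
    print(f"{key:15} {value}")
```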
Re: (Score:3, Insightful)
Registrars shouldn't accept such names in the first place, though: is there a valid reason to ever have a domain name with stray characters mixed in from different languages?
If a standard were to specify that a domain name must use a subset of Unicode that is self-consistent, and that browsers should turn the address bar red to warn any time a domain uses characters not in the user's selected language subsets, that would go a long way towards minimizing the phishing problem.
There would still
Re: (Score:3, Informative)
You're assuming that characters belong exclusively to one language. Try telling a French guy that he can't register café.com because 'c' 'a' and 'f' are English, not French.
punicode? (Score:2)
Re: (Score:2)
It's a little late, don't you think? (Score:2, Funny)
URL goldmine. (Score:4, Insightful)
Nice for localising, sure, but how usable will Japanese, Indian, or Arabic script URLs -- for example -- be for those who do not have access to the respective sets or keyboard layouts?
compounding one mistake with another (Score:3, Insightful)
But that doesn't mean it should be done hastily and badly.
Caprican and Tauron letter systems (Score:2)
And he will not rest until the script of each of the 12 Colonies is properly represented with ICANN. I hear he's not too keen on Cyrillic, however.
English, not latin languages (Score:3, Insightful)
Let's be clear. The domain name system only uses English characters. There are lots of languages in Europe (Italian, Spanish, French...) which are closer to Latin than English is (English isn't really a Latin language at all), and they are not currently represented, because you can't use accents in domain names, or other letters such as the Spanish eñe (n with a squiggle, actually a distinct letter). English speakers often think accents aren't important, but they can completely change a word's meaning.
Re: (Score:3, Insightful)
Why does this matter? Well, one argument is that it doesn't, much: if I want to view a Chinese website I'm probably in China and can input Chin
Re: (Score:2)
Yes, I am an English speaker, and throughout the day I often stumble across the recurring idea that accents are of no particular use in determining the meaning of a word. I would go so far as to say that I often think accents just aren't important. I'm glad Slashdot has you around to set things straight.
Not a trivial job (Score:4, Insightful)
Changing all the DNS servers in the world to switch from ASCII to Unicode is NOT trivial. The fact that some societies have used non-latin characters for thousands of years is completely and utterly irrelevant. THEY didn't make the internet. They simply bolted themselves on to an existing infrastructure.
I agree that progress needs to be made to accommodate non-Latin characters, but to have people whining about "how they want it, and want it now"... That's just ridiculous. It's like waltzing into a house that was built 40 years ago and having a tantrum because the stairs are too steep and the house is too squished. Major structural renovations take time, effort, and careful planning. And there is nothing you can do to avoid that, short of implementing cheap stop-gap measures that are virtually guaranteed to cause even bigger unintended headaches later on.
thousands of years? (Score:4, Insightful)
Those societies did not build an entire economic and social infrastructure using all 50,000 of those characters in a few decades, though.
Better idea! (Score:2, Funny)
(It was a joke... well sort of)
Huh? (Score:5, Funny)
You mean white people?
Use a simple eight dot three kludge (Score:3, Interesting)
2. Doesn't require any change to the DNS system (other than some name policy changes).
3. Allows links to be embedded in normal web pages so that they can be cut and pasted by anyone with Latin functionality. So a Japanese person could cut and paste the link to some Arabic site that they don't have the font for.
4. While this is a kludge it has some major advantages over rebuilding the DNS system.
Storm
Re: (Score:2)
Storm
DNS won't break (Score:3, Informative)
So DNS and Web is OK. Any breakage I can think of may appear in email systems or other domain-based forms of communication.
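The body of that comment is cut off, but the usual reason DNS itself survives IDNs is that each non-ASCII label is converted to an ASCII "xn--" form (Punycode) before any query is sent, so name servers only ever see LDH names. Python's built-in "idna" codec implements the older IDNA 2003 rules (RFC 3490), which is enough to show the round trip; the later IDNA 2008 rules differ for some characters.

```python
unicode_name = "bücher.example"

ascii_name = unicode_name.encode("idna")   # what actually goes into the DNS query
print(ascii_name)                          # b'xn--bcher-kva.example'

round_trip = ascii_name.decode("idna")     # what a browser can show the user
print(round_trip)                          # bücher.example

# The raw Punycode step for a single label, without the "xn--" prefix:
print("bücher".encode("punycode"))         # b'bcher-kva'
```

The conversion happens in the application (browser, mail client), which is why any breakage tends to show up there rather than in the name servers.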
Re: (Score:2)
This will change the Internet into ... (Score:2)
No matter what, the English-language net will continue to be *the* Internet, a global forum, a direct connection between common people from all parts of the world (Hey there!
All the other nets will have only marginal significance. Nations will try to boost them in order to keep their citizens indoctrinated with their own traditional values, but things that do not fly by thems
Why shouldn't this be "easy?" (Score:2)
.cn (Score:3, Interesting)
domain in their local language. Leave
Pléåsé ñø (Score:3, Funny)
Dibs on ©óm
What's this going to do for security .. (Score:4, Interesting)
"A domain name is a unique address that allows people to access a website, for example, smh.com.au"
No, a domain name is a sequence of characters mapped to an IP address. It was designed so that you can remember slashdot.org instead of 66.35.250.150. This wasn't a problem while the original Internet consisted of just four computers. DNS was never designed to provide identity. There was also the case of a stock trader hacking a DNS server and redirecting traffic from a legitimate financial site to his own, where he had duplicated the real site with bogus information.
"He said that this could create problems where, for example, a character in Urdu looks identical to one in Arabic"
It sure could. How about totally replacing DNS with a system of online identities.
Horrible indeed (Score:3, Interesting)
No wonder the Middle Eastern (Arabic-speaking) countries especially want this: the majority of inexperienced Internet users there will be more likely to use these domain names, so sites on those domains offer a greater opportunity for controlling what they see, because these domains will be under national control.
Not only this, but we as IT people will be very unwilling to change all our software to adapt to the new situation because of the horrible development/testing/implementation involved, and hence won't be accepting these domains as valid in our network traffic, which will create a second Internet that is, as described above, less free.
This should not be allowed.
Bad for phishing (Score:3, Interesting)
Re: (Score:2)
Re: (Score:2, Troll)
Were some random non-UTF8 country to make interworking with the rest of the In
Re: (Score:2)