ICANN Under Pressure Over Non-Latin Characters 471
RidcullyTheBrown writes "A story from the Sydney Morning Herald reports that ICANN is under pressure to introduce non-Latin characters into DNS names sooner rather than later. The effort is being spearheaded by nations in the Middle East and Asia. Currently there are only 37 characters usable in DNS entries, out of an estimated 50,000 that would be usable if ICANN changed naming restrictions. Given that some BIND implementations still barf on an underscore, is this really premature?" From the article: "Plans to fast-track the introduction of non-English characters in website domain names could 'break the whole internet', warns ICANN chief executive Paul Twomey ... Twomey refuses to rush the process, and is currently conducting 'laboratory testing' to ensure that nothing can go wrong. 'The internet is like a fifteen-storey building, and with international domain names what we're trying to do is change the bricks in the basement,' he said. 'If we change the bricks there's all these layers of code above the DNS ... we have to make sure that if we change the system, the rest is all going to work.'" Given that some societies have used non-Latin characters for thousands of years, is this a bit late in coming?
Re:Watch out for attacks (Score:5, Informative)
DNS won't break (Score:3, Informative)
So DNS and the Web are OK. Any breakage I can think of would appear in email systems or other domain-based forms of communication.
Um... why? (Score:3, Informative)
Why... No really. You speak as if this is a good thing. Why should they be able to use their natural language rather than English? Why shouldn't they be restricted to a limited area of local language speaking people?
The reason the Internet is useful is that everyone speaks TCP/IP. Incompatible protocols are to be actively discouraged because they balkanise the network. Language is exactly the same: the Internet is useful because everyone speaks English, and the more divided it becomes, the less useful it becomes.
Languages are anachronisms; the only reason we have more than one is that physical distance and the difficulty of travel allowed them to evolve independently. That isn't the world we live in any more, and the different languages now make communication far more difficult. They're no longer beneficial. So get rid of them and insist on a common language. The most popular happens to be English at the moment. I could live with Spanish, but for those of you about to suggest Chinese, read this before deciding: http://www.pinyin.info/readings/texts/moser.html [pinyin.info]
We should be using this opportunity to actively get rid of languages.
you couldn't be more wrong (Score:5, Informative)
i18n on Windows is far from "solved".
I do admit that MS had a huge benefit when they started pushing Unicode.
(It takes a company with Microsoft's level of clout to push around national governments.)
And the ASCII problem isn't just bad because it forces people to use "inefficient" encodings like UTF-8 (THREE bytes per character?).
Perhaps you don't realize that UTF-8 is well on its way to becoming the dominant character encoding,
and legacy cruft such as UTF-16 (a stopgap extension of the UCS-2 that Windows standardised on) is being phased out.
Even languages that would end up as mostly three-byte characters tend to benefit from the savings on single-byte
characters for control and formatting markup.
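To make the byte-count trade-off concrete, here is a quick sketch (in Python, purely illustrative) of how many bytes UTF-8 actually spends per character:

```python
# UTF-8 is variable-width: ASCII stays at one byte, accented Latin
# letters take two, and most CJK ideographs take three.
samples = {
    "a": 1,   # plain ASCII letter
    "é": 2,   # LATIN SMALL LETTER E WITH ACUTE (U+00E9)
    "中": 3,  # CJK ideograph (U+4E2D)
}
for ch, expected in samples.items():
    assert len(ch.encode("utf-8")) == expected
```

So documents that mix CJK text with ASCII markup, control characters, and whitespace pay three bytes only for the CJK characters themselves.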
I'm not going to harp on about it, but a few basic web searches could enlighten you here.
if (string[index] == '.' || string[index] == '?' || string[index] == '!') sentenceEnd = true;
Code like that *works* in UTF-8, which is one of the things that makes it beautiful (among many others).
It allows you to deal with world character sets when it matters, and to ignore them when it does not.
(For example, a lexical analyzer that specifies its tokens does not want to support punctuation from every language ever conceived.)
And if you think code like that doesn't exist in the Windows world, you are sadly quite naive.
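The property being relied on here is that in UTF-8 every byte of a multi-byte sequence has its high bit set, so a byte in the ASCII range can only ever mean that ASCII character. A byte-level scan for ASCII punctuation therefore can never fire in the middle of, say, a kanji. A small sketch of the same idea (Python used for brevity; the C snippet above rests on the same guarantee):

```python
def sentence_ends(utf8_bytes: bytes):
    # ASCII bytes never occur inside a UTF-8 multi-byte sequence,
    # so matching raw bytes against '.', '?' and '!' is safe.
    return [i for i, b in enumerate(utf8_bytes) if b in b".?!"]

# The ideographic full stop '。' (U+3002) is three bytes, none of
# which falls in the ASCII punctuation range, so it never matches.
assert sentence_ends("。".encode("utf-8")) == []
assert sentence_ends("Hi! Ok?".encode("utf-8")) == [2, 6]
```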
In my experience internationalizing applications, it's typically far easier to update Unix applications, which
on occasion need nearly no changes at all, compared to the laborious grind and near-total rewrite often needed
for MS Windows applications.
Re:Changing a system (Score:5, Informative)
FUD ... just implement IDN everywhere (Score:2, Informative)
IDN is backwards compatible with existing DNS-servers, and has been in use for several years. Mozilla, Firefox, Safari and Opera support it. So does Internet Exploder 7.
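The backwards compatibility works because an internationalised label is converted to an ASCII "xn--" form (Punycode, per the IDNA spec) before any DNS query is made, so existing servers never see a non-ASCII byte. Python's built-in idna codec (which implements IDNA 2003) shows the round trip:

```python
# The browser encodes the Unicode name to its ASCII-compatible
# "xn--" form before querying DNS; old servers just see ASCII.
ascii_form = "café.com".encode("idna")
assert ascii_form == b"xn--caf-dma.com"

# Resolvers and browsers decode it back for display to the user.
assert ascii_form.decode("idna") == "café.com"
```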
Re:Changing a system (Score:2, Informative)
Re:Changing a system (Score:3, Informative)
In Argentina some people have keyboards with a Spanish-language layout (that is, with extra letters), and some learn the ALT codes and use the ALT key (along with the code typed on the numpad) to place accents and the letters Ñ and ñ (which are mandatory as well and can't be replaced by N or n... especially when Año means "year" and Ano means "anus").
I know many people who know how to type accents and are just lazy... but I consider that a sign of poor spelling as well, since the best spellers I know use all accents and feel a twinge of pain every time they find an omission (which often changes the meaning of the word, makes fluent reading more difficult, and is just plain ugly).
Re:Why not? (Score:3, Informative)
Let's take Japanese as an example, and I will give you two reasons why it won't work.
Perhaps if you assume I am Japanese, you will assume that my "default unicode section" is the section containing the Japanese characters. So this works fine if I go to URLs that use hiragana / katakana / kanji, but what if I go to www.google.com? Or www.washingtonpost.com? Or www.citibank.com? (Yes, there are Citi offices in Japan). Are you going to throw up a phishing warning simply because I'm browsing an international site? Because if you do that, you're going to make people so used to seeing those warnings that they will just ignore them and/or turn them off.
Even if your method did work, however, it would still be easy to get around. The basic Latin characters are repeated several times in Unicode, and it just so happens that in the full-width forms (in the CJK sections) they appear yet again. That is, I can use the letters a-z while still staying within the Japanese section of Unicode; although these letters look identical, they are different characters in the Unicode charset, so you could easily have www.google.com registered once entirely in the basic Latin range and once entirely in the full-width forms section, with no visible discrepancy whatsoever.
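This full-width trick is easy to demonstrate, and so is the usual countermeasure: Unicode compatibility normalisation (NFKC) folds the full-width letters back onto their ASCII originals, so a registry or browser that normalises before comparing sees the two spellings as one. A minimal sketch in Python (the domain strings are just illustrative):

```python
import unicodedata

# U+FF47 FULLWIDTH LATIN SMALL LETTER G and friends look like
# ordinary letters but are distinct code points, so a naive
# comparison treats the two spellings as different names.
fullwidth = "ｇｏｏｇｌｅ"  # full-width forms from the CJK compatibility block
assert fullwidth != "google"

# NFKC normalisation maps full-width compatibility characters
# back to their ASCII equivalents, collapsing the spellings.
assert unicodedata.normalize("NFKC", fullwidth) == "google"
```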
The problem is a lot more complicated than you make it out to be.
Re:Can't trust your browser's address bar anymore. (Score:3, Informative)
You're assuming that characters belong exclusively to one language. Try telling a French guy that he can't register café.com because 'c' 'a' and 'f' are English, not French.
Re:Changing a system (Score:4, Informative)
If your keyboard has a Compose key then you can often build a glyph from two simpler ones. For example, for an o with an umlaut: Compose, " , o -> ö (though I expect Slashdot will filter that character out).
Macintosh users have an Option key that they can use to make unusual glyphs (option-5 for the infinity symbol, option-g for the copyright symbol, etc.). On most operating systems, various other combinations of the Ctrl/Shift/Meta/Alt/AltGr modifier keys and regular keys will let you type more glyphs. Most desktop environments also have an on-screen keyboard program that eases experimentation in this area.
Users of complex (e.g, Asian) scripts have a host of input methods to choose from and configure.
Finally, if all else fails, create a text file full of your favourite non-ASCII characters and resort to the tried and tested method of copying and pasting!
Re:Changing a system (Score:2, Informative)