Security

Spoofing URLs With Unicode (433 comments)

Embedded Geek writes: "Scientific American has an interesting article about how a pair of students at the Technion-Israel Institute of Technology registered "microsoft.com" with Verisign, using the Russian Cyrillic letters "c" and "o". Even though it is a completely different domain, the two display identically (the article uses the term "homograph"). The work was done for a paper in the Communications of the ACM (the paper itself is not online). The article characterizes attacks using this spoof as "scary, if not entirely probable," assuming that a hacker would have to first take over a page at another site. I disagree: sending out a mail message with the URL waiting to be clicked ("Bill Gates will send you ten dollars!") is just one alternate technique. While security problems with Unicode have been noted here before, this might be a new twist."
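A minimal sketch of why the two names are "completely different" even though they display identically. The spoofed string is an assumption based on the summary: it swaps the Latin "c" and the first "o" for Cyrillic U+0441 and U+043E. The helper name `describe` is made up for illustration.

```python
import unicodedata

latin = "microsoft.com"
# Same visible text, but "c" and the first "o" are Cyrillic U+0441 and U+043E.
spoof = "mi\u0441r\u043esoft.com"

def describe(name):
    """Return (character, code point, Unicode name) for each non-ASCII character."""
    return [(ch, f"U+{ord(ch):04X}", unicodedata.name(ch))
            for ch in name if ord(ch) > 127]

print(latin == spoof)   # False: the code points differ
print(describe(spoof))  # lists the two Cyrillic letters
```

Comparing code points rather than rendered glyphs is the only reliable way to tell the two apart; the eye cannot.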
This discussion has been archived. No new comments can be posted.

  • by donnacha ( 161610 ) on Monday May 27, 2002 @09:53PM (#3593136) Homepage


    So, what would be the cyrillic for Slashdot.org?

  • old trick (Score:3, Interesting)

    by krokodil ( 110356 ) on Monday May 27, 2002 @09:53PM (#3593137) Homepage
    It is widely used on russian-language IRC
    networks like RusNet. http://www.irc.net.ru/

  • by aoihai ( 518418 ) on Monday May 27, 2002 @09:56PM (#3593148)
    Anyone else remember using alt+255 and other special characters to make hard to open directories (idiot proof anyway) on shared command line systems?
    • There was something from way back when that would effectively "lock" a folder in explorer in Win95 and make it damn hard to open in DOS. I think you had to go to a prompt and tag an Alt+255 or something to the leading or trailing edge of the folder name.
    • Tabs, spaces, dots and even backspaces... *sigh* those were the days directories could still be confusing, hidden and full of illegal software.. On second thought, nothing changed really, except that now nothing on the net is hidden anymore ;-)
    • Yes! I remember making a whole tree of directories like that on my parents' 286 back when I was a kid, to keep my stuff hidden (I very conveniently ignored the fact that my dad had about 15 years of computer experience even then, because I was obviously being so clever). In order to remember where it was myself, I used the digits of pi, so the top directory had 3 Alt-255's, the next one had 1, the next 4, and so on. For a fifth grader I sure was proud of myself--my very own passworded directory, that no one could possibly guess at!

      . . .

      Sure is a good thing that computer isn't still hanging around . . .

  • by Anonymous Coward on Monday May 27, 2002 @09:57PM (#3593155)
    Should I be concerned?
  • Workaround (Score:3, Insightful)

    by neolazer ( 580790 ) on Monday May 27, 2002 @09:57PM (#3593158) Homepage
    What are InterNIC and such doing in the meantime to help prevent spoofs such as this? The legal ramifications of this are interesting. One could also post stories with false links that most people would never even realize weren't true.
    • It doesn't take that much effort to fool people.

      I took a call from a customer who got Windows with his pc and wanted a refund because he was going to use linux.... (i think he's smart)

      While we're talking about my recent install of xfree86 4.2.0 he mentions MS is taking over linux now with MS Linux [mslinux.org] and believes it's real. (i realize he's dumb)

      I spent the required time to guide him to the domain whois to show him this page isn't owned by Microsoft. He thought the quotes on the page are real....

      sigh, anyway it's not hard to fool people
  • by Anonymous Coward on Monday May 27, 2002 @10:05PM (#3593184)
    people seem to be missing the point in this thread. Here is why this is very important.

    When you pay money, say with paypal.com, you always want to check the URL. Of course someone could have a fake link like: "click here to pay with paypal" and then redirect you to their bogus site with the intention of stealing your passwords. But it would be fairly obvious from the location bar in the browser that the URL was not paypal.com. But if unicode can be used to spoof the location bar then it will rope in even cautious users.

    • by JesterOne ( 214933 ) on Monday May 27, 2002 @11:15PM (#3593382)
      Even better... I seem to recall a scam that did just that with paypal. They sent out bulk mail about updating your account or something but the link was not paypa(lower case 'L').com but paypa(Capital 'I').com and had made a carbon-copy of paypal's website, hoping you would log in. The address in the location bar looks identical for both. This sounds like the same kind of thing but using Unicode to make the spoof.
    • I wish browsers showed IPs as well as URLs in
      the location box. Can I get Mozilla to do this?
    • But if unicode can be used to spoof the location bar then it will rope in even cautious users

      Ewww.....

      I wonder if unicode is even supposed to be allowed in domain names. If not, maybe this will prompt Microsoft, Mozilla, and the like to error when the host/domain contains unicode.

      Until then, I guess I'll retype the host/domain of any site I intend to log into before doing so. What a pain.

      If unicode is valid, heaven help us. With registrars charging as little as $8 for a domain, you know they aren't checking these things.

      You're my hero, AC.
  • by SwellJoe ( 100612 ) on Monday May 27, 2002 @10:05PM (#3593185) Homepage
    I recently received an email from a confused user who had received an email that appeared to be from Apple, and was selling Apple products using Apple logos, Apple website concepts and images, etc., but was not from Apple. He didn't sign up for the list, and though it appeared to be a legitimate Apple affiliate as far as I could tell (though perhaps one that used somewhat shaky methods to reach customers), he was confused why Apple was sending him email that he didn't ask for. It was his belief that the mail had actually come from Apple, because it looked like it was from Apple.

    Non-nerds have proven to be extremely difficult to educate on the concept that "what email claims to be is not always what email is, and where it claims to come from is not always where it really came from". During the recent Klez outbreak, I even received a message from a nerd-friend saying that he thought my machine might be infected, because he received an infected message from "me". Of course it was spoofed, because I happen to be in a lot of people's address books, but since I haven't used Windows on the desktop in over three years, it clearly didn't actually originate with my box.

    Folks are just kinda thick about questioning the veracity of claims (hell, astrology still sells books and 900-number phone calls). And this could definitely be used for nasty purposes...and certainly will. Spammers will have a field day with this, because they can't help but seem 'fly by night' because they cannot establish a real brand name due to the disgusting nature of their business. If they stand still, they'll get lynched. But if they can, even for a short time, hijack a real name that people trust, and offer up a too-good-to-be-true scam under that trusted name...well, you see where I'm going with this.

    Of course, everyone here knows that unsolicited "business offers" by email are always scams run by filthy people...but my grandmother doesn't know it, nor do my parents or many of my non-nerd friends for that matter.

    Just a thought. We'll see how it plays out, I reckon...

    • Re: (Score:2, Flamebait)

      Comment removed based on user account deletion
      • No offense intended, but if your associates aren't smart enough to distinguish between a scam and a legitimate e-mail, then you need to let them get burnt a few times. Either that or get them off the Internet.


        Yep, you're right. Let's make all the grandmothers stay in their rocking chairs where they belong. The internet is for young, savvy nerds. Knitting is for old people.

        Seriously, I understand your perspective, and it isn't as though I'm suggesting legislation or something stupid like that (I'm anti-government on all issues)...I'm just saying I think people will get scammed using this method. And I think it may be damaging to legitimate companies as well. This is unfortunate on two counts...it is bad for my grandmother, and yours, and it is bad for honest businesses who would never use spam marketing or pull some kind of bait-and-switch, or just plain ol' scam.

        That's all...I don't have solutions. I'm just griping about the problem. Isn't that what slashdot is for, hand-wringing and griping?

      • if your associates aren't smart enough to distinguish between a scam and a legitimate e-mail, then you need to let them get burnt a few times.
        This is family we're talking about, not "associates"... you let family get burnt and you're getting fruitcake for Christmas... for life.
        Either that or get them off the Internet.
        Ah, but then you couldn't get the pictures of the cousin's sister's kids emailed every time they get an award at school. Or the forward of the forward of the quoted forward of the latest monster joke to wander the 'net.
  • by saveth ( 416302 ) <cww@deWELTYnterprises.org minus author> on Monday May 27, 2002 @10:10PM (#3593200)
    I develop applications for a DSP company [signalogic.com], and we've recently switched to using Unicode in our products. Unicode certainly has its quirks, and this is one of the more obvious ones. I fail to see why it has been implemented so widely, without very, very rigorous testing.

    Actions like the one described in this article could bring down a company, if a person tried hard enough. Of course, Microsoft could just call Verisign and ask them to remove the Cyrillic domain, with no problems. But, for a small company, it could be hell. An entire user group using the same character set to access a certain website would be sent to a different site. In a worst case scenario, anti-company propaganda might be posted on the spoofing site, and it would deter people from visiting the "real" site in the future.

    The only solution I can imagine is to simply prevent the translation of characters among character sets, especially in this sort of environment.

    A Russian site, such as The Moscow Times [themoscowtimes.com], could have its site spoofed in exactly the same manner, and everyone using the Cyrillic character set (obviously, widely used in Russia, for example) would be sent to some other site, possibly indefinitely, knowing how registrars have been acting lately. This would create havoc for the newspaper and significantly hurt revenue.
    • "we've recently switched to using Unicode in our products."

      ...why?
      • "we've recently switched to using Unicode in our products."

        ...why?


        It started out with a weirdness in Windows 2000 we had to work around, and it involved using the Win32 API TCHAR data type, so that it could compile on both Unicode-enabled systems and ANSI character systems.

        To make a long story short, we were forced to enable Unicode in one of our products; then, we thought it a good idea to have all our products capable of internationalised data.

        Yeah. That. :P
    • Unicode certainly has its quirks, and this is one of the more obvious ones. I fail to see why it has been implemented so widely, without very, very rigorous testing.

      It was tested; this is considered acceptable, as there are no workarounds.

      There will be lookalikes in Unicode, just like there are in ASCII. Prior character sets, including KOI8-R, ISO-8859-5, ISO-8859-7, and JIS X0213 - pretty much every character set with either Cyrillic or Greek in it - have the Cyrillic or the Greek A separate from the Latin A. Besides backward compatibility, proper multilingualization calls for them to be kept separate; what's the lowercase A look like, if the Greek and Latin A are merged?
  • by Sanity ( 1431 ) on Monday May 27, 2002 @10:17PM (#3593217) Homepage Journal
    Amazing how many comments betray the fact that people haven't read the article.

    At the moment these unicode domain names will not be displayed correctly by web browsers; rather, you will see a bunch of confusing control codes, so this threat isn't really a problem yet.

    Of course, the underlying problem is that DNS is an ugly kludge which has long-outgrown itself. The administrative cost of constructing a massive global namespace is vast, and we can all see the opportunities for cyber-squatting it creates, to the detriment of the public interest.

    These days I am more likely to go to Google and type in a few words, rather than try to guess the URL. The task of finding the website you are interested in should be left to the specialists (like Google and other search engines), we shouldn't try to maintain an ugly, broken, monopolistic, and expensive "first come first serve" architecture like DNS.

    There is no good reason why a web user should ever need to see a URL (except perhaps momentum), any more than they need to see the HTML which makes up a document.

    • Of course, the underlying problem is that DNS is an ugly kludge
      Will IPv6 use DNS or something different?

      Obviously with IPv7 we'll just have to ask lain to send us to the right site.

      • Will IPv6 use DNS or something different?
        IPv6 won't use DNS any more than IPv4 uses DNS. In other words, neither IPv4 nor IPv6 "uses" DNS at all. DNS is just a single mechanism for resolving hostnames to IP addresses, and vice-versa. I think what you may have meant to ask was if DNS will be used to resolve IPv6 addresses/hosts, and the answer is, at least on the Internet, yes. The RFCs for DNS have included IPv6 record types (type AAAA) for a long time, and most DNS servers support them. However, anyone is still free to use DNS, NIS/NIS+ or even /etc/hosts (or any other name-resolution service you can think of) on their own networks. Just don't expect the world to be able to see it.
    • There are potentially two uses for domain names:
      1. Guessing a company's domain. Eg, I want to see if there are any recalls on my Pontiac Grand Am, so I type in pontiac.com. As you've noted, this usage of domain names is pointless - instead, I'll go to google and type in "pontiac grand am recalls."
      2. Recalling a domain name. I go to a cyber café and want to check slashdot. I type in slashdot.org into the browser and I read slashdot.

      Now, I agree that use (1) is dead. However, I don't want to have to remember 64.28.67.150 to read slashdot, nor do I want to be dependent on google to find slashdot. Think of the pontiac example, where I'm looking for a specific page: google rankings change, but domain names change less often. If google decides they don't like the American Communist Party, I may have a hard time finding their website without DNS, whereas google does not control the cpusa.org domain name.

      There are also other, less obvious, uses for DNS. For example, I can type in ftp11.freebsd.org and see if that's faster than ftp6.freebsd.org, without having to search for the FreeBSD mirrors page. You can also publish spammer's IP addresses to DNS tables, like what RBL does. That means when I write my MTA, I don't need a full HTTP engine in it along with an XML/SGML/HTML/WHATEVERML parser, but I can just do a simple "gethostbyname()" and see if that returns an error. There are lots of other creative abuses for DNS.

      Anyway, I think there's still a real need for DNS; however, DNS administration leads to so many politics...this article mentions a technical problem, but the real problems are social/political. These problems are much harder to solve.
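The blacklist lookup described in the comment above can be sketched roughly as follows. This assumes the common DNS blacklist convention of reversing the IPv4 octets and appending the list's zone; the zone `dnsbl.example` and both function names are placeholders, not a real service or API.

```python
import socket

def dnsbl_query_name(ipv4, zone):
    """Build the hostname to resolve: reversed octets followed by the list's zone."""
    octets = ipv4.split(".")
    if len(octets) != 4 or not all(o.isdigit() and int(o) <= 255 for o in octets):
        raise ValueError(f"not a dotted-quad IPv4 address: {ipv4!r}")
    return ".".join(reversed(octets)) + "." + zone

def is_listed(ipv4, zone):
    """True if the query name resolves (address is listed), False if lookup fails."""
    try:
        socket.gethostbyname(dnsbl_query_name(ipv4, zone))
        return True
    except socket.gaierror:
        # Any resolution failure is treated as "not listed" in this sketch.
        return False

print(dnsbl_query_name("192.0.2.99", "dnsbl.example"))
```

The appeal is the one the comment names: the mail server needs nothing beyond an ordinary resolver call.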

  • From the article:

    But are international domain names even necessary? Kuhn, who is German, doesn't think so: "Familiarity with the ASCII repertoire and basic proficiency in entering these ASCII characters on any keyboard are the very first steps in computer literacy worldwide."

    That's like saying basic numeracy is the first step for computer literacy worldwide, so we should go back to using IP addresses!

    Currently email addresses and URLs are the only reason a native Chinese speaker needs to use ASCII. For someone from Germany, ASCII is pretty easy to handle, but for a lot of languages, Unicode URLs & email addresses are very necessary ...
  • IDNC3 (Score:5, Informative)

    by Russ Nelson ( 33911 ) <slashdot@russnelson.com> on Monday May 27, 2002 @10:48PM (#3593310) Homepage
    Dan Bernstein has a proposal for internationalized domain names which solves this problem and many other problems. It's called IDNC3 [cr.yp.to]. IDN stands for ``internationalized domain name.'' C3 stands for ``clean, careful, conservative.''
  • by wadetemp ( 217315 ) on Monday May 27, 2002 @10:52PM (#3593321)
    1) Some people are not good at spelling, and wouldn't know microsoft.com from microssoft.com, especially if it's just seen in a few quick glances.

    2) There are more TLDs out now, and the same name at a .biz or .info TLD does not mean it is the same company... but no doubt a lot of people think that's true.

    3) There's always the old trick of the numeral "1" swapped for the lowercase "L" or the uppercase "I", among other similar things that never involved Unicode, but rather human vision and high resolutions.

    4) The "@" symbol in the URL trick, like http:\\microsoft.com\moneyfrombil@haxor.com?action=allyourmoneyarebelongtous

    So if you haven't figured out my point yet, a good percentage of people that use the internet are going to be fooled by far simpler feats of social engineering. Who needs Unicode to do it?
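For trick (4), a rough illustration of how a standard URL parser reads such a link. The URL below is a made-up variant in the conventional `user@host` form, with `haxor.example` as a placeholder host; `real_host` is a hypothetical helper name.

```python
from urllib.parse import urlsplit

def real_host(url):
    """Return the host that would actually be contacted for this URL."""
    return urlsplit(url).hostname

# Everything before "@" in the authority part is user info, not the host.
url = "http://www.microsoft.com@haxor.example/money?from=bill"

print(real_host(url))          # haxor.example
print(urlsplit(url).username)  # www.microsoft.com
```

The familiar name is only a username handed to the real host, which is why the trick needs no Unicode at all.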
  • If you buy something online without using a credit card, you deserve to get scammed.

    If you buy something with a credit card, not only will you get your money back (actually never lose it in the first place), but the scammers will likely go to jail.

    Besides, why are you clicking on links in your spam anyway?

  • My friend told me that a few years ago he was looking for a domain name to register. After some poking around he discovered that microsoft.net was up for grabs. He then proceeded to go to his dad to ask for the $10-$15 (don't remember the exact amount) he needed to register the domain, needless to say his dad refused!!
  • Ok, first take microsoft.com (alternate spelling), name your mail gateways identical to microsoft's, and then send out emails (as balmer@microsoft.com?) to a lot of MS employees, telling them to remove IE from XP ..

    From there on, it only gets better and better. Think of the countries you would be able to influence, technology development you could steer, and leaked memos you could fabricate..

    Damn i wish i had thought of it ;-)
    • Just because it's a technical no-brainer doesn't mean it's legal, and doesn't mean it even treads on laws that have anything to do with the internet.

      If you pretend to be someone else, or if someone registered an alternate lookalike domain for microsoft.com and used it in any way whatsoever to benefit from the fact.. they'd be in deep sheep.
  • by ukryule ( 186826 ) <slashdot@yule . o rg> on Monday May 27, 2002 @11:41PM (#3593447) Homepage
    One way to control this would be to restrict the valid characters based on the TLD.

    So for example '.uk'/'.au'/'.us' etc. can ONLY have ASCII 2nd level domains. '.de' Can only have German characters, '.fr' only French, and so on ...

    Then for completely different character sets, you have new Unicode TLDs (Arabic, Greek, Chinese), which can only have their relevant characters.

    I guess you leave .com/.org/.net as ASCII; although they are meant to be global, they are based on the Latin character set.

    Of course, this adds complexity - but you can do all the testing for validity when the domain is registered (i.e. a web client can request any URL, but dodgy mixed character set domain names cannot be registered).
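The registration-time check proposed above might look something like this. It is a sketch under loud assumptions: the policy table is invented, and "script" is guessed from the first word of each character's Unicode name, which is cruder than a real script property and lets accented Latin letters through where the proposal asks for strict ASCII.

```python
import unicodedata

# Hypothetical policy table: TLD -> the one script allowed in names under it.
ALLOWED_SCRIPT = {"uk": "LATIN", "com": "LATIN", "ru": "CYRILLIC", "gr": "GREEK"}

def script_of(ch):
    """Rough script guess: the first word of the Unicode character name."""
    return unicodedata.name(ch).split()[0]

def may_register(domain):
    """Accept 'label.tld' only if every letter in the label matches the TLD's script."""
    label, _, tld = domain.rpartition(".")
    allowed = ALLOWED_SCRIPT.get(tld)
    if allowed is None or not label:
        return False
    # Digits, hyphens and dots are ignored; only letters are checked.
    return all(script_of(ch) == allowed for ch in label if ch.isalpha())

print(may_register("microsoft.com"))            # True
print(may_register("mi\u0441r\u043esoft.com"))  # False: Cyrillic letters under .com
```

Because the test runs once at the registrar, clients need no changes, which matches the point made in the comment.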
  • The same risks exist today with ASCII domain names: transposed letters "1lI", "O0", playing tricks with "@" and most user agents.

    You just must not take anything for granted which you see or read on the web.
  • Certification agencies (which include VeriSign) ensure that encoded names are not misleading and that the registration corresponds with the correct real-world entity.

    Yeah, that's why a couple of Israeli college students were unable to register mirsoft.com (spelled "mi&#1089;r&#1086;soft")...oh wait a minute, what were they saying again?

  • by Corgha ( 60478 ) on Tuesday May 28, 2002 @12:56AM (#3593614)
    Verisign never ceases to amaze me. The first sentence on their website [verisign.com] is:
    VeriSign, Inc. (Nasdaq:VRSN) is the leading provider of digital trust services that enable businesses and consumers to engage in commerce and communications with confidence.

    ... so it seems safe to say that trust is the foundation of their business. Essentially, we trust Verisign to ensure that we're communicating with whom we think we're communicating, and to protect us from various forms of spoofing. They should therefore, IMHO, actively avoid even the appearance of impropriety.

    However, we all remember [slashdot.org] the Microsoft certificates they mistakenly gave out to a third party.

    Now we've got them registering another domain to someone that looks just like "microsoft.com." While it's tempting to absolve Verisign of guilt in this, I think they were asking for it. After all, even I thought of this possibility when I first heard about Unicode domain names, and I'm not the sharpest knife in the drawer. You've got to think someone at Verisign raised the possibility, but they chose not to deal with it.

    Again, one might be tempted to say that this isn't their problem, if not for the fact that they are in the trust business. As the article says, "Certification agencies (which include VeriSign) ensure that encoded names are not misleading and that the registration corresponds with the correct real-world entity." It should not be technically difficult, for instance, to build a set of lists of visually similar Unicode characters and to refuse to register domains visually identical to existing ones. Maybe they should decide to forgo a relatively small amount of revenue and to refuse to sully their reputation with such inevitably deceptive domain registrations, especially considering that they interfere with Verisign's core business.

    Of course, none of this compares to the letters they sent out [slashdot.org] trying to fool people into switching their domains over to Verisign. The other two were negligence and foolishness, but that was an active attempt to deceive from a company that's selling trust.

    It all leaves me in a bit of shock. It's not that I'm shocked to see a company doing stupid and deceitful things; it's that trust is Verisign's primary asset. Hearing about these (colossally, in my mind) stupid decisions is like hearing that GM decided to torch all its manufacturing plants and assassinate all its employees. It leaves me with two questions: "what the hell are they thinking?" and "why does anyone continue to do business with Verisign?"
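The "lists of visually similar Unicode characters" idea from this comment can be sketched as below. The table is a tiny hand-picked sample for illustration, not a complete or authoritative list, and the function names `skeleton` and `collides` are made up here.

```python
# A few Cyrillic letters that render like Latin ones (illustrative sample only).
LOOKALIKES = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
    "\u0445": "x",  # CYRILLIC SMALL LETTER HA
}

def skeleton(name):
    """Map each lookalike to its Latin twin so visually equal names compare equal."""
    return "".join(LOOKALIKES.get(ch, ch) for ch in name.lower())

def collides(candidate, registered):
    """Return the existing name the candidate would be mistaken for, or None."""
    sk = skeleton(candidate)
    for existing in registered:
        if existing != candidate and skeleton(existing) == sk:
            return existing
    return None

print(collides("mi\u0441r\u043esoft.com", {"microsoft.com"}))  # microsoft.com
```

A registrar would refuse any candidate for which `collides` returns a name; the hard part is maintaining the table, not the lookup.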
    • Verisign's activities as a domain registrar are NOT the same thing as their CA business.

      They are not required to, nor do they claim to, verify domain registrants UNLESS those registrants apply for digital certificates.

      Yes, verisign are scum.. but you are barking up the wrong tree here. They are not at all required or expected to verify domain registrants.

      Hey. I wish they were. Imagine how many domains would have to be revoked? Literally millions.

  • We have a http filter here that protects against things such as www.sex.com (being PC, it also stops theOnion and fuckedcompany). They even filter out the fish, in case you use it as a proxy to get non-PC pages.

    Unfortunately, it doesn't protect against 'cekc' (I can't be bothered to type this in Cyrillic here).

  • This issue was also discussed in my book Secure Programming for Linux and Unix HOWTO [dwheeler.com]. Look at the section on semantic attacks [dwheeler.com].
  • Paper Online (Score:5, Informative)

    by AstroMage ( 566990 ) on Tuesday May 28, 2002 @02:20AM (#3593772)
    In spite of what the heading says, the original paper is online - you can find it on Evgeniy Gabrilovich's homepage [technion.ac.il].

    That is, if you are interested in the dry, technical details... ;-)

  • Unless they run their own servers, hosting is gonna be a little hard to get. I run a web hosting company. When a user signs up for hosting they are immediately ushered to the credit card processor, then after that it asks them what password they wish to use on the system. After that the domain name, password, and other stuff are stuck into a database and an email is fired off to me to let me know someone signed up, containing the url of the page that will give me the details. Anyway, I open up an ssh session to the server and start setting it up. When I enter the domain name into the httpd.conf I am not typing in cyrillic. I simply fire up vi, and type the domain name in there using regular latin characters. Same when I set up the DNS zone files, email, and other such stuff. Sure they can get the domain name there, but actually getting the page to show up is another matter altogether. I believe even russian ISPs would assume the letters were latin characters and not their cyrillic counterparts if they are used to spell english words (as in known company names to be used in some sort of scam)
  • I'm trying not to sound like a lingual elite-ist by any means, but can anyone really say that we shouldn't standardize on English/ASCII? Just about every country where English is not the native language, English is taught to their school children from early on.

    The internet has shrunk the barrier to exchange information, which has made diverse languages even more significant of a barrier. If we use UNICODE and just accept that everyone wants to use their own language, then the internet will end up as a group of national islands of information. Each group will surf their set of native language web sites. When you search the web, the information on that Nokia phone might not be readable by you (Babelfish isn't a solution).

    Language has always been a barrier, and I hope the internet will be the tool by which that barrier is torn down; not the tool which escalates the problem.
    • by dvdeug ( 5033 ) <dvdeug@@@email...ro> on Tuesday May 28, 2002 @04:35AM (#3593939)
      I'm trying not to sound like a lingual elite-ist by any means, but can anyone really say that we shouldn't standardize on English/ASCII?

      The 5 billion people in the world who don't have English as their native language might. Some would argue that language is a cornerstone of culture, and that when a society loses their language, they lose a significant part of their culture. I've read parts of Shakespeare in German, and was very unhappy about the destruction of the writing. I know several poets of my native tongue (Poe, in particular) would be lost completely in translation. I have no interest in condemning other people to reading the great literature of their cultures in translation.

      In any case, ASCII isn't good enough for English writing. French accents are used in English writing, as well as the ae and oe ligatures. Even in modern writing, proper quotes and apostrophes are needed, and footnote daggers often show up in English writing. For specialized work, mathematics, linguistics (even of English), historical English writing and APL all have their own body of characters outside ASCII that need to be supported.
    • I'm trying not to sound like a lingual elite-ist by any means, but can anyone really say that we shouldn't standardize on English/ASCII?

      Yes. It's ridiculous to ask people to learn (admittedly a small part of) a new language to use a computer. Just because English is taught in a lot (not all) of schools around the world, it doesn't mean that everyone is comfortable using it. A truly usable computer should be one which allows you to interact with it 100% in your own language.

      The internet has shrunk the barrier to exchange information, which has made diverse languages even more significant of a barrier.

      The main barrier to computer usage in a large part of the world is that it is still an elitist medium - only useable (and affordable) by the well-educated. If you are actually interested in making it easier for everyone to communicate, then the main technical issue to be solved is how to make the internet useable by anyone from any background.

      If we use UNICODE and just let accept that everyone wants to use their own language, then the internet will end up as a group of national islands of information. Each group will surf their set of native language web sites.

      This already happens. Of course people surf websites in their own language! Because you (and I) only surf the English-speaking fraction of the web, you don't see it. All that international domain names adds is that a Russian accessing a Russian website can do so via a Russian URL. What could be more sensible or obvious than that?

      If no standard is agreed upon, proprietary standards will pop up all over the place, and it'll be a huge mess. In fact this is already happening - although he's the current anti-Christ of Slashdot [slashdot.org], the big selling point of RealNames was for non-English languages, and if you believe Keith Teare's [teare.com] account, he was shafted by Microsoft because they wanted to control (via their browser) the translation of non-ASCII names to ASCII URLs.
