Your 'Clickprint' Gives Away Your Identity Online

Krishna Dagli writes to mention an article at the Guardian site about an increasing interest in the possibility of identifying users by their 'clickprint', or online access habits. The article discusses a new paper on online identification written by two American professors. The piece posits that not only is nailing down individual users by their habits useful for advertisers looking to sell products, it may be possible to use this information to flag stolen identities. From the article: "'Our main finding is that even trivial features in an internet session can distinguish users,' Padmanabhan told the Wharton Review. 'People do seem to have individual browsing behaviors.' The duo found that anywhere from three to 16 sessions are needed to identify an individual's clickprint ... In one example, they found that from just seven aggregated sessions they could distinguish between two different surfers with a confidence of 86.7%. Given 51 sessions, the confidence level rose to 99.4%."

  • Shameless Weka Plug (Score:5, Interesting)

    by eldavojohn ( 898314 ) * <eldavojohn&gmail,com> on Friday September 29, 2006 @11:59AM (#16247497) Journal
    So I can already anticipate people being concerned about their identities being tracked through clicks online.

    You don't have to worry about this, however, as it is easy to distinguish two different users but probably difficult to pick you out of a crowd. Furthermore, if they're tracking your clicks, they probably already know your IP address. The number of sessions needed probably rises to a problematic number if you are trying to identify one user out of one thousand. Therefore, this will only be useful in identifying different behavior between two users -- or specifically identifying when it is highly likely that someone who is logged in is significantly different from the click profile associated with that account (as the article states).

    There's a lot of discussion about this in the paper, which mentions that the priors are set at 50% for 2 users but at 1% for 100 users (obviously), and also that:
    In an experiment involving 42 user profiles, Monrose and Rubin (1997) shows that depending on the classifier used, between 80 to 90 percent of users can be automatically recognized using features such as the latency between keystrokes and the length of time different keys are pressed.
    They go on to say that the methods they suggest for detecting a fraudulent user "do not require that users have truly unique profiles."

    I read a bit of the paper and I identified Weka's decision tree method being used to classify the users (if you've ever used the ID3 algorithm [wikipedia.org] or its brethren C4.5 [wikipedia.org] in classification, imagine exploring methods of developing different decision trees).

    Indeed the paper states:
    We chose weka's J4.8 as the classifier since classification trees in general have been shown to be highly accurate classifiers.
    I'll take this opportunity to recommend two open source projects. Torpark [torrify.com] for those of you concerned about your identity and also Weka [waikato.ac.nz] -- the easy to use collection of data mining software in Java! Also something to note is that Weka has recently become part of Pentaho [pentaho.org], a project of open source business intelligence products. Explore the valuable tools that are out there and enjoy!
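    To put rough numbers on the point about priors above, here is a minimal sketch of how a classifier of fixed quality fares as the crowd grows. It is plain Bayes' rule, not the paper's J4.8 tree, and the 90% hit rate and 10% false-alarm rate are made-up assumptions for illustration only.

```python
def posterior_is_target(n_users, hit_rate, false_alarm_rate):
    """P(session really belongs to the target user | classifier says so).

    Assumes a uniform prior of 1/n_users, as in the 50% (2 users) and
    1% (100 users) priors mentioned above. The two rates are hypothetical.
    """
    prior = 1.0 / n_users
    evidence = hit_rate * prior + false_alarm_rate * (1.0 - prior)
    return hit_rate * prior / evidence


if __name__ == "__main__":
    for n in (2, 100, 1000):
        print(n, round(posterior_is_target(n, 0.9, 0.1), 4))
```

    With these assumed rates, a "match" is convincing for two users and close to meaningless for a thousand, which is why the two-user fraud check is the more plausible application.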
    • Re: (Score:2, Interesting)

      by balsy2001 ( 941953 )
      "They go on to say that the methods they suggest for detecting a fraudulent user 'do not require that users have truly unique profiles.'" This could be problematic for two individuals who use the same account. For example, my wife and I use the same account for some financials but we have drastically different habits and patterns while using the computer.
    • by Lord_Dweomer ( 648696 ) on Friday September 29, 2006 @12:25PM (#16247921) Homepage
      Since you seem to be knowledgeable on this topic...I have a question for you.

      If they're talking about using this for identifying fraudulent users...how much would changing news/services on the internet affect that? I can think of several news items and new services that instantaneously and permanently caused me to alter my browsing and internet using habits. Wouldn't those sorts of behavior altering agents increase false positives?

      Please bear in mind I have absolutely zero background in this kind of stuff ;)

      • by eldavojohn ( 898314 ) * <eldavojohn&gmail,com> on Friday September 29, 2006 @12:29PM (#16247993) Journal
        If they're talking about using this for identifying fraudulent users...how much would changing news/services on the internet affect that? I can think of several news items and new services that instantaneously and permanently caused me to alter my browsing and internet using habits. Wouldn't those sorts of behavior altering agents increase false positives?
        To the best of my knowledge, the idea is that you wouldn't change drastically. And if you did, it might falsely accuse you of being a fraudulent user and then you merely need to straighten things out.

        The odds are low and this is a variable to be tweaked. But the assumption is that you will still visit your old sites and exhibit your usual behaviors on them. If you found, say, one new site a week, it would slowly be incorporated into your routine (if they used regression properly and allowed the model to train on your data -- old and new). But if you suddenly stopped going to your old sites and started visiting new ones, you would probably be flagged. And that's the trade-off of trying to suppress fraud.

        I should point out that there's a lot of play with the variables here and that actual implementation of this theoretical paper could be either well done or badly done.

        Excellent point, though. Sometimes these new technologies turn out to be more cumbersome than helpful and we need to watch out for that!
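        A toy sketch of the "gradual change gets absorbed, sudden change gets flagged" idea above. This is not the paper's method (they used a decision tree); the overlap score, the 0.5 threshold, the 0.2 learning rate, and all site names are assumptions picked for illustration.

```python
from collections import Counter

THRESHOLD = 0.5  # hypothetical cut-off below which a session is flagged


def session_vector(sites):
    """Turn a list of visited sites into each site's share of the visits."""
    counts = Counter(sites)
    total = sum(counts.values())
    return {site: n / total for site, n in counts.items()}


def overlap(profile, session):
    """Shared mass between profile and session: 1.0 identical, 0.0 disjoint."""
    return sum(min(profile.get(s, 0.0), share) for s, share in session.items())


def update(profile, session, rate=0.2):
    """Blend an accepted session into the profile (exponential moving average)."""
    sites = set(profile) | set(session)
    return {s: (1 - rate) * profile.get(s, 0.0) + rate * session.get(s, 0.0)
            for s in sites}


if __name__ == "__main__":
    profile = session_vector(["slashdot", "slashdot", "gmail", "news"])
    gradual = session_vector(["slashdot", "slashdot", "gmail", "newsite"])
    abrupt = session_vector(["a", "b", "c", "d"])
    print(overlap(profile, gradual) >= THRESHOLD)  # True: accepted
    print(overlap(profile, abrupt) >= THRESHOLD)   # False: flagged
    profile = update(profile, gradual)             # "newsite" now has a small share
```

        Where the threshold sits is exactly the variable to be tweaked: lower it and you miss fraud, raise it and you lock out people whose habits changed.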
        • Re: (Score:3, Insightful)

          And if you did, it might falsely accuse you of being a fraudulent user and then you merely need to straighten things out.

          Because we all know that the process of straightening things out when you've been flagged as a fraudster is always a quick and easy process that works 100% of the time.

          Thanks for answering my question though!

      • Tell me the sites you visit, I'll tell you who you are :-) Basically, if your browsing habits change based on news/services you read/consume, it's very possible that users like you will change their behaviour too. I don't know the English term for that, but it's something like the "standard deviation". So, the answer is: it should have *no* effect on false positives. False positives will always exist, but if you are the target for some product/service, and this product/service detects a change in your behav
    • It doesn't seem to me like the 'one in a thousand' users is the actual problem. That's probably just a matter of scaling the computation thrown at the problem. On the other hand, not having false positives seems like it would be particularly tricky given that while you might be able to easily "print" someone and identify their pattern for a given point in time, people are not static. The way I use the web during the week is particularly different than on the weekends, or even at night. The way I use the web
    • Where the heck are the system requirements?
    • Pentaho? (Score:3, Interesting)

      What is that, a five-sided prostitute??
      • no, that's a "pentagon-ho", otherwise generally referred to as "general". pentaho simply means 5 hookers.
    • You don't have to worry about this, however, as it is easy to distinguish two different users but probably difficult to pick you out of a crowd.

      Actually I'm quite easy to identify from my clicks. I typically use a bookmark and go to Slashdot where I log in. From there I click on Technician and check if I have any replies to my posts. Now in reality, how many people log into Slashdot and click Technician? I hope it is not very many of you...

      Maybe a lot of you log in as Technician and click Technician. May
  • How about this? (Score:3, Interesting)

    by Conspiracy_Of_Doves ( 236787 ) on Friday September 29, 2006 @12:03PM (#16247555)
    How about a program that sits in the background and randomly hits sites while you are browsing?
  • by Anonymous Coward on Friday September 29, 2006 @12:03PM (#16247571)
    Great! Finally we'll be able to distinguish between the two guys who use the Internets... most of the time.
  • I'm the guy who can read; I get the "slow down cowboy" message constantly.

    But I'm used to living among dyslexics, illiterates, and dumbasses. Sigh.
    • by eldavojohn ( 898314 ) * <eldavojohn&gmail,com> on Friday September 29, 2006 @12:06PM (#16247641) Journal
      But I'm used to living among dyslexics, illiterates, and dumbasses. Sigh.
      Go kcuf yourslef! I am not living among you! I may be dyslexic, I may be illiterates and I may be a dumbass but I am definitely not a sigh.
    • Are you the third kind? That filter is there to prevent spam, not hold you up.
    • I'm the guy who can read; I get the "slow down cowboy" message constantly.
      But apparently not the guy who can think through his posts... or take the time to post something a little more informative.

      What I hate is when I browse back to double-check what someone (like a grandparent post) said before I submit a comment... then I browse back forward, copy my text, and then have to wait a few minutes...
      • by sm62704 ( 957197 )
        But apparently not the guy who can think through his posts... or take the time to post something a little more informative.

        Informative enough to get mod points. Interesting enough for you to reply to.
  • In one example, they found that from just seven aggregated sessions they could distinguish between two different surfers with a confidence of 86.7%.

    Well, I know I'm one of the websurfers. Who's the other one?

    • Wow! So they could find me easily since I only go to /. and f**kingmachines.com. Man oh man, I guess I should go to more web sites. On the other hand, I bet it's even easier to find that Congressman Foley from Florida, who only visits the "pretty little boy" sites (which I'll bet are owned by a neocon company).
  • by Anonymous Coward
    Install AdBlock + NoScript and do not allow cookies unless you need them and you will reduce the chances of someone on the web identifying you significantly.
    • Please please please, read TFA and the paper :-)

      Directly from the paper, especially for you: "It is important to note that the research presented in this paper discuss the possibility of identifying users based on their online behavior. However this 'identification' is still anonymous, and even perfect methods will only be able to indicate that some current session belongs to the 'same user' as some previous session. These methods cannot identify users by 'name'."

      What you said makes no sense at all in regar
      • by miyako ( 632510 )
        However...
        If a website implements this secretly, then gets information about your usage while having some sort of login information with which to associate this information, then they would be able to connect future sessions to that session, which they could then connect back to a user profile.
        Although this seems to be focused on usage within a single website, it seems a reasonable extrapolation to think that someone could develop a less effective but more general algorithm that would help to identify on
      • by topham ( 32406 )


        The AOL search database that was released didn't identify the users by name either; that didn't mean they weren't identifiable.

  • by Vellmont ( 569020 ) on Friday September 29, 2006 @12:18PM (#16247821) Homepage
    I haven't read the full paper, but the article makes this sound extremely preliminary as a useful tool. It says they can distinguish between two users with 99% accuracy. That's all well and good when you only need to distinguish between two people, but what about when you need to distinguish between a million people?

    I can distinguish between a person with blond hair and a person with brown hair given only the hair color 100% of the time. But that doesn't mean hair color is a very useful tool for positively identifying people. The key is how different people's "click profiles" are. If there are only 1000 different possibilities (evenly distributed), that's not terribly good for identification. If there are 10^10 possible profiles, evenly distributed among the populace, that would certainly be useful. Also, what's the false positive rate? If you try to use this to identify fraud and you have a 1% false positive rate, you'll end up pissing off 1% of your customers. That's probably not acceptable.
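    A back-of-the-envelope sketch of both points above: how many people share your profile when profiles are evenly distributed, and what a 1% false positive rate does to a fraud check. The population size, the 0.1% fraud share, and the 99% detection rate are hypothetical figures, not numbers from the paper.

```python
def expected_lookalikes(population, n_profiles):
    """Expected number of other people with your exact profile,
    assuming profiles are evenly distributed (as in the comment above)."""
    return (population - 1) / n_profiles


def flag_breakdown(logins, fraud_share, detect_rate, false_positive_rate):
    """Return (true flags, false flags, share of flags that are real fraud)."""
    fraud = logins * fraud_share
    honest = logins - fraud
    true_flags = fraud * detect_rate
    false_flags = honest * false_positive_rate
    return true_flags, false_flags, true_flags / (true_flags + false_flags)


if __name__ == "__main__":
    # One million people spread over 1000 profiles vs. 10^10 profiles.
    print(expected_lookalikes(1_000_000, 1_000))
    print(expected_lookalikes(1_000_000, 10**10))
    # Assumed: 1M logins, 0.1% fraudulent, 99% caught, 1% false positives.
    print(flag_breakdown(1_000_000, 0.001, 0.99, 0.01))
```

    Under those assumed figures roughly ten honest customers get flagged for every real fraudster, so the 1% figure hurts more than it first sounds.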
    • It could be useful if all you wanted to do was determine if the person "logged in" was the actual person who created the account. Then you just have the two-person problem. Yes, two people out of the many people online could have a very similar profile, but what are the odds that the guy hacking your account has the same profile (not very good).
    • Read the full paper. It's very interesting. As stated in the paper, people can be distinguished by their handwriting, by their fingerprints, etc. New studies state that people can be distinguished by their mouse movements and by their keystrokes (the time between strokes on two different keys). Even their usual movements in a city can be tracked by a handheld device and used to distinguish people. This paper is just another way to distinguish people: the way people browse the internet. I'm sure I'm the o
      • > Imagine that being used by a bank.

        Some credit card issuers already do this. Consequently when you suddenly have to fly to Seattle to take care of your sick mother after years of no travel you get off the plane, try to rent a car, and find that your credit card has been frozen because buying a plane ticket is "abnormal activity".
    • I don't think the goal is an ability to map anonymous clickprints onto a domain of known users---guessing right 99% of the time (optimistically speaking) with a population of only 2 users does not seem very good for that application. However, if gmail or yahoo or whoever alerted me when my access habits suddenly changed dramatically, or prompted for identification confirmation more often when my usage patterns changed, that might be somewhat useful.
  • Defense (Score:2, Interesting)

    by Led Nudd ( 1004881 )
    How about a Firefox extension that, at random time intervals, randomly requests one of the page links? It wouldn't have to even load the page in a tab. That might introduce enough noise to cover a "clickprint." (Implementation is left as an exercise for the reader.)
    • How about a Firefox extension that, at random time intervals, randomly requests one of the page links?

      Yeah, that would be cool. The "randomly chosen page links" would include advertising, of course, so I'd be earning AdSense click revenue every time someone just visits my site, even if they never actually click on something.

      I just wonder what Google might say about that...

  • The use of tabbed browsing (specifically, the ability to bookmark a series of tabs) in Mozilla greatly increases one's identifiability, even without persistent identifiers such as cookies.

    Even if you run something in the background that submits random search queries or random spidering, the instant you open up a bookmark full of tabs, you've identified yourself.

    User 12345: the clickstream consists of completely random clicks on Flickr, del.icio.us, and Digg links, except that (at least) once a day, someo

    • Mod parent up. Add in factors such as Gmail account auto-checkers and other extensions that log in automagically, and it's a trivial exercise.
    • You can turn off Javascript so the server end doesn't pull your local IP address behind the $39 Netgear or Linksys router. A pretty dead-giveaway for environments with a 1:1 correspondence between users and machines.

      And I can pretty easily imagine some of the bad guys not showing their hand with new exploits until they start seeing big ol' ripe, leaking internal IP addresses like 172.31.1.155 or 10.10.1.180 (as opposed to 192.168.1.3 or 192.168.2.2)

  • This is similar to the SSH exploit reported here on Slashdot a few weeks back where data could be determined via statistical/timing analysis done on the packets sent during an SSH session.

    It sounds like, if these types of timing and statistical analysis attacks become common, a simple solution would be a Firefox extension that would randomize the timing of the input from the mouse and the keyboard. I suspect that randomly delaying a keystroke or a mouse click by anywhere between 0 and 100 ms would be enough to d

  • I'm sure that recognizing return anonymous users wouldn't be that important to the marketing people behind the scenes.

    Isn't this a graduate research paper by two individuals at different Business schools? Hmmmmm.
  • Pr0n (Score:1, Funny)

    by Anonymous Coward
    That's the only pattern apart from Slashdot most users here will have!
  • It would probably be possible to distinguish between users, depending on the part of the link they click. Top, bottom, left, right, edge, center. Something must be fairly common.
  • Am I the only one (Score:5, Insightful)

    by TubeSteak ( 669689 ) on Friday September 29, 2006 @12:39PM (#16248167) Journal
    Who doesn't like clicking on Tiny Urls?

    Tiny Urls just don't compute as part of my safe surfing habits.
    Example:
    Tiny Url --> my redirect --> paper
    After it hits the front page
    Tiny Url --> my redirect --> 0-day exploit

    There really is no need for them in Slashdot Submissions.

    Here's the direct link to the paper
    http://knowledge.wharton.upenn.edu/papers/1323.pdf [upenn.edu]
    • by rob1980 ( 941751 )
      Yeah, I don't like em unless somebody I know is sending me a link that is 5 miles long and may get broken up in an IM window.
    • by rthille ( 8526 )
      If you're that paranoid about tinyurl, just turn on the preview feature at tinyurl.
    • Re: (Score:2, Informative)

      by gladed ( 451363 )
      I agree. If you are concerned about this, TinyURL allows you to enable "previews" now. When enabled, clicking on a tinyurl link will direct you to a page that shows you the link, where you can decide to click or not. See http://tinyurl.com/preview.php [tinyurl.com].
    • by dyftm ( 880762 )
      You're not the only one. If you go to tinyurl.com, you can turn on the 'Preview Feature' (left menu). This shows you the full address of the page you are being redirected to before going there. However this relies on cookies.
    • by Ash-Fox ( 726320 )
      Not to mention that tinyurl.com doesn't even resolve for me. Just another point of failure, I don't like it.
  • thiss it tsht wruostt thingti everurheard of assoson ii sober up ima gonanagjoigewhtesdqwhiu yerrsmy bests frenns u nme ginst worlds
  • by cyberworm ( 710231 ) <cyberworm&gmail,com> on Friday September 29, 2006 @12:46PM (#16248301) Homepage
    Follow them to their myspace page.
  • Perhaps this will help spark more interest in anonymous [eff.org] web [freshmeat.net] browsing [noreply.org].
  • One thing I've noticed about my family's computer use (they all use XP) is the way that they launch their browser. My mom clicks the desktop icon. I like the quicklaunch button. My sister uses the recent items menu, and my dad likes to open a folder and type an address in the address bar (despite my attempts to get him to use Firefox). One possible way to make clickprinting much more effective would perhaps be to monitor the methods people use to get from one page to another. Some people like to click a butt
  • Yeah, this is all fine and good if the account is single user access.
    It would be interesting to see what a "clickprint" analysis of an account shared with bugmenot.com would look like.

    The idea of using this sort of technology as a security feature sounds absolutely horrible.
    I mean, a change in your browsing habits on a site gets you locked out? That's not Orwellian, it's just plain stupid.
  • I remember talking to a vendor 20 years ago. His company had a way of identifying people by their typing habits. Time between keys, spelling, etc. So you've added the mouse to it, and are tying it in to surfing habits.. big deal. Why did it take 20 years?

    It'll be tied to cookies, bluetooth, and that proximity chip in your head pretty soon. This isn't really news, it's the logical progression of technology. Tech works best when you know who it's aimed at, especially advertising and remote controlled guns. (S
  • In Neal Stephenson's Cryptonomicon, he introduced the idea that the operators sending the encrypted messages were distinguishable by their "hand" (the subtleties in how they transmitted their messages). Stephenson even went on to say that they used professional pianists for their adroitness in mimicking various enemy operators to avoid detection. I don't know how much of that is rooted in actual history, but it was an intriguing idea that bears a resemblance to the method these guys are using.
  • The article describes another form of clickstream analysis. However, I wonder whether user behavior couldn't also, and perhaps better, be identified by content interaction. There are a number of products that show Web page heatcharts ostensibly to identify layout problems. But there are not many products that show what a person actually did on a page. The article used sample data for a year, but I wonder how much of that data was skewed by changes in content layout and promotion. For example, I monitor
  • Thinking of an analogy: the birthday paradox [wikipedia.org]:

    Just because it is easy to distinguish between 2 users does not mean that this has much practical use:
    In most applications (without the user's consent) this is going to be used remotely (server-side), which means that it is going to be totally useless at tracking users if there are more than say fifty users (someone do the math - assuming that the users follow enough links on that same site).

    You can safely stow away the tin-foil hat^W^W browsing pattern disguise
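    Since the comment above asks someone to do the math, here is a minimal sketch of the standard birthday-paradox calculation. It assumes every user lands on one of n_profiles equally likely profiles, which is an idealization; the real number of distinguishable clickprints is not given in the article, so the profile counts below are placeholders.

```python
def collision_probability(n_users, n_profiles):
    """Chance that at least two of n_users share a profile, assuming
    n_profiles equally likely profiles (classic birthday-paradox setup)."""
    if n_users > n_profiles:
        return 1.0  # pigeonhole: more users than profiles forces a clash
    p_all_distinct = 1.0
    for i in range(n_users):
        p_all_distinct *= (n_profiles - i) / n_profiles
    return 1.0 - p_all_distinct


if __name__ == "__main__":
    # The textbook case: 23 people, 365 birthdays, just over 50%.
    print(round(collision_probability(23, 365), 3))
    # Fifty users against a few hypothetical profile counts.
    for k in (1_000, 100_000, 10**10):
        print(k, round(collision_probability(50, k), 6))
```

    Whether fifty users is already too many depends entirely on how many distinct profiles there really are, which is the number nobody in this thread knows.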
  • I see this was described as a "working paper" on the 20th. It doesn't show up anywhere as being "under review". I wonder if they've just blown their publication chances given that it is already "pre-published" at this point?

    It'll be interesting to see how this shakes out.
