Your 'Clickprint' Gives Away Your Identity Online
Krishna Dagli writes to mention an article at the Guardian site about an increasing interest in the possibility of identifying users by their 'clickprint', or online access habits. The article discusses a new paper on online identification written by two American professors. The piece posits that not only is nailing down individual users by their habits useful for advertisers looking to sell products, it may be possible to use this information to flag stolen identities. From the article: "'Our main finding is that even trivial features in an internet session can distinguish users,' Padmanabhan told the Wharton Review. 'People do seem to have individual browsing behaviors.' The duo found that anywhere from three to 16 sessions are needed to identify an individual's clickprint ... In one example, they found that from just seven aggregated sessions they could distinguish between two different surfers with a confidence of 86.7%. Given 51 sessions, the confidence level rose to 99.4%."
Shameless Weka Plug (Score:5, Interesting)
You don't have to worry about this, however, as it is easy to distinguish two different users but probably difficult to pick you out of a crowd. Furthermore, if they're tracking your clicks, they probably already know your IP address. The number of sessions needed probably rises to a problematic number if you are trying to identify one user out of one thousand. Therefore, this will only be useful in identifying different behavior between two users -- or specifically identifying when it is highly likely that someone who is logged in behaves significantly differently from the click profile associated with that account (as the article states).
There's a lot of discussion about this in the paper, which mentions that the priors are set at 50% for 2 users but at 1% for 100 users (obviously). And also that: They go on to say that the method they suggest for detecting a fraudulent user "do not require that users have truly unique profiles."
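To see why those priors matter, here is a rough sketch (not the paper's method). It assumes a uniform prior of 1/N over N candidate users and a hypothetical likelihood ratio: the observed sessions are taken to be `likelihood_ratio` times more likely under the true user than under any other single user. The function name and the ratio of 20 are made up for illustration.

```python
def posterior_true_user(n_users: int, likelihood_ratio: float) -> float:
    """Posterior probability of the true user under a uniform 1/n prior.

    Assumes the evidence is `likelihood_ratio` times more likely under the
    true user than under each of the other n - 1 users (a simplification).
    """
    if n_users < 1:
        raise ValueError("n_users must be at least 1")
    # Bayes' rule with equal priors: the 1/n factors cancel out.
    return likelihood_ratio / (likelihood_ratio + (n_users - 1))


# Same evidence, growing crowd: confidence in the right user shrinks.
for n in (2, 100, 1000):
    print(n, round(posterior_true_user(n, 20.0), 3))
```

Under these assumptions the same evidence gives roughly 95% confidence with two users but only about 17% with one hundred, which is the "pick you out of a crowd" problem in numbers.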
I read a bit of the paper and I identified Weka's decision tree method being used to classify the users (if you've ever used the ID3 algorithm [wikipedia.org] or its brethren C4.5 [wikipedia.org] in classification, imagine exploring methods of developing different decision trees).
Indeed the paper states: I'll take this opportunity to recommend two open source projects. Torpark [torrify.com] for those of you concerned about your identity and also Weka [waikato.ac.nz] -- the easy to use collection of data mining software in Java! Also something to note is that Weka has recently become part of Pentaho [pentaho.org], a project of open source business intelligence products. Explore the valuable tools that are out there and enjoy!
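For anyone who hasn't met ID3, its core step is choosing the feature with the highest information gain; C4.5 refines this with gain ratio. The sketch below is plain Python, not Weka's code and not the paper's feature set. The session features (`time_of_day`, `long_session`) and the two users are invented purely for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(rows, labels, feature):
    """Entropy reduction from splitting on `feature` (ID3's split criterion)."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    # Weighted average entropy of the subsets produced by the split.
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Made-up sessions for two users; the feature names are hypothetical.
sessions = [
    {"time_of_day": "morning", "long_session": True},
    {"time_of_day": "morning", "long_session": False},
    {"time_of_day": "evening", "long_session": True},
    {"time_of_day": "evening", "long_session": False},
]
users = ["alice", "alice", "bob", "bob"]

for feature in ("time_of_day", "long_session"):
    print(feature, information_gain(sessions, users, feature))
```

In this toy data `time_of_day` separates the two users perfectly while `long_session` tells you nothing, so a tree builder would split on `time_of_day` first.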
Re: (Score:2, Interesting)
Re:Shameless Weka Plug (Score:4, Interesting)
If they're talking about using this for identifying fraudulent users...how much would changing news/services on the internet affect that? I can think of several news items and new services that instantaneously and permanently caused me to alter my browsing and internet using habits. Wouldn't those sorts of behavior altering agents increase false positives?
Please bear in mind I have absolutely zero background in this kind of stuff ;)
Answer to Your Question (Score:4, Informative)
The odds are low and this is a variable to be tweaked. But the assumption is that you will still visit your old sites and exhibit your behaviors on them. If you found, say, one new site a week, it would slowly be incorporated into your routine (if they used regression properly and allowed the model to train on your data -- old and new). But if you suddenly stopped going to your old sites and started visiting new ones, you would probably be flagged. And that's the trade-off of trying to repress fraud.
I should point out that there's a lot of play with the variables here and that actual implementation of this theoretical paper could be either well done or badly done.
Excellent point, though. Sometimes these new technologies turn out to be more cumbersome than helpful and we need to watch out for that!
Re: (Score:3, Insightful)
Because we all know that the process of straightening things out when you've been flagged as a fraudster is always a quick and easy process that works 100% of the time.
Thanks for answering my question though!
Re: (Score:1)
Re: (Score:1)
pentaho (Score:2)
Pentaho? (Score:3, Interesting)
Re: (Score:1)
Re: (Score:2)
Actually I'm quite easy to identify from my clicks. I typically use a bookmark and go to Slashdot where I log in. From there I click on Technician and check if I have any replies to my posts. Now in reality, how many people log into Slashdot and click Technician? I hope it is not very many of you...
Maybe a lot of you log in as Technician and click Technician. May
How about this? (Score:3, Interesting)
Re:How about this? (Score:5, Funny)
Re: (Score:1)
This is modded as funny, but it's true. That would be powerful, valuable information to have.
Re: (Score:2)
The only two guys on the internets (Score:3, Funny)
Re: (Score:2)
However, as we don't know how many internets there are in total....
Re: (Score:1)
My "clickprint" is easy on slashdot... (Score:2, Funny)
But I'm used to living among dyslexics, illiterates, and dumbasses. Sigh.
Oh No You Didn't (Score:5, Funny)
Re: (Score:2)
Re: (Score:1)
Re: (Score:2)
What I hate is when I browse back to double-check what someone (like a grandparent post) said before I submit a comment... then I browse back forward, copy my text, and then have to wait a few minutes...
Re: (Score:1)
Informative enough to get mod points. Interesting enough for you to reply to.
And I thought the Intarweb was more popular (Score:2)
Well, I know I'm one of the websurfers. Who's the other one?
Re: (Score:1)
AdBlock + NoScript + no cookies = reduced tracking (Score:1, Informative)
Re: (Score:1)
Directly from the paper, specially to you: "It is important to note that the research presented in this paper discuss the possibility of identifying users based on their online behavior. However this 'identification' is still anonymous, and even perfect methods will only be able to indicate that some current session belongs to the 'same user' as some previous session. These methods cannot identify users by 'name'."
What you said makes complete no sense in regar
Re: (Score:2)
If a website implements this secretly, then gets information about your usage while having some sort of login information with which to associate this information, then they would be able to connect future sessions to that session, which they could then connect back to a user profile.
Although this seems to be focused on usage within a single website, it seems a reasonable extrapolation to think that someone could develop a less effective but more general algorithm that would help to identify on
Re: (Score:2)
The AOL Search database which was released didn't identify the users by name either; that didn't mean they weren't identifiable.
Potentially useless.. (Score:3, Interesting)
I can distinguish between a person with blond hair and a person with brown hair given only the hair color 100% of the time. But that doesn't mean hair color is a very useful tool for positively identifying people. The key is how different people's "click profiles" are. If there are only 1000 different possibilities (evenly distributed), that's not terribly good for identification. If there are 10^10 possible profiles, evenly distributed among the populace, that would certainly be useful. Also, what's the false positive rate? If you try to use this to identify fraud and you have a 1% false positive rate, you'll end up pissing off 1% of your customers. That's probably not acceptable.
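A back-of-the-envelope sketch of both points, assuming evenly distributed profiles as above. The user count, fraud rate, and detection rate are invented numbers; only the 1% false positive rate comes from the post.

```python
def expected_profile_sharers(n_users: int, n_profiles: float) -> float:
    """Expected number of OTHER users with your profile.

    Assumes each user's profile is drawn uniformly and independently.
    """
    return (n_users - 1) / n_profiles


def flagged_precision(fraud_rate: float, detection_rate: float,
                      false_positive_rate: float) -> float:
    """Fraction of flagged sessions that really are fraudulent (Bayes' rule)."""
    true_flags = fraud_rate * detection_rate
    false_flags = (1.0 - fraud_rate) * false_positive_rate
    return true_flags / (true_flags + false_flags)


# Hypothetical site with one million users.
print(expected_profile_sharers(1_000_000, 1_000))   # many look-alikes
print(expected_profile_sharers(1_000_000, 1e10))    # effectively unique

# Hypothetical: 0.1% of sessions are fraud, 90% of those get caught,
# and 1% of honest sessions are wrongly flagged.
print(flagged_precision(0.001, 0.9, 0.01))
```

With 1000 profiles you share yours with about a thousand other people; with 10^10 you are effectively unique. And with those made-up fraud numbers, only about 8% of flagged sessions are actual fraud, so most of the people getting locked out are honest customers.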
Re: (Score:1)
Re: (Score:1)
Re: (Score:2)
Some credit card issuers already do this. Consequently when you suddenly have to fly to Seattle to take care of your sick mother after years of no travel you get off the plane, try to rent a car, and find that your credit card has been frozen because buying a plane ticket is "abnormal activity".
Re:In other news ... (Score:1)
Defense (Score:2, Interesting)
Re: (Score:1)
How about a Firefox extension that, at random time intervals, randomly requests one of the page links?
Yeah, that would be cool. The "randomly chosen page links" would include advertising, of course, so I'd be earning AdSense click revenue every time someone just visits my site, even if they never actually click on something.
I just wonder what Google might say about that...
Tabbed Browsing anyone? (Score:2)
Even if you run something in the background that submits random search queries or random spidering, the instant you open up a bookmark full of tabs, you've identified yourself.
User 12345: the clickstream consists of completely random clicks on flickr, del.icio.us, and Digg links, except that (at least) once a day, someo
Re:Tabbed Browsing anyone? MPU (Score:2)
Re: (Score:1)
And I can pretty easily imagine some of the bad guys not showing their hand with new exploits until they start seeing big ol' ripe, leaking internal IP addresses like 172.31.1.155 or 10.10.1.180 (as opposed to 192.168.1.3 or 192.168.2.2)
Similar to ssh exploit a few weeks back (Score:2, Informative)
This is similar to the SSH exploit reported here on Slashdot a few weeks back where data could be determined via statistical/timing analysis done on the packets sent during an SSH session.
It sounds like if these types of timing and statistical analysis attacks become common, a simple solution would be a Firefox extension that would randomize the timing of the input from the mouse and the keyboard. I suspect that randomly delaying a keystroke or a mouse click by anywhere between 0 and 100 ms would be enough to d
I'm sure that checking for fraud is the intention (Score:1)
Isn't this a graduate research paper by two individuals at different Business schools? Hmmmmm.
Pr0n (Score:1, Funny)
Other kind of print (Score:2)
Am I the only one (Score:5, Insightful)
Tiny Urls just don't compute as part of my safe surfing habits.
Example:
Tiny Url --> my redirect --> paper
After it hits the front page
Tiny Url --> my redirect --> 0-day exploit
There really is no need for them in Slashdot Submissions.
Here's the direct link to the paper
http://knowledge.wharton.upenn.edu/papers/1323.pd
Re: (Score:1)
Re: (Score:2)
Re: (Score:2, Informative)
Re: (Score:1)
Re: (Score:2)
OMGGGOGOGG (Score:1)
an even simpler solution... (Score:4, Funny)
TORiffic! (Score:2)
Clickprinting through multiple pathways (Score:2)
BugMeNot (Score:2)
It would be interesting to see what a "clickprint" analysis of an account shared with bugmenot.com would look like.
The idea of using this sort of technology as a security feature sounds absolutely horrible.
I mean, a change in your browsing habits on a site gets you locked out? That's not Orwellian, it's just plain stupid.
Re: (Score:2)
Old News (Score:1)
It'll be tied to cookies, bluetooth, and that proximity chip in your head pretty soon. This isn't really news, it's the logical progression of technology. Tech works best when you know who it's aimed at, especially advertising and remote controlled guns. (S
Cryptonomicon? (Score:1)
Re: (Score:1)
User Behavior (Score:1)
90% of statistics are made up ;) (Score:2)
Just because it is easy to distinguish between 2 users does not mean that this has much practical use:
In most applications (without the user's consent) this is going to be used remotely (server-side), which means that it is going to be totally useless at tracking users if there are more than say fifty users (someone do the math - assuming that the users follow enough links on that same site).
You can safely stow away the tin-foil hat^W^W browsing pattern disguise
proxies abound (Score:1)
Hmmm. (Score:2)
I see this was described as a "working paper" on the 20th. It doesn't show up anywhere as being "under review". I wonder if they've just blown their publication chances given that it is already "pre-published" at this point?
It'll be interesting to see how this shakes out.