The Internet

How to Stop Digg-cheating, Forever

The following was written by frequent Slashdot editorial contributor Bennett Haselton. He writes "Recently author Annalee Newitz created a bit of a stir with the revelation that she had bought her way to the front page of the story-ranking site Digg. Since Digg allows any registered user to go to a story's URL and "digg it" in order to push it upward through the story-ranking system, it was inevitable that services like User/Submitter would come along, where a Digg user can pay for other users to cast votes to push their story up to the top. User/Submitter says they are currently backlogged and not taking new orders, but they say the service will return and will soon feature services for manipulating similar sites like Digg competitor reddit. Even if the new U/S features are vaporware, it probably won't be long before other companies offer similar services. But it seems like all of these story-ranking sites could prevent the manipulation by making one simple change to their voting algorithm."

Before getting to that though, what's at stake? The revelation that Digg could be trivially manipulated did not cause the site to be overrun with bogus stories all at once -- most of the links on the front page still look interesting. Newitz said that her story, which was deliberately chosen to be as lame as possible, got buried by users soon after it hit the front page, which is how Digg cleans spam stories out of the system. However, she also said that in the time that the story was on the front page, the story got about 35,000 hits, whereupon her server crashed and the traffic was thereafter divided with two other mirror sites; presumably if the server had stayed up, she would have gotten about 100,000 hits, all for an initial expenditure of $100, which is orders of magnitude cheaper than buying advertising any other way. (If she had done the same thing with a good story instead of a deliberately lame one, presumably the traffic gains resulting from word-of-mouth and repeat visitors would have been even higher.) As long as the benefits outweigh the cost, more and more unscrupulous users are likely to pay for such services, and since the service provided by User/Submitter is easy to copy, probably similar services will spring up to drive the price down even further. If nothing changes, then eventually sites like Digg and reddit will be flooded with nothing but paid stories. Most of the stories on the front page will probably still be interesting (why would you pay to promote a link, unless it was good enough to draw repeat visitors and get the most value for your money?), but everybody who didn't pay for votes would eventually get crowded out.

One Good Samaritan, Jim Messenger, managed to shut down one Digg manipulation service called Spike The Vote, by buying it out (for a paltry $1,275 - they must have wanted to get out fast) and then turning it over to Digg. He warned people that the moral was: Don't sign up for Digg manipulation services, since Digg might get your information from them and then you'll be banned. Actually, I think the moral is simpler: if you're going to try anything like that, do it from a throwaway account that you don't care about losing if you get caught. (Or, only sign up with manipulation services which publish a privacy policy promising never to share your information, especially not with sites like Digg. Then if Digg buys them out, the site has violated their privacy policy and Digg as the new owner inherits the liability for that, so you can sue them, right?) But as the idea spreads, it will probably become impractical to play whack-a-mole by shutting down manipulation services as they keep springing up. Any time the cost of providing a service (clicking on a few buttons) is small compared to the benefits of receiving the service (100,000 hits in 24 hours), a market will exist for it one way or another, whether you're talking about drug-smuggling, prostitution, or selling Digg votes.

However, I think there's a way to fix it, and here it is. Have you ever seen people put a link in their profile to their HotOrNot picture, saying "Go here and vote me a 10!!"? It's similar to the people who send links to their friends and say, "I just posted this, please Digg this for me!" The difference is that on HotOrNot, it doesn't work. On HotOrNot, you can cast votes for a picture in one of two ways. The first way is to go directly to the URL for someone's picture; the second way is to load the front page, where a picture from the database is selected at random, and vote for whatever picture comes up. The catch is that the votes that you cast by going directly to someone's picture are simply ignored in calculating the average score for that photo. The only votes that are counted are the votes cast for random pictures displayed on the front page. So if you want to manipulate the voting for your own photo, you'd have to load the front page hundreds of thousands of times waiting for your own picture to come up repeatedly, which is hard to do without being detected.
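The counting rule described above can be sketched in a few lines. This is a minimal illustration and not HotOrNot's actual code: the `RatingBoard` class, its method names, and the `"random"` / `"direct"` source labels are all assumptions made up for the example.

```python
from collections import defaultdict


class RatingBoard:
    """Toy rating store that only counts votes cast via random display.

    Hypothetical sketch: the class, method names and source labels are
    illustrative assumptions, not any real site's API.
    """

    def __init__(self):
        # item id -> list of scores that actually count
        self._counted = defaultdict(list)

    def record_vote(self, item_id, score, source):
        """Accept every vote, but only keep those from random display."""
        if source == "random":
            self._counted[item_id].append(score)
        # Votes with source == "direct" (the visitor followed a link
        # straight to the item) are accepted silently and never counted.

    def average(self, item_id):
        """Return the mean counted score, or None if nothing counted yet."""
        scores = self._counted.get(item_id)
        if not scores:
            return None
        return sum(scores) / len(scores)
```

Under this sketch, a hundred friends following a "vote me a 10" link would leave the average untouched, because none of their votes arrive with the `"random"` label.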

To enable an algorithm like this on Digg and reddit, the sites could present users with a sidebar box that displays random stories from the pool of recent submissions. (reddit already has a serendipity feature that users can use to select a random story from the available pool, which could be leveraged for this purpose.) Once a story has collected, say, 100 votes -- or whatever number is considered sufficient to provide a representative random sample of how the story appeals to people -- then on that basis the story can either be buried or promoted to the top, where it would be seen by, say, 100,000 people. The elegance of this system is that bad content would only be seen by 100 people on average before it's buried, whereas good content would be seen by all the 100,000 people who view it on the front page, so the average user sees 1,000 pieces of good content for every 1 piece of crap. Even if 75% of users ignore the random story box completely, that just means you have to display it to 400 users instead of 100 before you have enough data points for a good random sample.
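The arithmetic in the paragraph above can be checked with a short sketch. The function names are hypothetical, and the 1,000-to-1 figure is a per-story ratio: it only carries over to what the average user sees if roughly as many good stories as bad ones are submitted, which is an assumption rather than something the numbers guarantee.

```python
import math


def impressions_needed(votes_required, participation_rate):
    """How many users must be shown the box to collect enough votes."""
    return math.ceil(votes_required / participation_rate)


def good_to_bad_view_ratio(front_page_views, sample_views):
    """Views a promoted story gets for each view a buried story gets."""
    return front_page_views / sample_views


# 75% of users ignore the box, so only 25% of impressions yield a vote.
print(impressions_needed(100, 0.25))         # 400
print(good_to_bad_view_ratio(100_000, 100))  # 1000.0
```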

I suggested essentially the same algorithm for how an open-source search engine could work without being vulnerable to gaming even by those who understood all of its inner workings. The main difference, of course, is that Digg and reddit actually exist now. Digg declined to comment on the possible merits of such an algorithm; reddit's Steve Huffman said that the idea sounded interesting, although even if the idea got full buy-in, naturally any proposed change would take a long time to bring to fruition.

But it seems that an algorithm similar to this one would be the only way to prevent cheating on sites like Digg that sort content based on user votes. So it's ironic that HotOrNot, the only site I know of that is using a variation of this algorithm and hence is probably the most secure against cheating, is also the one where cheating is least likely to be a problem. Getting a high placement on Digg might enable you to make some money, but getting a highly rated picture on HotOrNot isn't going to make you rich (unless it helps you meet a millionaire who is using the site to find his third wife). Also, making HotOrNot meritocratic doesn't give people an incentive to improve the "content" that they submit, because up to the limits of what can be done with hair and wardrobe, you can't make yourself that much more attractive. With Digg and reddit, on the other hand, I might work harder at submitting a good story, if I knew that it worked in a perfectly meritocratic fashion that pushed good stories right to the top.

If you do this, you don't need any of the other countermeasures listed in Annalee Newitz's follow-up piece "Herding the Mob", such as analyzing user account history for suspicious behavior. As long as most users in the system are legitimate, most of the users in your random sample will be legitimate as well, and their voting will be representative of what most of the community would think. A story could also get a high score within a specific sub-area of the site like the sports page, but kept off of the main site front page, if the story got a high score from a random sampling of sports-oriented users but a low score from a sample of everyone else.

You could even sub-divide the topical areas further, down to a level of granularity like "Would Barack Obama make a good president?" A site called Helium is currently trying something like this -- users can submit essays on subjects like "Racial inequality or oppression: Do they truly exist in todays society?", and vote on how to rank other essays against each other. The voting works on the random selection principle that I'm advocating here -- users are presented with a pair of randomly chosen essays from a given category (not necessarily the same category for which you submitted an essay) and told to vote for the better one, so there's no way to tell all your friends to go to the link for your essay and give it a high rating. The main limitation though is that while the votes can push you to the top of a particular sub-category, that won't cause your article to "break out" and get to the front page of the site -- Helium says that those front-page articles are chosen at random by employees from among those articles that are highly rated within their narrow category, so just being good is not enough. And if you want to write something that doesn't fit into any existing categories, you have to create a new category for your essay like I did, which will then be a category containing one essay that nobody else ever sees. Perhaps both of these limitations could be overcome by adding the option to rate randomly selected essays on a scale of 1 to 10 -- thus providing a way to rate essays that exist alone in their own category, and also a way to find the best essays across the entire site, rated against each other.

If Digg or reddit adopts a model that uses the random-voter-selection method, then there's the issue of how to handle the votes cast by users under the current system -- the ones who go to a story link and click "digg it", which is what makes the existing system vulnerable to gaming. Digg could do what HotOrNot does, and just ignore those votes outright, but users would probably view this as deceptive. Perhaps Digg could say that votes cast by self-selected users (the ones who go straight to the story link) are counted along with votes from randomly-selected users, unless the average of the self-selected votes is significantly different from the average from the randomly-selected votes, in which case the self-selected votes are ignored. Hopefully this would satisfy most users and preserve the "community" feel of the site, and only a spoilsport would point out that counting the self-selected votes only if they agree with the randomly-selected votes, is exactly the same thing as ignoring the self-selected votes entirely.
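The blending rule proposed above might look like the following sketch. Votes are modeled as 1 (up) or 0 (down), and the `tolerance` cutoff for "significantly different" is a made-up value for illustration; the proposal does not specify one, and a real site would presumably pick it with a statistical test.

```python
def story_score(random_votes, self_selected_votes, tolerance=0.15):
    """Return the fraction of positive votes used to rank a story.

    Votes are 1 (up) or 0 (down). `tolerance` is a hypothetical cutoff
    for "significantly different"; the article does not specify one.
    """
    if not random_votes:
        return None  # no random sample yet, so no trustworthy score

    random_avg = sum(random_votes) / len(random_votes)
    if not self_selected_votes:
        return random_avg

    self_avg = sum(self_selected_votes) / len(self_selected_votes)
    if abs(self_avg - random_avg) > tolerance:
        # Self-selected voters disagree with the random sample: ignore them.
        return random_avg

    # Otherwise pool both groups of votes.
    combined = list(random_votes) + list(self_selected_votes)
    return sum(combined) / len(combined)
```

In this sketch the pooled score can never land more than `tolerance` away from the random-sample average, which is the spoilsport's point in code form: the self-selected votes only count when they change very little.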

I asked the owner of User/Submitter what he thought about this. He was willing to talk with surprising candor (except about things like his real name) and spoke as if he'd like nothing better than for Digg to make changes to their service that would block his system from working. To both Annalee Newitz and me, he said, "We find it interesting that Digg still allows anybody to view any user's diggs. By way of this 'feature,' User/Submitter is able to verify that our users actually digg the stories they're given. Without this feature, Digg users are given complete digging privacy, and User/Submitter cannot exist." Some have expressed skepticism that the Digg cheaters really want Digg to fix the problem. But as a security tester, I can understand that mentality. If you report a problem, and a company doesn't fix it, eventually you get tempted to publicize the problem to draw attention to it. And if they still don't fix it, and it's a fairly benign security hole that merely enables some pranksters to get some undeserved attention, why not build a service around exploiting the hole, if it will highlight the problem and encourage it to get fixed?

So I'm going to go out on a limb and say the U/S guy sincerely wants Digg to be more secure. However I disagree with him about his proposed fix, that of hiding a user's digg history. First of all, it won't stop anyone who creates a multitude of accounts all under their control -- you can use Tor to make it appear that you're coming from many different IP addresses, and build up a history of "legitimate" votes before using your votes to push sites deliberately. (Be sure to use different browsers, or vary your User-Agent header if you know how to do that, so that a series of votes from identical browser types doesn't give you away.) If your service does work by paying other users to cast votes, then you could still audit whether they're casting their votes honestly -- for example, create a test story, use 5 sockpuppet accounts to digg it 5 times, then tell your confederate to digg it. If the number of diggs doesn't go up to 6, then you know they're not honoring their end of the deal, and kick them out of the system. As long as most confederates think there might be some chance of getting caught if they don't play along, most of them would probably cast the votes that they were paid for, since it costs them nothing to do so and they wouldn't want to jeopardize their stream of easy money.

I asked the owner of User/Submitter if his service could defeat the random-sampling algorithm I described. "It would slow down our service," he answered, "but certainly wouldn't eliminate it because eventually a U/S User will have an opportunity to vote on a U/S Submission by way of chance." But I don't see how this would beat the algorithm -- some U/S voters would still get to vote on the story, but as long as there are far more legitimate voters than U/S voters, then a random sampling will almost always contain far more legitimate voters. The U/S owner also said, "Randomized voting privileges would be unnecessarily confusing, frustrating, and fragmenting. Not to forget: unfair and undemocratic." Well, you could keep it from being "confusing" or "frustrating" by keeping the existing interface (with the possible addition of a randomly-selected-story box), so that the only changes would be in how the votes are handled under the hood. "Fragmenting"? If anything, it seems to me that the existing Digg/reddit algorithms would be more fragmenting, keeping users within their existing communities of friends who vote for each others' stories; a random-selection box would give stories with "crossover appeal" a greater chance of success, bringing them to the attention of users who might otherwise never have seen them. As for "unfair and undemocratic", presumably this is a reaction to the fact that the votes of 100 users decide what everyone else sees. But it's already the case with Digg that the votes of a small number of users decide what content becomes popular. At least with a random sample of users, it would be the case that the vast majority of the time, the voting outcome would be the same as it would have been if the entire site had voted, due to the magic of representative sampling.
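To put a rough number on "almost always", here is a small sketch that computes how often paid voters would make up more than half of a random sample. It treats each sampled voter as paid with independent probability `paid_fraction`, a binomial approximation that assumes a large user base; the function name and the fractions tried are illustrative only. A paid bloc smaller than a majority could still tip a close vote, so this is a lower bound on the attacker's influence, not a full analysis.

```python
from math import comb


def prob_paid_majority(sample_size, paid_fraction):
    """Probability that more than half of a random sample are paid voters.

    Models each sampled voter as paid with independent probability
    `paid_fraction` (a binomial approximation to sampling from a large
    user base).
    """
    threshold = sample_size // 2 + 1
    return sum(
        comb(sample_size, k)
        * paid_fraction ** k
        * (1 - paid_fraction) ** (sample_size - k)
        for k in range(threshold, sample_size + 1)
    )


for fraction in (0.05, 0.20, 0.40):
    print(fraction, prob_paid_majority(100, fraction))
```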

So, I'm putting this suggestion out there for the same reason that Jim Messenger bought out Spike The Vote -- because I don't want sites like Digg and reddit to be manipulated by the abusers. In fact, if they used this algorithm, they would become more meritocratic than they are now, because the systems would strictly favor the highest-rated content, instead of content written by people who have informal networks of friends who can all go digg their stories for them. If I were to design the user rating system to make it cheat-proof, these are the exact details of what I would do:

  • Wherever they decide to post the "random story sampling" box (on the front page, or on a link off to a separate page, etc.), have it work so that as soon as new stories are submitted, they can be rotated into that box and displayed to a random set of users, until it's reached its total of 100 votes or however many are required to get a random sample.
  • You can have "shutout voting" to kill off stories early that are obvious spam or otherwise really useless, without going through the full 100 votes. (For example, if 90% of the first 10 votes are negative, then stop collecting votes.) This decreases the number of users "inconvenienced" by really obvious spam and other garbage.
  • For someone to submit content that gets rotated into that voting process, have them submit a Turing test (read numbers off of a graphic and type them in), or something similar. This prevents spammers from submitting spam content over and over just to have it viewed by those initial 10 voters. If they have to type in a number each time, it's not worth it.
  • When users give votes to a story, give them the option to say why they voted the way that they did. (This is especially valuable if they're giving negative votes, since then the submitter would know what to improve.) Personally I think the comments would be more valuable if each user can't see other users' comments at the time they submit their own comments; this prevents the "me too" effect where everybody echoes the first two commenters. (When I ask for independent comments from people, and they almost all say the same thing without seeing each other's comments, that's when I know they have a point!)
  • To prevent an attacker from having their own username hit the random-voting page over and over in hopes of voting up their own content, make sure that each user account is only allowed to vote on a given piece of content once (even if they found the content through the random-story page).
  • Require a Turing test for new user signups. This would prevent an attacker from registering a huge number of accounts just to hit the random voting page with different users over and over, in hopes of getting to vote on their own submitted content eventually.
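The vote-collection rules in the list above (a fixed sample size, "shutout voting", and one vote per account) can be sketched as a small state machine. The `StoryPoll` class and its parameter names are hypothetical; the thresholds reuse the example numbers from the list, and promoting on a simple majority is an assumption, since the list leaves the promotion cutoff open.

```python
class StoryPoll:
    """Collects votes on one story from randomly selected users.

    Hypothetical sketch: thresholds follow the example numbers in the
    text (100 votes, shutout at 90% negative of the first 10); promoting
    on a simple majority is an assumption.
    """

    def __init__(self, sample_size=100, shutout_after=10, shutout_ratio=0.9):
        self.sample_size = sample_size
        self.shutout_after = shutout_after
        self.shutout_ratio = shutout_ratio
        self.votes = {}        # user id -> True (up) / False (down)
        self.status = "open"   # "open", "promoted" or "buried"

    def record_vote(self, user_id, is_up):
        """Record one vote; return the story's status afterwards."""
        if self.status != "open" or user_id in self.votes:
            return self.status  # poll closed, or this user already voted

        self.votes[user_id] = is_up
        total = len(self.votes)
        downs = sum(1 for v in self.votes.values() if not v)

        # Shutout: bury obvious spam once the early votes are
        # overwhelmingly negative, without waiting for the full sample.
        if total == self.shutout_after and downs / total >= self.shutout_ratio:
            self.status = "buried"
        elif total >= self.sample_size:
            ups = total - downs
            self.status = "promoted" if ups > downs else "buried"
        return self.status
```

The Turing tests for submissions and signups sit outside this sketch; they limit how often an attacker can create a poll or a voter, while the class only decides what happens to the votes once they arrive.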

Then after running this system for a while, look through some collected data to determine if the system could be more efficient. For example, do you really need a sample of 100 votes every time? Suppose you determine that in 99% of cases, you get the same result just from tabulating the first 50 votes, as you would have gotten from tabulating all 100 votes. Then you could modify the system to collect only the first 50 votes, and then make a decision.
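One way to run that check on collected data is sketched below. It assumes each story's votes were logged in arrival order as a list of booleans and that the decision is a simple majority; both the log format and the function names are assumptions for illustration.

```python
def decision(votes):
    """Majority decision over a list of True (up) / False (down) votes."""
    ups = sum(votes)
    return "promoted" if ups > len(votes) - ups else "buried"


def early_agreement_rate(vote_logs, early_n=50):
    """Fraction of stories whose first `early_n` votes give the same
    decision as their full vote log."""
    if not vote_logs:
        raise ValueError("need at least one vote log")
    matches = sum(
        1 for votes in vote_logs if decision(votes[:early_n]) == decision(votes)
    )
    return matches / len(vote_logs)
```

If `early_agreement_rate(logs, 50)` came back at 0.99 or higher on real logs, that would support cutting the sample to 50 votes, as suggested above.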

Suggestions for improvement? Flaws (hopefully not fatal)? Everyone who cares about keeping community sites like Digg free from abuse, and who wants to create a path for the best content to rise to the top, let's put our heads together and see what we can think of. The above is intended merely as a jumping-off point, and although I've worked it over and I can't see any specific points to improve efficiency, that's probably just because I've been looking at it too long. And if you Digg this story for me I'll give you 1,000 times as much cash as I gave my Mom last Mother's Day.

This discussion has been archived. No new comments can be posted.

  • Re:A good design (Score:5, Interesting)

    by networkBoy ( 774728 ) on Monday April 30, 2007 @11:16AM (#18927903) Journal
    Or, to combat the moderation services, just sell the top two positions (ala google) and be done with it.
    People can preference it out as it could have the topic :paid: or dis-allow that and risk alienating part of your user base.
    revenue and it neuters the outside manipulators.
    -nB
  • by Minter92 ( 148860 ) on Monday April 30, 2007 @11:21AM (#18927955)
    I was really big on digg back in its early days. It started as a small community of web devs and techy people. The early digg was dominated by good tutorials, cheap deals, and other stories of interest to web devs. Sadly it quickly went downhill once it became popular. It got overrun with myspace kids and weird political conspiracy theorists. By about Sept 2005 Digg was dead. All my friends had left and there are only so many videos of some dude getting hit in the groin, and so many "Bush is a moron who happens to be an evil genius" that intelligent people can take. Digg is the wasteland of the internet, a collection of myspace level users posting stupid content. Intelligent users fled digg over a year ago.
  • Digg is lame anyway. (Score:5, Interesting)

    by bcrowell ( 177657 ) on Monday April 30, 2007 @11:24AM (#18927985) Homepage
    Digg is pathetic. The concept is democracy gone crazy, like those idiotic TV shows where the audience votes for who's the best performer. Whatever slashdot's shortcomings in other areas, at least they have paid editors who work at it like a real job. Reading the typical comments on digg, they all seem to be by high school students who think they know the secret to tabletop nuclear fusion, and they're all voting each other's posts up and down like crazy, based on nothing but their own biases. At least with slashdot moderation, posts are likely to be moderated by randomly chosen people who didn't get handed a license to go around voting their friends up and their enemies down.
  • Re:tl;dr (Score:5, Interesting)

    by mabhatter654 ( 561290 ) on Monday April 30, 2007 @11:47AM (#18928253)
    For starts this is a big "told you so" to Kevin from CmdrTaco... Kevin originally approached Taco about adding what would become Digg to Slashdot and Taco said it would never work... Both were wrong! Digg does work but would be much better if it took from Slashdot's experiences. This whole post is about the Digg/Slashdot p*** match. oh well, it gets page hits!

    back OT, Digg's focus is to "Digg" around on the internet and find interesting stuff. Where Slashdot is about Quality, Digg is all about Quantity and cleverness. I find myself on Digg more (to follow the links than to discuss) because Slashdot has slowed down.. every time somebody posts Lego robots, or weird news, etc (the "news for nerds" part of the slogan!) it gets flamed as "stupid" or not a "relevant" issue to the Slashdot (stuff that matters) "agenda". I've noticed from Firehose that clever quirky stuff just isn't making front page anymore even if it gets hits.

    I'd say each site has its place. In a lot of ways Digg "gaming" doesn't hurt if it doesn't happen too much as it's usually interesting stuff if somebody wants it there enough to go to all that work. On the other hand, Slashdot has better discussion. Slashdot has that cool factor of actual industry insiders that will show up and post... that's way cool and they only do it because Slashdot has that history. If Slashdot wants to remain "relevant" there's the ticket... get more actual people from the industry to post, answer questions, etc... that's where the NEWS really is.

  • Re:A good design (Score:3, Interesting)

    by Lockejaw ( 955650 ) on Monday April 30, 2007 @11:48AM (#18928269)

    Slashdot's readership tends to mod up posts they agree with and mod down those they don't, regardless of whether the posts actually further the discussion on the topic.
    I won't say Slashdot doesn't have that tendency, but... have you ever been to Digg?
    Any time I start to feel like Slashdot's moderation system is messed up, I either go to metamod (and do what I can to fix it) or to Digg (and then run screaming back to Slashdot).
  • DIGG - dead to me (Score:3, Interesting)

    by Stavr0 ( 35032 ) on Monday April 30, 2007 @11:55AM (#18928363) Homepage Journal
    If a DIGG's popularity is set with bought-and-paid-for schemes and/or astroturfed, then it has zero value to me as an aggregator. If they don't fix the problem, more and more people will realize this and it will die.
  • by sponga ( 739683 ) on Monday April 30, 2007 @01:49PM (#18929965)
    It has its highs and lows.

    For a while it was chaos, then the spammers showed up, then the guys advertising their blogs, then the Ubuntu zealots showed up; it got bad when the page was almost filled with Ubuntu articles. They knew they had to do something, which is why we don't get spammed by every new taste of Linux also. There was a time when it was first released we would get great articles from places like hackaday.com and other cool DIY stuff.

    I still enjoy the site because I can quickly glimpse over a retarded article that obviously has no interest to me; I think of it like the spam I get in the email and how quickly I can look past it to go on with my life.

    Here are some of the articles up there now and actually are pretty good stuff; I mean they got the Google love that used to go on around here, IT stuff, censorship and other nerd stuff.

    Digg CommentSpy
    Top 5 Creepiest Robot Clones
    Fatsecret: A Site for Fat People
    New York Times Confirms Google Phone
    10 Unexpected Uses of the iPod
    MySpace China, a place for censorship
    Top 100 Most Influential People in IT
    An interesting way to explore DIGG by walking through the site-Visually
    Powering 4000 Homes: One Wind Turbine (PHOTOS)
    Dodge Challenger preproduction body shells caught on camera [PICS]
    The RIAA's worst nightmare: computers that understand music
    12 Ways to Be A Security Idiot - A Slideshow.
    150 Photos of Various Spiral Shapes (ooh pretty)
    PyDigg - A Python Toolkit for the Digg API
    The Travails of Tracking Web Traffic

    Kind of get sick of Slashdot and having to hear how awful the U.S. is today, how many of my rights are being stripped away and how the socialists over in Europe have it so much better than America.
    DIGG is agenda free the majority of the time when selecting its articles, which I enjoy, but I also like to hear the deeper discussion with /. on the issue.