leviramsey's Journal: A meditation on Slashdot and Kuro5hin

A meditation on commenting systems, user bases, databases, rating systems and other topics.

In the debate between partisans of Scoop and Slash over which system is better, one topic comes up regularly: scalability. Scoop partisans will point to Slashdot, often specifically pointing to its additive moderation scheme, and argue from there that Slash cannot scale to the level of a large site.

This certainly seems to be the case. One need only look at the crap that gets rated to 5 or the gems that get -1'd to see that something seems wrong. However, does Scoop really scale as well as Slash?

Although neither site makes hard specifics readily available, K5 and Slashdot (by far the leading sites using Scoop and Slash, respectively, and the sites that spawned their codebases) seem to be running on fairly similar hardware: multi-processor Intel servers with good quantities of RAM, disk space, and network connectivity, most likely utilizing at least one or two database servers, on top of Apache/mod_perl servers.

K5's userbase is almost certainly smaller than Slashdot's. The highest realistic number of current K5ers (distinct people who have done something on the site in the past few months) is probably on the order of 25,000. Slashdot almost certainly has at least three times that number. However, K5 seems to be the one having scalability problems from a hardware perspective. How long has comment search been dead? How often has rusty had to make articles static? How many times in recent months have we received the dreaded message that the mod_perl server is down?

Slashdot does not seem to have these issues, or at least not to the same extent as K5. Yes, Slashdot does die occasionally, and occasionally all you can get is a static page. But this seems to happen less frequently than the general slowdowns and deaths of K5. Further, K5 uses the vaunted InnoDB table type of MySQL, while Slashdot uses MyISAM, which is widely considered to be poor.

However, this only strengthens the argument that Scoop doesn't scale as well as Slash. Even with the advantage conferred by a superior database engine, Scoop seems to hit a choking point, in terms of number of users, [at least] three times sooner than Slash. If we levelled the playing field by removing the database variance, how much worse would Scoop fare?

The major difference, from a system standpoint, between Scoop and Slash is the comment rating system. Scoop's system is routinely held to be far better than Slash's, as averaging ratings works better than adding them. However, it seems reasonable to conclude that the comment ratings, by their sheer numbers, are hurting K5's performance and scalability. It would not surprise me to learn that K5 has, in its ~3 years of operation, accumulated significantly more ratings than Slashdot has accumulated moderations, even though Slashdot's moderation system was instituted before K5 was created. On any given day, I would not be surprised to see K5 recording more ratings than Slashdot records moderations.

Rob Malda and the Slash coders have used this as part of the explanation of why they have no plans to implement an averaging rather than additive rating scheme. However, this does not change the fact that, in general, averaging schemes will scale better (in the sense that they provide a better predictor of comment quality as the number of ratings increases) than additive schemes, or at least the type of scheme preferred by Slashdot.

On further examination, I see no reason that an additive scheme cannot scale as well as an averaging scheme, if properly implemented. The problem is that Slash's additive scheme is destroyed by one single feature: the narrowness of the rating band.

Since the moderation system arose out of primordial Perl, Slashdot has rated comments on a strict spectrum of integers between -1 and 5, inclusive. The stated goal of the system is to have the gems rise towards 5 and the crap to sink towards -1. However, lots of crap gets to 3 and higher, while gems remain stuck at -1, 0, or 1.

The core problem with Slashdot's moderation system is that a comment needs only four more people to like it than to dislike it to be considered a gem. Conversely, a controversial post needs only two more people to dislike it than to like it to be buried at -1. As the population of voters grows, such a small margin becomes easy to reach by chance, so the position of a given post on the scale becomes more and more random, and thus unsuitable as a predictor of quality.
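
The narrow band can be illustrated with a small sketch. It assumes a comment starts at 1 and that each moderation is a single +1 or -1; the function names and vote counts are made up for illustration and are not taken from Slash.

```python
def banded_score(likes, dislikes, start=1, lo=-1, hi=5):
    """Additive score squeezed into the -1..5 band.

    Simplification (an assumption of this sketch): the net total is
    clamped once at the end, not after every individual moderation.
    """
    return max(lo, min(hi, start + likes - dislikes))


def unbounded_score(likes, dislikes, start=1):
    """The same additive score with the arbitrary limits removed."""
    return start + likes - dislikes


def approval(likes, dislikes):
    """Fraction of voters who liked the comment."""
    return likes / (likes + dislikes)


# Hypothetical vote counts: (likes, dislikes)
small_crowd = (7, 3)       # 4 net likes from 10 voters
split_crowd = (502, 498)   # 4 net likes from 1000 voters
clear_gem = (600, 400)     # 200 net likes from 1000 voters

for name, (up, down) in [("small", small_crowd),
                         ("split", split_crowd),
                         ("gem", clear_gem)]:
    print(name, banded_score(up, down), unbounded_score(up, down),
          round(approval(up, down), 3))
```

All three comments land on 5 inside the band, although the second is close to a coin flip among its voters and the third is clearly preferred. Only the unbounded totals (5, 5 and 201) separate the third comment from the other two.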

This is known at K5; one need only look at the portion of the site that uses additive rating (the submission queue) to see that the post/kill threshold is growing. Scoop's coders seem to realize this fact. It seems to have dawned on the Slash coders as well. However, the Slash team's response seems to be the stupidest possible means of responding: they decided to cut the moderator pool down. Metamoderation is obviously designed to identify "bad" moderators and remove them from the system. When metamod stopped being able to filter the bad, the editors decided it was because of trolls infiltrating the metamod system. They began revoking the metamod rights of metamodders who "metamoderate badly". In one swoop, they blacklisted over a hundred moderators for moderating "incorrectly" in one thread.

The problem is that by doing this in such a public manner (users like sllort and FortKnox quickly discovered what was going on), the editors invited a backlash. Save for rare comments in Slashdot journals, they have not responded in any way to the complaints and criticism.

Until recently. Apparently, many who were banned from metamoderating have received privileges again. Those who lost moderation ability seem to have regained it. What happened?

I suspect that this is a continuing realization that the rating system is broken. Slashdot's editors (who do a large portion of the moderation, by even their own accounts) have seen that even cutting down the moderator pool has little effect. They are in the process of junking the additive scheme. There are no signs of a Scoop-esque averaging scheme in the works. What will take the current system's place?

Approximately one year ago, Slashdot instituted a whitelist/blacklist system: the Friend/Foe system. Users can create lists of friends and foes, giving bonuses or deductions in comment score to members of either list. In essence, killfiles have been added to the system, through a hack to the additive scheme.

However, a friend/foe system as originally implemented doesn't have a huge effect. Presumably, one would only make someone a friend after noticing a large number of quality posts from them. Enter collaborative whitelists and blacklists: friends of friends and foes of friends. At this point, no means exists to assign a modifier to friends or foes of your friends. However, why else would Slashdot implement a feature which adds lots of DB queries unless they are planning to use it for something? The only use I can see is to provide collaborative whitelisting and blacklisting. The existence of accounts whose sole purpose is to list trolls is confirmation of this plan.
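
As a rough sketch of what collaborative whitelisting could look like, the per-reader adjustment below applies a modifier for the reader's own friends and foes, and then a hypothetical second-degree modifier for friends and foes of those friends. The modifier values, the data layout and the function name are all assumptions made for illustration; Slashdot offers no second-degree modifier at this point.

```python
def personal_score(base, author, viewer, friends, foes, mods):
    """Adjust a comment's base score for one particular reader.

    friends / foes map a user name to the set of users on that list.
    mods holds the (hypothetical) score modifiers for each relationship.
    """
    my_friends = friends.get(viewer, set())
    my_foes = foes.get(viewer, set())

    score = base
    if author in my_friends:
        score += mods["friend"]
    elif author in my_foes:
        score += mods["foe"]
    else:
        # Second degree: consult the lists kept by the reader's friends.
        if any(author in friends.get(f, set()) for f in my_friends):
            score += mods["friend_of_friend"]
        if any(author in foes.get(f, set()) for f in my_friends):
            score += mods["foe_of_friend"]
    return score


# Hypothetical users and modifier values
friends = {"alice": {"bob"}, "bob": {"carol"}}
foes = {"bob": {"troll"}}
mods = {"friend": 1, "foe": -1, "friend_of_friend": 1, "foe_of_friend": -2}

for author in ["bob", "carol", "troll", "dave"]:
    print(author, personal_score(1, author, "alice", friends, foes, mods))
```

Here alice has listed only bob, yet carol's comments are raised and troll's are lowered for her because of bob's lists. An account whose only job is to list trolls would work the same way: befriend it, and its foes become yours.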

So Slashdot is abandoning its old system and becoming like Advogato, only with more comments. But is this really necessary? Wouldn't it just be more effective to retain the current setup but remove arbitrary limits on comment scores?

This article has been crossposted to Kuro5hin. Discussions may go on at either site.

Comments:
  • "Slashdot does not seem to have these issues, not to the same extent as K5. Yes, Slashdot does die occasionally. Occasionally all you can get is a static page. But these seem to happen less frequently than general slowdowns and deaths of K5. Further, K5 uses the vaunted InnoDB table system of MySQL, while Slashdot uses MyISAM, which are uniformly considered to be poor."

    Slashdot uses InnoDB as well. The scalability problems go far beyond trivialities like the comment rating system or MySQL table type. In my experiments, Slashcode actually utilises more CPU per dynamic pageview than Scoop, primarily because it uses TemplateToolkit and a crufty backend which utilises a "slash daemon" in another process. The only reason Slash scales better than Scoop is that it makes heavy use of caching and static page generation.

    90% of a weblog's traffic comes from anonymous users who only read the frontpage and maybe a handful of articles. Slashdot was built around this assumption, and for Anonymous Coward index.pl and article.pl URLs redirect to pregenerated static pages, which obviously produce little or no load on the server. Scoop on the other hand dynamically generates everything; while each individual page may be cheaper to generate than a Slash page, it has to generate about five times more pages to serve everyone, both logged-in and anonymous.

    Scoop has the foundations for static page support, but it's still very much in its infancy. If static page generation were as mature and well-integrated as it is in Slashcode, Scoop would very likely scale even better than Slash.

  • I don't see page wideners, I don't see fr0st pissers, and I don't see goatse links.

    Moderation implemented to remove trolls, trolls removed, problem solved. (This is going by the /. definition of a troll, what the rest of the world calls crap flooders.)

    As for scalability, I would like to see Kuro5hin handle all the crap that is posted onto /. on a daily basis!!! Then again some of klerk's crap flooding sprees would have brought down almost any database out there. . . .
