How to StumbleUpon StumbleOver and StumbleOn


Journal by RyoShin on Monday September 17, 2007 @01:54AM

Like many on the internet, including other /. members, I am a user of StumbleUpon. For those who don't know what StumbleUpon is, the short version is that it is a gateway to the internet at large. It's great for lazy afternoons when you just want to find new webpages. You set some preferences and some content filters, and boom, you're off. It covers topics from architecture to zoology, and most everything in between. It's a great way to read about new and interesting scientific discoveries or watch sleeping cats fall off whatever shelf they happen to be sleeping on. Or both, if that's your thing. But not at the same time.

However, StumbleUpon reveals one of the larger annoyances of the internet: data redundancy. Site after site, blog after blog will host the same content (usually video or pictures, but it can even be word-for-word text), meaning that you'll wind up Stumbling Upon it time and again, and it really gets grating after you see the eighteenth LOLcat collection. To my knowledge, SU has no way to deal with this. You can rate things up or down and perhaps have less of a chance of seeing them again, but that doesn't always work.

To this end, I feel that StumbleUpon would do well to introduce two new features: StumbleOn and StumbleOver. Both features would be user preferences. You could choose to StumbleOn, StumbleOver, both, or neither (seeing the internet in a pure, unadulterated form).

StumbleOn is a feature that would map every citing page back to the main page or site that it talks about. This is the harder of the two features to implement. The idea is that instead of stumbling upon a page that is either a rehash or just a quick blog entry about another page (usually done for ad hits), you would instead be redirected to (or On) the original page. Slashdot sees things like this: a summary for an article will contain a link to a blog that contains a link to the actual article. StumbleOn would cut out the blog entry, giving focus where it is rightly due: the original authors.

As stated, this is harder to do. Some things stay in circulation for so long that pinpointing the "original" is tedious (assuming it still exists). Then there are sites that spring up simultaneously, such as the smattering of lolcat sites that appeared within a few days or weeks of each other. The content itself can give some help. For instance, if a blog entry directly links to the original, you know you can StumbleOn to that original. Perhaps the video being shown lists a URL to use; failing that, you could StumbleOn to where it's hosted on YouTube/MediaCafe/whatever.
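The "blog entry links to the original" case could be sketched roughly as below. This is only an illustration under my own assumptions, not anything StumbleUpon offers: the `cites` mapping (page URL to the URL it credits as its source), the `resolve_original` function, the `max_hops` limit, and the example URLs are all made up for the example. It also sidesteps the hard part, which is building that mapping in the first place.

```python
def resolve_original(url, cites, max_hops=10):
    """Follow citation links from `url` toward the page that cites nothing.

    `cites` is a hypothetical dict mapping a page URL to the URL it credits
    as its source (absent or None when the page is itself an original).
    """
    seen = {url}
    current = url
    for _ in range(max_hops):
        source = cites.get(current)
        if source is None:
            return current  # nothing further to follow: treat as the original
        if source in seen:
            return url      # citation loop: no clear original, so don't redirect
        seen.add(source)
        current = source
    return current          # gave up after max_hops; use the furthest page reached


# Hypothetical chain: Slashdot summary -> blog entry -> actual article.
cites = {
    "slashdot.example/summary": "blog.example/entry",
    "blog.example/entry": "news.example/article",
}
original = resolve_original("slashdot.example/summary", cites)
```

Here a stumble that lands on the summary would be sent On to `news.example/article`. Pages that cite each other in a loop are left alone, since there is no way to pick a root.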

Part of the problem here is ballot stuffing. Someone might get a bunch of friends or paid hacks to all say that that person's site is the "original", though it would clearly be just a lame blog entry for ad hits. But, as with most systems like this, it can be overcome with adjustments from other users. Then there's the risk that a blog entry that is actually useful, like one dissecting a video or giving further insights, gets marked for StumbleOn and skipped. A second level might be introduced for these, but that would start making this very complex.

StumbleOver is likely easier to implement and, in my opinion, by far the more useful of the two. In the case of StumbleOver, you don't care what the original site is. You only know that you've seen the content before and, even if you liked it, don't want to see it again from another site. Whereas StumbleOn would be pictorially represented as a tree, with one main site (the "root" site) being led to from many others, StumbleOver would be seen as a nice, round circle. By seeing one part of the circle you've seen them all, so you don't need to see the rest. This would lead to a lot less repetitiveness in your stumbles.
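The "circle" idea might look something like the following minimal sketch. It assumes something has already sorted pages into groups of equivalent content; the `StumbleOverFilter` class, the `group_of` mapping, and the example URLs are hypothetical names invented for the illustration.

```python
class StumbleOverFilter:
    """Skip any page whose content group ("circle") the user has already seen."""

    def __init__(self, group_of):
        # Hypothetical mapping: page URL -> identifier of its content group.
        # A page with no known group is treated as a group of its own.
        self.group_of = group_of
        self.seen_groups = set()

    def next_page(self, candidates):
        """Return the first candidate from an unseen group, or None."""
        for url in candidates:
            group = self.group_of.get(url, url)
            if group not in self.seen_groups:
                self.seen_groups.add(group)
                return url
        return None


# Two sites hosting the same lolcat collection share one group.
groups = {
    "a.example/lolcats": "lolcat-collection",
    "b.example/lolcats": "lolcat-collection",
}
candidates = ["a.example/lolcats", "b.example/lolcats", "c.example/news"]

stumbler = StumbleOverFilter(groups)
first = stumbler.next_page(candidates)
second = stumbler.next_page(candidates)
```

The first stumble shows `a.example/lolcats`. The second skips both lolcat pages, because that circle has been seen, and lands on `c.example/news`.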

However, this is not without its own problems: how precisely should content be compared? Most would agree that a word-for-word copy, a single image or set group of images, or a video would all be easy to StumbleOver. But what about a blog entry that restates the original text in the user's own words? Is one lolcat page with 10 images the same as another with 15? (This case can be kind of solved with StumbleOn, using something like icanhascheezburger as the main source.) What if someone has a higher quality version of another's video (quite unlikely, but possible)?
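One common way to put a number on "how similar" is Jaccard similarity: the size of the overlap divided by the size of the union. The sketch below is my own illustration, not a proposal for how StumbleUpon should do it. The function names, the 0.8 default threshold, and the sample data are assumptions, and it presumes you can already tell that two images are the same image.

```python
def jaccard(a, b):
    """Overlap divided by union: 1.0 for identical sets, 0.0 for disjoint ones."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty pages are trivially identical
    return len(a & b) / len(a | b)


def word_shingles(text, k=3):
    """Set of overlapping k-word runs; empty if the text has fewer than k words."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def same_content(a, b, threshold=0.8):
    """Hypothetical cut-off: treat two pages as the same above `threshold`."""
    return jaccard(a, b) >= threshold


# The lolcat case: one page has 10 images, another has those 10 plus 5 more.
page_a = {f"img{i}" for i in range(10)}
page_b = {f"img{i}" for i in range(15)}
overlap = jaccard(page_a, page_b)  # 10 shared / 15 total

# The text case: a word-for-word copy versus a restatement in other words.
original = "sleeping cats fall off of whatever shelf they happen to be sleeping on"
reworded = "a napping cat tumbles from the shelf it was resting on"
copy_score = jaccard(word_shingles(original), word_shingles(original))
reworded_score = jaccard(word_shingles(original), word_shingles(reworded))
```

The two lolcat pages score 10/15, about 0.67, so whether they count as the same page depends entirely on where the threshold is set. The word-for-word copy scores 1.0 while the reworded version scores 0.0, which shows why a restatement in the user's own words is the hard case for this kind of measure.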

These aren't perfect ideas, and I have no idea how to submit them to StumbleUpon, but I think they would make great strides in making StumbleUpon a better product and the internet easier to browse.
