Slashdot.org

James A. A. Joyce's Journal: Slashbot Script Use

My slashbot script works by using curl to repeatedly download the 'light' version of the Slashdot front page at regular intervals. The thing is, how frequently can you do this before you get banned?

I originally set out to use the Slashdot RDF file. Unfortunately, you can only get away with about 60 viewings of it per IP per day before being banned for 72 hours. That works out to one request roughly every 24 minutes, which makes it useless for spotting new story postings promptly.

Then I figured I might as well use the front page. It's a lot more bloated, but one can get away with refreshing it every 30 seconds for an hour or two nonstop. (I know this from bitter experience.) So the front page looked like a logical choice, and it still does: use the light version and that's only 30KB a pop.

But how frequently can one refresh the front page continuously for a day before getting banned? I only have a 56K modem link to the Internet, which I disconnect every hour or two, so there's not much scope for consistent hammering of the Slashdot servers. But if the script gets long-term use on a broadband link, then hypothetically it could retrieve the page every 20, 10 or even five seconds; at five seconds that would be 12 downloads per minute totalling 360KB. Is there any point in doing so, and could you get away with it?
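To put numbers on those intervals, here is a quick back-of-the-envelope sketch in Python. It assumes the roughly 30KB figure quoted above for the light front page; the function name `load_per_minute` is just something I made up for illustration and is not part of slashbot.

```python
PAGE_KB = 30  # approximate size of the 'light' front page, as quoted in the text

def load_per_minute(interval_s, page_kb=PAGE_KB):
    """Return (downloads per minute, KB per minute) for a given polling interval."""
    downloads = 60 / interval_s
    return downloads, downloads * page_kb

# Compare the intervals discussed above.
for interval in (30, 20, 10, 5):
    downloads, kb = load_per_minute(interval)
    print(f"every {interval:>2}s: {downloads:g} downloads/min, {kb:g} KB/min")
```

Even the most aggressive case, one fetch every five seconds, is only 12 downloads and 360KB a minute, so on broadband the bandwidth is not the limiting factor; the ban threshold is.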

I suspect that you probably would. The default setting for slashbot is to reload the page every two minutes until a Mysterious Future story message appears, at which point it reloads every minute. This is almost certainly a setting which is safe to use for 24 hours nonstop. Slashdot hands out literally a million pages a day, with a good number of those probably for the front page, so 800 or so page views from one IP wouldn't be too extraordinary. On the other hand, if slashbot checked three or four times as frequently I think it would be at risk of getting you banned. The current 120/60 time settings are a nice compromise which allows a small buffer zone without risking your access.
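The 120/60 behaviour can be sketched roughly as follows. This is not the actual slashbot code (which drives curl), just a minimal Python illustration. It assumes the pending-story notice contains the literal text "Mysterious Future", and it re-checks the page on every fetch, which is my guess at the behaviour; `fetch`, `poll` and the other names are hypothetical.

```python
import time

NORMAL_INTERVAL_S = 120   # default delay between reloads
PENDING_INTERVAL_S = 60   # delay while a pending story is being advertised
PENDING_MARKER = "Mysterious Future"  # assumed to appear verbatim in the page

def next_interval(page_text):
    """Pick the delay before the next reload based on the page just fetched."""
    return PENDING_INTERVAL_S if PENDING_MARKER in page_text else NORMAL_INTERVAL_S

def requests_per_day(interval_s):
    """How many fetches a fixed interval adds up to over 24 hours."""
    return 24 * 60 * 60 // interval_s

def poll(fetch, rounds, sleep=time.sleep):
    """Call fetch() `rounds` times, sleeping in between; return the delays used."""
    delays = []
    for _ in range(rounds):
        delay = next_interval(fetch())
        delays.append(delay)
        sleep(delay)
    return delays
```

A fixed 120-second interval comes to 720 requests a day and a fixed 60-second interval to 1,440, so a figure of around 800 assumes the script spends most of the day in the slower mode.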

Notably, the default homepage only includes some stories from subsections such as Games or Ask Slashdot. As far as I can tell, there is no GET variable you can pass to index.pl to make it display stories from all sections. Nor, I believe, can you cut the number of stories displayed from a dozen or two down to just the first few without setting up a user. To get around this, I recommend signing up a dummy user for yourself and changing its settings to "collapse sections" and to display only three or four stories. Then copy or manufacture a cookie for that user and pass it to slashbot so that it fetches the new, slimmer home page.
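For illustration, attaching such a cookie to a request looks roughly like this in Python. The cookie name `user`, the value `PLACEHOLDER` and the `http://` scheme are all assumptions on my part; use whatever name and value your browser actually stores for the dummy account. The helper `build_request` is hypothetical, and the sketch only builds the request without sending it.

```python
import urllib.request

def build_request(url, cookie_name, cookie_value):
    """Build (but do not send) a GET request that carries a single cookie."""
    headers = {"Cookie": f"{cookie_name}={cookie_value}"}
    return urllib.request.Request(url, headers=headers)

# Hypothetical cookie name and value; copy the real ones from the dummy account.
req = build_request("http://slashdot.org/index.pl", "user", "PLACEHOLDER")
print(req.get_header("Cookie"))

# To actually fetch the page you would pass the request to urlopen:
# page = urllib.request.urlopen(req).read()
```

With curl, the same cookie string can be supplied through its `--cookie` (`-b`) option.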

Another thing I'm considering is the Slashdot page for Palm Pilots and other such small-screened devices. This page would be delightful to use because of how bare it is: it fits comfortably inside one kilobyte! The problem? Since the Palm portal is rarely mentioned anywhere, I don't know how tolerant of abuse the Slashcrew are regarding it. Oh, and the HTML is all on two lines, making it harder to process with just egrep and sed.
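One way around the two-line problem is to break the markup up so that every tag starts its own line, after which line-oriented matching works again. Here is a rough Python sketch of that idea; the sample markup is invented for the example and is not the real Palm page, and `one_tag_per_line` is a hypothetical helper.

```python
import re

def one_tag_per_line(html):
    """Insert a newline before every '<' so that each tag starts its own line."""
    broken = re.sub(r"\s*<", "\n<", html)
    return [line for line in broken.splitlines() if line.strip()]

# Invented sample: everything crammed onto a single line.
sample = ('<html><body><a href="/story/1">First story</a>'
          '<a href="/story/2">Second story</a></body></html>')

# Now a simple per-line test (the moral equivalent of egrep) finds the links.
for line in one_tag_per_line(sample):
    if line.startswith("<a "):
        print(line)
```

This is a crude text split rather than a proper HTML parse, so it would trip over a literal `<` inside text or attributes, but for a page this bare it is probably good enough.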
