
the_mad_poster's Journal: Alpha Code: Journal Archive Script

Get it here and rename the extension to .pl (it isn't posted as a .pl file because the server would try to execute it).

It's Perl, so if you don't have Perl, you'll need to get it. If you're on Windows, ActiveState Perl is as easy to install as any other Windows program.

To invoke it:

perl getslash.pl -a

If you don't include the -a switch, I'll not be held liable for where it puts the archived files...

Optionally, you can specify an archive location with the -a switch right on the command line:

perl getslash.pl -a/home/user/slashstuff

You can also specify your user name on the command line:

perl getslash.pl -Nyou -a/home/user/stuff

So, to archive all my entries, I ran:

perl getslash.pl -Nthe_mad_poster -ac:\foobar

If you have a space in your username, you'll need to wrap the entire -N chunk (switch and name together) in double quotes. If you don't specify the archive location or your name, you'll be prompted (but you still need the -a switch regardless). It SHOULD work on Linux, BSD, or Windows, but I've only tried it on Windows at the moment. It will only archive comments that are above the 2 threshold right now - the rest is coming in beta ;)
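For anyone curious how switches with the value glued straight onto the letter can be parsed, here is a minimal Python sketch. It is not taken from getslash.pl (which is Perl and isn't reproduced here); the function name `parse_switches` and the choice of an empty string to mean "-a was given without a path" are assumptions made only for illustration.

```python
import argparse

def parse_switches(argv):
    """Parse -a[PATH] and -N[NAME] in the attached style the post describes."""
    parser = argparse.ArgumentParser(prog="getslash")
    # -a may be bare (the script would then prompt for a location)
    # or carry the archive path attached, e.g. -a/home/user/stuff.
    # Assumption: "" stands for "bare -a", None for "no -a at all".
    parser.add_argument("-a", dest="archive", nargs="?", const="", default=None)
    # -N carries the user name attached directly, e.g. -Nyou.
    parser.add_argument("-N", dest="name", default=None)
    return parser.parse_args(argv)

args = parse_switches(["-Nthe_mad_poster", r"-ac:\foobar"])
print(args.name, args.archive)
```

A caller would check `args.archive` and `args.name` and fall back to prompting whenever either is empty or missing.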

Complaints, requests, and free beer can be sent to my public email address or posted here.

Update: Comment Ripping

Alright, I've been studying the HTML that /. uses for commenting. What do people think about just pulling a separate page named, for example, 45833(comments).jrnl that would consist of a flat mode listing of all of the comments in the JE at a -1 threshold (the default - a command line switch could specify a different threshold)?

This discussion has been archived. No new comments can be posted.

  • taco's checks for script usage; esp if you decide to rip comments as well (which is what I think you just said in your last section there?)
    • I don't think this will be a problem. CmdrTaco is hunting for scripts that actually threaten slashdot's performance, and I don't think this qualifies. I've run a script [slashdot.org] that would use more of /.'s resources than this one and haven't had any problems yet.
    • As long as people don't abuse it individually, it shouldn't be a problem. It's set up not to rip the entire journal every time, so the only big hit would be the first one. After that, it will only pull the new entries that you haven't archived unless you force it to do otherwise (makes it cron-friendly).

      Maybe I'll add a "stealth" type of functionality (a la nmap) that delays each action. Maybe put a 5 second delay in between page hits for really big jobs.
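The two ideas in the reply above (only pull entries that aren't archived yet, and pause between page hits) can be sketched roughly like this in Python. This is not the script's actual code; `pending_entries`, `fetch_politely`, and the `fetch` callback are hypothetical names, and the 5 second default simply mirrors the figure floated in the comment.

```python
import time

def pending_entries(all_ids, archived_ids):
    """Return the entry IDs that are not in the archive yet, keeping order."""
    done = set(archived_ids)
    return [entry_id for entry_id in all_ids if entry_id not in done]

def fetch_politely(entry_ids, fetch, delay_seconds=5.0, sleep=time.sleep):
    """Call fetch(entry_id) for each ID, pausing between consecutive hits.

    `fetch` is whatever function actually downloads a page (an assumption;
    it is passed in so the pacing logic stays independent of the network).
    """
    pages = {}
    for index, entry_id in enumerate(entry_ids):
        if index > 0:
            sleep(delay_seconds)  # no pause before the very first request
        pages[entry_id] = fetch(entry_id)
    return pages
```

Passing `sleep` as a parameter is only a convenience so the delay can be swapped out when trying the function without waiting.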

  • Use of reserved word "our" is deprecated at getslash.pl line 9.
    Global symbol "$environment" requires explicit package name at getslash.pl line 9.
    Execution of getslash.pl aborted due to compilation errors.
    Using perl 5.005_03 on Linux
  • Wow. So far 5 out of 6 comments in this JE are from me.

    Anyway, a suggestion: add leading zeroes to the (shorter) filenames so they sort in the correct order.
    • Heh, that's just because you're a Slashdot fogie and have shorter SIDs than the rest of us whippersnappers ;)

      I'll set the default filename to be 12 characters long so that it allows for 1 trillion SIDs (assuming 0 is a valid SID).
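A rough Python sketch of the zero-padding idea, assuming the 12 characters refer to the numeric part of the name and that the file keeps the .jrnl extension mentioned elsewhere in this journal entry. The helper name `archive_filename` is made up for the example.

```python
def archive_filename(sid, width=12, extension=".jrnl"):
    """Left-pad a numeric SID with zeroes so filenames sort in numeric order."""
    digits = str(int(sid))
    if len(digits) > width:
        raise ValueError("SID %s does not fit in %d digits" % (digits, width))
    return digits.zfill(width) + extension

# Plain string sorting now matches numeric order.
names = sorted(archive_filename(sid) for sid in (45833, 7, 1200))
print(names)
```

With a fixed width, a directory listing sorted by name shows the entries oldest first, which is what the suggestion was after.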

      • Is the current version up to date with the changes you've discussed in the article?

        Personally, I think you should let sourceforge host the project, and save yourself some bandwidth;)

        As far as comments go, I prefer nested myself. Perhaps make it user-toggleable, as well as the threshold?

        If anybody will trip the sensors on overuse on the first run, it would probably be em emalb or Sam the Butcher. (Sorry Em, too lazy to go back and fix capitalization on your name.)

        • The problem with using anything other than flat mode is that you have multiple page hits. Deep nested structures - which are common in TechnoLust's, Em's, and StB's entries - would ramp the page hits up on the Slashdot server like you wouldn't believe. I mean, Slashdot pisses me off and all, but I still wanna be nice to their server :)

          I'll work on instituting a Nested mode that actually pulls at -1 Flat and then reconstructs the proper order in Nested format using the parent links and CIDs of the pulled co
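The approach described above (pull once in flat mode, then rebuild the nesting from parent links and CIDs) could look something like this Python sketch. The dictionary keys `cid`, `parent`, and `text` are assumptions about how the parsed comments might be stored, not anything Slashdot's HTML or the script defines, and ordering siblings by CID is also an assumption.

```python
from collections import defaultdict

def nest_comments(flat_comments):
    """Turn a flat comment list into nested display order.

    Each comment is a dict with 'cid', 'parent' (None for top level) and
    'text'. Returns a list of (depth, comment) pairs.
    """
    known = {comment["cid"] for comment in flat_comments}
    children = defaultdict(list)
    for comment in flat_comments:
        # A comment whose parent was not pulled is treated as top level.
        parent = comment["parent"] if comment["parent"] in known else None
        children[parent].append(comment)

    ordered = []

    def walk(parent, depth):
        # Assumption: siblings are shown in CID order (oldest first).
        for comment in sorted(children[parent], key=lambda c: c["cid"]):
            ordered.append((depth, comment))
            walk(comment["cid"], depth + 1)

    walk(None, 0)
    return ordered
```

Each pair's depth can then drive the indentation when the nested page is written out, so the server only ever sees the single flat request.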

  • by gmhowell ( 26755 )
    662 journals. Can I still post?

    (Oh, and any chance of making the extension .html instead of .jrnl? Or at least explain choice and/or tools that use .jrnl)
    • Wow.... how many megs of data did it pull, out of curiosity?

      .jrnl is going to be the file that gets passed into another script which spits out valid HTML files. It's kinda like an object file for the HTML sanitizer that doesn't exist yet ;)

      • I forgot to look to see how much data it grabbed.

        But, us low UID bastards have been spewing journals for a long time:)

        You might want to talk to TechnoLust; I think he was going to work on a similar project.
