Comment Re:ECHELON (Score 1) 321
I set up a lot of robots.txt entries that pointed to certain dirs, and within those dirs set up meta tags to discourage robots from following them. sorta an endless trail; in those docs i put snippets of things like the communist manifesto and certain key words, and hid the links so no one sane could find them. there's even a big blatant warning explaining what those pages are for. i was learning how to do mod_rewrite for ISAPI, and decided to have some fun with a huge dictionary database.
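for anyone who wants to picture it, here's a rough python sketch of one auto-generated trap page. this is not the actual setup (that was mod_rewrite for ISAPI against a dictionary database); `WORDS`, `trap_page`, and the `/trap/` path are made-up names for illustration, and the tiny word list stands in for the real dictionary.

```python
import random

# Hypothetical stand-in for the dictionary database.
WORDS = ["alpha", "bravo", "charlie", "delta"]

def trap_page(seed, n_links=3, base="/trap/"):
    """Build one trap page: robots meta tag, visible warning, hidden links."""
    rng = random.Random(seed)  # same seed always yields the same page
    # Hidden links to further generated pages keep the trail going.
    links = "".join(
        f'<a href="{base}{rng.choice(WORDS)}-{rng.randrange(10**6)}" style="display:none">.</a>'
        for _ in range(n_links)
    )
    return (
        "<html><head>"
        '<meta name="robots" content="noindex,nofollow">'
        "</head><body>"
        "<p>WARNING: auto-generated page for catching crawlers that ignore robots.txt.</p>"
        f"{links}</body></html>"
    )

page = trap_page("some-request-path")
```

a well-behaved robot never gets here, since robots.txt disallows the directory and the meta tag says nofollow, so anything that keeps walking the hidden links is ignoring both.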
each page access trips a logger that shoves everything about the computer that hit these auto-generated pages, even its tcp fingerprint, into a database. once a day a job just blocks the ip addresses that have tripped more than 4 requests to those pages.
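the daily blocking step boils down to counting hits per ip and picking the ones over the limit. a minimal python sketch, assuming the log rows come back as (ip, path) tuples; `ips_to_block` and the sample addresses are hypothetical, and actually applying the block (firewall rule, server deny list, whatever) is left out.

```python
from collections import Counter

def ips_to_block(hits, threshold=4):
    """Return the IPs with more than `threshold` trap-page requests.

    `hits` is an iterable of (ip, path) tuples, a stand-in for the
    rows the logger wrote to the database that day.
    """
    counts = Counter(ip for ip, _path in hits)
    return sorted(ip for ip, n in counts.items() if n > threshold)

# Hypothetical day of log rows: one ip with 5 hits, one with exactly 4.
sample = [("192.0.2.10", "/trap/a")] * 5 + [("198.51.100.7", "/trap/b")] * 4
print(ips_to_block(sample))  # ['192.0.2.10']
```

the comparison is strictly "more than 4", so the ip with exactly 4 hits stays unblocked.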
some really interesting stuff turns up, like KBR's network is in there, and lots of "interesting" original ip blocks. then... looking at access logs that other websites have open publicly, you can see where these common crawlers go.
if you are looking where the spies are spying, you might not want to be seeing what's there.