Unusual HTTP Requests For robots.txt?
Fooster asks: "I edit several (mostly) unrelated Web sites hosted on a Linux virtual hosting machine running Apache. Often in an idle moment between edits, I'll watch my logs with a 'tail -f access &'. Today, I started to get bursts of almost simultaneous requests for robots.txt from the IP blocks of several different major service providers. Some time later, I'd get another burst, with some of the requests coming from different IPs. All in all, I had over 100 times more requests for robots.txt today than ever before in one day. Unlike most search engine robots.txt requests, there was no info in the referrer field, and a reverse DNS lookup did not lead me back to a search engine info provider. I found the requests to be coming from blocks owned by ISPs like Qwest, AT&T, BBN and others. A cursory examination of the literature revealed no reports of exploits based on robots.txt, so I decided to 'Ask Slashdot.' Have any other Webmasters noticed this? Am I just being paranoid? Take a look at the logs yourself, and let me know, please."
That's normal. (Score:1)
There's been a retro-80's movement going on lately, so everyone's looking for that 'robots.txt' mp3; I think it's by Styx.
Just put up a notice on your pages that says in big letters "We don't have the 'robots.txt' mp3; look for it on eBay". That should do it.
batten down the hatches, cap'n (Score:1)
Either that, or someone's abusing robots.txt by harvesting its contents and noting anything interesting for manual perusal at a later date.
shields up, red alert.
Here's some more info (Score:1)
It turns out that all the requests were for the robots.txt file in the default web space my host sets up for every account. I have five domains registered and working under that account, but had never paid any attention to, published any links to, or placed any files into that default directory. What's more, I never even made it world readable, thus the 403s. I've since fixed all of that, and placed a redirection page in that directory to shuffle requests off to my vanity page, but I haven't seen any more requests like those. I have seen a few browser requests from /. readers, but no more request bursts like those.
Thanks for all your suggestions, even the stupid ones gave me a laugh.
Re:Here's an actual robots.txt (Score:1)
Re:Looks like IP Spoofing (Score:1)
Here's an actual robots.txt (Score:1)
User-agent: *
Disallow: /snapshots/
Disallow: /cvsweb/
Disallow: /cgi-bin/
Disallow: /pub/
Disallow: /doc/
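For reference, this is roughly how a well-behaved crawler would interpret that file. The sketch uses Python's standard `urllib.robotparser` module; the bot name `ExampleBot` and the host `www.example.com` are placeholders, not anything from the original post. Nothing enforces these rules: a client that ignores robots.txt can still request the disallowed paths.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt quoted above, one directive per line.
ROBOTS_TXT = """\
User-agent: *
Disallow: /snapshots/
Disallow: /cvsweb/
Disallow: /cgi-bin/
Disallow: /pub/
Disallow: /doc/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A compliant crawler checks can_fetch() before requesting each URL.
# "ExampleBot" and www.example.com are placeholder values.
for path in ["/index.html", "/cgi-bin/search", "/pub/file.tar.gz"]:
    url = "http://www.example.com" + path
    verdict = "allowed" if parser.can_fetch("ExampleBot", url) else "disallowed"
    print(path, verdict)
```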
Re:That's normal. (Score:1)
who knew....
Count me in... got the weird-ass logs too (Score:1)
208.51.235.81
4.20.90.81
206.229.153.81
206.64.105.81
206.191.170.226
206.98.113.81
12.27.166.81
[snip]
68 hits total. Most of the addresses seem to belong to AT&T or Internap.
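One thing that stands out: all but one of these addresses end in .81, and several of the same /24 networks (4.20.90, 12.27.166, 206.64.105, 206.98.113, 206.229.153) also show up in the other logs posted in this thread, ending in .41 and .121 there. A quick way to surface patterns like that in a longer log is to bucket addresses by network and by final octet, sketched here with Python's standard `ipaddress` module. Treating /24 as the interesting network size is an assumption, and finding out who actually owns each block still needs a whois lookup.

```python
import ipaddress
from collections import Counter

# Source addresses quoted above (the [snip]ped remainder is not included).
HITS = [
    "208.51.235.81", "4.20.90.81", "206.229.153.81", "206.64.105.81",
    "206.191.170.226", "206.98.113.81", "12.27.166.81",
]

def by_network(addresses, prefix=24):
    """Count hits per IPv4 network of the given prefix length."""
    counts = Counter()
    for addr in addresses:
        net = ipaddress.ip_network(f"{addr}/{prefix}", strict=False)
        counts[str(net)] += 1
    return counts

def by_last_octet(addresses):
    """Count hits per final octet, to spot one host number reused across networks."""
    return Counter(addr.rsplit(".", 1)[1] for addr in addresses)

print(by_network(HITS).most_common())
print(by_last_octet(HITS).most_common())
```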
Re:Some days ago I suffer from the same (Score:2)
/var/log/httpd/access_log.4:204.123.28.10 - - [20/Sep/2000:02:10:05 -0400] "GET
/var/log/httpd/access_log.4:208.47.242.41 - - [20/Sep/2000:02:23:02 -0400] "GET
/var/log/httpd/access_log.4:12.27.166.41 - - [20/Sep/2000:02:23:02 -0400] "GET
/var/log/httpd/access_log.4:206.229.153.41 - - [20/Sep/2000:02:23:02 -0400] "GET
/var/log/httpd/access_log.4:206.98.113.41 - - [20/Sep/2000:02:23:02 -0400] "GET
/var/log/httpd/access_log.4:4.20.90.41 - - [20/Sep/2000:02:23:02 -0400] "GET
/var/log/httpd/access_log.4:206.64.105.41 - - [20/Sep/2000:02:23:02 -0400] "GET
/var/log/httpd/access_log.4:216.52.254.37 - - [20/Sep/2000:02:23:02 -0400] "GET
/var/log/httpd/access_log.4:216.52.254.37 - - [20/Sep/2000:02:23:02 -0400] "GET
/var/log/httpd/access_log.4:208.47.242.41 - - [20/Sep/2000:02:23:02 -0400] "GET
/var/log/httpd/access_log.4:207.95.133.41 - - [20/Sep/2000:02:23:02 -0400] "GET
I think it would be useful to blackhole any attempt to get robots.txt from anybody who doesn't send a referrer string. Not just give them a 404, but not respond to the request at all. Is this possible in Apache?
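Whether stock Apache can do this is a separate question. As a standalone illustration of the policy itself, here is a sketch using Python's `http.server` rather than Apache. The `should_drop` rule, the `BlackholeHandler` class, and the placeholder robots.txt body are all hypothetical. When the rule matches, the handler returns without writing anything, so the client sees the connection close with no HTTP response at all. One caution: well-behaved crawlers typically send no Referer header when fetching robots.txt either, so a rule like this would turn them away too.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

ROBOTS_BODY = b"User-agent: *\nDisallow:\n"  # placeholder robots.txt content

def should_drop(path, referer):
    """Hypothetical policy: silently drop robots.txt requests with no Referer."""
    return path == "/robots.txt" and not referer

class BlackholeHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if should_drop(self.path, self.headers.get("Referer")):
            # Write nothing: the server closes the connection without any
            # status line, headers, or body.
            self.close_connection = True
            return
        if self.path == "/robots.txt":
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(ROBOTS_BODY)))
            self.end_headers()
            self.wfile.write(ROBOTS_BODY)
        else:
            self.send_error(404)
```

To try it locally, `HTTPServer(("127.0.0.1", 8080), BlackholeHandler).serve_forever()` would start it; the port is arbitrary.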
Spoofing is mostly impossible with TCP (Score:2)
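Say host A is connecting to host B. This needs to happen in order to have a successful connection:
1. A sends B a SYN packet carrying A's initial sequence number.
2. B replies to A with a SYN-ACK that acknowledges A's number and carries B's own initial sequence number.
3. A sends back an ACK that acknowledges B's sequence number.
If A forges its source address, B's SYN-ACK goes to the forged address instead of to A, so A never learns B's sequence number and cannot complete step 3 (short of guessing it). Only after the handshake completes can A send an HTTP request such as "GET /robots.txt".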
So, I would say a bunch of hosts really are requesting robots.txt for some weird reason (still perhaps security-related, but not spoofing). Someone correct me if I'm wrong, but I'm pretty sure about this.
Re:That's normal... (Score:2)
I've seen this too.. (Score:2)
Some days ago I suffer from the same (Score:3)
206.229.153.121 - - [19/Sep/2000:15:14:01 -0300] "GET
206.64.105.121 - - [19/Sep/2000:15:14:01 -0300] "GET
206.98.113.121 - - [19/Sep/2000:15:14:01 -0300] "GET
208.47.242.121 - - [19/Sep/2000:15:14:01 -0300] "GET
208.47.242.121 - - [19/Sep/2000:15:14:01 -0300] "GET
12.27.166.121 - - [19/Sep/2000:15:14:01 -0300] "GET
route.ocy.pnap.net - - [19/Sep/2000:15:14:05 -0300] "GET
route.ocy.pnap.net - - [19/Sep/2000:15:14:05 -0300] "GET
207.86.73.121 - - [19/Sep/2000:15:14:08 -0300] "GET
4.20.90.121 - - [19/Sep/2000:15:14:17 -0300] "GET
Seems to be pretty similar.
Basically it was repeated every hour.
A test for a DoS?
Bye
OverLord
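The "repeated every hour" observation can be checked mechanically. A rough sketch in Python: parse the timestamp field of each log line, group hits into bursts separated by a quiet gap, and measure the time between burst starts. The 60-second gap is an arbitrary assumption, the function names are hypothetical, and the sample timestamps are the ones from the excerpt above, which on its own covers only a single burst.

```python
from datetime import datetime, timedelta

# Timestamps copied from the log excerpt above (Apache log timestamp format).
STAMPS = [
    "19/Sep/2000:15:14:01 -0300",
    "19/Sep/2000:15:14:01 -0300",
    "19/Sep/2000:15:14:05 -0300",
    "19/Sep/2000:15:14:08 -0300",
    "19/Sep/2000:15:14:17 -0300",
]

def parse_stamp(stamp):
    # %z accepts UTC offsets such as -0300.
    return datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z")

def bursts(times, gap=timedelta(seconds=60)):
    """Split times into bursts: a new burst starts after a quiet period longer than gap."""
    groups = []
    for t in sorted(times):
        if groups and t - groups[-1][-1] <= gap:
            groups[-1].append(t)
        else:
            groups.append([t])
    return groups

def burst_intervals(times, gap=timedelta(seconds=60)):
    """Time between the starts of consecutive bursts."""
    starts = [group[0] for group in bursts(times, gap)]
    return [later - earlier for earlier, later in zip(starts, starts[1:])]

times = [parse_stamp(s) for s in STAMPS]
print(len(bursts(times)), burst_intervals(times))
```

Over a full day's log, a list of intervals clustered near one hour would confirm the hourly pattern.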
IE "Make Available Offline" (Score:3)
When a user bookmarks a page, they are given an option to "Make Available Offline" which, if selected, pops up some configuration dialog boxes (where they get to choose how many layers deep, etc.). It essentially grabs all the code, graphics, etc. and saves it locally.
Personally, I use this function when I don't know if the content is likely to be around for a while. As it is processing, it shows that it is grabbing all sorts of robots.txt files from all over the damned place (especially if it follows a number of links deep).
It's not the brightest of MS's "wizards", so it probably keeps requesting the same one repeatedly when links lead back to the same server. Try checking what the HTTP_USER_AGENT says about those robots.txt requests.
If your logs can't tell you, make PHP process robots.txt requests and write the HTTP_USER_AGENT to a db or text file, etc.
The HTTP_USER_AGENT
-Andy
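The suggestion above uses PHP; the same idea is sketched here in Python as a WSGI application instead. It serves robots.txt and records the client address, Referer, and User-Agent for every request. The names `make_robots_app` and `file_recorder`, and the permissive placeholder robots.txt body, are hypothetical.

```python
import time

# Placeholder robots.txt body; substitute the site's real file.
ROBOTS_BODY = b"User-agent: *\nDisallow:\n"

def file_recorder(path):
    """Return a function that appends one line per call to a text file."""
    def record(line):
        with open(path, "a", encoding="utf-8") as log:
            log.write(line + "\n")
    return record

def make_robots_app(record):
    """WSGI app that serves robots.txt and records who asked for it."""
    def app(environ, start_response):
        if environ.get("PATH_INFO") != "/robots.txt":
            start_response("404 Not Found", [("Content-Type", "text/plain")])
            return [b"Not Found\n"]
        # Tab-separated: time, client address, Referer, User-Agent.
        record("\t".join([
            time.strftime("%Y-%m-%d %H:%M:%S"),
            environ.get("REMOTE_ADDR", "-"),
            environ.get("HTTP_REFERER", "-"),
            environ.get("HTTP_USER_AGENT", "-"),
        ]))
        start_response("200 OK", [
            ("Content-Type", "text/plain"),
            ("Content-Length", str(len(ROBOTS_BODY))),
        ])
        return [ROBOTS_BODY]
    return app
```

`wsgiref.simple_server.make_server("", 8000, make_robots_app(file_recorder("robots_agents.log"))).serve_forever()` would run it with the standard library's reference server; the port and file name are arbitrary. Passing the recorder in as a function keeps the logging destination (text file, db, etc.) separate from the request handling.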
Most likely (Score:3)
Better double check your security.
incident list (Score:3)
Hope it helps.
Looks like IP Spoofing (Score:5)
Also, since your robots.txt file says what not to index, that's frequently the list of directories with tasty things that people would most like to hack into. Think about it. What's in your robots.txt file? Things that change too often to be listed in search engine results, or the sorts of things that you don't want out there.
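To see exactly what a robots.txt advertises to anyone who fetches it, a few lines of Python are enough. The `disallowed_paths` helper below is a hypothetical sketch with a deliberately simple parser: it strips comments, ignores `User-agent` grouping, and just collects every non-empty `Disallow` value. The sample file is the one quoted earlier in this thread.

```python
# The robots.txt quoted earlier in this thread.
SAMPLE = """\
User-agent: *
Disallow: /snapshots/
Disallow: /cvsweb/
Disallow: /cgi-bin/
Disallow: /pub/
Disallow: /doc/
"""

def disallowed_paths(robots_txt):
    """Return every path a robots.txt asks crawlers to stay out of."""
    paths = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths

print(disallowed_paths(SAMPLE))
```

Anything this prints is a list any visitor can read, which is the point being made here: a path that needs protecting is better guarded by access controls than merely left out of search engines.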
I think you're being probed. Make sure your backups are up to date, and that the box is secured.