Millions of Pages Google Hijacked using ODP Feed 427

Posted by CmdrTaco on Wednesday March 23, 2005 @11:36AM from the well-this-isn't-going-well dept.

The Real Nick W writes "Threadwatch reports that millions of pages are being Google Hijacked using the 302 redirect exploit and the ODP's RDF dump. The problem has been around for a couple of years and is just recently starting to make major headlines. By using the Open Directory's data dump of around 4 million sites, and 302'ing each of those sites, the havoc being wreaked on the Google database could have catastrophic effects for both Google and the websites involved."

This discussion has been archived. No new comments can be posted.

Millions of Pages Google Hijacked using ODP Feed

Load All Comments

Search 427 Comments Log In/Create an Account

Comments Filter:

Ugh. This is so not true. (Score:2, Informative)

by GoogleGuy ( 754053 ) * writes:

This is a placeholder. I'll include more details of why you shouldn't listen to Threadwatch.org in a bit, and debunk this some. Let me get this posted and I'll follow up.

(Yes, I am GoogleGuy.)
- Re:Ugh. This is so not true. (Score:5, Funny)
  
  by Solder Fumes ( 797270 ) writes: on Wednesday March 23, 2005 @11:43AM (#12024117)
  
  This is a placeholder rebuttal, I'll post why your arguments are COMPLETELY STUPID after you actually post them.
  
  Parent Share
  twitter facebook
  - Re:Ugh. This is so not true. (Score:5, Funny)
    
    by ari_j ( 90255 ) writes: on Wednesday March 23, 2005 @11:46AM (#12024185)
    
    This is a placeholder troll. I'll post why you are an idiot and why Google r0x0r5 after you post your rebuttal but before I read it, as well as before I read the argument you are rebutting or the article.
    
    Parent Share
    twitter facebook
    - Re:Ugh. This is so not true. (Score:3, Funny)
      
      by NanoGator ( 522640 ) writes:
      
      This is a placeholder karma whore. I'll post about how this is really part of Microsoft's grande evil plan. Best part is, I'll get a higher score than any of you!
    - - Can anybody provide a working example? (Score:3, Interesting)
        
        by turnstyle ( 588788 ) writes:
        
        Is there a specific search that someone can suggest that would demonstrate this problem?
- Re:Ugh. This is so not true. (Score:2, Troll)
  
  by yagu ( 721525 ) writes:
  
  This is a placeholder.... I'll include more details of why you shouldn't believe the NEXT slashdot article.... Let me get this posted.... and I'll follow up! (Hey, if the other guy can get modded informative for that.... this, since it's for a future article ought to be insightful). And, no, I'm NOT a GoogleGuy.
- Re:Ugh. This is so not true. (Score:2, Insightful)
  
  by Anonymous Coward writes:
  
  Wow, getting modded up just for leaving a message on our answering machine! I guess it's true, just like with Wil Wheaton, if you claim to be (or are) someone of alleged importance, you too can get +5 Informative on every post, no matter what you say (or don't)!
- Re:Ugh. This is so not true. (Score:5, Informative)
  
  by GoogleGuy ( 754053 ) * writes: on Wednesday March 23, 2005 @12:14PM (#12024599) Homepage
  
  Okay, I'll talk about this whole "millions of webpages hijacked! Film at 11!" piece of scaremongering. If you RTFA, the author (and the submitter of the story?) claims that some scraper sites have pulled down a copy of the dmoz RDF, gotten the urls, and are doing 302 redirects to sites in an attempt to hijack them. Note that this does not mean that lots of pages were hijacked at all.
  
  Here's the skinny on "302 hijacking" from my point of view, and why you pretty much only hear about it on search engine optimizer sites and webmaster forums. When you see two copies of a url or site (or you see redirects from one site to another), you have to choose a canonical url. There are lots of ways to make that choice, but it often boils down to wanting to choose the url with the most reputation. PageRank is a pretty good proxy for reputation, and incorporating PageRank into the decision for the canonical url helps to choose the right url.
  
  A lot of sites that try to spam search engine indices get caught, and their PageRank goes lower and lower as their reputation suffers. We do a very good job of picking canonical urls for normal sites; sites with their PageRank going toward zero are more likely to have a different canonical url picked, though, and to a webmaster I understand that it can look like "hijacking" even though the base cause is usually your reputation declining. For a long time, it was hard to get anyone to report canonicalization problems, because the site that got "hijacked" would be free-cheap-texas-holdem-plus-viagra-and-payday-loa ns-as-well.com type sites. In fact, I had to offer to ignore the spamminess of any reported sites in order to get people to send in any real data.
  
  But even though I suspected that this issue affected very few sites, we still wanted to collect feedback to see how big of a problem it was, and to see if we could improve our url canonicalization. So starting a while ago, we offered a way to report "302 hijacking" to Google; I mentioned the method on several webmaster forums. You contact user support and use the keyword "canonicalpage" in your report. Then I created a little mailing list with some engineers on it, and user support passes on emails that meet the criteria to the mailing list.
  
  So how much reports has all this work (including posting multiple times on lots of webmaster boards to request data) gotten me? The last time I checked, it was under 30. Not a million pages. Not even a hundred reports. Under 30. Don't get me wrong, we're still looking at how we can do better: one engineer proposed a way that might help these sites, and he's got a testset of sites that would be affected by changes in how we canonicalized urls. A few of us have been looking through it to see if we can improve things, but please know that this is not a wildfire issue that will result in the web melting down.
  
  As a side note, I'm getting a little tired of debunking the source of this story (NickW at threadwatch). For example, he claimed that Google had removed Greg Duffy from Google's index. When I pointed out that he was making an assertion of fact without evidence, he started out revising the story by sprinkling in words like "appears" and eventually pulled the story at http://www.threadwatch.org/node/1822 off his front page. But given that this is the third link to NickW's site from Slashdot in the last couple weeks, I'm guessing that he's tasted the Slashdot effect and wants more.
  
  Parent Share
  twitter facebook
  - Re:Ugh. This is so not true. (Score:5, Insightful)
    
    by Dynamoo ( 527749 ) writes: on Wednesday March 23, 2005 @12:38PM (#12024915) Homepage
    
    You contact user support and use the keyword "canonicalpage" in your report.. So how much reports has all this work gotten me? The last time I checked, it was under 30
    Well shucks GG, not every webmaster is glued to WMW and other forums.. and even if they did the signal/noise ratio on this topic is so low that you probably couldn't find the information even if you were looking. It's hardly an obvious reporting mechanism. Although posting it on /. should help some, so that's appreciated. Thanks.
    But look - what we have here are a whole bunch of webmasters who have been nuked off the face of the earth by 302 redirects and just don't have the technical knowledge to try and fix it. Mom and Pop stores, hobbyists, nonprofits etc etc. These people are just gonna get pasted.. they'll just be wondering why they don't get any visitors any more.
    This is a HUGELY serious problem - and it's getting worse all the time as more and more people deliberately try to exploit the 302 bug. I've been hit by this bug myself, and let me tell you that unless you know EXACTLY what to look for you'd be stuffed - all you'd see is your traffic flatlining.
    The key issue here - and it's the kind of issue that will really, really hit the headlines when it's exploited is redirection. Sure, I can use a 302 and send Googlebot to the correct page.. so first of all I basically 0wn the content of that page not the publisher. *Then* I insert an exploit into the 302 redirect.. and hey presto, I've 0wned hundreds of thousands if not millions of computers. *That's* going to make unpleasant reading for Google when it hits the headlines - "Use Google and Get Owned". Nasty.
    
    Parent Share
    twitter facebook
  - Kindly extract your head from wherever it is (Score:5, Informative)
    
    by ites ( 600337 ) writes: on Wednesday March 23, 2005 @12:38PM (#12024918) Journal
    
    This story does not need "debunking".
    
    What it needs is a rapid and satisfactory answer or Google will find themselves at the receiving end of more angst than they even know is possible.
    
    A concrete example. My company's web site has been in existence since 1995. So we have pretty good page ranking. Our main page has one phrase, very distinct, unique.
    
    When I search for this phrase (in quotes), Google reports hundreds of matches. These sites (except our own) do not contain the phrase but are sites that sell traffic boosting.
    
    The 302 problem is real.
    
    Incidentally, I just spent 15 minutes at Google.com looking for a way to report the problem. Where is that mention of "canonicalpage"? In the bottom shelf of a filing cabinet, behind a locked door that says "beware of the tiger"?
    
    I'm not surprised you got only 30 reports. What I am surprised at is that you appear to speak for Google yet have such an inane response to what is a real (and for many people, a terrifying) problem.
    
    Parent Share
    twitter facebook
    - Re:Kindly extract your head from wherever it is (Score:3, Informative)
      
      by alphakappa ( 687189 ) writes:
      
      Here's where you can file a report [google.com].
    - - Re:OK, an example (Score:3, Insightful)
        
        by That's Unpossible! ( 722232 ) * writes:
        
        The problem you are describing here is not a 302 hijacking. Those sites don't do any redirecting, and they aren't duplicating your site page causing you to be bumped out of the loop. They just happen to have a link to your site and your "motto" on their page. The fact their page comes up before yours does seem stupid, but is unrelated from the 302 hijacking issue.
        
        Re:OK, an example (Score:4, Informative)
        
        by GoogleGuy ( 754053 ) * writes: on Wednesday March 23, 2005 @03:09PM (#12026865) Homepage
        
        Thanks for the concrete example. As someone else pointed out:
        - for the search imatix [google.com] I see you at number one.
        - for the search "Strategic solutions for a complex world" [google.com] I see you at number one.
        - for the search allinurl:imatix.com [google.com], that search (and it's sister operator inurl:) only look for the words in the url. So it's perfectly fine to show results like "real-imatix.com/" because they contain the word imatix. These results are not hijacking results--this is expected behavior for inurl and allinurl.
        
        Hope this helps,
        GoogleGuy
        
        Parent Share
        twitter facebook
        
        Re:OK, an example (Score:3, Informative)
        
        by GoogleGuy ( 754053 ) * writes:
        
        Just to follow-up, I saw your email come through the queue from user support. The engineer who checked it out basically said "They appear at the top of the results when I do a search. Still, just because their website only has that one phrase on it doesn't guarantee that their site will appear at the top of the results." So this isn't a "302 hijacking," but I hope our user support will reply in addition to my post. :)
  - Re:Ugh. This is so not true. (Score:5, Informative)
    
    by Anonymous Coward writes: on Wednesday March 23, 2005 @12:48PM (#12025073)
    
    But even though I suspected that this issue affected very few sites, we still wanted to collect feedback to see how big of a problem it was, and to see if we could improve our url canonicalization. So starting a while ago, we offered a way to report "302 hijacking" to Google; I mentioned the method on several webmaster forums. You contact user support and use the keyword "canonicalpage" in your report.
    
    I'm sorry, but this is a flat-out lie. If you are the GoogleGuy, then there were 1000+ post threads on WebmasterWorld where people were begging you for input, and you essentially disappeared. I think I might remember seeing one post from you about this "canonicalurl" on a short, almost unrelated thread. You certainly didn't make it clear where to send problem reports, at least not on any of the threads that people were actually reading.
    
    The fact is, this is a huge problem, and has totally fucked a lot of legitimate site rankings. I honestly believe Google was doing everything in their power to ignore the problem up until now, hoping that it was just a figment of people's imagination, or worse, that it would help increase advertising revenue. And now that it's turning out to be a PR disaster for you, you're in damage control mode.
    
    I run one of the sites that was affected by the 302 bug. I sent a message to Google about it, and got a canned response essentially telling me there was nothing wrong. I read through no less than 10 threads on WebmasterWorld about this, many with hundreds or even thousands of posts. I saw maybe, maybe, two or three from GoogleGuy. Where were you? Did you somehow miss those threads that spanned 80+ pages??? Why weren't you posting on those threads about this "canonicalurl" thing.
    
    Luckily there was only one site 302-ing me, and they were doing it by accident and were happy to remove me from their directory. Now I'm back up at the top of the rankings. But I know it's going to be nowhere near as easy for many of the thousands of people who are still affected by this.
    
    Seriously, that you would come on here and try to discredit someone for bringing attention to a very big problem with Google is pretty distasteful. To me it indicates either a cover-up or having your head buried firmly in the sand. Either way, it doesn't bode well for the future of Google. Instead of flaming people now that the problem is getting mainstream press, why not try and actually fix things.
    
    Parent Share
    twitter facebook
    - - Re:Ugh. This is so not true. (Score:3, Insightful)
        
        by 1u3hr ( 530656 ) writes:
        
        I wish I had mod points for you. If this was MS, everyone here would be screaming bloody murder. Instead GoogleGuy gets moded +5 Informative
        It's EXTREMELY informative, because it tells you what Google's offical position is. Whether you like it or not, you need to know that. "Informative" doesn't mean "good".
        If Bill Gates posted here in defence of some MS policy, it would hopefully similarly be modded "informative".
  - You got an email from me! (Score:3, Informative)
    
    by pastepotpete ( 870178 ) writes:
    
    And I know two other people who sent one. Maybe you should check again? I doubt me and my mates account for 10% of your responses. If you believe that the people affected by this are all "spammers" then perhaps the problem is false positives for your spam detection filters. In fact you should probably take a look at your spam detection filters anyway. Last time I checked--probably much more recently than you checked for canonicalpage emails, there was a bunch of scraper sites running AdSense where good re
  - Re:Ugh. This is so not true. (Score:5, Interesting)
    
    by metamatic ( 202216 ) writes: on Wednesday March 23, 2005 @01:21PM (#12025496) Homepage Journal
    
    Frankly, I'd like to see Google start blocking content-free traffic-boosting sites from the page results entirely.
    
    Google has login accounts, so let logged-in users have a link saying "report spam site". Track who files the most reliable reports, and if a few of those people all agree that a site is spam, nuke its pagerank.
    
    See how OpenRatings does reliability calculations for more info. Or buy them :-)
    
    Parent Share
    twitter facebook
    - Re:Ugh. This is so not true. (Score:4, Insightful)
      
      by glesga_kiss ( 596639 ) writes: on Wednesday March 23, 2005 @08:15PM (#12030486)
      
      Google has login accounts, so let logged-in users have a link saying "report spam site".
      As an alternative, I'd love a cookie based version of this that you could click "ignore all results from this domain". After a couple of weeks you'd get rid of most of them on your personal browser. Make the lists sharable even. All the pagerank wannabies can do is start from scratch with new URLs.
      
      Parent Share
      twitter facebook
  - OK, I'll bite ... (Score:4, Insightful)
    
    by isometrick ( 817436 ) writes: on Wednesday March 23, 2005 @01:33PM (#12025636)
    
    Look, there *was* circumstancial evidence for the "Greg Duffy" thing ... i.e. just enough to make it a discussion. I agree that fearmongering is not the way to go. I appreciate that you looked into the issue (and my first instinct is to trust your explanation, that is was a DNS issue).
    
    However, if this is Google's PR method, I think you are kind of asking for it! In the absence of information, the internet community will speculate until the cows come home. I'm not saying it's right, I'm just saying that's reality. Even though I said on my site that I thought Google didn't do anything underhanded I bet a lot of people were still not convinced. Google can do a little better than this, and although you have been fairly nice to me (thanks) this response is a little flamebaity for PR. Please understand that I mean no offense, it's just constructive criticism. Even if everything you say is true, a representative of the company should always at least attempt to sugar coat something like your last paragraph.
    
    Also, on a more personal note, maybe Google should embrace the people that are involved [clsc.net] in researching [gregduffy.com] these problems instead of using this broken communications policy. I know that in my case I contacted you guys 5 *months* ago about the Google Print problem I described and never got any followup except for my t-shirt (which I really like). I have some great ideas about possible solutions to the problem I described, and as far as I can see Google has not fixed the root of the problem. When are you guys going to contact me?
    
    -Greg Duffy
    
    Parent Share
    twitter facebook
  - - Re:Ugh. This is so not true. (Score:4, Informative)
      
      by GoogleGuy ( 754053 ) * writes: on Wednesday March 23, 2005 @03:52PM (#12027358) Homepage
      
      One example is http://www.doi.org/, because people want to have a persistent url like dx.doi.org/10.1226/1588290972, but then be able to have that url do a 302 redirect to a destination page like http://doi.contentdirections.com/mr/humana.jsp?doi =10.1226/1588290972 The destination urls might change, so it's handy to have a persistent digital object identifier on doi.org.
      
      Parent Share
      twitter facebook
Robot.txt (Score:3, Insightful)

by superpulpsicle ( 533373 ) writes: on Wednesday March 23, 2005 @11:40AM (#12024068)

I am really extremely entirely confused about the article altogether. Is the hijacking more or less about Google digging into your site even when your robot.txt crawler robot is refusing google entrance?

Share
twitter facebook
- Re:Robot.txt (Score:5, Informative)
  
  by wizbit ( 122290 ) writes: on Wednesday March 23, 2005 @11:44AM (#12024144)
  
  No, it means Google has indexed a page that appears (to googlebot) to contain something legitimate, and visiting the actual page by clicking the link silently redirects you to an illegitimate site (usually phish/scam copy of same, etc).
  
  Parent Share
  twitter facebook
  - No, it's not about redirecting the user... (Score:5, Informative)
    
    by ites ( 600337 ) writes: on Wednesday March 23, 2005 @12:49PM (#12025096) Journal
    
    It's about pushing unrelated sites up in the rankings.
    
    For instance: I have a site with excellent page ranking. Now a new site will set up, and do a 302 to my site. Google now gives this new site my page ranking. When the new site is indexed, it removes the 302 redirection.
    
    When you search for my site, you now find these new sites instead. There is no redirection when you click on a link, the the "cached text" that Google shows is wrong.
    
    Basically this technique allows people to get high page rankings without earning them. It's very widespread - I counted over 60 such parasites for my company's web site (which has excellent page ranking).
    
    Parent Share
    twitter facebook
- Re:Robot.txt (Score:5, Informative)
  
  by pluggo ( 98988 ) writes: on Wednesday March 23, 2005 @11:46AM (#12024183) Homepage
  
  There was an article a little while back on /. that talked about this exploit.
  
  Site A can return a 302 HTTP redirect to site B when Googlebot crawls their site. The googlebot will then index site B as site A. Site A could have no affiliation whatsoever with Site B; people could be clicking on SesameStreet.com and get AsianHookers.com, etc.
  
  I do think the figure of millions of pages being hijacked is a little steep, though.
  
  Parent Share
  twitter facebook
  - Re:Robot.txt (Score:5, Insightful)
    
    by PornMaster ( 749461 ) writes: on Wednesday March 23, 2005 @11:49AM (#12024219) Homepage
    
    I do think the figure of millions of pages being hijacked is a little steep, though.
    
    Why? It can be completely automated. A million is no harder than four.
    
    Parent Share
    twitter facebook
    - - Re:Robot.txt (Score:5, Informative)
        
        by PornMaster ( 749461 ) writes: on Wednesday March 23, 2005 @12:43PM (#12024992) Homepage
        
        A million may be no harder than four to hijack, but a million dummy sites that would actually fool people is much harder than four.
        
        This isn't about fooling people, it's about fooling a flawed technology to get false listings in the search engine results pages. It's about getting a lot of traffic. Yes, some people will be really pissed off when they get redirected to an affiliate program or something of the sort, but some small percentage of people will buy. If the cost to bring in a million visitors is miniscule because you're stealing search engine placement, and you get 50 people to sign up to something that pays you $50 a person, then you're up $2500 minus your hosting costs.
        
        $2500 to someone in Malaysia is a lot of dough for a little coding... they could work for $200/mo in some kind of outsourcing plan or make a year's wages in their spare time. What do you think they're going to do?
        
        Parent Share
        twitter facebook
  - Re:Robot.txt (Score:5, Funny)
    
    by catalina ( 213767 ) writes: <jmattclark&gmail,com> on Wednesday March 23, 2005 @12:13PM (#12024582) Homepage Journal
    
    .....and get AsianHookers.com, etc.
    
    couldn't you have made that a link so I can just click on it?
    
    Parent Share
    twitter facebook
  - Re:Robot.txt (Score:2)
    
    by bill_mcgonigle ( 4333 ) * writes:
    
    Site A can return a 302 HTTP redirect to site B when Googlebot crawls their site. The googlebot will then index site B as site A. Site A could have no affiliation whatsoever with Site B; people could be clicking on SesameStreet.com and get AsianHookers.com, etc.
    
    Isn't the fix then to provide preference to the real URL over 'copies' when culling duplicate data and/or pageranking the results? This seems easy, so the problem must be that Google isn't storing HTTP response codes with their page indexes such
    - Re:Robot.txt (Score:5, Informative)
      
      by arkanes ( 521690 ) writes: <arkanes@gm[ ].com ['ail' in gap]> on Wednesday March 23, 2005 @12:25PM (#12024763) Homepage
      
      One problem is that people use 302s when they should be using 301s, like directory sites. No doubt this is because they want to get referral counts up.
      A 302 is a "temporary redirect". Basically, it says that the content normally lives at the URL you requested but that, just this once, you should look at this other URL for the content. Googles response to a 302 is actually very reasonable. I suppose the best thing they could do is just not follow 302s.
      A 301 is a permanent redirect, indicating that the page isn't at the original URL and that all future requests should be made to the new one. I don't know what Googlebot does in this case but I assume it discards the original URL, which is what the standard recommends.
      
      Parent Share
      twitter facebook
  - - Re:Robot.txt (Score:2)
      
      by mopslik ( 688435 ) writes:
      
      So if some phisher has access to put a redirect on sesamestreet.com, he could simply upload the content of asianhookers.com
      
      My understanding is that it doesn't work this way at all. I believe what happens is that the hijacker sets up a page/site that redirects to your own site. Google then crawls the link, and erroneously indexes the content from your page with the URL of the redirecting page. From there, it's trivial to change the redirect on the fake page to someplace else, and maintain the appearan
      - Re:Robot.txt (Score:5, Informative)
        
        by ReverendLoki ( 663861 ) writes: on Wednesday March 23, 2005 @01:54PM (#12025890)
        
        The key is that they are using a 302 redirect, which is used to signify that the redirect is temporary only. In a completely honest and trustworthy Internet, this is used to indicate that for whatever reason (HW failure, slashdotting, etc), the requested pages were temporarily unavailable on the main site and were being hosted elsewhere until the issue can be resolved. This is telling Google et al that the content being redirected to (Sesame Street, for example) is normally hosted on the redirecting site (Asianhookers). From then on, whenever Google returns the result of the Sesame Street pae, it is listed with the URL pointing to the Asianhookers page. It does this under the assumption that once the issue requiring the redirect is resolved, people will want to go to the "original" page, and will still be redirected to the content in the meantime.
        Aside from a filter on Google's end to resolve this, it would be nice if the practice of using 302 redirects also included a means of confirmation of the setup on the site being redirected to. If the site actually hosting the data does not in some way confirm the redirection, either through a tag in the header of the html, or perhaps in a third, predictably place file (much like a robots.txt file). Of course, this would first require te standard to be rewritten, and then would require people to actually abide by it.
        
        Parent Share
        twitter facebook
    - Re:Robot.txt (Score:3, Informative)
      
      by AssHatAnonymous ( 869725 ) writes:
      
      No, it's the other way around. Someone has access to asianhookers.com and redirects to sesamestreet.com. When googlebot then correlates asianhookers.com with sesamestreet.com and depending on some unknown formula decides which domain is the actual owner of the page. So that if the formula decides that asianhookers.com "owns" the pages on sesamestreet.com (because of the redirect) then when google is building their links they print, in the text of the page, sesamestreet.com, but in the html of the page they
    - - Re:RTFA (Score:5, Insightful)
        
        by Zeinfeld ( 263942 ) writes: on Wednesday March 23, 2005 @12:27PM (#12024794) Homepage
        
        Read the fucking article - you don't have to have any access to the victim site to do this - you only need to have a higher pagerank than them.
        The article is confused and baddly written. It does not explain the exploit being used ever. So stop dumping on people. It is not at all surprising that people don't get what is going on when the description is crud.
        What is really going on has nothing to do with 302, or at least very little. What these people are doing is to set up fake web sites using content filched from genuine Web sites. This allows (or is beleived to allow) them to climb the google rankings.
        I don't see why someone would use a 302 response when they can just copy the entire content unless there is some sort of bug in Google's pagerank that is not being explained. Copying the entire content is much simpler.
        So what the attacker does is to set up their site so that when the googlebot comes round it publishes some legitimate content, then when other folk follow the site from a google search they get pages infested with spyware or the like.
        This would certainly explain the number of times I have done a Google search and ended up at an idiotic 'search site' that does nothing for me.
        
        Parent Share
        twitter facebook
        
        Re:RTFA (Score:5, Informative)
        
        by mla_anderson ( 578539 ) writes: on Wednesday March 23, 2005 @01:09PM (#12025347) Homepage
        
        No, the way it works is with the 302, but only for the googlebot.
        
        Googlebot goes to scammer's site
        
        Googlebot is given a 302 (redirect) to the victim's site
        
        Googlebot indexes the victim's site as belonging to the original URL
        
        Googlebot goes to the victim's site
        
        Googlebot realizes this URL is already indexed and "belongs" (according to the Google code) to the scammer.
        
        The victim's site get's lower rankings as the page is not even indexed, the scammer's site gets a higher ranking.
        
        For this to work the scammer has to give the 302 only to the googlebot, all other browsers need to get the content of the scammer's page. If you google for "cheapest car insurance" (IIRC) you can find an example of this. Change your User Agent accordingly and click on the top Google link, you'll end up at another site. Change back to Mozilla and you'll get the scammer's site.
        
        Parent Share
        twitter facebook
        
        Re:RTFA (Score:3, Interesting)
        
        by Zeinfeld ( 263942 ) writes:
        
        My apologies, but the details of this exploit were linked-to in a previous article as well as this one, and you can't move for explanations of how it works.
        If I find both articles confused and confusing then it is a bit much to expect other people to follow them, I am listed as an original contributor to the design of HTTP.
        The real problem here is not the 302, its a bug in the googlebot. fortunately a realtively easy one to fix. When googlebot sees a 302 redirect to a page it treats the actual page and
- Re:Robot.txt (Score:5, Informative)
  
  by northcat ( 827059 ) writes: on Wednesday March 23, 2005 @01:48PM (#12025827) Journal
  
  This is more like one site hijacking the ranking of another site. Suppose you're Ferrari and I'm the hijacker. You have ferrari.com and I have irule.com. Since you're ferrari.com you get very high rankings when people search for "ferrari" on Google. You're probably the first site displayed. And in the results page on Google, it displays a summary probably like "the official home page of ferrari cars". On my website I set up a 302 redirect to your website. It means, when someone visits my irule.com, they get redirected to ferrari.com. I don't do anything to your website, I don't have access to your website. I hope you know that Google indexes web pages by visiting those webpages with the user agent string "googlebot" and, of course, Google's IPs which are known to people. When Google sees that my page is 302 redirecting to ferrari.com, for certain reasons, it replaces ferrari.com in its index with irule.com. So when someone searches for "ferrari" the get irule.com as the first result instead of ferrari.com, and the summary still says "the official home page of ferrari cars". Now, I only 302 redirect irule.com to ferrari.com when googlebot visits my page. When anyone else visits irule.com, I give them something else, probably lots of ads, or I redirect them to some other site like LotsOfSmut.com. So I'm "hijacking" any references to ferrari.com on Google and its ranking. And when someone searches for "cars", instead of ferrari.com as the ninth result, irule.com is displayed. So... I profit (you do the math).
  
  (Sorry for dumbing down my post so much, too much experience explaining things to my grand mother)
  
  Parent Share
  twitter facebook
I've had it with Google! (Score:5, Funny)

by Trolling4Columbine ( 679367 ) writes: on Wednesday March 23, 2005 @11:41AM (#12024080)

This is the last straw! I'm going back to MSN, where I know that my data and privacy are being protected!!

*duck*

Share
twitter facebook
- Google Cookie last until 2038! (Score:2)
  
  by NoSuchGuy ( 308510 ) writes:
  
  You are right, MSN only sets cookies that last the lifetime of their current OS.
  - Re:Google Cookie last until 2038! (Score:5, Funny)
    
    by Oxy the moron ( 770724 ) writes: on Wednesday March 23, 2005 @12:11PM (#12024544)
    
    Considering the timespan between Windows re-formats/re-installations, that isn't really all that unreasonable...
    
    Parent Share
    twitter facebook
- Re:I've had it with Google! (Score:2, Informative)
  
  by 33degrees ( 683256 ) writes:
  
  I know you were trying to make a joke, but if you'd RTFA you would know that MSN is as susceptible to this as Google is. Only Yahoo has addressed the issue.
- - How and when Yahoo fixed it (Score:3, Informative)
    
    by clsc ( 730336 ) writes:
    
    Sorry for not writing this in the article - it's pretty long already and you just have to cut somewhere, but here goes:
    
    Yahoo was exactly as vulnerable as the rest of the search engines. In fact this problem was pretty bad with Yahoo at one point. What Yahoo did was simply to fix it by implementing some internal rules about how to interpret redirects.
    I believe it was fixed around June 2004 - at that time the problem had already been known (and aboused) for a long time, but use was not widespread yet. The
Easy to prosecute, hmmm? (Score:5, Interesting)

by r00t ( 33219 ) writes: on Wednesday March 23, 2005 @11:41AM (#12024090) Journal

Google has the records, and probably the original
site exists with behavior dependent on browser name
being GoogleBot or not. The replacement site will
generally have some way of making money, which can
be tracked via financial transactions.

Share
twitter facebook
- Re:Easy to prosecute, hmmm? (Score:5, Insightful)
  
  by jridley ( 9305 ) writes: on Wednesday March 23, 2005 @11:59AM (#12024379)
  
  Prosecute for what? Is there a law against redirecting web pages? I think this would be a pretty difficult prosecution. Google's going to have to take technical steps on this one.
  
  Parent Share
  twitter facebook
  - - Re:fraud, copyright, phishing, decency laws (Score:3, Insightful)
      
      by gl4ss ( 559668 ) writes:
      
      and that prosecuter has to get pretty imaginative to get jurisdiction over the people in some countries.
      
      prosecution can't fix this problem.
Law of the Internet (Score:5, Insightful)

by Cytlid ( 95255 ) writes: on Wednesday March 23, 2005 @11:43AM (#12024124)

For every Good Thing, there are at least 100 different ways to abuse it.

Share
twitter facebook
302 (Score:5, Informative)

by auralrothko ( 836578 ) writes: on Wednesday March 23, 2005 @11:43AM (#12024135)

I wasn't sure what a 302 hijack was, so here's the obligatory lowdown for those who didn't rtfa (from article linked page) This exploit allows any webmaster to have his own "virtual pages" rank for terms that pages belonging to another webmaster used to rank for. Successfully employed, this technique will allow the offending webmaster ("the hijacker") to displace the pages of the "target" in the Search Engine Results Pages ("SERPS"), and hence (a) cause search engine traffic to the target website to vanish, and/or (b) further redirect traffic to any other page of choice.

Share
twitter facebook
- Re:302 (Score:5, Informative)
  
  by SassyDave ( 557868 ) writes: on Wednesday March 23, 2005 @11:52AM (#12024264) Homepage
  
  For the full details of the exploit, TFA [clsc.net] gives a pretty decent recipe:
  
  The technical part: How it is done Here is the full recipe with every step outlined. It's extremely simplified to benefit non-tech readers, and hence not 100% accurate in the finer details, but even though I really have tried to keep it simple you may want to read it twice: 1. Googlebot (the "web spider" that Google uses to harvest pages) visits a page with a redirect script. In this example it is a link that redirects to another page using a click tracker script, but it need not be so. That page is the "hijacking" page, or "offending" page. 2. This click tracker script issues a server response code "302 Found" when the link is clicked. This response code is the important part; it does not need to be caused by a click tracker script. Most webmaster tools use this response code per default, as it is standard in both ASP and PHP. 3. Googlebot indexes the content and makes a list of the links on the hijacker page (including one or more links that are really a redirect script) 4. All the links on the hijacker page are sent to a database for storage until another Googlebot is ready to spider them. At this point the connection breaks between your site and the hijacker page, so you (as webmaster) can do nothing about the following: 5. Some other Googlebot tries one of these links - this one happens to be the redirect script (Google has thousands of spiders, all are called "Googlebot") 6. It receives a "302 Found" status code and goes "yummy, here's a nice new page for me" 7. It then receives a "Location: www.your-domain.tld" header and hurries to your page to get the content. 8. It heads straight to your page without telling your server on what page it found the link it used to get there (as, obviously, it doesn't know - another Googlebot fetched it) 9. It has the URL of the redirect script (which is the link it was given, not the page that link was on), so now it indexes your content as belonging to that URL. 10. It deliberately chooses to keep the redirect URL, as the redirect script has just told it that the new location (That is: The target URL, or your web page) is just a temporary location for the content. That's what 302 means: Temporary location for content. 11. Bingo, a brand new page is created (never mind that it does not exist IRL, to Googlebot it does) 12. Some other Googlebot finds your page at your right URL and indexes it. 13. When both pages arrive at the reception of the "index" they are spotted by the "duplicate filter" as it is discovered that they are identical. 14. The "duplicate filter" doesn't know that one of these pages is not a page but just a link (to a script). It has two URLs and identical content, so this is a piece of cake: Let the best page win. The other disappears. 15. Optional: For mischievous webmasters only: For any other visitor than "Googlebot", make the redirect script point to any other page free of choice.
  
  Parent Share
  twitter facebook
  - Re:302 (Score:2)
    
    by Qzukk ( 229616 ) writes:
    
    So let me get this straight... If I have www.crappywebsite.com and I want to pump up its pagerank, all I need to do is have "www.crappywebsite.com" redirect googlebot to www.cnn.com, and suddenly www.crappywebsite.com is a font of highly-ranked information?
    
    The REAL answer would be to have google not index redirects (which is pretty stupid, all things considered. Why link searchers to the "wrong" URL, instead of the destination URL of the redirect?)
  - Re-re-explained (Score:5, Informative)
    
    by fizbin ( 2046 ) writes: <martinNO@SPAMsnowplow.org> on Wednesday March 23, 2005 @01:03PM (#12025291) Homepage
    
    Okay, so basically this is the problem: when Google encounters a status 302 redirection (as opposed to the status 301 redirection) it then indexes the content as belonging to the initial URL, not the URL at the end result of the 302 redirection. Other things happen later because of google's design.
    
    302 redirections are temporary redirections - the idea is that a 302 is supposed to be used when someone needs to be redirected to a new page, but should still use the original URL if they want to come back later. As an example, the page http://purl.oclc.org/OCLC/PURL/CONTRIBUTORS [oclc.org] performs a 302 redirect to http://purl.oclc.org/docs/contributors.html [oclc.org]. This means that although your web browser needs to go to some other URL for the content at the moment, they really should remember the first url as the permanent one.
    
    Contrast this with what happens when your browser visits http://snowplow.org/martin [snowplow.org] - you get sent a 301 redirect to http://snowplow.org/martin/ [snowplow.org]. (Note the extra slash) In this case, the server is saying "the url with the slash on the end is the real location, and you should not try to come back here without the final slash in the future."
    
    Ideally, if every web browser behaved according to spec., bookmarks (remember bookmarks?) would get automatically updated to the new URL when you selected them and the redirect was a 301 redirect. However, for a 302 redirect, the bookmark would stay as is.
    
    302 redirects can be very useful when you want to set up a hierarchy of "logical" URLs that will permanently point to the correct location. 301 redirects are useful when you're obsoleting an old URL and wish people to go and use the new URL from now on.
    
    Okay, so how does this relate to google? Well, let's suppose that you have a great site on fruitbats. I can set up http://www.example.com/topics/fruitbats to be a 302-style redirect to your site, essentially saying "The information at http://www.example.com/topics/fruitbats is temporarily being hosted by http://www.yoursite.com/". Now, google when it spiders pages will see that, will go retrieve the text from your page and will then index it under http://www.example.com/topics/fruitbat, since after all I just gave a temporary (302) redirect.
    
    But it gets worse, because a final part of google's indexing process is to compare pages for identical text, and throw out all but one of the URLs. Apparently this stage has nothing to go on other than the text and the recorded URLs, and so your URL stands a fifty-fifty chance of being thrown out.
    
    Except that I've not just redirected http://www.example.com/topics/fruitbats to your site, but also http://www.example.com/topics/fruitbat, http://www.example.com/topics/fruit_bat, and http://www.example.com/topics/fruit_bats. Now your lone URL doesn't stand much of a chance of being the one kept by the "throw out duplicates" processor, does it?
    
    In a sense, of course, there's little google can do to prevent this, because even if they weighted 302-redirects lower in their "throw out duplicates" stage, I could always just go snag a copy of your website each time googlebot visits, in essence doing the redirection myself. (How? Just search the apache mod_rewrite guide [apache.org] for "Dynamic Mirror") However, doing it through 302 redircts means that google pays for the bandwidth to go get your page, not me. (Not that this is necessarily a signficant amount of bandwidth, since we're only talking about basic google here and not images. Depending on the revenue you get by misdirecting google queries it might be economical)
    
    Of course, for this to really work, I'd need a list of websites sorted by category to build up my redirect db. But wait! The ODP feed provides exactly that.
    
    I am a little bit wary of doi
    Read the rest of this comment...
    
    Parent Share
    twitter facebook
    - - Re:Re-re-explained (Score:3, Insightful)
        
        by yulek ( 202118 ) writes:
        
        If, for example, I use redirects to distribute traffic between multiple servers on multiple hosts, the GoogleBot's behaviour of treating the redirecting host as the website's canonical host is correct. I want users to use the referring host so that I can change physical hosts with impunity.
        
        well, a bunch of people have suggested that 302s should only be honored by crawlers if the domain is the same. i think that's a pretty good idea.
        
        It's not Google that's broken--it's the web. It's just that the two-leg
  - Re:302 (Score:5, Interesting)
    
    by Ryan Stortz ( 598060 ) writes: <ryan0rz.gmail@com> on Wednesday March 23, 2005 @02:33PM (#12026444)
    
    I think a resonable solution to this would be for Google to send a second spider to the site for every 302 Redirect they find, with a user-agent indicating its IE or any other browser. Then compare the data.
    
    Although, they could probably still figure out it's google by their IP, but it's a step in the right direction.
    
    Parent Share
    twitter facebook
- Re:302 (Score:2, Interesting)
  
  by ari_j ( 90255 ) writes:
  
  I'm still not seeing any explanation of how it works, only what happens when it does work.
  - Re:302 (Score:5, Informative)
    
    by StrongAxe ( 713301 ) writes: on Wednesday March 23, 2005 @12:19PM (#12024665)
    
    I'm still not seeing any explanation of how it works, only what happens when it does work. 1. Phisher creates (say) cïtïcorp.com and makes the home page redirect to the real citicorp.com page. 2. Googlebot browses cïtïcorp.com and gets a redirect to the real citicorp.com, and indexes its contents 3. User does a Google search looking for Citicorp, and finds cïtïcorp.com page that appears to contain the valid data (and it might be the only such page, if the legitimate page gets removed through the duplicate-removal process) 4. User clicks through to cïtïcorp.com expecting to see the valid web page 5. Phisher's server sees that the request is not from a Googlebot, so it serves up a fake page rather than redirecting to the legitimate real one. 6. User believes he is at the real citicorp.com web site, when he is in fact at the bogus cïtïcorp.com website, legitimized by Google. 7. Identity theft. 8. Profit. (OB. Slashdot joke.)
    
    Parent Share
    twitter facebook
    - Re:302 (Score:3, Insightful)
      
      by ari_j ( 90255 ) writes:
      
      Thanks. And remember, identitiy theft is not a joke, unless you steal the identity of a clown.
- Re:302 (Score:2, Informative)
  
  by windowpain ( 211052 ) writes:
  
  Thanks. Both the /. article and the linked story were utterly uninformative. Sometimes it seems that a lot techies disdain even the merest explanation as baby talk. Even when you're addressing a largely technical audience a little explanation helps because not everybody knows every technical detail about an entire field.
- But what's the point? (Score:2)
  
  by hawk ( 1151 ) writes:
  
  To continue having the victim's hits redirected, the redirect needs to stay in place, doesn't it?
  
  What in the world does the hijacker gain by having google point him, only to then load the victim's page?
  
  hawk
  - Re:But what's the point? (Score:4, Informative)
    
    by micromoog ( 206608 ) writes: on Wednesday March 23, 2005 @12:28PM (#12024802)
    
    The hijacker's script watches to see who's coming. If it's googlebot, redirect. If it's an actual user, do [insidious thing].
    
    Parent Share
    twitter facebook
301 redirects (Score:3, Interesting)

by Anonymous Coward writes: on Wednesday March 23, 2005 @11:45AM (#12024158)

A few months ago, I rearranged my website. To make sure people could still find things, I put 301 redirects on all the old pages that I moved.

I noticed in my logs that search engines have repeatedly requested the 301 pages, but often don't follow the links to the new pages. And when searched with google, the pages still show up with the old urls. Should I be using 302 redirects instead?

Share
twitter facebook
- - Wrong (Score:5, Informative)
    
    by PornMaster ( 749461 ) writes: on Wednesday March 23, 2005 @12:09PM (#12024526) Homepage
    
    301 is a permanent redirect, 302 temporary.
    
    This is why the "302 hack" works. If the redirect is only supposed to be temporary, the search engine keeps the URL of the 302 as the URL for the document, but indexes the content of the page to which the redirect is directed.
    
    301 is what you should be using to point the SEs to your new pages if you've moved them. The behavior is supposed to be for the SEs to replace the old URL in their index with the new one, and furthermore count all links to the 301ed URL as being towards the new one. I don't know why it's not working for the grandparent poster, but it's the way that the functionality is "advertised" for Google and Yahoo, and it should work.
    
    Parent Share
    twitter facebook
Why? (Score:2, Insightful)

by dep01 ( 730107 ) writes:

Why is it seemingly man's mission to "bring down" something that seems to provide such a great service for everyone?
"Oh! Look! Something beautiful! Something impressive! I must destroy it!"
pah. feeling jaded today, i guess.
- Re:Why? (Score:2, Insightful)
  
  by a16 ( 783096 ) writes:
  
  In this case, it's more a case of "I must make money from it".
  
  The people using this exploit to get fake listings (just like all of the spam pages we see in search engines) aren't doing it for the fun of it.
- Re:Why? (Score:2)
  
  by xsbellx ( 94649 ) writes:
  
  Well, the obvious answer can be parphrased from Dune, "The ultimate control of something is the ability to destroy it". The more subtile answer deals with our species desire for "more".
  
  In a far off time, the Internet was a wonderfull place devoid of such mundane things as commerce. Now, fastforwarding a few years to the present, people are making significant sums of money off of the internet selling "products". One of the best to get somebody to buy something is to make them aware of a "need" they have
Do what I'm going to do... (Score:4, Insightful)

by Not_Wiggins ( 686627 ) writes: on Wednesday March 23, 2005 @11:46AM (#12024181) Journal

buy GOOG on the dip as many non-techie investors panic sell. 8)

Share
twitter facebook
- Re:Do what I'm going to do... (Score:3, Funny)
  
  by ceejayoz ( 567949 ) writes:
  
  Yeah, 'cause the non-techie investors read Slashdot...
- Re:Do what I'm going to do... (Score:2)
  
  by 2short ( 466733 ) writes:
  
  Right, as long as Google is priced right now, and not insanely overblown. I don't have any idea what their stock (or more to the point their price/earnings) is at; but I know which way I'd bet.
  
  Free investment tip: Avoid buying stock in any company if an unsophisticated investor, for reasons unrelated to profitability, would think that company is Way Cool.
  
  It appears Google has a sound business plan and competent management. Which probably justifies some particular, perfectly healthy stock price. But I'
Web presence pressure (Score:5, Insightful)

by gitana ( 756955 ) writes: on Wednesday March 23, 2005 @11:47AM (#12024195) Homepage

As web presence -defined as within about the first 10-20 results of a search- becomes more and more important to "success," black hat techniques such as this, to eliminate competitors, will become more and more common. Google, or any other search tool needs to be able to stay above the fray and not be subject to hacks such as this.

Share
twitter facebook
- Re:Web presence pressure (Score:3, Insightful)
  
  by filmmaker ( 850359 ) * writes:
  
  Exactly. And if they'd just stop giving PageRank credit to the redirect destination, it'd all be over. In fact, the algorithm should check to see what the link density is between to disparate domains if it's going to even cache 302'ed content. Because in these scam cases, the perpetrator never has an inbound link from the victim domain and Google could "grade" this relationship as being very one-sided and not generally very trustworthy. The more interlinkages, the more trust. But assigning Pagerank on 302's
Gopher (Score:5, Funny)

by one_i_blind ( 613513 ) writes: on Wednesday March 23, 2005 @11:48AM (#12024199) Homepage

This is why Gopher will always be better than your feable world wide web junk.

Share
twitter facebook
- Re:Gopher (Score:5, Funny)
  
  by ari_j ( 90255 ) writes: on Wednesday March 23, 2005 @11:51AM (#12024236)
  
  Dude - the single biggest difference between Gopher and the web is that Gopher contains far fewer spelling errors. I hear that there are differences regarding interactivity, graphics, layout, and so forth; but those are all immaterial.
  
  Parent Share
  twitter facebook
- Re:Gopher (Score:2)
  
  by ajs ( 35943 ) writes:
  
  Gopher is part of the World Wide Web, as are several other protocols that pre-date the Web. You meant to say, "This is why Gopher will always be better than an HTTP server."
  
  The World Wide Web is the meta-index of (mostly) Internet-accessible content which can be addressed by URI (almost always more specifically by URL).
  
  Since Gopher can be addressed via the URI scheme, "gopher", it's part of the Web.
- - Re:Gopher (Score:3, Interesting)
    
    by ari_j ( 90255 ) writes:
    
    IE doesn't support gopher:// URLs any longer, so assume that demand for Gopher would drive market share of Firefox et al. The problem is driving the demand for Gopher when IE doesn't support it.
    - Re:Gopher (Score:2)
      
      by John Hasler ( 414242 ) writes:
      
      > IE doesn't support gopher:// URLs any longer...
      
      An excellent reason to use Gopher.
Wait... (Score:5, Funny)

by dark-br ( 473115 ) writes: on Wednesday March 23, 2005 @11:51AM (#12024235) Homepage

Damn Google!!! Do you mean this is not www.kuro5hin.org ??

Share
twitter facebook
The super-slashdotting (Score:5, Funny)

by kunkie ( 859716 ) writes: on Wednesday March 23, 2005 @11:54AM (#12024304)

I can imagine it now... The slashdotting to end all slashdots. If every site in google was 302 redirected to RIAA.com How amazing would that be...

Share
twitter facebook
- Re:The super-slashdotting (Score:3, Insightful)
  
  by Nessak ( 9218 ) writes:
  
  I think that is the RIAA wet dream -- to have every web page point to it. Don't they belive the only way to save music is to kill the web?
How to check if your site is being hijacked... (Score:5, Informative)

by ites ( 600337 ) writes: on Wednesday March 23, 2005 @11:59AM (#12024370) Journal

1. search Google [slashdot.org] for 'allinurl:', e.g. 'allinurl:slashdot.org'.

2. copy and paste any dubious URLS into this tool [thinkhost.com] and check whether they're using 302 redirects or not.

3. Panic! /me notices that my company's web site has been thusly hijacked... and yes! Doing a Google search on the main text on my company's web site shows dozens of unrelated sites high in the ranking. None of these actually have the text on their pages.

One example: http://www.tradedoubler.it.

Luckily, the phrase in question is complete gibberish and no-one ever finds our site through Google, only by reputation and word of mouth.

Still, I think it's clear Google have a serious problem here...

Share
twitter facebook
- And how to report this to Google... (Score:3, Interesting)
  
  by ites ( 600337 ) writes:
  
  Email to webmaster@google.com with the keyword "canonicalpage".
  
  Google are not taking this problem seriously.
  
  I'd suggest that if your website is affected, you send an email as above.
Good explanation about 302 hijacking (Score:5, Informative)

by angio ( 33504 ) writes: on Wednesday March 23, 2005 @12:02PM (#12024408) Homepage

Someone posted a nice explanation of the phenomenon [webmasterworld.com] at webmasterworld.com.
302 hijacks work because Google goes to http://bad.site/ and gets redirected to http://good.site/. It then treats the contents of the bad.site as identical to that of good.site. The effect seems similar to if somebody simply copied an entire page off of your site (I'm not sure if it's actually more serious than this), but it's easier to do because you're just keeping a small table of redirections.
How serious is it? Don't know. It's pretty easy for a webmaster to check for hijacking and have her pages de-hijacked (see aforementioned article). It's probably not as screamingly awful as the threadwatch.org article suggests, but the redirector sites are rather annoying. Several of the comments in the webmaster article suggest that Google has already started moving on the problem.

Share
twitter facebook
- Re:Good explanation about 302 hijacking (Score:3, Informative)
  
  by Col. Klink (retired) ( 11632 ) writes:
  
  > The effect seems similar to if somebody simply copied an entire page off of your site (I'm not sure if it's actually more serious than this), but it's easier to do because you're just keeping a small table of redirections.
  
  The key here is that only googlebot is redirected. If you simply copied someone else's site, everyone would still get the info they were looking for. However, if you only redirect googlebot, you can redirect others to whatever you want.
- Comment removed (Score:4, Informative)
  
  by account_deleted ( 4530225 ) writes: on Wednesday March 23, 2005 @01:23PM (#12025517)
  
  Comment removed based on user account deletion
  
  Parent Share
  twitter facebook
Not a surprise (Score:5, Interesting)

by faust2097 ( 137829 ) writes: on Wednesday March 23, 2005 @12:04PM (#12024439)

For at least the last 18-24 months it's been increasingly difficult to find non-spam/redirect/affiliate program links for a search on any popular consumer product on Google. Maybe they have too much faith in their current PageRank and think it needs to be tweaked instead of overhauled. Maybe they think they have enough momentum and don't care. They certainly should have the talent and resources to do something about this and it's kind of sad that they haven't. I predict we'll see another whizzy side project in a few months instead.

The thing is that all they have to do is keep it just good enough that people won't leave. Remember, AdWords is Google's product, everything else [gmail, orkut, etc] they've got is just a way to show you those ads. Google's success is entirely because they had clearly better search results than anyone else. If another company can clearly best them then Google may be in trouble.

Share
twitter facebook
- Re:Not a surprise (Score:5, Insightful)
  
  by GoogleGuy ( 754053 ) * writes: on Wednesday March 23, 2005 @12:36PM (#12024895) Homepage
  
  Hey, if you've run across spammy sites, have you filled out a spam report and used the keyword slashdot? I mentioned in a earlier comment from a different story [slashdot.org] that you can do this. We got eight reports last time, and the responses are on their way. We do check that data to look for new tricks that spammers are trying.
  
  Parent Share
  twitter facebook
Bleh... (Score:4, Funny)

by Patrick Mannion ( 782290 ) writes: <patrick.mannion@gma[ ]com ['il.' in gap]> on Wednesday March 23, 2005 @12:10PM (#12024542) Homepage Journal

I was thinking that some major crisis had broken out and a million pages were hijacked at once creating something bigger than any other Internet event other, and it caused Google's stock to tank and force to them go private again, lay off workers and go bankrupt. But that's crazy. But still, word it right. Damn it.

Share
twitter facebook
My site is affected (Score:5, Interesting)

by barcodez ( 580516 ) writes: on Wednesday March 23, 2005 @12:11PM (#12024559)

My site the humor archives [thehumorarchives.com] has been affected by this. I can tell because if you do the following search [google.co.uk] you can see a bunch of sites that are/were 302ing to my domain. I'm pretty pissed off and I seriously hope Google act soon to rectify the matter.

Share
twitter facebook
- - Re:My site is affected (Score:4, Informative)
    
    by GoogleGuy ( 754053 ) * writes: on Wednesday March 23, 2005 @02:46PM (#12026598) Homepage
    
    Yeah, this is a common misconception. allinurl: and its sister operator inurl: look for terms matching in the url. For a search like [allinurl:thehumorarchives.com], a result like www.stumbleupon.com/url/www.thehumorarchives.com/f orums/ is a fine result, and doesn't have anything to do with this.
    
    Parent Share
    twitter facebook
From the Google "Information for Webmasters" (Score:5, Informative)

by YouMakeMeSoANGRY ( 641079 ) writes: on Wednesday March 23, 2005 @12:12PM (#12024568)

Google claim [google.com]...

Fiction:A competitor can ruin a site's ranking somehow or have another site removed from Google's index.
Fact:There is almost nothing a competitor can do to harm your ranking or have your site removed from our index. Your rank and your inclusion are dependent on factors under your control as a webmaster, including content choices and site design.

How about adding "Fiction: Google information for webmasters contains any facts"?

Share
twitter facebook
pure FUD the submitter is a spammer (Score:4, Informative)

by Anonymous Coward writes: on Wednesday March 23, 2005 @12:20PM (#12024681)

what major headlines ? millions of pages !! the world is coming to an end !!!!

a quick whois on threadwatch.org (the submitters site) reveals its hosted by search engine spammers
platinax.co.uk which is registed to a UK "company" called BriteCorp
http://www.britecorp.co.uk/ [britecorp.co.uk]

who offer all the usual SE spamming methods
coincidence ?
a whois on britecorp's platinex [platinax.co.uk] site reveals they have removed their address from the whois db, and their websites contact details are a mobile phone number (07963 808470)
further investigation on britecorp reveals they are not a "real" company but trading as "Brian Turner" (pic [platinax.co.uk]) and companies house [companieshouse.gov.uk] dont seem to have any records of any of these companies, though iam sure further investigation could find out more

so why would a supposedly reputable marketing company have a cell phone as a primary contact point ?
something to hide egh ?
or perhaps local trading standards would like to hear about them and their "services" ?

northern scum by any other name

Share
twitter facebook
- Absolute hilarity (Score:4, Informative)
  
  by brian_turner ( 851711 ) writes: on Wednesday March 23, 2005 @05:53PM (#12028900)
  
  Absolutely Roflmao!!
  
  I guess some people have never heard of the term "sole trader". :)
  
  My internet business is barely a year old - almost everything is communicated with other webmasters via e-mail - phone support is provided as a last option, but it means that if anyone really needs to use it, then they can have my immediate attention wherever I am, to have their concerns addressed immediately. :)
  
  As for spamming - well, this is one of those "anonymous cowards" some of us are familiar with, who believes that if you purchase a link from another site, or become involved in a link exchange, or register your site in a directory - then you're a spammer. :)
  
  Thanks for the heads up on the Platinax registration details, though - hadn't realised they'd been left out. I had a run in with some Belgian Nazis last year, after I booted them from a forum I admin, when they tried to use it for promoting Neo-nazi propaganda. They've tried a few times to get back at me since, so I've been trying to reclaim some privacy online. Platinax reg details should be public, though - I'll put something online, then try and fine a PO Box for the hate crap.
  
  Parent Share
  twitter facebook
Search engines should devalue redirects (Score:5, Insightful)

by Animats ( 122034 ) writes: on Wednesday March 23, 2005 @12:24PM (#12024734) Homepage

Redirects to a page should be treated as having far less PageRank value than the page itself. That will fix the problem.
It will also break many "click trackers", "portals", "directory sites", "search engine optimizers", and other annoyances, which is probably a plus for Google users. You know, those sites where you click on some phrase in Google and, three redirects later, you're at some irrelevant porno site.

Share
twitter facebook
Doesn't seem like the end of the world (Score:3, Insightful)

by Hornsby ( 63501 ) writes: on Wednesday March 23, 2005 @12:26PM (#12024767) Homepage

Why not just fix the bug and then recreate the rankings index? Googlebot hits my sites all the time, so I know that it covers the rest of the internet quite often as well. With their amount of hardware, it probably wouldn't take long.

Share
twitter facebook
treat redirects as one-link pages (Score:3, Insightful)

by wotevah ( 620758 ) writes: on Wednesday March 23, 2005 @12:28PM (#12024804) Journal

It seems that when page A redirects to B, Google not only considers that a hit for A, but also assigns B's content to A (I just skimmed through all the posts here so maybe that's not what happens).

In that case, it seems to make more sense to just ignore A altogether since the hit and content rightfully belong to B.

This could be done by treating redirects as empty one-link pages, thus unifying the handlers and defeating this practice.

Share
twitter facebook
Why This is Such a Big Deal (A Summary) (Score:5, Informative)

by Anonymous Coward writes: on Wednesday March 23, 2005 @12:57PM (#12025201)

This was originally posted the first time a story about this ran, but since a lot of people are still confused, here it is again...

There seems to be a lot of confusion as to why exactly this is such a big deal. A lot of people saying there's no problem or that this is nothing new... basically just not understanding the issue. Let me explain:

Suppose you have a small business under the domain http://xyz.com/, and search engines bring you a lot of traffic because you rank high for keywords in your market. You have a lot of people out there linking to you, a lot of satisfied customers, good content on your site. You're always in the top 10 somewhere when people search for "xyz widgets".

Well, this issue with Google makes it very easy -- incredibly easy -- for someone to knock your site out of the rankings entirely. And I mean for *everything*, to where searching for your own company name in quotes literally buries you hundreds of pages deep in the results. We're talking sites going from getting 1000 unique hits to 10 overnight.

And here's the kicker: It requires absolutely no technical knowledge, no time investment, and is perfectly legal...

All I have to do is have another domain handy that is roughly as popular as yours. And I make a "links" page, like one of those directory services, that lists your website. But instead of being a normal hyperlink, it's a CGI (or PHP or ASP or whatever) script that generates a 302 redirect to your domain... Now, these are very simple, common scripts. One-liners that you can download from cgiscripts.com and stick on your server. The original intent of these scripts is to track which links are being clicked on your site. But now they've found a new use, because when Google gets that 302, all hell breaks loose.

See, according to the HTTP spec, 302 is a *temporary* redirect, which means Google is supposed to interpret whatever content it finds at the 302 target (your site) as really belonging to the URL of the source (my site). Google is just obeying the spec strictly here, and with devestating results. Why? BECAUSE THE DUPE FILTER NOW KICKS IN! You see, Google has a "dupe filter" that says if the same exact content is found for two unique URLs, then one of the URLs is obliterated in the rankings. Because after all, searchers don't want to be finding the same content over and over. If that happens, they'll start using a different search engine. But Google, sticking strictly to the HTTP spec, doesn't know who the content really belongs to when it gets a 302.

So Google essentially flips a coin. And if it comes up tails, say bye-bye to your domain in the rankings. Your *entire* domain. Because the dupe filter isn't limited to just the page that the 302 is pointing to -- it applies across your entire domain.

These 302 "exit-link-trackers" are all over the web. They've been used by webmasters for years. But it's just recently that Google has started treating 302 this way, so it didn't have any bad effect before. But now it kills you.

The funny thing is, the solution seems pretty simple: Just stop treating 302s this way if they point to a different domain. But for whatever reason Google isn't listening. Hopefully the press that's being generated now will give them the kick in the ass that they need.

Share
twitter facebook
Doesn't effect Yahoo (Score:5, Interesting)

by X ( 1235 ) writes: <x@xman.org> on Wednesday March 23, 2005 @01:30PM (#12025606) Homepage Journal

I'm surprised nobody has mentioned that Yahoo has already closed the 302 hole.

Share
twitter facebook
Simple Answer (Score:5, Insightful)

by rabtech ( 223758 ) writes: on Wednesday March 23, 2005 @01:38PM (#12025694) Homepage

There is a simple solution for Google: Only honor 302 redirects when the original and target domains match (or points to a subdomain of the original domain.)

In all other cases treat a 302 (temporary) as a 301 (permanent) redirect, thus giving credit for the content to the actual hoster of the content.

This allows webmasters to continue using 302s to setup logical URLs to mask the organization of underlying content but eliminates the ability to hijack completely.

Share
twitter facebook
clsc.net seems to be down... (Score:4, Interesting)

by luap2000 ( 314919 ) writes: on Wednesday March 23, 2005 @02:25PM (#12026337) Homepage

here's my write-up on the problem from early February called Google and the Mysterious Case of the 1969 Pagejackers [kuro5hin.org]. the problem has been around for a long, long time.

personally, i'm ready to give up google maps or something else (autolink?) if they would 'fix' this or at least be more transparent about what's going on. ;)

btw, the word on the net is that the googleguy posting here isn't the real one. anybody have details on this?

-kpaul

Share
twitter facebook
I don't get it... (Score:3, Insightful)

by jafiwam ( 310805 ) writes: on Wednesday March 23, 2005 @03:19PM (#12026967) Homepage Journal

Why all the yammering and discussion on this?

It's pretty simple; 302 redirects allow bad guys to exploit Google.

It doesn't matter that it's the wrong way to use a 302 redirect. They are the BAD GUYS. Remember the "spammers lie" truism?

It's the Google rule that is broken. 302 should be treated as "cant find site" in their search rankings rather than assuming the the data sent by the web server is honest. It sucks that some legit users of 302 won't get ranked as well because of it, but boo hoo. Let anybody that has hardware or software problems get better equipment in the first place if their freaking world ends when they don't get ranked in their keyword group. I have NO SYMPATHY for someone that shoestrings their vital revenue stream infrastructure and then wonders why things go bad. It reminds me of my job too much.

Buy Google ADs if you need to make money off your site traffic.

Google will change the rule or they won't. If they want to stay relevant, they'd better. I find myself getting irritated with Google's crappy search results a lot now days, sooner or later I will find one of the little startup to use and they can kiss off if it keeps up. So I figure they will get to it. They are Google, they are good at what they do.

Now what I think they should do is download snippets of pages via the Google toolbar which then sends the data to Google to make a massively distributed bot-net spider that is indistinquishable from the web-using masses. At that point, as far as exploiting Google via IP of the bot or user agent of the bot IT IS ALL OVER.

Move along, nothing to see here but a bunch of people that don't understand redirect and HTTP protocols.

Share
twitter facebook
- Mod parent up. (Score:3, Insightful)
  
  by MyLongNickName ( 822545 ) writes:
  
  This is hilarious! Someone please mod up! Hope I get the above mods in M2.
- - Re:A Real Question (Score:2, Informative)
    
    by justforaday ( 560408 ) writes:
    
    According to the previous article (posted a few days ago, and linked to in TFS), a page utilizing this redirect exploit essentially supplants the original page in Google's pagerank listings...
- - Re:Exactly. (Score:4, Insightful)
    
    by loraksus ( 171574 ) writes: on Wednesday March 23, 2005 @01:26PM (#12025560) Homepage
    
    I sort of agreed, it was really bad about a month or two ago, but has been getting better for most of the "commonly searched" terms. Some fairly obscure searches still turn up a bit of crap, but you can't do it for everyone.
    A "Don't show me any results from this subnet + domain from now on" feature would be nice, as would google banning some of the worst offenders (which it seems to have done).
    
    Parent Share
    twitter facebook

There may be more comments in this discussion. Without JavaScript enabled, you might want to turn on Classic Discussion System in your preferences instead.

Ugh. This is so not true. (Score:2, Informative)

Re:Ugh. This is so not true. (Score:5, Funny)

Re:Ugh. This is so not true. (Score:5, Funny)

Re:Ugh. This is so not true. (Score:3, Funny)

Can anybody provide a working example? (Score:3, Interesting)

Re:Ugh. This is so not true. (Score:2, Troll)

Re:Ugh. This is so not true. (Score:2, Insightful)

Re:Ugh. This is so not true. (Score:5, Informative)

Re:Ugh. This is so not true. (Score:5, Insightful)

Kindly extract your head from wherever it is (Score:5, Informative)

Re:Kindly extract your head from wherever it is (Score:3, Informative)

Re:OK, an example (Score:3, Insightful)

Re:OK, an example (Score:4, Informative)

Re:OK, an example (Score:3, Informative)

Re:Ugh. This is so not true. (Score:5, Informative)

Re:Ugh. This is so not true. (Score:3, Insightful)

You got an email from me! (Score:3, Informative)

Re:Ugh. This is so not true. (Score:5, Interesting)

Re:Ugh. This is so not true. (Score:4, Insightful)

OK, I'll bite ... (Score:4, Insightful)

Re:Ugh. This is so not true. (Score:4, Informative)

Robot.txt (Score:3, Insightful)

Re:Robot.txt (Score:5, Informative)

No, it's not about redirecting the user... (Score:5, Informative)

Re:Robot.txt (Score:5, Informative)

Re:Robot.txt (Score:5, Insightful)

Re:Robot.txt (Score:5, Informative)

Re:Robot.txt (Score:5, Funny)

Re:Robot.txt (Score:2)

Re:Robot.txt (Score:5, Informative)

Re:Robot.txt (Score:2)

Re:Robot.txt (Score:5, Informative)

Re:Robot.txt (Score:3, Informative)

Re:RTFA (Score:5, Insightful)

Re:RTFA (Score:5, Informative)

Re:RTFA (Score:3, Interesting)

Re:Robot.txt (Score:5, Informative)

I've had it with Google! (Score:5, Funny)

Google Cookie last until 2038! (Score:2)

Re:Google Cookie last until 2038! (Score:5, Funny)

Re:I've had it with Google! (Score:2, Informative)

How and when Yahoo fixed it (Score:3, Informative)

Easy to prosecute, hmmm? (Score:5, Interesting)

Re:Easy to prosecute, hmmm? (Score:5, Insightful)

Re:fraud, copyright, phishing, decency laws (Score:3, Insightful)

Law of the Internet (Score:5, Insightful)

302 (Score:5, Informative)

Re:302 (Score:5, Informative)

Re:302 (Score:2)

Re-re-explained (Score:5, Informative)

Re:Re-re-explained (Score:3, Insightful)

Re:302 (Score:5, Interesting)

Re:302 (Score:2, Interesting)

Re:302 (Score:5, Informative)

Re:302 (Score:3, Insightful)

Re:302 (Score:2, Informative)

But what's the point? (Score:2)

Re:But what's the point? (Score:4, Informative)

301 redirects (Score:3, Interesting)

Wrong (Score:5, Informative)

Why? (Score:2, Insightful)

Re:Why? (Score:2, Insightful)

Re:Why? (Score:2)

Do what I'm going to do... (Score:4, Insightful)

Re:Do what I'm going to do... (Score:3, Funny)

Re:Do what I'm going to do... (Score:2)

Web presence pressure (Score:5, Insightful)

Re:Web presence pressure (Score:3, Insightful)

Gopher (Score:5, Funny)

Re:Gopher (Score:5, Funny)

Re:Gopher (Score:2)

Re:Gopher (Score:3, Interesting)

Re:Gopher (Score:2)

Wait... (Score:5, Funny)

The super-slashdotting (Score:5, Funny)

Re:The super-slashdotting (Score:3, Insightful)

How to check if your site is being hijacked... (Score:5, Informative)

And how to report this to Google... (Score:3, Interesting)

Good explanation about 302 hijacking (Score:5, Informative)

Re:Good explanation about 302 hijacking (Score:3, Informative)