El Reg Says Google Choking on Spam Sites
Grubby Games writes "The Register is reporting that Google is full, and in trouble." From the article: "Recently, we featured a software tool that can create 100 Blogger weblogs in 24 minutes, called Blog Mass Installer. A subterranean industry of sites providing 'private label articles,' or PLAs exists to flesh out 'content' for these freshly minted sites. And as a result, legitimate sites are often caught in the cross fire. But the new algorithms may not be solely to blame. Google's chief executive Eric Schmidt has hinted at another reason for the recent chaos. In Google's earnings conference call last month, Schmidt was frank about the extent of the problem. 'Those machines are full,' he said. 'We have a huge machine crisis.'" James Robertson points out that's a fairly selective bit of quoting.
How Google crawls a site (Score:5, Interesting)
Re:Finally, an explanation (Score:4, Interesting)
I've heard of the user being ignorant... (Score:3, Interesting)
Eh, or I could be completely off my rocker, and just not competent enough to see a simple and effective method of combating these guys.
Fud Light (Score:2, Interesting)
Re:Adsense is to blame (Score:4, Interesting)
Banner ads were taking the same path. If anything, we should thank google for making internet advertising less intrusive.
If google and the spammers have an arms race... (Score:5, Interesting)
Re:One idea? (Score:3, Interesting)
I foresee a time when to access large parts of the net you will be required to use some central "proof of life" system. The current mish-mash of captchas isn't working. We have custom English captchas on a forum I admin and it doesn't seem to stop the bots: presumably when they get stuck they call for help.
It's hard to believe a third of Google's index is auto-generated crap, but then I couldn't really believe the "50% of net traffic is spam or viruses" claim either, and I'm pretty sure that one turned out to be true. It appears that an unregulated commons will always degenerate into a wasteland without some form of governance and law enforcement; perhaps rather than an arms race the only solution is for the internet to grow its own legal system and police force (how that'd work is left as an exercise to the imagination).
Re:Google is full. Try this... (Score:2, Interesting)
Top 10 results for "slashdot poneys" on yahoo:
1. slashdot.cuteness.org (not on google)
2. jfaughnan.blogspot.com (#1 on google)
3. jfaughnan.blogspot.com (#1 on google)
4. index.cristal-trace.com (not on google, outdated link)
5. mfrost.typepad.com (#22 on google)
6. pcdq.blogspot.com (not on google)
7. www.ninme.com (#15 on google)
8. www.firstworld.biz (not on google, spam)
9. musicindustry.firsindustry.com (not on google, spam)
10. girls-having-sex-with-horses.danielblog.info (not on google, spam)
Top 10 on google:
1. jfaughnan.blogspot.com (#2 on yahoo)
2. slashdot.org (not on yahoo)
3. slashdot.org (not on yahoo)
4. linux.slashdot.org (#27 on yahoo)
5. linux.slashdot.org (#27 on yahoo)
6. mitternachts-lied.net (#22 on yahoo)
7. interviews.slashdot.org (not on yahoo)
8. linuxfr.org (#19 on yahoo)
9. www.releton.com (not on yahoo)
10. www.japancar.fr (not on yahoo)
Both yahoo and google are missing pages from their indexes: some results appear on one but not the other. Yahoo was slightly worse at keeping spam out, with three spam sites in its top 10. (Is www.releton.com spam?)
I'd say both are 'full' in the sense that neither seems to have enough capacity to index everything.
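For anyone who wants to repeat this comparison with another query, here is a rough sketch of how the overlap between the two top-10 lists could be measured. The hostnames are copied from the lists above; the function name `overlap` and the choice of Jaccard similarity are my own assumptions, and it compares hostnames only, since the lists give no full URLs.

```python
# Hostnames as listed in the two top-10 lists above (duplicates kept as posted).
yahoo_top10 = [
    "slashdot.cuteness.org",
    "jfaughnan.blogspot.com",
    "jfaughnan.blogspot.com",
    "index.cristal-trace.com",
    "mfrost.typepad.com",
    "pcdq.blogspot.com",
    "www.ninme.com",
    "www.firstworld.biz",
    "musicindustry.firsindustry.com",
    "girls-having-sex-with-horses.danielblog.info",
]
google_top10 = [
    "jfaughnan.blogspot.com",
    "slashdot.org",
    "slashdot.org",
    "linux.slashdot.org",
    "linux.slashdot.org",
    "mitternachts-lied.net",
    "interviews.slashdot.org",
    "linuxfr.org",
    "www.releton.com",
    "www.japancar.fr",
]


def overlap(a, b):
    """Return the hostnames present in both lists and their Jaccard similarity.

    Hypothetical helper: duplicates are collapsed, so this measures overlap
    between distinct hosts, not between individual result pages.
    """
    hosts_a, hosts_b = set(a), set(b)
    shared = hosts_a & hosts_b
    return shared, len(shared) / len(hosts_a | hosts_b)


shared, jaccard = overlap(yahoo_top10, google_top10)
print("shared hosts:", sorted(shared))
print("jaccard similarity:", jaccard)
```

A low similarity only says the two top 10s differ; it can't tell a missing page from a page that simply ranks lower, which is why the rank annotations above are still needed.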