Google's Search Results Degraded?

scrm writes "According to this Wired article, recent tweaks to Google's PageRank search algorithm have degraded rather than improved the accuracy of the results." I noticed this firsthand the other day, but only when I was searching for pictures of famous people; all my technical queries came back fine.
  • by inc0gnito ( 443709 ) on Saturday October 05, 2002 @10:40AM (#4393061) Homepage
    Heh, and just who are these famous people you had to have pictures of?
  • Yeah sure! (Score:5, Funny)

    by unixmaster ( 573907 ) on Saturday October 05, 2002 @10:44AM (#4393076) Journal
    The old algorithm was better. See http://fantomaster.com/graphics/googhell.gif [fantomaster.com]
    • by registro ( 608191 ) on Saturday October 05, 2002 @11:27AM (#4393218)
      Well, this is what we have found so far. In order to fight "Googlebombing" and "PageRank for sale", they may have downgraded results when the keyword is not in some important part of the on-page text (to stop googlebombing), and the weight of anchor text in ranking may have been turned down (to stop PageRank monetization), especially if the linking pages do not have a good PageRank to begin with. Internal links, and links from interlinked pages, may have been turned down as well. Still, we have seen as many as 200 competitive regional categories easily dominated by unscrupulous Dmoz editors. We have done some testing on that.

      To test whether we really are looking at a Dmoz-dominated update, we set up a small Aspseek-based search engine, a GNU search engine with a crude PageRank-like ranking system. We indexed around 1,500,000 pages, using 700 very competitive Dmoz categories as the starting point, including up to 250 pages per site, following up to 10 outside links, with up to 100 pages per outside link. What we found was that 59% of our top 20 results on the 100 category-related competitive keywords were also top 20 in Google's new index, and 26% of our top 10 were also in Google's top 10.

      But we must also say that we have not been able to find as compelling a relationship using non-competitive categories. A 2,000,000-page index built from non-competitive regional categories, using non-competitive keywords, showed a very small correlation between top 10, top 20, and even top 50 results.

      So, our working theory right now is: yes, small changes, probably committed in order to fight both googlebombing and PageRank commoditization, have affected the index's accuracy in many different ways.

      We think the index is unbalanced, or at least much more unbalanced than the last one, and, as a result, the weight of some previously not-so-important characteristics has been souped up, opening the door for abuse.

      The main effect of all of this is that small, well-managed sites in competitive categories can no longer rely on good content plus good linking to get a good listing. They will have to pay Google through the AdWords program. That is the main change here: forcing small businesses to pay their way to the top, using advertising space.

      But we do think this update and the changes committed are, to say the least, unbalanced, and the new algorithm is rampantly open to easy abuse. Let's hope Google's good old PhD common sense returns soon, and a new, improved update takes place as soon as possible. Let's hope they are not trying to make an easy killing by forcing small popular sites, suddenly deprived of traffic, to pay Google through AdWords.
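The top-k overlap measurement described in the parent comment can be sketched in a few lines. This is only an illustration of the method: the function and the sample result lists are hypothetical, not data from the poster's actual Aspseek experiment.

```python
def topk_overlap(results_a, results_b, k):
    """Fraction of list A's top-k URLs that also appear in list B's top-k."""
    top_a = set(results_a[:k])
    top_b = set(results_b[:k])
    return len(top_a & top_b) / k

# Hypothetical ranked result lists for one competitive keyword.
our_engine = ["site%d.example" % i for i in range(1, 31)]   # site1..site30
google_new = ["site%d.example" % i for i in range(6, 36)]   # site6..site35

print(topk_overlap(our_engine, google_new, 20))  # 15 shared of 20 -> 0.75
print(topk_overlap(our_engine, google_new, 10))  # 5 shared of 10 -> 0.5
```

Averaging this figure over the 100 test keywords gives the kind of percentages quoted above (59% at k=20, 26% at k=10).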

  • actually ... (Score:5, Interesting)

    by js995 ( 608590 ) on Saturday October 05, 2002 @10:45AM (#4393079) Homepage
    The article suggests that many people are saying pagerank is working badly because they have lost their previous power to affect search results. Overall, the pagerank seems to have improved in this latest incarnation (IMHO)
    • When I search heavily on Google (many different search terms) for somewhat common stuff, I find a lot of the pages are the *exact same* free hosting pages with random words on them, that now redirect to this one company's webpage, which had absolutely no useful information, but wants me to pay them anyway.
      • Re:actually ... (Score:5, Informative)

        by zyklone ( 8959 ) on Saturday October 05, 2002 @11:19AM (#4393191) Homepage
        There is a company called Search44 which seems to have made this kind of stuff their living.

        They index lots of other sites pages and when google comes around spidering they return random content from them. If you follow one of these links from google you will be redirected to their portal.

        No doubt it gives them quite a bit of traffic.
        • No doubt it gives them quite a bit of traffic.

          "Come on, Slashdot me! Gimme your best shot! What, are you yellow? Chicken! I've got bandwidth coming out of my ears! Slashdot me! Just you try!"

    • I agree. I just finally got the page rank that I deserve. I'm very happy with the change. They got rid of a LOT of intentional search engine spammers. Good riddance. Google is much better now, not worse.
      • Not quite. Now, in the competitive categories, it is pretty easy to detect who is at the top of competitive searches:
        The spammer with the greatest number of Dmoz editor accounts.
    • Re:actually ... (Score:5, Interesting)

      by roddymclachlan ( 169065 ) on Saturday October 05, 2002 @11:16AM (#4393183)
      Mmmm ... no one tries harder to influence Google search than Scientologists; they have countless different web sites and front groups all linking to each other to boost Google ratings. However, shortly after the cult tried to censor Xenu (Operation Clambake) [xenu.net], a Google search on scientology [google.com] ranked Xenu top. Now it comes second, although you'd be hard pushed to find it at all among the 2 million Scientology-sponsored links on the results page ... (so be sure to add Xenu to your links page if you have one; you could save the life and wallet of some naive soul) ...
      • If you ignore the sponsored links that's not so bad. 5 of the first 10 results are anti-Co$ sites and 3 of the first 5 are anti-Co$.
      • Try searching for cell phone ringtones or popular song lyrics sometime. Those guys have tricking Google down to a science. Javascript redirects, fake URLs with keywords in them, you'll find it all. The sad thing is it seems to be working. It is obvious that it is very hard to get a high ranking in these kinds of searches without using some sort of Google-tricking mechanism.
    • I notice that google gives a date with the results, and that this is when the googlebot, apparently, visited the link. As a test, I put my keywords in over several days, and google returned the page I was looking for with a date that was, in most cases, only a few days old, together with a short sentence from the beginning of the page. I had thought that googlebot took a month to go around the internet and list all the pages, but from this date thing, I would think that googlebot only takes a few days at the most. Try this: put "gary gray tropical discussion" in google. I just did, and the date returned is 10-1. I didn't put it there, googlebot must have, IMHO. Very nice, anyway, and google and search engines that work with a bot remain my favorite.
    • They degraded the "competitive" categories, a.k.a. the categories where people are more likely to pay Google through AdWords (Google's ad program) if they don't get a good listing.

      Most of those categories are now dominated by garbage sites and spammers. Good sites with a lot of content and a lot of popular links that used to be #1 or #2 are now #60 or #100, preceded by spammers, affiliate programs, and one-page redirection sites.

      Those sites are being forced to pay for Google's "affordable" AdWords program to stay alive.

  • Quote ... (Score:4, Funny)

    by uq1 ( 59540 ) on Saturday October 05, 2002 @10:45AM (#4393080)
    "I noticed this firsthand the other day, but only when I was searching for pictures of famous people; all my technical queries came back fine."

    Famous people Google Search: "Courtney Cox getting a facial"
    Results: "Courtney Cox advertises facial creme from Loreal"

    Technical Google Search: "How do best use lube and type one handed"
    Results: "www.pinkbits.com"
  • by Carnage4Life ( 106069 ) on Saturday October 05, 2002 @10:45AM (#4393083) Homepage Journal
    ...available in Mark Pilgrim's blog [diveintomark.org]
  • Well speaking for the only site whose PageRank I regularly keep track of (my site) the new algorithm hasn't changed all that much. I, for one, am now coming up a little higher.
    • Oh, it's definitely flawed then. Seriously though, isn't the efficiency of this system just a matter of opinion? Also, that's what happens when you get rid of those pesky META tags. Comment on my sig, and I'll shoot you.
      • "Seriously though, isn't the efficiency of this system just a matter of opinion?"

        I absolutely agree. The people who go down will say that the new algorithm is worse, and the people who go up will say it's an improvement.

        In my case, I've been steadily going up for a while, and so the fact that I went up doesn't seem obviously related to a change in algorithm.

        For a while I've held the #1 spot for 'Andromeda' and I just went to #5 from #7 for 'MP3 server'.

        Personally, I can't tell the difference in searches for other sites.

        From what I understand, the change is supposed to help new sites, which seems like a good thing (unless it's easy to exploit).

  • Pigeon (Score:5, Funny)

    by joyoflinux ( 522023 ) <thejoyoflinux.yahoo@com> on Saturday October 05, 2002 @10:48AM (#4393095)
    Those pigeons [google.com] must be slacking off...:)
  • by acomj ( 20611 ) on Saturday October 05, 2002 @10:53AM (#4393109) Homepage
    Google still seems the best. Sometimes I use Teoma or Lycos because they give different top results. Being tied to one search engine seems bad, as you miss a lot.

    I had an instructor point us to a page on networking that was amazingly good but not found on any of those 3 search engines, at least in the top 30. Most of those top 30 hits weren't very good either.

    Maybe Yahoo has it right: the web should be indexed by people.
    • by jafiwam ( 310805 ) on Saturday October 05, 2002 @11:02AM (#4393138) Homepage Journal
      Yes, the web should be indexed by people.

      But how about not one that requires sites to pay to get in? (Yahoo)

      At Google, go to the "Directory" tab, or go to DMOZ.org (Open Directory) itself. DMOZ is bigger, better organized, has fewer broken links, no ads, and is built by hand by people who know their categories and are interested in keeping them linking only to sites with meaningful content.

      Semi-mindless search spiders are not all there is to finding stuff on the Internet.
      • by great throwdini ( 118430 ) on Saturday October 05, 2002 @11:50AM (#4393304)

        At Google, go to the "Directory" tab, or go to DMOZ.org (Open Directory) itself. DMOZ is bigger, better organized, has fewer broken links, no ads, and is built by hand by people who know their categories and are interested in keeping them linking only to sites with meaningful content.

        First, I would suggest going directly to the categories at dmoz.org rather than the Google relistings. Google picks up revised RDF dumps from DMOZ whenever they please, but the lag in the cycle is pretty long. If you are looking for the "fresher" data, go directly to the source.

        Second, DMOZ can become what you say it is only with proper editing. The project itself may list 50000+ editors [dmoz.org], but they're volunteers and there is a lot of ground to cover. A large number of edits are made by those "high up" in the directory structure to "lower"/"deeper" categories less well understood. Certain branches of the project are neglected; others eat editors for breakfast with the amount of work that needs to be done. Volunteer and help out.

        You may also want to investigate ChefMoz [chefmoz.org] and MusicMoz [musicmoz.org], too.

        • MusicMoz is really a good idea, but there's still not a lot of material. I really miss Audiogalaxy for discovering new music and artists. It will be interesting later to make some connection between p2p programs and MusicMoz when the latter really matures.
        • DMOZ also has the problem that many of the editors are biased. This is especially true in commercial categories, where editors will often make it difficult for competitors to get their sites listed. Yes, there are supposed to be rules against that, but it still happens. For commercial categories it's hard to find experts who are also impartial.
    • I've had occasion to play around with AllTheWeb [alltheweb.com], which also often gives different (but still relevant) results on lots of queries. Worth a look.
    • Alltheweb works pretty well as a google alternative, and it has a better design too :)

      Don't get the wrong idea, I like google, but having just one search engine that's commonly used is a bad idea. No other company can really compete due to the fact that google has so much traffic to sell ads on.

      So anyway, that's why I pimp alltheweb whenever this topic comes up.
      • I like alltheweb [alltheweb.com] as an alternative too. Their indexing isn't as good as Google's, but they do seem to index a lot of pages that Google has missed.

        I use it if google gives me less than a dozen hits for a query.

    • by Snork Asaurus ( 595692 ) on Saturday October 05, 2002 @02:24PM (#4393907) Journal
      The following is highly subjective, but I do a great deal of searching.

      Background: Among other things, I am always trying to discover music from independent (especially blues) artists that post mp3's of their stuff on the web. I have been boycotting the major record labels 100% for about 15 years (hooray for independents!) for several reasons: 1: CD prices have always been a rip-off, 2: most major label artists suck, 3: I have worked in the music business (artists/bands/production) and detest the industry for the way that it exploits artists, 4: I have always loved discovering talented "unknowns" and turning other people on to them. I went through a new music dry spell until the web started to become a vehicle for independent artists to promote themselves.

      It's amazing what's out there now - I've found great artists from all over the Americas, Europe, Australia, Russia and even a few from Asia. I have found a lot of crap, too. The mp3 search engines are essentially useless for this purpose (I don't want major label music) and I have never used Napster or any of the off-spring. Links pages are more often out of date than not and webrings have similar problems. I have contrived several search techniques that try to exploit the strengths of search engines and the likely information on an artist's site. One very simple one is to look for "mp3 +(insert name of a well-known blues standard) -(a lot of keywords to exclude the many sites that put "mp3" on every page that simply lists a song title just to pull in traffic) -(specific sites that pollute the searches)", to find artists that cover the song and also have their own tunes.

      I have been a proponent of Google for many years. It came along just as I really started to dislike Altavista and I was an almost instant convert. But I am always on the lookout for a backup or something better. I have tried Teoma several times in the last year (as recently as last night), but I'm not terribly impressed. I find its interface and the way it presents results simplistic and dumbed down and it appears to have indexed far less of the web than Google. I got turned off Lycos years ago, when it seemed to want to become another portal/Yahoo (as if we need another one).

      The one search engine that I do use as a regular alternative to Google is Alltheweb [alltheweb.com]. For one thing, IMO, its advanced search is currently better than Google's (I swear that I have brought Google to its knees by entering too many keywords - it stops responding and is inaccessible for several minutes thereafter - this has happened several times). When I've done back to back comparisons with Google, Alltheweb seems to fare pretty well and seems to find more international pages than Google. The difference in top rankings can also be useful. Google has some nice features that Alltheweb does not, such as the elimination of duplicate pages.

      For one-stop searching, I find Google best for me, but Alltheweb is a good alternative.

  • Hype hurts (Score:4, Insightful)

    by jetlag11235 ( 605532 ) on Saturday October 05, 2002 @10:55AM (#4393116) Homepage
    Regardless of whether or not the changes have degraded the service Google provides, unless Google (quickly) addresses this problem to the (at least superficial) satisfaction of people, it will hurt Google.

    AlltheWeb.com must be soaking this up with glee.

    -- jetlag --
  • by Anonymous Coward
    Sadly it's true :( [google.com]
  • Pilgrim, who earns his living as a Web accessibility consultant, said in a phone call [...] "And 404's in the top 10? Hello? Pages that are completely blank -- how did those get in there?"

    This seems to be the most evidence presented in the article. Looks like baseless whining to me.
  • by Anonymous Coward
    We need the original-flavored Google back. Maybe a cache of Google linked from Google?
  • Don't Panic (Score:4, Insightful)

    by targo ( 409974 ) <targo_t@@@hotmail...com> on Saturday October 05, 2002 @11:10AM (#4393163) Homepage
    This is most probably not intentional.
    There are glitches in every piece of complicated software. Maybe they were trying out some new algorithm that wasn't completely refined yet. Maybe it was a random off-by-one bug that has already been fixed. Shit happens all the time; Google is no different.

    There will probably be many people who try to see a conspiracy theory behind this and say that Google has sold out.
    This is very unlikely. The nature of the described flaw suggests that all queries are affected. Now why should they skew the results of everything to appease a single entity who might have given them some money? That just doesn't make sense.
  • Dmoz is King (Score:5, Interesting)

    by DeadSea ( 69598 ) on Saturday October 05, 2002 @11:11AM (#4393169) Homepage Journal
    Folks in the forums at webmasterworld [webmasterworld.com] speculate that google is putting the most weight on words that are found in the title of the site and in the listing of the site on the open directory project [dmoz.org].

    We who are editors at dmoz hold a lot of power right now. It's time for you to share in some of that power. Head over to dmoz and apply to edit your favorite category.

    Can't decide where to apply?

    • We who are editors at dmoz hold a lot of power right now.

      I fear that is all too true. I have encountered Google entries ranked "top five" in result sets of greater than 10,000 members where: (a) the search term(s) are not present in the document; (b) no other Google-indexed document links to the one in question; (c) sole use of the search term(s) is in a DMOZ listing of the document.

      It would seem that getting listed on DMOZ can really affect Google "placement" -- I cannot determine whether this is the result of simply weighting DMOZ or due to a "multiplier effect" as Google indexes all those many, many sites who utilize the RDF generated by the Open Directory Project.

      Then again, Google top-ranks a Web site that hasn't been on the Web in over a year for one query I run with frequency. I don't think I can point the finger at the ODP for that one.

    • The only thing I don't understand about Dmoz is why "Gay, Lesbian, and Bi-Sexual" is in every single top category. Do they have it in for the heterosexuals or something? Do they think that sexual preference is so much more important than race, creed, gender, etc. that it should enjoy its own category, while the others don't?

      That's why I prefer yahoo. :-q
  • by Reziac ( 43301 ) on Saturday October 05, 2002 @11:12AM (#4393172) Homepage Journal
    Whether the rest of the article and of Google's changes are simply causing a rash of sour-grape whining or not, one thing I did notice when I used it yesterday: for a current topic of major interest at least to its part of the world, I got a helluva lot of dead links and blank pages ("Document contains no data" and when I checked, sure enough, it was just the HTML and /HTML tags, with no content). This did strike me as unusual not to mention annoying. More to the strange, none of these "dead pages" were in Google's own cache.

    (I still haven't found what I was looking for :( and no, it's not pr0n :)

    • <html> and </html> tags are what mozilla shows when it doesn't get anything from a server. It looks more like connection problems on your end than anything else. However, it seems to me like these pages are cgi scripts designed to spit out the search terms that people find them with to increase their search ranks. A lot of them simply redirect to other sites when you click.
      • This isn't Mozilla, it's Netscape 3.04 which is less prone to make things up -- when it gets nothing from a server, it usually whines "NS is unable to find the file named blah-blah".

        Tho I did try in Mozilla as well, same result. THEN I checked docsource in both browsers, and saw what I noted in my previous post.

        The live pages on the SAME server were behaving normally in Netscape, so it's clearly not a connexion problem (in NS, that usually produces the "connexion reset by peer" complaint). When I went to their root site and tried looking thru their links and site search, it appeared that the entire branch I'd hoped to read had been deleted (anyway, no reference to it anywhere in sight) -- assuming it ever existed!! There was no scripting involved, and no reason for anyone at this site to give a damn where they're ranked.

        If you're wondering, I was looking for historical maps incrementally showing the shoreline for Devil's Lake, North Dakota, to supplement what I've found in ancient atlases.
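The trick mentioned a few comments up, CGI pages "designed to spit out the search terms that people find them with", comes down to parsing the q= parameter out of the Google referer URL the visitor's browser sends. A minimal sketch; the referer string here is made up:

```python
from urllib.parse import urlparse, parse_qs

def query_from_referer(referer):
    """Pull the search terms out of a Google-style referer URL, if present."""
    params = parse_qs(urlparse(referer).query)
    terms = params.get("q")
    return terms[0] if terms else None

# Hypothetical Referer header such a page might receive:
ref = "http://www.google.com/search?q=devils+lake+shoreline+maps&hl=en"
print(query_from_referer(ref))  # devils lake shoreline maps
```

A page that echoes this string back into its own body makes itself look relevant for whatever the visitor just searched for, which is why these pages seem eerily on-topic until you read them.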

    • by great throwdini ( 118430 ) on Saturday October 05, 2002 @12:55PM (#4393551)

      I got a helluva lot of dead links and blank pages.

      Definitely so [google.com] - check the first result. This is nothing new. For the referenced query, the faulty top-ranking has existed for a long time, though the site in question hasn't existed for over a year. [archive.org] I've even written support a number of times (blatant errors such as these are shameful). Google is far from perfect.

      • The first result to that query when I tried was http://www.win.tue.nl/~kroisos/roguelike.html, which does appear to be up...
        • The first result to that query [...] does appear to be up...

          You're posting two plus days, one more email to support (see thread), and a direct removal request (see thread) by me later ... they did finally kill the errant entry. One year after it was removed from the 'Net, but still it's gone.

          One small step for...

  • by saginaw ( 602407 )
    I had been searching for information on US bills that are still in circulation, and I found that google did a poor job in finding what I was looking for. The catch is that I did the same search about two months ago, and I got all kinds of decent sites from google.
  • Well, so much for PageRank. I'm afraid this is what's going on: link-based PageRank, like one-person-one-vote democracy, was turning dangerous for the few with enough resources to put pressure on Google. Let's see how they solved it.

    Problem: http://www.google.com/search?q=%22go+to+hell%22 Microsoft was #1 for "go to hell", thanks to popular-vote-style linking. AOL was #3, Disney #4.

    Solution: http://www2.google.com/search?q=%22go+to+hell%22 They are not counting anchor links from "non-authoritative" web pages anymore. Joe Doe's pages don't count anymore. Only anchor links coming from pages officially "recognized" at big sites with a superior PageRank to start with, or from directories like Dmoz or Yahoo, count now.

    If this is the case, we can say PageRank is DEAD. From now on, big-corporation marketing rules over popular choice. Take this as an example: http://www.google.com/search?q=correo+gratis "correo gratis" is Spanish for "free mail". Hotmail was #1. Now, at www2, it is nowhere to be found. Hotmail is PageRank 9, and hundreds of Spanish web pages were pointing at it as "correo gratis". Now it is not, but it is still #1 if you search for "free mail". Why? Joe Doe's pages don't count; hundreds of Spanish users linking to it don't count anymore. Only the "official" MSN network pages count now, plus the few Dmoz pages pointing at it using that text as links. Most of those pages happen to be English-only, so only the English version of the query survives. Oh well.
    • Must be how I moved up in some search results. My page is linked from about 20 slashdot pages, so I must be pretty popular. (since it is linked on every slashdot post I make)
    • registro wrote:

      From now on, big-corporation marketing rules over popular choice. Take this as an example: http://www.google.com/search?q=correo+gratis "correo gratis" is Spanish for "free mail". Hotmail was #1. Now, at www2, it is nowhere to be found. Hotmail is PageRank 9, and hundreds of Spanish web pages were pointing at it as "correo gratis". Now it is not, but it is still #1 if you search for "free mail". Why? Joe Doe's pages don't count; hundreds of Spanish users linking to it don't count anymore. Only the "official" MSN network pages count now, plus the few Dmoz pages pointing at it using that text as links. Most of those pages happen to be English-only, so only the English version of the query survives.

      This doesn't make sense. The current top result on a search for `correo gratis' [google.com] is LatinMail - Tu Correo Gratuito En Español [latinmail.com]. It seems to me that the listings have gotten more accurate, not less, if my search in Spanish for free mail returns a Spanish company instead of a Yankee one (Hotmail is Microsoft, in case you weren't aware). LatinMail belongs to eresMas [eresmas.com], who are headquartered in Madrid.

    • Come on, Microsoft and Bill Gates have had bad terms associated with them on Google for ages. Assuming that Google changed this just for M$'s sake is ludicrous.

      If that was all they were worried about, they could simply have manually changed those searches to exclude MS, as they have done for people as small as Bernie Shifman [alltheweb.com], that guy who spammed his resume around everywhere (searches on his name would turn up pages bitching about him).

      Btw, you definitely deserve a +5 for plagiarizing this post verbatim [webmasterworld.com]. Well, except for the paragraph breaks, I guess.
    • Most of those "funny" search results (like go to hell) were the result of googlebombing, one of the very few loopholes in Google's spider. That loophole has now been closed, and I personally don't give a crap. Remember that Scientology was using it to inflate their Google rank too.
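For reference, the link-based PageRank being argued about in this subthread is essentially the stationary distribution of a "random surfer" over the link graph, along the lines of the original Brin and Page paper. A toy power-iteration sketch; the four-page link graph is made up:

```python
def pagerank(links, damping=0.85, iters=50):
    """Toy power iteration over a dict {page: [pages it links to]}."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iters):
        new = {p: (1 - damping) / n for p in pages}  # teleport share
        for p, outs in links.items():
            if not outs:                 # dangling page: spread rank evenly
                for q in pages:
                    new[q] += damping * rank[p] / n
            else:                        # pass rank along each outgoing link
                for q in outs:
                    new[q] += damping * rank[p] / len(outs)
        rank = new
    return rank

# Made-up four-page web: three pages link to 'hub', which links back to 'a'.
toy = {"a": ["hub"], "b": ["hub"], "c": ["hub"], "hub": ["a"]}
ranks = pagerank(toy)
print(max(ranks, key=ranks.get))  # hub
```

The point of contention in the thread is not this computation but which links are allowed to feed into it: drop "non-authoritative" pages from the graph and the resulting ranks shift exactly as described.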
  • It seems like whenever someone makes a decent search engine, someone has to tinker with it until it breaks. I wish they would stop that.
  • Well there is a problem solved. Searching 'scientology' [google.com] comes with www.xenu.net [xenu.net] at #2. This is a Good Thing(TM).

    I encourage people not to judge Google's update before a few patches. All x.0 releases are non-functional; all you software engineers should know that.

    -- nyri
  • by Styx ( 15057 ) on Saturday October 05, 2002 @11:30AM (#4393229) Homepage
    Pilgrim, whose blog dropped from first to sixth place in a search for "mark," admitted that weblogs may have been overrated prior to the latest index. "I was beating out Mark Twain before -- that's probably not fair."
  • by autopr0n ( 534291 ) on Saturday October 05, 2002 @11:48AM (#4393293) Homepage Journal
    According to this Wired article, recent tweaks to Google's PageRank search algorithm have degraded rather than improved the accuracy of the results

    Actually, according to the article, a few people who run blogs seem to think that Google has been degraded, while Google itself has not seen a higher number of actual complaints.

    Basically, what happened is that Google took some measures to reduce the effects of "googlebombing" by bloggers.
    • Actually, according to the article, a few people who run blogs seem to think that Google has been degraded, while Google itself has not seen a higher number of actual complaints.

      I'm not a blogger and I've noticed more and more crap results on the first pages of my google searches in the past month or so. Unfortunately for Google, I haven't really stopped to think about why my search results have been crappy (and thus, complained to Google about it), but I have noticed I've been getting better results from alltheweb.com.

      The silent majority aren't going to complain about bad search results, they'll just move to another site that does better. How do you think Google managed to steal the crown from Altavista, et al. in the first place?

    • The results are hosed. The timing of the article is interesting for me, because I just sent a complaint to Google about 30 days ago. Any search that contains the term "wholesale" returns pages and pages of results (sometimes as many as ten consecutive pages) that all redirect to eBay. Now obviously "wholesale" is going to be a spam magnet, but having in excess of 75% of the first 10 pages of results all link to some sort of redirection script indicates some sort of page-ranking issue.

  • by chrysalis ( 50680 ) on Saturday October 05, 2002 @11:59AM (#4393344) Homepage
    Google's results are now less accurate because people are cheating. The way PageRank works is widely documented, and people abuse it to get better scores.

    I work for a company that hosts pr0n sites. Maybe 95% of our partners are cheating that way. Fake sites, fake auto-generated HTML pages (with pseudo-real sentences), cloaking (what Google sees is not what visitors see), javascript tricks, etc. are a must. They spend most of their time on trapping Google; it brings more money than working on the site itself.

    The company I'm working for even has a team working full-time on this (spamming search engines, and creating thousands of fake sites just to promote one real site).
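The cloaking described above ("what Google sees is not what visitors see") often hinges on nothing more than a server-side User-Agent check. A deliberately minimal sketch of the technique being complained about; the filler text and portal URL are invented, and real cloakers also match known crawler IP ranges rather than trusting the User-Agent string alone:

```python
def page_for(user_agent):
    """Serve keyword-stuffed filler to the spider, a redirect to humans."""
    if "Googlebot" in user_agent:
        # What the crawler indexes: auto-generated, keyword-rich text.
        return "<html>cheap wholesale mp3 ringtones lyrics ...</html>"
    # What a real visitor gets: bounced straight to the portal.
    return '<meta http-equiv="refresh" content="0;url=http://portal.example/">'

print(page_for("Googlebot/2.1 (+http://www.googlebot.com/bot.html)"))
```

Because the spider and the visitor never see the same bytes, the indexed snippet can be arbitrarily unrelated to the page people actually land on.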

  • darn (Score:4, Funny)

    by Raiford ( 599622 ) on Saturday October 05, 2002 @12:00PM (#4393346) Journal
    never should have stopped using gopher...
  • by Boss, Pointy Haired ( 537010 ) on Saturday October 05, 2002 @12:08PM (#4393373)
    Here's why.

    Your job, as a webmaster, is to produce a user friendly, useful, maybe informative website.

    Google's job, as a search engine, is to find the sites most likely to be of interest to a user, based on their search terms.


    To get good rankings, all you have to do as a webmaster is produce a user friendly, useful, maybe informative website.

    It is Google's job to optimise to the web, not the web's job to optimise to Google.


    Search Engine Optimisation is a big massive NET LOSS to you, because all it results in is getting visitors who aren't the slightest bit interested in your website or product.

    It also results in a soon to be pretty useless Google, so please don't do it.
    • I agree with you 100%, but I don't think it'll ever happen. If everyone did as you suggested, and no one cheated, we'd live in a perfect index world - search results would accurately reflect what's on the web. Searches would be easier, and more relevant sites would come up with more specific searches. The number of hits you got coming from Google (or anyone else) would likely depend on the "real" content of your site.

      Then, one day, someone would rediscover that by putting "Anna Kournikova Blowjob" on their site, they'd get some hits they wouldn't have otherwise. No harm, they think, it's a few bytes and it's not hurting anyone. Webmaster tells his friends, they tell their coworkers, and this continues until the search engines have to begin working around the issue.

      Sound familiar?

      Unfortunately, people aren't always honest, especially when it's something they perceive as benefiting them without hurting anyone else, especially if the benefit is financial. They don't care about the integrity of some other website's engine, they care about profits. Realize also that even personal sites do this; people like other people to see what they've made. While your idea is a great fantasy, near-perfect search results will only come from human-edited sites, or a better algorithm than we have now.
    • Your job, as a webmaster, is to produce a user friendly, useful, maybe informative website.

      Google's job, as a search engine, is to find the sites most likely to be of interest to a user, based on their search terms.

      And somewhere in the middle, the two concerns meet. There are a *lot* of unintentionally odd entries in Google that should be cleared out as misleading. Three examples: (1) Redirection URLs from one site but content from the targeted site; (2) Past versions of multi-version documents such as Wiki revision URLs; and (3) Static redirection documents from relocated sites.

      All should probably be combatted by site maintainers through application of robot exclusion rules and a little redirection know-how. However, given the frequency with which I've encountered these awkward entries, I'm guessing that attention to such detail has gone by the wayside.

    • The web is not going to optimize for Google.

      Your job as a webmaster: Generate massive quantities of traffic... While figuring out how to be profitable.

      Google's job, as a search engine: Generate massive quantities of traffic... While figuring out how to be profitable.

      Now, the webmaster could accomplish that by offering tons of great content at a better price, and Google can accomplish this by having the most relevant results. That takes care of the generating traffic.

      Now, profitability... Tons of great content at a reasonable price will not be supported by 100 users... You need hundreds of thousands for the economies of scale to work in your favor. End result, you do everything in your power to drive traffic at your website. Top of mind awareness, etc... What you postulate is that they should stop sending me Credit Card offers because if I wanted one, I'd call. True, but would I call your company or another one??

      SEO actually shows that we are achieving a Search engine monoculture. Back in the day when we had varied competing engines, bombing one site hardly had any impact. (Actually, most results were bombed so you got used to checking several engines.) And these techniques were around about two minutes after the first Search Engine went online.

      Google was an evolution beyond the early Excite, Lycos, et al. If an unusable Google is the result, new ideas will surface that are evolutionary. This shows that there are areas where you could improve on Google search technology and either compete or license it to Google. Then we will have an improvement over Google like Google was over Infoseek, like Infoseek was an improvement over Yahoo.

      Webmasters are NEVER going to play nice with the search engines. The job of the search engine designer will be to limit the ways in which his system can be abused. XHTML should be a big help, as most search engines get easily fooled by having their parsers parse verrrrrrry loose HTML. When they start actual content checking and checking for invisible divs and other HTML tricks, the engines will evolve again. I'm surprised with the increase in processor speed it hasn't happened more recently, but the extra horsepower has been dedicated to indexing PDF etc... Next generation maybe.
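      As an aside, the "invisible div" check the poster describes could be sketched roughly like this. This is a hypothetical, minimal illustration (the class name, the demo markup, and the two style rules checked are all made up for the example), not how any real engine does it:

      ```python
      from html.parser import HTMLParser

      class HiddenTextDetector(HTMLParser):
          """Flags text nested inside elements styled to be invisible --
          a crude version of the 'invisible div' spam check."""
          HIDDEN = ("display:none", "visibility:hidden")

          def __init__(self):
              super().__init__()
              self.depth = 0          # how many hidden ancestors we are inside
              self.hidden_text = []   # text a human visitor would never see

          def handle_starttag(self, tag, attrs):
              style = (dict(attrs).get("style") or "").replace(" ", "").lower()
              if any(h in style for h in self.HIDDEN):
                  self.depth += 1
              elif self.depth:
                  self.depth += 1     # children of a hidden element stay hidden

          def handle_endtag(self, tag):
              if self.depth:
                  self.depth -= 1

          def handle_data(self, data):
              if self.depth and data.strip():
                  self.hidden_text.append(data.strip())

      detector = HiddenTextDetector()
      detector.feed('<p>real content</p>'
                    '<div style="display: none">anna kournikova</div>')
      print(detector.hidden_text)  # ['anna kournikova']
      ```

      A real check would of course also have to chase stylesheets, scripts, and off-screen positioning, which is exactly why it costs the extra horsepower mentioned above.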

    • That's not how the real world works. Consider micro$oft -- they do not create the best, most useful, most stable software; they created the biggest marketing machine the world has ever seen. I do not recommend emulating micro$oft; I am merely using that famous example to illustrate my point.

      On the real web, the one where I make my living building and promoting commerce sites, the truth is that no matter how aesthetically pleasing, no matter how well engineered for human use, no matter how informative your site is or how low your prices are, if your site is not easily located by the methods people actually use, it is worthless.

      I don't waste my time doing search engine optimization -- I make very good use of my time doing search engine optimization. I don't use doorway pages, spamdexing, googlebaiting, or any such cheats, but I do build every site from the ground up with the primary focus being search engine spidering and indexing. That's why I'm still in business, and my clients are not only still in business but are spending right now on expansions and upgrades to their sites.

      Your utopian vision would be fine if the real world would just consent to be ruled by it, but the real world is a fickle and contrary organism that defies such simplistic wishfulness.

  • observations... (Score:5, Interesting)

    by Lazy Jones ( 8403 ) on Saturday October 05, 2002 @12:12PM (#4393384) Homepage Journal
    * you can't beat the best google-spammers in the end ... they're always smarter, quicker, difficult to identify

    * worse rankings with a particular keyword mean that a company will seriously consider using AdWords to maximize the traffic gain from google - so hey, it's good for google ... let's hope this isn't a reason

    * the big mistake is to use a "static" relationship between websites as a measure for a site's traffic or importance - better offer a "google counter" (google has the resources, I suppose)

    All things considered, Google is still doing pretty well.

  • by Anonymous Coward
    Google used to be pretty damn good. Have a hard time finding stuff these days though. I think one part of it is time. Google made its debut during an explosive time of growth for the 'net. That means the majority of the searchable items were roughly the same age, and so timewise context wasn't really an issue. But now that we've all got a few more years around our waists, some of those documents aren't going to be much use to us folks who are living here in the next millennium. An aged document should get a heavy negative score unless the user is explicitly looking for old stuff.

    Was having problems with a program last night and kept finding links from 1996. I kept saying to myself, "My good man, surely those bugs have been fixed since then!"

    Looking for things on the net with any search engine is a distinctly unpleasant experience. Kind of an impotent feeling to type in an explicit search with all the trappings ("bla bla" -foo -gar..etc), and to get a bunch of nothing in the results.

    And don't ask me why, but I've been getting an uneasy feeling about google. Where do they get all the money such a company needs to stay afloat? They're in a pretty good position to learn all kinds of neat details about everybody. Pretty powerful position to hold.
  • Panic! Disaster!

    "tim ward cambridge" always used to get me in seven of the top ten places - now it's down to six!! Better get my mates to put up a few more links ...

    But as I'm still on the first page with "accommodation cambridge" I can agree - famous people are worse off, but technical queries are unaffected.
  • by dpbsmith ( 263124 ) on Saturday October 05, 2002 @12:22PM (#4393419) Homepage
    ...because I've noticed some odd correlations.

    For a long time, a search on "Samuel Johnson" returned Frank Lynch's "Samuel Johnson Sound Bite Page" as the first hit. And, flatteringly, but mysteriously, a search on "Eyeglass Prescription" returned a web page of mine as the first hit. (I say "mysteriously" because the only page that Google reports as linking to my page is... my own home page! So it is not PageRank that accounted for its ranking).

    About a month ago, Frank's page dropped to #3 and mine dropped to about #20. In Frank's case, the #1 spot went to a fine Samuel Johnson web site at Rutgers; in mine, I was edged out by a bunch of commercial sites selling eyeglasses.

    The interesting thing is that two or so weeks ago both sites popped back to number 1.

    And then a few weeks later, Frank's is again at #3 and mine is down around #10 or so.

    I don't think there's any reason why eyeglass prescriptions and Samuel Johnson would be connected. (And, no, Frank's page and mine do NOT link to each other!) So the changes must reflect tinkering by Google.

    Neither Frank nor I use any kind of "cheating" to boost our ratings. And I don't think the sites that climbed above ours did, either. Nor do I think many of the sites involved changed ANYTHING significant that would have altered their rankings.

    (BTW I'm NOT giving URL's because the contents of these pages are irrelevant to my observations, I don't want them slashdotted, and this is NOT an attempt to boost the rankings of either page).
    • I have some pages ranking various search terms according to google hits (see my home page for links). I noticed that new terms tend to switch between two numbers for some time, but finally they stabilize. I suspect this is a cache effect; Google is a big system, and information may take some time to propagate within it.
  • by K-Man ( 4117 ) on Saturday October 05, 2002 @12:53PM (#4393542)
    When I wanted to look up Jeff Ullman's home page [stanford.edu], I used to go to google and type "ullman" [google.com] and he would come up first. I assumed that being a Stanford professor guaranteed a number one ranking, but now he's slipped a notch, and some company that makes sails is in his slot.

    He still beats Tracey Ullman, though.

  • Plural oddity (Score:4, Informative)

    by Tablizer ( 95088 ) on Saturday October 05, 2002 @01:27PM (#4393684) Journal
    I noticed that google is plural-sensitive. For example, "SQL alternative" will give a slightly different answer than "SQL alternatives".

    It does not seem like a very good idea to me in most cases.
    • Re:Plural oddity (Score:2, Informative)

      by Anonymous Coward
      the reason they're doing this is that google's indexing a *lot* of non-english pages... stemming algorithms (which would index 'alternatives' as 'alternative') generally work decently well on english, but applying even the most basic rules to other languages has really strange, unpredictable, and generally bad results. so, instead of trying to figure out what language each page/query is in and stemming appropriately, which would be very, very painful to do, they just ignore stemming. also, IIRC, another reason that stemming isn't used much is that it's been found in academic studies that it doesn't really improve the results you end up with that much...
      • (* the reason they're doing this is that google's indexing a *lot* of non-english pages *)

        If it is an English word, then they should ignore plurality. Generally, it should be pretty easy to determine the language of a page anyhow using statistical probabilities, etc. If the metric suggests that it is mixed, then maybe be more literal.
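        To make the stemming point concrete, here is a toy English plural stripper. It is NOT a real stemmer (nothing like Porter's algorithm), just a minimal sketch of the kind of suffix rule that conflates "alternatives" with "alternative" in English but misfires on other languages:

        ```python
        def naive_stem(word):
            """Toy English plural stripper for illustration only."""
            w = word.lower()
            if w.endswith("ies") and len(w) > 4:
                return w[:-3] + "y"            # "queries" -> "query"
            if w.endswith("es") and w[:-2].endswith(("s", "x", "ch", "sh")):
                return w[:-2]                  # "boxes" -> "box"
            if w.endswith("s") and not w.endswith("ss"):
                return w[:-1]                  # "alternatives" -> "alternative"
            return w

        print(naive_stem("alternatives"))  # alternative
        print(naive_stem("boxes"))         # box
        # The same rule mangles non-English words, e.g. French "pas":
        print(naive_stem("pas"))           # pa
        ```

        Even these three rules already show why applying one language's morphology to a mixed-language index produces the strange results the parent describes.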
  • This suggests a related thread, not so far off topic as it may seem.

    The service search engines provide to all of us, in the face of what can only be charitably called a glut of data (some of it actually useful to some of us) is remarkable. (And let's be clear: We're talking about Google, possibly the only thing keeping the net from collapsing under its own weight in ether.)

    It's great that Google is effectively "free" to us, but should that be so? I'm personally willing to pay a nominal charge -- perhaps in the form of a subscription -- for access to the most effective search algorithms. It has that much value to me. If someone wants to create a free system -- as Google has done -- that's great. But as the web gets bigger and hairer, we might need to think about ways to support (read, fund) continued development of effective search/navigation tools.

    There's some serious linear algebra in publicly disclosed portions of the PageRank(TM) algorithm, and the really good stuff is probably even funkier. This sort of thing isn't going to be hacked together on the weekend by your average penguinista. Could the brains behind Google go diving for herring in richer seas? I suspect so.
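    For the curious: the published core of that linear algebra is an eigenvector computation, and a bare-bones power-iteration sketch fits in a few lines. The toy four-page graph and the damping factor d=0.85 below are illustrative assumptions (0.85 is the value from the original PageRank paper, not anything confirmed about Google's live system):

    ```python
    def pagerank(links, d=0.85, iters=50):
        """Power-iteration PageRank on a dict {page: [pages it links to]}."""
        pages = list(links)
        n = len(pages)
        rank = {p: 1.0 / n for p in pages}
        for _ in range(iters):
            new = {p: (1 - d) / n for p in pages}
            for p, outs in links.items():
                if not outs:                   # dangling page: spread evenly
                    for q in pages:
                        new[q] += d * rank[p] / n
                else:                          # share rank among outlinks
                    for q in outs:
                        new[q] += d * rank[p] / len(outs)
            rank = new
        return rank

    # Toy web: everyone links to A, so A ends up ranked highest.
    toy = {"A": ["B"], "B": ["A"], "C": ["A"], "D": ["A", "B"]}
    ranks = pagerank(toy)
    print(max(ranks, key=ranks.get))  # A
    ```

    The "really good stuff" would be everything layered on top of this: anchor text, spam detection, and whatever tuning prompted this whole discussion.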

  • by teetam ( 584150 ) on Saturday October 05, 2002 @01:36PM (#4393735) Homepage
    Perfect search results are only present in the minds of the searchers. Google is, without doubt, the best search engine around.

    The pagerank algorithm is one of the most important reasons why it is so good in bringing up relevant and popular results. But, this is just one of the ways of searching for good results and will not always work to your satisfaction.

    Google gives preference to the number and quality of links to a particular site rather than the content of the site itself. One can easily come up with cases when this is probably not the best approach.

    For example, consider a portion of the web containing lyrics of songs. If you search by artist name or song name, google will return excellent results, because the pages are probably linked using the names. However, if you only know the song from the radio, you might want to search for songs containing a few particular lines. The pagerank algorithm might not be the best fit here.

  • by Julian Morrison ( 5575 ) on Saturday October 05, 2002 @02:01PM (#4393824)
    If a site is "convicted" of google-spamming (use the ranking engine to prescreen, and human checking to verify), or of helping to spam, it should be permanently blocked from the results by name and IP.

    Result: pr0n sites will be too terrified of deletion to munge their ratings.
  • I know the PageRank algorithm is patented (6285999). Will I get into any trouble if I write any open source software using a similar algorithm? Thanks.
  • by prostoalex ( 308614 ) on Saturday October 05, 2002 @03:40PM (#4394173) Homepage Journal
    I e-mailed the problem to Google about a week ago, but so far they don't seem to have gotten around to it. Anyway, a Google search on my last name [google.com] reveals my personal homepage as the result number one, which is no surprise, considering the last name. However, the cached version of what supposedly is my site [] is an entirely different site that I have never heard of. Furthermore, since the results of Google search use the title and description from the cached version, the title for my homepage as well as the description come up pointing to RhytmicPalmz.com or something of that nature. It seems to be a cache glitch; at least so far I haven't been able to come up with a valid explanation for it.
  • My sig isn't just trying to be funny. I can construct a search query that almost always gives me what I'm looking for in the top link (no, it's not search:"CNN"..ah look, "www.cnn.com", I'm so damn good).

    When I just search and get multiple results, on average one of the page 1 links is faulty (dead/404/not the same page anymore/spam). But this is not new or news, I have run into this for years with Google.

    There is still no better option out there, regardless of a few new quirks in PageRank that might be noticed by those trying to bomb it anyways. Of course, if you taunt the monkeys, they'll scream and complain.

  • by Alea ( 122080 ) on Saturday October 05, 2002 @05:25PM (#4394511)
    I'm wondering if Google has started punishing keyword metatags instead of (mostly?) ignoring them. Glancing around (via Google, naturally ;) for information about metatags and Google, the consensus seems to be that keywords are ignored completely or only contribute very slightly to rank.

    Now, I have written some software for a reasonably obscure research area in computer science. It used to be that when I searched Google for "obscure area" (where that's the name of my obscure area, not the actual words), I would rank somewhere in the 40's. This seemed quite reasonable to me. My software is fairly widely used, but is by no means the number one concern in that research area. It's free, GPL software so I have no monetary stake here.

    Now if I search for "obscure area", I rank somewhere around 140! Worse still, if I search for "obscure area software", I still don't rank in the top 100! There are questions on mailing list archives from people _looking_ for "obscure area software" that rank in the top 10!

    Only a handful of research groups have software in this area, certainly less than 10. The #1 ranked site in that search is actually based on my software, mentions it, and links to my software's site! A few other pages link to my site, including some that are ranked very high in that research area (including the #1 page).

    So here's the kicker. My pages have keyword metatags that honestly provide information about what's on the page. I'm wondering if Google has started treating such keywords as manipulation attempts and is punishing my pages.

    Since I doubt anyone can say for sure, I guess I'll have to remove the tags and see if my ranking rises. The ranking is not, frankly, terribly important to me. People who want such software can ask on a mailing list, but it does raise concerns that some honest attempts to accurately categorize pages could now result in lower ranks. It's now significantly harder to find something that is exactly what it claims to be!

    Then again, maybe the tags have nothing to do with it and ranking is just broken. But it seems awfully strange...

    • Speaking of punishing... I searched for "google bombing", "buy page rank" and "search engine" and got zero hits... I'm sure that's purely unintentional. Incidentally, I've decided to shell out money to google.

      If you don't get it, just move on.
  • I think Google has only improved in recent weeks. For example, my blog now gets the top hit for Irish Porn Star [google.com]. That's what I call progress.

Kill Ugly Processor Architectures - Karl Lehenbauer