Traversing the "Googlearchy"

Posted by CmdrTaco on Thu Aug 17, 2006 05:36 PM
from the he-just-made-that-word-up dept.
baloney farmer writes "How much do search engines influence the availability of information online? A new study gives some surprising results. Search engines help with popularity, but not as much as you'd think: 'Traffic increased far less than would be expected if search engines were enhancing popularity. It actually increased less than would be predicted if traffic were directly proportional to inbound links. In the end, it appears that each inbound link only increases traffic by a factor of 0.8. The results suggest that the reliance of web users on search engines is actually suppressing the impact of popularity.'"
  • Self-reenforcing cycle? (Score:3, Insightful)

    by lkypnk (978898) on Thursday August 17 2006, @05:40PM (#15930861)
    I've got to say no to this. Yes, when you search for something, you get the most popular results. But not everyone uses the same search terms, and even if you only go for the first three pages of results, you've still got 20 - 30 different sources of information, each different but similar query returning a slightly different set.
    • Re:Self-reenforcing cycle? (Score:5, Insightful)

      I've got to say no to this.

      And your evidence and study showing the researchers are wrong is... what?
       
       
      Yes, when you search for something, you get the most popular results. But not everyone uses the same search terms,

      Actually, if you've ever watched those 'live search' services (i.e., ones showing in real time the search terms users are entering), you'll see the same terms pop up again and again. Equally, for most search items - there simply are not that many (properly spelled) variants. (E.g., for the Seattle Mariners - there's pretty much only one way to type that.)
       
       
      and even if you only go for the first three pages of results, you've still got 20 - 30 different sources of information, each different but similar query returning a slightly different set.
      Many studies have found that the first page is what it's all about - what's on page 4 might as well not even exist. (There's a reason why SEOs exist, you know.)
       
      In essence - your claim that the researchers in TFA are wrong is based on smoke and mirrors.
    • Re:Self-reenforcing cycle? (Score:5, Informative)

      by 70Bang (805280) on Thursday August 17 2006, @07:42PM (#15931534)

      It has nothing to do with what the search engines do or provide per se. To a certain extent, search engines aren't always needed any more, particularly when it comes to popular sites, specific URIs, etc. The reason (IMO)?

      Word of "mouth". Actually, email messages[1] are sending names of services or specific URIs for a particular site (e.g., something particularly funny on YouTube) and people are pointing their browser in that direction, then exploring what else is there. If there are URIs to other locations on the web, people follow those. One of the local affiliates in Indy played a considerable portion of this [wthr.com] last night and made sure everyone knew there was a link on their web site. Lots of people likely pointed their browsers and YouTube had a lot of extra traffic[2]. On the YouTube page is Explore other videos. Lots of information conveyed, but no search engine activity in the process.
      The web has enough toys^w services which people regularly visit (e.g., blogs, YouTube) that they don't necessarily need search engines unless something isn't found via the normal means. And normal now includes the various discussion forums where people provide advice from the voice of context. IMDb.com has a professional side (reasonably priced paid service) where people who are in the biz can post things they're looking for or are available for. A couple of nights ago, someone was asking about the best software for scriptwriting on a small budget. About eight people chimed in with what they knew about different packages, including a couple of free ones, a commercial one for $25, a template which can be downloaded for MS Word, and some of the pros & cons about the ones they'd used. Where will you find ad hoc information in that context on demand in a search engine?
      __________________________________

      [1] Unless you're in the media and use "emails" as a noun.

      [2] Several years ago, I had a client who helped small to medium newspapers get online. Someone built a web site for them (taking six months, with #include files nested six deep; every call to the server required 20'000 lines of code to be processed, regardless of the function involved). Once more than twenty people hit a site, the server showed you its impression of the La Brea tar pits. One site, for a reasonably small city of perhaps a few thousand people, had a sheriff's deputy arrested for pedophilia, a ten-car pileup on the nearby interstate, and the largest employer's workforce (a substantial percentage of the citizenry) about to be dismissed. All of this hit CNN with a reference to their newspaper's web site. That's about the time Chernobyl and Three Mile Island happened at the same time. Fortunately smarter people are starting to anticipate resource issues a little better than they used to.
    • Re:Self-reenforcing cycle? by cheese-cube (Score:1) Thursday August 17 2006, @07:45PM
    • Re:Self-reenforcing cycle? by fyngyrz (Score:3) Friday August 18 2006, @10:23AM
  • What? (Score:1)

    by MyLongNickName (822545) on Thursday August 17 2006, @05:40PM (#15930862)
    (Last Journal: Saturday October 14 2006, @08:12AM)
    Only 0.8? Roland will have to post 25% more "stories" to get his blog rank up.
  • direct (Score:4, Insightful)

    by TheSHAD0W (258774) on Thursday August 17 2006, @05:40PM (#15930863)
    (http://www.shambala.net)
    It means people are finding what they're looking for more directly, rather than having to gad around. This is a good thing.
  • i can see that (Score:5, Interesting)

    by User 956 (568564) on Thursday August 17 2006, @05:45PM (#15930892)
    (http://www.atomjax.com/)
    In the end, it appears that each inbound link only increases traffic by a factor of 0.8. The results suggest that the reliance of web users on search engines is actually suppressing the impact of popularity.'

    I can agree with that. I've seen users type "yahoo.com" into the search bar in firefox... which goes to the google search results page, where they then click on the "Yahoo!" link. It's almost as if users are conditioned to use "search" as their first action, regardless of whether they can remember the domain or not.
    • Re:i can see that by fossa (Score:3) Thursday August 17 2006, @05:52PM
    • Re:i can see that (Score:4, Interesting)

      by XorNand (517466) * on Thursday August 17 2006, @06:27PM (#15931144)
      What you describe is actually *very* common for novice 'net users. In fact, I might say that more of them do it than don't. Check out AOL's recently released search data [dontdelete.com]. Just randomly check out various users' search histories. It would be interesting to see how this correlates to the frequency of Google users doing the same thing.
    • Re:i can see that by doobystew (Score:1) Friday August 18 2006, @09:03AM
  • A factor of 0.8 decreases traffic (Score:3, Insightful)

    by Anonymous Coward on Thursday August 17 2006, @06:05PM (#15931018)
    A factor of 0.8 means that the traffic is decreased by each inbound link. Weird.
    • Re:A factor of 0.8 decreases traffic (Score:4, Informative)

      by sidney (95068) on Thursday August 17 2006, @06:31PM (#15931173)
      (http://www.sidney.com/)
      TFA says that it is a linear relationship with a slope of 0.8. They scaled the data so that a direct linear relationship would plot as a straight line with a slope of 1, which is a line going up at a 45 degree angle, hits increasing one unit for every one unit increase in incoming links.

      Instead they saw a straight line with a slope of 0.8, meaning the hits increase 0.8 units for every 1 unit increase in incoming links. More links still correlate with more traffic, but, for example, doubling the number of incoming links increases the traffic by a factor of 1.6, not by a factor of 2.
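
      A quick way to see why the "factor of 0.8" wording matters is to compare two readings of a slope of 0.8. This is only a sketch: the function names are made up for illustration, and whether TFA's plot uses linear or log-log axes is an assumption here, not something the summary states.

```python
def linear_through_origin(links, slope=0.8):
    """Reading 1 (assumed): traffic = slope * links on linear axes."""
    return slope * links


def power_law(links, exponent=0.8):
    """Reading 2 (assumed): slope 0.8 on log-log axes, so traffic = links ** 0.8."""
    return links ** exponent


def doubling_ratio(model, links=100):
    """How much traffic grows when the number of inbound links doubles."""
    return model(2 * links) / model(links)


if __name__ == "__main__":
    for model in (linear_through_origin, power_law):
        print(model.__name__, round(doubling_ratio(model), 3))
```

      Under the first reading, doubling the links doubles the traffic (the slope only rescales it); under the second, doubling the links multiplies traffic by 2 ** 0.8, roughly 1.74. Either way, more links mean more traffic, never less.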
    • Re:A factor of 0.8 decreases traffic by bluebox_rob (Score:1) Friday August 18 2006, @06:37AM
  • How are you defining popularity? (Score:5, Insightful)

    by crazyjeremy (857410) * on Thursday August 17 2006, @06:09PM (#15931037)
    (http://users.mtrx.net/funnypics | Last Journal: Monday September 25 2006, @11:29AM)
    Maybe a site's popularity isn't defined by the number of inbound links because, no matter how many links to your site you have, people still only want to look at things they are interested in. So if you define web popularity not by links but as "some internet item people want to find", then more links to an individual site simply let interested people find that site more easily. It would only change the popularity if it's forced on you (like ads) and you become interested by a curious side thought... The more links to a site you have, the more likely interested people will find it.
  • by pla (258480) on Thursday August 17 2006, @06:14PM (#15931077)
    (Last Journal: Monday April 03 2006, @07:23PM)
    The results suggest that the reliance of web users on search engines is actually suppressing the impact of popularity.

    When I first read this summary, I thought, "WTF?". So I read the article. And re-read the summary. And re-read the article. And I think I finally "get" it.

    Let's say you run a "popular" site like the BBC news. You get a hell of a lot of traffic, and people tend to go directly to your site rather than via a link. Alternately, you get a lot of links that only a small percent of people seeing them follow.

    Now compare that with an unknown site (most personal or academic webpages, for example). They get very few visitors, but most of them come from search engines.

    So what does this tell us?

    Almost nothing we didn't already know - Search engines DO indeed negate the impact of popularity, because popularity has little to do with relevance, while search engines generally try to maximize relevance.

    This I consider a "good" thing. When searching for info on ripping a DVD using the latest copy protection scheme, I don't care if the latest pop idol calls ripping "totally not cool". I want methods, programs, and real life examples that might only have gotten a few dozen hits ever.
  • Stating the obvious as if its not (Score:1, Interesting)

    by davros-too (987732) on Thursday August 17 2006, @06:29PM (#15931162)
    (http://www.takeabreak.com.au/)
    Sites with more links have more visitors (as defined by Alexa ranking, a rough tool at best) - big surprise, NOT. Everyone knows that sites with more inbound links tend to rank higher on the search engines and therefore get more visitors.

    TFA then tries to make a big thing out of their 'discovery' that links are not the _only_ factor in the popularity (however defined) of a website. Again, completely obvious.

    Then we hear that the correlation (not defined clearly) between links and 'traffic' (presumably actually some Alexa rank) is 0.8. Not clear what this actually means, but it's hardly surprising the relationship between links and traffic isn't 1:1. Many factors will be causing this. For example, site-wide links off large sites make a huge contribution to the number of links but will make a smaller contribution to the target site's search engine ranking than the same number of links each from an individual site.
  • Slashdot link? (Score:3, Insightful)

    by complete loony (663508) <Jeremy@Lakeman.gmail@com> on Thursday August 17 2006, @06:39PM (#15931220)
    I'm guessing that link up there in the summary had WAY more effect on their servers...
  • by houghi (78078) on Thursday August 17 2006, @06:45PM (#15931250)
    (http://www.houghi.org/)
    What do they mean by 'increase with a factor 0.8'?
    If my starting point is 1 and I put a link on it, does it mean that I now have a 1.8 or a 0.8 or what?

    Or do they mean 0.8%? So if I start with 100, I now have 100.8 per incoming link?
    Are they cumulative? E.g., is the second link (if it is %) over the 100.8, or over 100?

    Also it looks like captain obvious. Pages that have more links to them are more popular. Also that people who have interests in certain pages will only go to those certain pages.

    Now if only a search engine company would realize that, they could use this data to get some advertisements on both their site and on the site they link to.

    Oh, wait, they reverse-engineered Google's business plan.
  • Etymology Nazi (Score:2)

    by a whoabot (706122) on Thursday August 17 2006, @07:33PM (#15931483)
    The Greek would definitely have a contracted eta for just "Googlarchy."
  • Funky math (Score:1)

    by pablodiazgutierrez (756813) on Thursday August 17 2006, @08:11PM (#15931697)
    (http://www.ics.uci.edu/~pablo)
    In the end, it appears that each inbound link only increases traffic by a factor of 0.8.

    What does this mean? Without any other reference, I would assume that each link takes 1 unit of traffic (ut) to (1 + 0.8)ut. If so, n links would take your traffic to 1.8^n ut, which is unbelievable. What's missing here?
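
    To see why that compounding reading is hard to believe, here is a minimal sketch. It assumes, purely for illustration, that every inbound link multiplies traffic by (1 + 0.8); the function name and the starting traffic of 1 unit are hypothetical, not taken from TFA.

```python
def compounded_traffic(n_links, base_traffic=1.0, factor=0.8):
    """Traffic if each link multiplied the previous total by (1 + factor).

    This is the literal 'increases by a factor of 0.8 per link' reading,
    shown only to illustrate how quickly it blows up.
    """
    return base_traffic * (1.0 + factor) ** n_links


if __name__ == "__main__":
    for n in (1, 5, 10, 20):
        print(n, compounded_traffic(n))
```

    Ten links already give more than 350 times the starting traffic and twenty give more than 100,000 times, which is why a per-link multiplier cannot be what the study measured.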

  • by smartdreamer (666870) on Thursday August 17 2006, @08:13PM (#15931704)
    I hear somebody laugh at Google: "haha those ranking noobs did not understand anything."
  • # of inbound links == page rank?? (Score:1, Informative)

    by Anonymous Coward on Thursday August 17 2006, @09:36PM (#15932026)
    Yes, the core of Google's ranking algorithms is based on incoming links, but it is far from something as simple as just counting the number of links. The _quality_ of the links is way more important. In addition, there are many signals Google takes into account beyond just pure PageRank (if this wasn't true, almost anybody could build Google). Yet, TFA uses and interchanges "# of inbound links" and "search engine score" as if they meant the same thing.

    If they really are using # of links as an approximation to search engine score, then they're flawed from the beginning. If they aren't, then somebody isn't very good at conveying information.
  • It reminds me of the quote (not sure the origin): People who like this kind of thing will find that this is the kind of thing that they like.

    You think it's bad now, imagine when Google has an AI model of what you want to find such that it tailors the search results for you alone.

    Some years back, in the early 90's, I think, when there was little or no web and when advertising was done in physmail, I started to receive lots of mail about object-oriented stuff and little about other kinds of programming. "Ah, we're winning," I concluded foolishly. Later, I realized I was just pigeon-holed in a special Hell where I would never again learn about what others were doing because someone thought they had learned what I "liked".

    It amazes and saddens me that a whole industry grew up around "personalized interfaces" which does not include as part of its regular practice: "ask the user what he likes". Amazon's court of last resort is to allow me to "correct" its assumptions about me by deleting records of specific purchases that are confusing its belief that I like certain things.... all substituting for an interface that just says "do you like X?" and lets me say "yes/no". And there's even some research saying they know better than I do what I want. Bleah. Personal indeed.

    I'll be interested to see if this result holds up. It seems just as grim as the "personal interfaces" result. But sad or not, it does seem believable...

  • Not to obnoxiously plug, but lylix.net [lylix.net], a Linux/Asterisk VPS host that I consult for, has gone from a single-man show with few customers to nearly overflowing with incoming business as a result of an aggressive "white hat" SEO campaign - mostly just putting up good content on the site in a format that search engines like (and probably also the thousands of links from slashdot from my sig/homepage).

    These results surprised me very much - I've gotten over a thousand hits on lylix.net as a result of my postings in the last month and a half, but this is easily dwarfed by lylix's position as the 3rd hit for 'asterisk VPS', first for 'linux asterisk vps', and being 4th-5th page for just "VPS".

    For those who can put up quality content and carve out a decent search rank, Google is a veritable gold mine. Yes, it's possible that looking at the internet through Google's lens gives a skewed perspective, but it's still the best way to find most things. Word-of-mouth is fine for big sites, or niche sites known by your friends, but I can honestly say I do not find most things online that way.
  • by martin-boundary (547041) on Friday August 18 2006, @12:18AM (#15932535)
    It's hard to tell how interesting these results are from second-hand information (the original paper isn't available freely, you have to pay for it), but the writeups aren't particularly surprising. So this should be taken more as a criticism of the writeup than the (unknown) paper.

    1) The biasing effect is not hard to calculate _exactly_, for example it's done implicitly in this old paper [lbreyer.com], see p.6 the paragraph after eq.10. Of course, it's well known that Google hasn't used PageRank exclusively for years.

    2) PageRank's formula is well known, and doesn't just count the number of inlinks, but uses a "boredom" probability of about 0.2 (as explained in Page and Brin's original papers at Stanford, I think they used 0.15). To be precise, PR is the weighted average of 0.2 times a uniformly random measure and 0.8 times a matrix based on the number of inlinks. See a pattern? It's not surprising that the inlinks should only account for about 0.8.

    3) Judging from a couple of older papers available online by the researchers, they've spent some effort to work out an approximation to PageRank using inlinks. The idea being that inlinks is easier to estimate than PR or whatever modified PR Google uses these days. Now they're looking at the inlinks empirically, and they're finding a factor of 0.8 associated with Google. Well, duh! That would be circular reasoning.

    4) If the data they're using is recent and sufficiently significant, it might suggest that Google's secret PR algorithm is only a second order modification of the original PR, i.e., that even though the real PR is secret, it can be well approximated by the original Stanford PR. That in turn is both exciting and troubling.
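
    The weighted average in point 2 can be sketched as a plain power iteration. This is a toy illustration, not Google's actual algorithm: the 0.8 "follow a link" weight and 0.2 "boredom" weight are the figures quoted above, and the four-page graph and all names are made up.

```python
def pagerank(links, follow_prob=0.8, iterations=100):
    """Toy PageRank. `links` maps each page to the list of pages it links to.

    Every link target must itself be a key of `links`.
    """
    pages = sorted(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # "Boredom" part: with probability 1 - follow_prob, jump to a random page.
        new_rank = {page: (1.0 - follow_prob) / n for page in pages}
        for page, outlinks in links.items():
            # Pages without outlinks spread their rank over every page.
            targets = outlinks if outlinks else pages
            share = follow_prob * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
    for page, score in sorted(pagerank(toy_web).items()):
        print(page, round(score, 4))
```

    In this toy graph the page with no inbound links ("d") still ends up with the uniform 0.2 / 4 share, while the rest of each score comes from the 0.8-weighted link structure.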

  • This is pretty obvious.

    If links were the only way to find new web content, then the number (and popularity) of linking sites would totally determine a website's popularity (modulo a bit of advertising).

    Now, if you believe that people at least occasionally find sites through search engines that weren't linked to from any of the sites they normally visit, then the search engine reduces the impact of popularity. All you need is one example of someone searching for "f22 raptor cost overruns" who doesn't browse military/political websites, and the search engines have reduced the impact of popularity.

    I always thought the criticism of google was that their choice in search algorithm did less to reduce the influence of popularity than it could. I don't find this a compelling criticism, wisdom of the masses and everything, but it is at least a cogent point.
  • 0.8 (Score:2)

    I'm astounded that they think the correlation should be 1:1. Using some arbitrary figures:

    If you have a large web page with 4 million inwards links, and you put the link in a million more places, you're 25% more visible - but part of the 25% that can now see the link in the new places will have known about the site before, and those people then don't add to the figure even though they've been targeted by the new advertising.

    If you have a small specialised web page with only 40 incoming links, you're only being found by people who have criteria that fit your particular company; assuming here that it's not just from being a web fledgeling, you've only got a small userbase inside the specialisation who will come to your web page, and chances are they'll probably know about it. If you add 10 more links, then sure you'll get more people - but the people who are your target audience are likely to know about the site anyway, whether via magazine/word of mouth/forum discussion.

    Unless your company is special, and is in the startup phase of getting to the relevant people - where the target audience hasn't found out about the site yet, and adding 25% to the links, by being in the right place, reaches that audience. You might get a return of greater than 1 if you do it in the right way there; where you were previously known by only a fraction of the target audience and can, via Google AdWords or whatever, suddenly reach a far, far wider audience, you'll get good improvement on your visitor numbers.

    A major assumption in the whole thing is that each company assessed considers the entire marketable world as a potential customer base. By targeting 25% more people you'll get 25% more interest? Even if we assume that the extra people don't know about the site already, that'll only work if your product is interesting to 100% more people, which in the world of the web seems fairly unlikely.
  • ... the article for you:

    The desirability of a website is not given by how search engines rank it but by its actual content.

    Well ... yeah!

  • This morning (Score:1)

    by Jarth (666336) on Friday August 18 2006, @02:18AM (#15932826)
    (http://slashdot.org/~Jarth | Last Journal: Saturday January 06 2007, @04:47AM)
    I had a 'vision' of an article discussing google on slashdot and hah! behooold! Maybe all these digits are getting to me after all. Eeery ... Anyway, the equation that came to mind went a bit as follows.

    A scientist who makes an unusual discovery is almost certain to get critics all over him. Yet, in time his discovery will be recognised as the result of an intellectual effort, an achievement. This scientist will become known as 'a smart person'.

    Discarding the percentile of scientists who succeed at setting such a milestone and looking at people with scientific capacities (for the sake of argument, 5% of the googlers) one can only argue the search results in google will only become more irrelevant to the intellectual part of our society. So the results of google will become increasingly insignificant to the more educated part of the population, maybe even plain scholars.

    This is of course not true for most specialisms and so on, but even now results are sometimes quite insignificant.

    The signs are already here.

  • by 140Mandak262Jamuna (970587) on Friday August 18 2006, @07:47AM (#15933813)
    (Last Journal: Wednesday October 31, @08:33AM)
    Expecting the traffic to a site to increase in direct proportion to the number of inbound links is completely stupid. Let us say, for example, my site gets one inbound link from google.com main page. I will get some traffic. Then I get another inbound link from the home page of IIT-M Alumni Association of Allegheny Valley. Now with two links to my page, you think I should get twice the traffic? How stupid is that? Not all sites have equal traffic. Unless you weight the inbound links with popularity of originating sites, it is a meaningless exercise.

    Further, the search engines themselves allot page rank by the number of inbound links and the keywords found in the "a" tag of originating pages. So more inbound links will raise your page rank, get you ahead in the search listings and get you more traffic. But the traffic will be counted as "search engine" generated traffic, not as traffic originating from a referring site. With this much interdependence between page rank and the number of inbound links, how did the study control for it?

    The number of inbound links is already reflected in the search engine generated traffic, or to use Wall Street parlance, it is fully discounted. There is nothing to see here. Move On.
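
    The weighting argument is easy to put in numbers. This is a rough sketch with invented figures: the visitor counts and click-through rates below are hypothetical, chosen only to contrast one very busy referrer with one tiny one.

```python
def expected_referrals(inbound_links):
    """Sum expected daily referrals over (source_daily_visitors, click_through_rate) pairs."""
    return sum(visitors * ctr for visitors, ctr in inbound_links)


if __name__ == "__main__":
    # Hypothetical numbers: a huge portal page vs. a small alumni association page.
    one_link = [(1_000_000, 0.001)]
    two_links = one_link + [(200, 0.05)]
    before = expected_referrals(one_link)
    after = expected_referrals(two_links)
    print(before, after, after / before)
```

    With these made-up numbers, doubling the link count raises referrals by about 1%, not 100%, because the second link comes from a page almost nobody visits.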

  • First I'll admit I'm a little confused by the article. Are they measuring a page's popularity in a search engine by its number of inbound links? So they're saying that as the number of inbound links increases (i.e., in their opinion, the site's ranking in the search engine), the number of page visits increases? Maybe I'm missing something, but if that's the case this research raises an eyebrow here at least. If they have page ranking data from Yahoo, why not use that instead of inbound links? Or maybe by "page ranking" they mean "number of inbound links".

    I guess people need to study something, and sometimes one will come up with surprising results. But this study reminds me of the "Long Tail" discussion that was all the rage for a couple weeks. "Wow, with the internet we can find niche information!" Who didn't know that? So now some information pundit (don't remember his name) gets to make a bunch of money for putting a name on a self-evident truth so that business types can sit around and discuss it like it's something revolutionary.

    If people didn't use the internet to find things, then why would Google be worth billions of dollars? If people didn't use the internet to find things, then why would companies be paying Google huge sums of money for page rankings? Those who track ROI will usually tell you that it makes them money (and if not they'd stop doing it, if they were tracking ROI).

    I don't follow professional sports, but a lot of people do. So a lot of people are going to search for "Toronto Blue Jays". Good for them. There are ~6 billion people in the world, each with some common interests, and each with some less common interests. If you're making a web site to sell iPods then likely you'll be lost in the crowd and have a hard time gaining traction. If you're selling refurbished vintage Massey Ferguson tractors with patent leather seats and Corvette LS1 engines, then likely you'll end up #1 on Google pretty quickly. Do you want 0.0000000001% of a billion dollar market or 100% of a $0 market? Your choice.
  • Re:Factor (Score:2)

    by Shaper_pmp (825142) on Friday August 18 2006, @05:05AM (#15933242)
    increases traffic by a factor of 0.8


    Maybe everyone's English education causes them to develop language-parsing filters instead? ;-p

    It's technically phrased incorrectly, but the meaning is still clear from what they wrote. Your interpretation would mean traffic actually decreased, which is flatly contradicted by the statement.

    TBH, the real problem I have is the idea that every additional inbound link could increase traffic by a constant factor. Isn't it saying that if I've got 100 inbound links and 100 users/day, getting one more link would get me an extra 80 users/day?

    I think they meant that "the increase in traffic from each link was only 80% of what they expected from a linear relationship", not that "each inbound link increases traffic by 80%".
    • Re:Factor by hauntingthunder (Score:1) Friday August 18 2006, @08:00AM