The Internet Editorial

The Real Problem With Alexa

Posted by CmdrTaco
from the get-my-irk-on dept.
Alexa drives me nuts. It uses a broken methodology to measure the internet and is, for reasons unclear to anyone, regarded as somehow definitive simply because it allows you to compare two sites with a single simple number. Its sampling methodology is flawed and the numbers it produces are meaningless. And if you want to help me prove this, please install their toolbar. Of course since most of you are Slashdot readers, most of you won't and that only helps prove my point. Read on for what I mean by all of this, and why it matters.

As the de facto 'Guy in Charge' of a reasonably large web site, I am routinely asked questions by a variety of people that lead inevitably to Alexa. It might be a question from my boss at SourceForge about traffic. Or it might be a sales guy asked by a possible advertiser why some other random website is bigger or smaller than Slashdot. Most often it's a random reporter doing background for a story that has nothing to do with Slashdot. Why I'm considered an expert is very confusing, but why they always regard Alexa rankings as meaningful is even more so.

Here's the problem: Alexa doesn't work because of who will install it, and perhaps more importantly, who won't. Let's start with a place I'm very familiar with: Slashdot readers. Until recently Alexa didn't work on Firefox... instead only IE users participated. On the internet as a whole that's fine: like 80% of users run IE. But on Slashdot only like a quarter of you do.

What about re-installing the plug-in after you update your browser? When Firefox 2.0 came out, almost a third of Slashdot readers upgraded within a few days. You upgrade minor Firefox releases overnight. Even IE users of Slashdot update relatively fast, from 6 to 7 or even minor revisions. New versions often break old plug-ins. When you get that alert that a plug-in is out of date do you just forget about it? I know I do. And that's not even counting clean OS installs. But when I look at the installations of random non-technical friends and family, I frequently see versions of software so dated it makes me cringe.

And that's not even talking about the fact that Alexa's toolbar is pretty much spyware. How many Slashdot readers are giddy to install spyware? You either? Big surprise. Because of who we are, and what it is, our population will self-select out of consideration.
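
The self-selection argument above can be made concrete with a little arithmetic. This is only a sketch: the site names, visitor counts, and toolbar install rates below are invented for illustration and are not Alexa's or Slashdot's real figures.

```python
def panel_counts(true_visitors, installs_per_10k):
    """Visitors a toolbar panel would observe for each site.

    installs_per_10k is the (hypothetical) number of toolbar users
    per 10,000 visitors of that site's audience.
    """
    return {
        site: true_visitors[site] * installs_per_10k[site] // 10_000
        for site in true_visitors
    }

# Hypothetical numbers: both sites have identical real traffic,
# but the technical audience installs the toolbar ten times less often.
true_visitors = {"tech_site": 1_000_000, "general_site": 1_000_000}
installs_per_10k = {"tech_site": 10, "general_site": 100}

observed = panel_counts(true_visitors, installs_per_10k)
ratio = observed["general_site"] // observed["tech_site"]
print(f"The panel sees general_site as {ratio}x larger despite equal traffic")
```

Under these assumed rates the panel reports a tenfold gap between two sites with exactly the same audience size, which is the distortion the rant is about.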

Did you know Alexa excludes SSL? How many etrade users do you think there are? Now personally I'm glad that they aren't tracking my browsing at my credit card company, but it's just another factor reducing accuracy.

Equally perplexing is the accounting of iframes. Let's look at someone like DoubleClick's Alexa rating. Now it's hard to say, but I don't think I've ever visited their website. Have you? But according to Alexa, they have nearly a 1% share of the internet. I'd tend not to believe it... but they have iframes on zillions of web pages and counting those sure would account for this huge ranking. What about all those badges for the popular social networking websites? What influence are those iframes having on Alexa rankings? Alexa's FAQ says they don't count, but I'm skeptical.

In fact, Alexa KNOWS that it is a flawed metric for measuring. Have you ever tried actually looking up Alexa on Alexa? Unsurprisingly, it is unavailable. Why? Visitors to Alexa.com would be the most likely of any user population on-line to have installed their plug-in. I don't know what their 'Rank' would be, but I bet it clearly would be an apples to oranges comparison against ANY other site on-line.

Of course who do you think actually will go out of their way to install something like this? I have a good guess... if you are obsessed with acronyms like SEO or terms like PageRank you are very likely to care very much about these things. I spend a real percentage of my week dealing with people flooding my systems with garbage content designed to screw with these ratings. And you know they all have the toolbar installed so their zillions of worthless spam websites are being counted.

This problem has parallels elsewhere of course: The Nielsen ratings struggle to account for PVRs. Since you got a TiVo, when was the last time you watched "Live" TV? This is part of why Science Fiction shows struggle on TV... scifi fans are early adopters. So we stopped getting counted and our favorite genres are butchered by networks and lost to the void. PVR users tend to be wealthy (those boxes are expensive) and educated. Now I'm not saying that the dumbing down of TV is exclusively the fault of Tivo, but it sure didn't help that we weren't being counted as excellent "Smart" TV shows get canceled while we keep getting more seasons of Survivor. Who we are and how we live causes us to not be counted, and this has unintended consequences.

So what do we do? I wish I had a good answer to this. My first suggestion would be that if anyone mentions Alexa to you that you freak out and go on a 5-minute rant about how Alexa is stupid and anyone who is using it to seriously make a business decision should be fired. It doesn't actually help, but I estimate that every time I do this, I burn the same number of calories as I might on an elliptical trainer. I assure you the beer gut ain't getting smaller on its own.

Alternatively you could just install the toolbar on every machine you can find and skew the numbers ridiculously towards people that are likely unrepresented. Of course, the conspiracy theorists amongst you will just bitch that I'm trying to fudge Slashdot's own rankings in a system I'm claiming to hate. But that only helps prove my point... the conspiracy theorist is a demographic strongly represented on Slashdot that is unlikely to trust this software. We all ignore a broken status quo "Gold" standard that would fail a 100 level college science class on the grounds of flawed methodology. And this only leads to us not being counted.

This discussion has been archived. No new comments can be posted.

  • Alexa's Spiders (Score:3, Interesting)

    by Garridan (597129) on Monday July 23, 2007 @12:10PM (#19957319)
    When I used to administer a website (b2b, you've never heard of it) my boss loved Alexa. I told him time and again to uninstall it, and even did so myself a number of times... but he'd put it back every time. Then, one day, all dynamic content on the main page just vanished. I brought it back from backup, and chalked it up to a bug. Then, it happened again a little while later. I started snooping around our logs.

    Turns out, Alexa's spiders were ignoring the robots.txt file, and capturing usernames and passwords. It logged into the administrative area, and followed the "delete" link for every entry. My dumbass boss still didn't want to uninstall Alexa. Could have strangled the man.
  • by truthsearch (249536) on Monday July 23, 2007 @12:13PM (#19957367) Homepage Journal
    My first suggestion would be that if anyone mentions Alexa to you that you freak out and go on a 5-minute rant about how Alexa is stupid and anyone who is using it to seriously make a business decision should be fired.

    I've been doing this for years. The problem (or actually just what marketers perceive as the problem) is that there is no generic public way to compare web site traffic. The only true way to get traffic metrics is from the web site owners. And they could easily make it up to take in more advertisers. So people in advertising look to Alexa as the only third party source.

    The biggest sites don't have as much of a problem because they can work closely with advertising partners. Medium and small sites, however, don't get as much personal attention. So proving themselves as worthy web space for ads is more difficult.

    The only people I've heard of that install the Alexa toolbar are web site owners because they want to see their rank often. Ironically so few people have the toolbar installed that they drastically boost their own rank.

    We need to convince marketers that Alexa is pointless. But I'm afraid that without a good replacement they'll keep using it.
  • by poetmatt (793785) on Monday July 23, 2007 @12:14PM (#19957385) Journal
    I hate to say it, but that really proves not as much that "The only way advertisers can get accurate data as people opt in", as it proves that they have not elected to find new methods to track data properly/independently. If you were able to develop a way to get honest and accurate data of the number of hits on a site to site basis, would even that be more accurate? (assuming you started to collect an enormous list of sites). Say check all the news aggregator websites language by language (I'm sure there's thousands in each), but rank them by who is getting the most unique hits in a day, etc? Of course a site could skew their own results, which creates its own problem, but would this not at least be more valuable than Alexa data?
  • by trolltalk.com (1108067) on Monday July 23, 2007 @12:18PM (#19957447) Homepage Journal

    "It isn't surprising that people who spend money on advertising want to have some metric by which to predict (estimate, guess, what-have-you) the impact of each dollar spent on web advertising."

    There are several easy ways:

    1. as an advertiser, host the ad on your own server, and just look in your logs
    2. as an advertiser, get access to the server's banner administration system for your ad account (postnuke allows this on a per-advertiser basis)
    3. as an advertiser, just be skeptical as all hell and don't believe 99% of the stuff you hear - it's all BS anyway

    If you're so naive as to not insist on hard numbers for actual views (the log files are best), you deserve to get hosed. You can analyse the log files and factor out multiple views per host ip to get the actual number of real views, and reduce fraud; ditto with geolocation of ip addresses to factor out bots in 3rd world countries; ditto for bots that crawl every link on a page; ditto for pages that are loaded then immediately dumped for another page.

    As an advertiser, I'd want unique eyeballs - real human eyeballs - that can be verified.
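
    The first step described above, factoring out multiple views per host IP, might look roughly like this. It is a sketch only: it assumes Apache-style Common Log Format lines, the sample lines and addresses are made up, and it does none of the geolocation or bot filtering also mentioned.

```python
from collections import defaultdict

def unique_views(log_lines):
    """Count distinct client IPs per requested path.

    Assumes Common Log Format, where whitespace-separated field 0 is
    the client IP and field 6 is the request path.
    """
    seen = defaultdict(set)
    for line in log_lines:
        parts = line.split()
        if len(parts) < 7:
            continue  # skip malformed or truncated lines
        ip, path = parts[0], parts[6]
        seen[path].add(ip)
    return {path: len(ips) for path, ips in seen.items()}

# Made-up sample lines using documentation-range addresses.
sample = [
    '192.0.2.1 - - [23/Jul/2007:12:18:00 -0400] "GET /ads/banner.gif HTTP/1.0" 200 2326',
    '192.0.2.1 - - [23/Jul/2007:12:18:05 -0400] "GET /ads/banner.gif HTTP/1.0" 200 2326',
    '198.51.100.7 - - [23/Jul/2007:12:19:10 -0400] "GET /ads/banner.gif HTTP/1.0" 200 2326',
]
views = unique_views(sample)
```

    Three raw hits collapse to two unique hosts here. Counting by IP is itself only an approximation, since many people can share one address behind a proxy or NAT.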

  • by zarkill (1100367) on Monday July 23, 2007 @12:21PM (#19957501)

    And frankly, if we're not willing to provide the information necessary for advertisers to make informed choices, we're going to continue to be ignored, both on the web and on television.
    This is one reason that I actually like Amazon's recommendation system. I can provide information about what I like and don't like, and the site will then suggest items that I may be interested in based on that. If it suggests something that I'm not interested in, I can click "not interested" and it never presents that item to me again.

    I would LOVE to have a similar scenario for other ad-driven media. Imagine if I could flag TV commercials with "not interested" and then never see that commercial again, or any commercial for a similar product. Once it got a good feel for what I really like and don't like, I probably wouldn't feel the need to skip commercials. The same could be said of web ads. If I could cherry-pick which ads I was interested in and which I wasn't I might not be so inclined to block ALL of them.

    Ads are useful to me sometimes, but picking the signal out of the noise is usually such a hassle that I'd rather just skip the whole process. If everyone could make a very personal statement about what they want to see ads for and what they don't, I think the benefit for both parties would improve.
  • Spyware yup. (Score:5, Interesting)

    by crabpeople (720852) on Monday July 23, 2007 @12:22PM (#19957507) Journal
    Symantec corporate flags the alexa toolbar as spyware, so I couldn't run it if I desired to.

    http://www.symantec.com/security_response/writeup.jsp?docid=2004-062410-3624-99 [symantec.com]

  • by Erskin (1651) * <erskin@@@eldritch...org> on Monday July 23, 2007 @12:24PM (#19957551) Homepage

    This is no different than the Nielsen ratings

    I'd argue it is rather different. TV is one way. Your television browsing habits are slightly less revealing than say, your banking activities or the blog entries you post.

    Also, Alexa claims to give you some value in exchange for letting them piggy back on your browsing. Nielsen is more public and more respected. This helps mitigate the sampling problems.

    Suck it up and find a better metric for your boss.

    If his "boss" (or any of the other scores of people who accost him about the popularity of websites) would let him pick the metric, he wouldn't have this problem.

    The point of the article is that he has to defend someone else's choice of metric.

    Or perhaps, the point is more of an "Ask Slashdot" sort of thing...

    As in, "Hey all you /. geeks, what's a better way to do this?" Taco's comments on the flaws in Alexa's system and Control Group's comments on some of the particular challenges against this demographic in general support that.

    Heck.. it seems like an interesting enough problem to me, but then again, I don't have a sig like yours:

    /.: "Anti-Microsoft Rants, Apple and Google d*ck sucking." Pathetic.

    If you hate it that much, why are you hanging out here?
    (Sorry, I really need to stop feeding the trolls...)

  • by RetroGeek (206522) on Monday July 23, 2007 @12:27PM (#19957611) Homepage

    That is, on the one hand, we're a fantastic demographic to succeed with, but on the other, we're a tough nut to crack.

    And add to this mix that we collectively HATE advertising. So we all use ad blockers, flash blockers, script blockers, image blockers, and anything else we can find which reduces or eliminates advertising which gets in the way of reading the content of a web site.

    So even if we do get "counted" and the advertisers can determine what it is that we browse, the current method of "in your face" ads will quickly push us towards a way of either blocking the ads, or simply not going there any more.

    And I DO click on ads, but only if they are:
    - NOT in the way of the content
    - NOT blinking, flashing, moving
    - NOT trying to distract my eye towards them

    If ANY of the above happen, I am gone from the site, and will NEVER go there again.

    (Hey, this is my 1,000th post. Woo Hoo!)
  • Pfft, screw that. (Score:3, Interesting)

    by oGMo (379) on Monday July 23, 2007 @12:34PM (#19957741)
    If digg is "beating" slashdot, let it win. Maybe the YouTube popularity blog can suck away the idiots from slashdot.
  • by Mandrake (3939) <mandrake@mandrake.net> on Monday July 23, 2007 @12:40PM (#19957845) Homepage Journal
    On larger sites, doing things like collecting / reading web site logs (like your apache log files) is completely unrealistic. We don't even have them turned on here anymore, because they generate so much disk i/o and flood so much disk space (each of our web heads when we last had logging enabled over a year ago produced over 8 gb of apache logs every day - multiply that times 30 and that's a hell of a log parse every single day...) - so we tend to gauge traffic more in megabits per second than anything else.

    I am not saying that Alexa is good for looking at traffic trends either - their numbers vary WILDLY from what our actuals are. Oddly enough, Hitwise does a much better job, but I suspect that is a lot of blind luck on their part as I think they take data in a similar fashion.

    I'm not sure I had a point, except that web logs aren't really feasible when your traffic crosses a threshold - I'm sure /. has similar logging problems.
  • Re:Alexa's Spiders (Score:3, Interesting)

    by _xeno_ (155264) on Monday July 23, 2007 @12:42PM (#19957867) Homepage Journal

    The HTTP spec clearly says that GET requests should only be used for idempotent [wikipedia.org] actions. Technically, deleting an entry is an idempotent action, so using a GET link for a delete entry is - well, brain-dead stupid. But it doesn't break the spec.

    See, an idempotent action is simply an action which has the same outcome the second time you attempt it. Deleting an entry twice doesn't change the final state of the system - the entry is still deleted. That makes it idempotent.

    Of course, anyone with an ounce of sense would realize that what they really meant was that GET requests shouldn't change state and that POST requests should be used to change a system's state. (Or PUT, or DELETE. But no one ever uses those.) Which was the point of the parent poster in any case.

    But before someone pulls out the "GET is supposed to be idempotent" part of the HTTP spec, remember that deletes are, technically, idempotent. They're safe to attempt multiple times, and leave the system in the same state afterwards.
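
    The HTTP/1.1 spec (RFC 2616, section 9.1) does treat these as two separate properties: "safe" methods should not change state, while "idempotent" methods can be repeated with the same result. A toy in-memory sketch of the difference, where the `EntryStore` class and its contents are purely hypothetical:

```python
class EntryStore:
    """Toy stand-in for a site's entries; not a real web framework."""

    def __init__(self, entries):
        self.entries = dict(entries)

    def get(self, key):
        # Safe: reading never changes stored state.
        return self.entries.get(key)

    def delete(self, key):
        # Idempotent but NOT safe: the first call changes state,
        # repeated calls leave it exactly as the first call left it.
        self.entries.pop(key, None)

store = EntryStore({"1": "first post", "2": "second post"})
before = dict(store.entries)

store.delete("1")
after_once = dict(store.entries)
store.delete("1")
after_twice = dict(store.entries)
```

    Here `after_once` equals `after_twice` (idempotent), but neither equals `before` (not safe), which is why a crawler following a delete link does damage on its very first visit.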

  • Alexa ratings (Score:3, Interesting)

    by evildogeye (106313) on Monday July 23, 2007 @01:02PM (#19958111) Homepage
    I have gotten numerous sites into the top 75k of Alexa ratings by simply installing the toolbar on a couple of machines and regularly browsing through the entire site. On the other hand, I have sites that receive 3000 unique hits a day ranked around 300,000 on Alexa. That being said, I still use Alexa all the time to figure out which sites are well trafficked, and I imagine it is far more accurate than the author is giving it credit for. If you eliminate obvious exceptions (sites that cater to SEO folk and sites that cater to certain audiences such as Linux users) I think you will find that Alexa makes for a useful although not 100% accurate tool.
  • Re:Rant as news (Score:3, Interesting)

    by IceCreamGuy (904648) on Monday July 23, 2007 @01:31PM (#19958515) Homepage
    I dislike pointless rants as much as the next person, but I feel like you can at least give a little credence to posts like this. I'd imagine it's extremely frustrating dealing with this type of thing; the general reader doesn't really have any idea that this is a problem (I've been reading /. on an hourly basis for the past 4 years, but it's always possible that I just never paid enough attention to hear about this), but apparently it's something that he has to deal with on a regular basis. If he's making a post on the main page, it's obviously something that he feels is a serious issue and he's looking to the community for support and feedback. Rag on Slashdot all you want, but if you're posting on this then you obviously read it a good deal and hopefully get useful/interesting information from it. Why can't a founder of such an excellent (in my opinion) site complain and ask for feedback on an issue that's obviously important and causes serious problems in a way that most likely the users and the admins never anticipated or know how to deal with? Maybe if we all were running 640x480 or browsing with Lynx we could legitimately complain about the post taking up important news space, but in the majority we're not, and in addition Slashdot caters to such a wide variety of readers that you're never going to be interested in every single news post on the front page, even with the customizations. (if you are then I want your job).
    On that note, I don't actually have anything to say about the topic at hand, but then again, neither did the parent.
  • by Mandrake (3939) <mandrake@mandrake.net> on Monday July 23, 2007 @01:45PM (#19958715) Homepage Journal
    I might take this up with the next generation of a system we're working on here potentially, if you guys don't have a problem - we have a workaround system going live shortly that does a certain amount of logging via syslog to dedicated syslog hosts (god bless syslog-ng) but we don't look at every pageview in order to lessen the load, we look at and log specific events (ones ripe for abuse - payments, signups, email, etc).

    -Mandrake
  • by Anonymous Coward on Monday July 23, 2007 @01:54PM (#19958833)
    I contend that the ratio (site visits)/(google searches) is far closer to constant than the ratio (site visits)/(alexa stats).
  • by LWATCDR (28044) on Monday July 23, 2007 @02:14PM (#19959107) Homepage Journal
    The church of RMS probably does have something to do with it. Really is a shame because a lot of people on Slashdot buy a lot of software and hardware. I think part of Slashdot's problem comes from using Doubleclick to serve ads. What Slashdot user doesn't have *.doubleclick.net* in their ad blocker?
  • by metlin (258108) on Monday July 23, 2007 @02:25PM (#19959267) Journal
    Oh sure, and YouTube is beating Digg, but that doesn't mean that we'll all move over to YouTube.

    No, like another poster said, it is quality over quantity.

    If you think some of the arguments on Slashdot are asinine, wait until you read the ridiculous ones on Digg. And give everyone the power to moderate and you have people burying others' comments because they disagree with them.

    Add bad grammar, spellings and l33t speak and you have a ridiculous combination of utter rubbish that only a bunch of emo sixteen year-olds can spew forth. Give me Slashdot any day.

    At least some of you trolls have character. ;-)
  • Re:Alexa's Spiders (Score:1, Interesting)

    by Anonymous Coward on Monday July 23, 2007 @02:44PM (#19959595)
    Actually, there's technically another HTTP method (DELETE) that's specifically for the deletion. However that's a big nitpick since DELETE isn't even implemented by many HTTP servers.

    Still, many apps do use GET requests to delete things when the desired UI is to have a delete link (i.e. text rather than a button). This is somewhat preferable to having a hidden form that gets submitted (or triggering an XMLHttpRequest) since it doesn't require the user to be browsing with JavaScript enabled. This practice is becoming less acceptable since you can pretty much style an HTML button to look like a link in almost any browser.
  • by jadm (720143) on Monday July 23, 2007 @03:48PM (#19960479)
    Sparky. It's called Sparky. http://www.alexa.com/site/download [alexa.com]
  • by ger (3028) on Monday July 23, 2007 @09:19PM (#19964233) Homepage

    At W3C [w3.org] we log almost everything as well, and we end up with way too much data as a result.

    But we use the logs to detect and prevent certain classes of abuse as well (e.g. too many requests in a short time interval [w3.org] or re-requesting the same resources over and over [w3.org]), and we also want to be able to track trends over time, so we have been reluctant to just throw that data away.

    I have a plan that I have yet to implement, which is to log only 0.001% of the requests for certain very popular resources (e.g. HTML DTDs and valid-HTML icons), which would allow us to monitor trends without logging tens of gigs of data per day; we'd just need to compensate for it when calculating stats later.
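
    A rough sketch of that sampling plan: log a request only with a small probability, then divide the logged count by the same probability to estimate the true total. The function names are made up for illustration, and the demo uses a much higher rate than 0.001% so that a short simulation has something to count.

```python
import random

SAMPLE_RATE = 0.00001  # 0.001% expressed as a probability

def should_log(rng, rate=SAMPLE_RATE):
    """Decide whether to write this request to the log."""
    return rng.random() < rate

def estimate_total(logged_count, rate=SAMPLE_RATE):
    """Compensate for sampling: scale the logged count back up."""
    return round(logged_count / rate)

# Demo with a 1% rate over 100,000 simulated requests.
rng = random.Random(0)
demo_rate = 0.01
logged = sum(should_log(rng, demo_rate) for _ in range(100_000))
estimate = estimate_total(logged, demo_rate)
```

    The estimate is noisy for rarely requested resources, so this only makes sense for the very popular ones the plan singles out.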

    Then I planned to monitor for abuse by also logging every request to a script that watches for abusive traffic patterns, an easy adaptation from the current script that wakes up and skims the logs every 10 mins.

    (in your journal entry, when you say you are MD5ing IP addresses for privacy reasons, are you adding a random bit of data to the IP address before calculating the MD5? If not it's pretty easy to find out which IP address corresponds to a given MD5 sum.)
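
    To illustrate why a bare MD5 of an IP address hides little: the IPv4 space is only about 4.3 billion addresses, so anyone can hash candidates until one matches. The sketch below searches a single /24 to keep it fast. The keyed variant uses HMAC-SHA256 with a secret, which is my substitution for the "random bit of data" idea rather than anything described in the journal entry, and the secret values are placeholders.

```python
import hashlib
import hmac
import ipaddress

def md5_ip(ip):
    """Unsalted MD5 of a dotted-quad string."""
    return hashlib.md5(ip.encode("ascii")).hexdigest()

def recover_ip(digest, network):
    """Reverse an unsalted digest by hashing every address in a network."""
    for addr in ipaddress.ip_network(network):
        if md5_ip(str(addr)) == digest:
            return str(addr)
    return None

def keyed_ip(ip, secret):
    """Keyed hash: enumeration fails unless the secret is also known."""
    return hmac.new(secret, ip.encode("ascii"), hashlib.sha256).hexdigest()

leaked = md5_ip("192.0.2.77")
recovered = recover_ip(leaked, "192.0.2.0/24")

# Placeholder secrets; a real one would be random and kept out of the logs.
token_a = keyed_ip("192.0.2.77", b"secret-one")
token_b = keyed_ip("192.0.2.77", b"secret-two")
```

    The added data only helps if it stays secret. A salt stored next to the hashes lets an attacker repeat the same enumeration.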
