
On Counting Website Traffic
Logic Bomb writes: "The San Francisco Chronicle has an interesting article about measuring website traffic. This is kind of an obnoxious issue, but it means everything to commercial websites seeking investors. Apparently the figures reported by the sites themselves through analysis of server logs are often much higher than the ones given by firms like Media Metrix (whose numbers I see all the time in articles from Cnet and the like). The basic dispute is over whether sampling, a la Nielsen, is appropriate for the web. It seems counterproductive to purposely use an inaccurate statistical measure when exact counts are readily available, but I can't imagine many things easier to fake than a server log. Anyone have a good idea about how to approach this?"
webmetrics (Score:2)
Revenue Not Hits (Score:1)
Black boxes ? (Score:1)
Of course, some sanity could be put into the data if the bandwidth usage from the site's upstream provider were taken into account (5 million hits and you transferred only 2 megs? Hmmmm). It's a pity that this data is usually not publicized or available to third parties, for obvious reasons (so you pay $5/gig upstream and you charge me $10/gig if I go over my quota, eh?).
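The sanity check described above can be sketched in a few lines of Python. This is only an illustration: the function names are made up, and the 200-byte floor is an assumed figure (roughly the size of a bare HTTP response header), not a measured one.

```python
def average_bytes_per_hit(bytes_transferred, hits):
    """Return the mean number of bytes sent per claimed hit."""
    if hits <= 0:
        raise ValueError("hits must be positive")
    return bytes_transferred / hits


def looks_plausible(bytes_transferred, hits, min_bytes_per_hit=200):
    """Return False when the average response is smaller than the assumed floor."""
    return average_bytes_per_hit(bytes_transferred, hits) >= min_bytes_per_hit


# The example from the comment: 5 million hits, only 2 megs transferred.
claimed_hits = 5_000_000
upstream_bytes = 2 * 1024 * 1024

print(average_bytes_per_hit(upstream_bytes, claimed_hits))  # under half a byte per hit
print(looks_plausible(upstream_bytes, claimed_hits))        # False
```

The check only works if the bandwidth figure comes from someone other than the site owner, which is exactly the data the comment says is rarely available.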
I am sure, though, that through the wonders of reverse engineering, IP spoofing etc. it could be possible to foil even black boxes. I mean, how much would it take to just take a machine in the office, connect it to the black box and send HTTP requests crafted so that they appear to be from other IPs? If the site owner has physical access to the site's hardware this would be really easy to pull off.
It would be more complicated if one had just a colocated box or a virtual host, but even then with some 3l33t h4x0r skills one could make the black box's head spin in whatever direction one wants...
The web, though, is very interesting in this respect, because unlike in the TV case (where you have to count the viewers at the viewers' location) it is theoretically feasible to do a precise, repeatable and cost-effective analysis by monitoring only the point of origin. That means the money that would go into finding an acceptable demographic group, providing them with set-top boxes that analyze their habits, etc. could instead be invested in creating one single monitoring device installed at the site's location.
On a related note, I have a digital cable set-top box, and I am sure that behind my back the cable company is collecting my viewing habits (I mean, how hard could that be? The digital set-top box already connects to their network to download programming information and it has a unique ID). Does anybody know more about this? My cable company is heavily pushing the digital set-tops even towards people interested only in basic cable, and I don't think they are doing it for charity...
Re:Why bother? (Score:1)
Re:even radio... (Score:2)
The police radios, however, are always on.
Re:Web server statistics are NOT for marketing! (Score:2)
It's precisely the same problem with democracy (Score:2)
Advertisers don't just want to know what the most visible piece of real estate is in the world so they can erect a billboard on it. They want to know what the next upcoming innovation is so they can be the first to ride the upsurging wave of popularity. It doesn't help that altavista [altavista.com] is the most popular search engine in the world today if placing a big banner ad on google [google.com] tomorrow will catch the as-yet unseen mobs.
Take Netcraft [netcraft.com] and server operating systems. You don't just want to know what people are actually running. You want to know what they dare to tell you they're running. This is why it's ok for Netcraft to base its statistics on what servers tell each other they're running, rather than on some complicated fingerprint of their tcp/ip stacks.
It comes down to this: Adam Smith had it wrong with his theory of the invisible hand of market forces. It's not just what the markets do that's interesting; for that tells you nothing more than what, empirically, they do. If you pretend otherwise, then you're behaving no differently from all the Linux bandwagoners or Microsoft bandwagoners who base their decisions only on the herd. Herd mentalities are antithetical to proper advertising, and advertisers are finally waking up to this fact.
Cheers,
Froid
Richard Fromage (Score:1)
Coupled with the common short version of "Richard" that is pretty funny.
My personal favorite fake name which is on my Fake ID (I'm over 21, I have it just in case) is Justin Case
---CONFLICT!!---
Re:Perhaps a less annoying alexa? (Score:1)
One example is Raydium [raydium.com]
I'm sure there are others as well.
-Rusty
measuring web traffic (Score:1)
Re:Fraud (Score:1)
No? Really? [shakes his head in wonder] Those are strange times we live in...
The punishment is that you have to give all your money to a lawyer.
I was under the impression that having to give all your money to a lawyer was the punishment for needing (or thinking you need) a lawyer.
Kaa
Re:Why bother? (Score:2)
steven
Re:Dealing With It Now (Score:1)
A use for DoubleClick? (Score:1)
'Course, DoubleClick can be fooled by having cookies disabled, a JunkBuster proxy [junkbusters.com], or whatever, but I'd imagine at this point only a tiny percentage of users are sufficiently clued to use JunkBuster or Cookie Pal [kburra.com]. Certainly too few to make the count less accurate than sampling.
Re:Why bother? (Score:1)
Unlike a television ad, in which an advertiser pays a large amount for one ad, banner ads are charged per impression. So whether a banner is served up 4 million times or 2 million times, the advertiser is charged the same per impression (well, assuming it's on the same site, and assuming no volume discount for the additional 2 million impressions, but you get my point).
Fraud (Score:3)
Besides, banner ads are typically served from a server NOT controlled by the company which owns the page. So people like DoubleClick know for sure how many times their ad was ignor^H^H^H^H^Hseen.
Kaa
Honesty (Score:2)
If a company truly wanted to, they could easily obtain numerous IPs to forge the logs ahead. And think about a script kiddie exploiting java, perl, or whatever-- that would certainly make a website's statistics look better. The list of ways to increase a website's usage goes on.
I think the only way to get this done fairly is to post a raw log, and let the investors (or whoever the target is) decide for themselves. Apache logfiles are fairly straightforward, and require little to no effort in deciding what is an actual hit and what is not. Of course, this would require honesty on the part of the company, which seems to be the real issue.
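For readers who have not looked at a raw log, here is a minimal sketch of reading entries in Apache's Common Log Format. The sample lines are invented for illustration, and the regular expression assumes the plain Common Log Format rather than any customized LogFormat.

```python
import re
from collections import Counter

# Common Log Format: host ident authuser [date] "request" status bytes
CLF_PATTERN = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+)'
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = CLF_PATTERN.match(line)
    return match.groupdict() if match else None


# Made-up sample lines, for illustration only.
sample_log = [
    '192.0.2.10 - - [25/Sep/2000:10:00:01 -0700] "GET /index.html HTTP/1.0" 200 5120',
    '192.0.2.10 - - [25/Sep/2000:10:00:02 -0700] "GET /logo.gif HTTP/1.0" 200 2048',
    '198.51.100.7 - - [25/Sep/2000:10:00:05 -0700] "GET /missing.html HTTP/1.0" 404 310',
    'this line is not a log entry',
]

entries = [e for e in map(parse_line, sample_log) if e is not None]
status_counts = Counter(e["status"] for e in entries)
unique_hosts = {e["host"] for e in entries}

print(len(entries), dict(status_counts), len(unique_hosts))
```

Parsing is the easy part; nothing in the format itself proves that the lines were written by real visitors.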
Perhaps a less annoying alexa? (Score:1)
Who Cares?? - Advertisers, investors (Score:1)
Faking stats (Score:3)
Are you kidding? When I worked at my last internship the boss would take the server stats from WebTrends, plop it in a Word file (to look good for investors) and then sometimes "moderately improve" some of the stats before printing the document.
Fact is, most investors don't get a verbatim server log with all the technical "mumbo-jumbo". They get a simplified version with only the information the CEO wants them to hear.
Re:Why bother? (Score:1)
Yes, but you can only look at your own logs to see that Yahoo sends you 10,000 people a day after paying them substantial money for what they said should average 20,000 people a day, based on their (or Media Metrix's) logs.
Re:Questions on making your own stats (Score:1)
We've hacked out a packet sniffer that runs on the network picking up good data (ie, where is the hit coming from) and then between that and our dns load balancing software we get a really good monitoring package to find out exactly how busy our servers are. And BTW, our servers serve approximately 2000-6000 hits a second. yes, a second. Yeah.. it is a lot. And it all runs apache.
[phpwebhosting.com]
nerdfarm.org
Who do you trust? (Score:1)
It's a matter of trust
Sponsors and advertisers simply need to rely on a consistent, reliable, and _reputable_ company to provide the numbers by which they purchase advertising. It is the same reason that makes large corporations with huge accounting departments hire Ernst & Young to run their 10-Ks before posting them to the SEC: reputation and consistency.
Everyone, including the site managers, advertisers, and Metrix, knows that the numbers can be faked anywhere along the chain. The most honest option, and least culpable in terms of liability, is to have a measuring company run its analysis for all, even if that analysis is statistical prediction. If a site manager fakes numbers, he's a little untrustworthy; if Metrix screws up, they're outta business.
Re:Most Downloaded Woman (Score:2)
-- iCEBaLM
Re:three types (Score:2)
Re:Why bother? (Score:1)
But then a lot of people just got around that, and just got to freeload.
Routers, hosting providers, and stats (Score:1)
These are the main routers for the hosting providers hosting the website in question. At some level these machines "know" not just how much traffic is requesting the given website, but how many different IP addresses are requesting, how many of the requests are short/quick Cache refreshes vs. long "real" sessions etc.
Capturing this data would require some sort of sniffing, which could have a performance hit and does raise security implications but could be overcome. Additionally the main routers for most hosting companies are outside of the control of the client companies so this data could be seen as more trustworthy than the server logs from a machine to which the client company has root.
Anyway, just my thoughts. A fair amount of work would need to take place to turn this into a new profitable line of business for the hosting companies, but given the market need for accurate data I think it would be worth pursuing.
Shannon
Meaningful Web Stats (Score:2)
Now if a company is interested in gathering web statistics in order to steer corporate decision making, then they should really look at collaborative filtering as a means to do this. No matter what else you have to say about Amazon.com, their implementation of the Net Perceptions collaborative filtering engine is incredibly accurate at analyzing and predicting their customers' needs/desires.
Fun with fake names/addresses (Score:1)
Makes me wonder how much junk mail the Chicago Cubs get and routinely dispose of...
---
Re:Why bother? (Score:4)
And this is the really sad part. The information age has created a new type of cyber-criminal: the false information broker. Society is moving away from products and building multi-purpose machines. As a whole we're more service oriented than we used to be. This means all our assets and business transactions are on paper. Nothing tangible is being exchanged. And typically we have such a high volume of data being transferred that it can't be checked for 100% accuracy. I signed up for one of those "saver" cards at a local grocery store (part of a national chain) and totally faked the information on the signup sheet (I get enough spam as it is, thank you very much). No one caught it, even though an application with an address of 1600 Penn Ave in Ft. Worth, Utah with a completely made up Zip code and a Texas DL number showing up at a store in Tennessee _should_ have raised an eyebrow or two.
So now we have the buyers and the sellers. A buyer can't always trust a seller and a seller can't always trust a buyer. Enter the middleman who keeps both parties honest. Am I the only one saddened by the necessity of a service like this?
Steven
Re:Lies, damned lies, and proxies (Score:1)
This is just the warm up... (Score:1)
Where it starts to matter is measuring audience for streaming media. And neither server logs nor sampling will give an accurate vision of what is actually happening. This is where real audience management comes in, from the likes of companies such as Reliacast [reliacast.com]: the ability to get exact counts of the number of participants in a streaming event, regardless of whether it is a unicast or multicast event.
all persons, living and dead, are purely coincidental. - Kurt Vonnegut
Hits vs Page Views (Score:1)
I've seen some stats pages, and there are usually over ten times as many hits as there are page views.
I'm assuming that the average visitor views more than one page. So when they say "We get X million hits a month" they're only getting roughly X hundred thousand visitors or so? Or are they using "hits" as a term for visitors?
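The gap between hits and page views is easy to see with a small sketch. It assumes that requests for a short list of file extensions are embedded assets rather than pages; the extension list and the sample paths are illustrative, not taken from any real stats package.

```python
import os

# Assumption: requests for these extensions are embedded assets, not pages.
ASSET_EXTENSIONS = {".gif", ".jpg", ".jpeg", ".png", ".css", ".js", ".ico"}


def is_page_view(path):
    """Treat a request as a page view unless it points at an embedded asset."""
    path_only = path.split("?", 1)[0]
    extension = os.path.splitext(path_only)[1].lower()
    return extension not in ASSET_EXTENSIONS


def count_hits_and_page_views(paths):
    """Return (hits, page_views) for a list of requested paths."""
    hits = len(paths)
    page_views = sum(1 for p in paths if is_page_view(p))
    return hits, page_views


# One hypothetical page that pulls in ten images: 11 hits, 1 page view.
sample_paths = ["/index.html"] + [f"/img/button{i}.gif" for i in range(10)]
print(count_hits_and_page_views(sample_paths))  # (11, 1)
```

A page with ten small images produces the roughly ten-to-one ratio mentioned above, and visitors are a third, smaller number again.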
Do these people even know their own real stats?
Re:even radio... (Score:2)
Wonder what kind of range it has?
Never give out your logs! (Score:2)
Re:Why bother? (Score:2)
You let them have your DL number? Seems kind of pointless to lie about the rest of the stuff when your DL number is on there.
I guess I am assuming that you didn't lie on your drivers license (about more than your height and weight)
pardon me.. (Score:1)
peas, -Kabloona
Re:More importantly, demographics (Score:3)
mod_log_spread to an auditing host? (Score:1)
Re:Questions on making your own stats (Score:1)
Wussgage? Is that some kind of measurement of the tendency to give up in a fight, or complain or something? Exactly what kind of scale does one use to gauge the amount of wuss in a person? Is this different than measuring the amount of wussy in a person?
Or is this the thing I keep hearing in that Budweiser commercial: "Wusaaaaage?", "yeah, wusage.", "Wusssaaaaaggeee!".
hmmm.
URL fixed: mod_log_spread to an auditing host? (Score:2)
Overlooked? (Score:2)
The Divine Creatrix in a Mortal Shell that stays Crunchy in Milk
Re:Carnivore (Score:1)
I know its off topic..but huh? (Score:2)
I'm sorry... I just don't buy it... a device that could detect what radio station you are listening to? Nope. Don't buy it.
---
Get the ISP to give up stats (Score:1)
Get bandwidth statistics from their ISPs if you can.
The ISP that I work for generates stats on almost every interface on our network, save a few odd pieces of hardware that do not support it or are not worth supporting it. You cannot count the hits, but you can count the proverbial p0rn that they are pushing... or pulling.
Most Downloaded Woman (Score:3)
When I eventually went to her site (I can't even remember her name, for god's sake) she had almost no pictures of herself on it, though there were lots of other girls. I tried in vain looking for some of her, and I was thinking to myself that the numbers were severely inflated.
While this might be an "obnoxious" question, I think a standard way of evaluating just how many hits and downloads a site gets needs to be determined, especially for awards like the Guinness Book.
-- iCEBaLM
Re:even radio... (Score:1)
---
I'm surprised larger companies are so un-educated (Score:1)
Simple solution (Score:1)
Re:Honesty (Score:1)
He got caught. He deserved to be caught. I thought it was damned hilarious!
Basically, he had just enough knowledge to be dangerous, but not enough to make his hits look like real, unique hits. The best thing about it is the guy has a massive ego and it took a huge deflationary hit ;-)
Re:Why bother? (Score:1)
Walt
Re:Most Downloaded Woman (Score:1)
of course, that's for the total amount of time she's had the site. i have no idea how long that is, but i seriously doubt it existed before '96.
most people on earth dont have internet access or dont care about the internet, but that still leaves a lot of people who do. on a quick look, i couldnt find any numbers to supply. considering she's backed by playboy, i would expect her audience to be worldwide.
(not to mention it's porn....it probably comes up in every search done on the net.)
Darth -- Nil Mortifi, Sine Lucre
Re:Lies, damned lies, and proxies (Score:2)
[technos begins scrawling in the checkbook.. Pay to the order of: Courtney Love, Date: September 25, 2000, Amount: $3,000 and no cents]
Re:three types (Score:1)
I work for a company that recently made its way into the Media Metrix top 20, and I know that we built our name by focusing on popular, yet niche, content. Some of it didn't "rock," but that's all subjective, and we invested in the numbers.
Once we have the numbers, we can develop the investment and provide quality assurance in creative and informational aspects of the network. In fact, that happens to be what more than a few people do around here.
Re:Hits vs Page Views (Score:1)
Those that are pointing out that "sessions" are a more accurate measure overlook the fact that banner ads are typically delivered per page pull - and users who view 10 pages may see 10 completely different banner ads in that time.
These numbers are important for both advertisers and web operators alike, because many banner ads pay on a cost per thousand (CPM) basis. Advertisers need to see how many people are actually seeing their ads, and operators need accurate numbers to be sure they are paid fairly.
Any web operator you meet can tell you that the page impressions counted by the banner ad company *never* match the page impressions reported by the logs, and are often on the order of 1/3 the number WebTrends or Webalizer reports. Add that to the idiotic way companies like Media Metrix are "projecting" traffic based on relatively small sample sets, and the result is that website operators always manage to get screwed in the deal.
Re:Why bother? (Score:1)
Nope, that sort of criminal has been around for quite awhile. The classic example is the "blue book" purporting to give average market values of used cars. In fact, the blue book is put out by the used car industry with higher-than-market prices, solely for the purpose of allowing used car dealers to advertise that their prices are "below blue book", and/or convincing consumers to agree to artificially high prices for used vehicles. (There are other used car price guides which are more accurate in their values.)
Re:Faking stats (Score:2)
Good point...and it's not like web logs are the only thing that gets treated like this.
---
Re:I know its off topic..but huh? (Score:2)
I would guess the device works by picking up the audio coming from your car, then comparing it to the output of known radio stations in the area.
Just gotta mic each parking space.
Re:Simple solution (Score:1)
Nathan "They're watching me, I swear" Cento....
(Yeah yeah yeah... I should learn to use PREVIEW)
Re:Lies, damned lies, and proxies (Score:5)
Okay, I did it. Unfortunately, I was reading your post at the same moment my boss was entering the cube, and I've been fired. Under the terms of the 'technos' AUP (As amended September 12, 2000), and UCITA, you are hereby notified that you owe me $28,941,285.42.
Referencing clause two of the AUP, this number reflects the sum of my maximum earnings potential until retirement age, as well as the cost of obtaining said employment (six years of college at a major University), as well as an additional 34% transgressive penalty and a 9% compounded cost-of-living increase.
You have ten business days to remit the sum, in whole, or I will be forced to submit a class B lien request against both your holdings and those of your employer in the State of Maryland.
Clause six clearly states you indemnify me against any legal malfeasance or action, so don't even try to get cutesy with a countersuit. It has a binding compensation clause of $2,000,000.
Web server statistics are NOT for marketing! (Score:5)
Re:I know its off topic..but huh? (Score:1)
Re:Fraud (Score:1)
the trick would be in writing a script to do it that would be subtle enough to not be caught by someone analyzing the logs and yet get enough clickthrus to make it worth the profit.
Darth -- Nil Mortifi, Sine Lucre
i use webalizer (Score:1)
Re:Questions on making your own stats (Score:2)
They're not interested in "Hits" (Score:1)
We get asked for detailed reports on "impressions" (an old print advertising concept) and "page views" and "unique visits" and "return visits", and "length of visit", and "pages per visit".
Who the hell gets paid for hits?
Re:webmetrics (Score:1)
You mean they prefer post-its? I was wondering why they always had the shotgun ready when I returned....
Re:Why bother? (Score:2)
--
From the advertising point of view... (Score:3)
I come from a good (read: more than five years
In reality, our clients still really don't understand why these numbers are so different and then question our recommendations based on what they read. It challenges our reputation and affects the trust the clients typically feel in our creative or media teams. Broadcast and print, as well as the other "offline" mediums, really then have one big advantage: those mediums have been in use long enough that our clients no longer ask the questions of "how can we justify those reach numbers" or "sure I see what you're saying, but my other consultant says that you're only reaching half that audience with that commercial."
So, maybe the challenge really lies with each of these "measurement" firms not admitting that they could be wrong. Maybe it's that the sites that are polled are financially incented to inflate their numbers to justify acquisition or second-round financing. Maybe it's that the technology exists to perfectly track a user's path anywhere, anytime but one of the first "features" in the browser was anonymity. Maybe it's the convergence of all of these different pieces at the same time (which is most likely the case).
Sad. The interactive space has such opportunity to get around lofty advertising and blink-tag style direct marketing. But unless we can justify the funds, apportioned largely based on reach to the market, we won't end up with the type of experience marketing that actually adds value to those of us online.
If you're worried about "inflated" stats... (Score:2)
Re:Dealing With It Now (Score:1)
I was recently speaking with a company who advertised on our site through a network. They were spending $25 CPM for the ads, and we saw about $1 CPM by the time it made it our way (due to advertising agency and network costs, which seem to be much larger than stated).
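To put the $25 versus $1 CPM figures in perspective, here is a small worked example. The one million impressions is an assumed round number chosen for illustration; only the two CPM rates come from the comment.

```python
def cpm_cost(impressions, cpm_dollars):
    """Cost of a campaign billed per thousand impressions (CPM)."""
    return impressions / 1000 * cpm_dollars


impressions = 1_000_000  # assumed volume, for illustration only

advertiser_spend = cpm_cost(impressions, 25)  # what the advertiser pays
site_revenue = cpm_cost(impressions, 1)       # what reaches the content site
site_share = site_revenue / advertiser_spend

print(advertiser_spend, site_revenue, f"{site_share:.0%}")
```

Whatever the volume, the ratio stays the same: the site sees about four cents of every advertising dollar, and the rest goes to the agency and the network.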
My suggestion to content sites: learn how to sell your own ads. Even if you sell just a handful of your impressions, you'll probably make more than any network could bring you. Keep the sales in house.
Prevent log file forgery and gain credibility... (Score:2)
Rob.
A little late, a little old... (Score:1)
See what happens when your morning is full of meetings? Sheesh...
Anyway, although now it's looking old and stale, I still consider the following paper of mine, which was published a few years ago, to be relevant to this topic (IOW, things haven't changed enough since then to make it irrelevant):
Examining the Validity of World-Wide Web Usage Statistics [drizzle.com]
Enjoy...
Re:Dealing With It Now (Score:2)
sig:
Server logs and their usefulness. (Score:2)
third party logging? (Score:2)
The hosting guys can't access the box because it is literally black boxed (locked up, no physical access, and no knowledge of the logins/passwords).
The third party logger can remotely access his box, download logs or whatever and provide that info to the advertiser. The advertiser can then check the logs of the hoster and compare them to those of the third party (aka verifier). If the verifier's logs match the hoster's you know the data is somewhat accurate (at least as accurate as these things can be).
I mean, Nielsen does this with those boxes they give to their test families, why can't some enterprising third-party verification company (hmmmmm?) do the same with web-hosts?
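The comparison step could look something like the sketch below. It assumes both parties can produce a simple count of requests per day; the 5% tolerance, the function name, and the sample figures are all made up for illustration.

```python
def mismatched_days(hoster_counts, verifier_counts, tolerance=0.05):
    """Return the days on which the two logs disagree.

    A day is flagged when one side has no entry for it, or when the counts
    differ by more than `tolerance` relative to the verifier's count.
    """
    flagged = []
    for day in sorted(set(hoster_counts) | set(verifier_counts)):
        hoster = hoster_counts.get(day)
        verifier = verifier_counts.get(day)
        if hoster is None or verifier is None:
            flagged.append(day)
            continue
        baseline = max(verifier, 1)  # avoid dividing by zero on empty days
        if abs(hoster - verifier) / baseline > tolerance:
            flagged.append(day)
    return flagged


# Hypothetical daily request counts from each party.
hoster = {"2000-09-23": 10_200, "2000-09-24": 15_000, "2000-09-25": 9_900}
verifier = {"2000-09-23": 10_000, "2000-09-24": 10_100, "2000-09-25": 10_000}

print(mismatched_days(hoster, verifier))  # ['2000-09-24']
```

Small differences are expected (dropped packets, clock skew at midnight), which is why the sketch uses a tolerance instead of demanding an exact match.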
This looks like a nice little niche market for exploitation and mucho money to be made off of. I mean you write a few scripts to keep control over your logs and to send the logs back to a central server that formats this stuff into nice pretty print outs for the suits to drool over at their next board meeting.
Just a thought...
Web site usage has to be done client side (Score:2)
The only effective measurement of web traffic is by having volunteers use a special proxy that reports which sites the user visits back to a server, and to generate the numbers from there. This is exactly how the Nielsen boxes do it for television, which unfortunately means the same problems will crop up (Nielsen families tend to be favored around the east/west coasts, thus making shows that appeal to midwest or plains state viewers less popular by appearance). Additionally, getting volunteers might be a problem, as you'll most likely create a biased set by whom you select. And probably most importantly, privacy issues are more apparent for net ratings.
Problems with measuring traffic. (Score:3)
In the end, the only way to gauge how many people have read your site is to place unique or unusual information on it, and then find out who knows it.
Dealing With It Now (Score:3)
BS. So I applied to Engage [engage.com] (formerly Flycast) last night to get our ads through them. Are they any better? I have no idea. But I do know that ContentZone is screwing us over, and that's incentive enough for me.
-Waldo
Re:Lies, damned lies, and proxies (Score:2)
three types (Score:3)
I think that if you're investing in a web company, you should IGNORE the statistics. Go to the site. If it's lame, don't give them your money. If it rocks, go for it. How hard could that be?
Re:i use webalizer (Score:2)
For some bizarre reason, there were 15 counts of a referrer from osdn regarding the Slashdot cruiser. I checked the Slashdot cruiser web page only to find nothing linked to my site. Strange (but then again a page doesn't have to be linked. It could be 15 people were at that page first, and then went to something on my site).
Other than that, I found the usual google returns, and plenty from articles I commented on from here.
And when i hosted a real wacky e-zine(the boulder news frenzy) which had tons of vulgar language, every pervert with keyword searches of "toilet sex", "rape", etc went to the zines on my pages, only to be disappointed to find an ASCII rag.
So take a look at those referrers. You'll be amazed what you find. Often you'll see someone on a webboard post a link to one of your pages with a positive/negative comment.
Why bother? (Score:4)
In fact, by looking at your own logs, you can say, "Well, Yahoo sends 10,000 people a day to my site, but only 10 of those people buy anything.. Meanwhile, Slashdot sends 1,000 people, but 500 of them end up buying stuff."
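The comparison in that example comes down to a conversion rate per referrer. A minimal sketch, using the visitor and buyer numbers from the comment (the function name is made up):

```python
def conversion_rate(visitors, buyers):
    """Fraction of referred visitors who went on to buy something."""
    if visitors <= 0:
        raise ValueError("visitors must be positive")
    return buyers / visitors


# (visitors per day, buyers per day), as in the example above.
referrers = {"Yahoo": (10_000, 10), "Slashdot": (1_000, 500)}

rates = {name: conversion_rate(v, b) for name, (v, b) in referrers.items()}
for name, rate in rates.items():
    print(f"{name}: {rate:.1%}")
```

On these numbers the smaller referrer converts five hundred times better, which is something only the site's own logs and sales records can show.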
So why are such ratings needed?
--
Carnivore (Score:4)
Carnivore is the answer. Let the feds provide accurate and unbiased information!
Re:I know its off topic..but huh? (Score:2)
---
Re:Why bother? (Score:2)
In the example you provided of 10K clicks when 20K were expected, this can be chalked up to crappy banners by your graphic artists. However, there are brokers out there who buy ad space from many websites and sell it at a reduced rate to companies. Some of these brokers buy space from the companies that pay their users to click on banners, in which case you'll get a high clickthrough, but nobody who clicks through is interested in your website and they will click the back button immediately. If you are contacted by a broker, ask them what websites your banner will be on. If they do not mention any of the websites where people are compensated for banner clicks, and it turns out that the majority of the banners are going to these companies, contact the broker and tell them to pull your ad, then contact your credit card company and dispute the charge. This has happened to my company several times; the brokers have never attempted to get their money after being told the charge was disputed.
Careful what you measure (Score:2)
As for the ease of faking server logs, not a problem (inserting standard I-am-not-a-lawyer disclaimer here): if you're using them as proof of traffic to your advertisers, write that into the contract -- then faking the server log becomes fraud, with the appropriate legal remedies available. This is not my favorite solution (especially not with anything to do with the Internet), but displaying advertisements for money is a business relationship, and can be managed as such.
Re:Why bother? (Score:2)
Sure the Internet is providing a new avenue for many past practices, and the information-centric focus does create greater opportunity for "fudging", but this isn't anything that hasn't happened before.
Re:Serious question about this... (Score:2)
The premise behind this "marketshare is everything" is that, since the internet is a "new thing", the guy who takes over the most marketshare first is going to be the dominant player - people think this way because they saw what happened when Microsoft entered a new market, and got the most marketshare. They dominate. They're damn near owning the whole freakin world. If they had played it more laid back, and done more honest hard work up front, they probably would have avoided this whole DOJ mess, and ten years from now, *would* 0wn the whole world. But no, the execs got lazy and greedy, and when it became apparent early on that Microsoft was only interested in putting out "good enough" products and killing off competition (instead of allowing competition to exist, albeit in a weakened state), the threat was so obvious, they had to be stopped. Act like a bunch of gangsters, get treated like gangsters.
Anyway, the investment and business community is expecting SOMEONE to take over, and they want a piece of the action, of course, so that's why people are willing to risk a few investment bucks on who they perceive will be the Genghis Khan of the Internet.
That's the "new economy" in a nutshell. And frankly, AOL/TW is "it".
Re:Fraud (Score:2)
Exact Counts? (Score:2)
Questions on making your own stats (Score:4)
What I'm blocking out so far is:
our company's internal IP traffic
images
funky robots like Keynote-Perspective that the old webmaster had let loose on our sites.
This gives us some numbers I have confidence in (even though they're 10x less than the numbers the old guy was producing through Webtrends), but I'd like to find out what others are doing for making their own web stats.
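The three filters listed above could be sketched roughly like this. The internal network range, the image extensions, and most of the robot markers are placeholders; only Keynote-Perspective comes from the list itself.

```python
import ipaddress

# All three values below are placeholders; substitute your own.
INTERNAL_NETWORKS = [ipaddress.ip_network("10.0.0.0/8")]
IMAGE_EXTENSIONS = (".gif", ".jpg", ".jpeg", ".png")
ROBOT_MARKERS = ("keynote-perspective", "bot", "spider", "crawler")


def should_count(ip, path, user_agent):
    """Return True if a request survives the three filters listed above."""
    address = ipaddress.ip_address(ip)
    if any(address in network for network in INTERNAL_NETWORKS):
        return False  # our company's internal traffic
    if path.lower().split("?", 1)[0].endswith(IMAGE_EXTENSIONS):
        return False  # images
    agent = user_agent.lower()
    if any(marker in agent for marker in ROBOT_MARKERS):
        return False  # monitoring robots and crawlers
    return True


# Hypothetical (ip, path, user agent) triples.
sample_requests = [
    ("10.1.2.3", "/index.html", "Mozilla/4.0"),
    ("192.0.2.44", "/images/logo.gif", "Mozilla/4.0"),
    ("192.0.2.45", "/index.html", "Keynote-Perspective"),
    ("192.0.2.46", "/products.html", "Mozilla/4.0"),
]
counted = [r for r in sample_requests if should_count(*r)]
print(len(counted), "of", len(sample_requests), "requests counted")
```

Matching robots by user-agent substring is crude and will miss anything that lies about its identity, so treat the result as a lower bound on robot traffic.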
Thanks,
Steve
Serious question about this... (Score:2)
Seriously, who really cares if NewsTrolls is visited more than Slashdot (just an example). The important thing is that they're getting visitors and the owners are enjoying their job.
--
Lies, damned lies, and proxies (Score:4)
Excuse me while I go "Grumpy old man". This is an old, old problem. It goes back to the days when I first started using the web. See "Why web statistics are (worse than) meaningless [cranfield.ac.uk]." It's an old article. That's the point.
In short, spiders, proxies and caches make it impossible to be accurate in measuring traffic. But everyone else is affected the same way. So your relative stats are relevant-- they just aren't hit-for-hit accurate.
What your server logs are really for is resource planning. They'll help you find out how much traffic your server is serving, which should help you plan bandwidth and hardware upgrades as needed.
one thing i can count on... (Score:2)
the one thing I can count on is that my site doesn't (and won't) get any hits
Re:Questions on making your own stats (Score:2)
Fortunately, I was only interested in changes over time, so I concentrated on inventing a measurement that gave a fair comparison.
My work was done the old-fashioned way... Look at some server log, filter out the obvious (gifs, anything with a sessionID in it, internal or developer hits etc.), throw SPSS (statistical analysis tool) at it and start scratching your head...
I started my presentation by telling everyone to please ignore the absolute figures and focus on trends and variations.
What really bothered me was the thought of how good a site analysis tool I could have hacked together in those hours I spent decrypting archived data. The interesting part was to see how some people really care about being anonymous on the web. Makes a Slashdot addict glad to see stuff like referer="none of your business" and cookie: Note="like most people I prefer my browsing habits to be anonymous"
Re:Lies, damned lies, and proxies (Score:2)
I still hope to get some time to work on BBStats [capsi.cx] again, my webstats package - in terrible shape as it is - but I was hoping to solve this by using optional session cookies or the ability to import other cookies from the site.
For example, /. could log my IP to see whether I am unique, but they could also fetch my cookie (and still fall back on IP if it doesn't exist).
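The cookie-with-IP-fallback idea can be sketched as follows. This is not how BBStats works; it is a hypothetical illustration, and the function names and sample data are invented.

```python
def visitor_key(ip, cookie_id=None):
    """Identify a visitor by cookie when one is present, otherwise by IP."""
    if cookie_id:
        return ("cookie", cookie_id)
    return ("ip", ip)


def count_unique_visitors(hits):
    """hits is an iterable of (ip, cookie_id) pairs; cookie_id may be None."""
    return len({visitor_key(ip, cookie_id) for ip, cookie_id in hits})


# Hypothetical hits: two cookie-bearing users behind one proxy IP,
# plus one cookieless visitor who shows up twice.
sample_hits = [
    ("203.0.113.5", "user-a"),
    ("203.0.113.5", "user-b"),
    ("203.0.113.5", "user-a"),
    ("198.51.100.9", None),
    ("198.51.100.9", None),
]
print(count_unique_visitors(sample_hits))  # 3
```

Counting by IP alone would report two visitors here, because the proxy hides the two cookie-bearing users behind a single address.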
Reality and Comparable Statistics (Score:2)
First, suppose I am using a number of web sites to promote my online store. In this case, I may be most interested in the amount of sales each site produces from click-through users. For this purpose, I can simply assign a sale to a certain site. For the purposes of this discussion, I will assume that all sales can be assigned to a certain web site. At certain intervals, I can find the percent profit attributable to each site, and create a statistic with the ratio of the % profit from a site to the cost of advertising on that site. This statistic will create a valid comparison between sites.
Second, suppose I am most interested in branding, as Verizon is of late. In this case, I might want to pay an external agency to monitor the sites on which I advertise. Such an agency would presumably use a consistent and statistically sound method to determine the number of eyes that have seen my brand. I can then set up a statistic with the ratio of # of eyes to the cost of advertising for each site. Again, this will create a valid comparison.
It is notable that in either case the web logs for particular sites are not clearly useful. Even if the information itself was not suspect, web logs would not be comparable between sites. It would be difficult to set up a useful statistic to compare the value of each site with respect to my product. To put it another way, the web logs for a particular site are useful to that site for generating a number of site-specific statistics, but few if any of those are going to be of interest to me as a paying advertiser.
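As a rough sketch of the first statistic, the ratio of a site's share of profit to what was spent advertising there, consider the following. The site names and dollar amounts are invented, and it assumes (as the paragraph does) that every sale can be attributed to exactly one site.

```python
def profit_share_per_cost(profit_by_site, cost_by_site):
    """Return {site: (percent of total profit) / (advertising cost)}."""
    total_profit = sum(profit_by_site.values())
    if total_profit <= 0:
        raise ValueError("total profit must be positive")
    return {
        site: (100.0 * profit / total_profit) / cost_by_site[site]
        for site, profit in profit_by_site.items()
    }


# Hypothetical figures for one reporting interval, in dollars.
profit = {"site_a": 600.0, "site_b": 400.0}
cost = {"site_a": 100.0, "site_b": 50.0}

ratios = profit_share_per_cost(profit, cost)
print(ratios)
```

With these numbers site_a brings in more profit overall, but site_b scores higher per advertising dollar, which is the comparison the statistic is meant to expose.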
Hey a use for CARNIVORE! (Score:2)
-------------
Re:Why bother? (Score:3)
They're needed because they have to have numbers to show to their advertisers. An ad that's being viewed by 4 million people has significantly more value, and thus has a higher cost, than one that's only being viewed by 2 million people. If someone is willing to take the hosting site's word at face value with regard to eyeball real-estate, then I've got some banner ads (and a bridge) to sell them.