
WWW Surpasses One Billion Documents
Gary William Flake writes "A new study by Inktomi and NEC Research Institute shows that there are at least one billion unique indexable Web pages on the internet. The details are pretty interesting; for example, Apache dominates the server market."
mcdonald's effect (Score:1)
Apache: Millions and Millions Served
(For Free!)
Nice :) (Score:1)
In related news... (Score:2)
the best part (Score:4)
http://www.tax.taxadvice.taxation.irs.taxservices
taxpayerhelp.internalrevenueservice.audit.taxes.c
gee. A tax site with a long, unintelligible, confusing domain name. Go figure.
"You want to kiss the sky? Better learn how to kneel." - U2
Billions Served (Score:1)
A billion pages of information, and nothing's on.
Now, if real life exemplified the web, we'd know that 85% of the earth's population speaks English and, as can be expected, the IRS's domain name proves to be a lesson in redundancy and triplicate.
And at least one of them already comments on that (Score:4)
Meaningless Statistics (Score:3)
For all you know - the web has surpassed at least 1 webpage count. Big Fscking Deal!!!
Heh... (Score:2)
<DrEvil>One... billion pages</DrEvil>
Sorry - couldn't resist. :=]
________________________
Longest Domain Name (Score:1)
Of course it's about taxes. You've got to hand it to the IRS, even their URLs are hard to read and understand. I wasn't able to open this link, can anybody else?
Why? (Score:3)
Re:the best part (Score:1)
This is not and never was a real site, I doubt it very much. It was definitely just used for Search Engine Spamming, nothing else.
- TheOpus
technically inaccurate statistics (Score:3)
All I want... (Score:1)
Seems almost insignificant (Score:1)
-BlightX
Apache is the largest (Score:1)
Anybody following Netcraft's [netcraft.com] Web Server Survey [netcraft.com] already knew this. But it's still nice to get it confirmed from additional sources.
Re:the best part (Score:1)
Secondly: It isn't anything at the moment, it won't resolve. I can't even resolve audit.taxes.com.
Indexable webpages (Score:1)
Inktomi, publicity, and mod_perl (Score:3)
One thing of interest, though. If you look under the "Web server market share", Red Hat and mod_perl are apparently web servers now.
Re:Longest Domain Name (Score:1)
Taxes.com isn't in use (except by a domain hoarding company).
The IRS has nothing to do with it either.
Hmm...
I believe it... (Score:1)
Online gaming for motivated, sportsmanlike players: www.steelmaelstrom.org [www.steelm...gtargettop].
Re:Meaningless Statistics (Score:1)
How long does it take to count 1 000 000 000 links anyway?
Apache Dominates... (Score:1)
Just looking at the top three:
Apache 60.33%
Microsoft-IIS 25.26%
Netscape-Enterprise 3.79%
Wow - Apache still kicks everyone else's butts, and not by a small margin! I think Apache is about the perfect case for OSS development - not just being a blip on the radar getting larger, but, covering almost the entire radar screen!
I'd love to see more stats out of Inktomi on this, but it's still cool to see what little they did provide (261,472 links to MP3.com should say something about the digital music scene).
MS uses Inktomi uses Sun (Score:1)
Stuff like that makes me smile ;)
-Peace
Dave
Did they bump the count to extraghost.com? (Score:2)
1,000,000,000+ and what do we have? (Score:1)
Also note that while these pages exist, there is also a lot of random crap out there that really just wastes space and time. As the number of pages increases, I'm sure that it will be harder and harder to find quality documents among the wasteland of stuff we don't need.
"You ever have that feeling where you're not sure if you're dreaming or awake?"
suspicious (Score:1)
why did they list the number of links to rickymartin.com or cooking.com
why did they list the longest url as a nonworking URL that was probably used to spam the search engines?
oh great, uh hey guys, today I have determined there are 1 billion webpages!
Thaaat's great... (Score:4)
Finding information on the web is going to increasingly be like trying to find hay in a needle stack. Already the current indexing engines can't keep up, and you have unscrupulous web authors putting bunches of keywords unrelated to their site in their meta tags to ensure that they get mentioned in every single search. Some indexing engines already ignore meta tags for that reason. And how many times have you tried Altavista, Excite or Google only to find that the page you're trying to get to has expired or is 8 years old and hasn't been changed in 7?
This issue is going to have to be addressed, because the web is going to continue growing.
Re:Seems almost insignificant (Score:1)
Personal sites are what the web is for! All
this commercial and ecommerce stuff is just silly.
So... has anyone figured out how many monkeys? (Score:1)
The internet disproves this hypothesis.
But seriously - has anyone figured out how long it would take to reproduce certain random documents, such as the works of Shakespeare?
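A rough way to put a number on that question, as a sketch only: assume every keystroke is independent and uniformly random over a fixed keyboard. Then a given text of n characters on a k-key keyboard comes out right with probability k^-n per attempt, so the expected number of n-key attempts is k^n. The 27-key keyboard (26 letters plus space) and the sample phrase are assumptions made for illustration, not anything from the study.

```python
import math


def log10_expected_attempts(text: str, alphabet_size: int) -> float:
    """Return log10 of the expected number of random attempts to type `text`.

    Assumes each key is pressed independently and uniformly at random, so one
    attempt of len(text) keystrokes succeeds with probability
    alphabet_size ** -len(text). The expected number of attempts is then
    alphabet_size ** len(text); the log keeps the number printable.
    """
    return len(text) * math.log10(alphabet_size)


# Hypothetical example: an 18-character phrase on a 27-key keyboard.
PHRASE = "to be or not to be"
KEYS = 27

if __name__ == "__main__":
    exponent = log10_expected_attempts(PHRASE, KEYS)
    print(f"about 10^{exponent:.1f} attempts for {len(PHRASE)} characters")
```

Even this one short phrase lands somewhere past 10^25 attempts, and the exponent grows linearly with the length of the text, which is why the complete works are out of reach for any finite troop of monkeys (or web authors).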
Re:Inktomi, publicity, and mod_perl (Score:2)
unique? (Score:4)
Really, this article says nothing. Unless it states (and it does not) *exactly* how they mean "unique" I'm not going to take this seriously. A more interesting statistic (and one I haven't seen updated in a while) would be what the information conversion ratio is between the "RealWorld" and the web - i.e. how much information that you can find in a library can you also find online in its entirety. That is a more accurate measure of growth than raw page numbers.
1 Billion useless pages. (Score:4)
49.5% pr0n pages with javascript popups
1% other
We humans should be so proud of ourselves.
:)
Re:the best part (Score:1)
http://Llanfairpwllgwyngyllgogerychwyrndrobwllllantysiliogogo
---
Re:Meaningless Statistics (Score:1)
Re:And at least one of them already comments on th (Score:1)
Always remember... (Score:1)
A public or private search engine? (Score:2)
Is Inktomi publicly searchable? If it is not, then my pages wouldn't be publicly searchable. So, what's the point of them making connections to my sites?
Is the following how you ban a site from your server?
/etc/httpd/conf/access.conf
#deny from domain
Or from the late Carl Sagan (Score:1)
*Sarcasm* Gee Wiz (Score:1)
The Internet is large. Leave it at that.
Cheers,
Slak
Creepy - (Score:1)
I'm just curious if that was supposed to be impressive or disturbing. Of course, a good lot of those one billion pages are made by teenagers so-
-Noiz,
Who thinks Ricky Martin looks too much like a clone to be a "hottie".
---------
Inktomi false advertising? (Score:1)
Someone please tell me if I'm missing some great coolness here. After all, I haven't used anything other than Google for months.
Re:Indexable webpages (Score:1)
Shades of David Langford's "Net of Babel" (after Borges). Or see here [tlon.com] for a real demonstration that the 'net contains an infinite amount of data (although it'd be stretching to call it "information").
Re:the best part (Score:2)
in.2032.the.world.as.we.know.it.will.self-destr
*There was nothing illegal about them, except that the university banned servers.
only a billion? (Score:1)
1 Billion no phone (Score:1)
-Peace
Dave
Re:And at least one of them already comments on th (Score:1)
because 90% of online p0rn is crap too
erm
or so I am told :+)
--
Re:technically inaccurate statistics (Score:1)
Did anyone else notice the language bias? (Score:1)
Re:Nice :) (Score:1)
RedHat? (Score:1)
Since when is RedHat a webserver and not a distribution? I'd like to know the method these guys used to get these stats, and why they listed Redhat as a server.
Recount (Score:2)
Inktomi vs. Google (Score:1)
"By examining the entire Web and analyzing the billions of links between all of its documents, Inktomi can distill an index of the highest quality documents to provide users with
more relevant and intuitive results."
Isn't that the "technology" that google has patented?
Re:suspicious (Score:1)
its called an example
Re:Meaningless Statistics (Score:1)
I guess if you keep verifying each set of results, we will eventually reach what could be collectively known as an "accurate" number. But who wants to spend all that time, when we can just take these numbers and assume that they're good? I admit, I certainly don't, and I am happily willing to say, "Hey Inktomi, and NEC Research Institute, thanks for the thorough study and its subsequent report! I can now sleep better at night knowing that my one web page on the internet is confirmed to be not alone! Way to go!"
But wait! How am I supposed to know that my one teeny website was included in their numbers?!? Hmmm, guess I'll have to run my own study just to verify, but then someone else will have to verify my report
UK or US? (Score:2)
1,000,000,000 (US)
or
1,000,000,000,000 (UK)
There's a large difference.
Re:A public or private search engine? (Score:1)
Inktomi, last I looked, don't run a search engine site; they develop the tech and license it to others who get involved in the messy business of making a popular search engine site.
IIRC, their highest profile customers are Hotbot [hotbot.com], who used Inktomi from the start, and Yahoo! [yahoo.com], who switched from Alta Vista to Inktomi. Inktomi is a more neutral backend for Yahoo!, who are competing in the same market as Alta Vista.
Dave
--
Re:1,000,000,000+ and what do we have? (Score:1)
Re:Why? (Score:1)
choose hamsterdance from the list.
unless you have a small child with you. (it's the only cure)
Re:A public or private search engine? (Score:1)
Use Google (Score:4)
Google is one of the best search engines available for most purposes, because it ignores meta tags, and scores pages higher based on links to the site from other high-scoring pages (this is a recursive definition but the recursion bottoms out).
The result of this is that it gives useful results even when very common words are used. Try searching for Linux on Google. The first ten results are
While a human being might be able to come up with a better list, a machine came up with that list, based solely on the structure of the web. (I wonder why linux.davecentral.com rates so high -- possibly because it's attached to a high-ranking site, davecentral.com).
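The "recursive definition that bottoms out" can be sketched as a simple iteration: start every page with the same score, then repeatedly let each page hand its score to the pages it links to until the numbers settle. This is only a simplified link-analysis sketch, not Google's actual ranking code; the damping value of 0.85, the function name, and the four-page example graph are all assumptions made for illustration.

```python
def link_scores(links, damping=0.85, iterations=50):
    """Score pages by the scores of the pages linking to them.

    `links` maps a page name to the list of pages it links to. Each round,
    every page splits its current score evenly among its outgoing links;
    a page with no outgoing links spreads its score over all pages.
    Scores always sum to 1.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    scores = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page gets a small baseline so unlinked pages keep a nonzero score.
        new_scores = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * scores[page] / len(targets)
                for target in targets:
                    new_scores[target] += share
            else:
                share = damping * scores[page] / n
                for other in pages:
                    new_scores[other] += share
        scores = new_scores
    return scores


# Hypothetical four-page web: everything links to "hub", "hub" links to "a" and "b".
EXAMPLE_LINKS = {
    "hub": ["a", "b"],
    "a": ["hub"],
    "b": ["hub"],
    "c": ["hub"],
}

if __name__ == "__main__":
    for page, score in sorted(link_scores(EXAMPLE_LINKS).items(), key=lambda kv: -kv[1]):
        print(f"{page}: {score:.3f}")
```

In this toy graph "hub" comes out on top because everything links to it, and "a" and "b" outrank "c" only because the high-scoring "hub" links to them. That is the same effect as a subdomain ranking well because it hangs off a well-linked parent site.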
ObAdvocacy: and Google runs on Linux.
Re:A public or private search engine? (Score:1)
Re:Nice :) (Score:1)
Re:A public or private search engine? (Score:1)
First, why do you not want them to index your site?
Second, if you've read the other replies to your question, you might want to re-consider...
Finally, I believe all search engines will ignore you if you follow the steps they give. That is, if they follow the rules.
Hmmmmmm (Score:1)
all persons, living and dead, are purely coincidental. - Kurt Vonnegut
Hmmm...4 Billion pages... (Score:1)
Geez....I say that there are far too many people on the net who just don't belong, and freedom of speech or no, some people shouldn't be allowed to make web sites.
Who am I?
Why am I here?
Where is the chocolate?
Re:A public or private search engine? (Score:2)
Inktomi sells their technology to other companies; they don't operate a search engine under their own name. HotBot [hotbot.com] is Inktomi-based; there are others as well but I don't know who.
Re:1 Billion useless pages. (Score:1)
And to expand on that statement....each of those popups is another page adding to the "1 billion".
So the ratio is like 1 pr0n page to 15 popups! =)
Pablo Nevares, "the freshmaker".
One billion documents in the Inktomi index (Score:1)
Re:Use Google (Score:1)
--Evan
Re:Hear Hear!! (Score:1)
Re:So... has anyone figured out how many monkeys? (Score:1)
An almost infinite number of monkeys banging away on a similar number of typewriters will eventually reproduce the works of Shakespeare.
An almost infinite number of monkeys banging away on a similar number of typewriters will create...
... one hell of a mess!
Re:A public or private search engine? (Score:2)
Re:Apache is the largest (Score:2)
Netcraft's measure is by number of servers, while this measure is by number of pages.
It's not surprising that they both agree, but it's certainly possible that larger sites might run a different server from the average site, causing a difference.
Impressive Marketing statistics (Score:4)
Well, my take from the site is that what they're actually saying is "Look at our lovely indexing cluster. It can index 1 billion web thingies! Shouldn't you be buying a search engine product that powerful?"
Or, in other words, it's another example of meaningless statistics spewed in the name of marketing, vaguely covered-up as serious research.
References: Car MPG & top speed figures vs actual usage, Processor MHz as function of system throughput, quoted battery life as function of laptop utilisation, quaketest FPS compared to average internet multiplayer experience etc etc etc...
Infinity (Score:2)
Hair splitting alert ON.
The number of (different) pages on the web is actually infinite. Here [eleves.ens.fr] is a sample infinite component.
(Actually it's finite because the maximal accepted length for a URL is finite. But it's way above the billions.)
Note that these are not dynamic pages. Dynamic pages (i.e. pages whose content changes for the same URL) don't count: they're cheating.
(The source used to generate this infinite number of pages is available under the GPL [quatramaran.ens.fr].)
In related news... (Score:2)
Re:1 Billion useless pages. (Score:1)
At a speed of one page per minute, it will take the rest of my life to read them all (about 57 years, considering that I can't read more than 8 hours a day: I'll also have to eat and sleep, ...).
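The 57-year figure checks out if it refers to the 1% "other" share from the parent comment, i.e. about 10 million of the billion pages. That reading is an assumption, since the start of this comment is missing. A quick sketch of the arithmetic, with a hypothetical helper name:

```python
def years_to_read(pages, minutes_per_page=1.0, hours_per_day=8.0):
    """Return the years needed to read `pages` at the given pace.

    Uses 365.25 days per year; the one-minute-per-page and eight-hours-a-day
    defaults are the figures given in the comment.
    """
    minutes_per_day = hours_per_day * 60
    days = pages * minutes_per_page / minutes_per_day
    return days / 365.25


if __name__ == "__main__":
    total_pages = 1_000_000_000
    other_pages = total_pages // 100  # assumed: the 1% "other" pages
    print(f"1% of the web:  {years_to_read(other_pages):.0f} years")
    print(f"the whole web:  {years_to_read(total_pages):.0f} years")
```

At the same pace, the full billion would take on the order of 5,700 years.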
ms
That's search engine trickery... (Score:1)
Guess it's time someone anti-microsoft gets microsoft.ms.windows.windows2000.windowsnt.office
Re:the best part (Score:1)
it's phony.
---
Re:Seems almost insignificant (Score:1)
Why are personal sites pointless? Just because most of them aren't things you want to read doesn't make them, IMHO, useless.
In fact it's the empowerment that enables Ordinary Joe to publish his personal page that makes the web what it is and not just a virtual shopping center.
I'd just like all six billion people to be able to participate.
Good for the WWW, but where's my damn page??? (Score:1)
But keeping track of all these billions of pages, will be difficult, and sooner or later, people are going to demand satisfaction! (Slap me with that glove again, and I'll give you satisfaction, in a
The Gray Wolf
Re:Meaningless Statistics (Score:1)
Mr Owl, How long does it take to count 1 000 000 000 links anyway?
Mr Owl: "ah one, ah two, ah three *CRUNCH*-- ah three"
There you go folks, it takes three to count 1 000 000 000 links. Thank you, Mr. Owl!
Re:Meaningless Statistics (Score:2)
Remember, 53.4% of all statistics are invented on the spot. Of those, 63.1% are never checked against any reliable source. The rest are attributed to a survey done by Expensive Management Consultants [devnull]. You can buy a copy of the report from them for only $2499, which includes the introductory price of a year's subscription to their weekly newsletter containing the abstracts of other reports you can purchase, at a substantial 10% discount off the regular price that no one ever pays them anyway.
Re:Inktomi vs. Google (Score:1)
Mark Papadakis, WebDeveloper
Re:A public or private search engine? (Score:2)
No hits.
Google finds them, though.
Something's definitely amiss regarding Inktomi.
Re:And at least one of them already comments on th (Score:1)
because 90% of online p0rn is crap too :+)
Do you mean fecofilia, or just low quality? *impertinent smirk*
--unDees
Re:Apache is the largest (Score:1)
Re:A public or private search engine? (Score:2)
Hotbot uses Inktomi technology. They don't use Inktomi's database (I don't know who does).
Re:the best part (Score:1)
My idea, you can't patent it (Score:2)
I'm willing to help moderate on some subjects.
just a thought (Score:1)
so that means that if each and every page on the WWW were worth $100, then it would equal bill gates' pocket.
that's nuts
Re:Apache is the largest (Score:2)
Re:UK or US? (Score:1)
Large but Finite number of monkeys (Score:2)
The Internet does not represent an infinite number of users (at least, not yet) but you're still more likely to get an infinite volume of monkey shit out of it while you try to dig up the works of Shakespeare.
Or you could save time and go here. [mit.edu]
Re:Why? (Score:2)
Re:Web Antiques (Score:2)
Your Working Boy,
Re:the best part (Score:2)
is
i.should.co.co
but I dunno how to register a hostname in Colombia (or wherever CO is)
Re:A public or private search engine? (Score:2)
Ahh, I see now. They are crawling my sites but not letting anybody search the results unless they pay big bucks.
Hmmm, looks like I'll be making a modification to my robots.txt files and possibly adding some new rules to my firewall.
I should be allowed to find out what info about my sites they are trying to sell. If I can't, they won't be getting access.
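For anyone wanting to check what a robots.txt change would actually do before deploying it, here is a minimal sketch using Python's standard urllib.robotparser. The crawler name "ExampleCrawler" and the example.org URLs are placeholders, not the real user-agent string of any search engine; look up the actual name in your server logs. Note that robots.txt is only a request, so it works only against crawlers that choose to honor it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shut out one named crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: ExampleCrawler
Disallow: /

User-agent: *
Disallow:
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if `robots_txt` permits `user_agent` to fetch `url`."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "http://example.org/index.html"
    for agent in ("ExampleCrawler", "SomeOtherBot"):
        verdict = "allowed" if is_allowed(ROBOTS_TXT, agent, page) else "blocked"
        print(f"{agent}: {verdict}")
```

A crawler that ignores the file has to be stopped at the server or firewall instead, which is where the firewall rules come in.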
One billion channels and nothing on ... (Score:2)
Just another for-all-practical-purposes-meaningless statistic to nonetheless feel overwhelmed by, I suppose.
If there were a billion pages to look at, I don't know when I'd have the time to do anything else, being the info-junkie that I am. Fortunately, a sufficient quantity of these pages do not interest me.
Then, too, I wonder how many of these pages are de facto duplicates? ("Department of redundancy department, redundant division speaking
That also makes me wonder more about this statistic. Are there one billion ACTIVE pages, or merely one billion pages that have ever existed? If the former, how many pages have ever existed? That would be an interesting question
Well, by making this post I'm probably creating yet another page and adding to the noise and confusion. Consider it my chaotic deed for the day.