The Internet

Brewster Kahle & The Largest Library In History

BorgiaPope writes "WAIS creator and Alexa founder Brewster Kahle is interviewed by Feed. Kahle talks about the 30 terabytes of 'net content stored in Alexa's Linux servers, a data store he calls the 'largest library the world has ever known.' He offers some fascinating observations about how sites move in and out of the top traffic tier. He also claims that the top ten Web sites have the 'greatest worldwide concentration of power since the Roman Empire.'"
This discussion has been archived. No new comments can be posted.

  • Some national libraries try to collect every published medium in order to preserve it for future generations.

    For this purpose, one to four copies of every published item have to be provided to the library. This includes books, daily newspapers and even school newspapers. (In hindsight, maybe the reason for this institution was not the preservation of things, but the ability to control and censor.) However, as far as I understand, they have also neglected a lot of published information: music, TV, radio, and the new, possibly very short-lived internet sites.

    These public institutions have to rearrange themselves quickly, so that all this possibly valuable information will be available in the future for everybody and access (for scientific or educational purpose) will not depend on the kindness of the companies that harvest this information now.

    Don't forget what happens with Dejanews!

  • I haven't missed the point at all.

    Britain, a dinky little island with a relatively tiny population at the time, controlled just about 2/3 of the globe.

    Less than 1% controlled the remainder.
  • Wow! Google is at Number 15, up from 19 last week. That's pretty impressive. Way to go, Google-guys.

  • Is (at no 24 in the rankings) a pron site for robots?

    Don't flame me for watching too much futurama...
  • Excellent points. Another thing that makes a library is a library catalog. A library isn't just a pile of books, it's arranged in an orderly fashion (usually by subject) and with a catalog allowing access by subject, title, author, type of document, etc.

    Kahle touches on this a bit in the interview, noting that human cataloging is infeasible for such a large collection. Perhaps. But without it, I don't think it can really be called a library. Cataloging librarians apply subject headings--some much more specific than anything you will find in Yahoo or the ODP. (And some less specific too. One valid criticism of library cataloging as it's practiced today is that it's too slow to keep up with emerging subjects. Subject headings in computer books, for example, are currently much less diverse than they ought to be. But I digress.)

    It's not just subject cataloging where humans still do a better job than computers. Even titles and authors--which seem simple and straightforward at first glance--need that human element. Here is a book by "John Smith." Is this John Smith from Ohio, born in 1956, or John Smith from California, born in 1963? If it doesn't say on the book itself, a cataloging librarian will research this, so that the library catalog can differentiate between the two authors. Here's a videotape of "Star Wars". Will the automated cataloging system recognize that people might also search for it by the title "A New Hope"?

  • I must say, it's sad to see Yahoo at the top of the list, and the Open Directory Project [] not even on there, especially since it's now bigger than Yahoo, and growing faster. (Though, as always, it's in need of editors.)

    It is an interesting list to look over; some of the ones on there are very surprising.
  • from
    "We now have about thirty terabytes of archival material that we data mine. And that's 1.5 times the size of all of the books in the Library of Congress.... we're now beyond the largest collection of information ever accumulated by humans... We use as our original inspiration the Library of Alexandria. Because they were the first people that tried to collect it all... They got up to five hundred thousand books... The Library of Congress -- the largest library now -- is seventeen million. Only thirty four times more than what we had in 300 B.C."

    2300 years = 34x as many volumes
    less than 10 years = 1.5x the Library of Congress and growing fast?

    And the bit about the top 10 sites getting 20% of the traffic is incomplete. His point is the steady progression, with each tenfold increase in sites adding another 20%:
    top 10 sites = 20%
    top 100 sites = 40%
    top 1000 sites = 60%
    top 10000 sites = 80%

    With every use of 'what's new' in Netscape 4.06+ sending stats to Alexa's servers, this guy gives me some real mixed feelings:--)
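
The four figures quoted above fit a simple pattern: each tenfold increase in the number of sites adds roughly another 20% of traffic. What follows is a minimal Python sketch of that reading, not anything stated in the interview. The function name `cumulative_share`, the `share_per_decade` parameter, and the logarithmic formula are illustrative assumptions fitted to the four quoted data points. The last line simply redoes the "thirty four times" arithmetic from the quoted volume counts.

```python
import math


def cumulative_share(top_n: int, share_per_decade: float = 0.20) -> float:
    """Estimated fraction of all traffic going to the top_n sites.

    Assumed model (fitted to the four quoted figures only): every
    factor of ten in top_n adds share_per_decade of total traffic.
    The result is capped at 1.0, since a share cannot exceed 100%.
    """
    if top_n < 1:
        raise ValueError("top_n must be at least 1")
    return min(1.0, share_per_decade * math.log10(top_n))


# Print the model's estimate for the four tiers quoted in the comment.
for n in (10, 100, 1000, 10_000):
    print(f"top {n:>6} sites: {cumulative_share(n):.0%}")

# Library of Congress volumes divided by Library of Alexandria volumes,
# using the counts quoted from the interview.
volume_ratio = 17_000_000 / 500_000
```

Under this assumed model the curve would reach 100% at the top 100,000 sites, which is only an extrapolation from four points and should not be read as a claim about real traffic.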


  • "Larry Wall won the obfuscated C contest. Twice. Kinda makes you think, eh?"
    It shows his immense talent as a coder.
  • I'm not sure if this is public info, but if it's not, it is now. Both Wal-Mart and Air Miles have 40+ terabytes of information on spending patterns of people. I think Air Miles was 46 terabytes as of last year.. but don't quote me on it. So I wouldn't really worry about "revealing" information about yourself by using the "what's new" button or "related sites" button.. I'd be more concerned about the profiling being done on you each time you make a purchase with a credit card at Wal-Mart or get those oh so precious air miles (you might get to fly for 45 minutes after 45 years of collecting!)
  • And by the way - why does he think that all web pages are made for profit and would disappear without it?

    Exactly. There were plenty of sites on the web in the good ol' days [sic] before ad banners, and there are still plenty of helpful, entertaining, and informative sites out there without any business model or revenue stream. It's like assuming all artists just want to get paid first and foremost - there are plenty who do it because they love it, and many of them would do it even if they never saw a dime from it.

  • Huh? Britain did not help the countries it occupied and exploited become more civilized.

    'Civilized' is NOT the word you are looking for -- maybe 'Westernized' -- but the Western way of life is not any more civilized than any of the other cultures that were occupied during the era of European expansion.

  • Am I the only one who thinks this whole article stinks of marketing nonsense? It reads like a long series of prompts for Kahle to plug Alexa. How about this question / brown-nosing:

    One of the things that's always been amazing about Alexa, and I think that people are increasingly realizing the power, is not just that you're able to see all this information about traffic patterns but that information slightly processed is being fed back to the users.

    Mmm yes, I think I speak for everyone when I tell you that I'm increasingly realizing the power of the "what's related" button in Netscape.

    All credit to them for opening up their archives to research centres free of charge, I think that's very important and a brilliant effort, but at heart Alexa are just a data-mining, marketing-driven outfit like hundreds of other dotcoms around the world.

    He's eager to be painted out to be some kind of visionary, but really, since 1991 all he's done is push WAIS as a way of charging for material over the web. I mean, good work in inventing a protocol and all that, but the charge-for-content model looks like it's failing right now. I think Britannica used to be on WAIS but it's free now.

    It's interesting the way he talks about the urgent need for a publishing system without once mentioning WAIS. I wondered what happened to WAIS, Inc, his attempt to provide a publishing system commercially back in 1991 or something. Go to [] and you'll end up being redirected to [], an
    Enterprise portal suite [which] is the industry's first, fully integrated, scaleable, end-to-end portal system
    Don't you just love that internet marketingese?
  • IANAL, but can't a library archive information? I find it disgusting when a library can no longer hold information because of stupid copyright laws...
  • This isn't a troll. This is just an observation. This particular "library", which you need special software to access, is not available for a large portion of the slashdot readers: the *NIX crowd. IOW, this doesn't concern us. Another boring post for slashdot...just what we need.
  • He's proud of that? Still touting that as an accomplishment?

    WAIS was the biggest piece of sh** to ever get steamrolled by the web ...
  • In a more general sense, copyright (and now license agreements) are to blame. There was a lot of talk in the "early days" about getting lots of stuff online, and it's slowly happening with, for example, Project Gutenberg and alt.binaries.e-book. But currently this is slow; OCR technology isn't good enough to process things without an editing pass, and sharing the original scans currently requires institutional resources. That, combined with the periodic extension of copyright terms to cover almost anything created in the 20th century, has put a damper on volunteer efforts.

    One would think that libraries would be a great place to start with this at the institutional level. Even without scanning, a lot of recent journals come with electronic versions as part of the subscription. And they're bought and paid for, so copyright isn't an issue (as long as you belong to a subscribing library). But...restrictive license agreements to the rescue! This article on oss4lib describes a situation where librarians are required to scan paper copies of journals they have electronically for interlibrary loan purposes.

    Fundamentally, the movement to put a fence around information and charge for every view is at odds with the aim to preserve it. If we want hardcopy to be available electronically, or electronic documents to be preserved at all, we have to change the rules, or ignore them. In the meantime, start a private collection in the hope of publishing it someday. Historians will thank you.

  • "And what we're finding is, people are interesting, diverse and peculiar. They are constantly looking for new things that are of interest to them." Yep. That's it in a nutshell. I'm interesting, you're diverse, and he's peculiar. Ohhh! He meant that people are attracted to an interesting, diverse, and peculiar mix of web sites. Why didn't he say so?
  • I love the Braille edition of Playboy.

    It has a hairy page!

  • If you're interested in the most massive digitization project in the history of the world, check out Questia. Questia is positioning itself to offer access to a tremendous amount of information, starting early 2001. From the web page: "Questia is building the first online service to provide unlimited access to the full text of hundreds of thousands of books, journals and periodicals, as well as tools to easily use this information." The web page also says they will start with 50,000 of "the most valued volumes in the liberal arts from the 20th and 21st centuries" and then build to 250,000 titles over the next couple of years. Granted, it's not the LoC, but it's a step in the right direction.
  • Personally, I think that concentrating anything is a bad idea. Anyone heard the phrase "Don't keep all your eggs in one basket"? If all knowledge is concentrated, then sooner or later it's all going to go pear-shaped. Same problem with power: any of those famous empires still around today? No? Wonder why?
  • by Gendou ( 234091 ) on Friday September 22, 2000 @04:32AM (#761641) Homepage
    A professor of mine, a number of other students, and I are doing some in-depth research on language and how it changes over time. One of our biggest problems at this point is finding sufficient samples of text data from strict editorial sources, so we have had to resort to using photocopied->scanned->OCR'ed National Geographic articles. However, now that we're moving on to a new phase of the project, we need ten times as much data to realize the accuracy of our results. As of now, sources of digital text are few and far between, with none going back very far. Why is it that organizations in our society haven't invested the money and time into, say, digitizing the Library of Congress? I realize it's incredibly expensive and time-consuming (that's what we discovered), but it would be oh so useful to be able to read publications from a hundred years ago in my web browser. It's also great to see modern material produced by our society being archived, but there's a lot of ancient history that should be put into a format that should last forever as well.
  • I guess I'd like to not have all the 'junk' there either, but as I understand it, the people who use Alexa's applet essentially select the content of Alexa's database by surfing from site to site. I'd be really interested in how much so-called junk moves in the rankings as the internet ages and it becomes less and less of a novelty. I would like to think that a lot of the 'junk' sites would drop in ranking, but who knows?
  • Interesting how Kahle talks about the top 10,000 sites getting 80% of the web's traffic... But what does that say for Alexa when his site can somehow handle 30 terabytes of crap, but can't handle a little traffic boost from /.?


  • Was the Library at Alexandria a giant server farm?
  • As I understand it (from the little bit I read on their site, and from stuff gleaned from the interview), there's a program you can download from Alexa's site. When you run it, I imagine that it tells Alexa what sites you're visiting. So their hit counts are only from people using their program - though I could be wrong.
  • You honestly expect us to believe there are no pron sites in the top 100 www sites ?
  • One useful step in that direction is JSTOR. They have digitized the entire contents of over 100 leading journals in a range of academic fields, and made their abstracts searchable as well. Unfortunately, you or your library must subscribe to get at the goodies.
  • Yes, I also don't like his way of thinking regarding payment for content - and especially analogies to the printing/publishing world. I'm afraid that Kahle strongly believes in current copyright laws and he doesn't notice systems like Napster or even more Gnutella (where people share things without any "revenue model") in his publication concentrated vision.

    And by the way - why does he think that all web pages are made for profit and would disappear without it?

  • I checked out the website, and was impressed. Then I went deeper. The fact that they base all of their rankings on people who use their plugin automatically invalidates all of their data.

    They also have a deal with Netscape and MS-IE to use some of their "Features" to gather data, but don't apply it. They also readily admit that they really only gather info from Windows users.

    >In addition, we remove from the list certain
    >technology, graphics, and banner ad-serving
    >domain names, as well as sites operated by Alexa Internet.

    Translation: "We give average people the results that they expect. We pretend to rank the world, but only things we want to."

    I do think having all websites stored someplace is a good idea. It will be invaluable to anthropologists in the future.

  • The Phoenix main library has a Braille Playboy, believe it or not.

    From the FAQ:
    Does anybody really read Playboy for the articles?
    The articles may not be the first part of the magazine most readers turn to, but judging from the letters we get, millions of Playboy readers also enjoy our award-winning journalism, humor and fiction. The only people who can rightfully claim to read it solely for the articles are the thousands of blind readers who peruse our Braille edition, which has been distributed by the Library of Congress since 1970.
  • Oh my god, that was the first thing I thought of when I read the article.... lol I thought it was just me and my morbid side.
  • THANK you so very much! You are a life saver! It's a shame that the articles are just scans (or at least from what I've seen of the demo), but still, the quality is so good that OCRing should not be a problem.

    Many, many thanks for your suggestion!
  • "The greatest trick of the devil was convincing people he didn't exist."

    "to be in power you didn't need guns or power, just the will to do what the other guy wouldn't."

  • school is great
  • Give me a break.

    I realize I should probably post this as an AC, but I'd like it to be seen.

    If this redundant comment [] (it was posted after I did) is Interesting, then why is mine a Troll?
  • He's right about the ISP being the best place to start, what I don't know about is why would I use Alexa? Alexa sends my habits back to their servers.

    I'd rather my ISP brought me the information and services I was looking for and blocked the retailers and everyone else from tracking my path.

    I already have to trust the ISP, why trust anyone else?

    Doesn't the ISP have the most to gain from keeping me on their portal, as convergence comes about? Don't they have the most to gain by using my extra CPU cycles? I want my ISP to give me frequent flier miles! I'll turn 'em in for a better coprocessor or two and then they can use those extra CPU cycles, how's that?

    I'll tell them what my hardware needs are, and they go find the retailer, and I get a banner when the price is right. Oh, and that banner? It doesn't take me to the retailers site; It just signs me up for the group buy.

    I want my ISP to guard my privacy like nothin' else, and for it, I'll pay. Hey, I can lose my privacy and use any of the freebie ISPs? But what will I pay for? What would you pay for? Privacy.

    Now, if they could offer me ASP services that I can't afford to purchase, they could charge me more. I'm looking for a really really good searchbot. Heck, if they could offer me that, I'd let them have hard drive space!

    Josh Drvsh
  • Yeah, Minitel had "micropayments". Pricing was comparable to 900 numbers. Prices varied; the telephone directory was free, chat sites ("messageries") were a few francs a minute, and sites with official government data like lists of research projects cost about 4x as much as sex chat.

    That's the telco model of information pricing. The telcos had to be dragged, kicking and screaming, into the era of cheap communications and free content.

    The basic problem with micropayments is that all the enthusiasm for them is on the collecting side, not the paying side. Contrast this with credit card acceptance, which consumers actually want.

    On the web, there are only two (non-porno) pay sites that do significant business: The Wall Street Journal and Consumer Reports. Both had top reputations in the print world. Everybody else who's tried it has bombed, including MTV. So pay-per-view is the wrong answer. Kahle is way off base on that. His "ISP tax" idea is even worse. That sounds like something the RIAA would come up with.

  • There is a project to digitise all the Victorian fiction ever published: Chadwyck-Healey are doing it. I wrote about it in Wired UK some years ago. The same firm has done the whole of English poetry -- every single poem published in English before 1900 -- but it is expensive and subscription only.
  • There's nothing wrong with making archival copies of information you can come by legitimately. Alexa isn't redistributing the information, so they're fine. They did, however, at one point promise to give a complete copy to Kahle's non-profit Internet Archive. My understanding at the time was that the Internet Archive is legally structured to look much more like a library than a business, and libraries have a special status under copyright law.

  • I thought the British empire was the greatest concentration of power since the roman empire....

    Go figure.... Guess history *was* wrong after all...
  • I really do see the similarity between Alexa and Alexandria as a bad omen.
  • by redelm ( 54142 ) on Friday September 22, 2000 @04:02AM (#761662) Homepage
    Much as I like InfoTech, I don't like the Roman Empire analogy. Information can influence people, but it is NOT military power.

    Perhaps a better analogy would be to 400-1400 when the Popes and the Roman Catholic Church did hold a monopoly on religious information in the West. That ended with Gutenberg and the Reformation.
  • Does anyone recall a teen called Mafius Boyus who shut down all of the Roman Empire's activities for several hours just for fun?
  • by Ndog ( 230982 ) on Friday September 22, 2000 @04:04AM (#761664)

    That does seem like a very questionable statement to me. The top ten web sites are potentially powerful, but it depends what content they are serving up. If they are selling things, like Amazon, would that be so powerful? Sure, you can push certain things, but ultimately it's up to the buyer. Of course portals like Yahoo are powerful, but only when it comes to the content they are providing. Do they really have any power over my everyday life? What about people and cultures without so much internet access? Are they not even considered in this discussion?

    Besides, power is fleeting.


  • by darylp ( 41915 ) on Friday September 22, 2000 @03:59AM (#761665)
    This line was inexplicably removed from the final interview: Q: "Thirty Terabytes? That's a lot, isn't it?" A: "Well, once we've taken out all the Spam, 'Make Money Fast' schemes, Pr0n, "w3 0\/\/N j00" homepages, Natalie Portman fansites, 'USS Enterprise vs. Star Destroyer' discussions, links to, and Jon Katz articles, we can fit it all onto a floppy."
  • How does Alexa avoid violating copyright? Linking is one thing, mirroring another.
  • Information can influence people, but it is NOT military power.

    I agree, but I think they're talking about power in general, not military power. There's a big difference.


  • Well it just goes to show that there are lies, damned lies and statistics

    I didn't see a warning saying 'These are the Top 100 sites on the WWW excluding .....'

    If they start excluding anything from the generic top 100 surely the results start getting skewed.
  • Extending Alexa into the realm of helping people with products.

    30 TB of data, giant server farms, Library of Alexandria and all they're going to do with it is be another My Simon?

  • Did you look at Alexa's top 50 of busiest websites in August?

    I really wonder what [] does there above microsoft, geocities, ebay and altavista.

  • by CAIMLAS ( 41445 ) on Friday September 22, 2000 @04:59AM (#761672) Homepage
    Power is the ability to conform the will of others to your own. The Romans did this by killing their opponents, and by the threat of such things. These sites don't have such power - anything that people submit to is submitted to out of free will. Unless, of course, you count things like the ability to sell personal information. :)


  • Wow, imagine all the email addresses you could harvest from that...

    (Taco, I'm still bitter about the hard cap on karma...)
  • by Harri ( 100020 ) on Friday September 22, 2000 @05:01AM (#761674) Homepage
    ...and the disadvantage of a library, is that the stuff is selected by a librarian, according to a view of what is interesting that is specific to the age in which he is living and the culture to which he belongs. (Or she, or it). Thus I believe there is a good deal of value in the idea of a library which is not filtered at the time of collection, but which can be filtered at the time of reading according to the interests of the reader.

    For example: In three hundred years, pornography is viewed as a valuable cultural resource. A historian wishes to study the subject of pornography over the ages and relate it to the prevailing attitudes in those ages. The historian will be stuffed, because to a librarian now, pornography is clearly not suitable for inclusion.

    The history we have is much more a history of the rich and powerful, and not a history of the poor, because nobody wrote anything about the poor. Today, big scientific tomes are kept, but Joe Blogg's Geocities page (with exciting photos of him and his family and his cat) gets binned. In three hundred years this might be interesting historical evidence, the same as Joe Chimney Sweep's diary from 1800 or something.

    The technology to do this effectively might not really be here yet, but it will probably arrive in those three hundred years. (Unless we're all too busy looking at porn instead ;) )

  • So I'd say that he's on the mark with the content idea, and the web itself is a powerful distributor of knowledge and information. But the most concentrated since the Roman Empire? Almost. That's still the press/media.

    I wonder if that's the sense he meant it in. From reading the interview, I took his phrase to mean not that this is the most powerful group in the world (although that is still possible as many of these companies have off-line influence in spades as well), but that it is the most concentrated group. Television media, for instance, may rightfully be considered more powerful culturally, but it's also more distributed when viewed by number of "hits". These top ten sites, OTOH, are more concentrated in a small area.

    The analogy to Rome in that sense is a good one, since most of the true power during the Empire's peak was concentrated in a very small area. Unfortunately, the idea of this small number of companies having power equivalent to the Empire's is untenable.

  • Admittedly, Playboy is the only sex magazine in the Periodicals collection, but a quick search of the subjects Pornography and Erotica turns up a bit of historical porn from 1911, numerous collections of erotic stories, and at least 3 biographies of porn stars with plenty of visuals (I know about the last 3 because I've read them all :)

    Libraries, good ones, collect a huge variety of stuff. It is unbelievable the number of volumes of stuff in there on esoteric subjects. My favorite oddball find in UD's is a multi-volume collection on why the Masons are evil from 1893. A decent library will collect many items that seem of little use right now and keep them for future study. Really, the function of the library is to catalogue knowledge, not trim it.

  • Ok, who else here is not amused by Feed's ascetic hip post-modern minimalist interface? Wasn't "Buttons with no contextual information" cardinal sin #1 of web page design? What the hell are those stupid chiclet buttons? You have to roll over the damn 6x6 pixel things just to find out what the hell they mean. Maybe the colors are self-explanatory to really hip artsy people.
  • by Azog ( 20907 ) on Friday September 22, 2000 @07:37AM (#761678) Homepage
    Here's a reality check - for me, anyway: I honestly thought would be somewhere in the top 500. I was going to make a joke about "You know it's time to move on to kuro5hin when slashdot makes it into the top 100". Nope. Slashdot isn't in the top 1000.

    Linux doesn't show up in any of the top 1000 domain names, but windows does - once - in, which is about as TV-like a site as you can get, and a subsite of MSN.

    Google was 21st, was 37th, and was 970. Other than that, none of the sites I've bookmarked are in the top 1000.

    I guess I shouldn't be surprised that the web I see is nothing like the web most of the world sees. I am a little disconcerted though. No wonder the general public doesn't care about software freedom, DeCSS, software patents, privacy, etc. The awful truth is that for most people, the internet is like TV.

    What a depressing way to start a Friday.

    Torrey Hoffman (Azog)
  • Something he said in the interview sparked a little inspiration in me.
    DISCLAIMER: this is the result of being bored at work and the idea(s) here jump around a bit and might seem self contradictory. Don't worry, it makes sense to me - you don't really matter!

    Each server gives money to the site based on the amount of traffic it generates. This could be negotiated, bought at a flat rate, or be a fixed percentage. Developers get money for putting the effort in, and governments will like this because it constitutes an income, which can be taxed. Servers pay sites out of the money they are paid by ISPs for the privilege of access to certain sites (note that this price will probably be decided by the market - more on this later). Normal people pay ISPs for the privilege of having access to the internet (as it is today) and can also have web sites of their own on commercial servers if they pay for the space, as happens now. ISPs pay telcos for the cost of networks etc. There will be free servers that don't pay the sites they host, cover only basic costs, and don't make a profit in the same way, possibly raising funds from donations (charity or whatever) or things like advertising, e.g. banner ads.
    ISPs would pay servers according to the amount of data transferred (or just downloaded?) at a predetermined rate. This might have to be set by the government if things prove too complex (imagine hundreds of companies trying to negotiate with hundreds of other companies, all for the same thing).
    Big problem: governments could use this to, effectively, tax the use of data - restrict, influence or control the flow of information. What happens if people find that accessing data online is just too expensive? This is probably where the free servers come in. You can upload and download all the data you want for free, so the servers cannot be taxed on this; transfer costs are eliminated. They would have to find another source of income, such as banner ads like now.
    Telcos would probably have to keep records about the data movements to provide a basis for financial records.
    There WILL be a lot of shifting in market dominance; it will be a new market. How will this all come about? Servers will have to offer to pay to host sites (as above, pay based on traffic - rates can vary depending on the contract used). The popularity of a site will decide how much it is paid (survival of the fittest). If an ISP wants access to a certain site (to attract more customers) it will have to pay the host for access. Use of market economics to change network topology. Free servers will be restricted only in how they earn and by legal issues such as copyright. This part of the internet will probably resemble the internet as it is now and possibly be less commercial, though I am not entirely sure.

    As I said before, this is only an idea, not a solution - don't flame me!


  • Holy shit, man. Info has always been power. Keep the people ignorant, then you maintain the control. Look at US nat'l security. They have all the power, 'cause they keep the info to themselves so they HAVE to make all the decisions. Sure, there is a difference between info and tanks, and the Roman Empire and today's world are very different. But..just take a look at the Russian Revolution to see how info can = power.
  • It may seem inflammatory, but compare post-British colonies with the colonies of any other colonial power.

    Well, London was colonized by the Romans. So let's compare London to any of the places colonized by the Brits.

  • The LoC has been digitizing their card catalogue, (that's right--card catalogue--room upon room of those funny cabinets with long, slim drawers) not their collection. They've been at it for 20+ years and the last time I checked (three years ago, I admit) they had worked their way back to items added to the collection since 1976. That is, they can barely keep up with digitizing the *indexing* of the material added each year.

    Digitizing the collection itself is prohibitively expensive and time-consuming. I daresay you can't hire enough monkeys to turn pages and fire the scanner no matter what you pay, let alone proofreading, which requires the more costly labor of homo sapiens sapiens.

    One thing the ''print distribution nightmare'' gets you is a quality filter. Hrmmm, 2Tb filtered out of 2,000 years of production or 3Tb wiglomerated over 5 years?
  • However, that makes by definition the American media & Hollywood the #1 social power on the planet, not those sites.

    Last I heard, Hollywood was completely eclipsed by the production and exportation capacity of India's movie studios, though. We don't see it from here because the entertainment tastes of Asia seem to be very different. Also, given the relative land areas and population densities, I believe there are more easterners than westerners.

    In other words, if we *are* heading for a mediocracy (all sorts of fun wordplay there), America might actually be screwed. (We do have those nifty DNS servers under our thumb and that there Bill Gates guy, but how long can that last?)
  • Yes, they digitized their card catalog, but did they digitize the subject classification number and with it the subject classification subject headings, so that they can be listed and one-click searched in a yahoo-like subject category tree ?

    Did anyone try to adapt or enhance the subject classification schedule to online content beyond scholarly books ? Did anyone think of using an extended subject classification schedule together with search engines like google and come up with a combined output ?

    Why is google so much better than, for example, the older Altavista search engine ? Because they incorporated all the subject classification done by users when they put subject related link collections on their home web-sites.

    That's applying the categorization brain power of humans to an otherwise dumb, but fast search engine. Why not take the brain power that went into the LC subject classification schedule, and use its classification power for online content as well ?

    Why would we need to digitize every book ? It would be much progress already, if we just could truly find online content, as well as book titles, classified in a scientific/scholarly manner, the way the LC has done it for years. This has nothing to do with scanning each book of the LC.

  • Though, as always, it's in need of editors.

    That's understandable. I have signed up thrice in three different categories to be an editor. I have not ever heard back from them. That means that either their registration/application process is so difficult or counter-intuitive that I cannot figure it out, or that they just don't give a shit if they get another editor or not. Either way, I'm not surprised that they don't have as many editors as they would like or need.

  • That's understandable. I have signed up thrice in three different categories to be an editor. I have not ever heard back from them. That means that either their registration/application process is so difficult or counter-intuitive that I cannot figure it out, or that they just don't give a shit if they get another editor or not. Either way, I'm not surprised that they don't have as many editors as they would like or need.

    Thanks for the comment, I'm bringing it to the attention of those people responsible for accepting new editors. It took me two applications to be accepted, and the first one seemed to have found its way to /dev/null like yours did. If you do decide to apply again (once you're accepted, it's not nearly as bad as the initial application), just remember to apply to smaller categories with few subcategories (especially ones without any editor currently), and fill in the URL fields of the application.

    I do agree that what they did to you is a horrible way to get people to edit and even use the directory... :)
  • WAIS Inc. wasn't started till around 93-94, as a way to move it out of Thinking Machines. After a couple of years of independence, they sold out to America Online, who ran it as successfully as all their other Internet acquisitions....


  • I don't know if you're going to read this, but if you applied recently (in the last couple months or so), you might want to try logging in as an editor with the name/password you chose. Apparently there was a bug causing editor acceptance letters to not be sent, so people wouldn't know they were approved as editors.
  • Right ! And that's why we should request from the companies who make big boo-boos like DC and Amazon to make it up to us and be sent to Congress.

    Instead of sending us serial numbers hidden in CueCats to voyeuristically watch our one-clicking compulsive moves, the Alexas and Amazons, Forbes/Wired/DC et al. should be sent to the most knowledgeable branch of the government and be confronted with the people's will.

    Kahle says:
    "The Library of Congress -- the largest library now -- is seventeen million. Only thirty four times more than what we had in 300 B.C. It indicates that the technology hasn't scaled. But now we've broken through into a new technology that allows us to bypass the Library of Congress in very little time, and the sky's the limit. What can we discover about ourselves as a species? As different peoples? Are we couch potatoes or do we actually have independent will?"

    Bypass the Library of Congress and its accumulated knowledge ? To replace it with the big intelligence of 0s and 1s ? Wow, hold on a minute !

    You ask, do we actually have an independent will ? Sure, Mr. Kahle, we do !

    And that's why I send you my independent-minded CueCats to get all your 56 million instances of ISBNs with one-click (not paying license fees) and let some people, who actually have some knowledge dealing with them, get involved. Meanwhile you can "find the business model of the web" and let the content rot a bit more.

    Wouldn't it be a great match-maker game to hook up Alexa, Amazon, DC/Forbes/Wired and the Library of Congress ?

    They all have something to make up for, Alexa, which has technology, but no brain, and the Library of Congress, which has brain, but no technology, and Amazon, which has the great drive, but no direction, and DC/Forbes/Wired, which have direction, but don't know how to drive ?

    Amazon made a boo-boo with selling one-clicking seductions, qualifying as a drug-dealer and patenting this business process, Alexa made a boo-boo selling out to Amazon, searching for business models instead of searching for knowledge, the DC/Forbes/Wired gang caved in to voyeurism through CueCats and the Library of Congress slept happily instead of kicking their citizens' butts and asking for money so that they could put their knowledge to use in the digital world.

    Let these people kiss and make up ! Turn your trash heap into a library and your information into knowledge, Mr. Kahle. Give your buddies a chance to be good guys. I want a happy end !

    If you can't help but being cowboys, so be it, but well, I like those guys who save the farm for the blonde and then heroically ride away in the dust of the prairie, those lonely unsung heroes, which you wished were real and not only Hollywood's phantom boys.

    Get real, use your brains, save our dignity and save your dimes for the BIGGEST LIBRARY ON EARTH ! (and NOT for Amazon/Alexa/DC, because they ain't one).

  • A spider archiving the 'net I can understand. But how can he measure hitcounts? Does he just read and believe the countboxes? What of sites that don't have any?

    Still, traffic distribution is interesting.
  • Can't agree with the Papal analogy either. You've completely ignored the Scholastic movement which utilized Moslem-Judeo translations of originally Greek works as a primary source. Not really controlled by the Papacy (at all.)
  • You miss the point that this concentration of power is larger than anything that has preceded it since the Roman Empire. If you read the article it says that the top 10 sites control 20% of what all people around the world see on the web. 20%!! That's an amazing amount of media power to be held by so few people. So few people able (if they got together and had the desire) to strongly influence the way the Internet population feels about an issue - and more importantly, tell them they should buy Product X instead of Product Y.
  • "Library" ?
    Is that all I have to call my collection of MP3's to get free publicity and be left alone by the RIAA?

    I'm sure we all know that he has the largest collection of MP3's in the world! (with maybe a TB each for videos and pr0n)


  • by Markvs ( 17298 ) on Friday September 22, 2000 @04:18AM (#761694) Journal
    His assumption of power concentration would be true, if the net was the major medium for all, which it is not. That crown, for better or for worse, is still television.

    However, that makes by definition the American media & Hollywood the #1 social power on the planet, not those sites. Sites will come and go. It's not the hits that count. There are countries with no web access or very restricted access (Chad, Syria, almost anywhere in the 3rd world), yet these countries get much more "Americanization" via movies & print literature.

    So I'd say that he's on the mark with the content idea, and the web itself is a powerful distributor of knowledge and information. But the most concentrated since the Roman Empire? Almost. That's still the press/media.

  • Influencing other people gives potentially more power than conquering their land, killing them or threatening to do so (the only uses of military power). When you force someone to do something he won't do it well, but when you make them believe in what they are doing - they will be doing it to the best of their abilities.

    So your religious analogy is not that bad when you notice that the monopoly you wrote about wasn't based on military power (the Vatican as a state never had that powerful an army) but on influencing people into believing in their cause.

  • I _really_ hope you aren't being serious....
    Have you _any_ concept of what you're saying?

  • Does that make Jack Valenti to be the Mule?
  • by zlite ( 199781 ) on Friday September 22, 2000 @05:21AM (#761698)
    I always thought Brewster's neatest trick was getting his company this amazing space in San Francisco's leafy and spacious retired military base, the Presidio. It was reserved for non-profit firms, so he said that Alexa was archiving the web. Then, lo and behold, he found some commercial application of that library (does anyone actually use that "context" bar thing?) and sold the company to Amazon for a bazillion dollars. And kept his space!
  • For example: In three hundred years, pornography is viewed as a valuable cultural resource. A historian wishes to study the subject of pornography over the ages and relate it to the prevailing attitudes in those ages. The historian will be stuffed, because to a librarian now, pornography is clearly not suitable for inclusion.

    I don't know about you, but my porn will always be in my library:)


  • Using their internal mirrored/cached hitcounts or those reported by their trojan is highly suspect.
    How many people have even heard of Alexa? I bet it isn't in any true top 10 list.

  • True. I remember hearing of them, but I've never bothered to download their app. I looked at it today, but decided that I wasn't really interested in using it. Kahle says they're getting 500,000 people using Alexa day to day, so their rankings, I would think, are not overly accurate. Still an interesting project, especially if/when more people use it.
  • ...and they should be public too, I think.

    When deja took away the newsgroup archives pre-99, I was at first outraged, and then of course I realized that they're a business and not a public resource.

    The wealth of human knowledge available in the newsgroup archives is immense and extremely useful on a day-to-day basis. A repository of public newsgroup archives would be a great public resource, and I'd love to see a project that shares that knowledge with the world. Hopefully this project will go that way, but I dunno if usenet is included in the 30 terabytes.

    Hopefully we can also get these archives without the annoying product links inserted in them. :]
  • Library: A place where information (esp scholarship) is stored, cataloged and cross-referenced by subject. Even the most comprehensive libraries make choices about what to hold in their collections, what to specialize in and what to throw away. A heap of undifferentiated "content" (surely the most insidiously misleading word in the history of the internet) is not a library. A trash heap, even a searchable, interesting-to-rummage-through trash heap, isn't a library.

    And don't even get me started on the difference between social power and information. Suffice to say that the Library of Congress is not the most powerful branch of government, even if it's the most knowledgeable.

  • From the: Separate-but-equal-things dept.

    The popular sites made money. And when we came out with ways that...the Web, it all came out of the wrong places. ...a lot of the economics had turned into something quite bizarre -- in which the advertising world tends to benefit the large-scale publishers. ...But the royalty system of books has preserved a diversity of book publishing that is unparalleled in magazines, newspapers, video.

    I believe that Kahle is speaking erroneously about this subject, because he is speaking of oranges and apples.

    The royalty system of economic gain on products works for such things as newspapers, books, and hard-matter videos. This is because they are tangible, solid objects. If I want a paper, I can't just go download one -- not in paper format, mind you. Just in digital. Hence, when I have to either subscribe or drop a quarter into the hand of a street-side vendor, I'm paying that royalty on something distinct, tangible, and traceable, three aspects which I believe make it hard to do the same internet-wise.

    Granted, porn sites seem to be making a decent living because they charge for content. But really, how would you like to have to subscribe to read Slashdot? You wouldn't. You'd wait till your buddy, who gets /., would email you the story/link/comments/etc. Or you'd share a subscription. I lived in college once, there was no end to how cheap cable TV could get if you had enough coax and splitters.

    Digital content isn't tangible. You can't HOLD it. You can't walk with it to the checkout counter. It just wouldn't work as well, using traditional royalty models, to try and make money off of content, because there are too many ways around it and people do not receive that tangibility as a reward for their money. Sure, some of us might pay a few cents to read a good paper online, but my father would think it ludicrous to pay $0.25 to access a website. However, he'll easily drop $0.75 on a paper, because he can hold it. It's amazing what reality can do to people.

    I'm not all against someone figuring out how to charge for content if it is done properly and fairly. (granted, everything free is great, and honestly, I like the idea that I pay one flat fee to get everything) I just don't think that our traditional ways of thinking about it will suffice.

  • by hiryuu ( 125210 ) on Friday September 22, 2000 @04:19AM (#761705)


    And I think the right place to tax is the ISPs.

    And here:

    Right now, people are paying all of their money to use ISPs but the ISPs don't have to pay for the content.

    Part of the reason I don't like that notion is because it starts a level of accountability that I wouldn't be comfortable seeing. Where would the tracking begin - or end, for that matter - so that the proper payment balance could be provided? Which ISP - the one the surfer is using to view the content, or the one hosting the content? I imagine he means the latter - and that's bothersome. If an ISP can be held financially liable for content that a user provides - regardless of who the copyright holder/content owner is - then how long before said ISP decides to host only content that's marketable and profitable? Draw your own conclusions about where the picking and choosing would go from there.

    Another reason I don't like it - not necessarily a valid one, but definitely a personal one - is that it commercializes the web that much further. There's already enough corporate-owned and profit-driven crap here. It's not like we need more like that.

    Kahle mentions that something like ASCAP is needed, but he himself talks about the nasty history behind his example's development. He also throws out AOL as an example of a company in the "best position" to implement such a thing. Like we didn't have enough concerns about content ownership/control/marketing without an endorsement like that...

  • Once I saw that he surpassed the Library of Congress, the drool was running off my chin. :)
  • by streetlawyer ( 169828 ) on Friday September 22, 2000 @04:19AM (#761707) Homepage
    He misunderstands the concept of a "library". A library, in its historic definition, is not just a heap of information and publications; it represents someone's selection and preservation of worthwhile knowledge. A massive cache of shitty Geocities sites, corporate bumph and pathetically precious weblogs is not a library by any stretch of the imagination. A library isn't a library unless it has a librarian, deciding what needs to be preserved and, importantly, editing out the dross. His servers might have three times as many bytes as the Library of Congress has letters, but I know which one I'd rather spend an afternoon with.

    The Internet will be useless as a repository of knowledge until it is quite ruthlessly edited. I doubt any posts on this thread (including this one) would survive in a proper library.

  • Which, I suppose, is why you never hear any criticism of Amazon on the Web .... errrr .... kind of wraps it up for your "media power" theory, doesn't it?

I came, I saw, I deleted all your files.