
News for nerds, stuff that matters

Carnegie Mellon's Digital Library Exceeds 1.5 Million Books

Journal written by cashman73 (855518) and posted by Zonk on Thu Nov 29, 2007 08:30 PM
from the might-just-be-enough-to-read dept.
cashman73 writes "Most Slashdot readers are probably familiar with Google's book scanning project, a collaboration with several major universities to digitize works of literature, art, and science. But Google may have been beaten to the punch this time: about a decade ago, Carnegie Mellon University embarked on a project to scan books into digital format, to be made available online. Today, according to news reports, they have a collection of 1.5 million books, the equivalent of a typical university library, available online."


This discussion has been archived. No new comments can be posted.

50 Comments
  • Link here (Score:5, Informative)

    by autophile (640621) on Thursday November 29, @08:36PM (#21527159)

    http://tera-3.ul.cs.cmu.edu/

    • Re: (Score:2, Informative)

      Another link here

      http://dli.iiit.ac.in/
      • Re:Link here (Score:4, Informative)

        by Rebelgecko (893016) on Thursday November 29, @11:09PM (#21528251)
        If you're looking for the Mac or Linux versions of the plugin, try rereading the part of the page that says:

        "To see the book pages of ULIB, please download free TIFF plugin or DjVu plugin"

        Then try following the link to the DjVu plugin and downloading the Windows, Mac, or Unix version, depending on what you need. They're available here [lizardtech.com].
        • Re: (Score:2)

          Can anyone point to a way of downloading the plugin that's NOT wrapped in a Windows executable file, in order to install it on Mac Firefox?

          The only Mac plugin option they offer is for Safari, and it doesn't seem to work or offer any way to install it.
          • Re: (Score:2)

            I wonder if there's a way to implement the whole viewer as a Java applet or client-side JavaScript or something...
            The JavaScript type of thing is definitely possible. A proof of concept [google.com] was even mentioned in TFS.
  • Nice to have alternatives (Score:5, Informative)

    This site (which is found at ulib.org BTW) seems to have a pretty good collection of obvious titles to choose from, though having to download a custom plug-in to read anything is a bit annoying (and apparently temporary). I played around for a while, seeing what I could dig up, and didn't see any obvious gaps (though I purposely avoided anything modern).

    As an author, I was always a bit worried about having Google as the sole gatekeeper for this kind of service... not that I necessarily distrust Google's intentions, but if they changed their worldview one day, it'd be a pity to have so much work invested in only one place and have to rebuild it all somewhere else. It's nice that there are proper choices, and not all from a commercial stance either.

    I don't know how smooth the integration process is (I submitted one of my books, but it appears to be a very manual system involving email, etc., so it will probably take a while to see results). But still, I'm glad they're giving authors a way to help grow the library. Here's hoping it becomes even better than its promise!
    • Re: (Score:2)

      I'm quite impressed by the range of publication years they offer. While I figured that many of the books would be out of copyright and a few would be scanned with permission, I was shocked to see that they have nearly half a million books published after 1981.

      Check
    • Re: (Score:2)

      This site (which is found at ulib.org BTW) seems to have a pretty good collection of obvious titles to choose from, though having to download a custom plug-in to read anything is a bit annoying (and apparently temporary).

      I agree that custom plug-ins suck.
    • Re: (Score:2)

      ... though having to download a custom plug-in to read anything is a bit annoying...

      You don't need a special plugin. You just need to tell your browser which program to use to display TIFF files.

  • Because apparently the Slashdot editors can't be bothered...

    http://www.ulib.org/
  • search engine (Score:5, Funny)

    by HemmingSay (1136561) on Thursday November 29, @08:55PM (#21527321)
    I really like the idea of online libraries, but I had to laugh when I got the following result for the first book that came to mind: "Please provide a valid query (Word greater than length 3)". The book was "The Old Man and the Sea".
    • Re: (Score:2)

      Maybe they should outsource their search features to Google.
  • by chipasd (1135399) on Thursday November 29, @09:17PM (#21527477)
    For those who missed the articles about CMU's associated project for validating all those scanned words on all those scanned pages: http://recaptcha.net/

    reCAPTCHA improves the process of digitizing books by sending words that cannot be read by computers to the Web in the form of CAPTCHAs for humans to decipher. More specifically, each word that cannot be read correctly by OCR is placed on an image and used as a CAPTCHA.
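The reCAPTCHA description above leaves out how individual human answers become a trusted transcription. Below is a minimal sketch, not reCAPTCHA's actual code, under two stated assumptions: each challenge pairs the unreadable word with a control word whose text is already known, and an answer for the unknown word is only counted when the control word is typed correctly. `UnknownWord`, `submit_answer`, and the `VOTES_NEEDED` threshold are hypothetical names and values chosen for illustration.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical threshold: number of agreeing human answers needed before a
# transcription is accepted. An illustrative value, not reCAPTCHA's own.
VOTES_NEEDED = 3


@dataclass
class UnknownWord:
    """A word image OCR could not read, awaiting human transcription (hypothetical model)."""
    word_id: str
    votes: Counter = field(default_factory=Counter)
    accepted: Optional[str] = None


def submit_answer(control_answer: str, control_truth: str,
                  unknown: UnknownWord, unknown_answer: str) -> bool:
    """Grade a two-word challenge and return whether the user passed.

    Assumption: the user passes only if the control word (text already known)
    is typed correctly, and only then is their reading of the unknown word
    recorded as a vote. Answers are compared case-insensitively.
    """
    passed = control_answer.strip().lower() == control_truth.strip().lower()
    if passed and unknown.accepted is None:
        guess = unknown_answer.strip().lower()
        unknown.votes[guess] += 1
        if unknown.votes[guess] >= VOTES_NEEDED:
            unknown.accepted = guess  # enough humans agree: accept the reading
    return passed


if __name__ == "__main__":
    word = UnknownWord("tale-of-two-cities/p1/w1")
    submit_answer("upon", "upon", word, "It")
    submit_answer("upon", "upon", word, "it")
    submit_answer("uopn", "upon", word, "TIT")  # control wrong: vote ignored
    submit_answer("upon", "upon", word, "it")
    print(word.accepted)  # it
```

Requiring several independent readers to agree, rather than trusting the first answer, is what would let a noisy crowd correct OCR misreadings; the control word filters out answers from bots and careless users before they can vote.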
  • on book from '20s

    Wow. Universal access, pffft.
  • need a visualisation (Score:4, Funny)

    by ross.w (87751) <rwonderley@@@gmail...com> on Thursday November 29, @09:30PM (#21527565) Journal
    So how many Libraries of Congress is that?
  • by liftphreaker (972707) on Thursday November 29, @09:51PM (#21527655)
    I picked a book at random, Dickens' A Tale of Two Cities. Here are the first few lines:

    "TIT was the best of tunes, it was the worst of times,..."

    "li was tie winter of despair, we had everything before us,..."

    I guess they just OCR'd books en masse without proofreading. Oh well, think of it as an exercise for your brain.
    • Mr. Burns: 'Let's see. It was the best of times, it was the "blurst" of times! You stupid monkey!'

      Maybe CMU just needs to hire smarter monkeys...
  • Libraries Are Not Dying (Score:2, Interesting)

    The definition of a library is just changing. When you look at a small Internet cafe, what you are really seeing is the modern version of a library that also caters to those who want some refreshments. If the old Dickensian hard copy libraries want to
  • A lot of these books would languish in obscurity, only to be touched by very few people. Now the information is available via search, which means even more useful information can be had and these lost "works" might finally serve the purpose they were mean
  • by aminorex (141494) on Friday November 30, @12:34AM (#21528893) Homepage Journal
    1.5 million books? OK, maybe my tastes are a bit more focused on mathematics, physics, programming, economics, and linguistics than the CMU library's, but I just burned 3 DVDs' worth of math books alone: 12GB of PDF, at roughly 8MB/title, for 1500 titles. And that was just one week's worth of crap filtering for one man. Methinks CMU isn't really trying.
  • Last time I counted, I had 800,000 e-books on disk. For a large institution, I'd expect better. Their collection probably isn't mostly sci-fi and D&D manuals though :/
  • It compresses black-and-white images like crazy. Though a Firefox plugin would be nice, this really is one of the best online book viewers I've seen, technology-wise. It's fast and pretty easy to interface with scripts, and all the images seem to be cropped.
  • Wow! (Score:2)

    ...that's nearly as big as my...er...friend's...MP3 collection.
  • not (Score:2)

    That just wasn't a good experience. I found the one book I looked for (Pilgrim's Progress), but I found the user experience pretty bad. They need to kick it up a couple of notches before I would use this over Google Books.
    • Re: (Score:2)

      Information is not free, and it wants to be largely neglected by a significant portion of [American] society!

      Heil Physical Media!
    • Re:Yay! (Score:5, Insightful)

      by Joe Tie. (567096) on Thursday November 29, @08:54PM (#21527315)
      Traditional libraries are long dead in a pretty significant percentage of the US. I live in a fairly large city, and it's pretty much useless for anything but the level of book one would expect high school students to need. No real database access, no journals, very little in the way of primary sources for anything. It's all novels, magazines, newspapers, "subject X for dummies", and out of date encyclopedias. The wireless access there has been useful at times, but that's about it. You don't get a good library without a public willing to put in the requisite money, and fewer and fewer people are.
      • Re:Yay! (Score:4, Insightful)

        by dlevitan (132062) on Friday November 30, @04:12AM (#21529967)

        Traditional libraries are long dead in a pretty significant percentage of the US. I live in a fairly large city, and it's pretty much useless for anything but the level of book one would expect high school students to need. No real database access, no journals, very little in the way of primary sources for anything. It's all novels, magazines, newspapers, "subject X for dummies", and out of date encyclopedias. The wireless access there has been useful at times, but that's about it. You don't get a good library without a public willing to put in the requisite money, and fewer and fewer people are.
        How many people actually want journals and technical books? You're talking about a very small portion of the population. The goal of a library is to cater to what people want, and that's mostly basic books about how to do basic things, popular fiction/nonfiction, magazines, newspapers, and basic encyclopedias. There are only two types of people who want access to journals and the like: scientists at companies and universities (who already have it through their employer/school) and the few people who want to learn about a field they aren't employed in. It's not worth thousands of dollars/year/journal for a library to subscribe to even one journal when only two people will ever read it.

        If you really want access, then you have to pay up and/or take the extra time to find somewhere you can get them for free.

        First, in my field (astrophysics), most articles are now e-printed or at least opened up after a few years. ApJ (the Astrophysical Journal) gives unrestricted access to all articles older than 3 years, and all articles from before 1996 are available at a free NASA/Harvard site (ADS). So basically, unless you want the absolute latest articles (which for most things you don't need), you can get them for free (and even then usually through arXiv). And if you need the latest article, then, as you said, pay the fee and buy it.

        Second, if you need some kind of technical book, talk to the librarians. Most of them will try to help and you can usually get it for free (or a small fee) through an inter-library loan. It might take a few weeks, but you can definitely do it without even leaving the library.

        Third, take a look at the universities near you. Most allow open access to the stacks and computers. You can spend a whole day reading a book or using the university computers to access journals without paying anything. Some even allow borrowing privileges for free or for a fee. Take a look at Columbia in New York City [columbia.edu] or UCLA [ucla.edu].

        So yes, public libraries don't have journals. They're far from dead, though; they just don't serve that need. If you really want those sorts of things, then you need to go out there and get access yourself.
      • Re: (Score:2)

        In small towns, public libraries are a resource for local history. In medium towns and up, financial databases like Value Line may be available. In all cases, as you said, libraries are a source of novels, magazines, and newspapers for free, which can be a
    • Hey! CMU did not kill it.
      I continue to own a large number of books as print copies (Churchill's six-volume The Second World War, William Shirer's The Rise and Fall of the Third Reich, Clausewitz's On War, the Arthashastra, etc.).
      I do own many books as Mobipocket copies, but nothing
    • by truesaer (135079) on Thursday November 29, @09:04PM (#21527405) Homepage
      The FA stated that most of the digitization was done in India and China. Low-wage, poverty-level workers; how dandy. Am I the only one who found it odd/sad that "we" digitized our knowledge with uneducated, underpaid slave labor? Maybe they were allowed to read some books and get educated? Nah.


      In case you haven't noticed, the economies of India and China are booming...in large part because of the offshoring/outsourcing from more developed countries. The wages and employment opportunities only get better in India and China due to projects like this.

    • Re: (Score:2, Informative)

      Sure, most of the digitization was done in China... but most of the books on the site are Chinese, too. Of the 1.5 million books in the collection, almost 1 million are Chinese. English accounts for most of the rest, at 362,508 books.
      • Re: (Score:3)

        > Those books are still copyrighted, the publisher won't sell you a copy, yet they
        > want to deny everyone access to it.

        They have to follow the law, so I forgive them for books under copyright. But they don't appear to even want to make it easy to acce
    • by theMerovingian (722983) on Thursday November 29, @09:33PM (#21527579) Journal

      Copyright law in the US started out pretty reasonable: 14 years from the date of registration, renewable once for another 14. Walt Disney spent a lot of money and lobbied the government for another 20-year period. Before this could expire, they lobbied to have copyright terms extended to the life of the author plus 50 years. As a result of the Sonny Bono Act, it was expanded to the life of the author plus 70 years. (NOTE: this is a very brief approximation of US copyright law history; it was actually somewhat more complex than this, with several more twists and turns.) See here for a detailed explanation. [copyright.gov]

      The functional result of this lobbying is that no US copyrighted work published since 1923 has lapsed into the public domain (unless the owner screwed up by not renewing the copyright at the appropriate juncture).


        • It is ridiculous that drug companies can spend billions of dollars on research for a drug patent that only lasts 20 years, while any pot-smoker with a guitar can write some song and the US government will grant him a monopoly that potentially extends well
        • Re: (Score:2)

          It depends on your definition of "most people".
    • Already been done. Check this site: http://www.teach12.com/store/courses.asp?t=&sl=&s=905&sbj=Literature%20and%20English%20Language&fMode=s [teach12.com] I've listened to some of their recordings and they were pretty good.
    • by phantomfive (622387) on Friday November 30, @02:32AM (#21529523) Homepage Journal

      As an average, educated male, I hate being in a discussion with someone who name-drops a book I never heard of before, as a proof that my point is invalid because I am not well read enough. It's the ultimate bitch-slap of the intellectual boxing
      If something like that stops you, then you totally need to work on your technique. What you have there is a clear and you fell for it even though it was ONLY IMPLIED.

      If someone comes up and says, "oh, this book clearly proves my point" then you can easily come back with, "Interesting. What does it say?" And you're off again, arguing the truth against real facts. Don't let them escape by saying, "oh, it's complicated." Respond, "it's ok, I have time. Please explain."

      The point is, make your goal to find out the truth, and you will always win. Don't defend ideas anymore once you know them to be false. Switch over as soon as you know you are wrong, and you will always be right. Not to mention switching drives your opponent batty.
    • Re:Well... (Score:5, Insightful)

      by agrippa_cash (590103) on Thursday November 29, @10:36PM (#21527967) Homepage
      You are mistaken, and for this you should be glad. It often takes several years for masterpieces to be recognized as such, so it shouldn't surprise you that nothing you like has been acclaimed. I'm not a high culture joe myself, so please don't be offended, but today's high culture may be incomprehensible to you because you aren't sophisticated enough to appreciate it. If you grow up watching Fantasia, it is easier to enjoy Stravinsky. As for originality, the tale is in the telling. People of years past lived and died much as we do, a bit more fresh air and hard work maybe but basically the same. Basically. They were us first, what are you going to do? Culturally we are far, far ahead of the 1907 crowd. Your image of 1899 is almost certainly based on the western upper class (listening to Wagner) rather than the teeming western poor (listening to minstrel shows) or the uncountable colonized listening to whips, maxim guns, pickaxes and sermons.
    • Re: (Score:2, Insightful)

      Also worth asking: are you willing to learn Greek more than 2,000 years old to read Euclid, or Latin (the language of scholarship in his time) to read Euler? One reason that we have and use more modern math textbooks is changes in language and notation over time
    • Re: (Score:2)

      Want to learn geometry? Read Euclid. He wrote his books thousands of years ago. Calculus? Euler is your best teacher, and has been since the 1700s.

      What about chaos theory? Theory of computation? Axiomatic set theory? Topology? Large chunks of modern probability theory?

      Mathematics is developing more new material faster than it ever has.

    • Re: (Score:2)

      Original groundbreaking technical literature is often very difficult to understand. The author struggles to describe the new concepts. Many years later, other authors can simplify explanations and remove dead ends and needless excursions into side cases. A
      • Re: (Score:2)

        With libraries, lugs, dnd, etc being supplanted by impersonal online replacements, where can a gay geek go to get some cock?

        Have you considered visiting Senator Larry Craig?