Open Library Project Takes Flight

Posted by ScuttleMonkey on Mon Jul 16, 2007 05:34 PM
from the alexandria-green-with-envy dept.
Aaron Swartz today announced the launch of the new Open Library project. The goal of the project is to produce the world's greatest library on the Internet, free for anyone to use. Starting with the Internet Archive's book scanning project and organizing the insertion of new content via a wiki-type model, the project seems to be off to a great start. The demo, source code, and mailing lists were all opened up today in hopes of drawing interest from the public at large.

Related Stories

Open Library Goes Online With Public Domain Books (103 comments)
mrcgran writes "A competitor to Google Book Search emerges as the Yahoo-backed Open Content Alliance launches an 'open library' of its own. After several years of scanning and archiving, the Internet Archive and the Open Content Alliance this week unveiled the Open Library, their attempt at bringing public domain books to the masses. The Internet Archive has hosted texts for quite some time, but the Open Library makes fully-searchable, high-quality scans of books available, along with downloadable PDFs. It offers an experience designed to match paper: there's even a page-flipping animation as readers move forward and backward through the book. Ben Vershbow of the Institute for the Future of the Book says that when it comes to presentation, 'they already have Google beat, even with recent upgrades to the [Google Book Search] system including a plain text viewing option.'" We have previously discussed this project, though this is a bit more complete rundown on the initiative.
This discussion has been archived. No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • Awesome (Score:2, Funny)

    by CrazyJim1 (809850) on Monday July 16, @05:41PM (#19881975)
    (Last Journal: Sunday November 06 2005, @10:30PM)
    Project Gutenberg never really had a large enough selection to interest me. I would like to see how they do with this new library.
  • PG (Score:1)

    by Yaksha42 (856623) on Monday July 16, @05:42PM (#19881981)
    Everything about this "Open Library" - from the colors to the fonts used - looks just like Project Gutenberg [gutenberg.org]. Am I missing an important difference?

    Perhaps this is going to contain books still under copyright? I doubt the full text will be available, which makes this "library" pretty useless.
    • Re:PG by drMental (Score:1) Monday July 16, @06:29PM
      • Mod parent up by Anonymous Coward (Score:2) Monday July 16, @06:38PM
  • In response to your question: (Score:4, Interesting)

    by CaptainPatent (1087643) on Monday July 16, @05:43PM (#19881987)
    (Last Journal: Wednesday April 25 2007, @08:46AM)
    FALTWSBTFA: (From a link to what should be the feature article [openlibrary.org])

    What if there was a library which held every book? Not every book on sale, or every important book, or even every book in English, but simply every book
    It would probably be sued for copyright infringement.
  • Project Gutenburg (Score:2)

    by mickwd (196449) on Monday July 16, @05:43PM (#19881993)
    Have these guys not heard of Project Gutenberg [gutenberg.org]?

    It's been around for years, and I thought it was pretty well-known.
    • Re:Project Gutenburg (Score:5, Informative)

      by AaronSw (598481) <me@aaronsw.com> on Monday July 16, @05:53PM (#19882075)
      (http://www.aaronsw.com/)
      Hi, Aaron Swartz here. Project Gutenberg is about putting up text versions of out-of-copyright books. This project is about creating a catalog of _every_ book, with links to PG, scans, Amazon.com, PDFs, print on demand, etc. -- anything we can get our hands on. Gutenberg books are in our catalog, of course, but so are millions more.
      • Re:Project Gutenburg by Anonymous Coward (Score:1) Monday July 16, @06:19PM
      • Re:Project Gutenburg by Reziac (Score:3) Monday July 16, @06:35PM
        • Re:Project Gutenburg (Score:5, Insightful)

          by PMBjornerud (947233) on Monday July 16, @07:22PM (#19882683)

          I find these scanned original pages FAR more restful to the eye than any other form of electronic book. This way, I can sit down and read a complete book on the screen -- without suffering the eye fatigue that comes from reading large swaths of ordinary onscreen text. I think it has a lot to do with print fonts being designed specifically for the eye, and somewhat to do with the normal yellowing of paper that produces a less glary background.
          This does not make sense. A scanned document will always have artifacts and imperfections from the scanning process and should by definition be harder to read. A well-sized font on a pleasant background should beat scanned text every single time.

          Your issue is more likely that there are a lot of crappily designed webpages out there.

          If you're reading "large swaths of ordinary onscreen text", do this:
          - Copy-paste it into any word processor.
          - Choose a nice, big font. (Small is good for UI, not for 400-page novels.)
          - Use a dark background. A page reflects light, a screen projects it. You do not want glaring white.
          - Use 8-10 words per line.
          - Profit! Err... less mental exhaustion, at least.

          Pay extra attention to words per line. It's a key reason onscreen text is often hard to read. Too many words per line, and you'll have a mental overhead every few seconds trying to figure out which line you just read and which is next. Basically, books do it right and you want to display onscreen text at a similar width. Scrolling is easy these days, and wide lines are a remnant from when computers required a click-and-drag to scroll.
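The words-per-line advice can be sketched in a few lines of Python. This is only an illustration, not something from the post: `reflow` and its `words_per_line` parameter are hypothetical names, and the function counts words rather than measuring rendered width.

```python
def reflow(text, words_per_line=9):
    """Re-break text so that each line holds at most `words_per_line` words.

    All existing whitespace, including paragraph breaks, is collapsed,
    so this is meant for one paragraph at a time.
    """
    words = text.split()
    lines = [
        " ".join(words[i:i + words_per_line])
        for i in range(0, len(words), words_per_line)
    ]
    return "\n".join(lines)


sample = (
    "Too many words per line, and you'll have a mental overhead every "
    "few seconds trying to figure out which line you just read."
)
print(reflow(sample))
```

A real reader would more likely cap the column width in characters or ems, since word lengths vary, but a word count matches the rule of thumb as stated.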

          Wide books and newspapers are divided into columns. There is a reason for doing this, but almost nobody seems to think about that when displaying text on screens.

          Heck, even slashdot defaults to a glaring white background and text stretched all over my 1920 pixels. Go figure.
        • Re:Project Gutenburg by hcdejong (Score:2) Tuesday July 17, @03:51AM
        • Re:Project Gutenburg by furzburz (Score:1) Tuesday July 17, @09:58AM
      • Re:Project Gutenburg (Score:4, Interesting)

        by Fallingcow (213461) on Monday July 16, @07:58PM (#19882897)
        (http://www.fallingcow.com/)
        What I really want are some modern, well-written footnotes and introductions to older works. Maybe throw in some good annotated maps when appropriate.

        Older books are often hard to relate to without some context, and that sort of thing is what makes or breaks many editions of the "classics", IMO. If, when shopping for books, I pick up a copy of a book that was written more than 200 years or so ago, and it has no footnotes, most of the time I won't buy it. This is doubly true of translated works.

        Wikipedia can usually stand in for an introduction, but there's nothing like footnotes to get you closer to an older text, and nothing that I know of provides that. If someone started a project to provide that kind of information for Project Gutenberg books, I'd get on board to help. Bonus points if they're also putting them in formats that don't suck (making plain text look good on the screen is a pain in the ass).

        I'd start it up myself, but alas, I am poor (college). I'd definitely help out if someone else got it going, though.

        Until someone does that, PG is practically useless to me.

        Will this project do anything like that, or do you know of anyone who's doing this?

        It seems to me that 500-1,000 really well-edited, footnoted, and formatted free books are better than 21,000 books' worth of plain-text barf.
      • Re:Project Gutenburg by overbored (Score:1) Tuesday July 17, @01:32AM
      • Re:Project Gutenburg by LeadSongDog (Score:1) Tuesday July 17, @02:48PM
  • question... (Score:2)

    by joe 155 (937621) on Monday July 16, @05:43PM (#19881999)
    (Last Journal: Wednesday September 20 2006, @10:30AM)
    Someone asked a good question on the website: how does this relate to Gutenberg?

    http://www.gutenberg.org/wiki/Main_Page [gutenberg.org]

    They have a great collection of ebooks online already, and you're free to grab and share them. I wish they would base this in a country which doesn't have insanely long copyright terms, though; then it could really add value over Gutenberg.
  • Relevance? (Score:2, Insightful)

    by cdrguru (88047) on Monday July 16, @05:43PM (#19882001)
    (http://www.infinadyne.com/)
    As long as it is limited to rather dusty tomes that are "out of copyright" this is going to have limited, if not zero, value to most people. What exactly is the difference between Open Library and Project Gutenberg? Aren't they going to have 99% overlapping content?

    • Re:Relevance? by RalphBNumbers (Score:2) Monday July 16, @06:08PM
  • wikipedia 2.0 (Score:2, Interesting)

    by wizardforce (1005805) on Monday July 16, @05:45PM (#19882025)
    (Last Journal: Saturday August 25, @03:49PM)
    So basically they are building an online library that works a lot like Wikipedia (Creative Commons, I presume). How do they incorporate editing into the system without it having the same problems that Wikipedia has? What does the project do that couldn't just as easily be done by expanding Wikipedia? Any thoughts?
  • Take flight? (Score:1, Insightful)

    by Anonymous Coward on Monday July 16, @05:57PM (#19882099)
    "Taking flight" normally denotes escape from a perilous situation, not emergence as is intended by the author.

    Mod me down if you must, but it's annoying when otherwise intelligent people cannot write a simple sentence and the editors are so lax in their responsibilities.

    I must be new here.

  • Not good (Score:1)

    I know their intentions are good, but for these various online text-searchable book projects to be of maximum usefulness, they really need to be merged into one big project. Or, at the very least, a search engine needs to be set up that will search them all. Right now I basically just stick to Google Books, although I'm fully aware that the content I'm looking for but can't find is likely out there in one of the other few dozen open library projects.
  • Not Project Gutenberg (Score:5, Insightful)

    by krelian (525362) on Monday July 16, @06:19PM (#19882275)
    Don't compare this to Project Gutenberg. This is supposed to be the "Internet Movie Database" [imdb.com] for books (as far as I understand, anyway). I am pretty sure that a big part of this information can be filled in with calls to Amazon web services.
  • by Petrushka (815171) on Monday July 16, @06:22PM (#19882291)
    ... here it is [openlibrary.org].
  • IPL? (Score:1)

    How is this going to be different than the Internet Public Library? http://www.ipl.org/ [ipl.org]
    • Re:IPL? (Score:5, Interesting)

      by TTK Ciar (698795) * on Monday July 16, @07:04PM (#19882569)
      (http://www.ciar.org/ttk | Last Journal: Monday October 15, @05:30PM)

      OpenLibrary is a lot more complete, for one: searching on "Ogorkiewicz" in IPL yielded no hits, while OL gave me several. The Archive is well-connected to various institutions like the Library of Congress and Bibliotech, and is able to pull a lot of help from these other organizations into making a more complete service.

      OpenLibrary is also a catalog of metadata, providing information for each book like physical format, publisher, ISBN#, number of pages, and so on. This metadata has a lot of holes for now, but hopefully that will change as publishers and/or people who own copies of these books fill in the blanks, much like the Internet Movie Database.

      Finally, OpenLibrary has its own staff which is dedicated to working with Internet Archive partners to make this the most complete catalog on the planet. IPL is cool (I like it!) but it does not seem to be very actively maintained.

      (Disclaimer: I work for The Internet Archive, but I do not speak for it, and the OpenLibrary team is in a completely different department from mine, so DO NOT treat this post as necessarily any more authoritative or correct than any other Slashdot post.)

      -- TTK

      • Re:IPL? by hisstory student (Score:1) Monday July 16, @07:27PM
      • Re:IPL? by furzburz (Score:1) Tuesday July 17, @10:25AM
  • is an error like this:

    <type 'exceptions.TypeError'> at /search
    unbound method remove_node() must be called with LRU instance as first argument (got NoneType instance instead)
    Python    /1/pharos/code/production/pharos/infogami/tdb/tdb.py in remove_node, line 607
    Web    GET http://demo.openlibrary.org/search

    Traceback (innermost first)
    /1/pharos/code/production/pharos/infogami/tdb/tdb.py in remove_node
    ...
            node = LRU.remove_node(node) ...
    /1/pharos/code/production/pharos/infogami/tdb/lru.py in prune
    ...
                self.remove_node() ...

    That was just my first try, but it doesn't really encourage me to try again.
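For anyone puzzling over that message: in Python 2, "unbound method ... must be called with LRU instance as first argument" means a method was invoked on the class itself and handed something other than an instance. One plausible reading (an assumption, since the traceback shows only two lines of code) is that an override forwarded to `LRU.remove_node(node)` without passing the instance along. The sketch below, written for Python 3, where the same slip surfaces as a different error, shows the pattern and one fix. `LoggingLRU`, `touch`, and the `OrderedDict` storage are hypothetical and are not the infogami code.

```python
from collections import OrderedDict


class LRU:
    """Tiny least-recently-used store; the oldest key sits first."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def touch(self, key, value):
        # Insert or update, then mark the key as most recently used.
        self.items[key] = value
        self.items.move_to_end(key)
        self.prune()

    def remove_node(self, key=None):
        # With no key given, evict the least recently used entry.
        if key is None:
            key = next(iter(self.items))
        return key, self.items.pop(key)

    def prune(self):
        while len(self.items) > self.capacity:
            self.remove_node()


class LoggingLRU(LRU):
    """Hypothetical subclass that overrides remove_node."""

    def __init__(self, capacity):
        super().__init__(capacity)
        self.evicted = []

    def remove_node(self, key=None):
        # Buggy form resembling the traceback: LRU.remove_node(key),
        # which puts `key` in the slot where the instance belongs.
        # Fixed form: let super() bind the instance.
        removed = super().remove_node(key)
        self.evicted.append(removed[0])
        return removed
```

Writing `LRU.remove_node(self, key)` would work just as well; the point is only that the instance has to be passed one way or the other.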
  • This is great news, I hope it actually works. Related: I recently discovered my local library has about 50% of the books I usually buy. Why didn't I think of this earlier? Must have lost about $10K from that during the last decade. Now, if you'll excuse me, I must go check out a copy of "How to Make Your Very Own Video Game in 16 Days Using ONLY...Wordstar!"
  • Gutenburg (Score:1, Redundant)

    by jshriverWVU (810740) on Monday July 16, @07:50PM (#19882851)
    Uhm, it's kinda already done with Project Gutenberg and LibriVox. How is this different?
  • I'm curious how they'll make money? (Score:1, Interesting)

    by Anonymous Coward on Monday July 16, @08:14PM (#19882993)
    But I'm sure it'll come down to some banner ad or user-data-mining scheme. Books are old hat today; I've been cleaning house on reference and history books that are still useful, if not the most current. This is also in direct contradiction to the way most librarians are seeing the world. They're gearing towards a future of information where it's all in databases and online sources, never mind that books, even if condensed into online parcels of information, are still useful. The metadata/database descriptor fields they're using seem to follow standard library format to a degree, the sort of stuff librarians supposedly require a master's degree to understand. Still, there's no catalog number system (Dewey Decimal, etc.) or seemingly any provision for serials. In this respect it looks more like a bookstore than a library.

    Lastly, in an age where visits to libraries are increasing mainly to use computers, and budgets keep dropping and print collections suffer (notice how many still have science books from the fifties in them?), I wonder how this will work since it's a private enterprise. My dream would be the Library of Congress becoming the online resource with all the books available, or at least links to where you can buy OR borrow them, but that will likely never pass. Still, one can dream.....
  • Kinakuta (Score:3, Insightful)

    How about placing the servers somewhere where copyright law holds no sway?
    Are there really any working data havens?
  • Vandalism controls? (Score:3, Interesting)

    First thing I did on the site was pull up an entry for a book my university press publishes. It had no "Buy" option. I edited the metadata to add the ISBN-10 number for it, and voila, a Buy option.

    It then took a certain amount of self-control for me not to go into various titles dealing with George W. Bush and enter the ISBN-10 of the storybook [amazon.com] containing "My Pet Goat". Purely as a proof of concept, you understand.

    This is simply the Wikipedia vandalism problem writ large. What controls will OpenLibrary put in place to guard against it?

  • Some thoughts (Score:5, Insightful)

    by harmonica (29841) on Tuesday July 17, @04:24AM (#19885453)
    I know the project is just starting, but here it goes.

    They should republish the raw data the same way Wikipedia and even IMDb do. I for one am not going to contribute to any data collection project that I can't later use myself.

    Their schema [openlibrary.org] doesn't differentiate between editions. If I understand it right, that means that for the 3000 existing editions of "Tom Sawyer" released over the years, by different publishers in different countries and languages, the book's description has to be replicated for each one. That can't be good. I don't have a quick solution to this myself. Sometimes (esp. with tech books), a new edition changes content significantly compared to the previous one, sometimes they're exactly the same.
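One common way around that replication, offered here as an assumption and not as anything Open Library has said it will do, is to split the record in two: a "work" holding what all editions share, and an "edition" holding what differs, with an optional override for editions whose content really did change. `Work`, `Edition`, and `description_override` are hypothetical names, and the publishers below are placeholders.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Work:
    """Facts shared by every edition, stored once."""
    title: str
    author: str
    description: str
    editions: List["Edition"] = field(default_factory=list)

    def add_edition(self, publisher, year, language, isbn=None, description=None):
        edition = Edition(self, publisher, year, language, isbn, description)
        self.editions.append(edition)
        return edition


@dataclass
class Edition:
    """Facts specific to one printing or translation."""
    # Excluded from repr and comparison to avoid Work <-> Edition recursion.
    work: Work = field(repr=False, compare=False)
    publisher: str
    year: int
    language: str
    isbn: Optional[str] = None
    description_override: Optional[str] = None

    @property
    def description(self):
        # Fall back to the shared description unless this edition
        # carries its own (e.g. a substantially revised tech book).
        if self.description_override is not None:
            return self.description_override
        return self.work.description


work = Work("Tom Sawyer", "Mark Twain", "A boy grows up along the Mississippi.")
plain = work.add_edition("Example Press", 1990, "en")
revised = work.add_edition("Beispiel Verlag", 2001, "de",
                           description="Annotated German edition.")
```

With this shape, 3000 editions cost 3000 small records plus one description, and only the editions that differ need to say so.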

    Collecting the cover images is a great service. However, doesn't this infringe on the publisher's copyright? Is this still fair use? What about countries like Germany without fair use laws--will German books still be OK because the data is collected in the USA (I guess)?

    Add a feature to upload book descriptions as XML. Suggest a DTD. I have a list of my book collection stored as an XML file, so have others (maybe not natively, but book collection management software usually has an export function). It should be possible to automate the process of adding book information already stored in some digital format.
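A minimal sketch of what such an import could look like, using Python's standard `xml.etree.ElementTree`. No DTD has been proposed in this thread, so the element names (`books`, `book`, `title`, `author`, `year`) are made up for illustration, as is the `parse_books` helper.

```python
import xml.etree.ElementTree as ET

# Hypothetical export format; real collection managers will differ.
SAMPLE = """\
<books>
  <book>
    <title>Tom Sawyer</title>
    <author>Mark Twain</author>
    <year>1876</year>
  </book>
  <book>
    <title>Untitled Example</title>
  </book>
</books>
"""


def parse_books(xml_text):
    """Turn each <book> element into a dict of its non-empty child fields.

    Books without a title are skipped, since there is nothing to match
    them against in a catalog.
    """
    root = ET.fromstring(xml_text)
    records = []
    for book in root.findall("book"):
        record = {}
        for child in book:
            text = (child.text or "").strip()
            if text:
                record[child.tag] = text
        if "title" in record:
            records.append(record)
    return records


for record in parse_books(SAMPLE):
    print(record)
```

Keeping every field as an optional string means partial exports still load; validation against an agreed DTD or schema would be a separate step.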

    There should be some category system to pick from. Some may put Tom Sawyer into "Novel, USA antebellum", others into "Novel, USA 19th century".

    Somehow connect this to Wikipedia. The more prominent books have article pages. Maybe data could be retrieved from it as well. There are currently Tom Sawyer articles in 16 or so languages.

    The edit page should group items better: stuff everyone understands (year published, title) first, then those things only specialists know.

    The edit page's descriptors shouldn't be images but text which links to an explanation page for the same reason. BISAC? LCCN? UCC13? I know, I can find out what those are with a search engine, but I shouldn't have to.

    Prepare for i18n. I guess LCCN is a Library of Congress code number? Those types of libraries exist in other countries, too. Each book can have a gazillion codes. Make this another tuple in the database: (book_id, code_id, code_value) instead of (book_id, lcc_id, isbn10, isbn13, 10 other codes in the same record).
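The (book_id, code_id, code_value) idea can be sketched with an in-memory SQLite database. Table and column names beyond that tuple are assumptions, the helper functions are hypothetical, and the code values are obvious placeholders, not real identifiers.

```python
import sqlite3


def make_db():
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE book (
            book_id INTEGER PRIMARY KEY,
            title   TEXT NOT NULL
        );
        -- One row per kind of identifier (isbn10, lccn, ...).
        CREATE TABLE code_type (
            code_id INTEGER PRIMARY KEY,
            name    TEXT UNIQUE NOT NULL
        );
        -- One row per identifier a book actually has.
        CREATE TABLE book_code (
            book_id    INTEGER NOT NULL REFERENCES book(book_id),
            code_id    INTEGER NOT NULL REFERENCES code_type(code_id),
            code_value TEXT NOT NULL,
            PRIMARY KEY (book_id, code_id, code_value)
        );
    """)
    return db


def add_code(db, book_id, code_name, code_value):
    # Register the code type on first use, then attach the value.
    db.execute("INSERT OR IGNORE INTO code_type (name) VALUES (?)", (code_name,))
    (code_id,) = db.execute(
        "SELECT code_id FROM code_type WHERE name = ?", (code_name,)
    ).fetchone()
    db.execute("INSERT INTO book_code VALUES (?, ?, ?)",
               (book_id, code_id, code_value))


def codes_for(db, book_id):
    rows = db.execute(
        """SELECT t.name, c.code_value
           FROM book_code c JOIN code_type t ON t.code_id = c.code_id
           WHERE c.book_id = ?
           ORDER BY t.name, c.code_value""",
        (book_id,),
    )
    return rows.fetchall()


db = make_db()
db.execute("INSERT INTO book (book_id, title) VALUES (1, 'Tom Sawyer')")
add_code(db, 1, "isbn10", "PLACEHOLDER-A")
add_code(db, 1, "lccn", "PLACEHOLDER-B")
add_code(db, 1, "national_library_id", "PLACEHOLDER-C")
```

Adding a new national library's numbering scheme then means one new `code_type` row, with no change to the table layout.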

    Also i18n: store language codes with all textual columns. A description is most likely going to be Hungarian for a book published in Hungary in Hungarian.
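As a rough sketch of what storing a language code with each text buys you: keep descriptions keyed by language and pick the best match at display time. The two-letter codes follow ISO 639-1 ("hu" for Hungarian); `pick_description` and its fallback order are hypothetical choices, not part of any proposed schema.

```python
def pick_description(descriptions, preferred, fallback="en"):
    """Return (language, text) for the best available description.

    `descriptions` maps a language code to the description text.
    Order of preference: the reader's language, then the fallback,
    then whatever exists (first code alphabetically, for determinism).
    """
    for lang in (preferred, fallback):
        if lang in descriptions:
            return lang, descriptions[lang]
    if descriptions:
        lang = sorted(descriptions)[0]
        return lang, descriptions[lang]
    return None, None


book_descriptions = {
    "hu": "Description written in Hungarian.",
    "en": "Description written in English.",
}
print(pick_description(book_descriptions, "hu"))
print(pick_description(book_descriptions, "de"))
```

Without the language code there is no way to tell that a Hungarian description is the wrong one to show an English-speaking reader.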

    This complicates the schema a lot. Having very few tables is tempting, but it usually doesn't work well with the real world.
  • Change the name first (Score:1, Insightful)

    by Anonymous Coward on Tuesday July 17, @07:33AM (#19886201)
    A lot of the confusion here arises from the fact that this claims to be a "library". A library is where you can borrow books. An online library would be something where you can download books. On their site you can't even read books. It's a (bookstore/library/etext) catalog at most.
  • by dhasenan (758719) on Monday July 16, @06:26PM (#19882313)
    Or creative commons, like Cory Doctorow's work (which is on par with most similar fiction).
    Or just old, almost like James Joyce's work, which arguably nobody reads, but for Joyce at least, a lot of people talk about it.

    And as for getting stuff...at least for now, the experience of an ebook is a lot less enjoyable to most people than that of a dead tree book. Dead tree books have portability advantages as well. So if someone likes a book they find on Open Library, they might well buy it on Amazon.