
News for nerds, stuff that matters

Amazon Launches Full Text Book Search

Posted by Hemos on Fri Oct 24, 2003 02:34 AM
from the looking-through-it-all dept.
m00nun1t writes "Amazon have launched a new service that allows you to search the full text of books. This sounds like an incredibly useful function as well as technically impressive at this scale. I wonder if a patent is in the works." Or if a patent is already owned.
This discussion has been archived. No new comments can be posted.
241 comments
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • Yeah, but.. (Score:4, Funny)

    by michaelhood (667393) on Friday October 24 2003, @02:35AM (#7298147)
    can you do it with one click?
  • Amazon... by Ianoo (Score:1) Friday October 24 2003, @02:36AM
    • Re:Amazon... by capojava (Score:1) Friday October 24 2003, @02:41AM
    • Re:Amazon... (Score:5, Interesting)

      by will_die (586523) on Friday October 24 2003, @02:50AM (#7298196)
      (http://www.google.org/)
      It is really nice; I was using Amazon right as they switched it on.
      I was searching for books on Object Role Modeling (ORM). I had first searched for ORM and did not find anything of interest. They then switched it on while I was searching for 'Object Role Modeling', and this popped up a few books with the text where the term was used.
      • Re:Amazon... by Ianoo (Score:1) Friday October 24 2003, @03:21AM
    • You can see whole pages (Score:5, Informative)

      by AlecC (512609) <alec@aleccawley.com> on Friday October 24 2003, @03:23AM (#7298290)
      (http://www.aleccawley.com/)
      You can read the page it is on and +/- two pages.

      This is the equivalent of the facility you have in a physical bookstore to open a book and browse a few pages before purchasing. I can see it might be very useful, if they get the majority of books in a field accessible like this.

      I wanted a PHP book the other day, and it is very difficult to decide which one of the plethora available I wanted. So I went to my physical bookstore. Smaller choice, but I could open each and get an impression of whether they were slow, detail-by-detail, dummies books or the sort of high-speed summary I wanted.
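The browsing limit described above (the page with the hit, plus up to two pages on either side) comes down to a small window calculation. This is a hypothetical sketch: the function name `visible_pages` and the clamping at the first and last page of the book are assumptions, not documented Amazon behavior.

```python
def visible_pages(hit_page: int, total_pages: int, radius: int = 2) -> list[int]:
    """Return the page numbers a reader may view around a search hit.

    Hypothetical sketch: the hit page plus `radius` pages on each side,
    clamped so the window never runs past the first or last page.
    """
    if not 1 <= hit_page <= total_pages:
        raise ValueError("hit_page must lie within the book")
    first = max(1, hit_page - radius)
    last = min(total_pages, hit_page + radius)
    return list(range(first, last + 1))


print(visible_pages(10, 300))  # [8, 9, 10, 11, 12]
print(visible_pages(1, 300))   # [1, 2, 3]: no pages exist before page 1
```

Near the start or end of a book the window simply shrinks rather than shifting, which keeps the "+/- two pages" rule literal.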
    • Re:Amazon... by DerPflanz (Score:1) Friday October 24 2003, @03:56AM
    • Re:Amazon... by hdparm (Score:3) Friday October 24 2003, @04:40AM
  • abuse (Score:4, Interesting)

    I can almost hear the screams of joy from the underground book pirates.

    How easy can this service be abused, with automatic webbots doing the searching?

    I can imagine there might be filters, time limits, and max searches/day limits for something of this scale, no?
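A max-searches-per-day limit of the kind speculated about here could be as simple as a counter keyed by account and date. This is a hypothetical sketch: the class name `DailySearchQuota` and the cap value are invented, and nothing in this thread says how (or whether) Amazon actually limits searches.

```python
from collections import defaultdict
from datetime import date


class DailySearchQuota:
    """Hypothetical per-account cap on searches per calendar day."""

    def __init__(self, max_per_day: int) -> None:
        self.max_per_day = max_per_day
        # (account, day) -> searches already made by that account on that day
        self._counts = defaultdict(int)

    def allow(self, account: str, today: date) -> bool:
        """Count one search and return True, or return False once the cap is reached."""
        key = (account, today)
        if self._counts[key] >= self.max_per_day:
            return False
        self._counts[key] += 1
        return True


quota = DailySearchQuota(max_per_day=3)
day = date(2003, 10, 24)
print([quota.allow("alice", day) for _ in range(4)])  # [True, True, True, False]
print(quota.allow("alice", date(2003, 10, 25)))       # True: a new day starts a fresh count
```

Because counts are keyed by calendar day, each new day starts a fresh allowance. Old keys are never purged in this sketch, so a long-running service would need to expire them.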
    • Re:abuse (Score:5, Interesting)

      by Maskirovka (255712) on Friday October 24 2003, @02:49AM (#7298192)
      How easy can this service be abused, with automatic webbots doing the searching?
      You can only browse two pages in either direction per search. You also have to be logged in. I suppose someone could script a system to create thousands of accounts, then use an army of zombie machines to OCR the pages from a variety of different IPs. That is assuming that Amazon has EVERY page of every book available to the service, which I doubt.

      It would probably be easier to cobble together a robot built around a laptop with an OCR-equipped camera and book-manipulation software and set it loose in a big library at night. For 50 years.

      • Re:abuse by Ianoo (Score:1) Friday October 24 2003, @03:34AM
      • Re:abuse by Ryan O'Rourke (Score:1) Friday October 24 2003, @10:53AM
    • Re:abuse by chicoy (Score:1) Friday October 24 2003, @02:53AM
    • Re:abuse (Score:5, Informative)

      by Enoch Root (57473) on Friday October 24 2003, @02:59AM (#7298222)
      You 'almost', but not quite, hear the book pirates, most probably because they don't formally exist. ebooks are widely available in unencrypted format, and cracking the latest releases, even in secure formats such as Secure MS Reader or Adobe, is probably much easier than creating a bot to collect a book online page by page.

      ebooks are a pretty healthy alternative to normal books, but I don't see the publishers worrying too much about piracy. Perhaps it's because the average script kiddie who will spend 2 days downloading Matrix Reloaded from Usenet is just not the type to try and crack open a book, much less crack an ebook.
      • Re:abuse by Enoch Root (Score:1) Friday October 24 2003, @05:51AM
    • Re:abuse by Stalker_reklatS (Score:2) Friday October 24 2003, @03:04AM
      • Re:abuse by hikaru1 (Score:1) Friday October 24 2003, @03:30AM
    • Re:abuse (Score:4, Insightful)

      by wfberg (24378) on Friday October 24 2003, @03:47AM (#7298336)
      How easy can this service be abused, with automatic webbots doing the searching?

      Not so easily. It's easy to see why. The books will be scanned in using OCR. These days a fast and convenient and almost error-free process. But not entirely error-free. Good enough to find documents that are highly relevant to a particular keyword (if "hydraulics" occurs 9 times, what are the odds of OCR getting it wrong all 9 times?) but not good enough for entirely automated book-to-text.
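The odds argument above can be made concrete. Assuming, purely for illustration, that OCR misreads any single occurrence of a word with probability 1% and that misreads are independent (a simplification; real OCR errors are often correlated, for example by a smudged typeface on one page), the chance of missing all nine occurrences is 0.01 to the 9th power, about 1e-18. The function names below are hypothetical.

```python
def prob_all_missed(per_word_error: float, occurrences: int) -> float:
    """Chance that OCR garbles every occurrence of a word, assuming each
    occurrence is misread independently with the same probability."""
    return per_word_error ** occurrences


def prob_findable(per_word_error: float, occurrences: int) -> float:
    """Chance that at least one occurrence survives OCR intact,
    so a keyword search still finds the book."""
    return 1.0 - prob_all_missed(per_word_error, occurrences)


print(prob_all_missed(0.01, 9))  # about 1e-18
print(prob_all_missed(0.30, 9))  # about 2e-05, even with a very poor OCR engine
```

The same arithmetic cuts the other way for the pirate scenario: a 1% per-word error rate is harmless for finding a book but leaves several garbled words on every page of an automated transcript.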

      If Amazon displayed highlighted portions of a book's contents, they would probably not exceed a few lines (just like Google doesn't present entire webpages on its results screen). If they did want to show more, they'd have to show an image of the scanned-in page anyway, since OCR errors would not be very pretty. (A lot of digital archiving products use a similar approach: they index PDF files that contain both the OCR'ed text, invisible to the end-user, and the scanned page images that the end-user actually looks at.)

      Besides, to search for each page of a book, you'd have to search for a keyword on each page of that book. Such keywords would most easily be extracted by scanning in the book via OCR anyway!
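The archiving approach described above (index the hidden OCR text, but show the reader the scanned page) boils down to an inverted index from words to page numbers. A minimal sketch, assuming each page's OCR output is available as a plain string; `build_index`, `search` and the sample pages are hypothetical.

```python
import re
from collections import defaultdict


def build_index(pages: dict[int, str]) -> dict[str, set[int]]:
    """Map each lowercase word in the OCR text to the pages it appears on."""
    index: dict[str, set[int]] = defaultdict(set)
    for page_no, ocr_text in pages.items():
        for word in re.findall(r"[a-z0-9']+", ocr_text.lower()):
            index[word].add(page_no)
    return dict(index)


def search(index: dict[str, set[int]], word: str) -> list[int]:
    """Return the sorted pages whose OCR text contains `word`.

    A viewer would then display the scanned image of those pages,
    not the (possibly error-ridden) OCR text itself.
    """
    return sorted(index.get(word.lower(), set()))


pages = {
    1: "Hydraulics for beginners",
    2: "Pumps and valves",
    3: "Advanced hydrau1ics",  # a simulated OCR misread: '1' for 'l'
    4: "More on hydraulics and pumps",
}
idx = build_index(pages)
print(search(idx, "hydraulics"))  # [1, 4]: page 3's misread is missed
```

The misread on page 3 costs one hit, but the book is still found through pages 1 and 4, which is the point about repeated keywords.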
    • From their FAQ... by eMartin (Score:2) Friday October 24 2003, @03:50AM
    • Re:abuse by MikeXpop (Score:1) Friday October 24 2003, @05:08AM
    • Re:abuse by h8macs (Score:1) Friday October 24 2003, @07:52AM
    • Re:abuse by magores (Score:1) Friday October 24 2003, @09:03AM
      • Re:abuse by *xpenguin* (Score:2) Friday October 24 2003, @10:12AM
        • Re:abuse by magores (Score:1) Friday October 24 2003, @03:48PM
    • Re:abuse - I've abused it. Sort of. by dnquark137 (Score:3) Friday October 24 2003, @12:16PM
  • um...spell "launches" correctly please by spamchang (Score:1) Friday October 24 2003, @02:38AM
  • It works!!! (Score:5, Funny)

    by jabbadabbadoo (599681) on Friday October 24 2003, @02:38AM (#7298159)
    1) I typed 'porn'
    2) It returned a lot of results

    Conclusion: It works!!!

    • Re:It works!!! by The Ancients (Score:1) Friday October 24 2003, @02:42AM
  • Hmmm... by 00420 (Score:1) Friday October 24 2003, @02:40AM
    • Re:Hmmm... by 00420 (Score:1) Friday October 24 2003, @02:43AM
      • Re:Hmmm... by KDan (Score:2) Friday October 24 2003, @03:46AM
    • Re:Hmmm... by alphakappa (Score:1) Friday October 24 2003, @03:53AM
  • by Dancin_Santa (265275) <DancinSanta@gmail.com> on Friday October 24 2003, @02:41AM (#7298169)
    Back in the early days of the web, when Yahoo was still a catalog of links and not some super news/search/auction/ebusiness/do-it-all website that it is now, searches were much more fun.

    You really never knew what would turn up as you traversed the Yahoo directory structure. You'd start searching for blues music and end up with a list of 15 or so good links with .wav samples and, more than likely, an artist you'd never heard of before. That was the best part: getting introduced to things you hadn't even thought to look for.

    As search techniques are becoming more refined, we are now able to do specific word searches on websites and now books. That's fine if you know exactly what you are looking for. For example if you want to get that book about 'replicants' you'll find Blade Runner, but you won't find anything else. You won't get any information except exactly the thing you are looking for.

    And I think that that is where the problem with this kind of search lies for books/music/etc. If you want to find a song or a book, it most likely isn't going to be a specific word you remember, it will be the tune or the plot, both of which are not searchable.

    I don't see this improvement in Amazon's search system as that much of an improvement. A better improvement could be made to the 'We thought you'd like' feature. Instead of finding only what I'm looking for, I'd like to find other things I might also be interested in.
  • by Anonymous Coward on Friday October 24 2003, @02:42AM (#7298170)
    I remember a teacher once telling a class I was in that our essays may be compared to other essays published online to check for plagiarism.

    Granted, Amazon.com's feature will only (for now) include 150,000 books, but this may very well be another way to catch plagiarizers. Just type in a suspicious phrase and see if there are any 'hits'.
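The suspicious-phrase check described above is essentially an exact phrase match after normalizing case, punctuation and spacing, so that trivially reformatted copies still match. A minimal sketch over a small in-memory corpus; the function names and sample texts are invented for illustration, and Amazon's real matching rules are not described in this thread.

```python
import re


def normalize(text: str) -> str:
    """Lowercase, strip punctuation, and collapse whitespace to single spaces."""
    return " ".join(re.findall(r"[a-z0-9']+", text.lower()))


def books_containing(phrase: str, corpus: dict[str, str]) -> list[str]:
    """Titles whose normalized text contains the normalized phrase as whole words."""
    # Padding with spaces stops "teen" from matching inside "thirteen".
    needle = f" {normalize(phrase)} "
    return [title for title, body in corpus.items()
            if needle in f" {normalize(body)} "]


corpus = {
    "Book A": "It was a bright cold day in April, and the clocks were striking thirteen.",
    "Book B": "The clocks struck noon; nobody was striking anything.",
}
print(books_containing("the clocks WERE  striking, thirteen", corpus))  # ['Book A']
```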
  • No Searching Inside O'Reilly Books (Score:5, Interesting)

    by theodp (442580) on Friday October 24 2003, @02:43AM (#7298174)
    Even though he said he was 'blown away' by Amazon's new Search Inside the Book feature, Tim O'Reilly has decided not to participate in the program [wsj.com] for now. 'If they end up being a Google for published content...we need to think better about what publishers get out of it,' he said.
    • Re:No Searching Inside O'Reilly Books (Score:5, Interesting)

      by Zeddicus_Z (214454) <Zeddicus_Z@@@hotmail...com> on Friday October 24 2003, @04:18AM (#7298436)
      (http://slashdot.org/)
      As a Safari subscriber, I'd say it's probably because Full Text Search of online book content is also present at O'Reilly's own Safari [oreilly.com] online tech book site. You've been able to do the same thing Amazon is now crowing about, on every book Safari has, since its launch quite some time ago (a year or two, perhaps?).

      Safari is more of a "service" (i.e. renting access to book content) than a "feature" of a retail website, which is all Amazon's "innovation" seems to be.

      Basically the only real difference between the two (aside from what is cited above) is that Amazon just lets you know the content is mentioned, and shows you a page or two. Safari gives you the entire book. That, and Amazon has a much wider range of books in non-tech genres.
  • In othr news by philipdl71 (Score:2) Friday October 24 2003, @02:49AM
  • No limits on pages viewed/searched? by JCallery (Score:1) Friday October 24 2003, @02:49AM
  • no more staying awake! by rushibhai (Score:2) Friday October 24 2003, @02:53AM
  • No more out-of-print books (Score:4, Insightful)

    by Bushcat (615449) on Friday October 24 2003, @02:55AM (#7298211)
    As the digital index builds up, we will rapidly come across the situation where the electronic book is searchable, but the printed form is out of print. If this service ultimately allows single copies to be printed for delivery, it will be an outstanding demonstration of print-on-demand technology as advocated by the Print On Demand Initiative [podi.org] and others.

    I'd love to be able to browse a giant back catalog, knowing that an original or facsimile copy could definitely be delivered to me.

  • One click search. (Score:4, Funny)

    by burtonator (70115) on Friday October 24 2003, @02:56AM (#7298212)
    In other news... Amazon announced that the USPTO has granted them a patent on their proprietary "one click search" technology.

    When questioned for comment Google CEO Eric Schmidt said "ug".

  • New age youth (Score:3, Funny)

    by Anonymous Coward on Friday October 24 2003, @03:00AM (#7298226)
    Youth in the old days: look up 'vagina' in a dictionary.
    Youth nowadays: look up 'vagina' in all books on this planet.
  • Wow! (Score:5, Informative)

    I'm impressed. A couple of days ago I went onto Amazon to find books about Singular Value Decompositions (a mathematical technique that can be used for efficient statistical analysis of large groups of documents, among other things). I wasn't particularly surprised when it returned 0 results, since anyone who puts the term "Singular Value Decomposition" in their book's title obviously doesn't know much about marketing. Of course I don't actually give a damn whether the term is in the title or not; I just want to know if the book talks about this technique.

    I tried the search again today and got nearly 5,000 results, with the capability to actually look inside the book and see if the reference is useful to me. Very impressive indeed, patent or no patent.
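For readers unfamiliar with the technique mentioned above, it is short to state. In the document-analysis setting (often called latent semantic analysis), let $A$ be an $m \times n$ term-document matrix whose entry $a_{ij}$ is a count or weight for term $i$ in document $j$. Its singular value decomposition, and the rank-$k$ truncation that keeps only the $k$ largest singular values, are:

```latex
A = U \Sigma V^{\mathsf{T}}, \qquad
\Sigma = \operatorname{diag}(\sigma_1, \dots, \sigma_r), \quad
\sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_r > 0, \quad
r = \operatorname{rank}(A)

A_k = U_k \Sigma_k V_k^{\mathsf{T}} = \sum_{i=1}^{k} \sigma_i \, u_i v_i^{\mathsf{T}}, \qquad
\lVert A - A_k \rVert_F = \sqrt{\sum_{i=k+1}^{r} \sigma_i^2}
```

Here $U$ ($m \times r$) and $V$ ($n \times r$) have orthonormal columns $u_i$ and $v_i$ (the thin SVD), and $U_k$, $\Sigma_k$, $V_k$ keep the first $k$ of them. By the Eckart-Young theorem, $A_k$ is the closest rank-$k$ matrix to $A$ in the Frobenius norm, and the columns of $\Sigma_k V_k^{\mathsf{T}}$ serve as compact $k$-dimensional representations of the documents for comparison.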

    • Re:Wow! by real bio (Score:2) Friday October 24 2003, @08:07AM
  • But Will They Make Them Available a eBooks? by klausner (Score:1) Friday October 24 2003, @03:04AM
  • My question is... by switched4OSX (Score:1) Friday October 24 2003, @03:06AM
  • Various worthwhile uses (Score:5, Informative)

    by emcron (455054) * on Friday October 24 2003, @03:07AM (#7298246)

    Bash Amazon all you want, but this is a very useful technology.

    In five minutes I was able to find three books that talked about findings first listed in two of my own published scientific papers, yet these books did not cite me, or anyone else, as the source of that information. My lawyer is currently preparing three letters.

    I also found two other books in which the author used verbatim quotes and original theories from various interviews I have given, yet both authors passed off the statements as their own. My lawyer is now preparing five letters.

    Aside from being used to protect my own research rights, I have found the search system useful for finding topics of interest discussed in certain books which are not referenced in any of the descriptions about the books. I just ordered three books I would not otherwise have ever purchased.

    While I don't think highly of all of Amazon's practices, I must hand it to them for whatever technical undertaking created this search feature.
  • Already been done by feste12 (Score:1) Friday October 24 2003, @03:07AM
  • Just tried it.... by L-s-L69 (Score:2) Friday October 24 2003, @03:07AM
  • Fuck Patents. by shadowxtc (Score:1) Friday October 24 2003, @03:08AM
  • Not impressed by Zog The Undeniable (Score:2) Friday October 24 2003, @03:08AM
  • ebooks vs CD/DVD (Score:3)

    by Enoch Root (57473) on Friday October 24 2003, @03:09AM (#7298255)
    I warmly welcome any initiative that makes more and more books, or parts thereof, available online.

    I used to think, like many people, that ebooks just didn't work because 'I like the feel of paper under my fingers'. Since I bought a PDA and discovered the joys of Fictionwise [fictionwise.com], I just can't go back to these clumsy wood-pulp apparatuses.

    Amazon is pretty progressive in this regard, making a great number of their collection available electronically. It was probably fairly easy from there to make their stock searchable. And how I wish the MPAA and RIAA could work like the publishing industry...

    The existence of ebooks is NOT threatening traditional books, because people see more value in a printed book over an electronic copy. This is clearly not the case with a CD and a DVD, since most people couldn't care less about the jacket if they have the goods on the CD/DVD. I wish the MPAA and RIAA would understand how to make traditional CDs and DVDs "value-added", and make people less inclined to getting a computer file instead of shelling out the money.

    Then again, I guess the case with ebooks is that your typical DVD or CD pirate is just not interested in swapping files to get the latest Stephen King and read it on screen. Not only that, but most of History's greatest books are available for free, and one could probably read free books for the rest of their life if they so chose.
  • Legal Implications? by wo1verin3 (Score:1) Friday October 24 2003, @03:15AM
  • more patents? by Catcher80 (Score:1) Friday October 24 2003, @03:15AM
  • already been done by Catcher80 (Score:1) Friday October 24 2003, @03:19AM
  • Why do I need to enter a credit card number? by waimate (Score:2) Friday October 24 2003, @03:25AM
  • Anyone else notice this? (Score:4, Insightful)

    by mike_lynn (463952) on Friday October 24 2003, @03:34AM (#7298315)
    You have to have an account to view the pages. Fine, great. But then it brought up this screen:

    By publishers' agreement, we are pleased to offer Amazon.com customers with a valid credit card the ability to view copyrighted pages.
    Your account will not be charged.
    This one-time process enables you to view limited copyrighted material through our Search Inside the Book feature.


    So they'll let you browse the search pages, if you can prove your identity on record and provide them with financial information. No thanks.
  • How are they indexing and scanning all the books? by eMartin (Score:2) Friday October 24 2003, @03:46AM
  • Why did Amazon take this route? by ramas (Score:2) Friday October 24 2003, @03:47AM
  • Yeah but... by Ogger (Score:1) Friday October 24 2003, @03:52AM
  • Did anyone notice the number of books? by alphakappa (Score:1) Friday October 24 2003, @03:55AM
  • 120,000 books =33 million pages.We can do better! by SPYDER Web (Score:1) Friday October 24 2003, @03:58AM
  • Scanner problems (Score:4, Interesting)

    by thrill12 (711899) on Friday October 24 2003, @04:13AM (#7298421)
    Neat idea, but some excerpts come out all wrong:
    See this for example... [amazon.com]
    Mass-OCR'ing has its drawbacks.
  • This technology is called. . . (Score:3, Informative)

    by kfg (145172) on Friday October 24 2003, @04:16AM (#7298429)
    "grep."

    I believe there is a body of prior art for scanning in books and grepping them. Is that not one of the oft-repeated benefits of ebooks?

    Whether or not Amazon can get a patent on a shell script to serve up the results . . . on the web oooooooo, remains to be seen I suppose.

    They managed to get one on "Give me one of those, put it on my account and drop it by my house" a "technology" my grocer has been offering over the phone for 40 years that I'm personally aware of.

    However, since this sort of "technology" is exactly the sort of thing that the web, and the internet itself for that matter, was invented for I'd have to guess there's a lot of prior art. It's certainly obvious and trivial, but that doesn't seem to count for much these days.

    The problem with things that are so obvious and trivial that "everyone" has been doing it for decades is that it's hard to demonstrate in court because no one bothers to document it.

    Can you prove your grandfather put his pants on one leg at a time?

    Common sense tells you he did, but common sense no longer applies in an age that grants patents to perpetual motion machines and peanut butter sandwiches.

    KFG
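The "grep" comparison above holds up: the core lookup is ordinary pattern matching over page text. A grep-style sketch that reports a page number and a short excerpt for each match, loosely mimicking the "Excerpt from page N" results quoted further down this thread; the function name, sample pages and default context width are assumptions.

```python
import re


def grep_pages(pattern: str, pages: list[str], context: int = 30) -> list[tuple[int, str]]:
    """Return (page_number, excerpt) for every case-insensitive regex match.

    Page numbers are 1-based; each excerpt is the match plus up to
    `context` characters on either side, cut at the page boundaries.
    """
    regex = re.compile(pattern, re.IGNORECASE)
    hits = []
    for page_no, text in enumerate(pages, start=1):
        for m in regex.finditer(text):
            start = max(0, m.start() - context)
            end = min(len(text), m.end() + context)
            hits.append((page_no, text[start:end]))
    return hits


book = [
    "Chapter 1. Search tools.",
    "The Unix grep utility prints lines that match a pattern.",
    "Nothing to see here.",
]
for page_no, excerpt in grep_pages(r"\bgrep\b", book, context=10):
    print(page_no, excerpt)
```

Whatever is patentable here, it is not this loop; the hard parts of Amazon's service are the scanning, the scale and the publisher agreements.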
  • Prior art on any patent by AuMatar (Score:2) Friday October 24 2003, @04:31AM
  • Patent by schnitzi (Score:2) Friday October 24 2003, @05:16AM
  • Search Dictionary? by Bobman1235 (Score:2) Friday October 24 2003, @05:24AM
  • by Enigmia Man (320896) on Friday October 24 2003, @05:31AM (#7298674)
    Article [wired.com] in December Wired talks about Amazon's book scanning, how they legally do it, who does it, how many books so far, and protections.
  • Amazon 'partner' link URL in this story by Bob Cat - NYMPHS (Score:1) Friday October 24 2003, @06:08AM
  • safari.informit.com already does this...for a fee by fogez (Score:1) Friday October 24 2003, @06:32AM
  • Now we just need... (Score:5, Funny)

    by s88 (255181) on Friday October 24 2003, @06:57AM (#7298966)
    (http://jaimbot.sourceforge.net/)
    A full text search of slashdot, so the editors can search for duplicate articles before they post.

    Scott
  • Biased? by nimrod_me (Score:2) Friday October 24 2003, @07:06AM
  • I patented... by davidylin (Score:1) Friday October 24 2003, @07:17AM
  • How do we know _which_ books... (Score:3, Interesting)

    by dpbsmith (263124) on Friday October 24 2003, @07:24AM (#7299058)
    (http://www.dpbsmith.com/)
    ...are included in the search?

    A check on "the clocks were striking thirteen" yields seventeen hits, including the Cliff's Notes to Nineteen Eighty-Four and a reference in the Oxford Dictionary of Modern Quotations...

    but none to Orwell's Nineteen Eighty-Four itself.

    We must conclude that the coverage is spotty.
  • by op00to (219949) on Friday October 24 2003, @08:12AM (#7299358)
    What a feat of computing genius! Using computers to search through large bodies of text!!!! Has ANYONE ever done this before?!
  • OCR Quirks by frx (Score:1) Friday October 24 2003, @09:36AM
  • Unfair Use? Amazon's Free Book Giveaway Service by nettle (Score:2) Friday October 24 2003, @09:44AM
  • Patent Bashing Du Jour (Score:3, Insightful)

    by saddino (183491) on Friday October 24 2003, @10:21AM (#7300575)
    Or if a patent is already owned.

    This type of editorializing is pathetic in that its only purpose is to stir up the masses. Gee...now let's take a look shall we? 20% of the comments are "patents suck" or "isn't this some example prior art"?

    This story is about a new feature, people...it's not about a patent. Wipe the froth from your mouths and comment on the merits (or lack thereof) of the feature...not on a completely fabricated hypothetical comment meant to incite you into a frenzy.
  • super index by pensano (Score:1) Friday October 24 2003, @10:21AM
  • Useful, yes. Technically impressive/patentable, no by dwheeler (Score:2) Friday October 24 2003, @10:28AM
  • Not As Useful As I'd Like by ThePhin (Score:1) Friday October 24 2003, @10:31AM
  • A Tale of Two Cities by crimefighter (Score:1) Friday October 24 2003, @10:43AM
  • Available for Googling? by bkhl (Score:1) Friday October 24 2003, @10:52AM
  • Result Overload by Arianrhod (Score:1) Friday October 24 2003, @11:15AM
  • Already available from another service by hoochiepapa (Score:1) Friday October 24 2003, @05:57PM
  • searched on: shit eating freaks - Results: by Ralph Spoilsport (Score:1) Friday October 24 2003, @07:16PM
  • AMZN's FAQ on this feature by WaKall (Score:2) Friday October 24 2003, @11:10PM
  • Patent on full-text search by ddyer-bennet (Score:1) Saturday October 25 2003, @11:37AM
  • by Anonymous Coward on Friday October 24 2003, @02:43AM (#7298176)
    There are books about everything:

    Encyclopedia of New Media : An Essential Reference to Communication and Technology -- Steve Jones (Editor); Hardcover

    Excerpt from page 0: ". . . post-ranking system used by members the of Web message board Slashdot.org, began as a result of community self- restraint in the face of unrelenting trolls (pointlessly hostile posters). In addition, some cyberspace forums now require . . ."

    See more references to slashdot troll in this book.
  • Re:Those crazy Brits by zabieru (Score:2) Friday October 24 2003, @02:55AM
  • Re:Amazon have? (Score:3, Informative)

    by AlecC (512609) <alec@aleccawley.com> on Friday October 24 2003, @03:33AM (#7298311)
    (http://www.aleccawley.com/)
      I don't think there is, in the general case, a correct answer to whether collectives should be singular or plural; it depends upon the context.

      "Congress have failed to agree..." because you are talking about a lot of squabbling politicians who are definitely plural.

      "Congress has passed a bill..." because those politicians have managed to achieve a consensus and act as a single entity.

      In this case the singular is correct, because Amazon as an entity is offering a new service. But you could use the term collectively for all employees of Amazon.
    • Heh by mongbot (Score:1) Friday October 24 2003, @03:41AM
      • Re:Heh by AlecC (Score:2) Friday October 24 2003, @06:57AM
  • Re:Those crazy Brits by tiled_rainbows (Score:2) Friday October 24 2003, @04:16AM
  • Re:What if... by Robmonster (Score:1) Friday October 24 2003, @05:39AM
  • Re:Those crazy Brits (Score:3, Funny)

    by fiannaFailMan (702447) on Friday October 24 2003, @11:32AM (#7301400)
    Well at least they don't refer to a liquid as 'gas' like the Americans do when talking about petrol.