Slashdot
Amazon Launches Full Text Book Search
Posted by Hemos on Fri Oct 24, 2003 02:34 AM
from the looking-through-it-all dept.
m00nun1t writes "Amazon have launched a new service that allows you to search the full text of books. This sounds like an incredibly useful function as well as technically impressive at this scale. I wonder if a patent is in the works." Or if a patent is already owned.
This discussion has been archived.
No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
Yeah, but.. (Score:4, Funny)
Re:Yeah, but.. (Score:5, Interesting)
(http://www.inter-sections.net/)
Daniel
Re:Amazon... (Score:5, Interesting)
(http://www.google.org/)
I was searching for books on Object Role Modeling (ORM). I had first done a search for ORM and did not find anything of interest. They then switched it on while I did a search for 'Object Role Modeling'; this popped up a few books with the text where it was being used.
You can see whole pages (Score:5, Informative)
(http://www.aleccawley.com/)
This is the equivalent of the facility you have in a physical bookstore to open a book and browse a few pages before purchasing. I can see it might be very useful, if they get the majority of books in a field accessible like this.
I wanted a PHP book the other day, and it is very difficult to decide which one of the plethora available I wanted. So I went to my physical bookstore. Smaller choice, but I could open each and get an impression of whether they were slow, detail-by-detail, dummies books or the sort of high-speed summary I wanted.
abuse (Score:4, Interesting)
(http://haikunews.org/ | Last Journal: Tuesday January 07 2003, @09:34AM)
How easily can this service be abused, with automatic webbots doing the searching?
I can imagine there might be filters, time limits, and max searches/day limits for something of this scale, no?
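For illustration only, a max searches/day limit of the kind wondered about above could be sketched like this. The class name, the quota of 2, and the account/day keys are all made up for the example; nothing here is known about what Amazon actually does.

```python
from collections import defaultdict


class DailySearchQuota:
    """Hypothetical per-account cap on searches per day (not Amazon's design)."""

    def __init__(self, max_per_day):
        self.max_per_day = max_per_day
        # Maps (account, day) to the number of searches already used.
        self.counts = defaultdict(int)

    def allow(self, account, day):
        """Return True and count the search if the account is under its cap."""
        key = (account, day)
        if self.counts[key] >= self.max_per_day:
            return False
        self.counts[key] += 1
        return True


quota = DailySearchQuota(max_per_day=2)
print(quota.allow("bot42", "2003-10-24"))  # first search of the day
print(quota.allow("bot42", "2003-10-24"))  # second search of the day
print(quota.allow("bot42", "2003-10-24"))  # third search, over the cap
```

A cap keyed on the account only slows down a single login; it does nothing by itself against many accounts spread over many IPs.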
Re:abuse (Score:5, Interesting)
You can only browse two pages in either direction per search. You also have to be logged in. I suppose someone could script a system to create thousands of accounts, then use an army of zombie machines to OCR the pages from a variety of different IPs. That is assuming that Amazon has EVERY page of every book available to the service, which I doubt.
It would probably be easier to cobble together a robot built around a laptop with an OCR-equipped camera and book manipulation software and set it loose in a big library at night. For 50 years.
Re:abuse (Score:5, Informative)
ebooks are a pretty healthy alternative to normal books, but I don't see the publishers worrying too much about piracy. Perhaps it's because the average script kiddie who will spend 2 days downloading Matrix Reloaded from Usenet is just not the type to try and crack open a book, much less crack an ebook.
Re:abuse (Score:4, Insightful)
Not so easily. It's easy to see why. The books will be scanned in using OCR. These days a fast and convenient and almost error-free process. But not entirely error-free. Good enough to find documents that are highly relevant to a particular keyword (if "hydraulics" occurs 9 times, what are the odds of OCR getting it wrong all 9 times?) but not good enough for entirely automated book-to-text.
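To put rough numbers on that argument, here is a small sketch. It assumes a made-up 2% per-word OCR error rate and a made-up 300-word page, and treats errors as independent, which real OCR errors are not.

```python
def prob_all_occurrences_wrong(p_error, occurrences):
    """Chance that every occurrence of a keyword is misread (independent errors)."""
    return p_error ** occurrences


def prob_page_error_free(p_error, words_per_page):
    """Chance that a whole page comes through OCR without a single word error."""
    return (1.0 - p_error) ** words_per_page


P_ERROR = 0.02  # assumed per-word error rate, purely illustrative

# Keyword search: the word only has to survive once out of 9 occurrences.
print(prob_all_occurrences_wrong(P_ERROR, 9))

# Book-to-text: every one of roughly 300 words on the page has to survive.
print(prob_page_error_free(P_ERROR, 300))
```

Under these assumptions the first number is vanishingly small, while the second is well under one percent: the same error rate that is harmless for search makes an automated clean copy of a page unlikely.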
If Amazon were to display highlighted portions of a book's contents, it would probably not exceed a few lines (just like Google doesn't present entire webpages in its result screen). If they did want to show more, they'd have to show an image of the scanned-in page anyway, since OCR errors would not be very pretty. (A lot of digital archiving products use a similar approach; they index PDF files that contain the OCR'ed text, invisible to the end-user, and the scanned pages as content which the end-user looks at.)
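A minimal sketch of that hidden-text approach, assuming hypothetical page image names and OCR output: the (possibly flawed) OCR text is only used to build the index, and a search hands back the scanned page images rather than the text.

```python
import re
from collections import defaultdict


def build_index(pages):
    """Index OCR'ed words to the page images they came from.

    pages: list of (image_name, ocr_text) pairs; both are illustrative.
    """
    index = defaultdict(set)
    for image_name, ocr_text in pages:
        for word in re.findall(r"[a-z]+", ocr_text.lower()):
            index[word].add(image_name)
    return index


def search(index, word):
    """Return the page images whose hidden OCR text contains the word."""
    return sorted(index.get(word.lower(), ()))


pages = [
    ("page_012.png", "Hydraulics is the study of liquids in motion"),
    ("page_013.png", "Basic hydrau1ics of pumps"),  # simulated OCR error
    ("page_014.png", "Pumps and hydraulics in practice"),
]
index = build_index(pages)
print(search(index, "hydraulics"))
```

The page with the misread word is simply missed for that keyword, but the reader never sees the garbled text, only the scan.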
Besides, to search for each page of a book, you'd have to search for a keyword on each page of that book. Such keywords would most easily be extracted by scanning in the book via OCR anyway!
It works!!! (Score:5, Funny)
2) It returned a lot of results
Conclusion: It works!!!
Fine grain searches take the adventure away (Score:4, Insightful)
(Last Journal: Friday December 24 2004, @08:49PM)
You really never knew what would turn up as you traversed the Yahoo directory structure. You start searching for blues music and you'd end up with a list of 15 or so good links with
As search techniques are becoming more refined, we are now able to do specific word searches on websites and now books. That's fine if you know exactly what you are looking for. For example if you want to get that book about 'replicants' you'll find Blade Runner, but you won't find anything else. You won't get any information except exactly the thing you are looking for.
And I think that that is where the problem with this kind of search lies for books/music/etc. If you want to find a song or a book, it most likely isn't going to be a specific word you remember, it will be the tune or the plot, both of which are not searchable.
I don't see this improvement in Amazon's search system as that much of an improvement. A better improvement could be made to the 'We thought you'd like' feature. Instead of finding only what I'm looking for, I'd like to find other things I might also be interested in.
Potential tool for discovering plagiarism? (Score:4, Insightful)
Granted, Amazon.com's feature will only (for now) include 150,000 books, but this may very well be another way to catch plagiarizers. Just type in a suspicious phrase and see if there are any 'hits'.
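As a toy version of the "type in a suspicious phrase" check, assuming the book texts were available locally (the titles and sentences below are invented):

```python
import re


def normalize(text):
    """Lowercase and collapse punctuation/whitespace so phrasing can be compared."""
    return " ".join(re.findall(r"[a-z0-9']+", text.lower()))


def find_phrase(phrase, books):
    """Return titles of books whose text contains the phrase as whole words."""
    needle = normalize(phrase)
    if not needle:
        return []
    padded = f" {needle} "
    return [title for title, text in books.items()
            if padded in f" {normalize(text)} "]


# Invented sample corpus, for illustration only.
books = {
    "Book A": "The pump, as noted, moves water uphill at a steady rate.",
    "Book B": "Water rarely moves uphill without help.",
}
print(find_phrase("moves water uphill", books))
```

This only catches verbatim copying; a plagiarist who rewords the passage slips straight through.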
No Searching Inside O'Reilly Books (Score:5, Interesting)
Re:No Searching Inside O'Reilly Books (Score:5, Interesting)
(http://slashdot.org/)
Safari is more of a "service" (i.e. renting access to book content) than a "feature" of a retail website, which is all Amazon's "innovation" seems to be.
Basically the only real difference between the two (aside from what is cited above) is that Amazon just lets you know the content is mentioned, and shows you a page or two. Safari gives you the entire book. That, and Amazon has a much wider range of books in non-tech genres.
No more out-of-print books (Score:4, Insightful)
I'd love to be able to browse a giant back catalog, knowing that an original or facsimile copy could definitely be delivered to me.
One click search. (Score:4, Funny)
When questioned for comment Google CEO Eric Schmidt said "ug".
New age youth (Score:3, Funny)
Youth nowadays: lookup 'vagina' in all books on this planet.
Wow! (Score:5, Informative)
(http://www.allpeers.com/blog)
I tried the search again today and got nearly 5,000 results, with the capability to actually look inside the book and see if the reference is useful to me. Very impressive indeed, patent or no patent.
Various worthwhile uses (Score:5, Informative)
Bash Amazon all you want, but this is a very useful technology.
In five minutes I was able to find three books that talked about findings first listed in two of my own published scientific papers, yet these books did not cite me, or anyone else, as the source of that information. My lawyer is currently preparing three letters.
I also found two other books in which the author used verbatim quotes and original theories from various interviews I have given, yet both authors passed off the statements as their own. My lawyer is now preparing five letters.
Aside from being used to protect my own research rights, I have found the search system useful for finding topics of interest discussed in certain books which are not referenced in any of the descriptions about the books. I just ordered three books I would not otherwise have ever purchased.
While I don't think highly of all of Amazon's practices, I must hand it to them for whatever technical undertaking created this search feature.
ebooks vs CD/DVD (Score:3)
I used to think, like many people, that ebooks just didn't work because 'I like the feel of paper under my fingers'. Since I bought a PDA and discovered the joys of Fictionwise [fictionwise.com], I just can't go back to these clumsy wood pulp apparels.
Amazon is pretty progressive in this regard, making a great number of their collection available electronically. It was probably fairly easy from there to make their stock searchable. And how I wish the MPAA and RIAA could work like the publishing industry...
The existence of ebooks is NOT threatening traditional books, because people see more value in a printed book over an electronic copy. This is clearly not the case with a CD and a DVD, since most people couldn't care less about the jacket if they have the goods on the CD/DVD. I wish the MPAA and RIAA would understand how to make traditional CDs and DVDs "value-added", and make people less inclined to getting a computer file instead of shelling out the money.
Then again, I guess the case with ebooks is that your typical DVD or CD pirate is just not interested in swapping files to get the latest Stephen King and read it on screen. Not only that, but most of History's greatest books are available for free, and one could probably read free books for the rest of their lives if they chose so.
Anyone else notice this? (Score:4, Insightful)
By publishers' agreement, we are pleased to offer Amazon.com customers with a valid credit card the ability to view copyrighted pages.
Your account will not be charged.
This one-time process enables you to view limited copyrighted material through our Search Inside the Book feature.
So they'll let you browse the search pages, if you can prove your identity on record and provide them with financial information. No thanks.
Scanner problems (Score:4, Interesting)
See this for example... [amazon.com]
Mass-OCR'ing has its drawbacks.
This technology is called. . . (Score:3, Informative)
I believe there is a body of prior art for scanning in books and grepping them. Is that not one of the oft-repeated benefits of ebooks?
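For what it's worth, grepping scanned text really is about this small. A sketch, with an invented two-book corpus standing in for the scanned texts:

```python
import re


def grep_books(pattern, books):
    """Return (title, line number, line) for every line matching the pattern."""
    rx = re.compile(pattern, re.IGNORECASE)
    hits = []
    for title, text in books.items():
        for lineno, line in enumerate(text.splitlines(), start=1):
            if rx.search(line):
                hits.append((title, lineno, line.strip()))
    return hits


# Invented sample texts, for illustration only.
books = {
    "Shell Basics": "Pipes connect commands.\nUse grep to filter lines.",
    "Plumbing 101": "Pipes connect fixtures.\nCheck every joint twice.",
}
for hit in grep_books(r"pipes", books):
    print(hit)
```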
Whether or not Amazon can get a patent on a shell script to serve up the results . . . on the web oooooooo, remains to be seen I suppose.
They managed to get one on "Give me one of those, put it on my account and drop it by my house" a "technology" my grocer has been offering over the phone for 40 years that I'm personally aware of.
However, since this sort of "technology" is exactly the sort of thing that the web, and the internet itself for that matter, was invented for I'd have to guess there's a lot of prior art. It's certainly obvious and trivial, but that doesn't seem to count for much these days.
The problem with things that are so obvious and trivial that "everyone" has been doing it for decades is that it's hard to demonstrate in court because no one bothers to document it.
Can you prove your grandfather put his pants on one leg at a time?
Common sense tells you he did, but common sense no longer applies in an age that grants patents to perpetual motion machines and peanut butter sandwiches.
KFG
Wired article: "The Great Library of Amazonia" (Score:5, Informative)
Now we just need... (Score:5, Funny)
(http://jaimbot.sourceforge.net/)
Scott
How do we know _which_ books... (Score:3, Interesting)
(http://www.dpbsmith.com/)
A check on "the clocks were striking thirteen" yields seventeen hits, including the Cliff's Notes to Nineteen Eighty-Four and a reference in the Oxford Dictionary of Modern Quotations...
but none to Orwell's Nineteen Eighty-Four itself.
We must conclude that the coverage is spotty.
Why..This would be like searching through the LOC! (Score:4, Funny)
Patent Bashing Du Jour (Score:3, Insightful)
This type of editorializing is pathetic in that its only purpose is to stir up the masses. Gee...now let's take a look, shall we? 20% of the comments are "patents suck" or "isn't this some example of prior art?"
This story is about a new feature, people...it's not about a patent. Wipe the froth from your mouths and comment on the merits (or lack of) the feature...not on a completely fabricated hypothetical comment meant to incite you into a frenzy.
Here's a quote relevant to the parent post (Score:3, Interesting)
Encyclopedia of New Media : An Essential Reference to Communication and Technology -- Steve Jones (Editor); Hardcover
Excerpt from page 0: ". . . post-ranking system used by members of the Web message board Slashdot.org, began as a result of community self-restraint in the face of unrelenting trolls (pointlessly hostile posters). In addition, some cyberspace forums now require . .
See more references to slashdot troll in this book.
Re:Amazon have? (Score:3, Informative)
(http://www.aleccawley.com/)
"Congress have failed to agree..." because you are talking about a load of squabbling politicians who are definitely plural.
"Congress has passed a bill..." because those politicians have managed to achieve a consensus and act as a single entity.
In this case the singular is correct, because Amazon as an entity is offering a new service. But you could use the term collectively for all employees of Amazon.
Re:Those crazy Brits (Score:3, Funny)
(Last Journal: Tuesday April 24 2007, @07:35PM)