Just One Page a Day 389

Posted by michael on Friday November 08, 2002 @10:36AM from the stuff-to-read dept.

Charles Franks writes "Two years ago I started building an online proofreading system as a way to help Project Gutenberg (PG) get more books online: Distributed Proofreaders (DP). The concept is simple: we scan books and load the image and OCR output for each page into the online system. Next, proofreaders compare the OCR text to the image, making any corrections as necessary; each page gets looked at twice. Finally, the output from the site is massaged into a PG e-text and submitted to PG for posting to the archive. Now, nearly 600 books and a lot of PHP code later, we have snuggled into our new home, which is graciously provided by the Internet Archive and Project Gutenberg. Now that we have 'real' resources available to us (the original site ran on a Pentium 200 over my 128kbps upstream cable modem), I would like to invite the online community at large to help us put even more books online. To this end, I would like to ask everyone to do 'Just One Page a Day'. Thank you, Charles Franks"

Stop reading this (Score:5, Insightful)

by XiC ( 207670 ) writes: on Friday November 08, 2002 @10:38AM (#4624954)

And start reading a page!
After that come back and you may continue();

- Re:Stop reading this (Score:3, Insightful)
  
  by H0ek ( 86256 ) writes:
  
  In fact, I feel it would be a Good Thing(tm) for our friendly Slashdot host to stick the link to this project into their Quick Link section on the main page.
  
  Of course, I've already bookmarked the page, but that's on one machine. What happens six months down the line when I need to rebuild my bookmarks? Search for the article on Slashdot? Ick.
- Proofing FAQ (Score:3, Informative)
  
  by Wanker ( 17907 ) writes:
  
  Stop reading this
  And start reading a page!
  After that come back and you may continue();
  
  ...but first read the Proofing FAQ on the site and save yourself some confusion:
  http://texts01.archive.org/dp/faq/ProoferFAQ.html [archive.org]
  Especially read section 5 for some of their typesetting-to-ASCII conventions which would be non-obvious otherwise.
And you ask the /. community.. (Score:5, Funny)

by Harald74 ( 40901 ) writes: on Friday November 08, 2002 @10:39AM (#4624962) Homepage Journal

... which is renowned for it's spelling prowess? ;)

- Wow, what a scary thought (Score:5, Funny)
  
  by TheConfusedOne ( 442158 ) writes: <the.confused.one@NOSPam.gmail.com> on Friday November 08, 2002 @10:44AM (#4625002) Journal
  
  Imagine the kids 200 years from now reading |-|uc||_3b3rry F1|\||\|.
  
  (That hurts my brain just trying to type it in...)
  
  - Re:Wow, what a scary thought (Score:2, Funny)
    
    by foistboinder ( 99286 ) writes:
    
    |-|uc||_3b3rry F1|\||\|.
    I must get out more - I was actually able to figure that out!
- Re:And you ask the /. community.. (Score:5, Funny)
  
  by Textbook Error ( 590676 ) writes: on Friday November 08, 2002 @10:46AM (#4625022)
  
  for it's spelling
  
  Or grammer... :-)
  
  ("it's" == "it is", "its" == possessive form)
  
  - Re:And you ask the /. community.. (Score:2, Funny)
    
    by jaymz666 ( 34050 ) writes:
    
    then let's not forget that grammar has no e
  - Re:And you ask the /. community.. (Score:5, Funny)
    
    by tswinzig ( 210999 ) writes: on Friday November 08, 2002 @11:10AM (#4625207) Journal
    
    for it's spelling
    
    Or grammer...
    
    Or spelling?
    
    - Re:And you ask the /. community.. (Score:5, Funny)
      
      by Anonymous Coward writes: on Friday November 08, 2002 @11:16AM (#4625250)
      
      Or sense of humour?
      
  - Re:And you ask the /. community... (Score:5, Funny)
    
    by Binestar ( 28861 ) writes: on Friday November 08, 2002 @12:45PM (#4625941) Homepage
    
    MY GOD! A story where nitpicking grammar and spelling is *ON* topic.
    
    This'll be a fun one to read through.
    
- Re:And you ask the /. community.. (Score:5, Funny)
  
  by tswinzig ( 210999 ) writes: on Friday November 08, 2002 @10:52AM (#4625075) Journal
  
  ... which is renowned for it's spelling prowess? ;)
  
  Are you kidding? With the number of people bitching about grammar and spelling in the comments, you just know there's a pool of talent here!
  
  (BTW, there's no apostrophe in the possessive form of "its.")
  
- Re:And you ask the /. community.. (Score:4, Funny)
  
  by Skirwan ( 244615 ) writes: <skerwin@NOSpAm.mac.com> on Friday November 08, 2002 @10:55AM (#4625106) Homepage
  
  And you ask the /. community..
  ... which is renowned for it's spelling prowess? ;)
  
  Is anyone else somewhat dismayed by the fact that the post pointing out our collective poor grammatical skills has a spurious apostrophe?
  
  :)
  
  --
  It's past the blind leading the blind; this is the blind and deaf leading the stupid.
  
  - Re:And you ask the /. community.. (Score:2)
    
    by donutz ( 195717 ) writes:
    
    Not to mention the incomplete ellipsis on the subject line. Of course, maybe that's just a little too picky...
- Re:And you ask the /. community.. (Score:4, Funny)
  
  by orthogonal ( 588627 ) writes: on Friday November 08, 2002 @11:07AM (#4625190) Journal
  
  ... which is renowned for it's [sic] spelling prowess? ;)
  
  Not to mention it's [sic] excellence at spotting grammatical errors.
  
- Re:And you ask the /. community.. (Score:3, Funny)
  
  by Erasei ( 315737 ) writes:
  
  What's even scarier is that there are this many comments telling a person that he is wrong when he so isn't. I mean, come on guys, even the Flowers know the real way to use the apostrophe: http://angryflower.com/bobsqu.gif [angryflower.com]
- Re:And you ask the /. community.. (Score:5, Informative)
  
  by CaseyB ( 1105 ) writes: on Friday November 08, 2002 @12:03PM (#4625565)
  
  I know you're joking, but in reality it doesn't matter how good your spelling is. In fact, I would imagine that any spelling errors found in the text should be reproduced intact, in the interest of accurately representing the original work. This project is about correcting OCR errors, not spelling / grammar.
  
  - Re:And you ask the /. community.. (Score:4, Insightful)
    
    by JoeBuck ( 7947 ) writes: on Friday November 08, 2002 @01:42PM (#4626408) Homepage
    
    Since Project Gutenberg can only publish books whose copyright has expired, it's quite likely that a spelling "error" may instead reflect language evolution, that is, a change in the way words are spelled over time.
    
  - - Re:And you ask the /. community.. (Score:3, Insightful)
      
      by Greedo ( 304385 ) writes:
      
      For one example, my current project is a cookbook published in the 1730's, and so far I've corrected Apricocr to Apricock and Lemon to Lemmon; in both cases the form I corrected it to was overwhelmingly used in the text.
      
      "Apricocr" I can see being a legitimate typo, but perhaps in converting "Lemon" to "Lemmon", you are eradicating one of the earliest uses (intentional or not) of the now-current spelling.
      
      My personal opinion -- and yes, everyone on /. did ask for it -- would be to leave the spelling and typos intact, if the goal is to preserve literary creations. You are potentially losing information by changing it.
      
      Ask anyone who has studied the First Folio of Shakespeare about the importance of spelling.
      
      (And just in case you don't have a Shakespeare scholar handy: since Shakespeare's plays were almost always written down after they were first performed (and written down by someone else), there are many clues to the original performance in how certain words are spelled and capitalized and how sentences are punctuated. Hamlet's "What a piece of worke is a man" is a good example of this.)
Excellent (Score:2, Flamebait)

by drhairston ( 611491 ) writes:

After some consideration, I propose that this system should be applied to Slashdot stories! Each Slashdot story, after being submitted by an editor, should be reviewed by at least two readers before being posted in order to correct inadvertent spelling mistakes and story duplicity. Thank you, sir, for the inspiration!
- And I shall call it... the wheel! (Score:3, Funny)
  
  by tiltowait ( 306189 ) writes:
  
  You mean a more communal approach than an oligarchy of "editros" that can't spot day-old duplicates? Great [kuro5hin.org] idea [plastic.com]!
- Re:Excellent (Score:3, Funny)
  
  by Draoi ( 99421 ) writes:
  
  in order to correct inadvertent spelling mistakes and story duplicity
  Not to mention malapropisms!! :-)
  http://www.dictionary.com/search?q=duplicity&db=* [dictionary.com]
  I like the first definition better!
- Duplicity? (Score:2)
  
  by Andy Social ( 19242 ) writes:
  
  Or duplication, maybe?
Just one page a day? (Score:5, Funny)

by Adam Rightmann ( 609216 ) writes: on Friday November 08, 2002 @10:42AM (#4624972)

Sounds like Gary Condit's plan for extramarital affairs.

- Re:Just one page a day? (Score:4, Funny)
  
  by indiigo ( 121714 ) writes: on Friday November 08, 2002 @11:05AM (#4625174) Homepage
  
  And Bill Clinton did contain himself, except it was one page every day!
  
OCR Software (Score:4, Interesting)

by Zach Garner ( 74342 ) writes: on Friday November 08, 2002 @10:42AM (#4624973)

Is there any worth-while open source OCR software? How about reasonably priced closed source OCR software for *BSD or Linux?

- Re:OCR Software (Score:4, Informative)
  
  by Anonymous Coward writes: on Friday November 08, 2002 @10:53AM (#4625093)
  
  Generally not used at DP, which mostly uses ABBYY FineReader (www.abbyy.com), a commercial product.
  
  gocr (http://jocr.sourceforge.net/) is open-source, and includes interesting bits like deskewing.
  
  As a proofreader, I really appreciate the best ocr, and the free guys are not the best.
  
- Re:OCR Software -- Clara, perhaps? (Score:5, Informative)
  
  by timothy ( 36799 ) writes: on Friday November 08, 2002 @11:13AM (#4625230) Journal
  
  Though the web page was last updated in July, I find several happy references (and some less happy) to "Clara," a GPL'd OCR program.
  
  Here's the web page: http://www.claraocr.org/index.html
  
  timothy
  
  - Re:OCR Software -- Clara, perhaps? (Score:3, Informative)
    
    by Zach Garner ( 74342 ) writes:
    
    I've used both clara and gOCR. Neither works well enough yet to actually use for scanning books.
- - Re:OCR Software (Score:2, Insightful)
    
    by Anonymous Coward writes:
    
    >Just get just about any scanner - it'll almost certainly come with free OCR software.
    
    Generally not nearly as good as the top two (Scansoft (http://www.scansoft.com/sdk/: seems to have engulfed the Xerox/Textbridge and Caere/Omnipage technologies), ABBYY).
    
    When you scan for public use, think about the time of *other people* you waste if your OCR is not optimal or your scans are off-register/ skewed etc.
Obvious... (Score:5, Funny)

by OrangeSpyderMan ( 589635 ) writes: on Friday November 08, 2002 @10:42AM (#4624981)

I'm shure that buy askin teh Salshdot crowd (esp. the editturs) to help, yule improove jamatically teh kwality off you're output.

:-)

Copyright is not an issue (Score:5, Informative)

by ardmhacha ( 192482 ) writes: on Friday November 08, 2002 @10:43AM (#4624987)

Project Gutenberg only publishes books that are out of copyright. That means Dickens is okay, but you won't find the latest Stephen King.

- Re:Copyright is not an issue (Score:3, Informative)
  
  by Twylite ( 234238 ) writes:
  
  Sadly, copyright is an issue in this sort of work. Just because Dickens' works are no longer under copyright doesn't mean you can go and pull a Dickens novel off the library/bookstore shelf and OCR it. Publishers tend to be careful to make slight alterations to the text here and there (formatting, spelling, some clarifications and corrections), which turns a copyright-expired work into a derived work over which they own the copyright. Shitty, isn't it?
- Re:Some PG books ARE copyrighted... (Score:5, Informative)
  
  by dpbsmith ( 263124 ) writes: on Friday November 08, 2002 @01:17PM (#4626212) Homepage
  
  ...Not many, but there are some Project Gutenberg books that are copyrighted and distributed with the author's permission.
  
  Also, Project Gutenberg of Australia [gutenberg.net.au] publishes a number of works that are out of copyright in Australia but still under copyright in the U.S. It is a copyright infringement for readers in the U.S. to download these works, which include, among others, Hervey Allen's _Anthony Adverse_ (1933), F. Scott Fitzgerald's _The Great Gatsby_ (1925), Khalil Gibran's _The Prophet_ (1923), D. H. Lawrence's _Lady Chatterley's Lover_ (1928), all of George Orwell's novels, most of Virginia Woolf's, etc. etc.
  
  Not exactly "the latest Stephen King" but a lot newer than Dickens.
  
The best thing next to GNU/Linux (Score:5, Insightful)

by SuperDuG ( 134989 ) writes: <(be) (at) (eclec.tk)> on Friday November 08, 2002 @10:45AM (#4625011) Homepage Journal

Where's the politics? Where's the controversy? Wasn't this posted before?
Constructive criticism points me in this general direction: GOOD STORY /.!!! The above-mentioned questions are null and void for a story like this. You're giving massive exposure to a project whose main goal is to make sure that even if a book goes out of print and all copies are burned, the book will never die. This is not a novel (no pun intended) idea; this is an actual working project that I have used on numerous occasions. If anyone can help out, I would highly encourage it; this project is about as non-controversial as you can get. Hell, you can even do grammar checks in vi OR emacs.
Donate something, you'll feel better: money, skills, or whatever you think you can give to help them out. Donating money isn't nearly as rewarding as proofreading OCR output and knowing that generations upon generations will be able to see it and use it.

- Re:The best thing next to GNU/Linux (Score:2, Redundant)
  
  by haystor ( 102186 ) writes:
  
  Having slashdot editors post an request for proofreading does contain a certain amount of irony, if not contreversy.
  - Re:The best thing next to GNU/Linux (Score:2, Funny)
    
    by jamie ( 78724 ) writes:
    
    "Having slashdot editors post an request for proofreading does contain a certain amount of irony, if not contreversy."
    
    You spelled "controversy" wrong.
- Re:The best thing next to GNU/Linux (Score:4, Interesting)
  
  by Cheeko ( 165493 ) writes: on Friday November 08, 2002 @11:19AM (#4625273) Homepage Journal
  
  I'm sure we can find some controversy someplace. For instance, I was reading through the copyright section and I was nearly incensed at the end, seeing how long it would take for recent work to reach the public domain. It will be well into the next century before something like The Lord of the Rings is in the public domain. Under the initial copyright laws requiring a renewal, it would enter the public domain in about 5-10 more years, instead of 30+. Also of note in reading the section was how the laws are constantly being changed, in what seems like an obvious attempt to PREVENT works from becoming public domain. My personal feeling on all this is that books are supposed to be works of intelligence, so why would we NOT want them to be freely available to as many people as possible? Something like Project Gutenberg shows the true intellectual power of the internet, and at the same time the government's ability to get in the way.
  
A better use of time (Score:5, Insightful)

by Apreche ( 239272 ) writes: on Friday November 08, 2002 @10:45AM (#4625013) Homepage Journal

I think a better use of time would be to have all these programmers here develop a better OCR. Then you wouldn't need the proofreading and could just feed books into the scanner. I mean, there are lots of things wrong with OCR and reasons why it can't be absolutely perfect, but it CAN get better. If we just write one line of code a day each we'll have better OCR in no time.

- Re:A better use of time (Score:2, Informative)
  
  by scottcain ( 209570 ) writes:
  
  Perhaps, but the page I just proofed was from a book published in the 1850's, so it was not the best image quality, and still the OCR did a great job. The most common mistake I corrected was I's being converted to !'s. It got things right that I had to look at pretty closely to be sure of.
- No, not really (Score:4, Insightful)
  
  by Codex The Sloth ( 93427 ) writes: on Friday November 08, 2002 @11:46AM (#4625461)
  
  OCR engines are not email programs. You can't just add a line of code and all of a sudden it works better. Usually you have to spend time developing a complicated algorithm, which is usually more than a line of code. Then you have to test it against known text (ground truth) to make sure it's a benefit rather than a problem across a broad selection of pages. It's quite often the case that something that improves one page makes another worse.
  
  Actually, having people make verifications against the OCR results establishes the ground truth which someone could use to improve the OCR engine so by doing a Page a Day, you are helping to make future Open Source OCR engines better.
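
The ground-truth argument can be made concrete. Below is a minimal sketch, assuming the fully proofread text of a page is treated as ground truth; the function names are hypothetical and this is not code from DP or from any OCR engine. "Character accuracy" is taken here as 1 minus the Levenshtein edit distance divided by the ground-truth length, which is one common convention among several.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,               # delete ca
                           cur[j - 1] + 1,            # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def character_accuracy(ocr_text: str, ground_truth: str) -> float:
    """1 - edit_distance / len(ground_truth), clamped at 0."""
    if not ground_truth:
        return 1.0 if not ocr_text else 0.0
    return max(0.0, 1 - levenshtein(ocr_text, ground_truth) / len(ground_truth))


def compare_engines(pages):
    """pages: list of (old_engine_output, new_engine_output, proofread_text).
    Returns the indices of pages where the new engine scores worse."""
    return [i for i, (old, new, truth) in enumerate(pages)
            if character_accuracy(new, truth) < character_accuracy(old, truth)]
```

Flagging every page where a candidate engine scores lower than the current one is one way to catch the "improves one page, makes another worse" effect before a change is adopted.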
  
- Re:A better use of time (OK, here's mine) (Score:5, Funny)
  
  by gosand ( 234100 ) writes: on Friday November 08, 2002 @11:58AM (#4625543)
  
  If we just write one line of code a day each we'll have better OCR in no time.
  OK, here's mine:
  #include <stdio.h>
  next...
  
server test under load (Score:2, Funny)

by lovebyte ( 81275 ) writes:

Instead of proofreading the books, I think this guy is asking for his new server setup to be tested!
Distributed OCR? (Score:4, Interesting)

by edwilli ( 197728 ) writes: on Friday November 08, 2002 @10:49AM (#4625054) Homepage

Have each client do the OCR (if you can find GPL software). Or maybe there's a company willing to donate it. That way you could farm out most of the processing too.

Graphics (Score:4, Interesting)

by mallfouf ( 585018 ) writes: on Friday November 08, 2002 @10:51AM (#4625073) Homepage

Very good idea.
Will there be any support for proofing in other languages (french, spanish, arabic, etc...)?
What about books published in other countries? Would we be able to post those books if they're not copyrighted in the US but copyrighted in other countries? Or vice versa?

- Re:Graphics (Score:4, Informative)
  
  by dvdeug ( 5033 ) writes: <dvdeug@[ ]il.ro ['ema' in gap]> on Friday November 08, 2002 @11:23AM (#4625303)
  
  Will there be any support for proofing in other languages (french, spanish, arabic, etc...)?
  
  DP has had books in Dutch, French, Spanish and German. No Arabic - no one has mentioned being able to do it, for one thing.
  
  Would we be able to post those books if they're not copyrighted in the US but copyrighted in other countries?
  
  Project Gutenberg only worries about the US copyright. If it's not copyrighted in the US, they'll do it.
  
use proofreading meta-data to improve OCR! (Score:5, Interesting)

by tomlouie ( 264519 ) writes: on Friday November 08, 2002 @10:52AM (#4625081)

What if they kept track of every time the human reader finds an OCR-error. Couldn't you then build a profile of what words/phrases/letters the OCR software has the most problems with?

Then, couldn't you just selectively have the humans review the most error-prone sections of a book, instead of every single word of every single page?

What do you think?
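
A first step toward that error profile could be a tally of exactly what proofreaders changed. This sketch assumes the raw OCR text and the proofread text are both kept for each page (an assumption about storage, not a description of DP); it uses Python's standard `difflib.SequenceMatcher` at the character level, and `correction_profile` is a hypothetical name.

```python
from collections import Counter
from difflib import SequenceMatcher


def correction_profile(page_pairs):
    """Count (ocr_fragment, corrected_fragment) pairs that proofreaders changed.

    page_pairs: iterable of (ocr_output, proofread_output) strings.
    """
    profile = Counter()
    for ocr, proofed in page_pairs:
        matcher = SequenceMatcher(None, ocr, proofed, autojunk=False)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag != "equal":  # 'replace', 'delete' or 'insert'
                profile[(ocr[i1:i2], proofed[j1:j2])] += 1
    return profile


if __name__ == "__main__":
    pages = [("!t was a dark night", "It was a dark night"),
             ("!n the beginning", "In the beginning")]
    print(correction_profile(pages).most_common(1))  # prints [(('!', 'I'), 2)]
```

Because fonts differ from book to book, a profile like this would probably need to be built per book or per font rather than globally.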

- Re:use proofreading meta-data to improve OCR! (Score:4, Insightful)
  
  by Big_Breaker ( 190457 ) writes: on Friday November 08, 2002 @11:09AM (#4625197)
  
  Different book - different font - different problems.
  
  It might help a bit, but most OCR programs already tag letters they are unsure about. The article doesn't mention whether the distributed system incorporates OCR ambiguity in prioritising proofreading.
  
  As an aside, why not just store the raw image for any ambiguous text within the documents in the PG archive (think of an HTML sort of thing)? As people read the document, just poll them as to what they think the letters in the bitmap are.
  
  I guess a lot of the strategy rests on how frequently the OCR software makes an error or finds ambiguity.
  
- OCR errors mostly caused by poor scan quality (Score:4, Informative)
  
  by oob ( 131174 ) writes: on Friday November 08, 2002 @11:11AM (#4625217)
  
  I've just proofed four pages, a mix of modern English, quoted Cockney and religious babble (Jonah 4:13, 9 etc.)
  
  OK it's only four pages, but the errors I've corrected so far have been when the scan has been poor and the OCR software has had to make a guess.
  
- Re:use proofreading meta-data to improve OCR! (Score:3, Informative)
  
  by dmoynihan ( 468668 ) writes:
  
  Actually, they're working on that.
  
  The program is Gutcheck [sourceforge.net], developed by PG's Jim Tinsley.
  
  Catches a lot!
Read? (Score:5, Funny)

by uneek ( 107167 ) writes: on Friday November 08, 2002 @10:58AM (#4625118)

Don't you mean run a compare tool in the background using CPU idle time, right?

You don't actually want us to read a page of literature, do you?

A better way - have computers do more work. (Score:5, Interesting)

by lawpoop ( 604919 ) writes: on Friday November 08, 2002 @11:02AM (#4625147) Homepage Journal

I was thinking -
In order to make the proofing faster, maybe you could OCR a document 2 or 3 times, and then have only the disagreements proofread.
We use omnipro here at work, and I'm surprised at how well it works, even recreating page formats.
Of course, it doesn't work 100%, but it sure does get about 95%. If you were to OCR a document 2-3 or more times, and most of it was identical, it would save a lot of time if you had humans going over only the parts that the different OCRs didn't agree on.
Steve Lefevre
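
The disagreement-only idea could be sketched with Python's `difflib`, comparing two OCR runs word by word. This assumes the two runs come from different engines or settings (running one engine twice on the same image would usually give identical output); `disagreements` is a hypothetical helper, not part of any OCR product.

```python
from difflib import SequenceMatcher


def disagreements(ocr_a: str, ocr_b: str):
    """Return (words_from_a, words_from_b) pairs wherever two OCR runs of
    the same page disagree, compared word by word."""
    a_words, b_words = ocr_a.split(), ocr_b.split()
    matcher = SequenceMatcher(None, a_words, b_words, autojunk=False)
    return [(" ".join(a_words[i1:i2]), " ".join(b_words[j1:j2]))
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag != "equal"]


if __name__ == "__main__":
    print(disagreements("It was the best of tirnes", "It was the best of times"))
    # prints [('tirnes', 'times')]
    print(disagreements("13ook one", "13ook one"))
    # prints []: both runs made the same mistake, so nothing is flagged
```

The second call shows the weakness of this approach: when both runs misread a character the same way, the error is never shown to a human.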

- Re:A better way - have computers do more work. (Score:2, Insightful)
  
  by hands ( 173986 ) writes:
  
  In order to make the proofing faster, maybe you could OCR a document 2 or 3 times, and then have only the disagreements proofread.
  
  This may eliminate some of the OCR errors, but it won't speed up the process because a good editor reads every word. You are asking for more errors when you ask your editors to become lazy and skip words.
  Most OCR will probably misread the same character the same way every time (reading 'B' as '13', for example). That kind of error will not be flagged, and will be overlooked by editors who are used to only looking for flagged errors.
- Re:A better way - have computers do more work. (Score:4, Informative)
  
  by noodlez84 ( 416138 ) writes: on Friday November 08, 2002 @12:13PM (#4625629)
  
  Although your method of "proofreading" is actually useful for most documents, it is _not_ a good method for Project Gutenberg (as a contributor to DP, I can attest to this).
  
  The works put out by Project Gutenberg are going to be around for decades, if not centuries. 95% accuracy is shit for those purposes. An issue that comes up on the PG mailing list (gutvol-d) every once in a while is whether or not to correct spelling mistakes that appear in the real, dead-tree versions of the books: what if, for example, it's obvious to almost any reader that the author meant the word "by" instead of "bye"? Surprisingly (or not, depending on the way you look at it), the general response is *not* to correct those kinds of "mistakes". The rationale is that PG is -not- an editor, but simply a library (which is actually its legal status).
  
  So, in short, for works with millions of characters that are going to be around for many decades, 95% accuracy is not enough. The "bar" might be high, and, when proofreading for DP, I strive for 100%.
  
- Re:A better way - have computers do more work. (Score:3, Insightful)
  
  by leuk_he ( 194174 ) writes:
  
  it doesn't work 100%, but it sure does get about 95%
  
  THAT IS 2000/20 = 100 errors per page (assuming roughly 2,000 characters per page). That is the way OCR works: even if it is 99% OK, it is still 20 errors per page.
  
  And that doesn't include "strange" formatting like things scribbled in the margins or headings above pages, italics and extra spaces.
  
  By the way, you are not supposed to correct spelling errors made in the original paper, especially since this is often "old" English.
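
The per-page arithmetic above generalizes to a one-line formula. This is a minimal sketch; the 2,000-characters-per-page figure is the assumption implied by the "95% means 100 errors" example, not a measured value, and `expected_errors` is a hypothetical name.

```python
CHARS_PER_PAGE = 2000  # assumption implied by the "95% -> 100 errors" example


def expected_errors(char_count: int, accuracy_percent: float) -> float:
    """Expected number of wrong characters at a given per-character accuracy."""
    return char_count * (100 - accuracy_percent) / 100


if __name__ == "__main__":
    print(expected_errors(CHARS_PER_PAGE, 95))  # prints 100.0
    print(expected_errors(CHARS_PER_PAGE, 99))  # prints 20.0
    print(expected_errors(1_000_000, 95))       # prints 50000.0 (a million-character book)
```

Scaled to a whole book, even 99% accuracy leaves thousands of errors. That is why per-character accuracy alone doesn't justify skipping proofreading.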
Better make it quick (Score:3, Funny)

by CatWrangler ( 622292 ) writes: on Friday November 08, 2002 @11:05AM (#4625173) Journal

The new congress might extend copyright protection to Shakespeare's great great great great great great great great great great great great great grandson's nephew's out of wedlock kid's son whose paternity is in question.

will this work? (Score:2, Interesting)

by smeg168 ( 92477 ) writes:

I have a little problem with the logistics here. I can understand why every page is being sent to 2 people for proofreading in an effort to eliminate errors, but the problem is that these aren't 2 computers doing simple computations. If both of these people have different versions of a corrected page, as I'm sure they will, what happens then? Who does the final proofreading? And if there is someone doing the final proofreading, that kinda eliminates the need for the distributed part. I could almost guarantee that any 2 people checking the same full page of data in their free time will find/create different errors. I hope I'm missing some large concept here, because I do love PG; they keep my Palm stocked with good reading for free.
- Re:will this work? (Score:3, Insightful)
  
  by GiMP ( 10923 ) writes:
  
  These are humans comparing identical books to text; if they have the IDENTICAL book, they won't have this problem.
  
  Gutenberg has often published the same 'book' in different editions because of slight variations in the text.
- Re:will this work? (Score:4, Informative)
  
  by clonebarkins ( 470547 ) writes: on Friday November 08, 2002 @11:32AM (#4625378)
  
  who does the final proof reading, and if there is someone doing the final proof reading that kinda eliminates the need for the distributed part.
  
  charlz has a workflow diagram [archive.org] for the works that go through his site. As you can see, each book has a project manager, who has final processing/proofing responsibilities.
  
  Also, I'm not sure you get the idea of two rounds of proofing. They don't see different versions of a corrected page -- the first one sees the straight OCR output (or, sometimes the project manager will do some automated corrections on it first) and then the first round proofer edits the text. Then, when all the pages have gone through the first round, the second round proofer reads the text as it was edited by the first round proofer. This helps because it builds off the edits of the first round proofer and allows the second round proofer to perhaps catch things not caught in the first round.
  
  When proofreading, you're never going to capture all the mistakes with one pair of eyes. A distributed proofreading effort is very beneficial to the goals and efforts of Project Gutenberg, and I applaud the efforts of all those who have proofed even one page.
  
  Having said that, I've done over 300 (under a different name).
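
The two-round flow clonebarkins describes can be modeled as a small data structure. This is a toy sketch, not DP's PHP implementation; `Page`, `submit`, and `round_finished` are hypothetical names, and the two-round limit simply mirrors the setup described in the thread.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Page:
    image_path: str                                   # scanned page image shown to the proofer
    ocr_text: str                                     # raw OCR output loaded for this page
    rounds: List[str] = field(default_factory=list)  # text saved after each round

    def text_for_next_proofer(self) -> str:
        # Round 1 starts from the OCR text; round 2 starts from round 1's result.
        return self.rounds[-1] if self.rounds else self.ocr_text

    def submit(self, corrected_text: str, max_rounds: int = 2) -> None:
        if len(self.rounds) >= max_rounds:
            raise ValueError("page has already been through every proofing round")
        self.rounds.append(corrected_text)


def round_finished(pages: List[Page], round_number: int) -> bool:
    """True once every page in the project has been through the given round,
    so the next round (or the project manager's post-processing) can begin."""
    return all(len(p.rounds) >= round_number for p in pages)
```

Because round 2 always starts from round 1's text, the two proofers never produce competing versions that someone has to merge. That is the point clonebarkins makes above.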
  
Why he came to slashdot (Score:2, Funny)

by cachapa ( 558513 ) writes:

I think he was just watching all his volunteers working on one page a day and thought:
"Imagine a beowulf cluster of these!"
Books read to you while commuting (Score:3, Interesting)

by dudemaster ( 228232 ) writes: on Friday November 08, 2002 @11:19AM (#4625271)

How about this.... use an open source speech synthesis tool/API that can play these text books (especially as more get added) over a PDA, laptop, etc while cruising in on the way to work and home. Something like:

http://www.cstr.ed.ac.uk/projects/festival/
(no plug, just did a quick freshmeat search)

would be pretty cool to get some good novels read to you w/o buying the tapes.

Just one page a day, huh? (Score:5, Funny)

by WIAKywbfatw ( 307557 ) writes: on Friday November 08, 2002 @11:23AM (#4625304) Journal

Sure, it starts as just one a day. But, before you know it, you're doing two, then five, then ten.

You stop going out with friends or even returning their calls, personal hygiene takes a back seat, and even Counter Strike and Warcraft III become unappealing. And, finally, after countless chapters and hundreds of pages, you realise that your friends were right: you're an addict.

Just one page a day, huh? Yeah, right.

Opium. Pot. Cocaine. Now pages.

It might not be your older brother's drug, or your Daddy's, or your grandfather's, but, trust me, this stuff can be dangerous.

Do what I do. Just say no.

What books need to be done? (Score:3, Interesting)

by Alethes ( 533985 ) writes: on Friday November 08, 2002 @11:29AM (#4625348)

Is there a list of books that are out of copyright and perhaps the status of those books on the Gutenberg Project website or anywhere else?

- Re:What books need to be done? (Score:3, Informative)
  
  by clonebarkins ( 470547 ) writes:
  
  Check out the following for a start:
  
  Books in Progress and Requested [upenn.edu]
  
  Steve Harris' PG To-do List [steveharris.net]
  
  David Price's In-Progress Page [freeserve.co.uk] (some have been "in-progress" for quite a while now, so they are probably free to grab)
Possible Enhancements (Score:5, Interesting)

by Niles_Stonne ( 105949 ) writes: on Friday November 08, 2002 @11:31AM (#4625361) Homepage

This is a great project... But after doing my first page I found a couple of possible enhancements.

Add a "Quality" stat for each person. Base it on the number of things that were missed (in other words, the number of things that the second-string proofer finds).

Use more than just two proofers. Have one "First String" proofer, who could be anybody, but have two second string proofers (who both get the output of the first string proofer). If the second string proofers have any differences in their output(with the exception of white space), then another second string proofer should be used. Only proofers with a certain quality rating(slightly higher than what a newbie's would be) should be able to do the second string proofing.
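
The tie-break rule just described might look like the following sketch. Two details are assumptions that fill in what the suggestion leaves open: whitespace is handled by collapsing runs of spaces and newlines, and the third proof is resolved by majority vote. `ask_another_proofer` is a hypothetical callback. Returning `None` when no two versions agree, so that a human such as the book's project manager decides, is likewise an assumption.

```python
from typing import Callable, Optional


def normalize(text: str) -> str:
    """Collapse runs of spaces/newlines so layout-only differences are ignored."""
    return " ".join(text.split())


def second_string_result(proof_a: str, proof_b: str,
                         ask_another_proofer: Callable[[], str]) -> Optional[str]:
    """Return the agreed text of two second-string proofs of the same page.

    If they differ in anything other than whitespace, one more second-string
    proofer is asked and the version that two of the three share is kept.
    None means no two versions agree and a human has to decide.
    """
    if normalize(proof_a) == normalize(proof_b):
        return proof_a
    proof_c = ask_another_proofer()
    for candidate in (proof_a, proof_b):
        if normalize(candidate) == normalize(proof_c):
            return candidate
    return None
```

Note that collapsing whitespace still treats a missing space between two words ("thecat" vs "the cat") as a real difference, which seems the safer reading of "with the exception of white space".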

The "User rating" should be a combination of the number of pages done and the quality rating of those pages. Note that quality rating would only be increased by doing first string proofing. Page count would go up for any proofing.

Quality could be a float, starting at 1.0 for newbies. Every page that is completed and has a second-string person check would then go into a calculation like:

new_quality = old_quality + (0.01 - (num_differences_between_their_proof_and_final_proof / 1000))

Thus, for every page proofed that requires NO corrections by the second string the user's quality would go up by 0.01. ( 0.01 - 0/1000 = 0.01 )

If there were exactly ten errors, their quality would stay the same (0.01 - 10/1000 = 0.00); with more than ten it would go down (0.01 - 20/1000 = -0.01).

Have a threshold of 1.10 or some such for second-string proofers... That way a user would have to do at least 10 perfect pages, or 20 pages with 5 errors each, etc., before they could do second-string proofing.

Obviously, make sure that the second string proofer can't see who the first string proofer is.

The "User Rating" (mentioned above) could just be the product of Quality and page count...
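The quality scheme above can be sketched in Python as follows. Everything here is hypothetical: the `Proofer` class, the constants, and especially `count_differences`, which assumes "differences" means word-level edits as found by the standard library's `difflib`. Quality is stored as an integer number of thousandths so that ten perfect pages land exactly on the 1.10 threshold instead of drifting a hair below it in floating point.

```python
import difflib
from dataclasses import dataclass

SCALE = 1000      # quality is kept in thousandths, so 10 x 0.01 is exactly 0.10
START = 1000      # every newbie starts at 1.0
REWARD = 10       # +0.01 for each first-string page
THRESHOLD = 1100  # 1.10 required before second-string proofing


def count_differences(proof: str, final: str) -> int:
    """Word-level edits needed to turn a proof into the final text (assumed metric)."""
    a, b = proof.split(), final.split()
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    return sum(max(i2 - i1, j2 - j1)
               for tag, i1, i2, j1, j2 in matcher.get_opcodes()
               if tag != "equal")


@dataclass
class Proofer:
    quality_milli: int = START
    pages: int = 0

    @property
    def quality(self) -> float:
        return self.quality_milli / SCALE

    def record_first_string(self, num_differences: int) -> None:
        # new_quality = old_quality + (0.01 - num_differences / 1000), in thousandths
        self.quality_milli += REWARD - num_differences
        self.pages += 1

    def record_second_string(self) -> None:
        # Second-string work raises the page count but never the quality.
        self.pages += 1

    def may_second_string(self) -> bool:
        return self.quality_milli >= THRESHOLD

    def user_rating(self) -> float:
        # "User Rating" = quality x page count
        return self.quality * self.pages


newbie = Proofer()
for _ in range(10):
    newbie.record_first_string(count_differences("the cat sat", "the cat sat"))
print(newbie.quality, newbie.may_second_string())  # 1.1 True
```

Thousandths are just one way to dodge the rounding problem; `fractions.Fraction` or `decimal.Decimal` would work as well.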

- - Non-native proofers (Score:3, Informative)
    
    by Sangui5 ( 12317 ) writes:
    
    are actually the preferred way to proof text. A project to create "The Collected Works of Edmund Spenser" is headquartered here, and the English types were looking for people to work on some software for them. The current most accurate way to create an electronic copy is to hire people without even a passing familiarity with the alphabet you are targeting, train them to identify the letters themselves (using the font you're targeting, which may be very non-standard, esp. for work as old as Spenser's), and have them enter it character by character. You then have another illiterate person do the same, and have one editor (an English graduate student) check both copies. Any differences then have to be handled by another editor (English PhD), and the final copy signed off by yet another editor (PhD).
    
    A very very expensive way to do it.
    
    See, an illiterate person won't introduce any bias into the text. They will faithfully duplicate any spelling mistakes that they find. In the case of an English scholarly collection, the mistakes are among the most important part, since they can identify different print runs and show how language shifts over time.
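    The comparison step of that double-keying workflow (two independent transcriptions, with differences sent to an editor) can be sketched with Python's `difflib`. This is an illustration only, not the Spenser project's software; `discrepancies` is a hypothetical helper that reports each span where the two typists disagree, so anything both keyed identically, including original misspellings, passes through unflagged.

```python
import difflib


def discrepancies(keying_a: str, keying_b: str) -> list[tuple[int, str, str]]:
    """Return (offset, a_text, b_text) for every span where two keyings disagree.

    Spans where both typists agree are not reported, so original spellings that
    both reproduced faithfully reach the editor untouched.
    """
    matcher = difflib.SequenceMatcher(None, keying_a, keying_b, autojunk=False)
    return [(i1, keying_a[i1:i2], keying_b[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag != "equal"]


print(discrepancies("The Faerie Queene", "The Faerie Qveene"))  # [(12, 'u', 'v')]
```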
    
    As a side note, the software project is hopeless. The best that can be managed is to automate the administration of their current systems--no OCR will ever meet the level of accuracy that their current system provides.
ASCII Only? (Score:5, Insightful)

by vondo ( 303621 ) writes: on Friday November 08, 2002 @11:34AM (#4625389)

Reading the blurb at the page-a-day site, it says ASCII only where bold is converted to ALL CAPS, the English pound symbol is rendered as "L," etc. No preservation of figures, drawings, or photos.

This seems very shortsighted to me. Devices that can only display ASCII are becoming rarer and rarer. Why not, instead, store docs in some sort of SGML format to handle the special markup (which must be rare) and then down-convert to ASCII when needed?
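A minimal sketch of the down-conversion step, assuming a hypothetical master format where bold is marked `<b>...</b>` (a stand-in for whatever SGML markup would actually be used). It applies the two ASCII conventions quoted above, bold to ALL CAPS and the pound sign to "L", and, as an added assumption, replaces any other non-ASCII character with "?".

```python
import re

POUND_SIGN = "\u00a3"


def to_pg_ascii(marked_up: str) -> str:
    """Down-convert a lightly marked-up text to plain ASCII.

    <b>...</b> becomes ALL CAPS and the pound sign becomes "L"; any other
    non-ASCII character is replaced with "?" (an assumption of this sketch).
    """
    text = re.sub(r"<b>(.*?)</b>", lambda m: m.group(1).upper(), marked_up, flags=re.S)
    text = text.replace(POUND_SIGN, "L")
    return text.encode("ascii", "replace").decode("ascii")


print(to_pg_ascii("It cost <b>five</b> pounds (\u00a35)."))  # It cost FIVE pounds (L5).
```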

I've tried reading these things on my Palm. Very difficult. But if I could get a nice typeset PDF version, that would be a whole different story (no pun intended).

- Re:ASCII Only? (Score:5, Informative)
  
  by Robotech_Master ( 14247 ) writes: on Friday November 08, 2002 @11:42AM (#4625440) Homepage Journal
  
  Check out Black Mask [blackmask.com] for a lot of nicely-formatted pubdom e-books, including many from Gutenberg but also some that Gutenberg doesn't have.
  
- Re:ASCII Only? (Score:4, Informative)
  
  by rusty0101 ( 565565 ) writes: on Friday November 08, 2002 @12:53PM (#4626013) Homepage Journal
  
  When the project was started, SGML variants were not widely used, and including images was a concern for storage space.
  
  Using things like BOLD, and L for the British pound, were workarounds to have a common way of presenting the data. I suspect that it would be trivial to build a formatting filter in Perl, or another language, that would convert BOLD to bold type, though it would require a bit more work to recognize that it really should be Bold, or even that it should stay BOLD.
  
  Converting monetary symbols would require a bit more work, but would also not be impossible.
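  A naive Python version of such a filter might look like this. It is a sketch under two loud assumptions: any run of words made of two or more capital letters was originally bold, and an "L" followed directly by digits (as in "L5") was a pound sign. Both rules will misfire (acronyms get bolded, and the original capitalization is simply lowercased), which is exactly the extra work described above.

```python
import re

# Assumed rule: words of two or more capitals, possibly several in a row, were bold.
CAPS_RUN = re.compile(r"\b[A-Z]{2,}(?:\s+[A-Z]{2,})*\b")
# Assumed rule: "L" directly followed by digits ("L5") was a pound sign.
POUND_AMOUNT = re.compile(r"\bL(\d+)\b")


def from_pg_ascii(text: str) -> str:
    """Naively undo the ASCII conventions: caps runs -> <b>lowercase</b>, L5 -> pound sign.

    The original case (bold, Bold or BOLD) cannot be recovered; this guesses lowercase.
    """
    text = CAPS_RUN.sub(lambda m: "<b>" + m.group(0).lower() + "</b>", text)
    return POUND_AMOUNT.sub("\u00a3\\1", text)


print(from_pg_ascii("I said NO WAY, it cost L5."))
```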
  
  Re-inserting any diagrams, figures, illustrations or other graphics would require more work. If the original scanned pages are still available, as this part of the project suggests, even that would not be impossible.
  
  One variation is the free bookmobile project that is out there. They use scans of the original book to build a new book for kids. Preparation for printing involves downloading the book over the Internet via a DSL-speed satellite link. I am not sure, however, whether the working material is suitable for e-book reading.
  
  -Rusty
  
- Re:ASCII Only? (Score:3, Informative)
  
  by quinto2000 ( 211211 ) writes:
  
  From actually proofing a few pages, this depends entirely on the particular project and when it was started. Some of the newer ones allow special characters.
Distributed Proofreading has a "high score" table. (Score:3, Insightful)

by Lovepump ( 58591 ) writes: on Friday November 08, 2002 @11:37AM (#4625413)

How long before someone writes a script to hit "Save and get another Page" and they shoot to the top of the ladder claiming to have proofread 13,450,213 pages per day...

Scanning without damaging the book? (Score:3, Interesting)

by mttlg ( 174815 ) writes: on Friday November 08, 2002 @12:01PM (#4625559) Homepage Journal

I have a few books that are old enough to be well out of copyright (and obscure enough not to be found online already), and for a while I have been considering typing them in. OCR would be a lot easier, but getting a good image from a flatbed scanner would seriously damage most of these books. Even a handheld scanner would be impractical in some cases, and a digital camera seems even less likely to work. Is there any reasonable way to scan in pages from something like a 100+ year old 1.5" thick wire-bound paperback book that only opens about 60 degrees before putting up a fight?

- Re:Scanning without damaging the book? (Score:5, Informative)
  
  by jpetts ( 208163 ) writes: on Friday November 08, 2002 @12:19PM (#4625697)
  
  Is there any reasonable way to scan in pages from something like a 100+ year old 1.5" thick wire-bound paperback book that only opens about 60 degrees before putting up a fight?
  
  Yes indeed! *Any* decent academic library should have a photocopier which can do this. Older models tend to have a glass platen which extends right to the edge of the photocopier, and the side slopes away at around 60 degrees rather than dropping at a right angle. Newer models, such as the Minolta PS3000 will support the book in a cradle, face up, so that contact with the pages is minimised. They also tend to have a host of features, such as automagically erasing the gutter shadow that one gets with such a system.
  
Are any of these resources distributed? (Score:3, Insightful)

by wls ( 95790 ) writes: on Friday November 08, 2002 @12:21PM (#4625715) Homepage

It seems like every few years I turn around and notice that some massive archive collection gets sued, goes out of business, has funding pulled, gets tangled in legal action, has a university board go into panic mode, etc. and suddenly it disappears without warning or notice to the frustration of many. I'm certain you also can name a number of services, collections, and resources that spontaneously vanished when hosted at friendly sites. History has proven that despite best intentions, nothing lasts forever unless we go out of our way to protect it.

So that work isn't lost or destroyed, are any of the mega-sized projects replicated elsewhere in case an "it'll never happen" situation crops up for this unsuspecting resource?

Can't get through? Try ibiblio (Score:3, Informative)

by gbnewby ( 74175 ) writes: on Friday November 08, 2002 @01:15PM (#4626198) Homepage

The main Gutenberg page is slashdotted right now, but you can get nearly the same access to the books via the main ibiblio page at ibiblio.org/gutenberg [ibiblio.org], which is the main distribution site for the collection.
It looks like the texts01.archive.org/dp site is holding up fairly well! If you cannot get through today, though, please check back later. Slashdot effect aside, it's usually quite speedy and has a decent 'net connection. If you want to keep informed of current events, get on one of our mailing lists via (when it's not slashdotted) our subscriptions page [promo.net].
Dr. Gregory B. Newby
Chief Executive and Director
Project Gutenberg Literary Archive Foundation http://gutenberg.net
A 501(c)(3) not-for-profit organization with EIN 64-6221541
gbnewby@ils.unc.edu // 919-962-8064

Looking for proofreaders on slashdot !! (Score:5, Funny)

by tadas ( 34825 ) writes: on Friday November 08, 2002 @02:44PM (#4626908)

If they're looking for proofreaders here, the project is in deep trouble...

- Re:Legal Implications (Score:2, Informative)
  
  by phil reed ( 626 ) writes:
  
  I can't decide if this is a joke or not.
  
  You do know about Project Gutenberg [promo.net], right?
- Re:Legal Implications (Score:4, Informative)
  
  by Junta ( 36770 ) writes: on Friday November 08, 2002 @10:42AM (#4624985)
  
  The only works that go into PG are works in the public domain. While publishers sell dead-tree copies still, they have no copyright over the original text contained within. (Which is why these works are typically available through multiple publishers.)
  
  - Re:Legal Implications (Score:5, Interesting)
    
    by stinky wizzleteats ( 552063 ) writes: on Friday November 08, 2002 @10:50AM (#4625060) Homepage Journal
    
    While publishers sell dead-tree copies still, they have no copyright over the original text contained within.
    
    What? You mean to suggest that you have an actual example of a publisher making money without tyranny over the content?
    
    Gasp!
    
  - Re:Legal Implications (Score:2, Insightful)
    
    by astrosmurf ( 546405 ) writes:
    
    The only works that go into PG are works in the public domain. While publishers sell dead-tree copies still, they have no copyright over the original text contained within
    
    But the publishers still have copyright on their specific printing. Distributing scanned copies of pages probably still violates their copyright, even if distributing the OCR output does not.
    - Re:Legal Implications (Score:2, Informative)
      
      by Anonymous Coward writes:
      
      >But the publishers still have copyright on their specific printing.
      
      Nope. Copyright holders (not necessarily the publisher) would have copyright on editorial corrections and (for music: a weird case) some on appearance, but not on the original text.
      
      Publishers often claim copyright on the entire contents of 300 year old works, but they have no legal basis for this.
- Re:Legal Implications (Score:2)
  
  by Chundra ( 189402 ) writes:
  
  Not when the authors have been dead for 300 years.
- Re:Legal Implications (Score:4, Informative)
  
  by seizer ( 16950 ) writes: on Friday November 08, 2002 @10:43AM (#4624996) Homepage
  
  It helps if you read the FAQ list.
  
  Due to copyright laws, it is only legal to do this with older books (copyrighted 75 or more years ago). As a result, Project Gutenberg is mostly comprised of the "Classics."
  
- Mod Parent 'Twat' (Score:3, Funny)
  
  by henben ( 578800 ) writes:
  
  Nuff said.
- Re:Book Pirating? (Score:2, Informative)
  
  by phil reed ( 626 ) writes:
  
  So are the books they are digitizing all in the public domain?
  
  Yup.
  It doesn't seem like there would be that many books in the public domain that haven't already been made available on the net.
  
  How do you suppose they make it to the net? Most of the public domain books were written before word processors, so there's no electronic text around.
  
  Of course I could be wrong.
  
  Yeah. Go look at Project Gutenberg's site - think of it as your homework assignment for the weekend.
- Re:Book Pirating? (Score:4, Informative)
  
  by raju1kabir ( 251972 ) writes: on Friday November 08, 2002 @10:49AM (#4625045) Homepage
  
  So are the books they are digitizing all in the public domain? It doesn't seem like there would be that many books in the public domain that haven't already been made available on the net. Of course I could be wrong.
  
  And you probably are. The best efforts of our duly elected Congressional representatives notwithstanding, copyright still does expire. After that, a work passes automatically into the public domain. That means there are hundreds of thousands of books available.
  
  In fact, if you've previously seen the classics online, they probably came from this project, which has been around for almost as long as I can remember.
  
- - Re:How do I get to plug my online website? (Score:2, Insightful)
    
    by Anonymous Coward writes:
    
    a wonderful resource for poor areas.
    
    And where do the poor get online? In libraries.
    D'oh!
    - Re:How do I get to plug my online website? (Score:2, Funny)
      
      by Anonymous Coward writes:
      
      And where do the poor get online? In libraries.
      
      Hey, shut the fuck up. This site is about technology for technology's sake. We talk about humanitarian things just to justify it to our own conscience to relieve the guilt. Don't make us think logically!
      
      Remember, it's TECHNOLOGY = GOOD. WE ARE FUZZY BUNNIES THAT LOVE EVERYONE AND THINK WE'RE COOL 'CAUSE WE WRITE "HELLO, WORLD" IN C.
- Re:copyrights? (Score:2)
  
  by A Commentor ( 459578 ) writes:
  
  Project Gutenberg is about making old books that have (finally) fallen into the public domain available to whoever wants them. Those are the books I'm sure they want to have proofed.
- Re:copyrights? (Score:4, Informative)
  
  by Jeremy Erwin ( 2054 ) writes: on Friday November 08, 2002 @10:47AM (#4625036) Journal
  
  Copyrights aren't perpetual. The Gutenberg project aims to publish books that are no longer, or have never been under copyright.
  
  Parent Share
  twitter facebook
  - Re:copyrights? (Score:2, Insightful)
    
    by Anonymous Coward writes:
    
    Copyrights aren't perpetual In Theory. But aren't Disney and Microsoft (MS wrt printed works esp.) working hard to ensure they're perpetual In Practice?
  - Re:copyrights? (Score:3, Insightful)
    
    by msouth ( 10321 ) writes:
    
    Copyrights aren't perpetual. The Gutenberg project aims to publish books that are no longer, or have never been under copyright.
    
    Well, copyrights weren't perpetual. Whether they will be or not remains to be seen.
- Re:Which books are getting converted? (Score:5, Informative)
  
  by teeker ( 623861 ) writes: on Friday November 08, 2002 @10:50AM (#4625064)
  
  The books that are being converted are whatever people feel like contributing.
  
  Don't think your favorite authors are being represented? Can you demonstrate that the work is out of copyright? Make the conversion yourself!
  
  Doing the hard work yourself is the best way to guarantee your interests are represented.
  
- Re:Which books are getting converted? (Score:2)
  
  by Chundra ( 189402 ) writes:
  
  I'm sure interrest could be affected if people could, say, vote on what would be converted. Or do I make any sense?
  
  I'm trying to make sense of this, please help me out. Are you saying that if people could vote on which books are converted (or "electronificated" as we sometimes call it in the industry), that more people might be interested in the project?
- Re:public domain books? (Score:2, Informative)
  
  by teeker ( 623861 ) writes:
  
  True, but Project Gutenberg is a repository for digital copies of literature that are public domain. To remain a legitimate entity, they can't publish copyrighted works (without the author's consent).
  
  So, the answer to your question is no. But that's what p2p is for ;-)
- Re:Umm... (Score:5, Interesting)
  
  by jandrese ( 485 ) writes: <kensama@vt.edu> on Friday November 08, 2002 @11:17AM (#4625256) Homepage Journal
  
  Someone needs to do a Google search on "Public Domain [benedict.com]". Public domain is there for a reason. Copyright exists to give the artist a means of supporting himself, but it was never meant to last his entire life. The purpose is to give the artist an incentive to work; current copyright law fails in this respect because an artist only needs to create one successful work and can immediately switch to being a leech on society for the rest of his (and his children's, and his children's children's) life. Having works pass into the public domain is a good idea for two reasons:
  1. It is for the greater good of society as other people build on earlier works.
  2. It keeps the artist busy: they were supposed to have to keep releasing work to feed themselves as their early work passed into the public domain, just like any other job.
  
  - Re:Umm... (Score:4, Insightful)
    
    by Twylite ( 234238 ) writes: <twylite@cryptCOW.co.za minus herbivore> on Friday November 08, 2002 @12:04PM (#4625571) Homepage
    
    Copyright law is supposed to give incentive to create, for the betterment of society, and allow the creator to derive direct benefits as a reward. An artist who has created a work so successful that (s)he can live on it indefinitely has arguably provided a suitable level of betterment to society.
    
    Saying that copyright law is an incentive to "work" is accepting mediocrity. Artists who produce works that society values more highly should (have the opportunity to) receive more benefits.
    
    On the other hand, I don't necessarily agree that copyright should last the lifetime of the creator (although there are strong arguments for this in the case of a natural person). But what is a "fair" limit?
    
    Is 5 years enough? Almost certainly not. Many authors only achieve popularity after 10 or more years, and then make a fair amount of money off increased sales of their older works. A good number accept this as a risk, and plan to use this phenomenon to their benefit - work up a good number of titles with varied content, and you'll pull more readers, who are then likely to try some of your other titles.
    
    Is 20 years enough? Maybe. But some of our best-loved authors were 15-20 years ahead of their time in terms of what readers wanted.
    
    Is life enough? Strangely, no. If an aging star has just completed his/her autobiography, concludes the publishing deal, and dies ... well, the family could well be screwed.
    
    Maybe the answer lies in a compromise, rather than an all-or-nothing approach. Copyright over a work lasts for the greater of 10 years or the creator's natural life (which gets very interesting when we get eternal life medications ...). But some rights fall away after the LESSER of those two times, such as exclusivity over derivative works (but not translations).
    
    This allows society to (culturally) enrich itself by building on a work after a shorter amount of time, while the creator (and/or family) can still derive value from the original work for a longer time.
    
    In the case of books this is easily understood: author writes book; 10 years later other people can write preludes and sequels, extend the world and characters, etc.; 30 years later author dies and original book falls into public domain.
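    The arithmetic of this proposal is simple enough to sketch in Python. The names are hypothetical, whole years stand in for exact dates, the translation exception is not modeled, and this encodes only the compromise suggested here, not any actual copyright law.

```python
from dataclasses import dataclass

MIN_TERM_YEARS = 10  # the "10 years" floor in the proposal


@dataclass(frozen=True)
class ProposedTerms:
    derivatives_open: int  # year others may write preludes, sequels, etc.
    public_domain: int     # year the original work itself falls into the public domain


def proposed_terms(published: int, creator_died: int) -> ProposedTerms:
    """Full copyright lasts the GREATER of 10 years or the creator's life;
    exclusivity over derivative works lasts only the LESSER of the two."""
    ten_years_on = published + MIN_TERM_YEARS
    return ProposedTerms(derivatives_open=min(ten_years_on, creator_died),
                         public_domain=max(ten_years_on, creator_died))


# The book example, reading "30 years later" as 30 years after publication.
print(proposed_terms(1970, 2000))
# ProposedTerms(derivatives_open=1980, public_domain=2000)
```

    The autobiography case falls out the same way: `proposed_terms(2000, 2001)` keeps the book protected until 2010, even though derivative exclusivity ends at the author's death in 2001.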
    
- Re:Umm... (Score:2, Interesting)
  
  by Big_Breaker ( 190457 ) writes:
  
  Lots of books aren't copyrighted anymore because the copyright expired. You see, back before Disney bought legislation from people like Sonny Bono, copyrights were allowed to expire after about 50 years or so.
  
  Beowulf, Moby Dick, Shakespeare's plays, etc. are all free as in speech and beer. Edited versions of the original text can be copyrighted; examples are editions of Shakespeare's plays with "translations" next to the original text. You can buy his complete works, unedited, for very little $ these days. The only cost for the publisher is printing and typesetting.
- Re:I am programmer, let's automate this (Score:3, Insightful)
  
  by Sloppy ( 14984 ) writes:
  
  There has to be a better way, a programming way, to get this done without having to look at all of the files with human eyes.
  It's not human eyes that are needed, it's human brains. If it is possible to automate, then the OCR doesn't need checking; it just needs to be upgraded to include whatever algorithm that you're about to invent.
