Project Gutenberg's 32nd Birthday

Posted by timothy on Friday July 04, 2003 @02:04PM from the read-franklin's-autobiography dept.

David Moynihan writes "July 4th marks the 32nd anniversary of that day in 1971 when Michael Hart first sped an all-caps version of the Declaration of Independence to anyone and everyone then on what later became the web, thus founding Project Gutenberg. Thanks to an army of volunteers and the Distributed Proofreaders, this is the last year PG will have fewer than 10,000 titles. Strangely, Microsoft picked this dual anniversary of literacy and freedom to re-launch their Reader product, with three free bestsellers a week, if you activate the new version with Passport, sign a EULA, etc. Real reason for the upgrade might be that the DRM on MS's old Reader was cracked. If you're not into giving away data, or are running a system other than Windows, maybe you could take the time to tell a friend about free books online, or even help out by visiting the Distributed Proofers and editing one page per day."

More free books (Score:5, Informative)

by Cruciform ( 42896 ) writes: on Friday July 04, 2003 @02:18PM (#6368324) Homepage

The Baen Free library [baen.com] has a number of titles available in several formats.

It's a great way to introduce readers to a series or a talented new author.

Re:XML please (Score:5, Informative)

by starseeker ( 141897 ) writes: on Friday July 04, 2003 @02:37PM (#6368410) Homepage

I think they discuss this somewhere. The whole point of ASCII is that it can be accessed simply, by almost any machine. It is as stable a format as you will find for data storage, anywhere. They are committed to these books being widely readable, and ASCII is the best way to ensure this.

However, I agree that some books (most, actually) lose something in ASCII. What I would like to see is a project which works off the basic Gutenberg texts and formats them in a readable way, preserves illustrations, etc. But it should be an add-on to the project, not the main project. Also, remember that that level of preservation is much harder than just typing in and proofreading: you have to consider formatting and scanning images as well.

As a temporary measure, it would be nice to see someone do an XML markup that can be easily translated into LaTeX, so people can have pdfs with nice fonts, table of contents, title page, etc. That would be a step up. But to do it properly would take a separate effort, and a very large scale one even by Gutenberg standards. Worthwhile, yes. But involved.
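To make the XML-to-LaTeX idea concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the element names (book, title, chapter, heading, p) are invented for this example and are not a Project Gutenberg or TEI schema, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: these element names are made up for illustration
# and do not correspond to any real Project Gutenberg schema.
SAMPLE = """<book>
  <title>A Sample Book</title>
  <chapter>
    <heading>Chapter One</heading>
    <p>Tea &amp; cake cost 100% more.</p>
  </chapter>
</book>"""

# Characters that LaTeX treats specially, with their escaped forms.
LATEX_SPECIALS = {
    "\\": r"\textbackslash{}",
    "&": r"\&",
    "%": r"\%",
    "$": r"\$",
    "#": r"\#",
    "_": r"\_",
    "{": r"\{",
    "}": r"\}",
    "~": r"\textasciitilde{}",
    "^": r"\textasciicircum{}",
}


def escape_latex(text):
    """Escape LaTeX special characters one character at a time."""
    return "".join(LATEX_SPECIALS.get(ch, ch) for ch in text)


def book_to_latex(xml_text):
    """Turn the hypothetical <book> markup into a LaTeX document string."""
    root = ET.fromstring(xml_text)
    title = root.findtext("title", default="Untitled")
    lines = [
        r"\documentclass{book}",
        r"\title{" + escape_latex(title.strip()) + "}",
        r"\author{}",
        r"\date{}",
        r"\begin{document}",
        r"\maketitle",
        r"\tableofcontents",
    ]
    for chapter in root.findall("chapter"):
        heading = chapter.findtext("heading", default="")
        lines.append(r"\chapter{" + escape_latex(heading.strip()) + "}")
        for para in chapter.findall("p"):
            # Collapse internal whitespace, then separate paragraphs
            # with a blank line, as LaTeX expects.
            lines.append("")
            lines.append(escape_latex(" ".join((para.text or "").split())))
    lines.append(r"\end{document}")
    return "\n".join(lines)


latex_source = book_to_latex(SAMPLE)
print(latex_source)
```

The output can be fed to pdflatex to get a title page and table of contents. Illustrations, footnotes, and inline emphasis are ignored here, which is exactly the part that would take the larger effort.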

Re:XML please (Score:3, Informative)

by DarkOx ( 621550 ) writes: on Friday July 04, 2003 @02:45PM (#6368445) Journal

The entire point of the project is to preserve the content in a format that is both human- and machine-readable. If, fifteen years from now, I don't have any software from the present and XML is long dead, I will still be able to read standard ASCII text, even if I am just cat(ing) it through less or printing it as is. I can't reasonably read a book that is filled with XML tags, and if there is no longer software to parse them then it's not too useful. I am not saying that it would be hard to write such software, but the concept is to make sure it's easy, and always easy, to get the data. Also, they do put chapter breaks in as text, so if you want to find one, most word processors and e-book readers these days, even the fifteen-year-old ones, can find text strings.

Re:XML please (Score:5, Informative)

by Teancum ( 67324 ) writes: <robert_horning AT netzero DOT net> on Friday July 04, 2003 @02:56PM (#6368511) Homepage Journal

Michael Hart has repeatedly said that he does not want to get caught up in the fad of the moment with text formatting issues, and that plain old ASCII is one constant that hasn't needed changing. Indeed, you can open up the original Declaration of Independence document with your standard web browser, and you can still read it just fine. I dare you to try and find any other data format that was commonly used 32 years ago that you can still read with current equipment.

With that said, I believe that XML is perhaps going to have the staying power that ASCII text has had for the past many years. And there are many volunteer projects that you can get involved with that do this including:

The HTML Writers Guild [hwg.org] - Originally they were trying to convert all of the Gutenberg texts to HTML, which has admittedly been a reasonable standard for a good number of years. They are now moving to a version of XML with some standard headings for titles, copyright info (or lack thereof), chapter headings and so forth. More is on their website.

Project Gutenberg XML [pgxml.org] - This is a group more dedicated to XML, but it has a very similar purpose.

The point here is that once the data is put into ASCII text format, projects like this can and are being done. If you really feel that you want to help with the effort, please join one of these. Also, at any time you can also take the Project Gutenberg files yourself and do this, but at least this gives you a forum to share your work once you are done.

Re:XML please (Score:4, Informative)

by belbo ( 11799 ) writes: on Friday July 04, 2003 @03:01PM (#6368529)

The final ASCII version is also produced by hand. After two rounds of proofing, the text gets into a queue. From that queue, a 'post-processor' checks it out and reformats it according to the Gutenberg guidelines, along with any error corrections that might still be necessary. Then she or he uploads the final version to Project Gutenberg, where the 'whitewashers' check the text yet again before posting it to the archive.

About the XML: You are in fact welcome to produce an XML version; I believe some fellows at DP indeed do that already. However, the main version is the simple text version, since you can read that with everything. But nothing keeps you from also posting an XML or PDF or TeX or whatever version.

belbo, post-processor at DP

(Boy I do hope there are no spelling errors in this *g*)

Thanks for support, plans for future (Score:5, Informative)

by gbnewby ( 74175 ) writes: on Friday July 04, 2003 @03:58PM (#6368789) Homepage

Thanks to everyone who has helped contribute eBooks and other support to Project Gutenberg! If you haven't already, please visit Distributed Proofreaders [pgdp.net] and proof a page today!

Lots of plans for the future:
- Post-#10000 formatting changes. We'll be rearranging our directories to make it easier to find things. Likely we'll go with something OAI (OpenArchives.org) compliant.
- Conversion on the fly to many formats. We'll be putting eBooks into XML format (mostly using teixlite.dtd, we think) for conversion on the fly to many other formats.
- New ways to donate. "Sponsor a book."
- More contemporary content. We receive donations nearly every week from currently published authors who want to make their stuff available to a wider audience (e.g., Doctorow's Down and Out [ibiblio.org]).
- Your ideas! Visit gutenberg.net [gutenberg.net] to sign up for newsletters, find out how to get started producing an eBook, and find eBooks.
Thanks especially to our main and backup distribution sites, iBiblio [ibiblio.org] and The Internet Archive [archive.org]. And thanks to the THOUSANDS of volunteers who have brought us nearly to our 10,000th eBook.

Dr. Gregory B. Newby
Chief Executive and Director
Project Gutenberg Literary Archive Foundation
http://gutenberg.net
A 501(c)(3) not-for-profit organization with EIN 64-6221541
gbnewby@pglaf.org

Re:'reader' books not much cheaper (Score:1, Informative)

by Anonymous Coward writes: on Friday July 04, 2003 @04:10PM (#6368837)

Passing a hardcover or paperback book on to a friend is not a copyright violation in the U.S. and does not make you a criminal.

The principle that protects you is not Fair Use, but First Sale Doctrine -- which says that once a copyright holder distributes a copy of a work, the copyright holder loses any right to control further redistribution of that copy.

First sale doctrine (Score:3, Informative)

by yerricde ( 125198 ) writes: on Friday July 04, 2003 @04:12PM (#6368847) Homepage Journal

The law specifically says you can not distribute a work that is copyrighted without the copyright holders permission.

True, 17 USC 106 [cornell.edu] says that, but it limits itself "Subject to sections 107 through 121", such as 17 USC 109 [cornell.edu]:

Notwithstanding the provisions of section 106(3), the owner of a particular copy or phonorecord lawfully made under this title, or any person authorized by such owner, is entitled, without the authority of the copyright owner, to sell or otherwise dispose of the possession of that copy or phonorecord.

fair use laws, but the DMCA removed most of those

From the DMCA: "Nothing in this section [cornell.edu] shall affect rights, remedies, limitations, or defenses to copyright infringement, including fair use, under this title."

XML conversions look lacking. (Score:2, Informative)

by CryptOntology ( 597072 ) writes: on Friday July 04, 2003 @05:04PM (#6369088)

I just looked over the links in earlier replies (PGXML and HTML-Writers) and was surprised: HTML-Writers has only converted 20-odd etexts, from Jan to Feb 2000; and PGXML can't even produce valid HTML curled quotes.
Both look like amateur do-gooders, and we need more of those; but these efforts should be folded back into the organisation of PG, where they may find a permanent home. The alternative is to go adrift, because too few people are involved (only _two_ people do PGXML) to round out the abilities and future efforts that XML uber-format-goodness needs.
One major reason why I'd be interested in a longer toolchain, from scans into TXT, and TXT into XML, is to make translation easier. All the older Gutenberg etexts are in different, revised formats. Try making a parser that automagically transforms the dozen or so revisions of the one true "TXT" into XML to see what I mean. (I have; there will always be some books that break important Gutenberg formatting placeholders.)
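For a sense of why such a parser is fragile, here is a minimal Python sketch of the naive approach. It rests on loud assumptions: it treats any line starting with "CHAPTER" followed by a Roman or Arabic numeral as a chapter break, it treats blank lines as paragraph breaks, and the output element names (book, frontmatter, chapter, heading, p) are invented for the example. It is not the parser described above.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: a chapter starts with a line such as "CHAPTER I" or "Chapter 12".
# Real etexts use many other conventions, which is where this breaks down.
CHAPTER_RE = re.compile(r"^\s*chapter\s+([ivxlcdm]+|\d+)\b.*$", re.IGNORECASE)

SAMPLE = """A Sample Tale

CHAPTER I

It was the first
line of the story.

CHAPTER II

The end.
"""


def text_to_xml(text):
    """Convert plain text to a hypothetical <book> element tree."""
    book = ET.Element("book")
    # Anything before the first chapter heading goes into <frontmatter>.
    current = ET.SubElement(book, "frontmatter")
    paragraph = []

    def flush():
        # Emit the buffered lines as one <p> under the current section.
        if paragraph:
            ET.SubElement(current, "p").text = " ".join(paragraph)
            paragraph.clear()

    for line in text.splitlines():
        if CHAPTER_RE.match(line):
            flush()
            current = ET.SubElement(book, "chapter")
            ET.SubElement(current, "heading").text = line.strip()
        elif line.strip():
            paragraph.append(line.strip())
        else:
            flush()
    flush()
    return book


book = text_to_xml(SAMPLE)
print(ET.tostring(book, encoding="unicode"))
```

A body sentence that happens to begin "Chapter 3 of the report..." would be misread as a heading, and a book that writes "BOOK THE FIRST" or numbers its sections differently would get no chapters at all. Each formatting revision needs its own rules.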

Re:A sterling mistake (Score:3, Informative)

by dvdeug ( 5033 ) writes: <dvdeug&email,ro> on Friday July 04, 2003 @08:15PM (#6369966)

The fact is that the Gutenberg people think they're using ASCII, but are actually using Latin1. So Gutenberg texts will display correctly on any system that's localized for the U.S., Canada, or Western Europe. But not elsewhere.

Excuse me? The Gutenberg people know quite well when they're using ASCII and when they're using Latin-1. If you'll look at the books that are posted, some of the books posted from DP are posted just in ASCII, and some in 7foo.txt and 8foo.txt files, where 7foo is ASCII and 8foo is Latin-1, and a few just in Latin-1.
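The 7-bit versus 8-bit distinction is easy to check mechanically. The following Python sketch is only an illustration: the helper name classify_bytes is hypothetical, and the choice of cp1251 (a Cyrillic code page) as the "elsewhere" example is an assumption made for the demo.

```python
def classify_bytes(data: bytes) -> str:
    """Return "7-bit" if every byte is plain ASCII, otherwise "8-bit"."""
    return "7-bit" if all(byte < 0x80 for byte in data) else "8-bit"


# The same sentence, once without and once with an accented character.
plain = "The cafe was closed.".encode("ascii")
accented = "The caf\u00e9 was closed.".encode("latin-1")

print(classify_bytes(plain))
print(classify_bytes(accented))

# The single byte 0xE9 is read differently depending on the code page,
# which is why an 8-bit file can display wrongly on another system.
as_latin1 = b"\xe9".decode("latin-1")
as_cp1251 = b"\xe9".decode("cp1251")

# ascii() keeps the output printable on any terminal.
print(ascii(as_latin1), ascii(as_cp1251))
```

A file that classifies as "7-bit" reads the same under either code page, which is the case for the ASCII-only postings; only the 8-bit files depend on the reader assuming Latin-1.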

Re:XML please (Score:3, Informative)

by dvdeug ( 5033 ) writes: <dvdeug&email,ro> on Friday July 04, 2003 @08:28PM (#6370002)

And the conventions for math and science formulas and equations produces a complex linear format I can't believe anyone would actually want to read.

It's basically TeX, the one true math typesetting system. Most mathematicians and many scientists know it quite well. It beats the heck out of MathML (one example in a MathML tutorial was 8 characters in TeX, and about 50 in MathML).
