Project Gutenberg's 32nd Birthday
David Moynihan writes "July 4th marks the 32nd anniversary of that day in 1971 when Michael Hart first sped an all-caps version of the Declaration of Independence to anyone and everyone then on what later became the web, thus founding Project Gutenberg. Thanks to an army of volunteers and the Distributed Proofreaders, this is the last year PG will have fewer than 10,000 titles.
Strangely, Microsoft picked this dual anniversary of literacy and freedom to re-launch their Reader product, with three free bestsellers a week, if you activate the new version with Passport, sign a EULA, etc. Real reason for the upgrade might be that the DRM on MS's old Reader was cracked. If you're not into giving away data, or are running a system other than Windows, maybe you could take the time to tell a friend about free books online, or even help out by visiting the Distributed Proofreaders and editing one page per day."
More free books (Score:5, Informative)
It's a great way to introduce readers to a series or a talented new author.
Re:XML please (Score:5, Informative)
However, I agree that some books (most, actually) lose something in ASCII. What I would like to see is a project which works off the basic Gutenberg texts and formats them in a readable way, preserves illustrations, etc. But it should be an add-on to the project, not the main project. Also, remember that that level of preservation is much harder than just typing in and proofreading: you have to consider formatting and scanning images as well.
As a temporary measure, it would be nice to see someone do an XML markup that can be easily translated into LaTeX, so people can have pdfs with nice fonts, table of contents, title page, etc. That would be a step up. But to do it properly would take a separate effort, and a very large scale one even by Gutenberg standards. Worthwhile, yes. But involved.
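To make the XML-to-LaTeX idea concrete, here is a minimal Python sketch. The element names (`book`, `title`, `author`, `chapter`, `p`) are made up for illustration and are not any actual Project Gutenberg, HWG or PGXML schema; a real converter would also have to handle footnotes, emphasis, illustrations and so on.

```python
import xml.etree.ElementTree as ET

# Characters LaTeX treats specially, mapped to safe replacements.
_LATEX_ESCAPES = {
    "\\": r"\textbackslash{}",
    "&": r"\&", "%": r"\%", "$": r"\$", "#": r"\#",
    "_": r"\_", "{": r"\{", "}": r"\}",
    "~": r"\textasciitilde{}", "^": r"\textasciicircum{}",
}


def escape_latex(text):
    """Escape one character at a time so nothing gets escaped twice."""
    return "".join(_LATEX_ESCAPES.get(ch, ch) for ch in text)


def xml_to_latex(xml_text):
    """Turn a (hypothetical) <book> document into LaTeX source."""
    book = ET.fromstring(xml_text)
    out = [r"\documentclass{book}", r"\begin{document}"]
    out.append(r"\title{%s}" % escape_latex(book.findtext("title", default="")))
    out.append(r"\author{%s}" % escape_latex(book.findtext("author", default="")))
    out.append(r"\maketitle")
    out.append(r"\tableofcontents")
    for chapter in book.findall("chapter"):
        out.append(r"\chapter{%s}" % escape_latex(chapter.get("title", "")))
        for para in chapter.findall("p"):
            out.append(escape_latex((para.text or "").strip()))
            out.append("")  # a blank line ends the paragraph in LaTeX
    out.append(r"\end{document}")
    return "\n".join(out)
```

Running the result through pdflatex would then give the title page and table of contents; the fonts are a matter of which packages you add to the preamble.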
Re:XML please (Score:3, Informative)
Re:XML please (Score:5, Informative)
With that said, I believe that XML is perhaps going to have the staying power that ASCII text has had for the past many years. And there are many volunteer projects that you can get involved with that do this including:
The HTML Writers Guild [hwg.org] - Originally they were trying to convert all of the Gutenberg texts to HTML, which has admittedly been a reasonable standard for a good number of years. They are now moving to a version of XML with standard headings for titles, copyright info (or lack thereof), chapter headings and so forth. More is on their website.
Project Gutenberg XML [pgxml.org] - This is a group more dedicated to XML, but with a very similar purpose.
The point here is that once the data is in ASCII text format, projects like this can be, and are being, done. If you really want to help with the effort, please join one of these. You can also take the Project Gutenberg files yourself and do this at any time, but at least these groups give you a forum to share your work once you are done.
Re:XML please (Score:4, Informative)
About the XML: You are in fact welcome to produce an XML version; I believe some fellows at DP already do that. However, the main version is the simple text version, since you can read that with everything. But nothing keeps you from also posting an XML or PDF or TeX or whatever version.
belbo, post-processor at DP
(Boy I do hope there are no spelling errors in this *g*)
Thanks for support, plans for future (Score:5, Informative)
Lots of plans for the future:
Thanks especially to our main and backup distribution sites, iBiblio [ibiblio.org] and The Internet Archive [archive.org]. And thanks to the THOUSANDS of volunteers who have brought us nearly to our 10,000th eBook.
Dr. Gregory B. Newby
Chief Executive and Director
Project Gutenberg Literary Archive Foundation
http://gutenberg.net
A 501(c)(3) not-for-profit organization with EIN 64-6221541
gbnewby@pglaf.org
Re:'reader' books not much cheaper (Score:1, Informative)
The principle that protects you is not Fair Use, but First Sale Doctrine -- which says that once a copyright holder distributes a copy of a work, the copyright holder loses any right to control further redistribution of that copy.
First sale doctrine (Score:3, Informative)
The law specifically says you cannot distribute a work that is copyrighted without the copyright holder's permission.
True, 17 USC 106 [cornell.edu] says that, but it limits itself "Subject to sections 107 through 121", such as 17 USC 109 [cornell.edu]:
fair use laws, but the DMCA removed most of those
From the DMCA: "Nothing in this section [cornell.edu] shall affect rights, remedies, limitations, or defenses to copyright infringement, including fair use, under this title."
XML conversions look lacking. (Score:2, Informative)
Both look like the work of amateur do-gooders, and we need more of those; but these efforts should be folded back into the organisation of PG, where they may find a permanent home. The alternative is to go adrift, because too few people are involved (only _two_ people do PGXML) to round out the skills, and sustain the future efforts, that XML uber-format-goodness needs.
One major reason why I'd be interested in a longer toolchain, from scans into TXT, and TXT into XML, is to make translation easier. All the older Gutenberg etexts are in different, revised formats. Try making a parser that automagically transforms the dozen or so revisions of the one true "TXT" into XML to see what I mean. (I have; there will always be some books that break important Gutenberg formatting placeholders.)
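For anyone who wants to try, a rough Python sketch of the TXT-to-XML step might start like this. It assumes just one convention (paragraphs separated by blank lines, chapter headings that begin with an upper-case "CHAPTER"), which is my own simplification and not a documented PG format; the element names are hypothetical too. The older etexts will break it in exactly the ways described above.

```python
import re
import xml.etree.ElementTree as ET

# Assumed heading convention: a block that starts with upper-case "CHAPTER".
CHAPTER_RE = re.compile(r"^\s*CHAPTER\s+\S+.*$")


def txt_to_xml(text):
    """Convert blank-line-separated plain text into a simple <book> tree."""
    book = ET.Element("book")
    chapter = None
    # Paragraphs are assumed to be separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", text.strip()):
        # Re-join hard-wrapped lines into a single paragraph.
        para = " ".join(line.strip() for line in block.splitlines())
        if not para:
            continue
        if CHAPTER_RE.match(para):
            chapter = ET.SubElement(book, "chapter", title=para)
        else:
            if chapter is None:
                # Text before the first heading goes into an untitled chapter.
                chapter = ET.SubElement(book, "chapter", title="")
            ET.SubElement(chapter, "p").text = para
    return ET.tostring(book, encoding="unicode")
```

Every additional format revision means another heading pattern or another special case, which is why one parser for everything stays out of reach.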
Re:A sterling mistake (Score:3, Informative)
Excuse me? The Gutenberg people know quite well when they're using ASCII and when they're using Latin-1. If you'll look at the books that are posted, some of the books posted from DP are posted just in ASCII, and some in 7foo.txt and 8foo.txt files, where 7foo is ASCII and 8foo is Latin-1, and a few just in Latin-1.
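If you want to check a file yourself, the 7-bit versus 8-bit distinction is easy to test in Python. This is only a sketch: the function names are mine, and treating every 8-bit file as Latin-1 is an assumption, since any byte sequence decodes as Latin-1 without error and so the encoding cannot be positively detected.

```python
def is_seven_bit(data: bytes) -> bool:
    """True if every byte is in the 7-bit ASCII range (0x00-0x7F)."""
    return all(b < 0x80 for b in data)


def decode_etext(data: bytes) -> str:
    """Decode as ASCII when possible, otherwise assume Latin-1.

    The Latin-1 fallback is an assumption: it never raises, so a file
    in some other 8-bit encoding would be silently mis-decoded.
    """
    return data.decode("ascii" if is_seven_bit(data) else "latin-1")
```

Reading a file with `open(path, "rb").read()` and passing the bytes to `is_seven_bit` tells you which kind you have.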
Re:XML please (Score:3, Informative)
It's basically TeX, the one true math typesetting system. Most mathematicians and many scientists know it quite well. It beats the heck out of MathML (one example in a MathML tutorial was 8 characters in TeX, and about 50 in MathML.)
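To get a feel for the size difference, here is a small comparison of my own (not the tutorial example mentioned above): a simple fraction a/b written in TeX and in presentation MathML, with Python counting the characters.

```python
# The same fraction, a over b, in both notations.
tex = r"\frac{a}{b}"
mathml = "<math><mfrac><mi>a</mi><mi>b</mi></mfrac></math>"

# Compare the length of the two encodings.
print(len(tex), "characters in TeX")
print(len(mathml), "characters in MathML")
```

The MathML string comes out more than four times as long, and the gap widens as expressions nest.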