Project Gutenberg's 32nd Birthday


Posted by timothy on Friday July 04, 2003 @02:04PM from the read-franklin's-autobiography dept.

David Moynihan writes "July 4th marks the 32nd anniversary of that day in 1971 when Michael Hart first sped an all-caps version of the Declaration of Independence to anyone and everyone then on what later became the web, thus founding Project Gutenberg. Thanks to an army of volunteers and the Distributed Proofreaders, this is the last year PG will have fewer than 10,000 titles. Strangely, Microsoft picked this dual anniversary of literacy and freedom to re-launch their Reader product, with three free bestsellers a week, if you activate the new version with Passport, sign a EULA, etc. Real reason for the upgrade might be that the DRM on MS's old Reader was cracked. If you're not into giving away data, or are running a system other than Windows, maybe you could take the time to tell a friend about free books online, or even help out by visiting the Distributed Proofers and editing one page per day."


This discussion has been archived. No new comments can be posted.


Now for the marketing... (Score:5, Insightful)

by Blaine Hilton ( 626259 ) * writes: on Friday July 04, 2003 @02:08PM (#6368272) Homepage

Now all we need is more people promoting this in schools and printing the books, much like the IA Bookmobile [archive.org]. It seems like the people who could use this the most don't even know it exists.

Re:Now for the marketing... (Score:1, Insightful)

by Googa ( 608200 ) writes: on Friday July 04, 2003 @02:17PM (#6368321)

Yes, I can agree with this. We people here won't benefit from it half as much as needy school districts who could use the texts. Methinks what they really need to do is work on some awareness program, distributing the books to teachers... or even letting them know that such a resource exists. With more technology in the classroom, Gutenberg shouldn't be out of reach to many teachers.

'reader' books not much cheaper (Score:4, Insightful)

by Chmarr ( 18662 ) writes: on Friday July 04, 2003 @02:19PM (#6368332)

Just on a whim, I decided to see how much cheaper titles in Microsoft Reader format were than a physical book.

I went to the MS Reader site and followed the links to the online publishers' sites (such as B&N and Amazon). In most cases, the Reader format is only $1 cheaper, and sometimes $2 more expensive, than the corresponding paper book (soft or hardcover).

So... why in the world would anyone want to use a format that ties them to the computer?? With a paperback, I can read it anywhere, read for as long as I want without having to change batteries, and even pass the book onto a friend.

If they want to make the electronic formats more attractive, they need to make them a LOT cheaper than the corresponding paper version.

Huh??? (Score:2, Insightful)

by lilricky ( 632829 ) writes: on Friday July 04, 2003 @02:20PM (#6368335)

"...to anyone and everyone then on what later became the web..." What?? In 1971 the HTTP protocol was around? Or is the author trying to suggest that the internet became the web? I thought the web was part of the internet, not a replacement for it. Perhaps I'm misreading the article.

XML please (Score:4, Insightful)

by DrXym ( 126579 ) writes: on Friday July 04, 2003 @02:23PM (#6368352)

Gutenberg is great and all, but it really needs to dump the text format. So much information is lost that it makes reading some texts extremely difficult. Some format that preserved chapter headings, footnotes, illustrations etc. would be a massive step forward.

Re:Really great work by the guys behind the projec (Score:4, Insightful)

by Cthefuture ( 665326 ) writes: on Friday July 04, 2003 @02:45PM (#6368447)

Yes, they need something like that badly.

I remember poking around on PG not long ago but soon forgot about it.

If you're not looking for something specific then the site is kinda, meh. As you suggested, they need a news site, ratings, and other stats so you can see what's available.

And sections. "Technical", "Poetry", etc. Otherwise it's not very useful to the casual browser.

Re:XML please (Score:4, Insightful)

by Eloquence ( 144160 ) writes: on Friday July 04, 2003 @02:50PM (#6368480)

I can't reasonably read a book that is filled with XML tags and if there is no longer software to parse them then it's not too useful.
This is complete bullshit. With a proper setup you would convert the source into multiple output formats, including TXT, but you would keep the source in a format that maintains meta information such as formatting, chapters and pages. XML is used in the entire industry exactly with the expectation that it will be around for decades. Even if it won't, the open source code that we have to parse it will not magically disappear -- PG would keep using it to generate output texts from the XML source through all these years. You might as well argue that ASCII will go away.

Re:XML please (Score:2, Insightful)

by GigsVT ( 208848 ) writes: on Friday July 04, 2003 @02:55PM (#6368503) Journal

With a proper setup you could read MS Word 2000 docs 100 years from now too. The whole point is to not make it reliant on any particular software, or any particular fad.

XML hasn't been around long enough to say whether it is a fad or not. ASCII has been around longer than most of us have existed.

Re:XML please (Score:5, Insightful)

by fm6 ( 162816 ) writes: on Friday July 04, 2003 @03:02PM (#6368533) Homepage Journal

The whole point of ASCII is that it can be accessed simply, by almost any machine.

Just because you store something in XML, doesn't mean people have to use XML to read it. The whole point of XML is to have a format that you can easily transform. Transforming into ASCII is particularly easy.

XML markup that can be easily translated into LaTeX

If it's a good content-oriented XML app, it's easily transformed into LaTeX, or anything else. If it isn't a good content-oriented XML app (the StarOffice native format comes to mind) then it shouldn't be used for an online document repository.
I think the basic problem with the Gutenberg/DP people is that they've been doing things a certain way for so long, and they don't want to retool. And I can see their point -- changing over to XML is a lot of work. And the core DP team already seems pretty busy keeping the web site going.
On the other hand, I do wish they'd make it a priority. Right now I'm a volunteer proofreader, concentrating on getting out the famous Britannica 11th edition [wikipedia.org]. The amount of information that gets lost in scanning in Greek and other text with weird phonological conventions is just appalling. And the conventions for math and science formulas and equations produce a complex linear format I can't believe anyone would actually want to read.
Then again, it wouldn't be that hard to go back and insert proper markup. For 90% of the text there's a simple transform between the Gutenberg conventions and a reasonable XML format. The other 10% probably need another look anyway, and wouldn't be hard to do if they've saved the scan images. I haven't had the heart to ask if they do.

Re:XML please (Score:3, Insightful)

by Vann_v2 ( 213760 ) writes: on Friday July 04, 2003 @03:07PM (#6368554) Homepage

With some works the layout itself is an important part in comprehending them. To blindly remove the formatting so that everyone can read it is an injustice to the original author.

Re:XML please (Score:4, Insightful)

by DrXym ( 126579 ) writes: on Friday July 04, 2003 @03:10PM (#6368568)

Yeah but the entire point of XML is that it defines structure not presentation. If you want to go off and produce something which is readable in some other format (e.g. text), feed the document through some XSL transformation or Perl script and it pops out the other end in any way you desire. Someone else can feed it through something that produces a PDF, someone else a Palm e-Book, someone else braille. And this can all be automated on the server. Everyone is happy.

As for XML being long dead, this is highly unlikely. XML is just structured data and is itself just text. It would be trivial 5, 10, or even 100 years from now to pull out the data from the XML format in any way you please. Unless the grammar is horribly mangled (MS Office), it would even be possible to infer it without even knowing the grammar. I would trust Gutenberg to collectively come up with a format which would be simple for proofreaders and parsers alike.
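
To make the "one source, many outputs" idea concrete, here is a minimal Python sketch that renders a single structured source as both plain text and HTML. The element names (`book`, `chapter`, `title`, `p`, `em`) are invented for this example and are not an actual Project Gutenberg or Distributed Proofreaders format; a real pipeline would more likely use XSLT and a fuller schema.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical markup, made up for illustration only.
SOURCE = """<book>
  <chapter>
    <title>Chapter 1</title>
    <p>It was a <em>bright</em> cold day in April.</p>
  </chapter>
</book>"""


def render_inline(node, plain, emphasis):
    """Flatten one paragraph; plain() handles normal text, emphasis() handles <em>."""
    parts = [plain(node.text or "")]
    for child in node:
        inner = "".join(child.itertext())
        parts.append(emphasis(inner) if child.tag == "em" else plain(inner))
        parts.append(plain(child.tail or ""))
    return "".join(parts)


def to_text(root):
    """Plain-text output: emphasis is kept as _underscores_ instead of being lost."""
    lines = []
    for chapter in root.iter("chapter"):
        lines.append(chapter.findtext("title", default=""))
        lines.append("")
        for p in chapter.iter("p"):
            lines.append(render_inline(p, str, lambda s: "_" + s + "_"))
            lines.append("")
    return "\n".join(lines).rstrip() + "\n"


def to_html(root):
    """HTML output generated from the same source tree."""
    out = []
    for chapter in root.iter("chapter"):
        out.append("<h2>" + escape(chapter.findtext("title", default="")) + "</h2>")
        for p in chapter.iter("p"):
            body = render_inline(p, escape, lambda s: "<em>" + escape(s) + "</em>")
            out.append("<p>" + body + "</p>")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    tree = ET.fromstring(SOURCE)
    print(to_text(tree))
    print(to_html(tree))
```

Both renderers walk the same tree, so adding another output (say, LaTeX) would mean one more small function rather than another round of hand editing.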

Re:Now for the marketing... (Score:2, Insightful)

by AndroidCat ( 229562 ) writes: on Friday July 04, 2003 @03:18PM (#6368610) Homepage

and speedy internet connection
The first Gutenberg books I came across were being passed around BBSs at 2400 bps or so. When they started 32 years ago, 110, maybe 300 bps. Who cares? Check the size of the files, these aren't Word documents, you know.

Re:Huh??? (Score:3, Insightful)

by dissy ( 172727 ) writes: on Friday July 04, 2003 @03:22PM (#6368629)

> "...to anyone and everyone then on what later became the web..." What??

I think they are saying in 1971 it was distributed to anyone and everyone...
Then, on what later became the web, they distributed it there too.

Keeping in mind the web ripped most of its ideas from gopher, and FTP before that, so the web wasn't a breakthrough idea out of nothingness.
But I don't think they meant it as 'distributed on one medium which later that medium turned into the web'

That's at least how I believe it was supposed to be read.. Hard to tell without commas and what not ;}

We should all actually read this (Score:5, Insightful)

by tie_guy_matt ( 176397 ) writes: on Friday July 04, 2003 @03:33PM (#6368692)

Putting a flag on your front porch is a great way to celebrate the 4th of July. An even better way to celebrate the United States' birthday would be to go to this site and actually read the documents that define us as a country.

In this day and age when it seems everyone is a suspected terrorist and our liberties are stripped one by one in the name of homeland security, and in the name of the rights of large companies, I wish some of our elected officials would actually read these documents sometime.

A red white and blue flag isn't what makes this country great, nor does an extremely high gross domestic product -- it is the set of ideas that were written over 200 years ago that makes the USA great.

So everyone go to this site and read those documents. Even if you aren't American you should still read those documents because everyone has the right to the freedoms that our founding fathers wrote about.

Cheaper, but useful? (Score:3, Insightful)

by yerricde ( 125198 ) writes: on Friday July 04, 2003 @04:21PM (#6368896) Homepage Journal

A speedy internet connection and tons of computers wouldn't be needed to print out documents from Gutenberg.

It still costs money to turn downloaded digital copies of works into printed copies for 100 students in a grade level.

they would realize that it would be cheaper in the long run to get texts off Gutenberg, instead of buying pre-bound books elsewhere.

Public domain etexts, such as those offered by Project Gutenberg, would be useful in schools only under limited circumstances. Though they would be useful in literature classes in high school (and possibly middle school), forget about them in elementary school, where most books are illustrated, because most PG editions leave out illustrations. Forget about them in science classes as well; the 1911 Encyclopaedia Britannica [1911encyclopedia.com] contains outdated views of anything scientific, and anything significantly newer is tied up forever in the Bono Act and its obligatory sequels. And what keeps a publisher from tying purchases of its science books to purchases of its literature books?

Re:XML please (Score:5, Insightful)

by fm6 ( 162816 ) writes: on Friday July 04, 2003 @04:59PM (#6369068) Homepage Journal

... that plain old ASCII is one constant that hasn't needed changing.

I think you're a little unclear as to what ASCII is. As the "A" in "ASCII" indicates, it's oriented towards American applications. And it consists of a mere 128 characters, which includes 33 control characters that you don't use in text.
In point of fact, Project Gutenberg has long outgrown the 95 graphic characters in ASCII, though I think they themselves are ignorant of the fact. They seem to have experimented with characters until they found a set that displays the same on "normal" Windows, Macs and Unix/Linux. The result is something they call "extended ASCII" but that's actually a subset of both ISO's Latin1 character set [czyborra.com] and Microsoft's Latin1 code page [microsoft.com].
When is this an issue? Well, I'm a DP volunteer, and I'm concentrating on the Britannica 11th edition. Lots of geographic entries, all of which contain degree symbols. This symbol is not in ASCII! If you follow the DP instructions, you end up entering byte 186 (decimal). If you're using the ISO or Microsoft Latin1 set (and if your computer is localized for the U.S., Canada, or Western Europe, you probably are) then 186 does in fact display as a degree symbol. But if your system is localized for Eastern Europe, you're probably using Latin2, and this byte stands for an S with a cedilla accent!
In short, "ASCII" is actually less universal than well-formed HTML, in which you represent the degree symbol with a character entity (&deg;) that's the same everywhere.
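
The point about one byte meaning different things in different code pages can be checked directly. Below is a small Python sketch that decodes byte 186 under Latin1, the Windows Latin1 code page, and Latin2; the helper name `decode_byte` is made up for the example. One caveat worth stating plainly: in Latin1, byte 186 is strictly the masculine ordinal indicator, which looks much like a degree sign, while the degree sign proper is byte 176.

```python
import html
import unicodedata


def decode_byte(value, encoding):
    """Interpret a single byte value under the given single-byte encoding."""
    return bytes([value]).decode(encoding)


if __name__ == "__main__":
    # The same byte, read under three different code pages.
    for enc in ("iso-8859-1", "cp1252", "iso-8859-2"):
        ch = decode_byte(186, enc)
        print(enc, repr(ch), unicodedata.name(ch))

    # The degree sign proper is byte 176 in Latin1.
    print(repr(decode_byte(176, "iso-8859-1")))

    # Plain 7-bit ASCII has no meaning at all for byte 186.
    try:
        decode_byte(186, "ascii")
    except UnicodeDecodeError as err:
        print("not ASCII:", err.reason)

    # The HTML entity names the character itself, independent of any code page.
    print(repr(html.unescape("&deg;")))
```

The entity route sidesteps the problem because `&deg;` is spelled entirely in 7-bit ASCII and refers to the character rather than to a byte value.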
Indeed, you can open up the original Declaration of Independence document with your standard web browser, and you can still read it just fine.

Hardly a representative example. The Declaration of Independence [archives.gov] was hand-written, and thus doesn't include a lot of fancy fonts or formatting. A better example is a contemporary novel, such as 1984.
As it happens I just finished re-reading this one. I read a Plucker [plkr.org] file that somebody had transformed from an HTML version [adelaide.edu.au], which in turn came from the Project Gutenberg "ASCII" version. Readable enough. But all the typographic niceties -- italics, boldface, etc. -- were reduced to ALL CAPS in the text version, and that was retained in the HTML version. Pretty distracting -- made me feel like somebody was shouting at me. Double Plus Ungood! Thoughtcrime!
...once the data is put into ASCII text format, projects like this [XML] can and are being done.

You make it sound easy. A lot of information is lost when your primary version is "ASCII". It all has to be put back by hand. There's no avoiding this for the large body of existing Gutenberg texts. And of course as recently as 5 years ago, there wasn't a real choice anyway. Even HTML had issues, and serious XML tools didn't exist.
But now XML technology is pretty mature. It makes sense to store new Gutenberg texts in XML. If people still want "ASCII" copies, the XML is easily transformed into that. Though I suspect a lot more people will want the HTML version -- a format which is actually accessible to more people than "ASCII".
There are two reasons this won't happen soon.
The first is that somebody will have to design and implement the necessary XML apps for inputting and proofreading the texts. (Which would also eliminate a lot of the errors proofreaders make, like entering [Greek: Tau] when they mean [Greek: T].) A huge project. As it stands, the people who maintain the DP web site have their work cut out just to keep the existing software working. That's a vali
Read the rest of this comment...

Size (Score:3, Insightful)

by Beryllium Sphere(tm) ( 193358 ) writes: on Friday July 04, 2003 @05:20PM (#6369144) Journal

My entire CD collection fits in my pocket with my iPod. If I could fit my entire book collection in my pocket, that would be a dream and a delight.

Re:XML please (Score:3, Insightful)

by jeremyp ( 130771 ) writes: on Friday July 04, 2003 @05:29PM (#6369185) Homepage Journal

Using ASCII presupposes that all the important texts you want to preserve are in American English. Since a fair amount of the important pieces of literature come from mainland Europe (actually even the British £ sign isn't in ASCII), it is clearly not up to the job and should be replaced.
Further, authors often use devices like italics or bold to add emphasis to their work and nowadays even completely different fonts and typefaces. Translating these works to ASCII with no markup actually destroys some of the information in the original works.
I'm not an enthusiastic fan of XML - too many people advocate it as a silver bullet - but this sort of thing seems to be an ideal application.

Re:XML please (Score:3, Insightful)

by Mr. Piddle ( 567882 ) writes: on Friday July 04, 2003 @05:31PM (#6369198)

You might as well argue that ASCII will go away.

ASCII is simply 127 or 255 characters or so. Writing software to translate it is trivial, and it can even be decoded by hand, if necessary.

XML adds a lot of complexity beyond this, which hampers a person's ability to read a file with practically no software tools.

Also, XML is not as ubiquitous as you think, and huge numbers of people don't know how to use the tools to work with it.

A sterling mistake (Score:3, Insightful)

by fm6 ( 162816 ) writes: on Friday July 04, 2003 @05:56PM (#6369315) Homepage Journal

Since a fair amount of the important pieces of literature come from mainland Europe (actually even the British £ sign isn't in ASCII), it is clearly not up to the job and should be replaced.

As a matter of fact, the DP web interface allows you to enter the pound sterling symbol even if you don't have it on your keyboard. It also has a lot of accented characters that aren't in English. The fact is that the Gutenberg people think they're using ASCII, but are actually using Latin1. So Gutenberg texts will display correctly on any system that's localized for the U.S., Canada, or Western Europe. But not elsewhere.
You made a similar mistake when you entered that character, since you just entered it from your keyboard. (A natural mistake if you have a British keyboard, as I assume you do.) On some web sites, this would only read correctly on systems similarly configured. However, Slashdot puts out the header:

Content-Type: text/html; charset=iso-8859-1

which should prevent that. Still, the character entity &pound; is more portable, and will work even when the web page doesn't specify a character set -- and most do not.
On the other hand, Slashcode sometimes mangles eight-bit characters when it archives them. So if you seek true immortality, use the character entity!
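
For anyone who wants to apply that advice mechanically, here is a short Python sketch that rewrites non-ASCII characters as HTML character references, so the result is pure 7-bit text. The function name `to_entities` is invented for the example, and it relies on the standard library's HTML 4 entity table; characters without a named entity fall back to a numeric reference.

```python
import html
from html.entities import codepoint2name


def to_entities(text):
    """Return text as pure ASCII, using HTML character references where needed."""
    # Escape &, < and > first so the output round-trips through html.unescape().
    safe = html.escape(text, quote=False)
    out = []
    for ch in safe:
        code = ord(ch)
        if code < 128:
            out.append(ch)                               # plain ASCII passes through
        elif code in codepoint2name:
            out.append("&" + codepoint2name[code] + ";")  # named entity, e.g. &pound;
        else:
            out.append("&#%d;" % code)                   # numeric fallback
    return "".join(out)


if __name__ == "__main__":
    original = "\u00a35 at 51\u00b0 N"
    encoded = to_entities(original)
    print(encoded)
    # The encoded form survives an ASCII-only channel and decodes back unchanged.
    print(html.unescape(encoded.encode("ascii").decode("ascii")) == original)
```

Since the output contains nothing above byte 127, it reads the same whether the receiving system assumes Latin1, Latin2, or no character set at all.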

How to spread the word... (Score:5, Insightful)

by evilviper ( 135110 ) writes: on Friday July 04, 2003 @11:27PM (#6370651) Journal

Here's what I did...

A while back, I used wget to mirror the entire Project Gutenberg works. (I did it off-hours, and contacted them to see if it was a problem, or if there was some other more efficient way to do things.)

Anyhow, with my GBs of text, I used bzip2 -9 to compress each text file. In the end, the entire collection of PG was able to fit on one CD. Since most people don't have bzip2 support, I also included the free archiver, Ultimate Zip [ultimatezip.com], on the CD as well. I also put a read-me on the CD (that would appear as the first file) with basic instructions on what to do.
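
For reference, the per-file compression step can be approximated with Python's standard `bz2` module, where `compresslevel=9` corresponds to the `-9` setting. This is only a sketch of the idea, not the commands actually used; the helper names `pack` and `compress_file` are made up, and unlike the `bzip2` command line tool, `compress_file` leaves the original file in place.

```python
import bz2
from pathlib import Path


def pack(data):
    """Compress a bytes object at the highest bzip2 level."""
    return bz2.compress(data, compresslevel=9)


def compress_file(path):
    """Write path + '.bz2' next to the original and return the new path."""
    src = Path(path)
    dst = src.with_name(src.name + ".bz2")
    dst.write_bytes(pack(src.read_bytes()))
    return dst


if __name__ == "__main__":
    # Stand-in text; a real run would loop over the mirrored .txt files.
    sample = ("When in the Course of human events\n" * 500).encode("ascii")
    packed = pack(sample)
    print(len(sample), "bytes ->", len(packed), "bytes")
    assert bz2.decompress(packed) == sample
```

Highly repetitive sample text like this compresses far better than real prose, so the printed ratio says nothing about how much space an actual collection would save.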

One of the great things about CDs is how easy they are to transfer... One stamp, and a 5-cent CD envelope, and you can send 2 CDs anywhere in the country (this predated Netflix AFAIK).

Anyhow, I sent these CDs to two different people, and the next time I talked with them, I found out they'd made several copies of it. Basically, they heard someone mention some subject that related to one of the files on the CD, brought up the CD, and offered to make a copy for them. This happened a few times that I know of, and quite possibly many times that I don't know of. Quite an easy way to spread the word.

Of course, with that said, I don't read the PG texts myself... There are two reasons. The first is that I have yet to come across decent software designed for long-term reading. Something that saves your place (automatically?), something with a legible font, and something with light colored text on a dark background, which brings me to my next point...

The second reason is that monitors are all backlit... That means, reading on a computer screen is like reading text on a fluorescent lightbulb. It's possible for a while, but your eyes are quickly fatigued. The only screen I have that doesn't do that is my 640x240 B&W LCD screen on my Psion handheld. As good as that is, it's just too small for effective reading. Someone needs to create a non-backlit LCD screen, approx 6" (about the size of a book page) that is small, light, silent, compatible with everything, and most importantly, it needs to have good software that makes reading less work than it normally is on a computer... Until then, electronic reading isn't going to really be feasible. Screw electronic paper, just give me a screen that doesn't hurt my eyes, and I'm set to go.
