Digital Future of the Library of Congress
lesinator writes "On Monday the 28th the US Library of Congress is holding the eighth lecture in its series on Managing Knowledge and Creativity in a Digital Context. Previous speakers include David Weinberger on blogging, Brewster Kahle, founding member of archive.org and the Wayback Machine, and Lawrence Lessig on intellectual property and the Creative Commons. After the lecture, questions will be taken from the audience and the internet. C-Span will be broadcasting the lecture live at 6:30 PM EST, and also has archives of previous lectures. Audio archives of previous lectures are available at Audible.com in the Selected Free Media section."
At last! (Score:3, Funny)
Re:At last! (Score:5, Insightful)
Would you store each page of each book as an image? As flat ASCII text (except for pictures and diagrams, of course!)? What kind of indexing would you do? Basic indexing of book names? Full-text indexing of the contents? All that storage adds up!
In summary, the Library of Congress (depending on the method used) could probably fit into something ranging from a couple of gigabytes to a couple of petabytes.
Re:At last! (Score:5, Interesting)
They might even be able to generate revenue by making the ASCII text freely available and searchable, while charging for the images. That way folks just interested in the text can find it easily, while scholars and others who need to see the source material can have access at a moderate price.
Yes, and yet...no. (Score:3, Insightful)
Of course, your second paragraph shows that clearly those assumptions can't be true -- why would someone pay more for something without an additional benefit?
And you wouldn't maintain separate databases -- pictures aren't searchable. You'd want to use any OCRd (preferably vetted afterwards) as the basis for inde
Re:Yes, and yet...no. (Score:2)
Re:At last! (Score:2)
So keeping the scanned images shouldn't really require such a tremendous amount of space.
Re:At last! (Score:2)
Indexing by what: subject, author, and title? 1% overhead at most. Fancier Google-esque searching, though, could be a big hit.
And correct me if I'm wrong, but there are quite a few videos too.
Not to mention some historical stuff that can't even be digitized all that wel
Re:At last! (Score:3, Interesting)
I was working on this project just a few years back (2001-2002).
Our estimates projected that by 2005, it would take about 4 TB of digitization EACH day to keep pace.
The first storage phase called for a 180 TB server.
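As a rough check on those two figures, and assuming (hypothetically) that the 4 TB/day rate held constant from the first day, a 180 TB first phase would last about a month and a half. A minimal sketch of the arithmetic; the function name is made up for illustration:

```python
# Back-of-the-envelope check. The constant ingest rate is an assumption,
# not something the project estimates stated.
DAILY_INGEST_TB = 4   # projected digitization per day by 2005
PHASE_ONE_TB = 180    # capacity of the first storage phase


def days_until_full(capacity_tb: float, ingest_tb_per_day: float) -> float:
    """Return how many days of ingest a given capacity can absorb."""
    if ingest_tb_per_day <= 0:
        raise ValueError("ingest rate must be positive")
    return capacity_tb / ingest_tb_per_day


print(days_until_full(PHASE_ONE_TB, DAILY_INGEST_TB))  # 45.0
```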
Small representations. (Score:3, Interesting)
'Course, the problem is that these representations work if you're entering in the content with that method in the first place.
--grendel drago
Re:At last! (Score:1)
Re:At last! (Score:2, Insightful)
The "20 TB" figure comes from the smallest possible measure, treating the flat books as ASCII text. Even just considering current digital content, it's also inaccurately small by >1 order of magnitude.
It's a really really really big library.
Re:At last! (Score:3, Interesting)
What was the first reference / usage of ``LoC'' as a unit of knowledge measurement?
The first time I recall seeing it was in Michael Gear's novels, _The Artifact_ if memory serves, ~1976.
Anyone have an earlier instance?
William
Re:At last! (Score:2)
10 Terabytes: Printed collection of the U. S. Library of Congress
Re:At last! (Score:1, Flamebait)
Re:At last! (Score:1)
Re:At last! (Score:1)
Here's an idea related to audio archiving (Score:5, Insightful)
Re:Here's an idea related to audio archiving (Score:2)
Re:Here's an idea related to audio archiving (Score:2)
"I am your father's brother's nephew's cousin's former roommate."
Dude, I bremembah you now. Why didn't ya say so sooner?
all the best,
drew
http://www.archive.org/search.php?query=creator%3
That's the right idea .. carry it further (Score:5, Insightful)
The Library of Congress should insist that all 'publications' be submitted to it in open formats. What good is it if they have something on file that nobody can read? The extreme is that they have to have a licensed copy of every piece of software that ever created a file. If all the formats have to be open then at least historians can cobble together something that can read a file of interest.
With the IP laws as stupid as they are now, we run the real risk of losing the record of our age.
Re:That's the right idea .. carry it further (Score:1, Insightful)
I wouldn't say nobody. The paying members of a private club would be able to read it.
Re:That's the right idea .. carry it further (Score:2)
The government would then have to get into some eminent domain type takings. Right?
all the best,
drew
Re:That's the right idea .. carry it further (Score:3, Insightful)
Why even have it on any digital media? I want the original records. Screw having computerized copies. This is the nation's library, where a copy of everything in its original form must be.
I have no problem with the card catalogue system
DRM and archiving are so diametrically opposed... (Score:4, Insightful)
I think the obvious solution is to archive it in a non-DRM, non-proprietary format, but transcode to a DRM/proprietary format when retrieved, if the content is not in the public domain.
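A minimal sketch of that idea, with every name hypothetical: the archive keeps one open-format master per item and only wraps it at retrieval time when the work is still under copyright. `apply_drm` is a stand-in, not a real DRM scheme.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArchivedItem:
    title: str
    public_domain: bool
    master: bytes  # stored once, in an open, non-DRM format


def apply_drm(data: bytes) -> bytes:
    """Placeholder for a real DRM/proprietary transcoder (hypothetical)."""
    return b"DRM:" + data


def retrieve(item: ArchivedItem) -> bytes:
    """Serve the open master as-is if public domain, else a wrapped copy."""
    if item.public_domain:
        return item.master
    return apply_drm(item.master)
```

The stored master is never modified, so when a work's copyright expires only the `public_domain` flag has to change.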
Re:Here's an idea related to audio archiving (Score:2)
all the best,
drew
I was indeed taught that "beggars can't be choosers," but I am not begging, just giving "a word to the wise."
http://www.archive.org/audio/audio-details-db.php?collection=opensource_audio&collectionid=JohnConstantakisdrewRobertsRainwaterBlues [archive.org]
Dammit! (Score:2, Insightful)
Re:Dammit! (Score:1)
Re:Dammit! (Score:1)
Re:Dammit! (Score:5, Funny)
Re:Dammit! (Score:1, Funny)
Re:Dammit! (Score:1)
-----
Check out the Uncyclopedia.org [uncyclopedia.org]:
The only wiki source for politically incorrect non-information about things like Kitten Huffing [uncyclopedia.org] and Pong! the Movie [uncyclopedia.org]!
Re:Dammit! (Score:2)
From the submitter: C-Span will be broadcasting the lecture live at 6:30 PM EST, and also has archives of previous lectures.
Well, if it's a LIVE broadcast, I'm pretty sure that C-Span will have to air it at whatever time the lecture will be happening. =]
Chill out, they archive their broadcasts.
Re:Dammit! (Score:2)
Nice, but how long? (Score:2, Funny)
Anyone have a good approximation? I'd like to know in Burning Libraries of Congress (BLC) please.
I'm guessing somewhere around 10-200 BLC.
Re:Nice, but how long? (Score:4, Interesting)
I'm not quite sure about the length of a BLOC, but this is a job for not-quite-manual labor. Each book requires a simple task: Scan page 1, flip page, scan page 2, page 3, flip, ad infinitum.
One way to save time would be to contact the publishers of any book made after 1985-ish, where you can get electronic copies from the author. Some older books may have been already digitized, but it's still going to take more than 25 years unless there's a massive army working on this.
Re:Nice, but how long? (Score:4, Informative)
Also, you would generally split the load between 4-6 of these scanners for a job this big. The software is automated, and will OCR/Convert/Archive the file in one step.
As a general rule, you can fit 10,000 b/w text pages in 1GB of storage.
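That rule of thumb works out to roughly 100 KB per page. A small sketch of how it scales; the book count and pages-per-book figures below are hypothetical inputs chosen only to illustrate the arithmetic, not actual Library of Congress numbers:

```python
PAGES_PER_GB = 10_000  # rule of thumb from the comment above (b/w text pages)


def scan_storage_gb(books: int, pages_per_book: int) -> float:
    """Estimate storage in GB for b/w page scans at 10,000 pages per GB."""
    return books * pages_per_book / PAGES_PER_GB


# Hypothetical inputs: one million books at 300 pages each.
print(scan_storage_gb(books=1_000_000, pages_per_book=300))  # 30000.0
```

Under those assumed inputs the scans come to 30,000 GB, or about 30 TB. Color or greyscale scanning would need considerably more.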
Re:Nice, but how long? (Score:1)
Re:Nice, but how long? (Score:2, Informative)
They are very expensive, but cool as hell.
Re:Nice, but how long? (Score:1)
As an unrelated aside, some even scan in color, but your storage requirements go way up if you do anything other than bitonal (even greyscale eats up the bytes pretty quick).
Re:Nice, but how long? (Score:1)
Cut the binding off?
Re:Nice, but how long? (Score:2)
Uhmm, no. You cut the binding off and run the pages through a document feeder, then rebind the book, using these things that some people refer to as "machines."
Some ideas (Score:5, Insightful)
i) What if the Apostles had had technological means to prevent the reproduction of the New Testament?
ii) Would our culture be diminished if the people who rediscovered Beowulf had been unable to decrypt the manuscript?
iii) Is the continual repetition and reworking of myth and fable through the Oral Tradition disrespectful of the content creators who first recorded these stories?
Re:Some ideas (Score:4, Insightful)
They wouldn't have prevented the distribution of the story it was their mission to distribute, that's for sure.
Re:Some ideas (Score:1)
Re:Some ideas (Score:4, Interesting)
In 1954, the American "New International" edition just edited the trial dialog and "re-interpreted" "it is you who said it" into "I am the Son of God." I don't think the European and Catholic churches have edited that part yet.
Re:Some ideas (Score:1)
Publication of New Testament (Score:3, Interesting)
Re:Publication of [The New Balancing Act] (Score:1)
God is on the side of DRM (Score:2)
Re:Some ideas (Score:2)
iv) Why do people of oral traditions get no legal protections for their work? (From those outside their tradition who would fix it and lock them out from their own work?) Why must it be fixed?
I know that is at least halfway to zany, but please try to give a halfway to reasonable answer.
all the best,
drew
Re:Some ideas (Score:2)
Next series (Score:4, Funny)
Sponsored by Apple and Microsoft!
Re:Next series (Score:1)
More like: Sponsored by the RIAA and Microsoft!
If you aren't happy with the DRM on the iTMS songs, I suggest the HYMN project [hymn-project.org].
Hello, Project Gutenberg?!? (Score:5, Interesting)
Michael Hart was digitizing books before digitizing books was cool, as far back as 1971, and the Project's efforts have been hugely successful on very little money. Nevertheless, I rarely see any official or media acknowledgment of the Project's efforts. If anyone should be on that panel for their ability to give advice from practical experience and performance in this field, while on a shoestring budget, it would be Hart!
No money is precisely Why (Score:1)
The business of charity does not want competition from groups that create better products for less money, as that would put pressure on them to create a reasonable amount of value themselves, without the benefits of cushy offices and hefty salaries.
The business of education also does not want competition from organizations that produce greater value at
Conspiracy much? (Score:2)
--grendel drago
Re:Conspiracy not so much? (Score:1)
(1) People in many "charitable" organizations and "educational" establishments are quite corrupt; or
(2) People in many "charitable" organizations and "educational" establishments are amazingly, astoundingly stupid.
Neither bodes well, but only corruption seems to explain all the facts, especially in the case of the "education" establishment.
Baldur of Asgard
Re:Conspiracy not so much? (Score:1)
(3) People in many charitable organizations are out DOING charity, not talking about it. Kind of like Project Gutenberg.
I suspect it's the (3)s that make charity work, and make people want to keep it alive, but it's the (1)s that make the most noise and draw the most money.
IMHO there's an unfortunately large class of people who specialize in smelling the flow of money, and inserting themselves into that flow. The world would be for the most part better off without them.
Re:Hello, Project Gutenberg?!? (Score:1)
Re:Hello, Project Gutenberg?!? (Score:1)
Re:The problem I see with Project Gutenberg... (Score:2, Informative)
(2) A person who transcribes a book that is in the public domain can CLAIM a copyright on it, but this is not enforceable unless they have changed the text significantly enough for it to be
Outsource parts of LOC to Google or Amazon? (Score:5, Insightful)
Although LOC could never be replaced by a Google or Amazon, these private companies could provide services that augment or reduce the cost of LOC-like services. For example, if Amazon scans a book, why should LOC scan it too?
Re:Outsource parts of LOC to Google or Amazon? (Score:2, Interesting)
Re:Outsource parts of LOC to Google or Amazon? (Score:1)
Recent IP law allows copyright on aggregations of data even if the data itself is public domain. So if Google were to digitally archive a bunch of public domain books (copyright expired on each book) then the searchable database could still be copyrighted and owned by Google.
In order to outsource the digitization of the collection to a private company, the LOC would have to license its own collection back from that company!
Re:Profit (Score:2)
Dude, that is some business plan/method! Did you try to patent it yet?
all the best,
drew
I would have given you +1 Funny
What about a backup copy? (Score:4, Interesting)
Re:What about a backup copy? (Score:1)
.
.
.
Oh. Apparently it had occurred to them. Well, thanks, just the same. You think of anything else, please, drop us a line!
Re:What about a backup copy? (Score:1)
This just in... (Score:5, Funny)
Re:This just in... (Score:3, Funny)
Merger with the CIA? (Score:2)
(It's a joke.)
Are they requiring publishers to submit PDF files? (Score:5, Interesting)
I can see that some publishers may just say, "oh, my book isn't gonna be in libraries if I don't submit PDF, so much the better, I'll sell more copies". I hope these fellas realize how badly they're shooting themselves in the foot.
Or even the sources. (Score:2)
It would certainly be smarter than scanning them in themselves, or demanding extra work on the publisher's part to conv
That's the beauty of PDF (Score:2)
DRM? (Score:2)
I'd say... (Score:1)
it might be cool if... (Score:1)
Re:How many.... (Score:1, Troll)
Re:How many.... (Score:1, Troll)
Re:How many.... (Score:2)
Re:it has to be said... (Score:2)
Re:it has to be said... (Score:2, Funny)
Re:it has to be said... (Score:2)
Re:it has to be said... (Score:1)
Re:Riiiight.. (Score:1)
Have a nice day, and good karma to you!