Slashdot
On Data Obsolescence and Media Decay
Posted by
Cliff
on Mon Jan 31, 2000 06:35 AM
from the data-that-is-still-readable-20-yrs-later dept.
mouthbeef asks: "What's the future of storage media? With CDs and tapes prone to relatively speedy decay, and hard-drives an entropic nightmare of moving parts, how will we keep our data safe over the long haul? I just got some e-mail from a writer pal who isn't really technologically sophisticated, alarmed because someone told him that his backup CDs would decay and rot in 20 years. He's an sf writer, and he was thinking "big picture:" a coming infopocalypse in which sysadmins devote their every waking moment to re-archiving their old backup data." Is such a scenario likely? Why or why not?
"I wrote back that I didn't think that would happen, because:
- Every time I buy a computer, it's got more storage on-board than all the computers I've owned until then, and I just migrate all the data files I've ever created or saved to the new box, like a hermit-crab changing shells
- With broadband becoming more real and more cheap, it makes sense that in the long run we'll store most (if not all) of our data on remote servers -- encrypted, of course -- that are managed by trained pros with access to mirror drives, climate-controlled vaults, etc. etc.
- Even if this doesn't happen, most of your data files will be in stupid, proprietary formats like Word 3.0 that won't be openable, anyway
How reasonable does this seem to you folks? What do you do with data that you need to preserve for the ages? "
This discussion has been archived.
No new comments can be posted.
382 comments
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.

Man - stone tablets are the way to go! (Score:3)
Only guaranteed storage mechanism! Good for thousands of years.
Capacity: 2Kb/tablet
I/O: 1byte/hr
Media cost: £50/tablet
Error rate*: 1 per 100bytes
Note: Error rate assumes fully qualified and certified stone mason.
Re:Just curious, but.... (Score:3)
Don't buy cheap CD-R's (Score:3)
By saying that I'm referring to how I bought my first CD-R about three years ago, and of the 20 or so Maxell disks that I've archived data onto, only one is still readable by any CD-ROM/CD-R that I insert it into. By contrast, every one of the Verbatim disks that I've burned, which were stored in exactly the same environment as the Maxells, is fully readable, and I haven't had any problems with them.
I've also used a few Sony and Memorex disks with which I haven't had any problems (that I'm aware of) but I have found my Verbatim disks to be incredibly durable. I burned 20 or so Audio CDs onto Verbatim disks two years ago before leaving on a cross-country road trip, and despite vast changes of heat and cold, as well as being literally tossed around my car, every one of those CDs is also still working.
Again, this is just my personal experience, but whenever I see someone picking up a spindle of 50 or so no-name brand disks at a local computer store, I have to wonder how important the data they're putting on there must be...
--Cycon
constant migration and documented formats (Score:3)
As others have pointed out, the exponential increase in storage capacity makes it relatively easy to "keep buying more disk" and migrating your data all the time. Certainly the convenience of having everything online is nice, too. And everything on line should have periodic backups happening. I've managed to do this for the past decade with my data, but I've lost the eight or so years before that, and I miss some of it.
But there's logical as well as physical bitrot. The media itself deteriorates, making it hard to get the information back, but understanding what that bitstream represents after a few years can be a real problem. If you've got binary word processor files from an Apple2 or C64, you'll probably not be able to read them unless you also have the binary and can get it running in an emulator. Given the amazing progress that's been made in the last 150 years deciphering the records of dead civilizations, I wouldn't say that reading your MS Word 5 documents will be impossible in twenty years, but it might not be worth the effort. Open standards and open source really help a lot with this issue. If you can find a document describing the file format, you're saved. And the same applies to hardware formats. Also, it's much easier to keep open source software alive--essentially carrying the 'make a copy on the new system' over to executables.
I'd say the solution is pretty much that simple: keep track of your data, plan to make a complete copy every 5-10 years, and choose formats that are publicly documented and that (you hope) will be easy for future software to support.
Re:The ultimate backup (Score:3)
Many of these texts are not yet broadly available in digital form and are not important or interesting enough for enough people to be kept handy. Try looking for some older book by a not so famous author. Even encyclopaedic works are worked over for each new edition and older bits of information have to make place for newer ones.
With historical facts it's even worse: in most cases there are at least two versions of one event, and who was in the right is mostly determined by who survived. Just have a look at how warfare now concentrates on media control, or try to imagine the twisted version of history if the Nazis had won WWII; even now there are some denying the existence of the Holocaust.
I think all this information is well worth keeping, and since it's difficult to see today what later generations might find worthy, the 'evolutionary' approach (if I/we don't want to keep it later generations won't want it either) doesn't work. And it doesn't suffice to just keep this information somewhere, it has to be kept in an accessible form, on media readable with modern equipment (who will go through the trouble of reading an old magnetic tape?) and indexed (if you have 1GB of unsorted texts/text fragments on a hard disk, are you ever going to wade through that to get that piece of information presently of interest?)
Re:Most of the data becomes useless (Score:3)
I disagree almost entirely.
Very little of the data volume becomes useless, because we don't know what "useless" will be to the readers in the future. Contemporary archaeologists spend much useful time sifting the contents of rubbish pits and latrines - if that turns out to be interesting, how can we ever say that data won't be? Maybe your schoolwork is dull and uninteresting to you, but how about an educational historian in a century or so ? Wouldn't you like to know how teaching was carried out in the past ?
Also the majority (by volume) of data will always be automatically generated sensor data (humans can't type fast enough to keep up), and that tends not to become useless with time. NASA have already lost interesting telemetry data.
Authors have definitely lost early book drafts because modern WPs don't open old WP formats. Word 1.0 isn't old ! That's not even a decade ago. What about stuff from the '70s on hardware formats that no longer have players ? CP/M WP formats used by some of the first great novelists to work digitally ? (mind you, losing the whole of Pournelle is fine by me). Personally I'd find it very hard to read my own degree work, and I'd probably have to do it by scanning in the paper copies.
Solutions ? I'm not a hardware guy, so I can only talk about the soft data side of it. I think XML (and similar) has a big part to play here. Let's stop thinking of data formats subjectively as "the data format that belongs to SprongWriter 4.2a" and instead work with formats that have objective definitions that extend beyond the client app of the day. Why should I need a copy of that particular WP to open the data, if the data is already in a format that's inherently accessible. We already have the technical skills and tools for this, I call on all developers to make use of them and to stop writing these proprietary data oubliettes.
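To make the point about objectively defined formats concrete, here is a minimal sketch in Python using only the standard library's `xml.etree.ElementTree`. The element names (`document`, `title`, `author`, `body`, `para`) and the two helper functions are invented for illustration; they are not any real word processor's schema. The idea is just that a second program with no knowledge of the writer can still recover the text.

```python
import xml.etree.ElementTree as ET


def to_xml(title, author, paragraphs):
    """Serialize a simple document to self-describing XML text.

    The tag names here are hypothetical, chosen only for this sketch.
    """
    doc = ET.Element("document")
    ET.SubElement(doc, "title").text = title
    ET.SubElement(doc, "author").text = author
    body = ET.SubElement(doc, "body")
    for paragraph in paragraphs:
        ET.SubElement(body, "para").text = paragraph
    return ET.tostring(doc, encoding="unicode")


def from_xml(xml_text):
    """Recover the content using only a generic XML parser."""
    doc = ET.fromstring(xml_text)
    return {
        "title": doc.findtext("title"),
        "author": doc.findtext("author"),
        "paragraphs": [p.text for p in doc.iter("para")],
    }


if __name__ == "__main__":
    text = to_xml("Draft 1", "A. Writer", ["First & foremost.", "Second."])
    print(text)
    print(from_xml(text))
```

Even if both helpers were lost, the output is plain text with labelled fields, so a person with a text editor could still read it, which is the whole argument against the "data oubliette".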
Book Recommendation: The Clock of the Long Now Stewart Brand Why this sort of thing matters, and what a few people are trying to do about it. Best book I've read this year.
PS - SciAm also had a piece on digital data loss, a year or so back.
Some data points (Score:3)
This is not a new problem. People have been dealing with the question of recovering data from old media for years. As a first data point, a number of years ago, about 5 IIRC, some people finally decided that some old music tapes had to be rescued.
The method used was to find this old RCA gentleman who had retired more than a few years before then. They then went to the Smithsonian and got the last remaining version of the tape recording/playback device that had been used to make the original master tapes. The RCA guy used the specs and his knowledge to tune the tape deck to perfection. They then put a high quality amp and splitter downstream of the tape deck to feed 2 digital tape decks (the professional version, not DAT, more bits and a bit faster sampling rate) and a couple of analog tape decks as well.
After testing, they carefully placed one of the Master tapes on the deck, started all the recorders and pressed "play". As the Master tape played it just came apart. They had to keep the heads clean but this was a one time, one chance thing. They succeeded.
From the recordings they made some wonderful CDs. Amazingly enough, the Master tape had almost no "hiss" in it.
Data point two. MIT, I believe it was, decided to move some of their older theses to CDROM for easier online access. The first thing they noticed is that many of the data tapes they had stored things on were 7 track tapes, and of course they had no 7 track tape drives any more. Again people went to the museums, got out a 7 track drive, spent the time to fix it and make it work, then built an interface box to connect it all up, and away they went.
3rd data point. Somebody sent out to a mailing list that they were looking for some old code to run on an emulator for a PDP11(?). We ended up going into our machine room and found some old release tapes. This included a copy of BRLUNIX (based on a BSD release) and, I think, an AT&T Sixth Edition. These were 9 track reel to reel tapes. We went into the machine room, powered up the tape drive, and copied the tapes verbatim to disk. We set it up to do the least amount of reading. These tapes were around 15 or 20 years old.
Because of this rescue, which happened late last year, we saved the tape drive when the machine was tossed due to "Inability to prove Y2K compliance". So the tape drive still sits on the machine room floor. The operators turn it on and clean it once a week. It isn't currently hooked up to anything, but we expect it to be hooked up to something again in the next year or two, just to be able to read all those old tapes we still have.
At home I use EXABYTE-8200s for my backups. I have 3 drives and you can still get them refurbished. While each tape only holds 2GB (compared to a max of 150MB for a 9 track tape), the media is small and low cost. The Exabyte encoding also has a great deal of redundancy in it, making it an excellent choice for long term storage.
At work they do much of their backups on EXABYTE 8500s. For the Crays, they used to use IBM 3480 tape cartridges; when they changed tape formats, they spent a few weeks moving all the data from the older format to the new format.
Of course our most reliable storage medium to date has been our paper tape and punch cards. They may be low density, and sometimes we've had to make readers for them (auto feed to a flatbed scanner which scanned the card, process the card for holes, and voilà).
CDROMs have the problem of decaying due to light contamination. If you want to keep them for years and years and years, they have to be kept out of sunlight. And because our long term, low cost storage methods keep dropping in cost and increasing in size, I suspect that what we will find in 3 years is that everybody is carefully copying all their data from CDROM to DVDs, which will have a twenty year life span.
The basic rules on saving your data for the long term are:
Chris
Data availailability (Score:3)
Apart from MAME, nobody is making any effort to archive old computer games. The BBC managed to destroy a lot of valuable original video tapes (apparently they taped over their copy of the moon landings). These show that data is kept around much longer if copying is encouraged rather than discouraged.
Re:Snake Oil (Score:4)
The filter of decay has served mankind well? How illogical, when you have no idea of what has been destroyed how do you know mankind has been served well? Was mankind well served by the destruction of the Library of Alexandria, the Aztec library destroyed by the Spanish, the historical libraries destroyed by the Serbs in the Balkans?
Sure CDs may last 100 years (we really don't know) but it is unlikely they will be able to be read by anything. Paper is still the most stable format available (although it is impractical for many reasons to transfer digital data to paper as some of my colleagues are prone to doing) and there are many vast libraries of data open to the public. We had well over 40,000 researchers use our library last year and less than 1 percent were scholars.
My profession is wrestling with two technology related questions.
1. How to make paper collections accessible electronically. For example the papers of ONE congressman (approx. 400k documents) took 5 years and nearly 3 million dollars to digitize. We have one collection which has 32M documents. Sure digital copies are cheap - IF the original was electronic and in a form easily translated.
2. How to preserve much of the information which currently only exists in electronic form, be it governmental databases, personal computer files or web pages. We did an interesting experiment a couple of years ago when we captured about six dozen web sites which documented the devastating Red River flood in Minnesota, North Dakota and Canada. Most of these sites existed on the internet for only 2-3 months and were disappearing as we captured them. I think it will be possible to study how the internet was used as a tool in response to catastrophe, from the governmental level to local churches and organizations. Of course current copyright law makes it illegal for us to post this database of websites on the internet, but that's another issue.
Aging Newbie is correct in the assertion that only a small percentage of data need be preserved, yet I feel that conscious, reasoned choices about what should be saved serve mankind far better than the filter of decay. I also believe the solution ultimately will involve a combination of strategies including electronic.
Skavvy (whose firewall apparently won't allow him to register)
THIS IS A PROBLEM NOW! (Score:4)
Snake Oil (Score:4)
However, I think he was mistaken. Ancient societies left stone tablets, cave paintings and the like behind, and there's no-one who fully understands the languages or the contexts (when an archaeologist says an object is of "ritual significance" he actually means he doesn't know what it's for). We do have the technology now, as the poster says, to migrate our data ever forwards into new storage, assuming no cataclysm occurs. And even if it does, it is far more important, in terms of recovering data, that the language (source code) survives, rather than CD ROM drives, Minidisc players etc (the binaries), because then data recovery is an essentially straightforward task.
I expect acid-free paper to survive long enough after an ecological catastrophe or, say, a meteor strike, to be useful to the survivors (better start moving the engineering textbooks down into the bunkers). And of course, Ship-It awards will outlast the end of time, not to mention non-biodegradable shopping bags.
As a civilisation, if we wish to preserve a legacy, we currently possess the skills and technologies to do so - if we choose to.
Re:Snake Oil (Score:4)
Stored properly, writable CD's last 100 years or more while each holds well in excess of an encyclopedia. The problem of preservation is considerably simplified as compared to paper. By 100 years paper documents are of limited utility and only scholars can access them. With digital media, copies are simple and cheap so anyone could have a copy if they wanted.
I think the challenge of the future will be one of sorting the trash; i.e. selecting moon landing data from a mountain of memos, reports, and minutiae surrounding it. But, that would seem to have been the problem since history began.
For all of our ego, I think we might have only a few times more real value to save for posterity than did our counterparts at the turn of the century or in the '50s. People seem comfortable with what we saved in the past - why not admit that we are really not that much more advanced and that the real value of our lives and era can be summarized on a few (or a few thousand) CD's a year. Not enough to cause an information apocalypse or anything but a shelf in a library...
CD lifespan (Score:4)
From what I've understood, the lifespan of a CD-R is around 20yr for those which are based on cyanine or azo dyes (and which appear blue or blue-green when you look at them) and around 100yr for those based on phthalocyanine (which appear golden to the eye).
Of course, it depends very much on the way you treat those CDs. If you put one in a light-free, dust-free, safe deposit box, it can probably survive several kyr (uh, thousands of years) without damage.
The unfortunate thing, however, is that because the error correcting codes work so well, it is not always easy to tell that a CD has begun noticeably deteriorating until the data is actually unreadable, and then it is too late. It would be nice if the drives could return some sort of ``CD quality'' status.
I always write down (on paper) the md5 fingerprint of the raw ISO image when I burn a CD. In that way, I can be sure whether I have pristine data yet. (And if I make copies, I can be sure the copy is exactly identical to the original.)
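For anyone who wants to automate that habit, here is a minimal sketch in Python using the standard library's `hashlib`. The function names `md5_fingerprint` and `matches_recorded` are made up for this example, and it assumes you have the ISO image (or a raw read of the disc) available as an ordinary file.

```python
import hashlib


def md5_fingerprint(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of the file at `path`.

    The file is read in chunks so a full-size image never has to
    fit in memory at once.
    """
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_recorded(path, recorded_hex):
    """Compare a file's fingerprint with the one written down at burn time.

    Whitespace and letter case in the recorded value are ignored, since
    it was presumably copied by hand.
    """
    return md5_fingerprint(path) == recorded_hex.strip().lower()
```

A mismatch means at least one bit has changed since the fingerprint was recorded. MD5 is adequate for spotting accidental decay like this, but it is no longer considered safe against deliberate tampering; `hashlib.sha256` can be swapped in the same way if that matters.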
This information is provided in the hope that it will be useful but WITHOUT ANY WARRANTY. Without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Shelf life of recorded CD-R longer than 20 years (Score:4)
Yamaha CD-R site [yamahayst.com]
Josh
The ultimate backup (Score:5)
The internet will always save your best work [google.com] and discard the junk [waldherr.org].
Re:Data availailability (Score:5)
Recovering the data from just a portion of the tapes requires substantial amounts of time and money due to the labor intensive nature of the task. Think of copying 20,000 LP records to CD-R disks.
With limited budgets, NASA and other scientific research agencies are often in the unhappy position of having huge amounts of potentially valuable data on rapidly deteriorating media, of which only a fraction can be saved. Unless someone invents a time machine, the data is irreplaceable.
For many years, magnetic tape has been the medium of choice for storing spacecraft data. Storing it on an on-line system, on disk, just wasn't practical or affordable. Huge amounts of data were archived on 7-track 1/2" digital computer tapes, the same kind of tapes that you see in cheesy science fiction movies from the 1960s. Try to find one of those tape drives today, or a computer that can talk to it.
acid free paper (Score:5)
I'm sorry to hear that. I've been fascinated by this phenomenon in our university library. Up until the 1930's somewhere, journals are pretty well preserved. Then they suddenly get awful as paper mills switched to new methods. Pages are yellowed and brittle. In the 1950's the error was discovered and pages become white again with the switch back to acid-free paper.
Let's hope we don't make the same mistake with digital media. And it could be worse: almost all the film from the first half of the century is lost to self-rot and environmental damage. For all its faults, DVD is probably the best thing that's ever happened to film from a historical perspective.
Most of the data becomes useless (Score:5)
Modern word processing still opens really old file formats like Windows
Floppy disks are degrading rapidly, but most people's floppy collection can fit on a single CD-R. Then again, most people just don't care about their floppy collection, and will just let it die. The data contained on it isn't useful anymore.
Let's see about Audio CDs. They degrade over time (scratches) and possibly rot. I believe that what will happen is that we're going to convert them to some format like MP3. I'm fairly certain that MP3 capability will continue to be implemented in computers for a very long time. And if it shows signs of getting phased out, then you might simply batch-convert everything to the new format. Or just rerip your Audio CDs that are sitting in storage, if you really care about the quality (since batch conversion will result in degradation, unless we find a way to actually enhance the audio quality... which might or might not happen...)
Movies. VHS tapes degrade... Probably, we'll be converting what we really want onto some kind of optical disk in the future. And the rest will decay, and we won't care about it decaying. When the format (DVD-R perhaps ?) is being phased out, since it's in digital format, it should be possible quite easily to simply transfer our DVD-Rs to the higher capacity medium... Perhaps 10 discs on a single one... Saving a lot of space, and having the format live another 20 years. After all, how hard will it be to include MPEG-2 decompression in next generation video players ? The cost of an MPEG-2 decoding circuit probably won't be very high anymore.
The other possibility I see is that bandwidth gets cheap enough so that we may consider remote storage vaults. That has a couple of privacy issues I'm certain you can see... But it's incredibly convenient and will probably be adopted by everyone if we just find a way to have a high speed switched pipe to everybody's home at a reasonable cost.
If we do indeed have high bandwidth in every house, I see that the media companies might also get their acts together and start putting up their own gigantic media-archive. They could offer a monthly media-license that'd give you access to any music or movie you want. Or perhaps just make you pay for every access to the archive. Of course, such a thing.. I can think of so many ways it could go wrong. What if they decide to have only censored material on the archive ? What about independent artists ? Perhaps we'll just see a protocol to access and pay for access to media archives, and have a dozen appear. Let's say, DisnABCTimeAOL could have theirs, AndoTransmeVAMicrosoChryslerDaimler could have theirs...
This could be so horrible if not properly done - a lot of "non approved" content could suddenly become unavailable if you killed the distribution channels except those media-archives... So. Is this just an incoherent rant ? Would you care to add any constructive comment to it ? Answers ? Questions ? Anything at all.
An old idea... but still a threat. (Score:5)
In many later books Lem refers to an informatic catastrophe: sometimes it is caused by a necro-virus, a product of computer evolution (the arms race was banned from Earth and transported to the Moon, where sophisticated computer systems worked automatically on weapon development. Each nation was allowed to get the weapons back on Earth, but that meant others could equally prepare; somehow, the automata on the Moon get out of control and start evolving, finally leading to a nanobot-virus thriving on silicon chips - therefore the title, "Peace on Earth"), sometimes by basic physical properties (in a humorous story "Prof. A. Donda" the title hero discovers a basic equality between energy, mass *and* information, and one of the consequences is that if information achieves a certain density it changes into matter, that is, a new universe. God's word was counting from infinity to zero in an infinitely small time :-) ).
I admit - I was shaped by Lem's writing. Many of his ideas from the sixties and seventies came to life in the nineties (e.g. virtual reality or sciences which deal only with information retrieval). I do believe that information storage is a problem - but not because the medium would not last forever, but because of the signal / noise ratio you have even in your personal files. As I look on the four Macs we work with in our lab, and the couple of Gigabytes of data, and then dozens of GB of backups, different versions, obsolete versions, alternate versions, gel pictures you have no idea where they came from and who needs them, and so on, and so on... Yes, there are better solutions than using a Macintosh in a multiuser environment, but that's not the point. I've been using Linux for years and have my personal data at home, and I seem to have a GB or so of data I'm too afraid to remove just in case. And there are so many alternatives of storage, backup, databases... and I'm just a simple biologist!
Returning to Lem - yes, I do believe we are approaching a critical point, like a bifurcation in a chaotic equation, and the word "chaotic" fits here especially well. What happens next? He who cometh and giveth us a system (not OS, but an information retrieval system), he hath the power and our souls. Well, mine at least. Hope he doesn't come from Redmond, though.
Regards,
January