
Archiving Digital Data an Unsolved Problem 405

mattnyc99 writes, "It's a huge challenge: how to store digital files so future generations can access them, from engineering plans to family photos. The documents of our time are being recorded as bits and bytes with no guarantee of readability down the line. And as technologies change, we may find our files frozen in forgotten formats. Popular Mechanics asks: Will an entire era of human history be lost?" From the article: "[US national archivist] Thibodeau hopes to develop a system that preserves any type of document — created on any application and any computing platform, and delivered on any digital media — for as long as the United States remains a republic. Complicating matters further, the archive needs to be searchable. When Thibodeau told the head of a government research lab about his mission, the man replied, 'Your problem is so big, it's probably stupid to try and solve it.'"
  • by UbuntuDupe ( 970646 ) * on Monday November 20, 2006 @05:56PM (#16921476) Journal
    I can't wait to hear Microsoft's explanation why the project should use one of their proprietary formats.
  • by Electrode ( 255874 ) on Monday November 20, 2006 @05:56PM (#16921482) Homepage
    "for as long as the United States remains a republic."

    So, they're shooting for about 10 years then?

    • Re: (Score:3, Funny)

      by MECC ( 8478 ) *
      "for as long as the United States remains a republic."

      So, they're shooting for about 10 years then?

      10 years or the next presidential election - whichever comes first

    • Re:Not too long... (Score:5, Interesting)

      by eln ( 21727 ) on Monday November 20, 2006 @06:01PM (#16921580)
      Your timeline may be a little off (at least I hope so), but you're right that it's a silly goal. Whether the US has 10 or 1000 years left, history shows us it will most likely fall at some point, and that point will be fairly soon when compared to the entirety of human history.

      Making a format that will survive a thousand years so long as our advanced civilization is still around and still cares is pointless, because as long as there is a continuous line of people that care, they will be willing to transfer at least the more important stuff to new media. The trick is coming up with something that will still be readable when archaeologists dig it up 10, 50, or 100 thousand years from now.
      • Re: (Score:3, Insightful)

        by Anonymous Coward
        I've been wondering, with our global nature now, will we need archeologists in the future? While I believe civilizations will surely 'collapse', won't we all be around to immediately take note of it, and update Wikipedia? Seriously, I don't think we're going to be digging for stuff from this time; the global nature of our society leads me to that conclusion. It's not like when Greek society fell.
        • Re:Not too long... (Score:4, Interesting)

          by FooAtWFU ( 699187 ) on Monday November 20, 2006 @07:29PM (#16922774) Homepage
          I've been wondering, with our global nature now, will we need archeologists in the future? While I believe civilizations will surely 'collapse', won't we all be around to immediately take note of it, and update Wikipedia?

          Archaeology is the search for fact. Not truth. If it's truth you're interested in, Doctor Tyree's Philosophy class is right down the hall. So forget any ideas you've got about lost cities, exotic travel, and digging up the world. We do not follow maps to buried treasure, and 'X' never, ever marks the spot. Seventy percent of all archaeology is done in the library. Research. Reading.

          -- Indiana Jones and the Last Crusade
      • Re:Not too long... (Score:5, Insightful)

        by thelost ( 808451 ) on Monday November 20, 2006 @06:19PM (#16921840) Journal
        the trick is... hoping that in a hundred thousand years people still care at all about their past. The slow realization as I read Isaac Asimov's Foundation saga about the origins of the Galactic Empire chilled me, mostly because the people of the empire had become so numb to their past as to have made it vanish entirely.
        • Re: (Score:3, Insightful)

          by nine-times ( 778537 )

          As much as anything, it seems like we might worry about people rewriting the past. It'd be hard to edit part of one of the original copies of the US Constitution without anyone being able to tell the difference, because we actually have a really old piece of paper that someone would have to get access to, somehow erase some ink, and write over top with identical ink.

          But a historical document in the form of a text file on someone's hard drive? That can be edited without a trace.

        • Forgotten (Score:3, Insightful)

          by Colin Smith ( 2679 )
          Look.

          In 100 years, you will be forgotten.
          In 1000 years, your country will be forgotten.
          In 10000 years, your civilisation will be forgotten.
          In 100000 years, your species will be forgotten.

          One thing you can absolutely count on is that you and everything you find familiar will be lost and forgotten. Nothing that you accomplish, no matter how famous, infamous or worthy will be remembered in 10,000 years.

          There is only one contribution you can make which will have any lasting effect at all, and I'll let you work
      • by dbIII ( 701233 )

        Whether the US has 10 or 1000 years left, history shows us it will most likely fall at some point

        Rome fell a very long time after it stopped being a republic. We've already had our equivalent of pirates burning Ostia and leaders trying to be king afterwards. I think George has blown his shot at being George III and things will stay as a republic - barring unlikely actions by uncontrolled intelligence agencies going rogue if a future leader tries to rein them in.

      • Re: (Score:3, Interesting)

        No one seriously working in digital preservation is trying to make a single thing that will last for 50, 100 or 1000 years. The point is not to preserve information in the event of a total civilization collapse, to make it easier for future archaeologists, or some such scenario. The point is to keep our historical digital records *currently* readable at any given point in time. If our civilization collapses, it will be up to those who come after to figure out what we were up to.

        There are two basic strate
  • by zappepcs ( 820751 ) on Monday November 20, 2006 @05:59PM (#16921550) Journal
    How is this different than the previous ages where all information was kept on paper or in spoken words? The problem isn't so much how to invent something that will always be readable, but some way to always have the applications to read it. If it were not for the Rosetta Stone, much of what we know about the ancient world might still be a mystery.

    • by quanticle ( 843097 ) on Monday November 20, 2006 @06:11PM (#16921736) Homepage

      It's different because of the sheer volume of information being created today. Ancient cultures were not creating millions of pages of information every day.

      Your Rosetta Stone analogy is inappropriate. We have not discovered any sort of Rosetta Stone for the ancient Maya hieroglyphs, but we have had success in deciphering them because we can apply linguistic analysis techniques to figure out what words correspond to what actions/things. It's a little more complicated for abstract concepts, but you can figure out a surprising amount from basic language knowledge.

      • by ThosLives ( 686517 ) on Monday November 20, 2006 @06:27PM (#16921964) Journal

        It's not so much the Rosetta stone, but the fact that a "Rosetta stone" has a built-in context - it's obviously communication or artwork of some kind. If you have a big pile of digital data, what is it? An image? Compressed text? Audio? Just a sequence of numbers? The thing "printed" information gives you is that the presentation of the data gives you an idea of what it is - we don't yet have any digital data formats for which the presentation of the data gives an idea of the content; in fact, most digital storage mechanisms present all types of information in identical manner.

        That's the real challenge - devising a digital storage format in which presentation can be used to apply context to the data.

        • by Tim Browse ( 9263 ) on Monday November 20, 2006 @08:29PM (#16923490)

          If you have a big pile of digital data, what is it? An image? Compressed text? Audio? Just a sequence of numbers?

          That's what MIME types are for. Duh.
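A MIME type is a label carried alongside the data. When the label is missing, many formats can still be recognized from a fixed signature in their first few bytes. The Python sketch below illustrates that related idea, not anything the comment spells out: the `sniff_mime` function and its small signature table are hypothetical, and only a handful of well-known formats are covered.

```python
# A minimal sketch: guess a MIME type from leading "magic" bytes.
# The table is deliberately tiny and the function name is made up for this example.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"%PDF-", "application/pdf"),
    (b"\x1f\x8b", "application/gzip"),
    (b"OggS", "application/ogg"),
    (b"fLaC", "audio/flac"),
]


def sniff_mime(blob: bytes) -> str:
    """Return a MIME type for `blob`, or a generic fallback if nothing matches."""
    for magic, mime in SIGNATURES:
        if blob.startswith(magic):
            return mime
    # Generic label for "some bytes, type unknown".
    return "application/octet-stream"
```

This only works for readers who already know the signatures, which is the objection raised elsewhere in the thread: the table itself is context that has to be preserved along with the data.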

        • Re: (Score:3, Insightful)

          by smoker2 ( 750216 )
          Rubbish.

          If you found a bunch of punch cards then what would you make of them ?

          They are obviously some kind of communication, because they have no artistic value. Whether they are designed to communicate with humans or a Jacquard loom is not the point. They convey information. Same goes for digital. Once someone discovers discrete patterns of ones and zeros, then the intention can be deduced. If the repeating pattern is "knit one purl two" then you're probably reading a knitting pattern. If it says "Four sco

          • by toddestan ( 632714 ) on Monday November 20, 2006 @09:49PM (#16924164)
            You're assuming far too much. Remember, there are entire written languages from 2000+ years ago that we barely know how to read. And we have the context of what they were written on, formatting, what the characters look like and things like that. Now, in 2000 years, if someone came upon your hard drive, or flash memory card, or whatever - assuming they could even read it, they aren't going to be able to pop it into a computer and see c:\My Music\ and C:\Documents and Settings\, and the only challenge left is to figure out what the hell an OGG file is. They aren't going to see files. They are going to see 1's and 0's. Lots of them - billions on a memory card and trillions on a hard drive. They won't have a clue how to interpret the file system, even for something relatively simple like FAT16. They may not even know that a byte is 8 bits. They won't have context, and they will be baffled by the fact that most every OS writes files in fragments all over the drive. They likely won't even be able to tell areas that were marked as deleted but not wiped from the actual data, let alone figure out what the swap file is. I seriously doubt that someone in the future, given a working hard disk but nothing else to go on, would be able to pull anything meaningful from the drive. Heck, look at modern-day examples - how long did it take Linux to be able to read and write to NTFS, given the number of very smart people working on it who already had a pretty good idea how it functioned?
            • by adrianmonk ( 890071 ) on Monday November 20, 2006 @10:51PM (#16924630)
              They aren't going to see files. They are going to see 1's and 0's. Lots of them - billions on a memory card and trillions on a hard drive. They won't have a clue how to interpret the file system, even for something relatively simple like FAT16. They may not even know that a byte is 8 bits.

              They might not know that a byte is 8 bits, but with a little analysis, it shouldn't be hard to figure out. There are numerous statistical properties that can be exploited to figure this out relatively easily. For example, with most types of data, the higher-order bits (in any size byte) are more likely to be 0 than the lower-order bits are. Think about how booleans are stored in most systems. Think about the characters in this message: 100% of them have a zero high-order bit. To put it a little differently, there is more entropy in the lower-order bits.

              So, to figure out how many bits there are in a byte, you take your data, and for all reasonable sizes of bytes (say, from 4 bit bytes up to 36 bit bytes), you compute the function that maps bit position (low- or high-order) to an entropy value for that bit. Then you can tell by the shape of that curve which guess about bits per byte was the right guess. Heck, it should be such a strong trend that you can probably automate it!

              Remember that future civilizations will probably also use digital data as well, at least ones sophisticated enough to try to read the optical and magnetic media. They may not know the FAT32 filesystem, but they will have invented statistics and information theory, and they will be able to make some awfully good guesses at things. And yeah, it might take them 10 or 20 years to be able to read a FAT32 volume correctly if some poor college student of the distant future has to do it on a shoestring budget of grant money, but if they're reading 10,000 year old data, how much does that matter?
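The per-bit entropy idea above can be sketched in a few lines of Python. This is only an illustration under assumptions that are not in the comment: the names `bit_entropies` and `guess_word_size` are made up, bits are read most-significant first, and "the shape of the curve" is reduced to a crude rule (pick the smallest word size at which some bit position becomes almost perfectly predictable, using an arbitrary 0.05-bit threshold).

```python
import math


def bit_entropies(data: bytes, word_bits: int) -> list[float]:
    """Shannon entropy (in bits) of each bit position when the stream is
    cut into consecutive words of `word_bits` bits. Needs at least one full word."""
    # Flatten the data into a list of 0/1 values, most significant bit first.
    bits = [(byte >> (7 - i)) & 1 for byte in data for i in range(8)]
    n_words = len(bits) // word_bits
    entropies = []
    for pos in range(word_bits):
        ones = sum(bits[w * word_bits + pos] for w in range(n_words))
        p = ones / n_words
        if p in (0.0, 1.0):
            entropies.append(0.0)  # a bit that never changes carries no information
        else:
            entropies.append(-(p * math.log2(p) + (1 - p) * math.log2(1 - p)))
    return entropies


def guess_word_size(data: bytes, candidates=range(4, 37), threshold=0.05):
    """Return the smallest candidate word size for which at least one bit
    position is nearly constant, or None if no candidate qualifies.
    The threshold is an assumption chosen for this sketch."""
    for word_bits in candidates:
        if min(bit_entropies(data, word_bits)) < threshold:
            return word_bits
    return None
```

On plain ASCII text the top bit of every 8-bit byte is zero, so that position has zero entropy at a word size of 8, while shorter or misaligned word sizes smear the constant bit across positions that also see varying bits. Compressed, encrypted, or mixed data would need a more careful test than this threshold rule.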

              • Re: (Score:3, Funny)

                That assumes they know which are the zeros and which are the ones. And that in turn assumes they know there are zeros and ones in the first place.
    • Re: (Score:3, Insightful)

      by nine-times ( 778537 )

      How is this different than the previous ages where all information was kept on paper or in spoken words?

      Paper actually holds up rather well as an archival medium. Plus, you don't need specialized technology to read it.

      • Re: (Score:3, Funny)

        by autophile ( 640621 )
        Paper actually holds up rather well as an archival medium. Plus, you don't need specialized technology to read it.

        I do. (Adjusts glasses)

        --Rob

      • Re: (Score:3, Insightful)

        (IANAA - I am not an archivist, but I do work in an archives library for a media organization)

        Paper can hold up - we have proof of that in centuries-old paper. But when you look at the percentage of paper that's survived the last few thousand years compared to a) the amount of paper produced and b) the amount of information lost, it's staggering.

        There's no one answer, but rather a set of keys that'll help. These include regular backups, widely adopted standards, multiple backup formats, important backups on
    • by s20451 ( 410424 ) on Monday November 20, 2006 @06:23PM (#16921918) Journal
      Say western civilization is disrupted for a period of time that is short by historical standards -- 40-50 years would be enough. Electrical power is only sporadically available, and as a result the Internet collapses and PCs become useless. With much more important issues to deal with, such as finding food, people ignore digital data storage.

      The era of restoration comes. However, when people blow the dust off those old DVDs and players, they discover that the DVDs have decayed to the point of unreadability. Massive quantities of archived data and knowledge are irretrievably lost.

      The main problem in our age is thermodynamics -- information is stored so densely that it tends to decay naturally, on its own. By contrast, ancient stone carvings (as well as their keys, such as the Rosetta stone), are sufficiently durable to last (basically) for ever.

      • The era of restoration comes. However, when people blow the dust off those old DVDs and players, they discover that the DVDs have decayed to the point of unreadability. Massive quantities of archived data and knowledge are irretrievably lost.

        There goes my copy of Just Like Heaven [imdb.com]! Oh the humanity!

      • by Dun Malg ( 230075 ) on Monday November 20, 2006 @08:52PM (#16923688) Homepage
        The main problem in our age is thermodynamics -- information is stored so densely that it tends to decay naturally, on its own. By contrast, ancient stone carvings (as well as their keys, such as the Rosetta stone), are sufficiently durable to last (basically) for ever.
        Of course, preserving the data is only half the battle. Figuring out what it says is the second part. This is, of course, nothing new. We still can't read Linear A [wikipedia.org]. In the case of the Rosetta Stone we were simply lucky to find something relating hieroglyphics to a language we knew. The Rosetta Stone is rather unusual. Normally we have nothing so convenient.
    • by Marxist Hacker 42 ( 638312 ) * <seebert42@gmail.com> on Monday November 20, 2006 @06:56PM (#16922354) Homepage Journal
      Now that's the right problem. What is needed isn't some mysterious Universal Translator Format- it's storing the read hardware, with programs in ROM that understand the format, along with the electronic copy. Hell, store the whole thing in ROM chips with a well documented interface printed on the outside of the chip. Libraries could be made up of whatever reading technology exists at the time the library is built- with this common pin-level interface.
      • Re: (Score:3, Insightful)

        by elronxenu ( 117773 )
        That's the kind of thinking which almost doomed the modern Domesday book. They thought that preserving the retrieval hardware was enough. Wrong! It required a massive restoration effort to get the material off the laserdiscs and onto the web.

        Ultimately, information only survives if it has been duplicated. The Domesday book laserdisc format wasn't easy to duplicate. It wasn't usable on home PCs, only on specially constructed reader machines in libraries. Consequently, it gathered dust in the "cathedral" it

  • by IWantMoreSpamPlease ( 571972 ) on Monday November 20, 2006 @05:59PM (#16921558) Homepage Journal
    Worked for the Egyptians didn't it?
    • by Shoeler ( 180797 ) *
      Isn't the solution to at least the format readability problem pretty simple? Print out schematics for a reading device on a format that will last the longest. Store said format with all media.

      Of course that doesn't fix the problem of archive stability. Tapes are supposed to be relatively long-lived compared to, say, a simple CD-R, but haven't we all had one or many more fail on us?
    • No it didn't work for the Egyptians. The destruction of the Library at Alexandria is considered one of the greatest losses to our understanding of the ancient world. This library was considered to be the Library of Congress of its day and was totally destroyed in a fire. To be fair though, the Greeks had taken over by this time so it wasn't really hieroglyphics.
  • by csoto ( 220540 ) on Monday November 20, 2006 @06:00PM (#16921568)
    Working at a University, this is not a subject I'm not unfamiliar with. We've had lots of discussions about this. Everyone always talks about how many zillions of "pieces of information" are out there. The number of web pages in existence is always brandied about. My point in these discussions is that most of what's out there is crap. Humanity is not lessened by its loss. Good stuff gets reproduced, reviewed, studied, dissected, etc. and survives. It *is* stupid to try to solve this problem, because the problem doesn't need solving.
    • by failedlogic ( 627314 ) on Monday November 20, 2006 @06:07PM (#16921682)
      Things like music, TV shows, movies, literature, toys, magazines etc. are all cultural products. For future generations we need to keep records of these items as much as family trees, great stories, buildings, etc.

      Besides, who's to decide what is 'crap' or not? To the untrained eye, a clay pot from Egypt might not look interesting. The color, shape, its condition, etc. might tell someone who used it, why, and what cultural value (symbology, usefulness, etc.) the pot actually had. And culture evolves from culture. Keeping a record of everything we produce allows future generations to inform themselves of who we were and what we did. Quality of the information itself is really unimportant.

      Only thing I'd have to add: I wish future generations all the luck in sorting through our garbage piles and recycling/salvaging what they can. If anything, this amount of waste - or crap - is a record of us as much as anything. I can agree with you on this point about crap in our culture!!! ;)
      • Re: (Score:3, Insightful)

        by Trespass ( 225077 )
        Yes, exactly. It's the ephemera that tells you what life was like in any given era, not the palaces, official monuments, etc.

        I'll wager you could reconstruct far more about the culture of the early 21st century from the contents of a convenience store than from that of the White House. There's a big gulf between who a people are and the mask they present to the world.
      • I wish future generations all the luck in sorting through our garbage piles and recycling/salvaging what they can. If anything, this amount of waste - or crap - is a record of us as much as anything.

        If this is the case, then archaeology will not have changed much. The most useful findings in archaeology are often those found in the waste piles ("middens") of the site.
    • by kfg ( 145172 ) on Monday November 20, 2006 @06:09PM (#16921722)
      Expanding copyright protection to a term equal to two lifetimes means that now even some of the good stuff is being lost because it is not allowed to preserve it.

      If preservation is outlawed, only outlaws will be preservationists.

      I believe Ray Bradbury had something to say on this subject.

      KFG
    • by inKubus ( 199753 )
      Not to mention whoever controls the library can decide what stays and what doesn't. Which means they can create any picture of history they want. Which, as we all know, means they can control the future.

    • Re: (Score:2, Insightful)

      by blaster151 ( 874280 )
      I doubt that a historian would see it your way. How many records, judged by their contemporaries as irrelevant, have helped historians piece together valuable perspectives about times past! Like the monks who deemed it appropriate to copy over Bach manuscripts, isn't there hubris when we declare with certainty what is and is not worth preserving? Perhaps we don't have enough perspective to reliably do that.
    • by pclminion ( 145572 ) on Monday November 20, 2006 @07:21PM (#16922660)

      Working at a University, this is not a subject I'm not unfamiliar with. We've had lots of discussions about this. Everyone always talks about how many zillions of "pieces of information" are out there. The number of web pages in existence is always brandied about.

      Where can I attend these meetings, where people speak in triple negatives and much brandy is available?

    • Re: (Score:3, Insightful)

      by wall0159 ( 881759 )
      I think you're wrong (and you use a double negative ;-)

      Most people are disinterested in history, hence there is no guarantee of a verbal knowledge continuum in the event of widespread hardware failure.
      We know that the hardware always eventually fails.
      We know that hardware always becomes obsolete.
      We know that civilisations always fall.
      We also know that these things have happened in the past, resulting in the loss of knowledge (in some cases it was because the language became extinct, and has never been decip
  • I've seen this very thing happen where I work -- we've lost data over the years because of incompatibility issues. On the other hand, as with many things, it's a huge problem but not an insurmountable one. The key is in planning an anti-obsolescence strategy into every IT decision. Store data files in open formats on robust media and put someone in charge of ensuring the archives are maintained and accessible.

    It's not easy, sure, but neither are many of the other tasks we take on as humans.
    • Funding (Score:3, Interesting)

      by Detritus ( 11846 )
      Don't forget funding. I've seen vast amounts of data disappear when nobody was willing to pay for its storage. This is common in large bureaucracies. You've spent years building and maintaining a library, and then it all ends up in a dumpster when the parent organization is eliminated.
  • by OfNoAccount ( 906368 ) on Monday November 20, 2006 @06:02PM (#16921598)
    Since I shoot RAW, I also burn a copy of dcraw.c [cybercom.net] onto every disc - so even if the current platforms get lost by the wayside, there will be code to convert them still.

    Storage itself? Currently burning onto Delkin Archival Gold [delkin.com], storing cool and dark, and in two physically distant locations.

    They're also stored on my harddisk, and the best are backed up onto a USB drive.

    If it looks like the DVD-ROM drive is becoming obsolete I'll burn them on to whatever comes along next.

    If you're truly paranoid you can always print them on archival quality paper using pigment based inks ;)
  • IBM will NEVER shoot that baby in the head, so there will be Notes databases around when my grandkids are long dead.
  • by Daniel_Staal ( 609844 ) <DStaal@usa.net> on Monday November 20, 2006 @06:03PM (#16921606)
    There are only two ways of doing this: keeping a copy of every program used to create these files (and a system to run them on) or converting them to some open and well-supported format.

    For text documents, HTML is probably the best bet. It is so widely used and supported that readers are almost guaranteed to exist as long as computers do in their current form. (And if something ever truly supersedes it, a mass-conversion program will be written anyway.) HTML probably works for basic spreadsheets too. Graphics support for GIF, JPEG, and PNG is probably at that level as well, and MP3 for music.

    As a bonus, most of the native programs for the documents to be preserved have translators to these formats already.

    Beyond that I have no idea.
    • Re: (Score:3, Interesting)

      Keeping 'a copy of every program' is tractable; 'and a system to run them on', however, is not. Data (programs) can be easily copied to new media and thus live forever (as long as people are around to order new media, install it and copy the data anyway, but that's just a staffing problem). But hardware is not so easily ported, unless you have an open, easy-to-port emulator that will run your programs. Preferably this emulator should require very little, say just a functional C compiler for future
      • Keeping 'a copy of every program' is tractable, 'and a system to run them on' however is not.

        Depends on how deep your pockets are. There's a warehouse in eastern PA that has a MicroVAX, a couple of VT240s and an extensive collection of TK50s holding scads of MOL files and pre-clinical trials data "just in case". Not sure if they'll revive everything and re-package it now that there are VAX emulators available, but if you've got data worth (potentially) several hundred million dollars, you'll go to extensi

      • If future systems are so wildly different from those we have today that they can't have a PNG viewer written for them, how easy will it be to write an emulator now that will run on such wildly different systems, yet faithfully emulate our existing environments?
      • I think the best solution currently available, is to include with each copy of your data (or on each backup volume) some source-code implementation of a document reader or parser, in a commonly understood and well-documented language, probably ANSI C (although Ada has all of its documentation in the public domain, so you could include it as well).

        This wouldn't help you if you expect people to lose the ability to read the media that you're storing the data and source code on, but that's a much more complicat
    • I think your general sentiment is worthwhile, but HTML for word processing documents, JPEGs for pictures, MP3s for audio? Geez, let's at least be thinking more along the lines of ODF/PDF, PNG24, and FLAC.

      However, that doesn't really address the question of medium. It'd be nice to have some sort of nearly indestructible medium to store all this.

      • by tomjen ( 839882 )
        Hmm - the problem with ODF/PDF is that it cannot be changed by hand - however LaTeX source code can.

        As for music, I agree: either FLAC or WAV, depending on what you want.

        Media? Kodak used to make some gold CDs that they claimed lasted 12 times as long as the average CD. Other than that you should look at something like a good old-fashioned stone. They last a real long time.
      • ODF/PDF are probably about at the required level of use for this; I did consider them for a moment. A little harder to convert to, but they preserve more and are nearly as future-proof. I did mention PNG, though I thought about leaving it out: if MS's support doesn't improve it could eventually fade away as a format. (Another alternative would be TIFF, which is quite well supported.)

        FLAC... Isn't widely enough used, in my opinion. WAV or AIFF perhaps.

        Of course, these are all off the top of my head. Giv
  • That's stupid. (Score:2, Insightful)

    by CDPatten ( 907182 )
    This isn't the 80's, and almost any file being saved in archives is in a format that many programs can open, meaning that the specifications for those formats are known... regardless of whether or not it is legal. Even Word files are viewable by a number of applications, and nobody is archiving historical information with advanced macros so don't even post with that macro crap.

    Also to assume that future generations won't have the sense or ability to figure out how to open files we write is silly.

    Because "
  • by susano_otter ( 123650 ) on Monday November 20, 2006 @06:05PM (#16921640) Homepage
    From TSA: "Popular Mechanics asks: Will an entire era of human history be lost?"

    Obviously not; Popular Mechanics itself has preserved much of the era in traditional hardcopy formats, making it no less lossy than previous printed-word eras.

    Of course, understanding the era from such incomplete and unreliable records will be a challenge to archaeologists and historians; again, not much different from previous eras.

    In conclusion: doesn't matter, hardly news.
  • by ThatsNotFunny ( 775189 ) on Monday November 20, 2006 @06:05PM (#16921644)
    When Thibodeau told the head of a government research lab about his mission, the man replied, 'Your problem is so big, it's probably stupid to try and solve it.'"


    I'd trust that guy. If there's one thing our government knows, it's stupidity.
  • Interestingly, this Slashdot article is shown to me with an advertisement for HD-DVD, which has a data format "forgotten" by design.

  • The solution (Score:4, Interesting)

    by alexwcovington ( 855979 ) on Monday November 20, 2006 @06:07PM (#16921672) Journal
    In this era of virtualization, the solution for x86 software is as easy as retaining a copy of the primary partition of a computer originally used to work with the desired files. Searchability could be a problem for proprietary data formats, but the move to open standards in the future will mitigate that.

    The real problem is 60 years of archives of antiquated, proprietary, task-specific and mainframe computer data cards and tapes whose original programmers are halfway to cedar boxes; if the government can't get their support in time it may as well call all the early stuff a loss and hand it over to archaeologists.
  • Comment removed based on user account deletion
  • "how to store digital files so future generations can access them"

    Quite simply, you don't store them in one format. Just move everything every 10 years or so. In fact, with Moore's Law and all, you will probably be able to store everything you had before in 1 of whatever is new 10 years later. Hire some part timers to move it or something. It's not a hard problem. It's just an inconvenient one.
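A periodic move like this is only safe if each copy is checked against the original. Here is a minimal Python sketch of one migration pass, assuming both the old and new media are mounted as ordinary directories; the `migrate` and `sha256_of` helpers are hypothetical names, and SHA-256 is just one reasonable choice of checksum.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def migrate(old_root: Path, new_root: Path) -> dict[str, str]:
    """Copy every file under old_root to new_root, verify each copy, and
    return a manifest mapping relative paths to checksums."""
    manifest = {}
    for source in sorted(old_root.rglob("*")):
        if not source.is_file():
            continue
        relative = source.relative_to(old_root)
        target = new_root / relative
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source, target)  # copies contents plus timestamps
        checksum = sha256_of(target)
        if checksum != sha256_of(source):
            raise IOError(f"copy of {relative} does not match the original")
        manifest[relative.as_posix()] = checksum
    return manifest
```

Keeping the returned manifest with the archive lets the next migration, ten years on, confirm that nothing rotted in between. It does nothing for the format problem discussed elsewhere in the thread: the bits survive, but something still has to know how to read them.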
  • by pclminion ( 145572 ) on Monday November 20, 2006 @06:09PM (#16921704)

    It really isn't a question WHETHER we will be able to read old digital data in the future. After all, humans invented these formats, flawed as they may be, and humans can decipher them with enough effort. We can crack cryptography -- a deliberate attempt to make it as difficult as possible to decipher certain information. So it's hard to imagine any data format that could not be deciphered in the future with some honest effort.

    Instead it is a question of whether the data is WORTH the effort. From an anthropological standpoint, this is valuable historical data, and its value is not decreased by our inability to interpret it. The benefit of digital data is that it can be copied even if we don't know what it means. It will not erode or decay like other historical artifacts, if we put in the small effort required to preserve it. Assuming humanity doesn't self-destruct, there will be plenty of time in the future for historians to decipher and interpret the data when a need arises for it.
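    A minimal sketch of that "copy it even if we don't know what it means" idea, assuming Python and its standard hashlib and shutil modules. The function names sha256_of and migrate are made up for illustration and do not come from any archive project mentioned in this thread. The bytes are copied to a new location and the SHA-256 digests are compared, so the format itself never has to be decoded.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def migrate(source: Path, destination: Path) -> str:
    """Copy the raw bytes to new media and confirm the copy is identical.

    Returns the shared digest, which can be stored alongside the copy
    and rechecked at the next migration.
    """
    shutil.copyfile(source, destination)
    before = sha256_of(source)
    after = sha256_of(destination)
    if before != after:
        raise IOError(f"copy of {source} does not match the original")
    return after
```

    Keeping the returned digest next to each copy is one possible way to notice silent corruption later; it says nothing about whether anyone can still interpret the contents.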

  • There's an infinite amount of trivia that could be recorded. We could all go around recording "my life in HDTV" recorded at 900GB/hr uncompressed, but it just wouldn't be meaningful. Sure, a certain sample of "everyday life in $foo" is useful, but on the whole who cares. And with digital media, this should be simpler than ever, since with proper redundancy you should never experience data loss. Obscure image format? Find a decoder, store it as PNG. Yes, it'll be a lot bigger but you'll never have to worry ab
  • Stuff I can't read (Score:3, Interesting)

    by Animats ( 122034 ) on Monday November 20, 2006 @06:34PM (#16922056) Homepage
    Media I actually have useful data on:
    • MacOS floppies. (Maybe on an older Mac.)
    • MacOS-only CD-ROMs. (Could be read on a Mac, if I still had one.)
    • 4mm DAT-II tapes from NT systems compressed with HP's hardware compression. (I still have a drive for this.)
    • 1600BPI 9 track open reel magnetic tape, UNIX TAR format. (I managed to get that copied before the last 9 track drives at Stanford died.)
    • 8" floppies for the IBM Series/1 minicomputer controller for the IBM RS-1 industrial robot. (Not really very useful at this point, but it would be nice to look at that work again.)
    • IBM PC/AT 5.25" high-density floppies in compressed Fastback backup format for DOS. (Years of DOS work, now obsolete)
    • 8" floppies for the Marinchip 9900 (A small theorem prover, in Pascal)
    • UNIVAC UNISERVO steel tape, 8 tracks, 200bpi, written on a UNIVAC UNISERVO IIA on a UNIVAC 1107. (A compiler I wrote as an undergraduate, plus some very early 3D graphics software.)
  • UK/BBC Domesday book (Score:3, Interesting)

    by bLanark ( 123342 ) * on Monday November 20, 2006 @06:36PM (#16922106)
    It happened recently. When I was a lad, the BBC and UK schools composed a "domesday book", which was supposed to be a parallel to the original Domesday book [wikipedia.org], which was a bit more than a census from the UK made in 1086. The modern one used the popular home PC the BBC Micro (made by Acorn). It was made on laserdisk, and distributed around the UK to the schools that had compiled the information.

    Well, 15 years on, it was useless. The then-proprietary format was not readable on anything modern, and there was not much of the old hardware around either. You can google for it ("UK domesday bbc data" should do it), the first link I saw was on the Guardian Online [guardian.co.uk].

    I've still got stuff on floppies, but no-one builds PCs with them anymore. I've got two old laptops with floppy drives, the other three computers have none. (OK, I also have two corpses with floppy drives, and the controllers on two of the new PCs will accept floppy drives, but, please take my point - they're going out of fashion.)

    In 20 years' time, there will probably be no CD/DVD drives; we'll all be using a new more portable, more backupable, lighter, faster, probably online-only storage medium. Kids won't recognize laserdisks, floppies, or USB ports. They might not recognise keyboards either - who knows?
  • by wkitchen ( 581276 ) on Monday November 20, 2006 @06:45PM (#16922228)
    Open and widely published formats are good, of course. But if you're looking for a really long term solution (as in multiple millennia), then I think the prime requirement other than physical durability should be easy reverse engineering. This way the data has some hope of recovery even if the knowledge of the format has been lost. This generally means that simpler is better. Things like plain ascii text. Uncompressed and unencrypted image and/or audio data. Verbose ascii based vector graphics. Things like that. Put it all on a durable, low density, and simply formatted medium that will easily give up its secrets to relatively low-tech and completely non-specialized tools like a microscope. It's not the most efficient way to store data, but it's much more likely to be useable by future archaeologists than things like MS-Word files, WMA files, JPG's, MP3's, etc.
  • by kosmosik ( 654958 ) <kos@ko[ ]sik.net ['smo' in gap]> on Monday November 20, 2006 @06:47PM (#16922244) Homepage
    Backups are for wimps. Real men upload their data to an FTP site and have everyone else mirror it. -- Linus Torvalds
  • This is one of the reasons open standards are important [sytes.net]. Not that open formats last forever, but at least they are documented, which means there's some hope of deciphering them after the software that does so is no longer maintained. Of course, that doesn't solve the problem of how to make the actual data survive...hard disks and tapes demagnetize, optical disks become translucent or otherwise unreadable, etc.
  • "Will an entire era of human history be lost?"

    How ironic that in an age where we have the highest capability to preserve our history, it can become obsolete in a matter of decades. Take the 5 1/4" floppy disc. Assuming that the disk didn't lose its magnetically bound data, I would be hard pressed in 2006 to find a drive that could read it. I don't even have a 3 1/2" drive anymore.

    Another example. My father has a magnetic reel from the 30's with a radio recording of my great grandfather. We have

  • "for as long as the United States remains a republic."--So, what does he want to do with data created after 1/21/2001?
  • It's called "Google"
  • Thibodeau and the rest of the people at NARA have been thinking about this problem for a while, as have other researchers around the world. If you're interested in such things, there are a few places to start looking.

    CAMiLEON http://www.si.umich.edu/CAMILEON/ [umich.edu]
    Cedars http://www.leeds.ac.uk/cedars/ [leeds.ac.uk]
    InterPARES http://www.interpares.org/ [interpares.org]
    DSpace http://www.dspace.org/ [dspace.org]

    Lockheed Martin won the NARA contract to develop the Electronic Records Archives.
    http://www.archives.gov/era/acquisition/option-award.html [archives.gov]

    After hearin
  • Cool, someone got it right for a change.
  • by dghcasp ( 459766 ) on Monday November 20, 2006 @07:18PM (#16922630)

    I'm doing my part by working on a project where I'm copying every single MySpace page onto stone tablets.

    When future archeologists dig them up and see "LOL Bobby Ray Sucks!" and "D00d 1 pwnz3r3d U!!1!", they'll understand that our civilization didn't just decline; our only choice was to destroy ourselves because we were so lame.

  • by Panaqqa ( 927615 ) on Monday November 20, 2006 @08:01PM (#16923200) Homepage
    Unless I miss my guess, Google will continue towards its stated objective of making all the world's information searchable and retrievable. Want something archived, Google will take care of it. And if Google fails, my suspicion is the entity that takes their place will take it on.

  • by Ralph Spoilsport ( 673134 ) on Monday November 20, 2006 @08:22PM (#16923430) Journal
    The average slashdot user, as a fan of digital technology, will weep and moan, accuse me of being Flamebait or a Troll, but the fact is, YES, it will disappear. Our digital technology is completely predicated on a vast and complex array of technologies - some super advanced like lasers, others more prosaic such as mining rare and precious metals as well as petroleum out of the ground.

    The question isn't IF it will disappear, the question is really WHEN and HOW. Printing to paper-based hardcopy helps for a few hundred years. It can be recopied from paper to paper easily - it's a very low context solution: ink on paper followed by ink on paper. So, important information about our society can be transferred across generations, even if the generations have no electricity at all. This is how we know Shakespeare, for instance.

    Many people say "Oh, but we'll have some NEW technology that will take care of it". This assumes that the resource base for a new technology will be as generous and dense as our present resource base provides. This is a VERY unwise presumption, as there is categorically no proof that such will be the case. In fact, there are a variety of intense warning signs that suggest quite the contrary. [msn.com]

    From the evidence I have found, and, oddly, I've studied this for a number of years now, I am fairly well convinced that industrial civilisation will simply erase itself from the human record as little more than a horribly polluted stain that destroyed itself through overpopulation and environmental stupidity. All the music you hear, all the shows you watch, all the films you cried at, it will all go away. Poof. This also means that self-absorbed hucksters like Madonna, Britney Spears, Michael Jackson, Tom Cruise, and their supporting technology of TV, Radio, DVD/CD, etc will also disappear - just the flotsam of "entertainment" culture.

    The long term future will be people chasing bison/cows across the prairie or living in small agrarian villages bound by localised population bursts and die-offs. But it will take several centuries to get there. In the meantime we've got our MTV and Orange Crush. The most important thing to remember is this: not getting to that Star Trek future IS NOT A BAD THING. We pissed away the globe's resources on our Xboxes, SUVs, jetset vacationlands, and all the other minutiae and ephemera that makes a society "civilised" and provides "leisure activity". All societies have that, to varying degrees. We just had more of it, thanks to our insane and unrelenting exploitation of resources, petroleum, and electrical generation. But it will all go away, and THAT'S OK.

    We will disappear. We Are Atlantis.

    RS

  • Long Time Gone (Score:4, Insightful)

    by Doc Ruby ( 173196 ) on Monday November 20, 2006 @08:24PM (#16923444) Homepage Journal
    "The documents of our time are being recorded as bits and bytes with no guarantee of readability down the line. And as technologies change, we may find our files frozen in forgotten formats. Popular Mechanics asks: Will an entire era of human history be lost?"


    I ask: has this ever happened before?

    Not necessarily in electronic bits and bytes. Not the "Alexandria Library" that was mostly duplicated in other libraries or private collections. Maybe like the Inca quipu, mats of knotted strings that recorded all their empire's operational records, other than the ceremonial records in statues and murals. But some quipu survive, despite Spaniards destroying most of them in the mid-1500s. Enough that we can at least recognize that they did have records of lots of transactions.

    No, something more transient, as transient as our bits, read/written by something more transient than our metal/plastic/glass machines. Maybe songs or other performed stories, like tribal Australians. Maybe woven in more degradable material, like uncured plant matter. Maybe both, like the Pacific star navigation lore taught in temporary woven sticks, but carried in the mind. Maybe patterns in some other loseable medium, like animal pelt patterns no longer readable now that the code has been lost, or interbred back into "blankness".

    If it can happen to us, it could have happened before. Our civilization rose from meager beginnings only about 12K years ago, after the last Ice Age that lasted about 12Ky. There was another one before that, with people accumulating knowledge between. And probably a half-dozen or so others since we became as genetically developed as we are today, between 7Mya and 200Kya. We don't even have many records from the first half of the last 12Ky. Could we be reinventing the wheel, literally, every 25 thousand years?
  • by robson ( 60067 ) on Monday November 20, 2006 @09:57PM (#16924222)
    As a game developer, it's profoundly disturbing how casually we treat games just a few years old. Hardware will continue to evolve and OSes will change; we really need a way to secure our ability to play old games.

    Console games are semi-okay because you can at least keep the (static) hardware around, but PC games are in bad shape. PCs evolve gradually, and it only takes one small OS or video driver change to render a game unplayable. Because games are a commercial medium, games simply aren't supported once it's no longer financially beneficial.

    As long as there are programmers out there willing to write emulators, I suppose we're okay... but it still makes me nervous.
  • by FridayBob ( 619244 ) on Monday November 20, 2006 @10:19PM (#16924384)
    ... just print all the ones and zeros out on paper, so that later on others
    can just read it all back in again with OCR! Oh, I know we could use
    punch cards instead, but we don't want our kids to laugh at us, do we?
    Besides, if we print the ones and zeros real small, we can achieve higher
    data densities.
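    Taking the joke at face value, the round trip is simple enough to sketch. This is an illustrative example in Python, assuming a perfect OCR step that returns exactly the characters that were printed; the function names to_printable_bits and from_printable_bits are invented here. Each byte becomes eight characters of "0" and "1", grouped by spaces and line breaks so a person could also read it by eye.

```python
def to_printable_bits(data: bytes, bytes_per_line: int = 8) -> str:
    """Render bytes as lines of space-separated 8-bit groups."""
    lines = []
    for start in range(0, len(data), bytes_per_line):
        chunk = data[start:start + bytes_per_line]
        lines.append(" ".join(f"{byte:08b}" for byte in chunk))
    return "\n".join(lines)


def from_printable_bits(text: str) -> bytes:
    """Rebuild the bytes from text produced by to_printable_bits.

    Assumes every whitespace-separated group is exactly one byte
    written as binary digits, i.e. the OCR made no mistakes.
    """
    return bytes(int(group, 2) for group in text.split())


if __name__ == "__main__":
    page = to_printable_bits(b"Hi")
    print(page)  # 01001000 01101001
    assert from_printable_bits(page) == b"Hi"
```

    A real scheme would need error correction on top, since a single misread digit silently changes a byte.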
  • by frdmfghtr ( 603968 ) on Tuesday November 21, 2006 @12:30AM (#16925264)
    This reminds me of the study done for the Waste Isolation Pilot Plant (http://downlode.org/Etext/wipp/#executivesummary). The study looked at how to mark the site in such a way that the purpose of the site would be indicated for 10,000 years.

    While the WIPP site won't have the benefit of constant updating of the media (it's designed to survive on its own for 10,000 years), it does address some of the same points: longevity of the media, a format that will be usable into the future, and ability of future civilizations to understand the message.

    Off-topic perhaps but an interesting read.
  • Pre-IBM Compatible (Score:3, Insightful)

    by StarWreck ( 695075 ) on Tuesday November 21, 2006 @09:54AM (#16929680) Homepage Journal
    To most people, any of the files they used on computers before their first "IBM Compatible" is probably lost forever already. Think of how many files are "frozen" on 5.25" floppy disk for the Commodore 64 alone!

    That doesn't have to be the case though; you can retrieve files from disks of hundreds of different 80's era computers on a modern PC using a Catweasel card. http://www.vesalia.de/e_catweaselmk4.htm [vesalia.de]

    With the catweasel, a standard 5.25" PC floppy disk drive (hello, ebay), and a 3.5" PC floppy disk drive there's hardly a floppy disk you won't be able to retrieve your petrified files from.

    Finding a program that can do anything with those files is another subject entirely.
