National Archive File Format Time Bomb

Posted by ScuttleMonkey on Wednesday July 04, 2007 @01:57PM from the cleaning-up-your-own-messes dept.

geordie_loz writes "The BBC is reporting that the UK National Archive is warning of old formats being a 'ticking time-bomb' where data is going to be lost because of incompatibility in newer versions of software, and software not existing at all. More surprisingly, Microsoft has offered a solution via the OOXML format."

Tagging beta... (Score:2, Insightful)

by Rufty ( 37223 ) writes:

ITSATRAP!
MS should not own the standard (Score:2)

by BroadbandBradley ( 237267 ) writes:

Don't give in to MS on this one. Some states in the US already have, and it's no better than the standard Word format because it's owned by a private entity. Use the Open Office format if you want to be sure that you won't get the rug pulled out from under you some years down the road.
- Re:MS should not own the standard (Score:5, Informative)
  
  by dvice_null ( 981029 ) writes: on Wednesday July 04, 2007 @02:20PM (#19745579)
  
  There is no such thing as Open Office format. Perhaps you mean OpenDocument Format, which is used by several different applications ( http://en.wikipedia.org/wiki/List_of_applications_supporting_OpenDocument [wikipedia.org] ), including OpenOffice.org.
  
  - Re: (Score:3, Informative)
    
    by esmrg ( 869061 ) writes:
    
    OpenOffice.org does have its own native format: "OpenOffice.org 1.0 Text", extension .sxw. It was introduced with the original release, but it has not been the default since the introduction of OpenDocument.
    While the GP may or may not have been exactly sure what they were referring to, it doesn't make them wrong.
  - Re: (Score:2)
    
    by account_deleted ( 4530225 ) writes:
    
    Comment removed based on user account deletion
    - Such precise terms as (Score:4, Informative)
      
      by Anonymous Coward writes: on Wednesday July 04, 2007 @04:38PM (#19747079)
      
      "Spacing like WP6"? "Calculate incorrect leap year like Excel"?
      
      Because if you want to include bugs etc., then no, it doesn't support each and every 2007 feature.
      
      If you mean supporting tables, nested documents, embedded graphs, scripting and so on, yes.
      
      It may not be "click the same buttons" feature-correct, nor probably "run the same VB code" compatible.
      
      Take a look at some of the people on the board that devised ODF. They include the US National Archives. Print media. Archivists.
      
      Y'know, people who KNOW DOCUMENTS.
      
      As to the remainder of your questions, there is a process, and it does have to go through committee (else how does everyone else know how to implement the new standard? MS doesn't have this problem, since they only want themselves to know their updated standard). It is XML, so it is extensible (decode the initialism). The process will take as long as it takes, much the same as Vista will take as long as it takes to get SP 1 out.
      
      I don't see how these latter issues are specific to ODF rather than part of any standardisation process that OfficeXML will also have to go through for anyone other than MS to implement it...
      
    - Re:One thing I'd like to know (ODF question) (Score:5, Informative)
      
      by a.d.trick ( 894813 ) writes: on Wednesday July 04, 2007 @05:06PM (#19747323) Homepage
      
      Does the ODF specification support each and every Word/Excel/Powerpoint 2007 feature?
      Thank goodness no. "Auto Space like Word 95"? That's in the OOXML spec (and there's no explanation of how Word 95 does spacing either).
      If not, is it extensible?
      
      Yeah, it's XML. Also, unlike OOXML, ODF uses namespaces, so you can create a separate standard if you don't want to muck around with ODF.
      
      If it is extensible, do changes have to go through some sort of committee to be incorporated? How frequently are changes incorporated? How long is the process?
      
      It would depend. The thing about changing standards is that it causes problems for all sorts of people. There is a real need for a stable and standardized document format that just doesn't change, or if it does, very slightly.
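
To illustrate the namespace point, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. The office and text namespace URIs are the ones ODF uses; the extension namespace and its element name are hypothetical, invented for this example. Because elements are matched by namespace URI, a reader that only understands ODF paragraphs simply skips the foreign element instead of misreading it.

```python
import xml.etree.ElementTree as ET

# ODF's namespace URIs for the office and text vocabularies.
OFFICE = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TEXT = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
# Hypothetical vendor extension namespace, for illustration only.
EXT = "urn:example:my-extension"

SAMPLE = f"""<office:document-content xmlns:office="{OFFICE}" xmlns:text="{TEXT}" xmlns:ext="{EXT}">
  <office:body>
    <office:text>
      <text:p>First paragraph.</text:p>
      <ext:auto-space-like-word95 mode="on"/>
      <text:p>Second paragraph.</text:p>
    </office:text>
  </office:body>
</office:document-content>"""


def standard_paragraphs(xml_text: str) -> list[str]:
    """Return the text of every text:p element, ignoring other namespaces."""
    root = ET.fromstring(xml_text)
    # ElementTree names elements "{namespace-uri}local-name", so an element
    # from the extension namespace can never be mistaken for a text:p.
    return ["".join(p.itertext()) for p in root.iter(f"{{{TEXT}}}p")]


def foreign_elements(xml_text: str) -> list[str]:
    """List tags that fall outside the office and text namespaces."""
    root = ET.fromstring(xml_text)
    known = (f"{{{OFFICE}}}", f"{{{TEXT}}}")
    return [el.tag for el in root.iter() if not el.tag.startswith(known)]


if __name__ == "__main__":
    print(standard_paragraphs(SAMPLE))  # ['First paragraph.', 'Second paragraph.']
    print(foreign_elements(SAMPLE))     # ['{urn:example:my-extension}auto-space-like-word95']
```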
      
  - Re:MS should not own the standard (Score:5, Funny)
    
    by syousef ( 465911 ) writes: on Wednesday July 04, 2007 @05:10PM (#19747373) Journal
    
    There is no such thing as Open Office format.
    
    Rubbish. I've worked at places with an Open Office format. Basically they open the office to any monkey who turns up for a job interview and a handful of people have to make up for their incompetence.
    
Idiots (Score:4, Insightful)

by suv4x4 ( 956391 ) writes: on Wednesday July 04, 2007 @02:03PM (#19745415)

The BBC is reporting that the UK National Archive is warning of old formats being a 'ticking time-bomb' where data is going to be lost because of incompatibility in newer versions of software, and software not existing at all. More surprisingly, Microsoft has offered a solution via the OOXML format.

There are so many idiots in this state of the affairs:

1. the idiots which decided to build huge archive with undocumented proprietary format
2. idiots which believe they can't find even a single copy of the software they need
3. idiots who didn't store a single copy of the software that reads the format, together with the archive (not very far from obvious, is it).
4. idiots who want to convince other idiots that OOXML is an open format (versus straight XML serialization of the whatever binary DOC was in the source code base at the time in MS)

- Re:Idiots (Score:4, Interesting)
  
  by xtracto ( 837672 ) writes: on Wednesday July 04, 2007 @02:09PM (#19745491) Journal
  
  2. idiots which believe they can't find even a single copy of the software they need
  
  Please give me a link to a copy of the Professional Write 3 (PW) software app for MS-DOS 6.
  
  Yep, I had that very problem some years ago when I was cleaning my room and found several 5 1/4 diskettes containing files with the .pw extension. No way to find the program.
  
  - Re:Idiots (Score:4, Insightful)
    
    by CastrTroy ( 595695 ) writes: on Wednesday July 04, 2007 @02:13PM (#19745525)
    
    I believe points 2 and 3 can be lumped into one. It's like creating backup tapes and then throwing out the tape reader. Who thinks these systems up?
    
    - Re:Idiots (Score:5, Insightful)
      
      by tolan-b ( 230077 ) writes: on Wednesday July 04, 2007 @02:23PM (#19745615)
      
      It's not an archive of files in a single format, it's an archive of files in general, many formats, depending on which format the file was originally in.
      
      The system wasn't thought up any more than a library thinks up all the books it contains.
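
Since such an archive holds whatever formats the files arrived in, a first step is usually just working out what each file is. Below is a rough Python sketch of identification by leading "magic number" bytes rather than by file extension; the five-entry signature table is an illustrative assumption, and a real format registry would be far larger.

```python
# Illustrative signature table (an assumption for this sketch, not a registry).
SIGNATURES: list[tuple[bytes, str]] = [
    (b"%PDF-", "PDF document"),
    (b"PK\x03\x04", "ZIP container (ODF and OOXML packages are ZIP files)"),
    (bytes.fromhex("D0CF11E0A1B11AE1"), "OLE2 compound file (e.g. legacy binary .doc/.xls)"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"{\\rtf", "Rich Text Format"),
]


def identify(data: bytes) -> str:
    """Name the format whose signature matches the start of data, if any."""
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    return "unknown"
```

Note that a ZIP signature alone cannot tell an ODF package from an OOXML one; a fuller tool would have to look inside the container as well.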
      
      - Doesn't matter. (Score:2)
        
        by khasim ( 1285 ) writes:
        
        It's not an archive of files in a single format, it's an archive of files in general, many formats, depending on which format the file was originally in.
        And being a government, these files are INCREDIBLY important.
        
        Why haven't they been converted? Really, all their DIGITAL archives should be in a single format by now.
        The system wasn't thought up any more than a library thinks up all the books it contains.
        All those books are in a single format. And paper records can last a LOT longer than digital records. The
        
        Re:Doesn't matter. (Score:5, Interesting)
        
        by Bazzargh ( 39195 ) writes: on Wednesday July 04, 2007 @04:09PM (#19746807)
        
        And being a government, these files are INCREDIBLY important.
        
        Why haven't they been converted? Really, all their DIGITAL archives should be in a single format by now.
        
        No, they shouldn't. You usually want 3 formats:
        - the original format of the document. Whatever format whichever idiot happened to write (or record, or video) it in, you absolutely want the original in your records.
        - a searchable format (eg OCR'd text from scanned image docs)
        - a rendered format. (eg an image or pdf, or svg - something open enough that you can continue to show how the doc would have looked). The appropriate rendered format varies. Paper is not an appropriate format for storing CCTV footage, for example ;)
        
        If you're very, very lucky the original is both searchable and viewable; like, say, HTML. It gets more complicated too, because you often want to store a redacted copy of the document (think of the Onion story 'CIA realise they've been using black highlighter pen all these years') and you want that searchable too, so you have to keep a redacted searchable format too... and of course, some of the records are on actual paper. Have you started worrying about the fading inks in the originals yet?
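
One way to picture the "three formats" approach is a record that always holds the original and optionally points at searchable, rendered and redacted derivatives. This is a hypothetical data model sketched in Python, not any archive's actual schema; the field names and storage paths are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Rendition:
    path: str        # storage location (hypothetical layout)
    media_type: str  # e.g. "text/html", "application/pdf"


@dataclass
class ArchivalRecord:
    record_id: str
    original: Rendition                      # kept as-is, whatever the format
    searchable: Optional[Rendition] = None   # e.g. OCR'd text
    rendered: Optional[Rendition] = None     # e.g. PDF, SVG or an image
    # Redacted variants, keyed by role ("searchable", "rendered").
    redacted: dict[str, Rendition] = field(default_factory=dict)

    def missing(self) -> list[str]:
        """Return the derived renditions that still need producing."""
        return [role for role in ("searchable", "rendered")
                if getattr(self, role) is None]


if __name__ == "__main__":
    scan = ArchivalRecord("rec-001", Rendition("scans/001.tiff", "image/tiff"))
    print(scan.missing())  # ['searchable', 'rendered']

    # The lucky case: an HTML original doubles as its own searchable and rendered copy.
    page = Rendition("web/002.html", "text/html")
    html = ArchivalRecord("rec-002", page, searchable=page, rendered=page)
    print(html.missing())  # []
```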
        
        BTW you can't restrict the format of the original. Consider an email from a corporate bidding for a govt contract, with attachments. They need to keep those.
        
        - Mr. E
        
        PS, posting anon because I have dealings with the national archives, and don't want to speak for my company.
        
        Re: (Score:3, Insightful)
        
        by Bazzargh ( 39195 ) writes:
        
        Hmm. Completely failed to tick the posting anon box :) Good job I held back from expressing opinions in there.
        
        1/2 pentabyte = 20 bits? (Score:5, Funny)
        
        by benhocking ( 724439 ) writes: <benjaminhocking.yahoo@com> on Wednesday July 04, 2007 @03:29PM (#19746303) Homepage Journal
        
        Fine, then you get to be the schmuck who has to organize, sort, label and store about 1/2 a pentabyte of information on paper.
        
        A pentabyte is 5 bytes, right? How hard is it to store 20 bits on paper? ;)
        
        (I assume petabyte (10^15 or 2^50, depending on convention) is the word you're looking for.)
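
For the record, the two conventions differ by about 12.6% at this scale. A quick Python check of half a petabyte either way, plus the "pentabyte" joke taken literally:

```python
# Half a petabyte under the two conventions mentioned above.
HALF_PB_DECIMAL = 10**15 // 2  # SI petabyte: 10^15 bytes
HALF_PB_BINARY = 2**50 // 2    # binary convention (a pebibyte): 2^50 bytes

# The joke, taken literally: a "pentabyte" as 5 bytes of 8 bits each.
PENTABYTE_BITS = 5 * 8
HALF_PENTABYTE_BITS = PENTABYTE_BITS // 2

print(HALF_PB_DECIMAL)                             # 500000000000000
print(HALF_PB_BINARY)                              # 562949953421312
print(round(HALF_PB_BINARY / HALF_PB_DECIMAL, 3))  # 1.126
print(HALF_PENTABYTE_BITS)                         # 20
```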
        
        That's why I said "depending on convention" (Score:2)
        
        by benhocking ( 724439 ) writes:
        
        Not everyone accepts the Pebi designation, so I included the phrase "depending on convention" in an effort to bypass arguments on both sides. Obviously, my effort failed. ;)
        
        Re: (Score:2)
        
        by benhocking ( 724439 ) writes:
        
        20 bits probably isn't hard to get stored on paper but when you consider 40 bits (which is what 5 bytes would really be) maybe that's when you start running into problems.
        
        Right, 1 pentabyte = 40 bits, which is why I stated that 1/2 pentabyte (taken from the GGP post, and in the subject heading) is 20 bits. 1/2 of 40 is 20. ;)
        
        Re: (Score:3, Insightful)
        
        by bheer ( 633842 ) writes:
        
        Whatever is worth keeping for a long time should be on paper and translated in more than one language.
        Er, even if you translate it into other languages, they'll evolve too. Try reading Old French much? And translation also leaves you with the headache of reconciling various translations and figuring out which is "more correct" (IIRC the Bible has this problem). It would be a much better idea to make redundant copies to guard against bitrot, and to store them as far apart physically as possible.
        I doubt that the now
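
Redundant copies only help if you notice when one of them has rotted. A common approach is fixity checking: record a cryptographic digest when a file is ingested, then periodically re-hash every copy. A minimal sketch with Python's hashlib; the function names are hypothetical, and SHA-256 is one reasonable choice of digest rather than a requirement.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks (1 MiB by default) so it never has to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_copies(recorded_digest: str, copies: list[Path]) -> list[Path]:
    """Return the copies whose current contents no longer match the recorded digest."""
    return [p for p in copies if sha256_of(p) != recorded_digest]
```

Any copy flagged by check_copies can then be replaced from one that still matches the recorded digest.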
        
        Re: (Score:3, Interesting)
        
        by Corporate Troll ( 537873 ) writes:
        
        For example, I keep a copy of DOS and Win3.1 ISOs (about 20MB total) and Norton Commander (3 floppy images!) on a DVDR, along with a copy of Virtual PC. This lets me recreate a Windows 3.1 virtual PC anytime I want.
        
        Now.... You can do that now. However, in 100 years, will this be possible? You do not know what the future brings. Let's not even talk about 1000 years and beyond. Now; you backed this stuff up on a DVD and you die tomorrow. Your kids keep the data, and when they die a historian speciali
        
        Re: (Score:3, Interesting)
        
        by ozmanjusri ( 601766 ) writes:
        
        Whoever modded this "Funny" is wrong. It should be insightful.
        My copy of Office XP won't activate on any of the computers I currently own (the hardware it was originally activated on is long-dead), and that's only 5 years old.
  - Re: (Score:2, Interesting)
    
    by mjensen ( 118105 ) writes:
    
    Don't have a link, but have Professional Write on CD of "Work software not used anymore"
    
    Along with Professional File (database product)....
  - Re: (Score:2)
    
    by suv4x4 ( 956391 ) writes:
    
    Yep, I had that very problem some years ago when I was cleaning my room and found several 5 1/4 diskettes containing files with the .pw extension. No way to find the program.
    
    You're suggesting the National Archives have the resources and intelligence (as in, research and know-how) of a single guy who found several 5 1/4 disks while cleaning his room.
    
    Well, thanks. I laughed a lot.
    - Overestimate? (Score:2)
      
      by benhocking ( 724439 ) writes:
      
      You're suggesting the National Archives have the resources and intelligence (as in, research and know-how) of a single guy who found several 5 1/4 disks while cleaning his room.
      So, do you think that's a bit of an overestimate or an underestimate? ;)
  - - Re: (Score:2)
      
      by bladesjester ( 774793 ) writes:
      
      You beat me to it. :P
      
      While it may be hard to find the program itself anymore, you can usually find something that can read the files.
- Re: (Score:3, Interesting)
  
  by Kjella ( 173770 ) writes:
  
  1. the idiots which decided to build huge archive with undocumented proprietary format
  
  Which seems reasonable at a time when "everyone" has a computer that'll read it. For example, when it comes to image viewers, there's software covering literally hundreds of formats without issue.
  
  2. idiots which believe they can't find even a single copy of the software they need
  
  It's supposed to be an archive, not a "well, we'll have to dig up a copy of the software, I'll get back to you in some months."
  
  3. idiots who didn't st
  - Re: (Score:2)
    
    by tjwhaynes ( 114792 ) writes:
    
    If Microsoft has taken the job of taking some binary BLOB and make it into something human-readable, OOXML or not, then I say you'd have an easier time converting OOXML to something readable in OpenOffice than not.
    If it's a text document, then you might be able to parse the OOXML regardless and understand most of the formatting. If it's a spreadsheet, then many of the parts of the OOXML spec are ALSO binary LOBs and you are no better off. If it's something that OOXML doesn't support, you are out of luck. At least ODF provides ways to package other formats along with itself in a transparent fashion, and it is completely documented and supported by multiple vendors.
    Cheers,
    Toby Haynes
- Re: (Score:3, Insightful)
  
  by timeOday ( 582209 ) writes:
  
  3. idiots who didn't store a single copy of the software that reads the format, together with the archive (not very far from obvious, is it).
  That's easier said than done. You'd have to keep multiple copies of everything, including hardware, up to the point where you're confident you have a stable standard - probably the power mains - and that's if you're not worried about violating licenses. Of course, with the advent of online apps, there is no way to snapshot the entire ecosystem of servers and softwa
  - - Re: (Score:3, Insightful)
      
      by timeOday ( 582209 ) writes:
      
      Just an FYI, governments don't have to worry about licensing. Especially in situations like this. They have the power of eminent domain.
      Think about Valve's Steam software for protecting video games (or any software that requires network activation). Just because you're willing to bypass it doesn't mean you can.
- It's not just about the software... (Score:3, Interesting)
  
  by WIAKywbfatw ( 307557 ) writes:
  
  It's not just about the software. It's the hardware, too.
  
  I'm sure that most of the archive data created today is stored on something like DVDs, but as recently as the early 1990s, the official long-term storage medium for the UK government was SyQuest 44MB removable cartridge hard drives [wikipedia.org].
  
  I know that I have a working 44MB drive (well, it was working when I last fired it up, which would have been sometime last decade) somewhere in my attic, but I doubt that too many of these drives are still in existence.
  
  I only hope that the
  - - Re: (Score:2)
      
      by Torvaun ( 1040898 ) writes:
      
      Or, you could end up having the drive stolen from you. What do you think is more likely?
      
      Wrong! It was actually secret option C: accidentally drop the last Syquest drive, thus dooming civilization!
- Re: (Score:2)
  
  by mikael ( 484 ) writes:
  
  The National Archive is required to maintain a copy of every newspaper, magazine or journal published in the UK. In many cases, some magazines came with floppy disks and CD-ROMs containing programs, data and applications submitted by users.
  This is the case especially with computer magazines. Sensible publishers will have used self-unpacking executables and/or the ZIP format.
  
  Finding a device to read floppy disks and CD-ROMs is straightforward enough. But trying to find the relevant application which runs
- Re: (Score:2)
  
  by uhlume ( 597871 ) writes:
  
  5. Idiots who apparently have never read the OOXML DTD, which, as I recall, includes certain type definitions for backward compatibility with the binary DOC format, but explicitly deprecates them?
- Re: (Score:2)
  
  by MightyMartian ( 840721 ) writes:
  
  We have a format that has been around nearly half a century. It is universal (or nearly so), has countless applications that can open it. It can be used to store, via extensions to character sets, data in many international formats. It's not the most efficient method, but can be used to store documents, spreadsheets and databases.
  
  It's ASCII, of course. There's nothing wrong with using Word, VisiCalc, WordStar or whatever, but just save a bloody text version of your document, and it's guaranteed that some
- Re: (Score:2)
  
  by Jay L ( 74152 ) writes:
  
  There are so many idiots in this state of the affairs:
  
  You forgot the idiots who don't remember/were never aware of a time when there was no common platform, OS, hardware, or even, for that matter, alphabet encoding, and when nearly all files were saved as a dump of the in-memory structure for efficiency.
  
  At my old print shop, we used to save all the cool/funny business cards that came through the door. And one of the nicest Helvetica versions I've ever seen ("Helios").
  
  Someone probably still has all that. O
- Re: (Score:2)
  
  by kabz ( 770151 ) writes:
  
  Please send me a copy of Access 2.0, the format in which I have quite a lot of IP stored, none of which is easily convertible to anything else.
  
  And if you suggest using Access 2007, just forget it, I already tried, and the documenter add-on is the biggest POS ever. This is using a legally installed copy of Office 2007 on XP Pro on my wife's desktop.
  
  I am sticking with .doc format generated by OpenOffice. Ironically, I run OOo on Windows, but Office 97 on Linux. Hehe.
- Re: (Score:2)
  
  by AnotherDaveB ( 912424 ) writes:
  
  4. idiots who want to convince other idiots that OOXML is an open format (versus straight XML serialization of the whatever binary DOC was in the source code base at the time in MS)
  "The agreement between the National Archives and Microsoft centres on the use of virtualisation." The
- - Re: (Score:2)
    
    by Adult film producer ( 866485 ) writes:
    
    Programs exist that convert old formats to new. Digital archeology may become the hot growth industry of the 22nd century.
    
    In a hundred years Microsoft will no longer exist, so what do the historians do when they uncover a stack of magnetic tapes/DVDs that contains documents in .doc format? (.doc or whatever other format you can think of.) They may have the hardware to transfer the data to newer computers and storage, but the secrets to translating .DOC were lost years ago when Microsoft went bankrupt.
    - Re: (Score:2)
      
      by RexRhino ( 769423 ) writes:
      
      In a hundred years Microsoft will no longer exist, so what do the historians do when they uncover a stack of magnetic tapes/DVDs that contains documents in .doc format? (.doc or whatever other format you can think of.) They may have the hardware to transfer the data to newer computers and storage, but the secrets to translating .DOC were lost years ago when Microsoft went bankrupt.
      It isn't that complicated. If historians can piece together Egyptian hieroglyphics from the Rosetta Stone, then they can certainly extract some plain text from a .doc file.
      
      And if they can't figure out a .doc file, they probably won't be able to figure out opendocument any better, because it is just as silly to believe that opendocument will be any more common 1000 years from now than microsoft word documents.
      - Re: (Score:3, Interesting)
        
        by nneonneo ( 911150 ) writes:
        
        Step back, though, and think for a minute about the "house of cards" upon which that Word document rests.
        
        It rests on
        1) Physical storage medium -- whether this is Flash, Hard Drive, Optical Medium, [NV]RAM, etc., all these technologies may be very difficult to retrieve data from, especially if the level of technology happens to go down in the future (say, global thermonuclear war). Even if data is retrieved, there's no guarantee that it's intact after 1000 years (the dyes in CDs will have decomposed by that
      - Re: (Score:2)
        
        by Chandon Seldon ( 43083 ) writes:
        
        And if they can't figure out a .doc file, they probably won't be able to figure out opendocument any better, because it is just as silly to believe that opendocument will be any more common 1000 years from now than microsoft word documents.
        The question isn't if the format will still be in use, the question is this: How hard will it be to write a converter to whatever the standard is in the future? With ODF, you need to uncompress a zip archive and parse some reasonably simple XML. With OOXML, you need to
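
To make the ODF side of that concrete, here is a rough Python sketch that opens an ODF package with zipfile and reads paragraph text from content.xml using ElementTree. The demo package is hand-built and deliberately incomplete (no manifest, for instance), and the sketch only collects text:p elements (headings live in text:h), so treat it as an illustration of the unzip-and-parse step rather than a full ODF reader.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def odf_paragraphs(odf_bytes: bytes) -> list[str]:
    """Unzip an ODF package, parse content.xml and return its paragraph text."""
    with zipfile.ZipFile(io.BytesIO(odf_bytes)) as package:
        content = package.read("content.xml")
    root = ET.fromstring(content)
    # itertext() also picks up text inside nested spans within a paragraph.
    return ["".join(p.itertext()) for p in root.iter(f"{{{TEXT_NS}}}p")]


def make_demo_package() -> bytes:
    """Build a minimal stand-in for an .odt (not a complete, valid ODF file)."""
    content = (
        '<office:document-content '
        'xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0" '
        f'xmlns:text="{TEXT_NS}">'
        '<office:body><office:text>'
        '<text:p>Hello from the archive.</text:p>'
        '</office:text></office:body></office:document-content>'
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        z.writestr("mimetype", "application/vnd.oasis.opendocument.text")
        z.writestr("content.xml", content)
    return buf.getvalue()


if __name__ == "__main__":
    print(odf_paragraphs(make_demo_package()))  # ['Hello from the archive.']
```

With a real document you would pass in the file's bytes instead, e.g. the result of reading a hypothetical letter.odt.
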
Use SGML (Score:5, Funny)

by Morgaine ( 4316 ) writes: on Wednesday July 04, 2007 @02:04PM (#19745421)

It predates Moses, and is quite likely to survive the heat death of the universe.

- Use TeX (Score:2, Interesting)
  
  by user1003 ( 816685 ) writes:
  
  I wanted to design something that would be still usable in 100 years. (Donald E. Knuth, more than 20 years ago)
  
  Also, LaTeX will get you nicer documents than any WYSIWYG word processor in less time (once you know it ..). Oh and smaller filesize, too.
- - Re: (Score:3, Funny)
    
    by Cheesey ( 70139 ) writes:
    
    Yes, it's true. Sadly, early transcribers of the book left out the stuff they didn't understand. In addition to a number of now-forgotten sections describing the role of evolution in the creation of life, this included the following cryptic verses:
    
    2:2 And on the seventh day God said :wq and then make.
    
    2:3 And God watched gcc running and sanctified it, because it would have taken Him at least two weeks to write the whole thing in machine code.
The big lie... (Score:5, Informative)

by advocate_one ( 662832 ) writes: on Wednesday July 04, 2007 @02:04PM (#19745425)

they keep repeating this everywhere they go... "Open XML"... their format is not Open... it's closed off with licensing and other restrictions... all the really good stuff in the specification has been obfuscated out and hidden behind indirections to the behaviour of legacy apps that only microsoft know the real ins and outs of... not only that, there's still an easy means for them to merely use XML as a wrapper for binary blobs...
to give it a proper name, the format is "Microsoft Open Office XML"; they deliberately went to a lot of trouble to pick a name that's as easy to confuse as possible with OpenOffice

- Re: (Score:3, Informative)
  
  by Quarters ( 18322 ) writes:
  
  Their name choice has certainly worked on you. It's not "Microsoft Open Office XML" like you said. It's "Microsoft Office Open XML".
- Re: (Score:3, Funny)
  
  by davester666 ( 731373 ) writes:
  
  they keep repeating this everywhere they go... "Open XML"... their format is not Open... it's closed off with licensing and other restrictions... all the really good stuff in the specification has been obfuscated out and hidden behind indirections to the behaviour of legacy apps that only microsoft know the real ins and outs of... not only that, there's still an easy means for them to merely use XML as a wrapper for binary blobs...
  You don't understand the format then. Office Open XML is the ultimate in
upgrades free money for MS (Score:2)

by wizardforce ( 1005805 ) writes:

MS benefits a lot from upgrades: that way you end up "needing" to pay for an upgrade down the road whether or not you bought a new computer. They stand to lose everything if people at large and the government come to see open source as nearly or just as good, so they do just what they are required to, but not enough to weaken future cash streams from upgrades.
Open Formats (Score:2)

by MrSteveSD ( 801820 ) writes:

This is why we should be using open formats, particularly for things that are really complex like video codecs.
Obviously... (Score:5, Funny)

by Colin Smith ( 2679 ) writes: on Wednesday July 04, 2007 @02:12PM (#19745515)

If you have a problem with proprietary formats you go to Microsoft to solve it for you... The word "DOH" springs to mind.

Oh yeah, their solution? Virtualised Windows 3.1. And obviously in 15 years you'll have to virtualise Vista in order to run the Win3.1 virtual machine to run Word. And Microsoft will be paid a license for each application and level of virtualisation.

You couldn't make this stuff up.

- Re: (Score:2)
  
  by joe 155 ( 937621 ) writes:
  
  I wish you weren't modded "Funny". What you say is both true and insightful, but it's not funny... it's very, very sad.
  - Re: (Score:2)
    
    by Macthorpe ( 960048 ) writes:
    
    It's not marked 'insightful' because everyone except you laughed. Whether it was intended to be funny is another matter.
    
    Microsoft are providing virtualisation so they can run old software in order to convert it into newer formats, not to have a load of nested virtualised operating systems like Russian dolls that have to be paid for in perpetuity.
    
    The idea that an institution like the British Library, which is run by people bright enough to make you look like a dead match, would accept such a preposterous
    - Re: (Score:2)
      
      by CaptnMArk ( 9003 ) writes:
      
      Conversion doesn't work, since there will likely be data loss at each step. Just think .doc -> ...
      
      You really need open / published formats for archiving.
      - Re: (Score:2)
        
        by Macthorpe ( 960048 ) writes:
        
        What are you converting .doc to which causes data loss? I've never experienced any.
    - Bright people don't make tech decisions (Score:5, Interesting)
      
      by Cheesey ( 70139 ) writes: on Wednesday July 04, 2007 @03:46PM (#19746519)
      
      The idea that an institution like the British Library, which is run by people bright enough to make you look like a dead match, would accept such a preposterous idea is insulting.
      
      Unfortunately, those bright people don't get to make technical decisions.
      
      The British Library recently introduced SED [www.bl.uk], an electronic document delivery system. With SED, you can order electronic copies of journal papers and articles from their archives. Great idea! Previously, you had to wait for the documents to come through the post, and that would take a week or so. Now you get them by email in a couple of working days.
      
      Except that the documents are crippled by Adobe DRM, which imposes the following restrictions:
      
      You can only view them using certain specific versions of Acrobat Reader (6 or 7) - the latest version is not recommended [www.bl.uk].
      The software only works on Windows 2000 or XP. No Linux support, no Mac support. Vista might work, but again, it's not recommended.
      You can only look at each document for a limited time, and you can only print it once.
      So, if you want to use the service, you'd better hope that you have (a) the right version of Windows, (b) the right version of Acrobat Reader, (c) a reliable net connection, and, most importantly, (d) a very reliable printer that won't chew up the document. Unless you're a filthy dirty pirate, of course.
      
      If Adobe managed to convince the British Library to put up with this ridiculous system, I am sure that Microsoft will have no difficulty convincing them about their archive "solution". If SED is anything to go by, it'll be another awful implementation of a great idea.
      
      - Re: (Score:2, Interesting)
        
        by innocent_white_lamb ( 151825 ) writes:
        
        a very reliable printer that won't chew up the document. Unless you're a filthy dirty pirate, of course.
        
        What about printing it on this [cups-pdf.de]?
      - Re: (Score:2)
        
        by jabuzz ( 182671 ) writes:
        
        Yeah, tell me about it. However, you can print it once. Let's just assume that I print it to a network-attached PostScript printer. Except it really is not a network-attached PostScript printer but a small program running on a Linux box saying "thank you very much" and saving the entire stream to a file. At which point you can fire up your favourite PostScript distiller and turn it right back into a PDF.
        
        Oh and by the way you can use Acrobat 8 now.
        
        Re: (Score:2)
        
        by Cheesey ( 70139 ) writes:
        
        Oh and by the way you can use Acrobat 8 now.
        
        Ah, a minor improvement. It did strike me as particularly incompetent that Adobe's DRM scheme did not even work with the latest version of their own product, but then DRM is all about incompatibility and frustration for legitimate users.
    - Re: (Score:2)
      
      by Colin Smith ( 2679 ) writes:
      
      Microsoft are providing virtualisation so they can run old software in order to convert it into newer formats
      That doesn't make any sense. Microsoft already know the file format, so just write a bit of software that reads in the old format and writes out the new one. We're talking terabytes of information here. It isn't as if you can just open each file manually and choose "export as".
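
The "bit of software" being described is essentially a batch loop over the archive. A sketch in Python under loose assumptions: convert is a hypothetical placeholder for whatever old-to-new converter exists (it is not a real Microsoft API), and files are picked out by extension only.

```python
from pathlib import Path
from typing import Callable


def batch_convert(src_root: Path, dst_root: Path, old_suffix: str,
                  new_suffix: str, convert: Callable[[bytes], bytes]) -> int:
    """Convert every file under src_root ending in old_suffix.

    The directory layout is mirrored under dst_root, and each output keeps
    its original name with new_suffix swapped in. `convert` is a hypothetical
    stand-in for a real old-format-to-new-format converter.
    Returns the number of files converted.
    """
    converted = 0
    for src in sorted(src_root.rglob(f"*{old_suffix}")):
        if not src.is_file():
            continue
        dst = (dst_root / src.relative_to(src_root)).with_suffix(new_suffix)
        dst.parent.mkdir(parents=True, exist_ok=True)
        dst.write_bytes(convert(src.read_bytes()))
        converted += 1
    return converted
```

A real run over terabytes would also want per-file error handling and logging; this sketch simply stops at the first exception.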
      The idea that an institution like the British Library, which is run by people bright enough to make you look like a dead match, would accept such a preposterous idea is insulting.
      You seem to have a remarkable faith in institutions. They're planning to use OOXML. I think insulting them is entirely fair.
      
      Let me quote:
      Adam Farquhar, head of e-architecture at the British Library, praised Microsoft for its adoption of more open standards.
      Stop and consider that quote for a moment. Given your vast experience of the history of IT, does
More surprisingly!? No, UNsurprisingly (Score:4, Interesting)

by erroneus ( 253617 ) writes: on Wednesday July 04, 2007 @02:12PM (#19745521) Homepage

No. The obvious solution for the predicted problem of data being unavailable due to being in unsupported proprietary formats is to move it to a widely supported non-proprietary format.

As "well intentioned" as Microsoft may be, Microsoft's Open XML cannot be anything but proprietary when its code references Windows and Office API functions rather than more precise data format information as with ODF. (For more information about this, you might search out the arguments against making OOXML an ISO standard.)

- Re: (Score:2)
  
  by MightyMartian ( 840721 ) writes:
  
  It's partly MS's fault, but also partly the fault of a whole bunch of organizations. Microsoft isn't the only one with big, ugly proprietary formats. There's still a helluva lot of documents from the age of WordPerfect. The real fault lies in the fact that the old push for standards like ASCII, which was meant to overcome much of this, was abandoned in the halcyon days of the personal computer, when companies, whether through dreams of lock-in or simply because they didn't give a damn, ignored decades of work
Doesn't open source solve this (Score:5, Informative)

by wile_e_wonka ( 934864 ) writes: on Wednesday July 04, 2007 @02:23PM (#19745619)

It seems to me that this is really a nonproblem--OOo is compatible with lots of "dead" formats (or can at least read them), as are many other open source office programs. I can't imagine they're going to begin throwing away this compatibility--it isn't like it takes extra coding (as far as I know). Also, I have found Microsoft Word's "Extract text from any file" to work pretty well. I had a roommate with a corrupted Mac-formatted disk that had his deceased grandmother's journal on it in some old Mac Word file (a format still readable in Word, but the disk was corrupted so I couldn't just open the file). I popped it in my parents' now deceased iMac, and the only program I found that opened it was Word, using the "Extract text from any file" function. I emailed him the journal and he thanked me profusely.

Also--as noted, the OOXML format is a nonsolution for this nonproblem. It seems like it would be a waste of effort--why convert a bunch of files to a format that may die just as quickly as any other format, when you can just leave the file as is and open it in OOo (assuming I'm correct that they won't stop read support for dead formats)?

Also, it seems to me that no current format or any future format will ever solve this nonproblem because formats will always change as new functionality is continually added. The better solution is to keep this a nonproblem by having open source software that can read old file formats.

- Re: (Score:2)
  
  by stsp ( 979375 ) writes:
  
  It seems to me that this is really a nonproblem--OOo is compatible with lots of "dead" formats (or can at least read them), as are many other open source office programs. I can't imagine they're going to begin throwing away this compatibility--it isn't like it takes extra coding (as far as I know).
  Well, there is always maintenance work involved. Things change all the time, and so does software. It could well be that in 20 years OOo won't support MS formats anymore for whatever reason, unless people actively
- Re: (Score:2)
  
  by MightyMartian ( 840721 ) writes:
  
  I think the whole problem is somewhat overstated. I've had to work with some godawful formats (mainly mainframe-style), and while it's a real pain in the ass, I don't imagine that a file in some format lost to antiquity is going to be impossible for a skilled programmer to crack. It might take some effort. I guess the issue is the time and money spent cracking a file, but still, short of a media failure, none of this is going to be truly lost.
  
  What needs to be done is for all these archival agencies
Real Issue (Score:2)

by saibot834 ( 1061528 ) writes:

I think I've read that they are already unable to read some data stored on computers in the former German Democratic Republic.

The only solution IMHO is _open and documented_ interfaces, protocols, programs, data types and hardware. In the future they won't be able to read our disks and files. They can only try to build a machine that reads our disks and files, and for that they need documentation of how they work.
surprise? (Score:5, Insightful)

by Tom ( 822 ) writes: on Wednesday July 04, 2007 @02:25PM (#19745633) Homepage Journal

What's surprising about that? Someone in MS Spin Control and Public Relations is worth his salary. The story could have exploded into an "avoid MS products if you want your data accessible some years down the road" fiasco (we all know that MS is the worst offender when it comes to changing the document formats, usually undocumented). Instead, it was turned into another push for their next format.

Brilliant.

"What, the shit I sold you yesterday stinks? Try this new shit, it's great and it has none of the problems of the old one."

That's what you hire PR people for.

How about some *helpful* suggestions (Score:5, Insightful)

by FreudianNightmare ( 1106709 ) writes: on Wednesday July 04, 2007 @02:26PM (#19745641)

Rather than bitching about Microsoft making an offer of 'help' which is just thinly disguised marketing (I mean, come on, par for the course no?), could we get a discussion about real solutions? I know MS bashing is fun, but come on, we do it on just about every other thread... let's have a day off.

To kick things off here's one:

Keep EVERYTHING in the simplest possible format. ASCII would seem sensible, since it's the content we care about, not the formatting (although that wouldn't help our Asiatic brethren much). Then keep decent records of HOW you can read that format, with examples of the software and hardware. Do this bit on PAPER. Very tough paper (or rock, or plastic, or whatever). Update the explanations every other year, to put them in language the next generation will understand. Maybe also include instructions on how to translate the simple format to less simple things.

I guess, basically, it's a case of KISS and then *provide a persistent and regularly updated 'Rosetta Stone'* for latecomers to work from.

As a side branch, this kind of reminds me of discussions I read about a while back of how to warn future generations about Nuclear Waste dumps (y'know, the really nasty stuff with half-lives in the thousands of years range). I don't think anyone ever came up with a decent answer....

- Re: (Score:2)
  
  by Colin Smith ( 2679 ) writes:
  
  http://en.wikipedia.org/wiki/OpenDocument [wikipedia.org]
  - Comment removed (Score:5, Funny)
    
    by account_deleted ( 4530225 ) writes: on Wednesday July 04, 2007 @03:14PM (#19746157)
    
    Comment removed based on user account deletion
    
    - Re: (Score:2)
      
      by Colin Smith ( 2679 ) writes:
      
      I keep my knowledgebases in Wiki format.
      
      http://pardus-larus.student.utwente.nl/~pardus/projects/zim/ [utwente.nl]
- Re: (Score:3, Interesting)
  
  by Ceriel Nosforit ( 682174 ) writes:
  
  ASCII would seem sensible, since its the content we care about, not the formatting.
  No, the formatting is important as well. Sometimes 'the medium is the message', and that whole bunch of artsy crap we geeks would prefer to ignore. - Just think of it as an engineering challenge in order to make the pain go away.
  
  You always archive the original (unless you have a batch; then you sample one and call it the original), and that original can be in just about any format, hand-written, coffee-stained, in sanskrit. When scanning a document into an electronic archive the ideal would be to have OCR
- Re: (Score:3, Interesting)
  
  by Kjella ( 173770 ) writes:
  
  "There is always an easy solution to every human problem--neat,
  plausible, and wrong." -- H.L.Mencken, The Divine Afflatus (1917).
  
  Ok, you started by identifying one problem - Asian languages. In fact, pretty much every non-US language, since you said ASCII and not Latin1. So we can extend that to UTF-8 with no problems, except there's probably a huge table just for the 100000 characters or so, even though the spec is quite short.
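A small Python illustration of the character-set point: ASCII cannot encode most non-English text, Latin-1 covers Western European letters but not CJK scripts, and UTF-8 covers everything while leaving plain ASCII bytes unchanged. The sample words and the `encodings` helper are arbitrary choices for this sketch:

```python
def encodings(text):
    """Report the UTF-8 size of text and whether ASCII or Latin-1 can hold it at all."""
    def fits(codec):
        try:
            text.encode(codec)
            return True
        except UnicodeEncodeError:
            return False

    return {
        "utf8_bytes": len(text.encode("utf-8")),
        "ascii_ok": fits("ascii"),
        "latin1_ok": fits("latin-1"),
    }

# Arbitrary sample words: English, German, and Japanese/Chinese for "document".
for word in ["archive", "Größe", "文書"]:
    print(word, encodings(word))

# Pure-ASCII text is byte-for-byte identical when written as UTF-8.
print("archive".encode("utf-8") == "archive".encode("ascii"))
```

"Größe" needs 7 bytes in UTF-8 because ö and ß take two bytes each, and "文書" needs 6 bytes, three per character; neither fits in ASCII, and only the German word fits in Latin-1.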
  
  But then, you have only characters, which is probably fine for basic text. How ab
- Re: (Score:3, Interesting)
  
  by davecb ( 6526 ) * writes:
  
  We fought a lot with this at Siemens (Sietec) about fifteen years ago, when trying to decide what format to use on stackers full of 12" WORM disks, which were just nicely becoming useful for large-scale archival storage in those days. We needed a format that would outlast the disks, which probably meant 50-100 years, assuming normal replacement/turnover.
  We ended up with the bottom level being a WORM standard, which was served out to users via the NFS standard, which was reasonably close to a Unix filesystem
  - Re: (Score:2)
    
    by MightyMartian ( 840721 ) writes:
    
    I've been to the National Archives researching 1st world war records (fascinating place BTW). These were stored on reel to reel tapes (similar to microfiche) that you viewed with a special machine. These are pretty much future proof, other than the fact that they will decay over time. ASCII would not be a good medium as they contain hand written comments etc.
    I think we're talking about electronic documents here, not about other mediums. ASCII isn't perfect, and it still leaves problems as far as binary d
Microsoft lecturing about open standards?!? (Score:2)

by daveewart ( 66895 ) * writes:

I just can't believe that Microsoft think they can get away with lecturing others about open standards.
You can almost hear the slime dripping. (Score:2)

by Colin Smith ( 2679 ) writes:

Microsoft's UK head Gordon Frazer warned of a looming "digital dark age"
A dark age caused by... Microsoft... Actually they're just doing what comes naturally.

The real problem seems to be the credulous morons in charge of the National Archives project.
I've never understood this argument... (Score:2, Informative)

by ferrellcat ( 691126 ) writes:

...this argument that files and data will one day just magically become inaccessible in the future. I have tape and diskette media for my Commodore Pet machines that goes back to 1978. That's 29 years ago, and guess what? The great majority of this media is STILL READABLE. Furthermore, the tools [c64.org] necessary to transfer any of my old media to modern PCs have been around for well over a decade. Once you have the data on a modern PC the rest can be handled with emulation or virtualization. For someone to co
- Re: (Score:2)
  
  by Colin Smith ( 2679 ) writes:
  
  Yes. Can I just point out that you are still alive and clearly have all of the equipment you used to have in the seventies... Though I suggest you try the tapes, I suspect they'll be gone.
solution (Score:2)

by r00t ( 33219 ) writes:

You need to run the original software in an emulator, OS and all.

That emulator itself needs to be Open Source so that you can port it to future platforms. Otherwise, you'd be faced with running an emulator in an emulator in an emulator in an emulator in an emulator...

Keeping around multiple conversions certainly doesn't hurt. Converters vary in quality and the resulting conversions will themselves vary in future compatibility.
OOXML is not a solution (Score:2)

by argent ( 18001 ) writes:

OK, the deal is this. Let's say you have a bunch of files in some old format, and a spec for that format, and you need some information out of those files. That spec will be useful to you if - and only if - the cost of implementing that format from the spec is less than the cost of losing those files, AND it's less than the cost of reverse-engineering enough of the format to extract the information you want from the files.
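The condition above can be condensed into one inequality. The symbols are introduced here only for illustration: C_spec is the cost of implementing the format from its specification, C_loss the cost of losing the files, and C_RE the cost of reverse-engineering enough of the format to extract the needed information. The spec is worth having exactly when:

```latex
C_{\text{spec}} < \min\left(C_{\text{loss}},\; C_{\text{RE}}\right)
```

A huge, complex spec such as OOXML pushes C_spec up, which is the core of the objection.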

The OOXML spec is huge (expensive to implement from the spec) and complex, and the mean
- Re: (Score:2)
  
  by DaleGlass ( 1068434 ) writes:
  
  Bizarre.
  
  OOXML is XML -- if you want to extract plain text from it just feed it through an XML parser and strip all the tags. You can do something similar with Office's format, but the solution will be far less perfect and contain lots of junk.
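A minimal Python sketch of the "parse the XML and strip the tags" approach, assuming a .docx file (OOXML's word-processing format, a ZIP package whose main text sits in `word/document.xml`) that uses the transitional WordprocessingML namespace. `docx_plain_text` is a hypothetical helper name. It keeps only the text of each paragraph and discards all formatting and layout, which is exactly the information loss objected to later in the thread:

```python
import zipfile
import xml.etree.ElementTree as ET

# Assumed: transitional WordprocessingML namespace (Strict documents use a different URI).
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

def docx_plain_text(path):
    """Return the document's text, one paragraph per line, with all markup dropped."""
    with zipfile.ZipFile(path) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # Visible text lives in <w:t> elements inside runs; run properties are ignored.
        # A sketch only: text boxes nested inside a paragraph would be picked up twice.
        paragraphs.append("".join(t.text or "" for t in para.iter(W_NS + "t")))
    return "\n".join(paragraphs)
```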
  
  In fact, I just tried that. One of my .doc files filtered through strings is unreadable. There are newlines at weird points in the output, some text is outright missing (I imagine that because internally .doc is at least part memory dump, and so the text inside isn't ne
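For comparison, here is roughly what `strings` does, sketched in Python: it reports runs of at least four printable bytes and ignores everything else. One plausible reason for text going missing is that Word's binary .doc format can store text as 16-bit characters, which an 8-bit scan skips; GNU `strings` has `-e l` for 16-bit little-endian text, and the second function imitates that. Both function names are made up for this sketch:

```python
import re

def ascii_strings(data, min_len=4):
    """Printable 8-bit runs (tab included), roughly what plain `strings` reports."""
    pattern = re.compile(rb"[\t\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]

def utf16le_strings(data, min_len=4):
    """Printable runs stored as 16-bit little-endian characters (cf. `strings -e l`)."""
    pattern = re.compile(rb"(?:[\t\x20-\x7e]\x00){%d,}" % min_len)
    return [m.group().decode("utf-16-le") for m in pattern.finditer(data)]
```

Running both over the same file and merging the results recovers text that a single 8-bit pass misses, though neither restores paragraph order or formatting.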
  - Re: (Score:2)
    
    by DaleGlass ( 1068434 ) writes:
    
    Ok, just finally realized that OOXML is the MS format and not the Open Office one. Seems like I should get some sleep, heh.
  - Thanks for explaining why OOXML is not a solution (Score:2)
    
    by argent ( 18001 ) writes:
    
    OOXML is XML -- if you want to extract plain text from it just feed it through a XML parser and strip all the tags.
    
    Precisely my point. If the layout and non-text information in the file matters, then you've thrown it away. If it doesn't, then why are you bothering to put it in the archive?
    
    You can do something similar with Office's format, but the solution will be far less perfect and contain lots of junk.
    
    Yes, and (as I noted) I've done the same thing, and it's a relatively crude way of reverse-engineering
called "hiring the fox to guard the henhouse"(n/t) (Score:2)

by HiThere ( 15173 ) writes:

I believe this is called "hiring the fox to guard the henhouse".
Time bomb ? (Score:2)

by Yvanhoe ( 564877 ) writes:

Good! The word is carefully chosen. It now has a chance to be heard by politicians. Wouldn't there just be a way to link proprietary formats to Al-Qaeda? Come on! I'm sure we can!
Here in the colonies ... (Score:2)

by IchBinEinPenguin ( 589252 ) writes:

... we already have a solution: http://www.naa.gov.au/recordkeeping/preservation/digital/applications.html [naa.gov.au] The Archives' approach to digital preservation relies on converting digital records from their original format into preservation formats. Xena (XML Electronic Normalising of Archives) is the program created by the National Archives to complete these processes.
Xena converts digital records into two preservation formats.
* Bitstream version. This is a metadata-wrapped bitstream version of the
You have *got* to be kidding me... (Score:2)

by NerveGas ( 168686 ) writes:

We live in an age when brand-new, undocumented, *encrypted* file formats are deciphered within days or weeks. You're telling me that in a few decades, NOBODY will be able to figure out a spreadsheet or word-processing document?
- Re: (Score:2, Insightful)
  
  by Professor_UNIX ( 867045 ) writes:
  
  That's a silly argument. You just have to emulate Windows 95 on whatever platform you're using 100 years from now, not all the intermediate platforms. For example, high speed computers today can still play old arcade games from 30+ years ago through emulation, but we're not doing it by emulating a Pentium that is emulating an Amiga that is emulating a Commodore 64 that is emulating an arcade machine, we're just emulating the arcade machine. It's not a good solution to file format issues, but virtualizatio
  - Re: (Score:2)
    
    by hcdejong ( 561314 ) writes:
    
    Good point. My idea was to avoid having to build an emulator for a 100-year-old platform [1]. By stacking them, the emulator you're writing only needs to understand software that was written 10 years ago.
    
    1: I figured that to do this, you need to know how the 100-year-old system works. That's no problem now, since there's still enough of the old hardware (and its documentation) lying around. But some day, those will have turned to dust. Your archive better contain complete information on all the old data for
- IBM (Score:3, Informative)
  
  by ushering05401 ( 1086795 ) writes:
  
  If you are going to choose a proprietary vendor to safeguard your data, wouldn't IBM be the obvious choice? They have proven their ability to keep 20 year old programs running in modern environments without modification.
  
  It has been a while since I worked on an AS/400 system... so anyone with updated info please feel free to correct me if things have changed.
  
  It seems like a no-brainer.
  
  Link: http://en.wikipedia.org/wiki/AS/400 [wikipedia.org]
  - Re: (Score:2)
    
    by Feyr ( 449684 ) writes:
    
    I have a customer who was told by IBM, with two weeks' notice, that they'd have to change their whole network because the firewall module for their AS/400 (or something to that effect) would not run after applying the patch, and they had no plan to make it work.
    
    So much for 100% compatibility.
  - Re: (Score:2, Informative)
    
    by LiquidCoooled ( 634315 ) writes:
    
    Actually, MS have done quite well with forwards compatibility.
    
    I can still double click on .com executable files written well back in the mists of time and run usable programs.
    
    For example, here is a version of Visicalc from 1981!
    
    http://www.bricklin.com/history/vcexecutable.htm [bricklin.com]
    - Re: (Score:2)
      
      by gerrysteele ( 927030 ) writes:
      
      Works perfectly well in DOSbox on MS's sworn enemy platform.
      
      How is it such an achievement on a platform of their own design?
    - Re: (Score:3, Insightful)
      
      by TheRaven64 ( 641858 ) writes:
      
      I have run that same version of Visicalc, in DOSBox, on a PowerPC Mac. Actually, I've run a few programs in that environment that don't run on Windows without the aid of DOSBox. To me, this says that third parties are better than Microsoft themselves for backwards compatibility with Microsoft programs. I wonder how long it will be before WINE has better support for old Windows apps. I think this is already the case for a few win16 programs...
      - Re: (Score:2)
        
        by toddestan ( 632714 ) writes:
        
        I have run that same version of Visicalc, in DOSBox, on a PowerPC Mac. Actually, I've run a few programs in that environment that don't run on Windows without the aid of DOSBox. To me, this says that third parties are better than Microsoft themselves for backwards compatibility with Microsoft programs.
        
        That's not backwards compatibility, that's just emulation. I can run C64 programs on Windows too using emulation, but it would be wrong to say Windows is backwards compatible with the C64. Windows is pretty
        
          - Re: (Score:2)
            
            by ozmanjusri ( 601766 ) writes:
            
            Windows is pretty good at backwards compatibility, and a surprisingly large amount of old DOS/Windows stuff will run on it.
            That's not backwards compatibility, that's just emulation.
            There is no DOS in Windows XP. What is called the "command prompt" is not really DOS ... it can be thought of as more of a simulation of DOS.
- Re: (Score:2)
  
  by SanityInAnarchy ( 655584 ) writes:
  
  And what do you do when it does break?
  
  (Not if. When.)
- Re: (Score:3, Informative)
  
  by Macthorpe ( 960048 ) writes:
  
  The video is of the managing director of Microsoft UK, not someone associated with the British library. Hence the caption 'Microsoft UK Managing Director Gordon Frazer running Windows 3.1 on a Vista PC'.
  
  Yes, that was sarcastic, but you deserved it.
    - Re: (Score:2)
      
      by Macthorpe ( 960048 ) writes:
      
      There was nothing personal about it, just good-natured ribbing, though I appreciate that doesn't carry over the internet very well.
- Re: (Score:2, Informative)
  
  by Vombatus ( 777631 ) writes:
  
  For a solution which converts documents to openly specified file formats (not OOXML), see XENA at https://sourceforge.net/projects/xena [sourceforge.net]
