Slashdot: News for nerds, stuff that matters

Office 2003 and XML

Posted by michael on Thu Mar 13, 2003 11:46 AM
from the just-what-you-expected dept.
zachlipton writes "Internet World is reporting that initial reports from Office 2003 beta testers don't look good for those hoping to share documents with non-MS systems using the XML file format. Gary Edwards, the OpenOffice.org representative for the OASIS XML file-format group, is quoted as saying "although it's still early in the review process, it does look as though XP XML has been so seriously crippled as to be useless to anyone but the big content management and collaboration system providers." Apparently, all formatting and presentation information is removed from the XML. Furthermore, Office's new collaboration features will only work with users who are also running Office 2003 (requiring Windows 2000 or 2003) and connecting over XP servers." So Microsoft will continue its efforts to lock in users with proprietary formats, and hopefully the rest of the world will produce an XML standard document format without them.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • At some point..... (Score:5, Insightful)

    by i_want_you_to_throw_ (559379) on Thursday March 13 2003, @11:48AM (#5503837) Homepage Journal
    Microsoft will have to learn IBM's lesson about transforming from a company that makes standards to one that contributes to them.
    They still don't get that their attempts to "embrace and extend" the whole damn internet aren't going to work.

    The rest of the world WILL produce an XML standard document format without them, thank heavens.
    • by McDutchie (151611) on Thursday March 13 2003, @11:50AM (#5503866) Homepage
      The rest of the world WILL produce an XML standard document format without them, thank heavens.
      Which will be an irrelevant format because everyone will still need Word to read all the ubiquitous crippled Word XML format documents flying around on the net.
      • by gmuslera (3436) <gmuslera@@@gmail...com> on Thursday March 13 2003, @11:58AM (#5503974) Homepage Journal
        Word (or even the complete Office suite), plus Win2k/XP as desktop and server. If someone sends me a document in Office 2003 format that he says I "MUST" read, I ask him to choose between sending me US$2003 so I can read it, or sending it to me in a really open format.
        • by MeanMF (631837) on Thursday March 13 2003, @01:18PM (#5504740) Homepage
          You could also just download the free MS Word viewer that Microsoft provides here [microsoft.com].
          • by bfree (113420) on Thursday March 13 2003, @12:30PM (#5504308)
            Why? The attitude sounds harsh when expressed so simply, but suppose you tell your "client" that you can't read the file, that your company has decided not to purchase the software required to read it (otherwise it would have to pass the associated costs on to its clients), and ask whether they could please send the file in a format you can read instead (even Word XP or earlier, thanks to OO.o) or fax it. Should the client really have a problem with that, and if so, is it worth keeping the client? (Yes, I really said that: a lot of the time troublesome clients aren't worth keeping without changes, if you actually cost them completely.) Similarly, with a coworker you can ask whether you can buy the software from their budget (in a company setting there should be company standards, so this should be easy)!
                • by aaarrrgggh (9205) on Thursday March 13 2003, @01:41PM (#5504987)
                  I agree with what you are saying, but there is a caveat: once a product has reached critical mass, you have to go along with everyone else.

                  I remember problems with AutoCAD about 7 years ago, going from release 12 to release 13. 13 was a dog. It had an incompatible file format, forcing upgrades for everyone who shared the same document. Since 13 didn't offer enough incentive to reach critical mass, it died, with most people sticking with 12 until the next release came out... which solved a lot of problems. Autodesk took a humility pill and realized that forcing upgrades is bad policy, although you can do things to encourage them (such as making the new format the default on save).

                  The trouble with MSFT's approach is that it breaks too many things at once; you have to reach critical mass not only on the office application, but also on the operating system and servers. A company that is not poised for this migration will not do it. If a single client requires it, they will hire a secretary to do a Save As down to a more manageable format. If half the clients require it, it is difficult to avoid the upgrade.
          • by ccp (127147) on Thursday March 13 2003, @12:32PM (#5504328)
            Why not?

            If your clients tell you to bend over, you bend over? You seem to have a very sad life. Grow some spine, explain things to them, and you'll be surprised by how many of them get it.

            And, in case you wonder,

            I'm not a student.
            I own a business.
            And yes, I'm doing rather well even with principles.

            Cheers,
      • by zog karndon (309839) on Thursday March 13 2003, @12:48PM (#5504492)
        Which standardization committee would this be? RTF has never been submitted to a standards committee - it's always been defined by Microsoft. I should know - I maintained the RTF spec for two years when I worked at Microsoft.

        While Microsoft could, theoretically, completely redefine RTF whenever they felt like it, it's so widely used (at least within Microsoft) that it wouldn't be worthwhile.

        Microsoft can (and does) add new control words whenever they feel like it, but new control words won't affect existing (non-broken) RTF readers.
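A minimal sketch of why new control words are harmless to existing readers, assuming a reader that follows the RTF convention of ignoring control words it does not recognize and skipping whole groups that begin with `\*` (ignorable destinations). The `KNOWN` set and the sample document are hypothetical, and `\'hh` hex escapes and Unicode are not handled.

```python
import re

# One token per match: "\*" marker, control word (letters, optional signed number,
# one optional delimiting space), control symbol, brace, or plain text.
TOKEN = re.compile(r"(\\\*)|\\([a-zA-Z]+)(-?\d+)? ?|\\([^a-zA-Z])|([{}])|([^\\{}]+)")

# Hypothetical set of control words this toy reader understands.
KNOWN = {"rtf", "ansi", "par", "b", "i"}

def rtf_to_text(rtf: str) -> str:
    out = []
    saved = []       # skip flag saved at each "{" and restored at the matching "}"
    skip = False     # True while inside an ignorable destination
    star = False     # True right after "\*"
    for is_star, word, _param, symbol, brace, text in TOKEN.findall(rtf):
        if brace == "{":
            saved.append(skip)
        elif brace == "}":
            skip = saved.pop() if saved else False
        elif is_star:
            star = True
        elif word:
            if star and word not in KNOWN:
                skip = True              # unknown \*\destination: drop the whole group
            elif not skip and word == "par":
                out.append("\n")
            star = False                 # every other control word is simply ignored
        elif symbol:
            if not skip and symbol in "\\{}":
                out.append(symbol)       # escaped literal backslash or brace
        elif text and not skip:
            out.append(text.replace("\r", "").replace("\n", ""))
    return "".join(out)

SAMPLE = r"{\rtf1\ansi Hello {\b bold} \newfutureword world.\par {\*\futuredest ignored text}Done}"
print(rtf_to_text(SAMPLE))
```

The unknown `\newfutureword` is silently dropped and the unknown `\*\futuredest` group disappears entirely, so the old reader still produces the right text.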

            • Re:Grrrrrr (Score:5, Interesting)

              by Junior J. Junior III (192702) on Thursday March 13 2003, @01:18PM (#5504743) Homepage
              Some of us have to operate in the real world where most computer users couldn't give two fifths of a fourth of a crap about XML or open standards.

              They should.

              Most people should vote, keep up with current events, be knowledgeable about history, be able to program their VCR, and floss their teeth, too.

              But don't expect that they will.

  • by avdi (66548) on Thursday March 13 2003, @11:50AM (#5503856) Homepage
    Apparently, all formatting and presentation information is removed from the XML.
    And this is bad how? Isn't this the dream that XML document proponents have aspired to for years? You just can't please some people...
    • I think the point is that if you save to their XML specification, you will lose all your document formatting. So yeah, the data is there, but it can't be reopened in Office or any other word processor with its structure intact. Essentially, it is the same as saving as plain text, which has been available since Office 95.
      • by RobotWisdom (25776) on Thursday March 13 2003, @01:24PM (#5504802) Homepage
        I think the point is that if you save to their XML specification, you will lose all your document formatting.

        I think the root of the confusion goes back to Goldfarb's original theory for SGML: that the styles in a document are secondary to the structures, and should be kept separate.

        This has been a religious conviction ever since, despite the fact that most authors are messy and intuitive, while SGML and its relatives are very, very rigid and unintuitive. The rationalisation is that messy authors can just represent their styles using 'fake' (ad hoc) XML, but if that turns out to describe 90% of the real users of MS Office, then I think MS could indeed save valid XML, but it won't be portable in any useful sense.

    • I have to agree. The basic concept behind SGML and its diminutive offspring, XML, was to separate content, structure and presentation. This just means that you have to share a style sheet, FOSI, or whatever when you share a document if you expect the person you share it with to be able to view it.

      There may be other *valid* criticisms of what Microsoft is doing but this isn't one of them.
    • And this is bad how? Isn't this the dream that XML document proponents have aspired to for years? You just can't please some people...

      Unfortunately, Manny Manager and Sarah Secretary are now very used to depending on the formatting and presentation information. To be honest, not too many people these days subscribe to the whole minimalist document theory (unless your idea of starting your editor is typing 'vi').

      The main point here is to encourage the .XML format for interoperability. If the XML format can't figure out the fonts, colors, and various drawing elements in your document, then people will abandon it for something that does - at the expense of the rest of us.

    • I don't think this means that there is no stylistic information in the document, rather that the style information is contained within the proprietary code segment of the document.

      If Word documents all utilised the same style for various elements, it'd all be hunky-dory. However, users like their choice of a 50pt purple serif font for a title to stand, so the formatting information MUST be included with the document.

      Perhaps a better format would be a zipped file that contains separate XML and XSL documents...

    • by djoham (93430) on Thursday March 13 2003, @12:22PM (#5504228)
      This may be bad (keeping in mind the jury is still out on exactly how Microsoft is making this work) because in the case of office documents, the style is actually *part* of the content, from the perspective of Joe Office User.

      If Microsoft just puts the raw text data into a .xml file, then that .xml file is practically useless to anyone who wants to collaborate with the original author since all of the styling information is lost.

      As an example of a good way to do this (IMHO), take a look at how OpenOffice.org builds their files. When you make a .sxw (the default writer format) you're actually taking the raw data of the document, the styling rules for the document and a few other important bits and pieces and zipping them up into a single file.

      After unzipping this file, the following directory structure was exposed:

      content.xml
      META-INF/manifest.xml
      meta.xml
      mimetype
      settings.xml
      styles.xml

      With this type of design, you can get the best of both worlds. Technically, there is a separation between your presentation and content, which allows simple programmatic access to the data when necessary. At the same time, this design allows for full collaboration between people who also consider the styling of the data to be part of the content, because the style rules for the content are included with the document.
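A minimal Python sketch of the single-file, multi-part idea, assuming simplified placeholder markup: the element names inside `content.xml` and `styles.xml` below are invented for illustration and are not the real OpenOffice.org vocabulary, and `manifest.xml`, `meta.xml` and `settings.xml` are left out. It shows a program reading the data without touching the styles, while the styles still travel in the same file.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Placeholder markup, NOT the real OpenOffice.org schema.
CONTENT = '<doc><p style="Title">Office 2003 and XML</p><p style="Body">Hello</p></doc>'
STYLES = '<styles><style name="Title" font="Helvetica"/><style name="Body" font="Times"/></styles>'

def build_package(content_xml: str, styles_xml: str) -> bytes:
    """Bundle data and styling into one zip file, mimicking the .sxw layout."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # As in ODF packages, mimetype is written first and left uncompressed.
        zf.writestr("mimetype", "application/vnd.sun.xml.writer",
                    compress_type=zipfile.ZIP_STORED)
        zf.writestr("content.xml", content_xml, compress_type=zipfile.ZIP_DEFLATED)
        zf.writestr("styles.xml", styles_xml, compress_type=zipfile.ZIP_DEFLATED)
    return buf.getvalue()

def read_package(data: bytes):
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        names = zf.namelist()
        content = ET.fromstring(zf.read("content.xml"))
        styles = ET.fromstring(zf.read("styles.xml"))
    # Programmatic access to the data needs only content.xml...
    paragraphs = [(p.get("style"), p.text) for p in content.iter("p")]
    # ...while a full editor can also resolve each paragraph's style.
    fonts = {s.get("name"): s.get("font") for s in styles.iter("style")}
    return names, paragraphs, fonts

print(read_package(build_package(CONTENT, STYLES)))
```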

      With xml-saved Office documents containing only data and no style, collaboration with non-Office users (and apparently Win9x users as well) will be no better off than before. Perhaps worse, assuming the binary .doc, .xls, etc. formats have changed and will need to be reverse-engineered again.

      If this article is true and Microsoft has decided to remove the styling of their xml-saved office documents, I see two possible reasons for this:

      The first is obvious. You're not using Office? Ok, second class citizen, here's the data but in a format that is next to useless for you to use.

      The second possibility involves Microsoft just not being where they want to be with the Office XML sharing. Keep in mind that it took OpenOffice.org something like a year and a half or so to define their XML interchange format. Microsoft may be going there, but due to overwhelming inertia, it just might not be going there very quickly.

      Personally, I think the first option is the most likely. However, with OpenOffice.org working with OASIS and others on a common XML interchange format, I'm hoping Microsoft will be forced by the marketplace into option 2.

      Best regards,

      David
  • Style Sheets (Score:5, Insightful)

    by FattMattP (86246) on Thursday March 13 2003, @11:50AM (#5503864) Homepage
    Apparently, all formatting and presentation information is removed from the XML.
    Good. That's the point of XML. Formatting and presentation goes in style sheets.
    • Re:Style Sheets (Score:5, Interesting)

      by Captain Large Face (559804) on Thursday March 13 2003, @12:14PM (#5504145) Homepage

      The problem is that they don't include it elsewhere. So in order to share documents in the style intended by the user, they must be saved in the proprietary format.

      IMHO, this ensures the user will opt-out of the XML format, and stay with the proprietary format. As I posted above, if Microsoft are going to do this, then they should bundle an XSL document with each XML document.

      • Save As XML = WordML (Score:5, Informative)

        by malakai (136531) on Thursday March 13 2003, @12:46PM (#5504474) Journal
        Taken from a real review of the XML/Office features:

        Once valid, the document can be saved as XML in two ways. The default is to create WordML, which preserves Word's styles and formatting in an XML name-space that's separate from the one bound to the schema-controlled data. You can optionally save through an XSLT transformation which, in a publish-to-the-Web scenario, could translate WordML formatting into HTML/CSS formatting. Alternatively, if you tick the Save as Data option, you can instead save just the raw XML data. In that case, you can bind one or more XSLT stylesheets to the document, each of which can generate WordML styles and formatting.


        InternetNews is authored by morons.

        -malakai
        • by Hangtime (19526) on Thursday March 13 2003, @01:33PM (#5504896) Homepage
          Same thing with Excel, you can save as Excel with formatting or not. This comes from the Excel XML with formatting. Quite simply the article is flamebait.

          <Style ss:ID="s26" ss:Parent="s16">
            <Borders>
              <Border ss:Position="Bottom" ss:LineStyle="Continuous" ss:Weight="1"/>
              <Border ss:Position="Top" ss:LineStyle="Continuous" ss:Weight="1"/>
            </Borders>
            <Font ss:FontName="Times New Roman" x:Family="Roman" ss:Size="12" ss:Bold="1"/>
            <NumberFormat ss:Format="_(* #,##0_);_(* \(#,##0\);_(* &quot;-&quot;??_);_(@_)"/>
          </Style>
          <Style ss:ID="s27">
            <Alignment ss:Vertical="Bottom"/>
            <Borders/>
            <Font ss:FontName="Geneva"/>
            <Interior/>
            <NumberFormat/>
            <Protection/>
          </Style>
          <Style ss:ID="s28">
            <Font ss:FontName="Geneva" ss:Size="12"/>
            <NumberFormat ss:Format="0.0"/>
          </Style>

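A small sketch showing that the formatting in such a fragment is machine-readable, assuming the SpreadsheetML namespace `urn:schemas-microsoft-com:office:spreadsheet` (and `urn:schemas-microsoft-com:office:excel` for the `x:` prefix), and assuming that font properties missing from a style fall back to its `ss:Parent`. Style `s16` is not shown in the post, so the `s16` below is a hypothetical stand-in.

```python
import xml.etree.ElementTree as ET

SS = "urn:schemas-microsoft-com:office:spreadsheet"
X = "urn:schemas-microsoft-com:office:excel"

# Trimmed version of the styles above; s16 is a hypothetical parent style.
FRAGMENT = f"""<Styles xmlns="{SS}" xmlns:ss="{SS}" xmlns:x="{X}">
  <Style ss:ID="s16"><Font ss:FontName="Arial" ss:Italic="1"/></Style>
  <Style ss:ID="s26" ss:Parent="s16">
    <Font ss:FontName="Times New Roman" x:Family="Roman" ss:Size="12" ss:Bold="1"/>
  </Style>
  <Style ss:ID="s28"><Font ss:FontName="Geneva" ss:Size="12"/></Style>
</Styles>"""

def ss(name: str) -> str:
    """Clark-notation name in the SpreadsheetML namespace."""
    return f"{{{SS}}}{name}"

def effective_fonts(xml_text: str) -> dict:
    root = ET.fromstring(xml_text)
    own, parent = {}, {}
    for style in root.iter(ss("Style")):
        sid = style.get(ss("ID"))
        parent[sid] = style.get(ss("Parent"))
        font = style.find(ss("Font"))
        # Drop the namespace part of each attribute name for readability.
        own[sid] = {k.split("}")[-1]: v for k, v in font.attrib.items()} if font is not None else {}

    def resolve(sid: str) -> dict:
        base = resolve(parent[sid]) if parent.get(sid) else {}
        return {**base, **own[sid]}   # the style's own settings override the parent's

    return {sid: resolve(sid) for sid in own}

print(effective_fonts(FRAGMENT)["s26"])
```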
  • by dreamchaser (49529) on Thursday March 13 2003, @11:51AM (#5503874) Homepage Journal
    This isn't news really. Did anyone actually think that MS would embrace open standards to make it easier for their competition to work with their products?

    IMNSHO, I think that this will backfire eventually. Slowly but surely the world is moving more and more towards open, interoperable standards.

    I use Office 2000 and OpenOffice, and I won't be moving to Office XP or later versions anytime soon, if ever. The environment I work in still uses Office 97 (mostly due to budgetary constraints, though they ARE considering a move to XP, sadly).

    Microsoft is at the point where they will do anything to lock in their current market share, and are trying to make it increasingly harder to move away to anything different. Once you can't share your files with any other application suite, the sheer cost of file conversion alone will keep most people from switching to other alternatives.
    • by Carnage4Life (106069) on Thursday March 13 2003, @12:43PM (#5504445) Homepage Journal
      Disclaimer: I work for one of the XML teams at Microsoft [not on Office] but this is not an official statement but my personal opinion.

      Office 11 supports a significant number of W3C XML standards including SOAP, XML, XPath, XSLT, WSDL, DOM and XSD. Don't take my word for it; read Jon Udell's columns on InfoWorld, such as 10 Things You Need To Know About XDocs [infoworld.com] or Exploring Office 2003 [infoworld.com]. I personally was quite stunned and very pleased when I found out that the Office folks were moving from binary formats to XML, which opens the doorway for producing and processing Office documents using off-the-shelf XML tools and technologies.

      The only real complaint I saw in the entire article was that some tags related to presentation are stripped out when saving as XML. Jon Udell explained the differences in his blog entry [infoworld.com], where he stated:
      There's been a fair amount of chatter about whether Office 2003 will "really" support XML. The answer is yes, but in two different ways. When a Word document contains schematized data, for example, and you save only the data, your XML output is pure as the driven snow.

      When you elect to keep the WordML formatting information, you get a mixture of two namespaces: a WordML namespace with style and formatting information, and a data namespace (here, ns2) for schematized data. So, is this just angle-bracketed RTF? Yes. Is it "real" XML? Also, yes.
      Basically it looks like the authors of the article want to have their cake and eat it too. They somehow want to preserve all the formatting information in their documents in the XML output yet not end up with a lot of Office specific content in their documents.

      Secondly, one of the primary goals of XML is the separation of presentation from content, meaning that how an XML document is displayed to the user is unimportant (that's what stylesheets are for; just look at the direction XHTML 2.0 [w3.org] is going in) and what is important instead is the data and metadata within the document. In my opinion, this actually allows people to innovate, because they are not limited to a single look and feel for their documents but can present them in different ways for different audiences and different platforms. This was the major failing of HTML, and it is sad to see people try to bring that mentality to the XML world.
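To make the two-namespace point concrete, here is a sketch that pulls the "pure" data out of a mixed document by keeping only leaf elements from the data namespace. The `w:` namespace URI is assumed to be the Word 2003 WordML namespace; the `urn:example:invoice` data namespace (standing in for Udell's ns2), the element names and the nesting are all hypothetical.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.microsoft.com/office/word/2003/wordml"  # assumed WordML namespace
NS2 = "urn:example:invoice"  # hypothetical data schema, standing in for "ns2"

MIXED = f"""<w:wordDocument xmlns:w="{W}" xmlns:ns2="{NS2}">
  <w:body>
    <ns2:invoice>
      <w:p><w:pPr><w:pStyle w:val="Heading1"/></w:pPr>
        <ns2:customer><w:r><w:rPr><w:b/></w:rPr><w:t>Acme Corp</w:t></w:r></ns2:customer>
      </w:p>
      <w:p><ns2:total><w:r><w:t>42.00</w:t></w:r></ns2:total></w:p>
    </ns2:invoice>
  </w:body>
</w:wordDocument>"""

def data_only(xml_text: str, data_ns: str = NS2) -> dict:
    """Keep leaf elements from data_ns; all WordML formatting is discarded."""
    prefix = f"{{{data_ns}}}"
    result = {}
    for elem in ET.fromstring(xml_text).iter():
        if not elem.tag.startswith(prefix):
            continue
        # Skip containers (like ns2:invoice) that hold other data-namespace elements.
        if any(c.tag.startswith(prefix) for c in elem.iter() if c is not elem):
            continue
        # itertext() gathers the text from the nested w:r/w:t runs.
        result[elem.tag[len(prefix):]] = "".join(elem.itertext()).strip()
    return result

print(data_only(MIXED))
```

The same file keeps both views: Word reads the `w:` markup for presentation, while any XML tool can ignore it and read only the schematized data.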
      • by Tailhook (98486) on Thursday March 13 2003, @02:15PM (#5505335)
        Basically it looks like the authors of the article want to have their cake and eat it too. They somehow want to preserve all the formatting information in their documents in the XML output yet not end up with a lot of Office specific content in their documents.

        The choices then appear to be "data only XML" or "RTF marked up XML". Is this correct?

        If so, then I think the critics are correct. The critics wish that the document can be read and manipulated by some non-Microsoft editor. I doubt this is feasible with the WordML format, aka "RTF marked up XML". I'll explain why.

        If, as you point out, a WordML document is a collection of data marked up by XML tags that provide only low-level (RTF) presentation information, then it is of no use to an alternate editor implementation. An analogy would be to attempt to edit a Word .doc document after saving it as RTF. The appearance of the RTF when rendered is correct, but the data model that Word uses internally is not represented in the RTF document. This is a "one way" conversion. You can edit the RTF, but you cannot reproduce the .doc file from it.

        It sounds to me like it is the same with WordML. You can read and edit WordML because it's valid XML. However, the higher level data model of Word is simply lost. No means is provided for a processor to understand the original structure of the document.

        For example, if I should create a "style" in Word and apply that style to a paragraph, the WordML output will tell me what font to use. However, the WordML file tells me nothing about the "style". So I can't tell what other paragraphs are supposed to change in sync if I change the style. I can't know the inheritance of style parameters. In short, I can't programmatically edit the Word data model.

        The hope/expectation was that the XML output would provide this information. Thus, it would be possible to essentially re-implement the Word data model and correctly manipulate Word documents. With this hope/expectation in mind, it's clear why what they have actually found is considered crippled. WordML can't be used to recreate any part of the original data model in an alternate editor. It's just data mixed up with low level markup.

        I have always thought that this expectation is naive. Microsoft protects the tools they sell by making it infeasible to create alternate implementations. Just because the tool can output XML doesn't mean that you can do without Word.

        BTW, I am by no means an expert at any of this. I'm no Office beta tester and I haven't looked at OpenOffice in months.
  • LOL (Score:5, Insightful)

    by Boss, Pointy Haired (537010) on Thursday March 13 2003, @11:52AM (#5503889)
    I mean come on; who was expecting anything different?

    When I first heard that MS Office was moving to an XML based file format, I didn't think "ooh yippy do, we'll be able to share information".

    I thought:

    <msoffice type="word">
    6647AB84B348W837G86438H5D345W34
    6647AB84B348W837G86438H5D345W34
    6647AB84B348W837G86438H5D345W34
    6647AB84B348W837G86438H5D345W34
    6647AB84B348W837G86438H5D345W34
    </msoffice>

    I was right. :)
    • Re:LOL (Score:5, Funny)

      by 4r0g (467711) on Thursday March 13 2003, @12:19PM (#5504192)
      Or did you mean:
      <msoffice type="word" encoding="evil" compression="evil-magic">
      666
      666
      666
      666
      666
      666
      666
      666
      </msoffice>
      • Sorry, you are wrong (Score:5, Informative)

        by malakai (136531) on Thursday March 13 2003, @12:54PM (#5504546) Journal
        The XML save (the default is WordML) contains both the data and all the formatting/styles needed to render the document without any loss. WordML is loosely based on RTF. RTF is what Word has used as its "proprietary" standard for years (internally in memory as well as in the .doc file).

        Saving as WordML gives OpenOffice the ability to modify the XML data and thus "modify" a legit Word document. Word has no problems opening a WordML document I created by hand.

        What these moronic authors are talking about in this article is what happens when you check the "Save As Data" checkbox in the Save as XML file dialog. Word then strips the formatting and tries to store the data, according to the chosen XSD (which should have been mapped to the Word elements), in an XML format. You can optionally have Word run an XSL transform against the resulting data, and save _that_ to the file system.

        If these guys can't figure out how to make OpenOffice work with this WIDE FUCKING OPEN FORMAT then I certainly don't want to use their product.

        Hell, I'll write a WordML XSL transform for them in a day.

        -malakai
  • Wow. (Score:5, Funny)

    by deviator (92787) <bdp&amnesia,org> on Thursday March 13 2003, @11:53AM (#5503906) Homepage
    I am shocked. Shocked! I'm shocked that Microsoft would do something like this that wasn't in the best interest of their customers.
  • Missing the point (Score:5, Insightful)

    by graphicartist82 (462767) on Thursday March 13 2003, @11:55AM (#5503935)
    So Microsoft will continue its efforts to lock-in users with proprietary formats, and hopefully the rest of the world will produce an XML standard document format without them.

    I'm not trying to start a flame war here, but it seems that they're missing the point! We don't want it to be MS with one format and the rest of the world with another. That really wouldn't be much different from how it is now. At least the way it is now, non-MS office software can read the MS formats. If it comes down to a choice between using the MS format or the "rest of the world" format, MS is going to win every time.
  • bollocks (Score:5, Insightful)

    by graveyhead (210996) <fletch&nationofcriminals,com> on Thursday March 13 2003, @11:58AM (#5503971) Homepage
    hopefully the rest of the world will produce an XML standard document format
    This is just so wrong. It smacks of a writer who doesn't really understand the utility of XML. There doesn't need to be "The One True Document Format"... that's not what XML is all about.

    Instead, create an XML format that is specific to your needs and write a DTD or XML-Schema that describes it. If you need to translate it to someone else's XML document format, a quick XSLT stylesheet will transform the document with a minimum of effort.
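Python's standard library has no XSLT processor, so the sketch below does the same kind of vocabulary-to-vocabulary mapping with ElementTree instead; with a real XSLT stylesheet you would hand it to a processor such as lxml's `etree.XSLT`. Both the `memo` and `note` formats are hypothetical.

```python
import xml.etree.ElementTree as ET

def memo_to_note(memo_xml: str) -> str:
    """Map a hypothetical <memo> vocabulary onto a hypothetical <note> vocabulary."""
    memo = ET.fromstring(memo_xml)
    # <from> becomes an attribute; each <para> becomes a <text> child.
    note = ET.Element("note", author=memo.findtext("from", default=""))
    for para in memo.iter("para"):
        ET.SubElement(note, "text").text = para.text
    return ET.tostring(note, encoding="unicode")

MEMO = ("<memo><from>graveyhead</from><body>"
        "<para>Use XSLT.</para><para>Just my 2 cents.</para></body></memo>")
print(memo_to_note(MEMO))
```

Neither format has to "win": as long as each side documents its schema, the mapping between them stays small.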

    Just my 2 cents.
  • by PerlPunk (548551) on Thursday March 13 2003, @12:00PM (#5503989) Homepage
    All Microsoft needs to do is make their standard an open one (that can be used by others), like Adobe has done with their PostScript and PDF formats. Adobe has done quite well with their products based on these formats, too. Products like Adobe Illustrator and Photoshop (which works very well w/ bitmaps saved in PostScript) are the industry standard in digital art. If Microsoft followed a similar model, I'm sure that Microsoft Word would continue to be the industry standard in word processing software, and Microsoft as a business wouldn't be any less rich for it.
  • by nhavar (115351) on Thursday March 13 2003, @12:02PM (#5504014) Homepage
    Isn't part of the concept of XML relating DATA and being able to separate presentation from pure content? Isn't an additional concept of XML its extensibility and adaptability, so that one group can use it differently than another? Because if not, I've been using XML wrong for about 2 years now.

    This article makes it sound as if MS is doing something completely improper with XML (i.e. changing its "standard"). But it seems to me that MS is simply separating content from presentation and relying on ????(something proprietary, xsl, more xml) to provide presentation. Just because they don't use the standard the way you want them to doesn't mean that they are breaking the standard. I'm sure that if you look at the XML they output, it's all standard XML. It also sounds as if they are not using any of the "tricks" that others have complained about (i.e. storing binary data in an XML tag).

    Instead of bitching about the problem maybe we should
    1) provide feedback if we are a beta tester
    2) wait for it to be released
    3) ready some tools to provide interoperability
    4) work harder on creating tools better than MS
    • by Fnkmaster (89084) on Thursday March 13 2003, @12:25PM (#5504258)
      Did you read the article? It's not about breaking a standard, it's about making a fucking USELESS file. If no formatting information is saved, it's no better than File->Save As Text. Clearly, separation of presentation and content is not unreasonable, and I think everybody would say they support that. But that's not what they've done. What they have done (at least according to the article; we won't know for sure till it's released) is eliminate the presentation data from their XML format. ELIMINATION of presentation makes the format useless for document exchange, and thus an essentially useless feature, period.
      • by malakai (136531) on Thursday March 13 2003, @12:37PM (#5504381) Journal
        Read some other articles, or better yet get hold of a beta and try it out. The authors of this article will feel like schmucks when they realize what they missed.

        First off, by default, if you save the Word document as XML, it gets saved as WordML, which preserves Word's styles and formatting in an XML name-space that's separate from the one bound to the schema-controlled data.

        If you check the "Data Only" checkbox, then you will lose all formatting and your own XSD will be used to map this document into XML data.

        WordML looks like an XML'ified RTF language. It would be trivial to create an XSL stylesheet that transforms WordML into HTML/CSS with all the formatting (that HTML is capable of), directly mimicking MS Word. OpenOffice could also eat WordML quite easily and have all the formatting/style of Word.
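A rough sketch of the WordML-to-HTML direction, written in Python rather than XSL, assuming the Word 2003 namespace `http://schemas.microsoft.com/office/word/2003/wordml` and handling only paragraphs (`w:p`), runs (`w:r`), text (`w:t`) and direct bold/italic (`w:b`, `w:i`). It ignores named paragraph styles, tables and `w:val="off"` toggles, so it is far from a complete converter.

```python
import xml.etree.ElementTree as ET
from html import escape

W = "http://schemas.microsoft.com/office/word/2003/wordml"  # assumed WordML namespace

def w(tag: str) -> str:
    """Clark-notation name in the WordML namespace."""
    return f"{{{W}}}{tag}"

def wordml_to_html(xml_text: str) -> str:
    body = ET.fromstring(xml_text).find(w("body"))
    html = []
    for p in body.iter(w("p")):
        runs = []
        for r in p.iter(w("r")):
            text = escape("".join(t.text or "" for t in r.iter(w("t"))))
            props = r.find(w("rPr"))
            # Only direct run formatting is mapped; named styles are ignored.
            if props is not None and props.find(w("b")) is not None:
                text = f"<b>{text}</b>"
            if props is not None and props.find(w("i")) is not None:
                text = f"<i>{text}</i>"
            runs.append(text)
        html.append("<p>" + "".join(runs) + "</p>")
    return "\n".join(html)

DOC = (f'<w:wordDocument xmlns:w="{W}"><w:body>'
       '<w:p><w:r><w:t>Plain and </w:t></w:r>'
       '<w:r><w:rPr><w:b/></w:rPr><w:t>bold</w:t></w:r></w:p>'
       '<w:p><w:r><w:rPr><w:i/></w:rPr><w:t>1 &lt; 2</w:t></w:r></w:p>'
       '</w:body></w:wordDocument>')
print(wordml_to_html(DOC))
```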

        What the authors of this article are REALLY bitching about is the fact that MS didn't buy into the OpenOffice Document Specification from OASIS. MS prolly sees OASIS the way the US sees the UN: defunct, not needed.

        If you describe your data using XML semantics, and all it takes to convert from semantic style A to B is some XSL, then who cares about forcing everyone to use one specific format.

        -malakai
  • sometimes.. (Score:5, Interesting)

    by siphoncolder (533004) on Thursday March 13 2003, @12:08PM (#5504076) Homepage
    I wonder if michael is testing us for stupidity, literacy, and actual technical knowledge of the issues.

    1) Take MS, make a report that says they did something bad, watch how many people flock to bash them DESPITE THE FACTS PRESENTED IN THE ARTICLE, which leads me to:

    2) How many people read the article? And of those people who DID:

    3) How many of them know that XML is supposed to be a divorce of data from presentation? Why this comes as a shock to people is obvious - they didn't know that.

    The poster above who said "style sheets" - bravo. You couldn't have made a better point with two words.

  • by Anonymous Coward on Thursday March 13 2003, @12:16PM (#5504167)
    I have Office 2003 Beta 2 freshly downloaded from MSDN. This article is completely wrong. I did the following:

    1. Opened a heavily formatted .DOC Word document with tables, multiple fonts, etc.
    2. Saved the document as XML.
    3. Opened up the XML document in Word and it looks EXACTLY like the original .DOC format.

    I also opened the XML file in a text editor and sure enough it contains complete formatting information.
  • by malakai (136531) on Thursday March 13 2003, @12:17PM (#5504178) Journal
    The point of the Office 2003 "Save as XML" with the "Data Only" checkbox is _NOT_ a poor man's Save As XHTML. It's designed to let the document's data be placed into an XML document based on a schema. You literally can make your own schema file/XSD, and use a tool inside Word to map the elements of a Word document to elements of the schema. If you simply map a paragraph to a string, you will lose formatting, unless of course you define in your schema how you'd like to store formatting information. But that is generally overkill.

    Think of a resume. You could define an XSD for a resume, and save resumes against that XSD as validated, pure XML.
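A sketch of the resume idea, assuming a hypothetical `urn:example:resume` namespace and a hypothetical list of required fields. Python's standard library cannot validate against an XSD, so `missing_fields` is only a crude stand-in for schema validation (a library such as lxml offers `etree.XMLSchema` for the real thing).

```python
import xml.etree.ElementTree as ET

RES = "urn:example:resume"                  # hypothetical schema namespace
REQUIRED = ("name", "email", "experience")  # hypothetical required elements
ET.register_namespace("r", RES)             # serialize with an "r:" prefix

def r(tag: str) -> str:
    """Clark-notation name in the resume namespace."""
    return f"{{{RES}}}{tag}"

def build_resume(name: str, email: str, jobs: list) -> ET.Element:
    """Store resume data only: no fonts, no layout, just schema-shaped XML."""
    root = ET.Element(r("resume"))
    ET.SubElement(root, r("name")).text = name
    ET.SubElement(root, r("email")).text = email
    experience = ET.SubElement(root, r("experience"))
    for title, years in jobs:
        ET.SubElement(experience, r("job"), years=str(years)).text = title
    return root

def missing_fields(root: ET.Element) -> list:
    """Crude stand-in for XSD validation: list required children that are absent."""
    return [f for f in REQUIRED if root.find(r(f)) is None]

resume = build_resume("Jane Doe", "jane@example.com", [("Editor", 3)])
print(ET.tostring(resume, encoding="unicode"))
print(missing_fields(resume))
```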

    Now, if you want to produce a document, using an XML syntax but want to combine both data and presentation, then you want WordML.

    WordML uses Word's own tags to mark up the Word document. I was going to show you an example of WordML, but I don't feel like escaping all the greater-than/less-than signs. Anyhow, WordML contains all the formatting and everything necessary to display a Word document as it is supposed to look.

    I think this Open Office guy is looking for a devil in Office 11 that isn't there. That or he didn't read the friggin manual.

    -Malakai
  • Wait a minute... (Score:5, Insightful)

    by sheldon (2322) on Thursday March 13 2003, @12:20PM (#5504216)
    "has been so seriously crippled as to be useless to anyone but the big content management and collaboration system providers."

    That indicates to me that the real problem is that the document format is so complicated that it takes tremendous resources to understand it and implement compatibility with it. The quote implies that larger companies, say a Xerox, will have no problem producing tools to work with it.

    So from a business consumer perspective this is still a tremendous win.

    This sounds like more whining from the open source crowd.
  • by Daimaou (97573) on Thursday March 13 2003, @12:32PM (#5504330)
    Proprietary document formats were fine at one point. Most people shared documents via printed paper, or shared them via "soft copy" within their own organizations. However, the time for printed documents and interoffice "soft copies" is over. We need the ability to share documents with the world in an easy to use, feature rich, and easy to edit format. Since a significant part of a document's legibility is in its style and formatting (or at least people are more apt to read a well formatted document over one which is not) text files are out.

    Once an easy to use, open document format is created, and the ability to read and write those documents is built into many programs, I think we will see an end of .DOC file attachments.

    While there are currently some "open" formats like PDF and PS, the problem is that they are not easy to create for the average user, nor are they easy to edit. While PDF may be a good format, we need something better.

    XML is a logical choice as a base for an open format because it is a well defined standard, it is text based, and is quite easy to parse.

    But I ramble.
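The "well defined standard" point has a practical edge. Every conforming XML parser must reject markup that is not well-formed, so sender and receiver agree on the basic structure before anything else. Here is a minimal check using only Python's standard library (the function name is illustrative):

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if the text parses as XML; any syntax error means False."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


print(is_well_formed("<doc><p>Fine.</p></doc>"))          # True
print(is_well_formed("<doc><author>Bill</title></doc>"))  # False: mismatched end tag
```

This check covers syntax only. It says nothing about whether the receiver understands the tag vocabulary, which is where an agreed document format comes in.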
  • by Trailer Trash (60756) on Thursday March 13 2003, @12:43PM (#5504447) Homepage

    Internet World is reporting that initial reports from Office 2003 beta testers don't look good for those hoping to share documents with non-MS systems using the XML file format...

    That's because XML is not a file format, it is instead a format for file formats. To quote the O'Reilly "Learning XML" book, page 2:

    Note that despite its name, XML is not itself a markup language: it's a set of rules for building markup languages.

    I've said this many times on /. (look at my history), but the fact that a particular format is XML-based says nothing of your ability to read it. And that's setting aside the fact that Microsoft could simply stick their traditional file formats into a CDATA section and claim XML compliance.

    The statement "If Microsoft used a standard XML format for their documents then anyone could read them" makes as much sense as an equally stupid statement like "If Microsoft just used 8-bit bytes in their file formats then anyone could read them".
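The CDATA point is easy to demonstrate. In this Python sketch, arbitrary bytes stand in for a traditional binary format. They are base64-encoded first, because raw bytes such as 0x00 are not legal XML characters, and then wrapped in a CDATA section. The result is perfectly well-formed XML that any parser accepts, and it tells the reader nothing about the content.

```python
import base64
import xml.etree.ElementTree as ET

# Arbitrary bytes standing in for an undocumented binary file format.
payload = bytes([0xD0, 0xCF, 0x11, 0xE0, 0x00, 0x01, 0x02, 0x03])

# Wrap it: base64 first (raw bytes are not legal XML characters), then CDATA.
wrapped = "<doc><blob><![CDATA[" + base64.b64encode(payload).decode("ascii") + "]]></blob></doc>"

root = ET.fromstring(wrapped)  # parses fine: it is well-formed XML
recovered = base64.b64decode(root.findtext("blob"))

print(recovered == payload)  # True, yet the content is as opaque as before
```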

    Sorry to rant, but the level of cluelessness around XML is astounding. Please read up, there's a ton of useful information on XML around the internet.

    MDC

  • by PeekabooCaribou (544905) <slashdot@bwerp.net> on Thursday March 13 2003, @01:50PM (#5505079) Homepage Journal
    I realize this is redundant by now, but I think this is important enough to warrant a few duplicated posts. For Microsoft's XML format to be useful (and even worth implementing), it's going to require some advantages above and beyond what plain text formatting offers. The only completely useless XML format would be:
    <document>
    This is my document.
    Second paragraph.
    </document>
    I make the assumption that at least some tags are applied, such as some sort of paragraph tags and the like. I may be going out on a limb here, but I would even assume that their final XML format will produce documents identical to .doc files. I would also assume that I could pass this file off to Joe in marketing, and he would see a document identical to the one I saw. What I'm getting at is that style has to be held somewhere. If the XML file has no style associated with it, then congratulations, Microsoft, you did it right. But if Word can display the right formatting, then so can anyone else. (Assuming Word doesn't store the styles in a proprietary format, which I don't think is beyond them.) But why am I even writing this? From the article:
    However, Mark McWilliams, a software engineer and Office 2003 beta tester, said he has seen nothing to indicate that Office 2003 removes formatting information from files saved in .xml. He noted that he opened a heavily formatted .doc Microsoft Word file, saved the file as XML, and later opened the file in Word 2003, "The opened XML document looks exactly like the original .doc file," he said. "And if I open up the XML file in a text editor, I can see that all of the formatting is properly maintained in the XML file."
    Time will tell.
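The difference between the "completely useless" format quoted in the comment and a minimally structured one is easy to see with Python's standard library. The `p` element and the `style` attribute values below are hypothetical, not Word's actual vocabulary. Once paragraphs and style names are stored in the file in a documented form, any parser can recover them, not just Word.

```python
import xml.etree.ElementTree as ET

# The "completely useless" format: one opaque text blob inside <document>.
BARE = "<document>\nThis is my document.\nSecond paragraph.\n</document>"

# The same content with minimal structure: paragraph tags plus a style name.
TAGGED = ('<document><p style="Heading1">This is my document.</p>'
          '<p style="Normal">Second paragraph.</p></document>')


def paragraphs(xml_text: str):
    """Return (style, text) pairs for each <p>; empty if there are no <p> tags."""
    root = ET.fromstring(xml_text)
    return [(p.get("style"), p.text) for p in root.iter("p")]


print(paragraphs(BARE))    # []
print(paragraphs(TAGGED))  # [('Heading1', 'This is my document.'), ('Normal', 'Second paragraph.')]
```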
    • by 4/3PI*R^3 (102276) on Thursday March 13 2003, @12:00PM (#5503994)
      Because if they incorporate DRM into their proprietary format, then the DOJ gave them a "blank check" not to release any information about the service, and thus they are in perfect compliance with the letter of the settlement.

      http://www.usdoj.gov/atr/cases/f200400/200457.htm [usdoj.gov]

      J. No provision of this Final Judgment shall: Require Microsoft to document, disclose or license to third parties: (a) portions of APIs or Documentation or portions or layers of Communications Protocols the disclosure of which would compromise the security of a particular installation or group of installations of anti-piracy, anti-virus, software licensing, digital rights management, encryption or authentication systems, including without limitation, keys, authorization tokens or enforcement criteria; or (b) any API, interface or other information related to any Microsoft product if lawfully directed not to do so by a governmental agency of competent jurisdiction.

      Microsoft is not locking up your information; they are simply stripping out the formatting and layout in the XML file format.

    • Re:Duh. (Score:5, Insightful)

      by t0ny (590331) on Thursday March 13 2003, @12:18PM (#5504184)
      How do you figure this is anti-trust? This is simply a company who has the dominant product protecting their lead. And quite honestly, I don't see anything wrong with that, as long as they confine their practices to their product (i.e., they aren't making Office the only suite that can run on Windows).

      Have you ever played a game like Civilization or Alpha Centauri? You would be amazed at how much those games make you understand politics. Once you are in the lead, you do anything you can to protect that lead. And why would you expect the real world to be any different?

      But this isn't a game, this is business. And since businesses are SUPPOSED to make money, they need to make sure people continue to buy MS Office. And making an office suite that shares documents with all the various third-tier office suites just doesn't do that. Why should my company buy MS Office if the documents it produces are exactly the same as those of FreeBeerOffice? Now, if FBO cannot do things MSO can do, then there is an incentive...

      • by blahlemon (638963) on Thursday March 13 2003, @12:47PM (#5504481)
        Truth be told the real disadvantage to this being the real world vs. a game is I can't set the level of difficulty to my liking...nor can I stop and speed up time.

        Or spy on other people from a God perspective. Damn you! Now I'll have to spend the rest of my day realizing how pathetically small my scope is...

      • Duh. (Score:5, Insightful)

        by Tony-A (29931) on Thursday March 13 2003, @02:58PM (#5505758)
        How do you figure this is anti-trust? Microsoft has been judged a monopolist. Since past behavior is a good indicator of future behavior, there is a presumption that this is anti-competitive behavior until proven otherwise.

        This is simply a company who has the dominant product protecting their lead.
        For a monopolist, nothing is simply any more. In the absence of market forces to correct misbehavior, exactly how they attempt to protect their lead does matter.

        And quite honestly, I don't see anything wrong with that, as long as they confine their practices to their product (i.e., they aren't making Office the only suite that can run on Windows) [emphasis added]
        As long as nothing in the Office Suite promotes the Desktop OS monopoly.
        As long as nothing in the Desktop OS monopoly promotes their own Office Suite.

        But this isn't a game, this is business.
        And screwing your customers is bad business.
        And screwing your suppliers is bad business.
        And screwing your investors is bad business.
        And screwing your employees is bad business.
        Even screwing your competitors is bad business.

        And since businesses are SUPPOSED to make money, they need to make sure people continue to buy MS Office.
        And General Motors needs to make sure people continue to buy Chevrolets.

        And making an office suite that shares documents with all the various third-tier office suites just doesn't do that.
        It just makes incomprehensible gibberish unless the recipient happens to have the exact same sooper-dooper magic decoder ring. Unless I can read my stuff, under circumstances of my own choosing, I have a problem. Unless I can send stuff to my correspondents and they can read it under circumstances of their own choosing, I have a problem. If my documents are hostage to the whims of a supplier, I have a problem.

        Why should my company buy MS Office if the documents it produces are exactly the same as those of FreeBeerOffice?
        New twist on Clippy?
        No reason they should. That's Microsoft's problem, not yours or your company's (unless you work for Microsoft;)

    • WordML (Score:5, Informative)

      by malakai (136531) on Thursday March 13 2003, @12:43PM (#5504442) Journal
      If you "Save as XML" in Office 11, then by default the data is saved as WordML. WordML is an XML version of MS's internal storage format (basically RTF). OpenOffice could quite easily write an interpreter for WordML. Hell, I could write a WYSIWYG editor for WordML in a day. If that. It's pretty simple if you understand the basics of RTF.
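To give a feel for what "an interpreter for WordML" involves, here is a deliberately tiny Python sketch that converts paragraphs and bold/italic runs of a simplified WordML-style document into HTML. The namespace URI and element names follow Word 2003's WordprocessingML naming but are assumptions to verify against Microsoft's schema, and the function name is hypothetical. A usable converter would also have to handle styles, fonts, tables, lists, and sections, so treat the "in a day" claim accordingly.

```python
import xml.etree.ElementTree as ET
from html import escape

# Word 2003 XML namespace (assumed; check against the published schema).
W = "{http://schemas.microsoft.com/office/word/2003/wordml}"


def wordml_to_html(xml_text: str) -> str:
    """Convert paragraphs and bold/italic runs of a simplified WordML document to HTML."""
    root = ET.fromstring(xml_text)
    html = []
    for para in root.iter(W + "p"):
        parts = []
        for run in para.iter(W + "r"):
            text = escape(run.findtext(W + "t", default=""))
            props = run.find(W + "rPr")
            if props is not None and props.find(W + "i") is not None:
                text = f"<i>{text}</i>"
            if props is not None and props.find(W + "b") is not None:
                text = f"<b>{text}</b>"
            parts.append(text)
        html.append("<p>" + "".join(parts) + "</p>")
    return "\n".join(html)


SAMPLE = """<w:wordDocument xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml">
  <w:body>
    <w:p><w:r><w:rPr><w:b/></w:rPr><w:t>Hello</w:t></w:r><w:r><w:t>, world &amp; co.</w:t></w:r></w:p>
    <w:p><w:r><w:rPr><w:i/></w:rPr><w:t>Second paragraph.</w:t></w:r></w:p>
  </w:body>
</w:wordDocument>"""

print(wordml_to_html(SAMPLE))
```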

      It's only when you Save as XML with the "Data Only" checkbox that you get into stripping formatting (and rightly so). Word WARNS you about this. In addition, you can specify your own XSD to save to, and Word will VALIDATE against it for you. Not to mention, you can use a Word tool to map elements of Word documents to elements of your schema. DAMN COOL.

      In addition (as if that isn't enough), when you save either way, you have the option of specifying an XSL style sheet. It'll go ahead and transform the output for you as part of the save.

      The only thing the OpenOffice people are upset about is that MS didn't buy into the OASIS/OpenOffice Document Specification. Tough shit. I'll write them an XSL that'll work against WordML to solve that for them. Lazy bastards.

      -malakai
    • Oops, I forgot to set the reply to "Code". Please note, your SAX parser probably won't be able to parse this, heh. It is, however, theoretically proper XML.

      <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
      <!DOCTYPE document [
      <!ELEMENT document (document_properties, document_section)>
      <!ELEMENT document_properties (title, author, organization, department, job, generalsummary)>
      <!ELEMENT title (#PCDATA)>
      <!ELEMENT author (#PCDATA)>
      <!ELEMENT organization (#PCDATA)>
      <!ELEMENT department (#PCDATA)>
      <!ELEMENT job (#PCDATA)>
      <!ELEMENT generalsummary (#PCDATA)>
      <!ELEMENT document_section (sectionsummary, proprietarybinary, unenhancedcrappytext)>
      <!ELEMENT sectionsummary (#PCDATA)>
      <!ELEMENT proprietarybinary (#PCDATA)>
      <!ELEMENT unenhancedcrappytext (#PCDATA)>
      ]>
      <document>
      <document_properties>
      <title>Crappydoc</title>
      <author>William H. Gates III</author>
      <organization>BORG</organization>
      <department>Unimatrix 0</department>
      <job>Secondary information processing adjunct</job>
      <generalsummary>Doc about crappy M$ things.</generalsummary>
      </document_properties>
      <document_section>
      <sectionsummary>Haha, you can't parse this and make it look perty, it's BINARY! You're still screwed!</sectionsummary>
      <proprietarybinary>firoiorfioeiojvonvonviniooiwnconcooisoi39f940f94390f904390f94390fj904j90j3f09j4fj3490jf30jf040fj03j09fj9340fj043j90fj4903fj9043jfj0vjoirejvoojvoerjgoejgojerogjoejoenmvotnhnoignoengotnhinringuinfi</proprietarybinary>
      <unenhancedcrappytext>Hehe, doesn't this text just look ugly? I bet it does, if you aren't using M$ WORD!</unenhancedcrappytext>
      </document_section>
      </document>