Microsoft

Tim Bray on Microsoft Office 589

jgeelan writes "The co-inventor of XML, Tim Bray, has been talking about the newly XML-enabled version of Microsoft Office, code-named 'Office 11' and tells XML-Journal that 'when the huge universe of MS Office documents becomes available for processing by any programmer with a Perl script and a bit of intelligence, all sorts of wonderful new things can be invented that you and I can't imagine.'"
  • However... (Score:4, Insightful)

    by kubrick ( 27291 ) on Thursday October 24, 2002 @04:57AM (#4520217)
    Knowing Microsoft, they don't like to give up that format lock-in. They'll find some way to make MSXML difficult or impossible to access, assuming that they haven't already.... encrypted data or something like that.

  • by robbyjo ( 315601 ) on Thursday October 24, 2002 @04:57AM (#4520218) Homepage

    .... I guess it's just MSXML rather than THE standard XML. But we can figure it out with some "intelligent guesswork" now because the file would be human-readable.

  • by leandrod ( 17766 ) <{gro.sartud} {ta} {l}> on Thursday October 24, 2002 @04:59AM (#4520225) Homepage Journal

    The most important question, besides whether the MS Word XML format will be documented well enough, is whether it will be the default save format. Most MS Office users simply don't care enough to save MS Word documents in RTF, for example, even though it's more than good enough for the vast majority of documents.

    Not the main issue of the article, but it is unfair to single out someone as the inventor of XML, which is just a streamlined version of SGML, which in turn evolved from IBM's GML.

  • I doubt it. (Score:3, Insightful)

    by theLOUDroom ( 556455 ) on Thursday October 24, 2002 @05:00AM (#4520226)
    I really have my doubts about whether Microsoft will allow "any programmer with a Perl script and a bit of intelligence" to muck around with Office documents.
    I'm guessing their XML document format will be just as hard to decipher as the current Office formats.
  • by terminal.dk ( 102718 ) on Thursday October 24, 2002 @05:03AM (#4520238) Homepage
    MS is trying to time this right.

    Right now they are seeing diminishing sales and possibly a shrinking market share. Most of the Danish public sector is looking to save money using OpenOffice/StarOffice.

    MS needs to increase their compatibility with other options, as they would otherwise force customers to convert every single user away from MS at once, instead of OpenOffice coming in slowly.

    They can also hope that their format sets the standard, so that the other companies have to play catch-up rather than the other way around.
  • by OrangeSpyderMan ( 589635 ) on Thursday October 24, 2002 @05:05AM (#4520243)
    Wow, I was way off when I predicted that Microsoft would further obfuscate their Word format.

    They won't have to. Since they are going the SQL server way for their filesystem, they can happily give away the hold they have on file formats, since they are going to have a stranglehold on accessing those files. You want an open file format? Here you go (and MS has a lot to gain by doing this - they instantly give Word access to most other data formats) - but don't think anything other than a Microsoft OS will actually be able to access the files - thanks to our new deliciously obfuscated method of storing data on a disk. Reverse engineering kernel level SQL data (with a bit of crypto, for DRM of course, thrown in) will probably be even harder than reverse engineering file formats was. And impossible to do legally (say hi to all those DMCA guys out there.)
  • Re:WTF???? (Score:4, Insightful)

    by lovebyte ( 81275 ) <lovebyte2000@gm[ ].com ['ail' in gap]> on Thursday October 24, 2002 @05:10AM (#4520262) Homepage
    Have you ever seen some complex XML file? Without documentation it could be as difficult as binary to reverse-engineer!
  • by Baki ( 72515 ) on Thursday October 24, 2002 @05:14AM (#4520276)
    Just because the file format, instead of binary, is "human readable", does not make it more open.

    For "any programmer with a Perl script and a bit of intelligence" it doesn't make a difference if you read bytes (binary) or XML structures.

    As long as you don't get a DTD with extensive comments on how to interpret the elements, along with some promise/guarantee that the DTD won't change every minor release, there is no real improvement at all.

    The fact that XML is human readable is irrelevant, since no human will read the files; programs such as Perl scripts will. For them it makes hardly any difference; it is only marginally easier since you can use an existing XML parser instead of rolling your own (which is no big deal using the right tools such as YACC).

    This 'openness' comes at a good time for Microsoft. They suggest openness at a time when they are criticized and attacked because of file-format lock-in. Many 'advisors' will be misled, blinded by buzzwords such as XML as they are, and actually believe that this solves the issue.
  • by Masa ( 74401 ) on Thursday October 24, 2002 @05:14AM (#4520277) Journal

    Because it doesn't matter if everyone is able to read, modify and generate Office-compatible files. People will use Office products in the future. Opening the file formats doesn't change anything.

    XML makes it easy to create programs that will depend on MS Office. So this only makes it easier to create programs which depend on Microsoft products.

  • by Jeremiah Cornelius ( 137 ) on Thursday October 24, 2002 @05:14AM (#4520278) Homepage Journal
    I don't believe any of this crap is going to happen from MS. Not for a New York second.

    You'll be DMCA'd out of the loop for trying, and the format will validate itself with 'Palladium' features in software, or some such.

    However, the mind reels at the idea of managing PowerPoint and Excel files from emacs!

  • by lanalyst ( 221985 ) on Thursday October 24, 2002 @05:16AM (#4520283)

    It seems M$ has done their best over the years to protect their file formats... The implication now is Ballmer's enemy #1 (open office, ximian, koffice, star office, joe's office, etc) will be able to interchange documents seamlessly with M$ Office.

    I don't know about anyone else, but the reason companies hold onto M$ (like grim death) is they receive documents via email in M$ format - a de facto proprietary format.

    There has to be an angle here. This can't be construed as a tactic to hold market share.

  • by JaredOfEuropa ( 526365 ) on Thursday October 24, 2002 @05:18AM (#4520292) Journal
    It's just like the old SGML module for Word they used to have about 6 years ago. My guess is that there will be some significant drawback to saving documents in XML, such as loss of some formatting information. That would convince users not to save in the XML format... but that isn't the important thing to Microsoft.

    More significantly, there might be small incompatibilities, or ways that Word-created XML documents divert slightly from what is normal and proper in XML. Perhaps Word will make some (intentional) mistakes when reading back XML files generated in other applications, just like Word's old SGML module would choke on many proper SGML documents.

    Make no mistake: the fact that almost everybody is using Office and the associated file formats makes it very hard for a new contender to enter the office suite market. Microsoft must be aware of the power they have over the market with their Office file formats. Think of it: when you exchange files with other businesses, you have two realistic choices of file formats: Office or plaintext. And now Microsoft is introducing compatibility with an open and well-defined markup language, in favour of their proprietary language? I'll believe it when I see it.
  • by pubjames ( 468013 ) on Thursday October 24, 2002 @05:22AM (#4520306)
    Perhaps these announcements of XML compatible office file formats are just stalling tactics? MS has done it before.

    MS now has a serious competitor in StarOffice/OpenOffice.org. And that competitor has two compelling advantages - it's cheaper/free, and open XML file formats. So when clued-up IT people say to their Pointy-Haired Bosses that they should use StarOffice/OpenOffice.org, PHBs can respond "but MS is doing that next year. We can avoid all the disruption of changing office suites just by waiting a bit and upgrading to the next version of MS Office. Besides, we're already paying for it." Then when MS actually releases Office 11, they will have used all sorts of devious and subtle devices to keep their lock-in of the file format, and MS and PHBs will be happy.
  • Re:WTF???? (Score:3, Insightful)

    by Anonymous Coward on Thursday October 24, 2002 @05:28AM (#4520330)
    That really depends on your definition of XML and human readable.

    <?xml version="1.0"?>
    <document>
    jMyB38QAAMETWFjs7IQAAQEVkJBNq0jEAAW
    RvbGWTYBAADARUaGlzRG9jdW1lbnQ8nhAAC
    udGrTEAAC8BATwAAADMAv8AAgEABABIAAAA
    </document>

    is valid xml, just like a uuencoded file is valid ASCII and human readable.

    But if other M$ products are any indication it won't be that bad. I parsed some Visio stuff and the data was more or less readable. The drawing data (or previews, didn't care) was still encoded though. I expect it to go a little like M$ html did.
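
    The point above can be tried with a few lines of Python. This is only a sketch: the payload bytes and the `document` wrapper are made up for illustration and have nothing to do with any real Office format. It shows that a standard parser accepts such a file happily while telling you nothing about what is inside.

```python
import base64
import xml.etree.ElementTree as ET

# Hypothetical document: well-formed XML whose only content is an opaque blob.
secret = b"\x00\x01proprietary binary data\xff"
payload = base64.b64encode(secret).decode("ascii")
doc = f'<?xml version="1.0"?><document>{payload}</document>'

root = ET.fromstring(doc)   # parses without complaint
print(root.tag)             # the tag name is all the structure there is
print(root.text)            # base64 text, meaningless without a spec

# Decoding recovers the bytes, but not what they mean.
raw = base64.b64decode(root.text)
print(len(raw), "opaque bytes")
```

    So "it parses" and "it is documented" are separate questions; the parser only answers the first.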
  • by mindriot ( 96208 ) on Thursday October 24, 2002 @05:31AM (#4520343)
    Yes, the point of XML files is that their _syntax_ is simple and easily parseable by computers. But that doesn't tell you anything about the _semantics_ of a document. And as long as there is no proper documentation on what the mess of tags in your XML file means, there's hardly any way for you to hack together a Perl script to, say, extract plain text, or convert the Word XML file to an OpenOffice.org XML file, or whatever else comes to mind.
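
    A rough illustration of the syntax/semantics gap, in Python. The tag and attribute names (`blk`, `run`, `k`, `f`, `x7`) are invented for this sketch and are not taken from any real Word format. Parsing is trivial; deciding which elements are body text is guesswork without documentation.

```python
import xml.etree.ElementTree as ET

# Made-up markup; every name here is hypothetical.
doc = """<doc>
  <blk k="3"><run f="1">Quarterly report</run></blk>
  <blk k="0"><run>Sales rose </run><run f="2">sharply</run><run>.</run></blk>
  <x7>AAECAwQ=</x7>
</doc>"""

root = ET.fromstring(doc)

def naive_text(elem):
    """Concatenate every text node, ignoring what the tags mean."""
    return "".join(elem.itertext())

# Guess: <blk> is a paragraph. Nothing in the file itself confirms that.
paragraphs = [naive_text(blk) for blk in root.iter("blk")]
print(paragraphs)

# Without the guess, the "plain text" also swallows the unexplained <x7> blob.
everything = naive_text(root)
print("AAECAwQ=" in everything)
```

    Whether `k="3"` means a heading, whether `f` is a font or a flag, and whether `x7` is content at all are exactly the questions only proper documentation answers.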
  • by Anonymous Coward on Thursday October 24, 2002 @05:34AM (#4520357)
    Other MS products that use XML (Visual Studio.net, for example) actually do it quite well. The VS.net generated XML, including project files, is clean and very readable.
  • C'mon People (Score:3, Insightful)

    by BurritoWarrior ( 90481 ) on Thursday October 24, 2002 @05:34AM (#4520358)
    Office's MS-XML will be even less compatible with the spec than MS-Kerberos or MS-Java/J++. Office is their cash cow. It brings in 30-40% of their revenues all by itself.

    If you think there is even a remote chance in he-double L that MS will loosen their grip on this revenue stream, I have a bridge to sell you.

    You can call this flamebait if you want, but what in MS's history would lead me to believe they are suddenly going to change their historic behavior pattern AND risk a huge amount of revenue at the same time?
  • Re:wicked :) (Score:3, Insightful)

    by Mnemia ( 218659 ) on Thursday October 24, 2002 @05:40AM (#4520376)
    I doubt it. XML is specifically designed around interoperability, and I don't think MS can charge for use of a standard they don't own. That's why I think that they will break standards compatibility somehow.
  • Yeah, right (Score:3, Insightful)

    by Alex Belits ( 437 ) on Thursday October 24, 2002 @05:41AM (#4520377) Homepage
    XML is a format with nearly infinite possibilities for obfuscation, convolutedness and poorly defined standards. The most we can expect is the possibility to validate a file to absolutely certainly determine if it is compliant with the new Word format or not.
  • by Qbertino ( 265505 ) <moiraNO@SPAMmodparlor.com> on Thursday October 24, 2002 @05:45AM (#4520392)
    I'm working with that weedy Word 2k at the office. And we use Outlook as a standard communication platform. Believe me, the fact that their software is so often a pain isn't so much part of a greater plan to rule the world as flat-out ineptitude at delivering products with any conceptual consistency.
    Looking at Frontpain and Word HTML and extrapolating to XML from that tells me they're gonna do just as crappy a job as usual and really think they've done a great thing.
    Just like the people sending me source code additions and DB content as Word files. Nothing but simple ineptitude, I say.
    Not that my System of choice, Linux, is that much more consistent. Mind you. With a bazillion Font methods, every single one of them looking crappier than the next and QT, GTK+, Motif, Lesstif, Inbetweentif, Swing, TK and whatnot and none of them following the same Clipboard behaviour it's just as weedy. Only it is under *my* control to change it.
    That way, the bottom line is: With OSS if it doesn't work, there's another way. With M$ it's 'Game Over' with the first "Error in module [fill in random hexcode here]".
    That's the simple difference.
  • Re:HTML from Word (Score:5, Insightful)

    by pubjames ( 468013 ) on Thursday October 24, 2002 @05:51AM (#4520403)
    Just look at an HTML file exported from Word2k.

    An excellent point sir. That's a great illustration of how Microsoft approaches 'open' file formats.

    People that think that MS Office is going to move to open, well documented file formats are just plain nuts. But look at many of the comments in this forum - it seems MS has even managed to persuade many Slashdotters that they are going to use open formats. Poor fools.
  • by jsse ( 254124 ) on Thursday October 24, 2002 @05:52AM (#4520407) Homepage Journal
    Don't laugh yet. That's exactly what'd be happening.

    The new document just needs to have its meta-tags comply with XML; the rest could still be obscure junk as shown above.
  • by tonywestonuk ( 261622 ) on Thursday October 24, 2002 @05:55AM (#4520416)
    So, what happens when someone wants to email an XML-enabled Word document...... Does it somehow become encrypted on its way out of the database, remain scrambled on its way over the internet, and reassemble itself into nice XML once it arrives on the recipient's computer?.... Doesn't sound like XML to me?!
  • Re:WTF???? (Score:4, Insightful)

    by DGolden ( 17848 ) on Thursday October 24, 2002 @05:59AM (#4520424) Homepage Journal
    Here's my pet rant:

    I would say that XML isn't a markup language - a markup language would allow the "bad nesting", since a markup language should be "layers of virtual highlighter pen" applied to an underlying data stream. XML, since it requires "proper nesting", is just Lisp sexps reimplemented, but with terrible syntax. It's yet-another-tree-structured-data-format. Big Wow. A true markup language environment would facilitate part-structured data, like HTML used to be, rather than shoehorning everything into trees.

    Lisp sexps would just say (stuff (things "text"))

    In fact, that's pretty much all there is to lisp syntax right there. The above is (a) a potentially valid lisp program and (b) a valid lisp data structure.

    XML is a data format designed mainly to allow C and Java programmers to use vaguely Lisp-like processing techniques without realising it and/or admitting it to themselves.
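
    For what it's worth, the tree equivalence is easy to demonstrate. A minimal Python sketch follows; `to_sexp` is a made-up helper that deliberately ignores attributes and only handles elements and text.

```python
import xml.etree.ElementTree as ET

def to_sexp(elem):
    """Render an element tree as a Lisp-style s-expression (attributes ignored)."""
    parts = [elem.tag]
    if elem.text and elem.text.strip():
        parts.append('"' + elem.text.strip() + '"')
    for child in elem:
        parts.append(to_sexp(child))
        # Text following a child element belongs to the parent.
        if child.tail and child.tail.strip():
            parts.append('"' + child.tail.strip() + '"')
    return "(" + " ".join(parts) + ")"

sexp = to_sexp(ET.fromstring("<stuff><things>text</things></stuff>"))
print(sexp)
```

    For the input shown this yields the same `(stuff (things "text"))` form as above; overlapping ("badly nested") markup has no representation in either notation.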

  • Re:I doubt it. (Score:5, Insightful)

    by sql*kitten ( 1359 ) on Thursday October 24, 2002 @05:59AM (#4520425)
    I really have my doubts about whether Microsoft will allow "any programmer with a Perl script and a bit of intelligence" to muck around with Office documents.

    Why not? After all, the high-quality ActiveState port of Perl to Win32 exists because Microsoft paid for it, and you can download it for free. Not only that, but if you want to write your own code to manipulate Office documents, you have been able to do that for years in VBA - all the Office programs expose rich APIs. In fact, they are composed of Objects that you can instantiate and use in your own programs if you want - all MS care about is that there is a licensed copy of Office on the user's machine. One of the easiest ways to do charting is to simply reuse a bit of Excel, for example. From there it's a short hop via COM to any program you want.

    I'm guessing their XML document format will be just as hard to decipher as the current Office formats.

    The fact that Office documents have been in a proprietary format in the past is actually unimportant, since the interfaces to the applications (and hence their documents) are well documented (check MSDN or Barnes & Noble if you don't believe me). So the reason that Microsoft are doing this is that they lose nothing and gain from making the platform even more attractive to developers.
  • by jgp ( 72888 ) on Thursday October 24, 2002 @06:00AM (#4520428) Homepage Journal

    Have you seen the HTML produced by the current "Save as webpage .." options in Word? shudder. The vast majority of semantics are actually embedded in XML islands hidden inside HTML comments. I see no reason why Microsoft would change their tune now (they'll simply change the DTD from one inappropriate document model to another one IMHO).

    <wordDocument>
    <!-- (document content here) -->
    <nonMicrosoftElement>I'm sorry, you don't appear to have a StandardsEnhanced(tm) word processor.</nonMicrosoftElement>
    </wordDocument>
  • by Viol8 ( 599362 ) on Thursday October 24, 2002 @06:05AM (#4520437) Homepage
    Yes, so it's portable. Yes, so it's (mostly) human readable. So what? So is GWBASIC. XML is just a data description format (I won't grace it by calling it a language; it's not) and there have been plenty of portable DDFs in the past: PDF, PostScript (though the latter is actually a language). So why all the hoo-ha about XML? Seems to me that various marketing types have jumped on the bandwagon with this one and are going to ride it till the wheels fall off and take all the suckers along with them.
  • Bigger picture (Score:3, Insightful)

    by Cheese Cracker ( 615402 ) on Thursday October 24, 2002 @06:05AM (#4520438)
    Look at the bigger picture of where Microsoft is heading. They're diversifying their line of business.
    In the past, MS Office was the cash cow at Microsoft, but the market for office packages is rather
    saturated... companies and governments are looking for cheaper alternatives etc. Not much room to
    grow. Now they can afford playing the good guys by opening up their file formats, since they got
    new markets to capture... mobile phones, handheld computers, home entertainment etc.
  • by Baki ( 72515 ) on Thursday October 24, 2002 @06:11AM (#4520453)
    One big difference: SVG was designed and is intended to be open and understandable. Office formats, using XML or not, are not. I do not believe MSFT would voluntarily cease their lock-in strategy.

    XML may be easier to reverse engineer, but need not be; that depends on how complex the DTD/Schema is and whether the designer intended it to be easily understandable. Apart from that, as a purist I don't like reverse engineering, especially not if the subject of reverse engineering is from an uncooperative company known for its dirty tricks.

    A non-XML grammar/syntax, if accompanied by a decent and documented EBNF description of its grammar, is a much better basis for your program than an undocumented XML format.
  • by Anonymous Coward on Thursday October 24, 2002 @06:12AM (#4520457)
    The article gives Tim Bray XML "co-inventor" status. Come on. Ever since HTML was around people have been extending it with fake tags like , , etc. Sure, XML is useful, but it's hardly an invention.
  • by thelen ( 208445 ) on Thursday October 24, 2002 @06:13AM (#4520461) Homepage

    Okay, so it'll be harder to mount a windows partition effectively, but this doesn't affect transmission of documents, especially if they're stored in an XML format. As for me, I think it's more valuable to have files that I can read outside of their native filesystem rather than have a readable filesystem filled with unreadable files.

  • by passthecrackpipe ( 598773 ) <passthecrackpipe AT hotmail DOT com> on Thursday October 24, 2002 @06:20AM (#4520474)
    No you were not. MS routinely uses XML to encapsulate (proprietary) binary data. In the case of the MSOffice file format, this is especially true, but to a lesser extent this also goes for stuff like BizTalk etc (that has a terrible license attached to it). If MS is *really* serious about using open formats, and using XML in their Office suite, they should put their money where their mouth is and join in the OpenOffice File format project [openoffice.org]. Most of the open source players are working there already, and the EU is also set to join. I assure you that mature participation of Microsoft would be very welcome.

    Of course, this will never happen. Instead, MS will continue to push their own "open" XML based file formats. Microsoft Kerberos, anyone?

  • Re:WTF???? (Score:5, Insightful)

    by vidarh ( 309115 ) <vidar@hokstad.com> on Thursday October 24, 2002 @06:25AM (#4520483) Homepage Journal
    The point of XML is not for it to be human readable, but to allow easy automatic processing of various kinds.

    With XML Schema and DTDs, you can validate various aspects of the data without writing a custom validator.

    With XPath and XPointer you can refer to parts of an XML document without needing to understand what the document contains.

    With XSL you can translate all or parts of the document from one format to the other without your application needing to know the structure, and without needing to understand more of the format than the parts you are extracting.

    With SAX and the DOM you can programmatically traverse and extract information from an XML file without having to write a custom parser.

    With CSS an editor or viewer for instance can use a standard mechanism of applying styles to elements without hardcoding the style attributes for elements anywhere.

    With XML namespaces, you can intersperse data in various formats in the same file, and the components handling each of the vocabularies need not know anything about the other components - an example would be embedding SVG in HTML: The HTML renderer doesn't need to understand any of the SVG tags, only that it should delegate contents with other namespaces to another component. And the SVG renderer couldn't care less about the HTML.

    And this doesn't even touch on the benefits of all the various interchange formats that have been specified on top of these base technologies.

    The importance of XML is that it opens up the doors for building interchangable components that operate on data without needing any hardcoded application specific knowledge of the data.

    Most of the time, you still have to write some code to tie it all together, but you don't have to build your own parsers, your own document object model, your own styling system, your own way of handling contained data of other types, your own way of transforming data between formats, etc.

    For me as a software developer XML delivered years ago. I use XML technologies daily, and it saves me work.
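
    A small Python sketch of two of the points above, namespaces and path queries, using only the standard library. The document is invented for illustration; only the two namespace URIs are the real XHTML and SVG ones. `ElementTree` supports just a limited XPath subset, which is enough here.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
SVG = "http://www.w3.org/2000/svg"

# Hypothetical page mixing two vocabularies in one file.
doc = f"""<html xmlns="{XHTML}">
  <body>
    <p>A circle:</p>
    <svg xmlns="{SVG}" width="40" height="40">
      <circle cx="20" cy="20" r="10"/>
    </svg>
    <p>Done.</p>
  </body>
</html>"""

root = ET.fromstring(doc)
ns = {"h": XHTML, "s": SVG}

# The "HTML side" pulls out its paragraphs and never looks at SVG elements.
paragraphs = [p.text for p in root.findall(".//h:p", ns)]

# The "SVG side" finds its circles without knowing anything about HTML.
radii = [c.get("r") for c in root.findall(".//s:circle", ns)]

# A host can spot children from other namespaces and hand them off.
body = root.find("h:body", ns)
foreign = [child.tag for child in body if not child.tag.startswith("{" + XHTML + "}")]

print(paragraphs, radii, foreign)
```

    None of this required a custom parser or any knowledge of the other vocabulary, which is the interchangeable-components argument in miniature.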

  • by bokmann ( 323771 ) on Thursday October 24, 2002 @06:28AM (#4520494) Homepage
    Except I will look to xml.openoffice.org to write some xslt transformations to take Microsoft office documents and liberate them once and for all.

    Once I can move my team of 20 people to open office with no real worries or complaints about 'interchanging' files with lusers still using Microsoft, I will.

    BUT, have you ever looked at an HTML file generated by Microsoft word? It is a GREAT example of how they can pollute a standard into something unreadable.

    I suspect that they will copyright or otherwise lock up their DTD/Schema, and try to lash out at anyone that uses them in other than 'approved' ways.

  • Re:I doubt it. (Score:3, Insightful)

    by Bartmoss ( 16109 ) on Thursday October 24, 2002 @06:37AM (#4520508) Homepage Journal
    I think you are dead on. Plus: a) XML is a great buzzword; b) it makes MS *seem* more "open" and "standards compliant".
  • Political XML (Score:2, Insightful)

    by Wheelie_boy ( 26751 ) on Thursday October 24, 2002 @06:38AM (#4520517)
    Looks like M$ has found a way to placate those various governments that are beginning to insist on open file formats for data storage.
  • Re:WTF???? (Score:3, Insightful)

    by ianezz ( 31449 ) on Thursday October 24, 2002 @06:43AM (#4520529) Homepage
    WTF!? XML shouldn't need to be documented. The whole point is to create a human readable file that is parseable by computer. If MS Word delivers an XML file that I can't figure out, it's not XML.

    Uhm, it is also the point of source files in the programming language of your choice, I'd say... and still, you need good comments.

    XML is like Lisp, but with sharp parenthesis.

  • Re:I doubt it. (Score:2, Insightful)

    by Anonymous Coward on Thursday October 24, 2002 @06:50AM (#4520544)
    "After all, the high-quality ActiveState port of Perl to Win32 exists because Microsoft paid for it"

    That port existed well before the MS involvement in ActiveState.

    Here's the original story on Microsoft's role:

    "6/2/1999 -- Microsoft Corp. and ActiveState Tool Corp. (www.activestate.com) signed a three-year Perl Open Source development and support contract.

    As part of the agreement, ActiveState will add features previously missing from Windows ports of Perl, as well as full support for Unicode on Windows platforms."

    Source: http://www.entmag.com/news/article.asp?EditorialsID=1633

    ActiveState has similar partnerships with many others: http://www.activestate.com/Corporate/Partnerships/
  • by Anonymous Coward on Thursday October 24, 2002 @06:51AM (#4520546)
    No, and 8-bit binary .DOC files don't just become scrambled either.

    XML doesn't make things easier to parse; you still have to figure out what a given tag means, just as you would have to figure out 04 07 in a binary file.
  • by Anonymous Coward on Thursday October 24, 2002 @07:11AM (#4520594)
    it doesn't matter if everyone is able to read, modify and generate Office-compatible files.

    For many businesses, the ONLY thing keeping them using MS is file compatibility. They can't change because it's the industry standard, and they need to be able to share docs with their suppliers and customers.
  • by Qrlx ( 258924 ) on Thursday October 24, 2002 @07:21AM (#4520622) Homepage Journal
    Think of it: when you exchange files with other businesses, you have two realistic choices of file formats: Office or plaintext

    I think PDF is a viable (growing even) third option. Adobe is "evil" just like MS (remember Sklyarov)... regardless, PDF is nice and it works well, and the files are way smaller than Word docs.
  • by pubjames ( 468013 ) on Thursday October 24, 2002 @07:34AM (#4520667)
    The ISO tried it. It was called ODA
    and was a complete failure.


    So? Formats come and go all the time. Just because the ISO failed in the early nineties doesn't mean someone else would fail today.
  • Re:MOD PARENT UP (Score:5, Insightful)

    by gazbo ( 517111 ) on Thursday October 24, 2002 @07:37AM (#4520676)
    Slight difference this time. Now, we actually have a beta release, seen by independent parties (such as Tim Bray) who have praised the file format. As I said, unless you seriously think they will completely change the file format between beta and release...

    Now, whether they can license the format so as to make it illegal for other apps to use it, I don't know. However, I suspect this is not the case as it more or less removes the advantage to having invested in XML in the first place. Well, sure there's good publicity, but how long would that last when people immediately discover it is worthless?

    And of course, the vast majority of people don't care about file formats. The only people to whom this news is of interest are those who will want to either access Office docs themselves, or use other apps (e.g. Open Office) to view Office docs. If this sector are banned from doing this, why did MS spend so much money on using XML in the first place?

  • by nmg196 ( 184961 ) on Thursday October 24, 2002 @07:50AM (#4520720)
    Microsoft is switching from a proprietary file format, to XML, and the first 100 comments are all flaming MS. WTF does it take to make you people happy?

    They've already shown with .NET that they can make an entire programming framework (and at least 3 associated languages) into an open standard and even have them ratified by the ECMA and maybe even ISO. Because of this people have already managed to port Perl, Python and many other languages to this framework before it even came out of beta! The guys at Ximian [ximian.com] have even managed to port quite a bit of the framework itself as part of the Mono Project [go-mono.com].

    So perhaps instead of perpetually slating Microsoft, you could get off your arse and do something useful instead.

    Nick...
  • Re:I doubt it. (Score:5, Insightful)

    by khuber ( 5664 ) on Thursday October 24, 2002 @08:05AM (#4520791)
    The fact that Office documents have been in a proprietary format in the past is actually unimportant, since the interfaces to the applications (and hence their documents) are well documented

    So you can read Office documents with other programs as long as you have Office and MS dev tools?

    You do see the folly in that, right?

    -Kevin

  • Re:umm (Score:2, Insightful)

    by Arimus ( 198136 ) on Thursday October 24, 2002 @08:34AM (#4520995)
    As he is one of the people responsible for XML and Office 11 is going to be using XML as its native file format have you spotted the link (hint think of three letters...)

    That aside, if MS do adopt XML as their file format AND they don't screw it up the way the HTML-formatted output did, then it is about time, and I would imagine that the people who came up with XML are going to be happy to see their work being put to good use.

  • Export to HTML... (Score:2, Insightful)

    by Anonymous Coward on Thursday October 24, 2002 @08:46AM (#4521078)
    ...to see where they're going with this. Word has been exporting to HTML, which is really some funky XML/XHTML with stylesheets that IE can read and display, for a while.
  • by Perl-Pusher ( 555592 ) on Thursday October 24, 2002 @08:55AM (#4521159)
    There is also the fact that Microsoft loves to put stuff in their EULA. I can also imagine anyone producing a reader for the "encrypted XML" running afoul of the DMCA.

    "Doesn't sound like XML to me?!"

    Sure it is! It's XML with Microsoft Security Extensions!

  • by PackMan97 ( 244419 ) on Thursday October 24, 2002 @08:57AM (#4521182)
    Sure, IBM lost control of the PC market...but is that better than what's happened to Apple?

    Let's go back in time to 1985 and you can choose which company to invest in...IBM or Apple. Hmmm...tough choice isn't it? Their stocks have both appreciated almost the same amount since then! Shocking isn't it.
  • by Anonymous Coward on Thursday October 24, 2002 @09:23AM (#4521388)
    XML does not mean an open format. You can invent tags that mean things only to you, and you can wrap an existing binary document in an XML file.

    Why not a tag like
    <902358r9838239hjfs98>Data</902358r9838239hjfs98>?
  • by GT_Alias ( 551463 ) on Thursday October 24, 2002 @09:24AM (#4521391)
    I need to know that I can open and access every single one of those without problems

    Interesting point...when people start buying Office 11 and sending you those XML-saved Word documents, you will have no option but to go out and fork over some cash for an upgrade.

    Unlike now, I can send an Office XP formatted Word document and older versions can still open it. Of course...older versions can't open newer databases, that's been a wonderful source of headaches.

  • Re:MOD PARENT UP (Score:4, Insightful)

    by abe ferlman ( 205607 ) <bgtrio@@@yahoo...com> on Thursday October 24, 2002 @09:53AM (#4521604) Homepage Journal
    If this sector are banned from doing this, why did MS spend so much money on using XML in the first place?

    You might similarly ask, "If MS didn't intend to comply with web standards, why did they spend so much on Internet Explorer"

    Please tell me I don't have to spell the answer out for you.

  • Re:MOD PARENT UP (Score:3, Insightful)

    by Milican ( 58140 ) on Thursday October 24, 2002 @10:06AM (#4521727) Journal
    True, they could do that. Perhaps one day the Supreme Court will strike down this restricting move in their EULA. However, in the meantime the following could be done:

    1. Implement an interpreter module using the BSD license. Allow it to export to another XML model. Let's call this output deMStify.xml
    2. Use a GPL model to read deMStify.xml and do whatever the hell it wants
    3. ??? Profit... hehe

    JOhn
  • Re:I doubt it. (Score:2, Insightful)

    by buzzcutbuddha ( 113929 ) <maurice-slashdot&mauricereeves,com> on Thursday October 24, 2002 @10:06AM (#4521728) Homepage
    Oh I get it! We're beating on Microsoft for not opening up its file formats earlier because WordPerfect and Lotus products are so much more open...oh wait....
  • office HTML (Score:2, Insightful)

    by avandesande ( 143899 ) on Thursday October 24, 2002 @10:14AM (#4521797) Journal
    Anyone looked at the HTML output from an Office program? It's terrible. Do you think their XML will look any better?
  • Hmmm.... (Score:2, Insightful)

    by BuffJoe ( 307408 ) on Thursday October 24, 2002 @10:19AM (#4521828) Homepage
    I have a feeling that Microsoft "XML" will use Microsoft "Unicode." That is, the range 0x82 to 0x95, which Unicode reserves for extra control characters, will be littered with "smart" quotes, em dashes, and other proprietary extensions to Unicode that ensure that nothing works with it. I ran into this problem when I tried converting FrontPage-generated HTML into XHTML so I could do conversions with XSLT. Needless to say, it took a lot of effort, even with HTML Tidy [sourceforge.net], to get Microsoft's generated HTML converted into XHTML! HTML Tidy constantly complained about the HTML, and looking at what FrontPage generates, it's not hard to see why it complained.

    I ran across the demoroniser [fourmilab.ch], which fixes Microsoft Unicode problems, but it still doesn't fix the invalid HTML that FrontPage generates.
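    For anyone who hasn't hit this, a small Python sketch of the mismatch: the same bytes decode to invisible C1 control characters when read as Latin-1 (which matches Unicode code point for code point) and to smart quotes when read as Windows code page 1252. The sample bytes and the helper `fix_c1_controls` are illustrative only; this is not the demoroniser's actual code, just the same general idea.

```python
# Bytes as a Windows tool might write them: 0x93/0x94 are curly double
# quotes and 0x85 is an ellipsis in code page 1252.
raw = b"\x93smart\x94 quotes\x85"

as_latin1 = raw.decode("latin-1")  # 0x80-0x9F become C1 control characters
as_cp1252 = raw.decode("cp1252")   # the characters the author intended


def fix_c1_controls(text: str) -> str:
    """Re-map C1 control characters that are really mis-decoded cp1252 bytes."""
    out = []
    for ch in text:
        if 0x80 <= ord(ch) <= 0x9F:
            try:
                ch = bytes([ord(ch)]).decode("cp1252")
            except UnicodeDecodeError:
                pass  # a few bytes in this range are undefined in cp1252
        out.append(ch)
    return "".join(out)


print(repr(as_latin1))
print(repr(fix_c1_controls(as_latin1)))
```

    This only repairs the characters. It does nothing about invalid markup, which still needs something like HTML Tidy.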

    Microsoft XML? Hah! I'll believe it when I see it.
  • by 4of12 ( 97621 ) on Thursday October 24, 2002 @10:47AM (#4522069) Homepage Journal

    MS Office saving its data in XML format is a great start.

    But will this really be enough?

    Previous complaints about how versions of Office didn't disclose the format were often referred to a specification that Microsoft made available to describe what was in a Word document.

    The key problem, IIRC, was that the description was not sufficient for one to predict how the Word document was actually formatted and rendered on the page.

    Because XML is very much like SGML or TeX, it has the potential for much more exhaustively describing document structure. But whether the new Word XML format (or OpenOffice format, for that matter) contains sufficient information for developers to reproduce the "right" format is a different issue.

    I hope I'm wrong and that the format is specified comparably to the level you'd find in say PostScript or PDF.

    Maybe MS is willing to let rendered Office documents change, just as HTML rendered documents change whenever one resizes the browser window.

    But I doubt it.

  • by Anonymous Coward on Thursday October 24, 2002 @11:52AM (#4522581)
    First of all, why would Microsoft bother creating an XML file format if it was just an "encapsulation of binary, proprietary, encrypted file formats"? What would be the point? A PR move to say that they use XML? Having not seen what Office 2002 generates in the way of XML, I can't really say how obfuscated it is or isn't; however, I can't think of any reason to adopt an XML format if it wasn't at least a little more open than the binary file formats they've been using.

    Also, how would a "binary, proprietary, encrypted file format" fit into everything else Microsoft is doing with .NET? Wouldn't Microsoft want the content of a document to be open enough so that it could be read and processed by applications using .NET's XML libraries? If they're going to sell the whole .NET XML concept, it would be a big advantage to say that you can process documents generated by the Office suite.

    Explain to me why Microsoft would want to prevent you from sending your self-generated Word documents to another computer? What possible sense does this make? Is it because they hate their customers and want to piss them off so they won't use Microsoft products any more? Has RedHat paid Microsoft to include technology that will piss off all Windows users?

    The whole point of Palladium is that the content provider chooses how the content can be distributed. Microsoft has no interest in protecting documents you've generated yourself. Palladium in and of itself doesn't do anything; it's the content providers (audio, video, software, and hardware providers) that will make the thing fail if they make the content controls too restrictive. I don't have a problem with the content industry trying it out as long as they're up front about any restrictions on the content itself. However, I do have a problem with making it illegal to reverse engineer and break encryption, but that's a different story.

    They've got their faults, but in the end both the content companies and Microsoft are businesses. They've got to respond to their customers (us), otherwise we'll go elsewhere. It's free as in country.

  • by FatherOfONe ( 515801 ) on Thursday October 24, 2002 @12:12PM (#4522811)
    Not that I totally disagree with your point, but with ".Net" people will be discouraged, or it will be far more difficult to send the actual document. My guess is that some future version of Office will default to "Send the shortcut".

    Now they of course will change Office for the Mac to read from those servers... The data WILL be stored in XML on those servers, so coders will have an easy time with it.

    You bring up an interesting point about paranoid people and Microsoft. I have followed Microsoft fairly closely over the last ~18 years and feel comfortable saying that they have never worked with any "standard" out there. They have ALWAYS developed their own. Can you name an example of any "standard" software technology they have adopted and not changed? A perfect example of this would be ZIP. Why doesn't Microsoft use it instead of CAB files? There are many, many more I could use as examples if you would like.

    Microsoft has an internal saying "If it is not ours destroy it".

    My point is this. A company that has for 18 years been trying to lock people in to their technology, will cause some people to be a bit paranoid.

  • by sketerpot ( 454020 ) <sketerpot&gmail,com> on Thursday October 24, 2002 @12:18PM (#4522855)
    Sure it is! It's XML with Microsoft Security Extensions!

    That reminds me of something that MS has been doing for quite a while now: the file type reported for any HTML files is "Microsoft HTML file" (your system may vary). Will XML become Microsoft XML? I hope not.

    If everything about this really is kosher, though, then everybody give a great big "Thank You!" to MS!

  • by spitzak ( 4019 ) on Thursday October 24, 2002 @12:27PM (#4522957) Homepage
    why would Microsoft bother creating an XML file format if it was just an "encapsulation of binary, proprietary, encrypted file formats"? What would be the point? A PR move to say that they use XML?


    YES! Now you are starting to get it!

    I can't think of any reason to adopt an XML format if it wasn't at least a little more open than the binary file formats they've been using.


    How about for a "PR move to say they use XML"? In addition, it is obvious how to make an XML format that is exactly as obscure: put the entire contents of the old format into a binary block.

    Also, how would a "binary, proprietary, encrypted file format" fit into everything else Microsoft is doing with .NET? Wouldn't Microsoft want the content of a document to be open enough so that it could be read and processed by applications using .NET's XML libraries?


    No, of course not. You would only read Word documents with the special "read a Word document" interface. It might use the XML libraries underneath, but big deal. Be assured you will be unable to reconstruct all the contents of the document by any kind of perverted arrangement of calls to the "read a Word document" interface. (Though this is not just a complaint about MicroSoft: I think .NET, DCOM, CORBA, KCOP, etc. all pervert the idea of "object orientation" by making elaborate communication protocols which are only "object oriented" because they call some part of the protocol an "object". Real object-orientation means there is some commonality of functionality, and the only instances I can think of that really work are the original Unix, where everything known then (terminals, printers, tapes, disks) used the same read/write/seek calls, and Plan9, which tries to extend this to networks and file systems.)

    Explain to me why Microsoft would want to prevent you from sending your self-generated Word documents to another computer? What possible sense does this make? Is it because they hate their customers and want to piss them off so they won't use Microsoft products any more? Has RedHat paid Microsoft to include technology that will piss off all Windows users?

    Ha ha, very funny. Of course you will be able to send a Word document to another computer. It will still be an unreadable Word document. If they can obfuscate things so that the destination computer also has to be running Windows, all the better. You seem to be under the weird delusion that "other computer" meant "other computer running Windows" when in fact I'm sure every other poster here knew it meant the exact opposite, ie "other computer not controlled by MicroSoft".

  • by Anonymous Coward on Thursday October 24, 2002 @12:42PM (#4523069)
    No, of course not. You would only read Word documents with the special "read a Word document" interface. It might use the XML libraries underneath, but big deal. Be assured you will be unable to reconstruct all the contents of the document by any kind of perverted arrangement of calls to the "read a Word document" interface.

    I think you are getting near the point.

    A big problem with MS Office right now is that the file formats are such a mess that nobody can parse the documents without MS Office, and that includes Microsoft.

    If MS wants to get into the content/groupware market, they NEED server-side processing that doesn't rely on running a single-threaded 15MB WINWORD.EXE process.

    Using an XML format allows Microsoft to build a clean C# component implementation of "Read a Word Document", or a "Save SQL Server Data As Excel" without being fucked by their own file formats.
  • Re:HTML from Word (Score:2, Insightful)

    by fzammett ( 255288 ) on Thursday October 24, 2002 @01:46PM (#4523589) Homepage
    Yeah, couldn't be that some people actually BELIEVE WHAT THEY WROTE, right??

    Why is it that every OSS zealot has to insist that any point of view contrary to their own is the result of a deranged mind?

    You want to try and convince me that Microsoft is evil and that I should shun absolutely anything coming out of Redmond and that I should embrace the OSS world? Fine, try and convince me. Do it logically and without insulting me. You'll find it's not that hard because I hate Microsoft anyway, but I don't hate every product they produce, in fact I very much like some of them (Win2K, Office in general as two examples).

    BUT DON'T FUCKING DO IT BY TELLING ME I'M A NUTCASE OR A PAID LACKEY OF SOME CORPORATE ENTITY BECAUSE I DON'T CURRENTLY AGREE WITH YOUR WORLD-VIEW!!

    Another group of people acted the way some of you people act... we fought a world war against them...
  • by lostchicken ( 226656 ) on Thursday October 24, 2002 @01:59PM (#4523734)
    XML can be whatever you want it to be. XML does have standards, but they are just standards for wrapping data with control codes, not for what the control codes mean.

    While StarOffice may use an XML word processing format, it won't be what MSFT will use.
  • Re:However... (Score:3, Insightful)

    by iabervon ( 1971 ) on Thursday October 24, 2002 @02:42PM (#4524059) Homepage Journal
    Actually, it will almost certainly be that the XML version doesn't include all of the information. It will probably be missing a bunch of relatively insignificant details which will mean that the spacing is a bit off if you export to XML and then import (particularly on a different version of Word). This will, of course, totally mess up the document's pagination and such, so people won't tend to do it.

    Additionally, they probably won't add any Office 12 features to the XML version unless they're being prodded hard at the time, so you'll lose any new features if you try to use the XML format (because, of course, they're being careful not to change the XML format...).
  • by esarjeant ( 100503 ) on Thursday October 24, 2002 @03:47PM (#4524572) Homepage
    Hear, hear! Standardization was attempted before via RTF, and ultimately what happened is users preferred File -> Save (rather than File -> Save As...) because Word DOC format seemed to keep all of their formatting. The only way an XML-ized document format for Word, Excel & PowerPoint can work is:

    (a) The XML format is a published open-standard that can be read by any app that recognizes this DTD.

    (b) Microsoft defaults to the new format when a user selects File -> Save.

    If they had done this I would be really excited, but the simple fact is MS can't do this without confusing/annoying their end-users. Imagine if your new version of Word created files that were incompatible with any other previous version of Word. For this reason, Word won't be the next-generation wordprocessor -- WordPerfect perhaps?
  • by InnovATIONS ( 588225 ) on Thursday October 24, 2002 @08:25PM (#4526346)
    By taking the initiative in this MS can create an XML schema that neatly includes ALL of the featureset and terminology of MS Word/Excel/etc.

    Which then by virtue of market share becomes standard. It is actually in their best interest to publish it clearly. Then the other potential competitors will feel strong pressure to fit their software to match MS and have no real excuse why they can't. If MS waited there would be some other standard emerging and MS would be pressured by customers to adopt it. Then it would be MS having to shoehorn its document logic into some other form and not the other way around.

    While other potential competitors are playing catch-up with making their documents fit into the MS schema MS can be busy thinking about the next thing to do.

    So frankly I expect the word document xml (and excel and the rest) to actually be quite clear and documented but very aligned to how MS Word sees a document, which will likely impress others as obtuse.
