
Tim Bray on Microsoft Office
jgeelan writes "The co-inventor of XML, Tim Bray, has been talking about the newly XML-enabled version of Microsoft Office, code-named 'Office 11' and tells XML-Journal that 'when the huge universe of MS Office documents becomes available for processing by any programmer with a Perl script and a bit of intelligence, all sorts of wonderful new things can be invented that you and I can't imagine.'"
Yay Evil Monopoly Of Doom! (Score:3, Interesting)
StarOffice has used XML for their native file formats for some time now; I wonder if this means we'll see an even better-quality translator between the two formats?
Re:Yay Evil Monopoly Of Doom! (Score:3, Insightful)
They won't have to. Since they are going the SQL Server way for their filesystem, they can happily give away the hold they have on file formats, since they are going to have a stranglehold on accessing those files. You want an open file system? Here you go (and MS has a lot to gain by doing this - they instantly give Word access to most other data formats) - but don't think anything other than a Microsoft OS will actually be able to access the files - thanks to our new deliciously obfuscated method of storing data on a disk. Reverse engineering kernel level SQL data (with a bit of crypto, for DRM of course, thrown in) will probably be even harder than reverse engineering file formats was. And impossible to do legally (say hi to all those DMCA guys out there.)
Re:Yay Evil Monopoly Of Doom! (Score:5, Insightful)
Re:Yay Evil Monopoly Of Doom! (Score:3, Insightful)
"Doesn't sound like XML to me?!"
Sure it is! It's XML with Microsoft Security Extensions!
Re:Yay Evil Monopoly Of Doom! (Score:4, Insightful)
Now they of course will change Office for the Mac to read from those servers... The data WILL be stored in XML on those servers, so coders will have an easy time with it.
You bring up an interesting point about paranoid people and Microsoft. I have followed Microsoft fairly closely over the last ~18 years and feel comfortable saying that they have never worked with any "standard" out there. They have ALWAYS developed their own. Can you name an example of any "standard" software technology they have adopted and not changed? A perfect example of this would be ZIP. Why doesn't Microsoft use it instead of CAB files? There are many, many more I could use as examples if you would like.
Microsoft has an internal saying "If it is not ours destroy it".
My point is this. A company that has for 18 years been trying to lock people in to their technology, will cause some people to be a bit paranoid.
Re:Yay Evil Monopoly Of Doom! (Score:3, Insightful)
While StarOffice may use an XML word processing format, it won't be what MSFT will use.
Re:Yay Evil Monopoly Of Doom! (Score:5, Insightful)
Okay, so it'll be harder to mount a windows partition effectively, but this doesn't affect transmission of documents, especially if they're stored in an XML format. As for me, I think it's more valuable to have files that I can read outside of their native filesystem rather than have a readable filesystem filled with unreadable files.
Re:Yay Evil Monopoly Of Doom! (Score:3, Interesting)
Re:Yay Evil Monopoly Of Doom! (Score:3, Insightful)
file formats"? What would be the point? A PR move to say that they use XML?
YES! Now you are starting to get it!
I can't think of any reason to adopt an XML format if it wasn't at least a little more open than the binary file formats they've been using.
How about for a "PR move to say they use XML". In addition it is obvious how to make an XML that is exactly as obscure, by putting the entire contents of the old format into a binary block.
Also, how would a "binary, proprietary, encrypted file format" fit into everything else Microsoft is doing with .NET? Wouldn't Microsoft want the content of a document to be open enough so that it could be read and processed by applications using .NET's XML libraries?
No, of course not. You would only read Word documents with the special "read a Word document" interface. It might use the XML libraries underneath, but big deal. Be assured you will be unable to reconstruct all the contents of the document by any kind of perverted arrangement of calls to the "read a Word document" interface. (Though this is not just a complaint about MicroSoft: I think .NET, DCOM, CORBA, KCOP, etc. all pervert the idea of "object orientation" by making elaborate communication protocols which are only "object oriented" because they call some part of the protocol an "object". Real object-orientation means there is some commonality of functionality, and the only instances I can think of that really work are the original Unix, where everything known then (terminals, printers, tapes, disks) used the same read/write/seek calls, and Plan9, which tries to extend this to networks and file systems.)
Explain to me why Microsoft would want to prevent you from sending your self-generated Word documents to another computer? What possible sense does this make? Is it because they hate their customers and want to piss them off so they won't use Microsoft products any more? Has RedHat paid Microsoft to include technology that will piss off all Windows users?
Ha ha, very funny. Of course you will be able to send a Word document to another computer. It will still be an unreadable Word document. If they can obfuscate things so that the destination computer also has to be running Windows, all the better. You seem to be under the weird delusion that "other computer" meant "other computer running Windows" when in fact I'm sure every other poster here knew it meant the exact opposite, ie "other computer not controlled by MicroSoft".
Re:Yay Evil Monopoly Of Doom! (Score:5, Informative)
I have a Masters in Computer Science with a focus on databases and storage technology and very little of what you said makes any sense to me. There's nothing easier than getting at data stored in SQL. Where I work, we've shipped a few products where we didn't document the schema because it was too complex and we didn't feel we could support it. Within weeks, almost all of our major customers had it reverse-engineered anyway. SQL is very easy to get at!
kernel level SQL data
There's no such thing. SQL data is stored in tables. You use queries to get at it. Period.
Also, your story doesn't make any sense. The article says Office 11 is in Beta already. IIRC, the SQL Server and Palladium stuff in the OS doesn't come until Longhorn. Do you think they will actually release a version of Office which won't work until their next OS (who knows when that will be) is released and adopted? How will they make money off all the people who recently upgraded to Windows XP then?
Re:Yay Evil Monopoly Of Doom! (Score:4, Insightful)
You'll be DMCA'd out of the loop for trying, and the format will validate itself with 'Palladium' features in software, or some such.
However, the mind reels at the idea of managing PowerPoint and Excel files from emacs!
Re:Yay Evil Monopoly Of Doom! (Score:5, Funny)
Dark-masked B.Gates approaching you:
"I find your lack of faith....disturbing."
Re:Yay Evil Monopoly Of Doom! (Score:5, Insightful)
Of course, this will never happen. Instead, MS will continue to push their own "open" XML based file formats. Microsoft Kerberos, anyone?
However... (Score:4, Insightful)
An eXaMpLe of MSWord XML... (Score:5, Funny)
<Data>
MSWORD$$g$%jk$%sxx"d$%^$
</Data>
Re:An eXaMpLe of MSWord XML... (Score:3, Funny)
MSWORD$IFERFHFO$FFKFJWJEFOJ$FJ$FJ*J#$J$RJ$R$J)JE*F J$Netscape engineers are weenies()$()#*$)*$U*$U*U$%*
Re:However... (Score:5, Funny)
What?!?!?!? You mean they'll try and pass something off as a a "security feature," when it's really intended to protect them?
Nah, that's not really their style.
Read the article? (Score:5, Informative)
So unless your mind has been slashdotted to the extent that you think that Microsoft is going to suddenly change the file-format completely between beta and release, then we know that it is perfectly easy to read.
And if you do believe they will change the format, then you are a moron.
Re:MOD PARENT UP (Score:3, Troll)
Using the DMCA they can then cripple any competitor who creates a word editor that uses their format.
They have already done that with other products and wrote it in their Halloween document. Where do you have any indication that they should change strategy now?
This is partly a way to be able to use buzzwords in their advertising and to try to cripple the open source competition with FUD, in the same way they tried with their shared source publicity stunt.
Comment removed (Score:5, Interesting)
Re:MOD PARENT UP (Score:3, Interesting)
Did you ever think that maybe all the things MS has done in the last 24+ months have at their root the exact same motivations as everything MS has done in the past 24+ years? MS has a long and well documented history of showing "increasingly high level of support for interoperability" while at the same time subverting those same open standards so that they will only work with MS Operating Systems. Kerberos? SMB? ASCII text files?! The list goes on...
Did it ever occur to you that, despite what your stockbroker keeps telling you, past performance just might be indicative of future performance?
What part of "Embrace, Extend, Exterminate" do you not understand?
Re:MOD PARENT UP (Score:3, Interesting)
The folks at Microsoft haven't concluded that the Halloween documents were garbage, they are simply under increased pressure from their customers to provide features that are actually useful. Microsoft's biggest problem isn't Linux or StarOffice or any other non-Microsoft product. Microsoft's biggest problem is that people are increasingly happy with the Microsoft software they already own. There are a lot of companies that are perfectly content to keep on using MS Office 2000, and these guys hurt Microsoft's business model just as much as the Linux converts do. So Microsoft has to do something to entice these users to the new versions.
What Microsoft would like to do is simply switch formats like they did between Office 95 and Office 97. That would force everyone to become current. However, that move was viewed very negatively by most of Microsoft's larger customers. A new XML format, that won't be readable by older clients, as a secondary format is as close as Microsoft is likely to get. Throw in the fact that for the first time ever businesses will be able to use the information in these common formats easily and you have an idea that might tempt even some of the more stalwart holdouts that an upgrade is in order.
And Microsoft has to play fair in this case too. The fact of the matter is that StarOffice has XML formats now. If Microsoft gets too heavy handed then corporations will simply jump ship.
In other words, you are absolutely right. The XML formats are going to be great. They have to be, otherwise people will simply continue to use the old format and Microsoft will fail in their attempts to get everyone to upgrade. What the new formats won't have are open schemas (or DTDs). Sun and Corel get to reverse engineer another document filter.
Re:MOD PARENT UP (Score:3, Insightful)
1. Implement an interpreter module using the BSD license. Allow it to export to another XML model. Let's call this output deMStify.xml
2. Use a GPL model to read deMStify.xml and do whatever the hell it wants
3. ??? Profit... hehe
JOhn
Re:MOD PARENT UP (Score:5, Insightful)
Now, whether they can license the format so as to make it illegal for other apps to use it, I don't know. However, I suspect this is not the case as it more or less removes the advantage to having invested in XML in the first place. Well, sure there's good publicity, but how long would that last when people immediately discover it is worthless?
And of course, the vast majority of people don't care about file formats. The only people to whom this news is of interest are those who will want to either access Office docs themselves, or use other apps (e.g. Open Office) to view Office docs. If this sector are banned from doing this, why did MS spend so much money on using XML in the first place?
Re:MOD PARENT UP (Score:4, Insightful)
You might similarly ask, "If MS didn't intend to comply with web standards, why did they spend so much on Internet Explorer"
Please tell me I don't have to spell the answer out for you.
I expect them to use legal means (Score:3, Funny)
What really is going to happen (Score:3, Interesting)
Some starry-eyed graduate student there is going to stay up all night for a few weeks and try to do it right, and may even be 3733t enough to try non-MicroSoft tools to read the XML to see if they really did it right. Probably the whole problem with the format will be that this person is inept. In fact I'm sure that amateur or inept programmers are far more responsible for all the standards breaking from MicroSoft than some evil plan by Bill Gates.
The problem is that this is not going to be the default save-as format. Most likely the ability to change to this format will be buried pretty deep, and once you do it will pop up error boxes that say "some features of your document may be lost". Again this probably won't really be an order from evil overlords to discourage XML. It will be the inept programmer, realizing that they can't figure out how to translate an obscure feature and thinking they better warn the poor user, and too stupid to figure out how to delay the warning until they detect if the document is using the untranslatable feature.
The result is that "Word" files will still be the same as they are now. If you don't believe me, MicroSoft long ago tried to standardize on RTF, with exactly the same fanfare and claims that this would solve the incompatibility problems. Nobody uses RTF now. And try sending an RTF saved by Word to one of the places that insists you send them a Word document. They will not take it.
Word also saves as HTML and plain text and can make a pdf, and despite claims here that they are ugly they are still parsable and adhere enough to standards that you can write code to read them. All of this is totally irrelevant, these are not "Word documents". And this new XML is not going to be a "word document" any more than those are.
Re:However... (Score:3, Insightful)
Additionally, they probably won't add any Office 12 features to the XML version unless they're being prodded hard at the time, so you'll lose any new features if you try to use the XML format (because, of course, they're being careful not to change the XML format...).
Incompatibilities Once Again (Score:3, Insightful)
.... I guess it's just MSXML rather than THE standard XML. But we can figure it out with some "intelligent guesswork" now because the file would be human-readable.
Re:Incompatibilities Once Again (Score:5, Insightful)
More significantly, there might be small incompatibilities, or ways that Word-created XML documents divert slightly from what is normal and proper in XML. Perhaps Word will make some (intentional) mistakes when reading back XML files generated in other applications, just like Word's old SGML module would choke on many proper SGML documents.
Make no mistake: the fact that almost everybody is using Office and the associated file formats makes it very hard for a new contender to enter the office suite market. Microsoft must be aware of the power they have over the market with their Office file formats. Think of it: when you exchange files with other businesses, you have two realistic choices of file formats: Office or plaintext. And now Microsoft is introducing compatibility with an open and well-defined markup language, in favour of their proprietary language? I'll believe it when I see it.
Re:Incompatibilities Once Again (Score:5, Insightful)
I think PDF is a viable (growing even) third option. Adobe is "evil" just like MS (remember Sklyarov)... regardless, PDF is nice and it works well, and the files are way smaller than Word docs.
wicked :) (Score:2)
I'm wondering, can MS charge for licences to write tools that parse the XML documents?
Re:wicked :) (Score:3, Insightful)
What will be the default save format? (Score:5, Insightful)
The most important question, besides if the MS Word XML format will be well-documented enough, is if it will be the default saving format. Most MS Office users simply don't care enough to save MS Word documents in RTF, for example, even if it's more than good enough for the vast majority of the documents.
Not the main issue of the article, but it is unfair to single out someone as the inventor of XML, which is just a streamlined version of SGML, which is itself an evolution of IBM's GML.
Re:What will be the default save format? (Score:5, Funny)
If you continue with that line of reasoning, someone's gonna demand that it be called SGML/XML.
Grr.
Re:What will be the default save format? (Score:5, Interesting)
I doubt it. (Score:3, Insightful)
I'm guessing their XML document format will be just as hard to decipher as the current office formats.
Re:I doubt it. (Score:5, Insightful)
Why not? After all, the high-quality ActiveState port of Perl to Win32 exists because Microsoft paid for it, and you can download it for free. Not only that, but if you want to write your own code to manipulate Office documents, you have been able to do that for years in VBA - all the Office programs expose rich APIs. In fact, they are composed of Objects that you can instantiate and use in your own programs if you want - all MS care about is that there is a licensed copy of Office on the user's machine. One of the easiest ways to do charting is to simply reuse a bit of Excel, for example. From there it's a short hop via COM to any program you want.
I'm guessing their XML document format will be just as hard to decipher as the current office formats.
The fact that Office documents have been in a proprietary format in the past is actually unimportant, since the interfaces to the applications (and hence their documents) are well documented (check MSDN or Barnes & Noble if you don't believe me). So the reason that Microsoft are doing this is that they lose nothing and gain from making the platform even more attractive to developers.
Re:I doubt it. (Score:5, Insightful)
So you can read Office documents with other programs as long as you have Office and MS dev tools?
You do see the folly in that, right?
-Kevin
Re:I doubt it. (Score:3, Funny)
Re:I doubt it. (Score:5, Interesting)
There are 2 problems with the current format of Microsoft Office file:
This is mostly solved (thanks to years of trials and errors).
This is definitely more difficult, as nobody knows Office internals and how they expect such additional data to be. The StarOffice guys managed to do an acceptable job, at the price of years of trial and error. It's like looking at a dump of your computer's memory, guessing what's code, what's data, what's padding and the meaning of every byte...
Now, does an XML format simplify things? Well, yes, just as an RTF text is easier to manage than a pure binary format, but nothing prevents putting extra cruft in an XML document. So instead of having to use a hex editor, you may now use a text editor, but giving a correct interpretation of tags and attributes is something that only Microsoft can do, unless it publishes the full specifications (present and future: after all, XML is eXtensible, right?)
Personally, I think that:
Re:I doubt it. (Score:3, Insightful)
Historical turningpoint? (Score:5, Interesting)
One small such point is when IBM gave out the specs to their hardware for PC allowing everyone to clone it, while Apple did not.
This could be such a point. Maybe in 10 years we'll look back at this and ask ourselves "Why the heck did MS XML-enable their Office app, releasing the hold that they had"
Only time will tell I guess.
Re:Historical turningpoint? (Score:4, Informative)
They didn't do it out of the goodness of their hearts, but they did indeed do it. It wasn't the complete BIOS though, so Compaq had two teams... one team looking at the specs, and another (that could never look) building a clean room implementation.
Codename? (Score:2, Informative)
Re:Codename? (Score:3, Funny)
See! They succeeded in mystifying you. Now while you are teetering in amazement they will move in and 'embrace and extend' you into the MS cloud.
Re:Codename? (Score:3, Funny)
*when* ? (Score:4, Funny)
I beg your pardon? Smelly programmers can keep their hands off my documents. If I wanted you to have them, I'd have emailed them to you as plaintext. I wasn't aware that the Office license meant my documents were common property....
The right time for MS (Score:5, Insightful)
Right now they are seeing diminishing sales and a possibly shrinking market share. Most of the Danish public sector is looking to save money using OpenOffice/StarOffice.
MS needs to increase their compatibility with other options, as they would otherwise force customers to convert every single user away from MS at once, instead of OpenOffice coming in slowly.
They can also hope, that their format is setting the standard, and the other companies will have to play catch-up rather than the other way around.
imagination (Score:5, Funny)
When will MS ever learn that we don't WANT to imagine how wonderful the MS Office Universe is?
WTF???? (Score:3, Informative)
WTF!? XML shouldn't need to be documented. The whole point is to create a human readable file that is parseable by computer. If MS Word delivers an XML file that I can't figure out, it's not XML.
Re:WTF???? (Score:4, Insightful)
Re:WTF???? (Score:5, Insightful)
With XML Schema and DTDs, you can validate various aspects of the data without writing a custom validator.
With XPath and XPointer you can refer to parts of an XML document without needing to understand what the document contains.
With XSL you can translate all or parts of the document from one format to the other without your application needing to know the structure, and without needing to understand more of the format than the parts you are extracting.
With SAX and the DOM you can programmatically traverse and extract information from an XML file without having to write a custom parser.
With CSS an editor or viewer for instance can use a standard mechanism of applying styles to elements without hardcoding the style attributes for elements anywhere.
With XML namespaces, you can intersperse data in various formats in the same file, and the components handling each of the vocabularies need not know anything about the other components - an example would be embedding SVG in HTML: The HTML renderer doesn't need to understand any of the SVG tags, only that it should delegate contents with other namespaces to another component. And the SVG renderer couldn't care less about the HTML.
And this doesn't even touch on the benefits of all the various interchange formats that have been specified on top of these base technologies.
The importance of XML is that it opens up the doors for building interchangeable components that operate on data without needing any hardcoded application specific knowledge of the data.
Most of the time, you still have to write some code to tie it all together, but you don't have to build your own parsers, your own document object model, your own styling system, your own way of handling contained data of other types, your own way of transforming data between formats, etc.
For me as a software developer XML delivered years ago. I use XML technologies daily, and it saves me work.
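To make the "no custom parser" point concrete, here is a minimal Python sketch using only the standard library's xml.etree.ElementTree, which offers a DOM-like tree plus a limited XPath subset. The document and its element names (document, section, para) are invented for illustration; they are an assumption and not Microsoft's actual schema or anything from the article.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: these element names are made up for the example
# and do not come from any real Office format.
SAMPLE = """<document>
  <title>Quarterly report</title>
  <section name="intro"><para>Sales were up.</para></section>
  <section name="detail"><para>Widgets: 40</para><para>Gadgets: 2</para></section>
</document>"""


def paragraphs_in(xml_text, section_name):
    """Return the text of every <para> directly inside the named <section>."""
    root = ET.fromstring(xml_text)
    # ElementTree understands a small XPath subset, including
    # attribute predicates such as [@name='detail'].
    path = f".//section[@name='{section_name}']/para"
    return [para.text for para in root.findall(path)]


if __name__ == "__main__":
    print(paragraphs_in(SAMPLE, "detail"))
```

The function only knows about the two element names it asks for. The rest of the document can change without breaking it, which is the kind of partial knowledge the comment describes.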
Re:WTF???? (Score:3, Insightful)
<?xml version="1.0"?>
<document>
jMyB38QAAMETWFjs7IQAAQEVkJBNq0jEAAW
RvbGWTYBAADARUaGlzRG9jdW1lbnQ8nhAAC
udGrTEAAC8BATwAAADMAv8AAgEABABIAAAA
</document>
is valid xml, just like a uuencoded file is valid ASCII and human readable.
But if other M$ products are any indication it won't be that bad. I parsed some Visio stuff and the data was more or less readable. The drawing data (or previews, didn't care) was still encoded though. I expect it to go a little like M$ html did.
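A small Python sketch of the worst case described here, assuming a hypothetical wrapper element called document: a standard parser accepts the file without complaint, yet all it hands back is one opaque base64 string. The payload bytes below are made up and are not a real Word file.

```python
import base64
import xml.etree.ElementTree as ET


def wrap_blob(payload: bytes) -> str:
    """Wrap arbitrary bytes in a single, hypothetical <document> element."""
    encoded = base64.b64encode(payload).decode("ascii")
    return f'<?xml version="1.0"?>\n<document>{encoded}</document>'


def inspect(xml_text: str):
    """Return (root tag, number of child elements, raw text content)."""
    root = ET.fromstring(xml_text)
    return root.tag, len(list(root)), root.text


if __name__ == "__main__":
    xml_text = wrap_blob(b"\xd0\xcf\x11\xe0 pretend binary document")
    tag, child_count, text = inspect(xml_text)
    # The parser is satisfied, but there is no structure to work with:
    # one tag, zero children, and a string that still needs the old
    # binary format's decoder once the base64 layer is removed.
    print(tag, child_count, text)
```

Well-formedness is purely a syntax check, so passing it says nothing about whether the content is usable.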
Syntax vs. Semantics (Score:5, Insightful)
Re:WTF???? (Score:3, Insightful)
Uhm, it is also the point of source files in the programming language of your choice, I'd say... and still, you need good comments.
XML is like Lisp, but with sharp parenthesis.
Re:WTF???? (Score:4, Insightful)
I would say that XML isn't a markup language - a markup language would allow the "bad nesting", since a markup language should be "layers of virtual highlighter pen" applied to an underlying data stream. XML, since it requires "proper nesting", is just Lisp sexps reimplemented, but with terrible syntax. It's yet-another-tree-structured-data-format. Big Wow. A true markup language environment would facilitate part-structured data, like HTML used to be, rather than shoehorning everything into trees.
Lisp sexps would just say (stuff (things "text"))
In fact, that's pretty much all there is to lisp syntax right there. The above is (a) a potentially valid lisp program and (b) a valid lisp data structure.
XML is a data format designed mainly to allow C and Java programmers to use vaguely Lisp-like processing techniques without realising it and/or admitting it to themselves.
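The tree equivalence is easy to demonstrate. The sketch below, in Python with the standard library, prints an XML tree as an s-expression. The input `<stuff><things>text</things></stuff>` is an assumption about what the sexp above corresponds to, and attributes are ignored to keep the example short.

```python
import xml.etree.ElementTree as ET


def to_sexp(elem):
    """Render an element as (tag "text" (child ...) ...), ignoring attributes."""
    parts = [elem.tag]
    if elem.text and elem.text.strip():
        parts.append('"' + elem.text.strip() + '"')
    for child in elem:
        parts.append(to_sexp(child))
        # Text that follows a child element (its "tail") belongs to the parent.
        if child.tail and child.tail.strip():
            parts.append('"' + child.tail.strip() + '"')
    return "(" + " ".join(parts) + ")"


if __name__ == "__main__":
    tree = ET.fromstring("<stuff><things>text</things></stuff>")
    print(to_sexp(tree))  # prints: (stuff (things "text"))
```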
What is the format? (Score:2)
XML takes away Microsoft's main advantage (Score:5, Interesting)
However, if Microsoft Office documents become "built around an open, internationalized standard", i.e. XML, would this not enable the people behind OpenOffice, StarOffice etc. to achieve total 100% file compatibility and thus negate Microsoft's largest advantage with Office?
Of course, this could be yet another Microsoft "embrace and extend" tactic, à la Kerberos. Incorporate the standard in a bastardised form, claim standards compatibility, then pollute it so you must be using Microsoft technology to properly interact with it.
No, it doesn't (Score:3, Interesting)
XML helps only if the creator of the document wants the information to be easily accessible by programs other than their own.
HTML from Word (Score:5, Interesting)
Re:HTML from Word (Score:5, Insightful)
An excellent point sir. That's a great illustration of how Microsoft approaches 'open' file formats.
People that think that MS Office is going to move to open, well documented file formats are just plain nuts. But look at many of the comments in this forum - it seems MS has even managed to persuade many Slashdotters that they are going to use open formats. Poor fools.
Re:HTML from Word (Score:5, Informative)
Anyway, there was tons of gibberish in the file, but it displayed fine in IE6. It was a completely blank page in Mozilla! Nothing at all! We always knew the XP didn't stand for cross-platform, but I didn't know it was this bad.
Typical XML-proponent mistake (Score:5, Insightful)
For "any programmer with a Perl script and a bit of intelligence" it doesn't make a difference if you read bytes (binary) or XML structures.
As long as you don't get a DTD with extensive comments on how to interpret the elements, along with some promise/guarantee that the DTD won't change every minor release, there is no real improvement at all.
The fact that XML is human readable is irrelevant, since no human shall read the files, but programs such as perl scripts shall. For them it makes hardly any difference; it is only marginally easier since you can use an existent XML parser instead of rolling your own (which is no big deal using the right tools such as YACC).
This 'openness' comes at a good time for Microsoft. They suggest openness at a time when they are criticized and attacked because of file-format lock-in. Many 'advisors' will be misled, blinded by buzzwords such as XML as they are, and actually believe that this solves the issue.
Re:Typical XML-proponent mistake (Score:5, Interesting)
As long as you don't get a DTD with extensive comments on how to interpret the elements, along with some promise/guarantee that the DTD won't change every minor release, there is no real improvement at all.
Have you ever tried to reverse engineer a binary file format? And have you ever tried to do the same thing with an XML file format? I learned huge chunks of SVG yesterday _without_ opening an SVG book, just by mucking around in an existing SVG file and with an SVG viewer. Of course, Microsoft could do something clearly in violation of the spirit of XML, by making the whole thing one tag full of base64ed text or something. But as long as they use tags in a semi-sane way (which is the whole point, for integration with corporate systems), XML will be a big step forward.
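The "mucking around" step can be partly automated. As a rough sketch in Python, assuming nothing beyond the standard library, the function below surveys an unfamiliar XML file by counting element names and collecting the attributes seen on each. The tiny SVG sample is hand-written for the example.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hand-written sample, standing in for an undocumented file.
SVG = """<svg xmlns="http://www.w3.org/2000/svg" width="100" height="100">
  <rect x="10" y="10" width="30" height="30"/>
  <circle cx="50" cy="50" r="20"/>
  <circle cx="70" cy="70" r="5"/>
</svg>"""


def survey(xml_text):
    """Return (tag counts, attribute names seen per tag) for a document."""
    tags = Counter()
    attrs = {}
    for elem in ET.fromstring(xml_text).iter():
        # ElementTree reports namespaced tags as "{uri}name"; keep the name.
        name = elem.tag.split("}")[-1]
        tags[name] += 1
        attrs.setdefault(name, set()).update(elem.attrib)
    return tags, attrs


if __name__ == "__main__":
    tags, attrs = survey(SVG)
    for name, count in tags.most_common():
        print(name, count, sorted(attrs[name]))
```

A listing like this gives the vocabulary of a format, not its meaning, so it is a starting point for guesswork and no substitute for a documented schema.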
Re:Typical XML-proponent mistake (Score:3, Insightful)
XML may be easier to reverse engineer, but need not be; this depends on how complex the DTD/Schema is and whether the designer intended it to be easily understandable or not. Apart from that, as a purist I don't like reverse engineering, especially not if the subject of reverse engineering is from an uncooperative company known for its dirty tricks.
A non-XML grammar/syntax, if accompanied by a decent and documented EBNF description of its grammar, is much better to base your program on than an undocumented XML.
Stalling tactics?... (Score:5, Insightful)
MS now has a serious competitor in StarOffice/OpenOffice.org. And that competitor has two compelling advantages - it's cheaper/free, and open XML file formats. So when clued-up IT people say to their Pointy-Haired Bosses that they should use StarOffice/OpenOffice.org, PHBs can respond "but MS is doing that next year. We can avoid all the disruption of changing office suites just by waiting a bit and upgrading to the next version of MS Office. Besides, we're already paying for it." Then when MS actually releases Office 11, they will have used all sorts of devious and subtle devices to keep their lock-in of the file format, and MS and PHBs will be happy.
Well Excel in Perl is pretty easy now (Score:5, Informative)
Or you could go the whole hog and use a SAX writer like XML::SAXDriver::Excel [cpan.org] to create the documents from XML yourself.
(This is not to say I don't think XML native formats are cool and will have many uses, I'm just pointing out what you can do now.)
Re:Well Excel in Perl is pretty easy now (Score:3, Informative)
What I heard.... (Score:3, Interesting)
I think maybe it was the CEO of Microsoft Denmark. I'm NOT sure though
The new Word XML document format: (Score:5, Funny)
<uueWord2kDocument>
M"@D)("!'3E
M("`@(%9E7)I9VAT("A#*2`Q.3DQ
M($9R9
M92!V97)B87
</uueWord2kDocument>
C'mon People (Score:3, Insightful)
If you think there is even a remote chance in he-double L that MS will loosen their grip on this revenue stream, I have a bridge to sell you.
You can call this flamebait if you want, but what in MS's history would lead me to believe they are suddenly going to change their historic behavior pattern AND risk a huge amount of revenue at the same time?
I can see it now... (Score:3, Funny)
Re:I can see it now... (Score:4, Redundant)
<document type="word">
<ole><![CDATA[ (linenoise) ]]></ole>
</document>
I.e OLE blobs embedded in an XML container
They already did this for two other products (Score:5, Interesting)
ASP.net uses XML for all the human-readable files, and the IIS in windows.net server finally uses Apache-style configuration files which are also XML.
Yeah, right (Score:3, Insightful)
M$ only cooks with water too. (Score:3, Insightful)
Looking at Frontpain and Word HTML and extrapolating XML from that tells me they're gonna do just as crappy a job as usual and really think they've done a great thing.
Just like the people sending me source code additions and DB content as Word files. Nothing but simple ineptitude, I say.
Not that my System of choice, Linux, is that much more consistent. Mind you. With a bazillion Font methods, every single one of them looking crappier than the next and QT, GTK+, Motif, Lesstif, Inbetweentif, Swing, TK and whatnot and none of them following the same Clipboard behaviour it's just as weedy. Only it is under *my* control to change it.
That way, the bottom line is: With OSS if it doesn't work, there's another way. With M$ it's 'Game Over' with the first "Error in module [fill in random hexcode here]".
That's the simple difference.
what a cool codename (Score:4, Funny)
awesome. Apparently the next version of the linux kernel is code named 2.6! Wow!
How to convert Word to XML (Score:5, Informative)
So far the best tool I found is upCast (free for personal use) from http://www.infinity-loop.de/
To convert a Word file:
* Use Word's AutoFormat feature to convert visual formatting to Word styles
* Redefine all the text as Word styles
* Run upCast to convert to XML using the "XML (content, no DTD)" filter
* Run HTML Tidy from http://tidy.sourceforge.net/ with the parameters -xml -utf8 -clean -bare .
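The last step above is easy to script. Here is a minimal Python sketch that wraps the Tidy call, using exactly the flags listed in the post. It assumes a `tidy` binary on the PATH; the helper names (`tidy_command`, `run_tidy`) and the file name `chapter1.xml` are made up for illustration. The upCast step is left out because its command-line interface isn't described here.

```python
import subprocess

# Flags exactly as given in the post above.
TIDY_FLAGS = ["-xml", "-utf8", "-clean", "-bare"]


def tidy_command(xml_path, tidy="tidy"):
    """Build the argument list for one Tidy run (hypothetical helper)."""
    return [tidy, *TIDY_FLAGS, xml_path]


def run_tidy(xml_path):
    """Run Tidy on xml_path and return the cleaned XML from stdout.

    Assumes `tidy` is installed. Tidy exits with 1 for warnings and
    2 for errors, so only 2 and above is treated as a failure.
    """
    result = subprocess.run(
        tidy_command(xml_path),
        capture_output=True,
        encoding="utf-8",
    )
    if result.returncode >= 2:
        raise RuntimeError(result.stderr)
    return result.stdout


# Show the command that would be run for a hypothetical file.
print(" ".join(tidy_command("chapter1.xml")))
```

Building the argument list separately from running it makes it easy to loop over a directory of upCast output files, or to check the command without having Tidy installed.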
Other tools that might be worth a second look:
* Majix (Open Source) - http://www.tetrasix.com/
* WorX SE - http://www.xyvision.com/
* XML MarkupKit (in German) - http://www.eds.schema.de/download/MarkupKit/
* DocSoft LLC Word-to-XML - http://www.docsoft.com/w2xml.htm
Hype! Hype! Hype! (Score:5, Interesting)
The thread a couple of weeks ago about the death of META headers will apply 1000 times worse for semantic tags-- if the semantic web is going to work at all it needs to start from headers describing the webpage as a whole.
(Also, what's with XML-Journal's claim the article has three pages when it only has two?)
Bigger picture (Score:3, Insightful)
In the past, MS Office was the cash cow at Microsoft, but the market for office packages is rather saturated... companies and governments are looking for cheaper alternatives etc. Not much room to grow. Now they can afford to play the good guys by opening up their file formats, since they've got new markets to capture... mobile phones, handheld computers, home entertainment etc.
What we need is a ISO standard (Score:5, Interesting)
Then governments and corporations will adopt it for official documents so they can read their own documents in ten years.
Re:What we need is a ISO standard (Score:3, Interesting)
This may interest you:
http://www.1dok.org/eng/index.html
Re:What we need is a ISO standard (Score:5, Insightful)
and was a complete failure.
So? Formats come and go all the time. Just because the ISO failed in the early nineties doesn't mean someone else would fail today.
Nice, but redundant statement (Score:4, Funny)
any programmer with a Perl script and a bit of intelligence
and I thought intelligence was a prerequisite to be able to handle perl ? :)
There is some documentation of Office XML already. (Score:5, Informative)
It is simply not what others are claiming: <?xml version="1.0"?><data>blahblah</data>
What are you all complaining about? (Score:5, Insightful)
They've already shown with
So perhaps instead of perpetually slating Microsoft, you could get off your arse and do something useful instead.
Nick...
Re:What are you all complaining about? (Score:3, Informative)
They've already shown with
That's not true. Only C# has been submitted to ECMA. VB and JScript.NET have NOT.
The CLI submissions are only a small subset of the
C# and the CLI do NOT make up a platform like Java. It's more like C. Both C# and C provide a basic set of classes. Anything more 'advanced' is provided through extension libraries that may or may not be cross platform (just like C). You could write a sound library for C# that uses DirectX and it would only work on Windows. On the other hand, you could write a sound library for C# that uses OpenAL. It would work on all platforms where OpenAL is supported.
Many features that Java has such as GUIs, Telephony, Speech, Sound, 3D etc aren't supported by
The cross platform hopes for C# pretty much lie in OSS hands. It is up to the OSS community to write 'standard' cross platform libraries for C# (just like we have for C). C# interfaces nicely with C so it is likely that many cross platform libraries for C# will use the corresponding C libraries.
As you can see, the CLI is much more like C+GLIB than the "Java Platform".
Java is a meta-operating system. It provides a huge set of APIs consistently on all platforms.
C#/CLI does not always provide a consistent API on all platforms, but it allows and encourages you to rely on and exploit the native APIs available on the underlying operating system.
Which is better? It really depends on what you want. Java is obviously the only choice for cross platform development (atm). C# however appears to be a good replacement for C -- especially on the client side. It complements the underlying operating system whereas Java tends to hide it. That's why you will see a lot of C#/GTK# applications for Gnome in the future but not many Java/GTK applications.
the most wonderful thing... but it's not happening (Score:3, Interesting)
Unfortunately, Microsoft won't let it happen. The data may be "in XML", but that doesn't mean you can read it or generate it well. Instead, Microsoft will give you just enough to serve their business interests and nobody else's.
How? Office will probably stick undocumented base64 encoded binary stuff into the output, containing formatting information. You can use the document content, for example, with a database, but you can't load it into another word processor and preserve all the formatting. And in the other direction, sure, you can generate simple documents that Office will import, but you can't generate arbitrary Word documents--they will, again, have weird, undocumented tags and binary stuff.
In short: don't hold your breath. Microsoft isn't stupid.
OpenOffice is XML, now! (Score:5, Informative)
Now I wrote, just for demonstration, the following XSLT example in just a few minutes, usable directly with xsltproc in Linux.
The example prints all the Heading paragraphs in an OO Writer document, indented according to the header level.
<?xml version='1.0'?>
<xsl:stylesheet
xmlns:xsl="http
xmlns:office="h
xmlns:style="ht
xmlns:text="http:
xmlns:table="http://o
xmlns:draw="http://open
xmlns:fo="http://www.w3.
xmlns:xlink="http://www.w3.o
xmlns:number="http://openoffice.or
xmlns:svg="http://www.w3.org/2000/svg"
xmlns:
xmlns:dr
xmlns:math="
xmlns:form="h
xmlns:script="htt
version='1.0'>
<xsl:output method="text" encoding="ISO-8859-1"/>
<!-- Print all headings, indented. -->
<xsl:template match="text:h">
<xsl:value-of select="substring('                    ', 1, (@text:level - 1) * 2)"/>
<xsl:text>* </xsl:text>
<xsl:value-of select="text()"/>
<xsl:text>
</xsl:text>
</xsl:template>
<!-- Don't output any other text. -->
<xsl:template match="text()">
</xsl:template>
</xsl:stylesheet>
The result would be something like:
* Top-level heading such as a chapter
  * Second-level heading (section)
  * Another section
    * Subsection
      * Subsubsection
  * Yet another section
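For anyone without xsltproc handy, roughly the same outline can be produced with a few lines of Python. This is only a sketch under stated assumptions: the `text` namespace URI below is the one I believe OpenOffice.org 1.0 uses, so check it against the `xmlns:text` declaration in your own content.xml. The `outline` function and the `SAMPLE` document are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Assumption: OpenOffice.org 1.0 text namespace. Verify against the
# xmlns:text declaration in your own content.xml.
TEXT_NS = "http://openoffice.org/2000/text"


def outline(content_xml):
    """Return one '* heading' line per text:h element, indented two
    spaces per heading level below the first."""
    root = ET.fromstring(content_xml)
    lines = []
    for heading in root.iter("{%s}h" % TEXT_NS):
        level = int(heading.get("{%s}level" % TEXT_NS, "1"))
        # itertext() also collects text inside nested spans.
        title = "".join(heading.itertext()).strip()
        lines.append("  " * (level - 1) + "* " + title)
    return lines


# Tiny made-up document standing in for a real content.xml.
SAMPLE = """<doc xmlns:text="http://openoffice.org/2000/text">
  <text:h text:level="1">Chapter</text:h>
  <text:p>Body text is skipped.</text:p>
  <text:h text:level="2">Section</text:h>
  <text:h text:level="3">Subsection</text:h>
</doc>"""

print("\n".join(outline(SAMPLE)))
```

For a real Writer document the XML would come out of the zipped .sxw file, e.g. `zipfile.ZipFile(path).read("content.xml")`. Unlike the stylesheet's `text()` selection, `itertext()` also picks up heading text wrapped in nested elements.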
Tim here with a bit more background (Score:5, Informative)
The word "foo" in bold single-underline looks something like
<r>
<rf>
<rp class="bold"/>
<rp class="underline" lines="1"/>
</rf>
foo</r>
Yeah, it's pretty verbose.
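To give a feel for the "programmer with a script" angle, here is a rough Python sketch that pulls the text and formatting properties out of a run like the one above. It assumes the element names are as sketched in this post (`r`, `rf`, `rp`, which is only "something like" the real format) and that the two `rp` elements are empty, self-closing tags. The `run_formatting` helper is made up for illustration.

```python
import xml.etree.ElementTree as ET

# The run from the post, with the rp elements written as empty tags
# (an assumption; the real format may differ).
SAMPLE = """<r>
<rf>
<rp class="bold"/>
<rp class="underline" lines="1"/>
</rf>
foo</r>"""


def run_formatting(run_xml):
    """Return (text, properties) for a single <r> run.

    properties maps each rp class name to its remaining attributes.
    """
    run = ET.fromstring(run_xml)
    props = {}
    for rp in run.findall("./rf/rp"):
        attrs = dict(rp.attrib)
        name = attrs.pop("class")
        props[name] = attrs
    # The run's text comes after </rf>, so ElementTree stores it as
    # the tail of the rf element rather than as the text of r.
    rf = run.find("rf")
    text = (rf.tail if rf is not None else run.text) or ""
    return text.strip(), props


text, props = run_formatting(SAMPLE)
print(text, props)
```

The one non-obvious bit is that the text sits after the formatting block, so it comes back as the `tail` of `rf` rather than the `text` of `r`.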
Near as I can tell, it is 100% round-trip-able, i.e. you save as that file format, you read it in again, you hit ctl-S and it saves again; about as good as a native format. Now someone needs to write some script-ware to run Word in batch mode to xml-ify server directories with zillions of office docs.
I think the reason MS is doing this is obvious. Look at their financials - they *really* need people to upgrade to the new version of Office. End-users don't buy Office any more, CIOs and the like do. These people are just not gonna be impressed by another new word-processing feature, but they might be motivated to upgrade if they thought that they were opening up all their data to re-use by other programs.
I expect that with any luck we'll get a secondary industry built around doing cool unexpected stuff to Office docs. Don't want to sound over-excited here, but a huge amount of all the intellectual capital in the world is sitting around in Office docs, and this makes it noticeably more re-usable. Has to be a good thing.
Cheers, Tim
Re:Tim here with a bit more background (Score:3, Informative)
Uhh.. from this article [microsoft.com].
Information Worker turned in healthy revenue growth of 26 percent, reflecting customer adoption of Microsoft Office XP through multi-year licensing programs. Customers acquiring Office this quarter included ChevronTexaco, Lockheed Martin, MetLife, Newell Company (Rubbermaid) and the US Department of the Army, Program Executive Office, Aviation.
and
Microsoft Corp. today announced revenue of $7.75 billion for the quarter ended Sept. 30, 2002, a 26 percent increase over revenue of $6.13 billion for the same quarter last year. Operating income for the first quarter was $4.05 billion, compared to $2.90 billion in the same period last year. Net income and diluted earnings per share for the first quarter of fiscal year 2003 were $2.73 billion and $0.50, which included an after-tax charge for investment impairments of $291 million or $0.05. For the same period of the previous year, net income and diluted earnings per share were $1.28 billion and $0.23, which included an after-tax charge for investment impairments of $1.22 billion.
"Results for the first quarter were exceptionally strong, exceeding our expectations. During the quarter, we saw broader customer adoption of our licensing programs than we anticipated, as customers recognized the value of entering into long-term licensing agreements for our products. This strength in licensing led to solid growth for Windows® XP, Office XP and
Structure vs Presentation? (Score:3, Insightful)
MS Office saving its data in XML format is a great start.
But will this really be enough?
Previous complaints about how versions of Office didn't disclose the format were often referred to a specification that Microsoft made available to describe what was in a Word document.
The key problem, IIRC, was that the description was not sufficient for one to predict how the Word document was actually formatted and rendered on the page.
Because XML is very much like SGML or TeX, it has the potential for much more exhaustively describing document structure. But whether the new Word XML format (or OpenOffice format, for that matter) contains sufficient information for developers to reproduce the "right" format is a different issue.
I hope I'm wrong and that the format is specified comparably to the level you'd find in say PostScript or PDF.
Maybe MS is willing to let rendered Office documents change, just as HTML rendered documents change whenever one resizes the browser window.
But I doubt it.
Government Contracts Might be The Reason (Score:4, Interesting)
This, tied to the fact that US sales are going to slow down (or already have) due to the complete inundation of PCs, means they need new markets, and unless they use an open format they won't be able to get them. I'd be panicked: Linux and Java are eroding their server market. Governments are eroding their Office market. The only way they can grow is to add value.
Genuine XML? (Score:4, Interesting)
Why is MS doing this...featuresets (Score:3, Insightful)
Which then by virtue of market share becomes standard. It is actually in their best interest to publish it clearly. Then the other potential competitors will feel strong pressure to fit their software to match MS and have no real excuse why they can't. If MS waited there would be some other standard emerging and MS would be pressured by customers to adopt it. Then it would be MS having to shoehorn its document logic into some other form and not the other way around.
While other potential competitors are playing catch-up with making their documents fit into the MS schema MS can be busy thinking about the next thing to do.
So frankly I expect the word document xml (and excel and the rest) to actually be quite clear and documented but very aligned to how MS Word sees a document, which will likely impress others as obtuse.
Re:Too good to be true (Score:5, Insightful)
Because it doesn't matter if everyone is able to read, modify and generate Office-compatible files. People will use Office products in future. Opening the file formats doesn't change anything.
XML makes it easy to create programs that will depend on MS Office. So this only makes it easier to create programs which depend on Microsoft products.
Re:Too good to be true (Score:5, Insightful)
Once I can move my team of 20 people to open office with no real worries or complaints about 'interchanging' files with lusers still using Microsoft, I will.
BUT, have you ever looked at an HTML file generated by Microsoft word? It is a GREAT example of how they can pollute a standard into something unreadable.
I suspect that they will copyright or otherwise lock up their DTD/Schema, and try to lash out at anyone that uses them in other than 'approved' ways.
Re:Too good to be true (Score:5, Interesting)
Re:Whats wrong with html/css2 ? (Score:3, Informative)
I don't think the new XML format is meant for documents you wish to publish on the web. Office has already supported the HTML format pretty well (with some extensions.. ahem) since Office 2000. HTML support works even better in Office XP since it allows you to save the document as "filtered HTML", where Office filters most of the Office-specific tags and attributes at the cost of losing some information in the document.
I think the XML format is being added since XML represents the document with a much more meaningful structure that's easier to parse by third party software for use in electronic commerce and other automated systems, something that's inappropriate to use HTML code for, as it was designed to make pretty layouts, not to describe content for easy parsing.
I think it's pretty obvious why MS would want to add XML support - to spread their Office document format and make Office useful in places such as web services where it wouldn't be as useful before.