Tim Bray on Microsoft Office
jgeelan writes "The co-inventor of XML, Tim Bray, has been talking about the newly XML-enabled version of Microsoft Office, code-named 'Office 11' and tells XML-Journal that 'when the huge universe of MS Office documents becomes available for processing by any programmer with a Perl script and a bit of intelligence, all sorts of wonderful new things can be invented that you and I can't imagine.'"
Codename? (Score:2, Informative)
WTF???? (Score:3, Informative)
WTF!? XML shouldn't need to be documented. The whole point is to create a human-readable file that is parseable by computer. If MS Word delivers an XML file that I can't figure out, it's not XML.
Well Excel in Perl is pretty easy now (Score:5, Informative)
Or you could go the whole hog and use a SAX writer like XML::SAXDriver::Excel [cpan.org] to create the documents from XML yourself.
(This is not to say I don't think XML native formats are cool and will have many uses; I'm just pointing out what you can do now.)
Re:Historical turningpoint? (Score:4, Informative)
They didn't do it out of the goodness of their hearts, but they did indeed do it. It wasn't the complete BIOS, though, so Compaq had two teams: one looking at the specs, and another (that could never look) building a clean-room implementation.
How to convert Word to XML (Score:5, Informative)
So far the best tool I have found is upCast (free for personal use) from http://www.infinity-loop.de/
To convert a Word file:
* Use Word's AutoFormat feature to convert visual formatting to Word styles
* Redefine all the text as Word styles
* Run upCast to convert to XML using the "XML (content, no DTD)" filter
* Run HTML Tidy from http://tidy.sourceforge.net/ with the parameters -xml -utf8 -clean -bare
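The last step above can be scripted. A minimal sketch in Python, assuming HTML Tidy is installed as `tidy` on the PATH and that upCast has already written its output to a file; the function names and the file name are hypothetical, and only the four flags come from the steps above:

```python
import subprocess

# Flags exactly as listed in the steps above.
TIDY_FLAGS = ["-xml", "-utf8", "-clean", "-bare"]

def build_tidy_command(xml_path, tidy_binary="tidy"):
    """Return the argument list for cleaning one upCast output file."""
    return [tidy_binary, *TIDY_FLAGS, xml_path]

def run_tidy(xml_path):
    """Run Tidy and return the cleaned XML it writes to standard output."""
    result = subprocess.run(
        build_tidy_command(xml_path),
        capture_output=True,
        text=True,
        encoding="utf-8",
    )
    return result.stdout

print(build_tidy_command("report.xml"))
```

Keeping the command construction separate from the call makes it easy to check what will be run before pointing it at a directory full of files.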
Other tools that might be worth a second look:
* Majix (Open Source) - http://www.tetrasix.com/
* WorX SE - http://www.xyvision.com/
* XML MarkupKit (in German) - http://www.eds.schema.de/download/MarkupKit/
* DocSoft LLC Word-to-XML - http://www.docsoft.com/w2xml.htm
Re:I doubt it. (Score:2, Informative)
Re:HTML from Word (Score:5, Informative)
Anyway, there were tons of gibberish in the file, but it displayed fine in IE6. It was a completely blank page in Mozilla! Nothing at all! We always knew the XP didn't stand for cross-platform, but I didn't know it was this bad.
Re:Whats wrong with html/css2 ? (Score:3, Informative)
I don't think the new XML format is meant for documents you wish to publish on the web. Office has supported the HTML format pretty well (with some extensions.. ahem) since Office 2000. HTML support works even better in Office XP, since it allows you to save the document as "filtered HTML", where Office filters out most of the Office-specific tags and attributes at the cost of losing some information in the document.
I think the XML format is being added because XML represents the document with a much more meaningful structure, one that's easier for third-party software to parse for use in electronic commerce and other automated systems. HTML is inappropriate for that, as it was designed to make pretty layouts, not to describe content for easy parsing.
I think it's pretty obvious why MS would want to add XML support - to spread their Office document format and make Office useful in places such as web services where it wouldn't be as useful before.
Read the article? (Score:5, Informative)
So unless your mind has been slashdotted to the extent that you think that Microsoft is going to suddenly change the file-format completely between beta and release, then we know that it is perfectly easy to read.
And if you do believe they will change the format, then you are a moron.
There is some documentation of Office XML already. (Score:5, Informative)
It is simply not what others are claiming: <?xml version="1.0"?><data>blahblah</data>
Re:What we need is a ISO standard (Score:2, Informative)
Re:What we need is a ISO standard (Score:2, Informative)
Re:Yay Evil Monopoly Of Doom! (Score:2, Informative)
Re:What are you all complaining about? (Score:3, Informative)
They've already shown with
That's not true. Only C# has been submitted to ECMA. VB and JScript.NET have NOT.
The CLI submissions are only a small subset of the
C# and the CLI do NOT make up a platform like Java. It's more like C. Both C# and C provide a basic set of classes. Anything more 'advanced' is provided through extension libraries that may or may not be cross platform (just like C). You could write a sound library for C# that uses DirectX and it would only work on Windows. On the other hand, you could write a sound library for C# that uses OpenAL. It would work on all platforms where OpenAL is supported.
Many features that Java has such as GUIs, Telephony, Speech, Sound, 3D etc aren't supported by
The cross platform hopes for C# pretty much lie in OSS hands. It is up to the OSS community to write 'standard' cross platform libraries for C# (just like we have for C). C# interfaces nicely with C so it is likely that many cross platform libraries for C# will use the corresponding C libraries.
As you can see, the CLI is much more like C+GLIB than the "Java Platform".
Java is a meta-operating system. It provides a huge set of APIs consistently on all platforms.
C#/CLI does not always provide a consistent API on all platforms, but it allows and encourages you to rely on and exploit the native APIs available on the underlying operating system.
Which is better? It really depends on what you want. Java is obviously the only choice for cross platform development (atm). C# however appears to be a good replacement for C -- especially on the client side. It complements the underlying operating system whereas Java tends to hide it. That's why you will see a lot of C#/GTK# applications for Gnome in the future but not many Java/GTK applications.
OpenOffice is XML, now! (Score:5, Informative)
Just for demonstration, I wrote the following XSLT example in a few minutes; it is usable directly with xsltproc in Linux.
The example prints all the Heading paragraphs in an OO Writer document, indented according to the heading level.
<?xml version='1.0'?>
<xsl:stylesheet
xmlns:xsl="http
xmlns:office="h
xmlns:style="ht
xmlns:text="http:
xmlns:table="http://o
xmlns:draw="http://open
xmlns:fo="http://www.w3.
xmlns:xlink="http://www.w3.o
xmlns:number="http://openoffice.or
xmlns:svg="http://www.w3.org/2000/svg"
xmlns:
xmlns:dr
xmlns:math="
xmlns:form="h
xmlns:script="htt
version='1.0'>
<xsl:output method="text" encoding="ISO-8859-1"/>
<!-- Print all headings, indented. -->
<xsl:template match="text:h">
<xsl:value-of select="substring('                    ', 1, (@text:level - 1) * 2)"/>
<xsl:text>* </xsl:text>
<xsl:value-of select="text()"/>
<xsl:text>
</xsl:text>
</xsl:template>
<!-- Don't output any other text. -->
<xsl:template match="text()">
</xsl:template>
</xsl:stylesheet>
The result would be something like:
* Top-level heading such as a chapter
  * Second-level heading (section)
  * Another section
    * Subsection
      * Subsubsection
  * Yet another section
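For comparison, roughly the same outline can be produced without XSLT. A minimal sketch in Python using only the standard library; the `text` namespace URI below is an assumption (the declaration in the stylesheet above is cut off), as are the sample document and the function name, and a real Writer file would first need its content.xml pulled out of the zip archive:

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for the "text" prefix; adjust it to match the
# xmlns:text declaration in your own content.xml.
TEXT_NS = "http://openoffice.org/2000/text"

def outline(xml_source):
    """Return one '* heading' line per text:h element, indented by level."""
    root = ET.fromstring(xml_source)
    lines = []
    for heading in root.iter(f"{{{TEXT_NS}}}h"):
        level = int(heading.get(f"{{{TEXT_NS}}}level", "1"))
        title = "".join(heading.itertext()).strip()
        lines.append("  " * (level - 1) + "* " + title)
    return lines

# Hypothetical stand-in for a Writer document, just enough to exercise the code.
SAMPLE = f"""<doc xmlns:text="{TEXT_NS}">
  <text:h text:level="1">Chapter</text:h>
  <text:p>Body text is skipped.</text:p>
  <text:h text:level="2">Section</text:h>
</doc>"""

print("\n".join(outline(SAMPLE)))
```

Unlike the stylesheet, which selects only the direct text nodes of each heading, this sketch also collects text nested inside child elements of the heading.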
Re:Well Excel in Perl is pretty easy now (Score:3, Informative)
Tim here with a bit more background (Score:5, Informative)
The word "foo" in bold single-underline looks something like
<r>
<rf>
<rp class="bold"/>
<rp class="underline" lines="1"/>
</rf>
foo</r>
Yeah, it's pretty verbose.
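To give a feel for the kind of script this enables, here is a minimal sketch in Python. It assumes the element names from the approximate snippet above (`r`, `rf`, `rp`), which are a paraphrase rather than the actual Office 11 schema, and the function name is hypothetical:

```python
import xml.etree.ElementTree as ET

# Well-formed version of the approximate snippet above; the real
# Office 11 markup may differ.
RUN = """<r>
<rf>
<rp class="bold"/>
<rp class="underline" lines="1"/>
</rf>
foo</r>"""

def describe_run(xml_source):
    """Return (text, formatting classes) for a single run element."""
    run = ET.fromstring(xml_source)
    classes = [rp.get("class") for rp in run.iter("rp")]
    text = "".join(run.itertext()).strip()
    return text, classes

print(describe_run(RUN))
```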
Near as I can tell, it is 100% round-trip-able, i.e. you save as that file format, you read it in again, you hit Ctrl-S and it saves again; about as good as a native format. Now someone needs to write some script-ware to run Word in batch mode to xml-ify server directories with zillions of office docs.
I think the reason MS is doing this is obvious. Look at their financials - they *really* need people to upgrade to the new version of Office. End-users don't buy Office any more, CIOs and the like do. These people are just not gonna be impressed by another new word-processing feature, but they might be motivated to upgrade if they thought that they were opening up all their data to re-use by other programs.
I expect that with any luck we'll get a secondary industry built around doing cool unexpected stuff to Office docs. Don't want to sound over-excited here, but a huge amount of all the intellectual capital in the world is sitting around in Office docs, and this makes it noticeably more re-usable. Has to be a good thing.
Cheers, Tim
Re:Historical turningpoint? (Score:1, Informative)
The BIOS was half the story, but IBM also held patents on "ISA", CGA, the disk interface, etc. Clone-makers just bought licences for these parts right from IBM (@ only about $5/PC).
If it wasn't for the "plug-compatible" anti-trust wars in the mainframe market, the PC would have never been cloned,
Re:Yay Evil Monopoly Of Doom! (Score:5, Informative)
I have a Masters in Computer Science with a focus on databases and storage technology and very little of what you said makes any sense to me. There's nothing easier than getting at data stored in SQL. Where I work, we've shipped a few products where we didn't document the schema because it was too complex and we didn't feel we could support it. Within weeks, almost all of our major customers had it reverse-engineered anyway. SQL is very easy to get at!
kernel level SQL data
There's no such thing. SQL data is stored in tables. You use queries to get at it. Period.
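As a small illustration of how little an undocumented schema hides, here is a sketch in Python using the built-in sqlite3 module as a stand-in (an assumption; the products discussed here would run on other database engines, which expose their catalogs differently). The table and function names are hypothetical:

```python
import sqlite3

def list_schema(conn):
    """Map each table name to its column names by querying the catalog."""
    schema = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # Each table_info row is (cid, name, type, notnull, dflt_value, pk).
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        schema[table] = [column[1] for column in columns]
    return schema

# Hypothetical table standing in for an undocumented product schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)")
print(list_schema(conn))
```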
Also, your story doesn't make any sense. The article says Office 11 is in Beta already. IIRC, the SQL Server and Palladium stuff in the OS doesn't come until Longhorn. Do you think they will actually release a version of Office which won't work until their next OS (who knows when that will be) is released and adopted? How will they make money off all the people who recently upgraded to Windows XP then?
Re:Incompatibilities Once Again (Score:2, Informative)
Re:Tim here with a bit more background (Score:3, Informative)
Uhh.. from this article [microsoft.com].
Information Worker turned in healthy revenue growth of 26 percent, reflecting customer adoption of Microsoft Office XP through multi-year licensing programs. Customers acquiring Office this quarter included ChevronTexaco, Lockheed Martin, MetLife, Newell Company (Rubbermaid) and the US Department of the Army, Program Executive Office, Aviation.
and
Microsoft Corp. today announced revenue of $7.75 billion for the quarter ended Sept. 30, 2002, a 26 percent increase over revenue of $6.13 billion for the same quarter last year. Operating income for the first quarter was $4.05 billion, compared to $2.90 billion in the same period last year. Net income and diluted earnings per share for the first quarter of fiscal year 2003 were $2.73 billion and $0.50, which included an after-tax charge for investment impairments of $291 million or $0.05. For the same period of the previous year, net income and diluted earnings per share were $1.28 billion and $0.23, which included an after-tax charge for investment impairments of $1.22 billion.
"Results for the first quarter were exceptionally strong, exceeding our expectations. During the quarter, we saw broader customer adoption of our licensing programs than we anticipated, as customers recognized the value of entering into long-term licensing agreements for our products. This strength in licensing led to solid growth for Windows® XP, Office XP and
Re:Yay Evil Monopoly Of Doom! (Score:1, Informative)