Tim Bray on Microsoft Office

jgeelan writes "The co-inventor of XML, Tim Bray, has been talking about the newly XML-enabled version of Microsoft Office, code-named 'Office 11' and tells XML-Journal that 'when the huge universe of MS Office documents becomes available for processing by any programmer with a Perl script and a bit of intelligence, all sorts of wonderful new things can be invented that you and I can't imagine.'"
  • Codename? (Score:2, Informative)

    by furchin ( 240685 ) on Thursday October 24, 2002 @05:01AM (#4520231)
    It is interesting to note that "Office 11" is referred to as a codename for the next version of Office. Really, it is simply the version number. Office XP is Office 10 (start any Office app, click on Help -> About and look for yourself). Calling it a codename only mystifies it.
  • WTF???? (Score:3, Informative)

    by jericho4.0 ( 565125 ) on Thursday October 24, 2002 @05:06AM (#4520251)
    From the article:
    The most important question, besides if the MS Word XML format will be well-documented enough,...

    WTF!? XML shouldn't need to be documented. The whole point is to create a human-readable file that is parsable by computer. If MS Word delivers an XML file that I can't figure out, it's not XML.

  • by twoshortplanks ( 124523 ) on Thursday October 24, 2002 @05:22AM (#4520309) Homepage
    I've used the Excel reading and writing modules for Perl with great success. They're easy to use and do the job. (There are also simpler [cpan.org] interfaces [cpan.org] if you want them.)

    Or you could go the whole hog and use a SAX writer like XML::SAXDriver::Excel [cpan.org] to create the documents from XML yourself.

    (This is not to say I think native XML formats aren't cool or won't have many uses; I'm just pointing out what you can do now.)

  • by BurritoWarrior ( 90481 ) on Thursday October 24, 2002 @05:55AM (#4520418)
    IBM released the framework in which to do so because of the governmental investigation they were under at the time.

    They didn't do it out of the goodness of their hearts, but they did indeed do it. It wasn't the complete BIOS, though, so Compaq had two teams: one team looking at the specs, and another (that could never look) building a clean-room implementation.
  • by Korth ( 50341 ) on Thursday October 24, 2002 @06:05AM (#4520433)
    I've recently been reviewing a dozen different programs for converting Word to XML.

    So far the best tool I've found is upCast (free for personal use) from http://www.infinity-loop.de/ .

    To convert a Word file:
    * Use Word's AutoFormat feature to convert visual formatting to Word styles
    * Redefine all the text as Word styles
    * Run upCast to convert to XML using the "XML (content, no DTD)" filter
    * Run HTML Tidy from http://tidy.sourceforge.net/ with the parameters -xml -utf8 -clean -bare .

    Other tools that might be worth a second look:
    * Majix (Open Source) - http://www.tetrasix.com/
    * WorX SE - http://www.xyvision.com/
    * XML MarkupKit (in German) - http://www.eds.schema.de/download/MarkupKit/
    * DocSoft LLC Word-to-XML - http://www.docsoft.com/w2xml.htm
  • Re:I doubt it. (Score:2, Informative)

    by jkramar ( 583118 ) on Thursday October 24, 2002 @06:14AM (#4520463)
    That's all fine and good if
    1. you don't mind having to buy Office just to modify Office files
    2. you're on Windows.

    Actually, the various APIs are probably there on Macs as well. However, if you're on Linux, then you're stuck. OpenOffice, Abiword, et al. do a reasonable job of reading Office files, but can't quite read everything perfectly, and the fact that Office documents are binary dumps instead of nice, legible XML doesn't help. This is, as I think many readers have realized, a significant advantage that an XML file format would lend. If they carry this out, then developers of other apps, such as competing office suites and member programs, whether free software or not, will have a much easier time reading and correctly interpreting these documents.
  • Re:HTML from Word (Score:5, Informative)

    by superyooser ( 100462 ) on Thursday October 24, 2002 @06:24AM (#4520481) Homepage Journal
    True. Just a couple days ago, I saved a doc as Web Page in Word (Office XP) hoping that some clipart would be saved in a web-friendly format. (This was originally made in Publisher, NOT by me.) It didn't work; it saved the images as .wmz! For the web?!

    Anyway, there was tons of gibberish in the file, but it displayed fine in IE6. It was a completely blank page in Mozilla! Nothing at all! We always knew the XP didn't stand for cross-platform, but I didn't know it was this bad.

  • by Jugalator ( 259273 ) on Thursday October 24, 2002 @06:34AM (#4520503) Journal
    What's wrong with HTML and CSS2 for all your word processing?

    I don't think the new XML format is meant for documents you wish to publish on the web. Office has supported the HTML format pretty well (with some extensions.. ahem) since Office 2000. HTML support works even better in Office XP since it allows you to save the document as "filtered HTML", where Office filters most of the Office-specific tags and attributes at the cost of losing some information in the document.

    I think the XML format is being added since XML represents the document with a much more meaningful structure that's easier for third-party software to parse, for use in electronic commerce and other automated systems. HTML is inappropriate for that, as it was designed to make pretty layouts, not to describe content for easy parsing.

    I think it's pretty obvious why MS would want to add XML support - to spread their Office document format and make Office useful in places such as web services where it wouldn't be as useful before.
  • Read the article? (Score:5, Informative)

    by gazbo ( 517111 ) on Thursday October 24, 2002 @06:44AM (#4520533)
    This bloke said he has had extensive access to the alphas and betas. He also said how great it was.

    So unless your mind has been slashdotted to the extent that you think that Microsoft is going to suddenly change the file-format completely between beta and release, then we know that it is perfectly easy to read.

    And if you do believe they will change the format, then you are a moron.

  • by frleong ( 241095 ) on Thursday October 24, 2002 @06:46AM (#4520535)
    Here at MSDN [microsoft.com]

    It is simply not what others are claiming: <?xml version="1.0"?><data>blahblah</data>

  • by AnEmbodiedMind ( 612071 ) on Thursday October 24, 2002 @06:52AM (#4520549)
    From OpenOffice [openoffice.org]:

    The OpenOffice.org XML project contains support for and implementation of the XML based file format.

    Mission
    Our mission is to create an open and ubiquitous XML-based file format for office documents and to provide an open reference implementation for this format.

    Core Requirements (these items are absolutely required)

    • The file format must be capable of being used as an office program's native file format. The format must be "non-lossy" and must support (at least) the full capability of a StarOffice/OpenOffice document. The format is likely to be used for document interchange but that use alone is not enough.
    • Structured content should make use of XML's structuring capabilities and be represented in terms of XML elements and attributes.
    • The file format must be fully documented and have no "secret" features.
    • OpenOffice must be the reference implementation for this file format.
    Core Goals (these items are highly desired)
    • The file format should be developed in such a way that it will be accepted by the community and can be placed under community control for future development and format evolution.
    • The file formats should be suitable for all office types: text processing, spreadsheet, presentation, drawing, charting, and math.
    • The file formats should reuse portions of each other as much as possible (so for example a spreadsheet table definition can work also as a text processing table definition).
    Standardization and Inter-Office Cooperation
    There is an office_standards mailing list hosted on this site, intended to foster cooperation between the various office suites. At this early stage no results have been achieved, but we are certainly excited about the prospects. For details, look at http://xml.openoffice.org/standardisation/ .
    It's on its way... maybe
  • by Anonymous Coward on Thursday October 24, 2002 @07:21AM (#4520628)
    This is already in hand. Sun are taking the OpenOffice XML file format to OASIS [oasis-open.org] for standardisation. Something should be announced about the formation of a working group on this real soon now.
  • by JohnFluxx ( 413620 ) on Thursday October 24, 2002 @07:30AM (#4520657)
    XML does support encryption of its data...
  • by TummyX ( 84871 ) on Thursday October 24, 2002 @09:57AM (#4521642)

    They've already shown with .NET that they can make an entire programming framework (and at least 3 associated languages) into an open standard and even have them ratified by the ECMA and maybe even ISO.


    That's not true. Only C# has been submitted to ECMA. VB and JScript.NET have NOT.

    The CLI submissions are only a small subset of the .NET framework. This is for a good reason, most of the .NET framework relies on Windows services (System.DirectoryServices, System.Windows.Forms, System.EnterpriseServices, ...).

    C# and the CLI do NOT make up a platform like Java. It's more like C. Both C# and C provide a basic set of classes. Anything more 'advanced' is provided through extension libraries that may or may not be cross platform (just like C). You could write a sound library for C# that uses DirectX and it would only work on Windows. On the other hand, you could write a sound library for C# that uses OpenAL. It would work on all platforms where OpenAL is supported.

    Many features that Java has such as GUIs, Telephony, Speech, Sound, 3D etc aren't supported by .NET and certainly won't be standardised. Sound support will be added by Microsoft in the future but it will use DirectX (obviously NOT cross platform).

    The cross platform hopes for C# pretty much lie in OSS hands. It is up to the OSS community to write 'standard' cross platform libraries for C# (just like we have for C). C# interfaces nicely with C so it is likely that many cross platform libraries for C# will use the corresponding C libraries.

    As you can see, the CLI is much more like C+GLIB than the "Java Platform".

    Java is a meta-operating system. It provides a huge set of APIs consistently on all platforms.

    C#/CLI does not always provide a consistent API on all platforms, but it allows and encourages you to rely on and exploit the native APIs available on the underlying operating system.

    Which is better? It really depends on what you want. Java is obviously the only choice for cross platform development (atm). C# however appears to be a good replacement for C -- especially on the client side. It complements the underlying operating system whereas Java tends to hide it. That's why you will see a lot of C#/GTK# applications for Gnome in the future but not many Java/GTK applications.
  • by magi ( 91730 ) on Thursday October 24, 2002 @09:59AM (#4521661) Homepage Journal
    Doing XML stuff with OpenOffice is supergreat. It took me half an hour to study the format enough to write an XSLT parser that extracts all strings from an OO document.

    Now I wrote, just for demonstration, the following XSLT example in just a few minutes, usable directly with xsltproc in Linux.

    The example prints all the Heading paragraphs in an OO Writer document, indented according to the header level.

    <?xml version='1.0'?>
    <xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:office="http://openoffice.org/2000/office"
    xmlns:style="http://openoffice.org/2000/style"
    xmlns:text="http://openoffice.org/2000/text"
    xmlns:table="http://openoffice.org/2000/table"
    xmlns:draw="http://openoffice.org/2000/drawing"
    xmlns:fo="http://www.w3.org/1999/XSL/Format"
    xmlns:xlink="http://www.w3.org/1999/xlink"
    xmlns:number="http://openoffice.org/2000/datastyle"
    xmlns:svg="http://www.w3.org/2000/svg"
    xmlns:chart="http://openoffice.org/2000/chart"
    xmlns:dr3d="http://openoffice.org/2000/dr3d"
    xmlns:math="http://www.w3.org/1998/Math/MathML"
    xmlns:form="http://openoffice.org/2000/form"
    xmlns:script="http://openoffice.org/2000/script"
    version='1.0'>

    <xsl:output method="text" encoding="ISO-8859-1"/>

    <!-- Print all headings, indented. -->
    <xsl:template match="text:h">
    <xsl:value-of select="substring('                    ', 1, (@text:level - 1) * 2)"/>
    <xsl:text>* </xsl:text>
    <xsl:value-of select="text()"/>
    <xsl:text>&#xa;</xsl:text>
    </xsl:template>

    <!-- Don't output any other text. -->
    <xsl:template match="text()">
    </xsl:template>
    </xsl:stylesheet>

    The result would be something like:

    * Top-level heading such as a chapter
      * Second-level heading (section)
      * Another section
        * Subsection
          * Subsubsection
      * Yet another section
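
    For anyone without xsltproc handy, the same outline can be sketched in Python with the standard library. This is only an illustration: the SAMPLE document is a hypothetical, heavily trimmed stand-in for a Writer content.xml, and the outline() helper is a made-up name. The only thing taken from the stylesheet above is the text namespace and the text:h / text:level names.

```python
import xml.etree.ElementTree as ET

# Namespace for text elements, as declared in the stylesheet above.
TEXT_NS = "http://openoffice.org/2000/text"

# Hypothetical, heavily trimmed stand-in for a Writer content.xml.
SAMPLE = f"""<office:document-content
    xmlns:office="http://openoffice.org/2000/office"
    xmlns:text="{TEXT_NS}">
  <office:body>
    <text:h text:level="1">Chapter</text:h>
    <text:p>Body text that should be ignored.</text:p>
    <text:h text:level="2">Section</text:h>
    <text:h text:level="3">Subsection</text:h>
  </office:body>
</office:document-content>"""


def outline(xml_source: str) -> list[str]:
    """Return one '* title' line per heading, indented two spaces per level."""
    root = ET.fromstring(xml_source)
    lines = []
    for heading in root.iter(f"{{{TEXT_NS}}}h"):
        # Namespaced attributes are keyed as "{namespace}name" in ElementTree.
        level = int(heading.get(f"{{{TEXT_NS}}}level", "1"))
        title = "".join(heading.itertext()).strip()
        lines.append("  " * (level - 1) + "* " + title)
    return lines


print("\n".join(outline(SAMPLE)))
```

    Unlike the stylesheet's select="text()", itertext() also picks up heading text that sits inside nested elements such as spans.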
  • by twoshortplanks ( 124523 ) on Thursday October 24, 2002 @10:05AM (#4521723) Homepage
    Why not just have Excel export the file as CSV?
    Oh, you can do that...but I've come across numerous problems while doing this. For a start, you lose the metadata about cells (i.e. if it's a formula or a string or a number with $foo number of decimal places.) You have problems associated with using multiple workbook spreadsheets (annoying if you've ever had to use them.) CSV is okay (and I've used it quite a bit) but it simply doesn't hold as much info as the original file.
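
    A small Python sketch of the metadata loss, using only the standard csv module. The cells list is hypothetical and merely stands in for what a spreadsheet stores. Note also that a real Excel export would write a formula's computed value, not its text; the formula text is kept here only to show that nothing in the CSV marks it as a formula.

```python
import csv
import io

# Hypothetical cells: (kind, value) pairs standing in for spreadsheet contents.
cells = [("string", "Widgets"), ("number", 12.50), ("formula", "=B1*2")]

# Write only the values, which is all a CSV row can carry.
buffer = io.StringIO()
csv.writer(buffer).writerow([value for _, value in cells])

# Read the row back: every field is now a plain string.
buffer.seek(0)
row = next(csv.reader(buffer))
print(row)
print([type(value).__name__ for value in row])
```

    After the round trip there is no way to tell which field was a number, how many decimal places it was displayed with, or which field was a formula.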
  • by tbray ( 95102 ) on Thursday October 24, 2002 @10:22AM (#4521858) Homepage Journal
    I've seen the native Word XML format (alpha mind you, so it might get changed). It isn't exactly pretty, and if I had to write code to extract all the paragraphs that contained the word "foo" in bold it would give me a bit of a headache, but I could do it.

    The word "foo" in bold single-underline looks something like

    <r>
    <rf>
    <rp class="bold" />
    <rp class="underline" lines="1" />
    </rf>
    foo</r>

    Yeah, it's pretty verbose.
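
    To give a feel for the "bold foo" extraction, here is a rough Python sketch against markup shaped like the fragment above. Everything beyond that fragment is an assumption: the <p> wrapper, the second run, and the bold_runs_containing() helper are invented for illustration, and the real Word XML format may well look different.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment built from the run markup quoted above; the <p> wrapper
# and the second, non-bold run are assumptions added to have something to skip.
SAMPLE = """<p>
<r>
<rf>
<rp class="bold" />
<rp class="underline" lines="1" />
</rf>
foo</r>
<r>
<rf>
<rp class="underline" lines="1" />
</rf>
foo again, but not bold</r>
</p>"""


def bold_runs_containing(xml_source: str, word: str) -> list[str]:
    """Return the text of every <r> run that is marked bold and contains word."""
    root = ET.fromstring(xml_source)
    hits = []
    for run in root.iter("r"):
        # A run counts as bold if any of its <rf>/<rp> properties says so.
        is_bold = any(rp.get("class") == "bold" for rp in run.findall("rf/rp"))
        # The run text follows </rf>, so collect all text inside the run.
        text = "".join(run.itertext()).strip()
        if is_bold and word in text:
            hits.append(text)
    return hits


print(bold_runs_containing(SAMPLE, "foo"))
```

    The awkward part is that the text trails the formatting block rather than being wrapped by it, which is why the sketch gathers all text inside the run instead of reading a single element.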

    Near as I can tell, it is 100% round-trip-able, i.e. you save as that file format, you read it in again, you hit ctl-S and it saves again; about as good as a native format. Now someone needs to write some script-ware to run Word in batch mode to xml-ify server directories with zillions of office docs.

    I think the reason MS is doing this is obvious. Look at their financials - they *really* need people to upgrade to the new version of Office. End-users don't buy Office any more, CIOs and the like do. These people are just not gonna be impressed by another new word-processing feature, but they might be motivated to upgrade if they thought that they were opening up all their data to re-use by other programs.

    I expect that with any luck we'll get a secondary industry built around doing cool unexpected stuff to Office docs. Don't want to sound over-excited here, but a huge amount of all the intellectual capital in the world is sitting around in Office docs, and this makes it noticeably more re-usable. Has to be a good thing.

    Cheers, Tim
  • by Anonymous Coward on Thursday October 24, 2002 @11:42AM (#4522492)
    IBM was under antitrust restrictions requiring it to license hardware technology on Reasonable and Non-Discriminatory terms.

    The BIOS was half the story, but IBM also held patents on "ISA", CGA, the disk interface, etc. Clone-makers just bought licences for these parts right from IBM (@ only about $5/PC).

    If it wasn't for the "plug-compatible" anti-trust wars in the mainframe market, the PC would have never been cloned.
  • by donutello ( 88309 ) on Thursday October 24, 2002 @11:57AM (#4522633) Homepage
    What a bunch of pseudo-technical garbage!

    I have a Masters in Computer Science with a focus on databases and storage technology and very little of what you said makes any sense to me. There's nothing easier than getting at data stored in SQL. Where I work, we've shipped a few products where we didn't document the schema because it was too complex and we didn't feel we could support it. Within weeks, almost all of our major customers had it reverse-engineered anyway. SQL is very easy to get at!

    kernel level SQL data

    There's no such thing. SQL data is stored in tables. You use queries to get at it. Period.

    Also, your story doesn't make any sense. The article says Office 11 is in Beta already. IIRC, the SQL Server and Palladium stuff in the OS doesn't come until Longhorn. Do you think they will actually release a version of Office which won't work until their next OS (who knows when that will be) is released and adopted? How will they make money off all the people who recently upgraded to Windows XP then?
  • by sir99 ( 517110 ) on Thursday October 24, 2002 @12:07PM (#4522753) Journal
    GSView [wisc.edu] (requires Ghostscript) works pretty well on Windows. It's also free beer/speech, depending on which version you get (old versions get relicensed as GPL when a new version is released). As for editing, I don't know of anything besides Acrobat that edits PDF directly.
  • by donutello ( 88309 ) on Thursday October 24, 2002 @12:48PM (#4523135) Homepage
    I think the reason MS is doing this is obvious. Look at their financials - they *really* need people to upgrade to the new version of Office. End-users don't buy Office any more, CIOs and the like do. These people are just not gonna be impressed by another new word-processing feature, but they might be motivated to upgrade if they thought that they were opening up all their data to re-use by other programs.


    Uhh.. from this article [microsoft.com].

    Information Worker turned in healthy revenue growth of 26 percent, reflecting customer adoption of Microsoft Office XP through multi-year licensing programs. Customers acquiring Office this quarter included ChevronTexaco, Lockheed Martin, MetLife, Newell Company (Rubbermaid) and the US Department of the Army, Program Executive Office, Aviation.

    and

    Microsoft Corp. today announced revenue of $7.75 billion for the quarter ended Sept. 30, 2002, a 26 percent increase over revenue of $6.13 billion for the same quarter last year. Operating income for the first quarter was $4.05 billion, compared to $2.90 billion in the same period last year. Net income and diluted earnings per share for the first quarter of fiscal year 2003 were $2.73 billion and $0.50, which included an after-tax charge for investment impairments of $291 million or $0.05. For the same period of the previous year, net income and diluted earnings per share were $1.28 billion and $0.23, which included an after-tax charge for investment impairments of $1.22 billion.

    "Results for the first quarter were exceptionally strong, exceeding our expectations. During the quarter, we saw broader customer adoption of our licensing programs than we anticipated, as customers recognized the value of entering into long-term licensing agreements for our products. This strength in licensing led to solid growth for Windows® XP, Office XP and .NET Enterprise Servers," said John Connors, chief financial officer at Microsoft. "Consistent with our view at the outset of this year, the global economic outlook continues to be uncertain, however we remain committed to making the investments necessary to drive long-term product innovation and customer value across our businesses."

  • by Anonymous Coward on Thursday October 24, 2002 @02:47PM (#4524108)
    If this [microsoft.com] is "unreadable" or "obfuscated", then you've got your eyes closed.
