
Journal evought's Journal: ECMA-376 (OOXML) ECMA Responds to Comments

As reported on Groklaw, ECMA has responded to comments from the ISO/JTC 1 Fast Track 30-day comment period. The original comments are included in an appendix at the end of ECMA's document. I found many of the comments enlightening, particularly the continued confusion expressed by both ECMA and the ISO representatives over how the "Fast-Track" process is supposed to work, and the fact that "conflict" is not defined even though over 200 standards from ECMA have used this process so far.

Not as many countries gave negative comments as were expected. Interestingly, the American National Standards Institute (ANSI), the US representative to ISO, abstained from commenting. ECMA-376 must still go through a five-month discussion period and be passed by a two-thirds vote of a thirty-representative committee. Computerworld states that it expects eleven countries to vote against at this time, but this, of course, can change.

Below is the text of an email I sent to ANSI. We'll see if anyone reads it.

It has come to my attention that ANSI abstained from participating in the 30-day comment period for the ISO/IEC JTC 1 Fast Track process for the ECMA-376 standard "Office Open XML File Formats" due to an inability to reach consensus. As a business owner, former IT professional, former member of the Austin Group technical committee, and interested party in the recent efforts to standardize the preparation and storage of electronic government documents, I am writing to express my deep concerns with the proposed standard.

First of all, as a number of ISO representatives and IT professionals have pointed out, ECMA-376, a specification derived from Microsoft Office 2007's Office XML format, obviously conflicts with and duplicates the scope and purpose of ISO 26300 (ODF): to support common office formats, including word processor documents, spreadsheets, etc. It is confusing and counterproductive to have two standards promoted by the same organization for the same purpose. ECMA glosses over this issue by stating that other arguably overlapping ISO standards exist, such as HTML and PDF or SVG and CGM. In these cases, however, there are clear discriminating factors between the formats, such as the different industries and needs served by CGM (CAD and industrial design) and SVG (web and graphical design).

ECMA also states that ECMA-376 and ISO 26300 serve different markets, specifically that while ISO 26300 was designed from the ground up to serve existing and future document needs in a sensible and standards-conformant manner, ECMA-376 was also designed to serve the needs of storing legacy documents which were stored in a number of existing binary formats. It may be argued from examination of the ECMA-376 specification that the data model of these legacy formats was the driving design factor.

While this does, indeed, distinguish the purpose of the two formats, the latter is of dubious value as an international standard. ECMA-376's distinguishing feature becomes that it takes a number of obscure, complex, and opaque binary file formats and converts them into a single, equally obscure and opaque text format whose specification is monstrously complex (over 6,000 pages) and contains numerous references to behavior in legacy applications which is never described. Rather than reference existing ISO standards for, e.g., times and dates, inline graphics, percentages, colors, citations, etc., ECMA-376 defines its own legacy encodings and, in some cases, multiple conflicting encodings for similar data. It is difficult to see why anyone would want to recommend this format for new documents, and it makes little sense to create a standard which is deprecated from its inception.

There are many examples of strange format choices within the ECMA-376 document structure. There are two calendars for expressing dates. One is based on a Gregorian calendar with a 1904 epoch; the other has a 1900 epoch with an (incorrect) assumption that 1900 was a leap year. Rather than placing the burden of normalizing date stamps on a conversion program, the format continues to propagate a bug from Lotus 1-2-3 date handling. Percentages are expressed inconsistently and sometimes bizarrely. In some instances they are bare integers, such as "71" (in contrast to HTML "71%"). In some places they are expressed as integer fiftieths of a percent, so that, for instance, "200" represents "4%". In another place, they are represented as discrete constants, such that "pct87" represents "87.5%" [not a typo]. These are not isolated technical details, but rather the result of an overall design choice for ECMA-376 to remain as close to the legacy data models as possible, thus requiring applications to retain those data models in perpetuity. ISO 26300, by contrast, puts the burden on conversion programs to retain knowledge of the legacy structures and normalize the data when producing a conformant document.
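To make the burden on applications concrete, here is a minimal Python sketch of what a consumer has to do just to read the date and percentage encodings described above. The function names are hypothetical, and the serial-number conventions (serial 1 is 1 January 1900 in the 1900 system, serial 0 is 1 January 1904 in the 1904 system) are assumptions based on common spreadsheet behavior, not text quoted from the specification.

```python
from datetime import date, timedelta
from fractions import Fraction


def serial_to_date_1900(serial: int) -> date:
    """Convert a whole-day serial in the 1900 date system to a real date.

    Assumption: serial 1 is 1900-01-01 and serial 60 stands for the
    nonexistent 1900-02-29 inherited from the legacy leap-year bug.
    """
    if serial < 1:
        raise ValueError("serial must be >= 1 in the 1900 system")
    if serial == 60:
        raise ValueError("serial 60 maps to 1900-02-29, which never existed")
    if serial < 60:
        # Before the phantom leap day the count is still accurate.
        return date(1899, 12, 31) + timedelta(days=serial)
    # After the phantom leap day every serial is one too high.
    return date(1899, 12, 30) + timedelta(days=serial)


def serial_to_date_1904(serial: int) -> date:
    """Convert a whole-day serial in the 1904 date system to a real date.

    Assumption: serial 0 is 1904-01-01.
    """
    if serial < 0:
        raise ValueError("serial must be >= 0 in the 1904 system")
    return date(1904, 1, 1) + timedelta(days=serial)


def fiftieths_to_percent(raw: int) -> Fraction:
    """Interpret an integer count of fiftieths of a percent."""
    return Fraction(raw, 50)


if __name__ == "__main__":
    print(serial_to_date_1900(59))     # last real day before the phantom leap day
    print(serial_to_date_1900(61))     # first day after it
    print(serial_to_date_1904(0))      # epoch of the 1904 system
    print(fiftieths_to_percent(200))   # "200" from the example above
```

A reader of the format has to know which of the two date systems a given document uses before any serial number means anything, and has to special-case a day that never happened. A normalized representation such as an ISO 8601 date string would need none of this logic in the consuming application.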

This design is most clearly expressed in ECMA-376 by tags like "autoSpaceLikeWord95", which require the application to duplicate the unspecified behavior of a legacy application.

When faced with converting a legacy document, an application has several choices with regard to some of these obscure features and idiosyncrasies:

  1. Convert the document 1:1, leaving legacy features intact. This will mean that many applications, even though technically ECMA-376 conformant, will not render the document correctly. Indeed, Microsoft's own products have been criticized for mishandling their own legacy formats. It is difficult to see how the proposed standard adds to interoperability in this case.
  2. Attempt to convert the legacy schema, with or without human intervention, to adhere to modern conventions, such as converting broken dates and reinterpreting legacy layout options in terms of modern features. Here again, it is difficult to see what benefit ECMA-376 adds, as ISO 26300 was specifically designed for this case.
  3. Convert to ISO 26300 1:1, leaving legacy features intact by using ODF extensibility features. Essentially, key-value pairs can be added to express legacy constraints like "autoSpaceLikeWord95" for applications to interpret if they wish. This has the benefits of not requiring a new standard and reusing existing standards for internal data such as times, dates, colors, languages, percentages, etc. Rendering and interoperability would be no worse than for the majority of ECMA-376 applications. ISO 26300:2006 may be extended, if necessary, to standardize some of these keys.

The ECMA-376 format is of clear value to Microsoft in supporting its legacy products and existing application suite. By documenting this file format, Microsoft allows others to find value in the format as they will and to increase interoperability with Office 2007. Microsoft certainly deserves the community's thanks for this action. However, ECMA-376 is large, overly complex, does not promote general interoperability, and is of dubious value to the international standards community. Having two similar and competing standards will confuse the marketplace and cause balkanization of document storage. The lack of standardization within ECMA-376's internal structures duplicates the work of existing, mature standards and locks application developers into perpetual support of legacy, non-standard data schemas.

I hope that ANSI will pay due attention to this standard as the JTC 1 process progresses.
