So the patent works like this: instead of storing markup inline within a document, you store the markup separately from the raw data and map each markup element to a character position in the raw data, like this:
--Original document--
<foo>This is a foo</foo> <foo><bar>This is a foo bar</bar></foo>
--i4i patented storage--
Raw document:
This is a foo This is a foo bar
Metadata Map:
1 <foo> 0
2 </foo> 13
3 <foo> 14
4 <bar> 14
5 </bar> 31
6 </foo> 31
The idea is that you should be able to edit the raw data, or the markup, independently of one another. The patent outlines three core scenarios: 1) Taking an existing document with inline markup and separating the text and the markup, 2) Generating a "separate data and markup" document from scratch, and 3) Combining the markup and raw data of a doc generated from scenario 1 or 2 back together to produce a document with the markup inline.
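For what it's worth, scenarios 1 and 3 fit in a few lines of Python. This is my own toy sketch, not code from the patent: the names split() and merge() are made up, the regex only recognizes bare tags with no attributes, and the input has a space between the two top-level foo elements, which is what the offsets in the map above imply.

```python
import re

# Assumption for this sketch: tags are bare <name> or </name>, no attributes.
TAG = re.compile(r"</?\w+>")

def split(inline):
    """Scenario 1: separate inline markup into raw text plus (tag, offset) pairs."""
    raw_parts, tag_map = [], []
    pos = 0    # current offset in the raw text
    last = 0   # end of the previous tag in the inline document
    for m in TAG.finditer(inline):
        text = inline[last:m.start()]
        raw_parts.append(text)
        pos += len(text)
        tag_map.append((m.group(), pos))
        last = m.end()
    raw_parts.append(inline[last:])
    return "".join(raw_parts), tag_map

def merge(raw, tag_map):
    """Scenario 3: put each tag back at its recorded offset.
    The map is assumed to be in document order."""
    out, last = [], 0
    for tag, offset in tag_map:
        out.append(raw[last:offset])
        out.append(tag)
        last = offset
    out.append(raw[last:])
    return "".join(out)

doc = "<foo>This is a foo</foo> <foo><bar>This is a foo bar</bar></foo>"
raw, tag_map = split(doc)
print(raw)      # This is a foo This is a foo bar
print(tag_map)  # same offsets as the Metadata Map: 0, 13, 14, 14, 31, 31
print(merge(raw, tag_map) == doc)  # True
```

Scenario 2 is just building raw and tag_map directly instead of deriving them from an inline document.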
So why is this neat? The patent claims that you can edit the content and the markup independently of one another. Except that to do this and still maintain the "mapping" of markup to raw data, you would need a specialized editor that manipulates both components. Hate to say it, but I can already do this on normal, inline-markup documents using Notepad or any WYSIWYG HTML editor.
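To see why a plain text editor is not enough for the split format, here is a rough Python sketch of what the "specialized editor" has to do on every keystroke. The function name insert_text() is hypothetical, and pushing a tag that sits exactly at the insertion point to the right is an arbitrary choice on my part, not something taken from the patent.

```python
def insert_text(raw, tag_map, at, text):
    """Insert text into the raw data and shift every tag at or after `at`."""
    new_raw = raw[:at] + text + raw[at:]
    new_map = [(tag, off + len(text) if off >= at else off)
               for tag, off in tag_map]
    return new_raw, new_map

raw = "This is a foo This is a foo bar"
tag_map = [("<foo>", 0), ("</foo>", 13), ("<foo>", 14),
           ("<bar>", 14), ("</bar>", 31), ("</foo>", 31)]

# Insert "big " before the first "foo" (offset 10 in the raw text).
new_raw, new_map = insert_text(raw, tag_map, 10, "big ")
print(new_raw)  # This is a big foo This is a foo bar
print(new_map)  # every offset after the insert moves by 4: 0, 17, 18, 18, 35, 35
```

Edit the raw text in Notepad without this bookkeeping and every tag after the edit point ends up four characters too early.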
The other claim is that you could apply any map to any raw data. Except that, unless the character positions of semantic elements in the raw data were exactly where the "Metadata Map" expected them to be, the result would be a huge mess. Practically speaking, since the map is based on character position, applying one metadata map to multiple documents would most likely require additional inline tags to align the separate metadata with the content, thus defeating the whole purpose of the patent. Or maybe you could establish a "standard sentence length" in order to allow one map to be applied to different documents - that would be great. :P
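Here is the "huge mess" in practice, as a small Python sketch. The merge() helper and the second sentence are made up for the example; the map is the one from the top of this post.

```python
def merge(raw, tag_map):
    """Re-insert each tag at its recorded offset (map assumed in document order)."""
    out, last = [], 0
    for tag, offset in tag_map:
        out.append(raw[last:offset])
        out.append(tag)
        last = offset
    out.append(raw[last:])
    return "".join(out)

tag_map = [("<foo>", 0), ("</foo>", 13), ("<foo>", 14),
           ("<bar>", 14), ("</bar>", 31), ("</foo>", 31)]

# Different raw data, same map: the tags land wherever the offsets say.
other = "Something else entirely goes here"
result = merge(other, tag_map)
print(result)
# <foo>Something els</foo>e<foo><bar> entirely goes he</bar></foo>re
```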
I'm having a hard time understanding how the technology described in this patent is actually useful at all, let alone how Microsoft has infringed on it. In fact, if they *did* actually use this technology, then they deserve to be punished for using a stupid idea.