I think you're wrong. From the coverage I've read, it's a method of processing and manipulating XML documents, and they designed a piece of XML editing software around it, which they showed to Microsoft, who then stole the ideas.
News coverage of technical things is so effing horrible. Most tech articles are written by people who don't understand programming but don't see why that should stop them from broadcasting their misinterpretation of technical information. You should just read the patent; most of it is very clearly written.
It does not predate XML, and has nothing to do with XML-based standards.
Filed in 1994, it does predate XML. It doesn't predate SGML, though, and since core XML is essentially a simplified subset of SGML, it's probably safe. However, I think it does affect XML-based standards, specifically the ones that separate content from structure/presentation.
The Patent
It's a way to separate content from structure. So, for example, where an SGML document would store data like "<p>Hi <i>friend</i></p>", they store it as two separate pieces of data. The content piece would be "Hi friend", and the structure piece would be "0:p, 3:i, 9:/i, 9:/p" (roughly), where each number is a character offset into the content. So if you wanted to format that document differently, you could just use a different structure piece; the content piece doesn't change.
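To make the split concrete, here's a toy Python sketch of the idea as described above. It is not the patent's actual claims or file format. It assumes bare tags with no attributes, entities, or comments, and the names `split_markup` and `merge_markup` are made up for illustration.

```python
import re

# Matches a bare opening or closing tag such as <p>, <i>, </i>.
# Assumption: no attributes, comments, entities, or self-closing tags.
TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)>")

def split_markup(doc: str) -> tuple[str, list[tuple[int, str]]]:
    """Separate marked-up text into plain content and (offset, tag) pairs."""
    content_parts: list[str] = []
    structure: list[tuple[int, str]] = []
    offset = 0  # position in the plain content, not in the original markup
    pos = 0
    for m in TAG.finditer(doc):
        text = doc[pos:m.start()]
        content_parts.append(text)
        offset += len(text)
        structure.append((offset, m.group(1) + m.group(2)))
        pos = m.end()
    content_parts.append(doc[pos:])
    return "".join(content_parts), structure

def merge_markup(content: str, structure: list[tuple[int, str]]) -> str:
    """Re-insert tags into the content at their recorded offsets."""
    out: list[str] = []
    prev = 0
    # Stable sort keeps the original order of tags that share an offset.
    for offset, tag in sorted(structure, key=lambda item: item[0]):
        out.append(content[prev:offset])
        out.append(f"<{tag}>")
        prev = offset
    out.append(content[prev:])
    return "".join(out)
```

`split_markup("<p>Hi <i>friend</i></p>")` gives `("Hi friend", [(0, "p"), (3, "i"), (9, "/i"), (9, "/p")])`. Swapping in a different structure list, e.g. `[(0, "p"), (0, "b"), (2, "/b"), (9, "/p")]`, re-marks the same content as `<p><b>Hi</b> friend</p>` without touching the content string.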
This exact technique is obvious, so I don't think it should have been awarded a patent. But what's obvious to us in 2009 may not have been obvious to the patent examiner in 1994, and in any case it doesn't look like any of the affected parties are going to try to argue obviousness. The important question is how broadly their technique will be interpreted.
Taken narrowly, it's a way of putting XML-like tags in a separate file and mapping them back into the content using byte offsets. That's easy enough to work around. Taken broadly, it's a way of separating content from structure, so any time you augment the content in one file with annotations stored in another, you're violating their patent. Under that reading, even HTML and CSS are problematic, because the style information lives in a separate file, although the mapping is done by tag and class names rather than byte offsets.
I don't know much about patent litigation, so I don't know how much leeway courts give plaintiffs. But I doubt Microsoft Word uses their exact technique; it probably does something closer to HTML+CSS or XSLT. So this victory could indicate that the courts are interpreting the technique broadly. Which sucks. Man, patents like this are killing the industry.
and "colleges" are usually either private religious based high schools or technical/vocational training institutes
Not where I'm from. [long list of "colleges" in the US]
I think marxz was continuing on the "Outside of America" theme of the parent post.
Man, the BetaNews article is horrible. Practically everything — except for the direct quotes from the Google blog post — is incorrect. I somehow expect more from someone who goes by "Scott M. Fulton, III".
Google's public documentation shows Protocol Buffers (which has yet to be formally abbreviated) is indeed conceptually different from XML, in that it's rooted more in procedural logic than structural declaration. In XML, there's a schema which defines the structures of tables and recordsets, which is separate from the document that relates the contents of records in that structure.
Nope, they're conceptually the same. The ".proto" files are like a DTD or XSD. The actual document data is stored in a binary format (though there's also a text representation). The data manipulation API is similar to what you get from Castor or JAXB.
But here, in an unusual departure from the norm, the default values for these members are set to digits (for strings or literals) or values (for numerals) that define their place in a sequence -- where they fall within a record. Imagine if data were streamed onto recording tape, the way it used to be in the late 1960s and '70s. It's that streaming of the data sequence, without all the fenceposts, that differentiates XML from Protocol Buffers, by taking out all those markups that say when an entry or a record starts and stops.
The "= number" at the end of a field definition is not a "default value". It is a numeric tag that identifies that field. That said, "= number" is quite unintuitive syntax; maybe something like "@number" would have been less confusing.
Looking at some of the documentation, I don't think the aforementioned numbers directly index the field's location in the record. They lay down the present fields one after another, probably putting each field's tag number before the field data. That also lets them skip fields that still hold their default value. But because the schema doesn't fix a record's length, they still need to say where each record ends: either with "fenceposts" between records or a "length" specifier before each record.
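Here's a hand-rolled Python sketch of that tag-before-data layout, modeled on the documented Protocol Buffers wire format. It is not the protobuf library's API. Each field gets a varint key of `(field_number << 3) | wire_type`, then its value. Two simplifications are assumptions: a zero or empty value is treated as "unset" and left out (roughly proto3 scalar behavior; proto2 tracks presence separately), and records are framed with a varint length prefix, which is one common choice for streaming. The function names are hypothetical.

```python
from __future__ import annotations

VARINT, LENGTH_DELIMITED = 0, 2  # protobuf wire types used in this sketch

def encode_varint(value: int) -> bytes:
    """Base-128 varint: 7 bits per byte, low group first, high bit means 'more'."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)

def encode_key(field_number: int, wire_type: int) -> bytes:
    """The tag written before each field's data."""
    return encode_varint((field_number << 3) | wire_type)

def encode_record(fields: dict[int, int | str]) -> bytes:
    """Write present, non-default fields as key+value pairs in field-number order."""
    out = bytearray()
    for number in sorted(fields):
        value = fields[number]
        if isinstance(value, int):
            if value == 0:  # default value: not sent at all
                continue
            out += encode_key(number, VARINT) + encode_varint(value)
        else:
            data = value.encode("utf-8")
            if not data:  # empty string: not sent at all
                continue
            out += encode_key(number, LENGTH_DELIMITED) + encode_varint(len(data)) + data
    return bytes(out)

def frame(record: bytes) -> bytes:
    """Length-prefix a record so several can be streamed back to back."""
    return encode_varint(len(record)) + record
```

For example, `encode_record({1: 150})` yields `b"\x08\x96\x01"`: key `0x08` (field 1, varint), then 150 as a two-byte varint. Nothing in the bytes marks where the record stops, which is why `frame` prepends a length.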
"The four building blocks of the universe are fire, water, gravel and vinyl." -- Dave Barry