Slashdot User Journal

Quantum Jim's Journal: DOCTYPE declarations for versioning information

Over at Anne's journal there is a debate about whether DOCTYPE declarations can carry versioning information. For example, the external subset for HTML 3.2 is different from the external subset for the HTML 4.01 family, and each "subversion" in the HTML 4.01 family (Transitional, Strict, and Frameset) has its own external subset as well. Some people think this doesn't work. My opinion is that DOCTYPE declarations can be used to specify version information, for the following reasons:

  1. Say you have two documents with identical DOCTYPE declarations: their internal subsets, external subset declarations, and root element declarations are all the same. Then you can say their syntactic doctypes are identical, and the structure of each document must conform to the same SGML rules.

  2. The two documents could still have different semantic doctypes, though. That is, content in one document could mean something totally different from the same content in the other, even though both conform to the same syntax rules in the DTD. For example, say the "rel" attribute is declared with character data (CDATA) content in the DTD. The specification behind one document may say that the rel attribute names a relationship and should be a plain character string; the specification behind the other could say that the attribute specifies a link and should be a URI reference.

  3. There could also be syntactic differences not captured by the syntactic doctypes. For example, one spec may indicate that "name" and "id" attributes MUST have identical content if both are present; however, this is impossible to express in a DTD.

  4. Therefore, the content of the external and internal subsets cannot be used to differentiate between languages, or between versions of a single language like HTML. There could be syntactic or semantic differences not captured by the doctype.

  5. External subsets are specified with either a Formal Public Identifier (FPI) or a URI reference to a DTD. In practical applications an FPI uniquely identifies a resource just like a URI, so for this line of reasoning I'm going to treat the two as equivalent and call both DTD names. DTD names have the following property: the owner of the name gets to determine what it means. For example, for the HTML DTD names specified by the W3C, only the W3C gets to say what they mean.

  6. Therefore, it is legal for a DTD name to indicate the doctype for a single language, including semantic and syntactic requirements not captured in the content of the DTD. The meaning of the HTML 4.01 Strict doctype string is unambiguous even though the DTD's content may not specify all of the semantic or syntactic requirements of HTML 4.01 Strict.

  7. If you change the external subset's DTD name, it may or may not refer to the same language. Even if the new external subset has the same content, other requirements not encoded by the DTD could be different. Therefore, you can ONLY use the canonical DTD names to unambiguously identify the resource to third parties.

  8. Even when using canonical DTD names, if you change the root element then you might no longer conform to the specification. For example, changing the root element while using HTML 4.01 Strict's DTD name violates the global structure semantics of HTML 4.01 Strict. Therefore, the document is not valid HTML even though it is syntactically valid SGML. Note that the DTD name still specifies the particular language you are using, even though the content obtained by resolving the DTD name is not enough to validate the document as HTML.

  9. If you add an internal subset, then the meaning of those additions is undefined, even though their syntax is specified as well as it can be by the content of the subset. The content of the internal subset simply cannot capture the semantics, or all the possible syntactic requirements, that you intend. Therefore, taking HTML 4.01 Strict as an example, if anything in the internal subset conflicts with its external subset, or if additional elements, attributes, or attribute lists are defined, then the resulting language is not HTML, even though it is valid SGML.

  10. There is an exception to the above point. If the internal subset contains entity declarations whose replacement text is valid HTML content where it is used (even though the entity by itself may not be valid HTML content), and those entity declarations don't interfere with the HTML DTD, then the meaning of those entities is clear (it is defined by SGML) and the specified syntax of the external subset is unchanged. Therefore, such a document is still HTML of the specified version.
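Points 9 and 10 amount to a fairly mechanical test on the internal subset. The Python sketch below approximates it under stated assumptions: it accepts a subset only if it contains nothing but comments and general entity declarations whose names the external DTD does not already declare. It rejects parameter entity declarations outright, because the internal subset is read before the external subset and the first declaration of an entity is the one that takes effect, so a parameter entity could quietly change the HTML DTD. The function name `internal_subset_is_safe` and the set `EXTERNAL_ENTITIES` are hypothetical (the set is a small hand-picked sample, not the full HTML 4.01 entity list), the regular expressions only approximate real SGML declaration syntax, and the sketch does not check whether replacement text is valid HTML content.

```python
import re

# Hypothetical sample of general entity names that the external HTML 4.01 DTD
# already declares; a real checker would collect these from the resolved DTD.
EXTERNAL_ENTITIES = {"amp", "lt", "gt", "quot", "nbsp", "copy"}

# Comment declarations are stripped before the declarations are examined.
COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)
# One markup declaration: "<!", a keyword, then quoted literals (which may
# contain ">") or other characters, up to the closing ">".
DECLARATION = re.compile(r"""<!(?P<keyword>[A-Za-z]+)(?P<body>(?:"[^"]*"|'[^']*'|[^"'>])*)>""")
# The entity name after ENTITY, noting a "%" that marks a parameter entity.
ENTITY_NAME = re.compile(r"\s+(?P<percent>%\s+)?(?P<name>[A-Za-z][A-Za-z0-9.-]*)")


def internal_subset_is_safe(internal_subset: str) -> bool:
    """True if the subset only adds general entities the DTD does not declare."""
    text = COMMENT.sub("", internal_subset)
    pos = 0
    for match in DECLARATION.finditer(text):
        # Reject any stray text between declarations.
        if text[pos:match.start()].strip():
            return False
        pos = match.end()
        # ELEMENT, ATTLIST and other declarations change the language (point 9).
        if match.group("keyword").upper() != "ENTITY":
            return False
        name = ENTITY_NAME.match(match.group("body"))
        # Parameter entities could override parts of the external subset.
        if name is None or name.group("percent"):
            return False
        # Redeclaring an entity the DTD declares would take precedence over it.
        if name.group("name") in EXTERNAL_ENTITIES:
            return False
    return not text[pos:].strip()
```

For instance, a subset such as `<!ENTITY sig "<em>Quantum Jim</em>">` passes, while `<!ELEMENT foo - - (#PCDATA)>` or `<!ENTITY amp "&#38;">` does not. Rejecting anything unrecognized, rather than trying to reason about it, follows the point 9 position that unexplained changes leave the language undefined.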

Thus, if you use the same internal subset content (with the exception above), the same external subset declaration, and the same root element declaration as the HTML language version you are declaring, then your document is HTML. If you change anything (again, with that exception), then it can never be unambiguously determined to be HTML by a computer.
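That rule can be sketched as a small classifier. This is a hypothetical Python illustration, not a validator: it pulls the root element name, public identifier, system identifier, and internal subset out of a DOCTYPE declaration with a simplified regular expression, requires the root element to be HTML, and requires every DTD name present to be one of the W3C's canonical names for the same version. To stay short it rejects any internal subset, which makes it stricter than the exception in point 10. The function name `identify_html_version`, the regular expression, and the choice to compare identifiers exactly after collapsing whitespace are all assumptions of this sketch.

```python
import re
from typing import Optional

# Canonical DTD names published by the W3C, mapped to the language version
# they identify. FPIs and system identifiers are both treated as DTD names.
CANONICAL_DTD_NAMES = {
    "-//W3C//DTD HTML 3.2 Final//EN": "HTML 3.2",
    "-//W3C//DTD HTML 4.01//EN": "HTML 4.01 Strict",
    "-//W3C//DTD HTML 4.01 Transitional//EN": "HTML 4.01 Transitional",
    "-//W3C//DTD HTML 4.01 Frameset//EN": "HTML 4.01 Frameset",
    "http://www.w3.org/TR/html4/strict.dtd": "HTML 4.01 Strict",
    "http://www.w3.org/TR/html4/loose.dtd": "HTML 4.01 Transitional",
    "http://www.w3.org/TR/html4/frameset.dtd": "HTML 4.01 Frameset",
}

# Simplified DOCTYPE grammar: root element name, optional PUBLIC literal,
# optional system literal, optional [internal subset].
DOCTYPE = re.compile(
    r"""<!DOCTYPE\s+(?P<root>[A-Za-z][A-Za-z0-9.-]*)"""
    r"""(?:\s+PUBLIC\s+(?P<q1>["'])(?P<public>.*?)(?P=q1))?"""
    r"""(?:\s+(?:SYSTEM\s+)?(?P<q2>["'])(?P<system>.*?)(?P=q2))?"""
    r"""\s*(?P<subset>\[.*?\])?\s*>""",
    re.IGNORECASE | re.DOTALL,
)


def identify_html_version(document: str) -> Optional[str]:
    """Return the HTML version the DOCTYPE unambiguously declares, else None."""
    match = DOCTYPE.search(document)
    if match is None:
        return None
    # Point 8: the root element must be HTML. HTML's SGML declaration folds
    # element names to upper case, so compare case-insensitively.
    if match.group("root").upper() != "HTML":
        return None
    # Point 9, conservatively: any internal subset counts as a change.
    if match.group("subset") is not None:
        return None
    names = [n for n in (match.group("public"), match.group("system")) if n is not None]
    if not names:
        return None
    # Points 5-7: every DTD name must be canonical and all must agree.
    # Whitespace runs are collapsed before the exact comparison.
    versions = {CANONICAL_DTD_NAMES.get(" ".join(n.split())) for n in names}
    if None in versions or len(versions) != 1:
        return None
    return versions.pop()
```

Returning `None` instead of guessing mirrors the argument: a document whose DOCTYPE deviates from the canonical declaration is not thereby shown to be something other than HTML; it simply cannot be identified as HTML by a program.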

If you can specify different languages using the above rules, then you can specify different versions of a language family using the same rules. Every version of a language family is a different language. They may share certain semantics, but they are not compatible except as explicitly defined by the language family's specification.

Therefore, you can use DOCTYPE declarations to specify version information. Q.E.D.
