Comment Re:Holding off using it for other reasons (Score 1) 265
I don't really care if it's served as XML or not, the point is that if it's not well formed XML it becomes a massive ballache to deal with, because XML tools and libraries are so prevalent.
It's only syntax, it shouldn't be a big deal. There are plenty of XML-based tools that are useful, and HTML5 goes to some lengths to define the text/html (i.e. non-XML) syntax so you can still use those tools and just translate the syntax at the edges.
The text/html and XML syntaxes are based on exactly the same underlying conceptual model (the DOM tree), so you can switch without any radical changes. E.g. the validator.nu HTML5 parser implements the same APIs as standard XML parsers - drop it in front of your existing XML tools and libraries, stick an HTML serialiser on the other end, and your system can work pretty much the same as before (with the bonus of working for any arbitrary page on the web, not just the tiny fraction that are well-formed XML).
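To make the "translate at the edges" idea concrete, here's a rough Python sketch using only the standard library. It is an assumption-laden toy: `TreeBuilder` and `links` are names I made up, and `html.parser` is a forgiving tokenizer, not the HTML5 parsing algorithm (and nothing to do with the validator.nu parser, which is Java). The only point is the shape of the pipeline: tolerant parser in, ordinary tree API in the middle, HTML serialiser out.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Elements that never have content or an end tag (partial list, for the toy).
VOID = {"br", "hr", "img", "input", "link", "meta"}


class TreeBuilder(HTMLParser):
    """Toy tolerant parser: turns text/html input into an ElementTree.

    NOT the HTML5 algorithm. It only copes with void elements,
    unclosed elements and stray end tags.
    """

    def __init__(self):
        super().__init__()
        self.root = ET.Element("root")  # hypothetical wrapper element
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        el = ET.SubElement(self.stack[-1], tag, {k: v or "" for k, v in attrs})
        if tag not in VOID:
            self.stack.append(el)

    def handle_endtag(self, tag):
        # Close back to the nearest matching open element; ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        parent = self.stack[-1]
        if len(parent):  # text following a child element lives in its tail
            parent[-1].tail = (parent[-1].tail or "") + data
        else:
            parent.text = (parent.text or "") + data


def links(html_text):
    builder = TreeBuilder()
    builder.feed(html_text)
    builder.close()
    # From here on it is plain ElementTree, the same API XML tools use.
    return [a.get("href") for a in builder.root.iter("a")]


page = '<p>See <a href="/docs">docs<br>and <a href="/faq">FAQ</p></div>'
print(links(page))  # ['/docs', '/faq']

# Back out to text/html at the other edge.
builder = TreeBuilder()
builder.feed(page)
html_out = ET.tostring(builder.root, method="html", encoding="unicode")
```

A real HTML5 parser would build a somewhat different tree for that input (it would not nest the second link inside the first, for one), which is exactly the sort of detail the spec pins down and this toy doesn't.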
The ethos surrounding HTML5 is that well, lots of old sites didn't follow newer standards, so let's make those web sites standard by taking everything they did shit, and making that standard.
Who is helped by a standard that almost everybody ignores? If you, say, want to write code to parse HTML pages, and you try to implement what HTML4 specifies (based on SGML), your code will be pretty useless because HTML4 is incompatible with reality and you'll get incorrect output most of the time (stray characters, incorrectly nested elements, half the page text disappearing inside a misparsed script element, etc). Similarly, if you implement what XHTML specifies, you'll fail since most pages aren't well-formed XML. You can declare that those pages are broken and non-standard, but that doesn't stop them from existing and being a serious problem for anybody writing software that interacts with the web.
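A quick way to see the XML failure mode for yourself, using Python's standard XML parser. The two sample strings and the `parses_as_xml` helper are made up for illustration; the first string is the kind of markup browsers accept without complaint.

```python
import xml.etree.ElementTree as ET

# Everyday markup: unquoted attribute, bare ampersand, unclosed <br>.
TAG_SOUP = "<p class=intro>Fish & chips<br>served daily</p>"
# The same content written as well-formed XML.
WELL_FORMED = '<p class="intro">Fish &amp; chips<br/>served daily</p>'


def parses_as_xml(markup: str) -> bool:
    """Return True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


print(parses_as_xml(TAG_SOUP))     # False
print(parses_as_xml(WELL_FORMED))  # True
```

An XML parser is required to give up at the first well-formedness error, so a tool built on one gets nothing at all from the first string, not even a degraded tree.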
Nowadays you can just implement what HTML5 specifies (or find a library that already does it), and your parser will work identically to the current or near-future versions of all major browsers - it's defined in enough detail that there's no ambiguity in how to process any stream of bytes. That's never been possible before, when the standards were focused on some vision of a simple coherent syntax and refused to deal with the messy details that are critical in real life.
If you want to document a set of best practices for writing HTML, with rules for lowercase names, closing tags, quoted attributes, indentation and so on, that's fine and would be nice (especially if you could find a way to motivate people to follow the best practices - a decade of promoting XHTML doesn't seem to have stopped people writing terrible code, so we need a better way). Meanwhile, HTML5 is solving the harder problem of how to cope with people who ignore those rules.