
Online Comics Syndication in XML
gravling writes: "Jason McIntosh has written an interesting article on XML.com about ComicsML, a language he's invented to let online comics artists describe and syndicate their work. ComicsML could let you do things similar to the UserFriendly search engine, but on a web-wide basis."
Re:Somewhat offtopic -- why UF? (Score:1)
We all know that if there are thought police on any /. subject anymore, it's User Friendly, as I can see from your present moderation total.
Of course, I don't think Friends or 3rd Rock From the Sun are very funny either, but apparently millions of other people do. The things that pass for humor seem awfully subjective.
--
Re:ANOTHER grammar? (Score:1)
As a side note, I recommend that anyone not quite sure about XML and schemas check out XMLSpy [xmlspy.com]. (Disclaimer: I don't work for that company, nor do I know anyone who does. I just think it's a very nice program for getting practical experience with XSD schemas, etc.)
Usar Freindly? (Score:4)
Usar Freindley [somethingawful.com], Lunix friend.
(Yuo are WORST comic evar [somethingawful.com].)
Re:The way comics were meant to be read... (Score:1)
PS: the first panel is empty.
--
mrBlond
Easier online distribution? (Score:1)
That, coupled with the fact that most independent comics don't even make their money back, should be a fantastic motivator for going online, but unfortunately it just hasn't stuck. Personally, I plan to implement ComicsML as soon as possible...
Ten-fold, eh? (Score:2)
I think I speak for anyone who has actually written an XML parser when I say... What?
Of course XML is easy to parse. The difficulty in parsing HTML comes from its being widely abused. You can't rely on HTML to be well-formed when browsers like IE literally don't require you to close any tags you open. (Closing a _table_ is optional, even. Whose bright idea was that?) In contrast, omissions like that simply aren't an option in XML. If your document isn't well-formed, the parser won't try to parse it. End of story. (And if the parser does try, the parser is broken.) Incidentally, people who write applications that use XML aren't writing the code that checks whether a document is well-formed. If they are, they're wasting their time. Use a library; there are plenty of them, for virtually every popular language.
As for whether the document conforms to a DTD: yes, it's somewhat silly to post your DTD on a server that isn't readily accessible. And we all know there's no such thing as 100% uptime, so what's the Right Answer?
So it's not the holy grail. Only a fool would say it is. But it's a much better option than everybody just making up their own (often binary) formats for describing things, because it sets the ground rules.
Ten-fold, eh? I'd love to hear specifics.
Re:ANOTHER grammar? (Score:1)
Re:ANOTHER grammar? (Score:2)
> Your post got me to (re-)look into Guile, but I was wondering if you (or anyone) had any more specific thoughts on what formats to use for configuration files, and what in particular you do with Guile that replaces what you would have done with XML.
I'm not a guru at either Guile or XML, and my use of Guile is evolving pretty quickly now that I have started using it regularly. For now, this is how I use it for configuration files.
I have a "data type" that I call a table, which is of the format (key data). The data can be more tables, giving a tree structure to the configuration, or it can be bottom-level data, serving as the leaves in a tree of data. So a simple example of a configuration file for a fictitious game would be:
[Sorry; I had to remove the indentation to get it past Rob's lame-o lameness filter. Lack of indentation really reduces readability.]
(configuration
  (difficulty-level 7)
  (sides
    (good
      (description "The Good Guys")
      (restrictions no-nukes no-poison)
    )
    (bad
      (description "The Bad Guys")
      (restrictions none)
    )
    (ugly
      (description "The Ugly Guys")
      (restrictions no-teeth)
    )
  )
  (graphics
    (size (x 550) (y 490))
    (theme penguins)
    (images "mytiles.xpm")
    (animation-speed
      (chase-scenes 12)
      (love-scenes 3)
    )
  )
)
In the "tree" metaphor, configuration is the root; difficulty-level, sides, and graphics are the first level of branches, and so on down to the leaves, where the sub-tables terminate in atomic data.
For simplicity, the following uses pseudocode rather than the actual Guile syntax.
You declare appropriate variables of the Guile SCM type and then use Guile's read to load the configuration file into your program as a Scheme object (without trying to evaluate it as a Scheme expression).
conf=read("~/.mygame/myconfig.scm")
Since you are using tables, you use Guile to define a function lookup(keyname,table) that converts the key-name string into a Guile symbol and then looks that symbol up in table:
grap=lookup("graphics",conf)
speeds=lookup("animation-speed",grap)
tmp=lookup("chase-scenes",speeds)
...do whatever with the value...
tmp=lookup("love-scenes",speeds)
...do whatever with the value...
Your program just runs down the tree like that, looking for whatever data it needs. When you get to the bottom, you use a Guile built-in to convert the data to an integer, string, or whatever your program expects.
Some data can be iterative, too. In the example, sides is a list that you can modify in your config if you want to define more player sides (say, for AI opponents). Your configuration reader just uses lookup to find sides, and then loads one side record at a time until it runs out of definitions. (In Scheme terms, you process the car with your lookup function, drop the car, and continue the iteration on the cdr.)
The lookup function is really easy to define, and you put it in your library directory so all your programs can use it. It just converts the keyname string to a Guile symbol, and then uses the built-in assoc to find that symbol as the key for anything in the cdr of table. If all your data is in the table format, it works to look up anything, working down the tree recursively. For instance, if you wanted the y size and didn't need anything else, you could do:
y=lookup("y",lookup("size",lookup("graphics",conf)))
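For readers who want to see the idea outside Guile, here is a rough C analogue of the table format, lookup, and the sides iteration. It is only a sketch under stated assumptions: the tiny Obj cons-cell type and the helpers num, sym, cons, list, example_config, and count_sides are hypothetical stand-ins for what Guile's SCM values and read would give you, not Guile's C API. lookup does an assoc-style scan of a table's cdr and returns NULL when nothing matches, so calls can be nested exactly like the y example; count_sides walks the side records by handling each car and moving on to the cdr.

```c
#include <stdarg.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical minimal s-expression type (NOT Guile's SCM): just
   integers, symbols, pairs, and the empty list. */
typedef enum { NIL, INT, SYM, PAIR } Kind;

typedef struct Obj {
    Kind kind;
    long num;              /* valid when kind == INT */
    const char *name;      /* valid when kind == SYM */
    struct Obj *car, *cdr; /* valid when kind == PAIR */
} Obj;

static Obj empty_list = { NIL, 0, NULL, NULL, NULL };

static Obj *make(Kind kind) {
    Obj *o = calloc(1, sizeof *o); /* error checks omitted in this sketch */
    o->kind = kind;
    return o;
}
static Obj *num(long n) { Obj *o = make(INT); o->num = n; return o; }
static Obj *sym(const char *s) { Obj *o = make(SYM); o->name = s; return o; }
static Obj *cons(Obj *a, Obj *d) { Obj *o = make(PAIR); o->car = a; o->cdr = d; return o; }

/* list(n, a, b, ...) builds the proper list (a b ...). */
static Obj *list(int n, ...) {
    Obj *head = &empty_list, **tail = &head;
    va_list ap;
    va_start(ap, n);
    for (int i = 0; i < n; i++) {
        *tail = cons(va_arg(ap, Obj *), &empty_list);
        tail = &(*tail)->cdr;
    }
    va_end(ap);
    return head;
}

/* assoc-style lookup: scan the cdr of `table` for an entry whose car is
   the symbol `key`. A NULL table, or no match, yields NULL, so calls
   can be nested. */
static Obj *lookup(const char *key, Obj *table) {
    if (table == NULL || table->kind != PAIR)
        return NULL;
    for (Obj *p = table->cdr; p->kind == PAIR; p = p->cdr) {
        Obj *entry = p->car;
        if (entry->kind == PAIR && entry->car->kind == SYM &&
            strcmp(entry->car->name, key) == 0)
            return entry;
    }
    return NULL;
}

/* Iterate over the side records: handle the car, move on to the cdr. */
static int count_sides(Obj *conf) {
    Obj *sides = lookup("sides", conf);
    int n = 0;
    if (sides == NULL)
        return 0;
    for (Obj *p = sides->cdr; p->kind == PAIR; p = p->cdr)
        n++; /* p->car is one record, e.g. (good (restrictions ...)) */
    return n;
}

/* Part of the example file, built by hand in place of Guile's read. */
static Obj *example_config(void) {
    return list(4, sym("configuration"),
        list(2, sym("difficulty-level"), num(7)),
        list(4, sym("sides"),
            list(2, sym("good"),
                list(3, sym("restrictions"), sym("no-nukes"), sym("no-poison"))),
            list(2, sym("bad"), list(2, sym("restrictions"), sym("none"))),
            list(2, sym("ugly"), list(2, sym("restrictions"), sym("no-teeth")))),
        list(2, sym("graphics"),
            list(3, sym("size"),
                list(2, sym("x"), num(550)),
                list(2, sym("y"), num(490)))));
}
```

Here lookup("y", lookup("size", lookup("graphics", example_config()))) returns the leaf (y 490), and count_sides(example_config()) returns 3. Memory is never freed in this sketch; in Guile, the garbage collector takes care of that.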
You can also easily define a recursive check_table that verifies that something you loaded is in fact a table structure, in order to trap errors early if a user has screwed up his config file.
The only things I don't use lookup for are the iteration as described above (but even then I use lookup to find the iterative definition, and then use it again to parse each element in the iterative structure), and to get the bottom level data out of the "leaves", e.g. to parse:
(y 490)
I have a library function get_int that accepts a leaf of the form (key integer), extracts the second element, and converts it from a Scheme integer to a C integer, and similarly for other atomic data types.
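The check_table and get_int helpers described above can be sketched in C as well. This is a hedged sketch, not the poster's code: it assumes a hypothetical cons-cell type (Obj, built with the helpers num, sym, and list) in place of Guile's SCM values, and it treats a table as valid when it is a proper list headed by a symbol whose remaining elements are either all sub-tables or all atoms, which is one reasonable reading of the configuration format shown earlier.

```c
#include <stdarg.h>
#include <stdlib.h>

/* Hypothetical minimal s-expression type standing in for Guile's SCM
   values (NOT Guile's API): integers, symbols, pairs, empty list. */
typedef enum { NIL, INT, SYM, PAIR } Kind;

typedef struct Obj {
    Kind kind;
    long num;              /* valid when kind == INT */
    const char *name;      /* valid when kind == SYM */
    struct Obj *car, *cdr; /* valid when kind == PAIR */
} Obj;

static Obj empty_list = { NIL, 0, NULL, NULL, NULL };

static Obj *make(Kind kind) {
    Obj *o = calloc(1, sizeof *o); /* error checks omitted in this sketch */
    o->kind = kind;
    return o;
}
static Obj *num(long n) { Obj *o = make(INT); o->num = n; return o; }
static Obj *sym(const char *s) { Obj *o = make(SYM); o->name = s; return o; }

/* list(n, a, b, ...) builds the proper list (a b ...). */
static Obj *list(int n, ...) {
    Obj *head = &empty_list, **tail = &head;
    va_list ap;
    va_start(ap, n);
    for (int i = 0; i < n; i++) {
        Obj *cell = make(PAIR);
        cell->car = va_arg(ap, Obj *);
        cell->cdr = &empty_list;
        *tail = cell;
        tail = &cell->cdr;
    }
    va_end(ap);
    return head;
}

/* Recursively check the (key data ...) shape: a proper list headed by
   a symbol, whose remaining elements are either all sub-tables or all
   atoms. Returns 1 if `o` is a valid table, 0 otherwise. */
static int check_table(const Obj *o) {
    const Obj *p;
    int branch;
    if (o->kind != PAIR || o->car->kind != SYM)
        return 0;
    p = o->cdr;
    if (p->kind != PAIR)
        return 0;               /* (key) with no data */
    branch = (p->car->kind == PAIR);
    for (; p->kind == PAIR; p = p->cdr) {
        if (branch ? !check_table(p->car) : p->car->kind == PAIR)
            return 0;           /* bad sub-table, or atoms mixed with tables */
    }
    return p->kind == NIL;      /* reject improper (dotted) lists */
}

/* Like the poster's get_int: take a leaf (key integer) and store the
   integer in *out. Returns 1 on success, 0 if the leaf has another shape. */
static int get_int(const Obj *leaf, long *out) {
    const Obj *rest;
    if (leaf == NULL || leaf->kind != PAIR)
        return 0;
    rest = leaf->cdr;
    if (rest->kind != PAIR || rest->car->kind != INT || rest->cdr->kind != NIL)
        return 0;
    *out = rest->car->num;
    return 1;
}
```

In a setup like the poster's, check_table would run right after the file is read, so a broken config is reported before any lookup runs. With real Guile you would use its own type predicates and conversion functions rather than this toy type.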
Also nice: Guile does garbage collection, so you can splice things together out of the configuration and throw away the husks without having to explicitly free all the trees of objects that you created.
--
Indexed comics (Score:1)
Another example of how most rating/moderation systems suck, like the way Slashdot comments/moderation have a very low signal-to-noise ratio. Just adding a bit more noise, thank you!
Re:Easier online distribution? (Score:1)
To this end, Keenspot/space's AutoKeen is brilliant (developed by Nukees [nukees.com]' Darren Bleuel)... artists don't have to worry about programming, they just have to insert simple tags in the HTML that do a lot of behind-the-scenes work.
If it was a pressing issue, Darren could probably develop an AutoKeen indexing system. I'm not sure how useful that would be for some comics... indexing my own strip would be like indexing a novel.
Re:Will this enable me to understand Zippy? (Score:2)
Honestly, Zippy isn't that hard to get if you realize that essentially every joke is based on a reference to pop culture, generally from the '60s or '70s. Of course, there are lots of times that I "get" the joke but don't find it particularly amusing.
Re:Will this enable me to understand Zippy? (Score:2)
At least two:
"HELLO KITTY gang terrorizes town, family STICKERED to death!"
(Personally, I was exposed to Zippy quotes before seeing the actual Zippy comic strip. I had been hoping that most of the quotes would actually make sense when taken in context.)
Re:How about some Marvel? (Score:1)
http://www.marvel.com/dotcomics/ [marvel.com]
Marvel has made online editions of some of its Ultimate line available. Issue #1 of both Ultimate Spider-Man and Ultimate X-Men is up there. Check them out; they are both excellent.
The way comics were meant to be read... (Score:3)
<STRIP>
<PANEL>
</PANEL>
<CHARLIE_BROWN ACTION="RUNNING"></CHARLIE_BROWN>
<LUCY ACTION="HOLDING_FOOTBALL"></LUCY>
<PANEL>
<CHARLIE_BROWN ACTION="RUNNING"><THINKING_BUBBLE TEXT="I'm going to kick it this time!"></THINKING_BUBBLE></CHARLIE_BROWN>
<LUCY ACTION="HOLDING_FOOTBALL"><GRIN STYLE="MISCHIEVOUS"></GRIN></LUCY>
</PANEL>
<PANEL>
<CHARLIE_BROWN ACTION="FALLING"><SCREAM TEXT="WAUUUGGHH!!!"></SCREAM></CHARLIE_BROWN>
<LUCY ACTION="YANKING_FOOTBALL"></LUCY>
</PANEL>
</STRIP>
Well, if that ain't funny, I don't know what is...
-----------------------
Re:ANOTHER grammar? (Score:1)
I honestly don't see XML as much of a step forward. I'm not saying it's bad; I just don't think it solves any important problems.
-- Kris
Re:comics in one place (Score:1)
Does anybody read "real" comics anymore? (Score:4)
The only comics that do not heavily use panel layout are the 3-6 panel comics found in newspapers. All of the mainstream comics that are popular on the newsstand from Marvel, DC, or any of the other publishers require laying out 28-32 pages with ~6 to 10 panels per page.
Panels are not necessarily rectangular, and they may not align nicely. ComicsML seems to actually reduce the expressiveness of a dead-tree medium for the sake of making it techie-cool with XML.
an unabashed comics fan,
vic
re: Dope Dragon (Score:1)
Will this enable me to understand Zippy? (Score:2)
I mean, Comics I Don't Understand [crimeweek.com] is a useful resource but it assumes that the strip makes at least a particle of sense.
Although I suspect Scott Adams was right -- Zippy has one joke and it's on the reader.
Unsettling MOTD at my ISP.
Re: Dope Dragon (Score:1)
What does it accomplish? It would let you create and edit animation, maybe even personalize or distribute it.
It's all about pictures; integrate a smooth story line. How hard is that?
Re:ANOTHER grammar? (Score:2)
> Oh, you can zip it? Great, let me run out and link the zip libraries into my application. What? There's licensing issues? Well, what do I do now?
ZIP [info-zip.org], zlib (for the gzip format) [gzip.org], and bzip2 [redhat.com] are all available under very liberal free licenses (no copyleft restriction, OK to use in both closed and open source software).
gzip and bzip2 aren't difficult to use for intermediate (1 to 2 years of experience) C programmers either. I don't know about ZIP because I've never used it, but it's probably not much harder.
Re:ANOTHER grammar? (Score:2)
XML seems like a fairly decent way to store our configuration info (and it will allow an overall configuration to link to other sub-configurations, which is nice for our app.)
But your comment on Guile got me thinking that XML is probably overkill. Perhaps the flexibility of a full programming language would be beneficial for configuration, although Guile may not be meant to solve the same problems as XML.
Your post got me to (re-)look into Guile, but I was wondering if you (or anyone) had any more specific thoughts on what formats to use for configuration files, and what in particular you do with Guile that replaces what you would have done with XML.
Re:Hmm... this pretty blatantly ignores manga. (Score:3)
Thanks for the feedback. I need to clear up this statement. I didn't mean that I based ComicsML's first tagset on Western comics ideas to the exclusion of all else, but rather that I built it from what I knew best. I labeled that 'Western' because I'm not nearly as familiar with manga; I know only enough to know that Eastern comics have developed their own set of idioms, and I didn't want to look like I was ignoring them. (Ironically.)
It's important to note that ComicsML's panel-description markup describes what's going on logically, not physically. So there's no giant-sweatdrop tag, any more than there's a Western-style sweat-flying-off-the-forehead tag. Instead, ComicsML would have a this-character-is-nervous tag, or something similar. Things like these are visual idioms that are crucial to the comic, but not so appropriate to its descriptive markup.
As for the other issues you raise, about unusual layout and non-verbal balloons, these are both examples of the many challenges and questions ComicsML has ahead of it. It's pretty much open to all suggestions right now, and I'm glad you brought these up! Now I invite you and other interested parties to bring them up by email to me, or on the ComicsML mailing list (see esp. the ComicsML resource page [jmac.org]), instead of on Slashdot, where they'll go away in a couple of days. ;)
> Bzzt! So sorry, but you lose! Please play again, McIntosh-san!
Okay, thanks.
J
MacOS Open Source [jmac.org]
Re:ANOTHER grammar? (Score:1)
Whew. That's a relief. When I saw that an AC had replied, I was afraid I was about to get flamed for recommending Guile over XML.
--
What a woefully uninformed troll (Score:2)
First of all, XML documents don't need to conform to any DTD in order to be parsed or be useful. Documents that do specify a DTD give its public and/or system identifier, so the DTD can be obtained from the network if it isn't present locally. That's why you distribute the DTD with the program: the remote location is essentially a backup, in case a local copy can't be found. There is no need to hit a remote server to parse or validate an XML document. No developer in his or her right mind would intend or require this.
I guess this is as opposed to "superior" minds who spend their time grokking knock-off Unix-isms a decade or two out of date. Are you really making this argument (in public, no less)? XML is a simplified version of SGML, which has been around for years and is NOT easy to wrap your brain around if you're not a "document head". XML was designed to eliminate the infrequently used complications of SGML and make it suitable for everyday use, without losing SGML's underlying advantages. Because of this, it is fairly straightforward, but that is exactly its beauty. XML is human-readable and robust, both huge advantages, not least in distributed computing, which is why we're seeing it all over the place now.
If it's as "consistent" and "simple" as you indicate, then why is it so hard to parse? This is trolling at its best. The thing that makes XML so productive, and a significant advance in the state of the art, is that you simply link in the pre-built, ready-to-run XML parser of your choice and it does all the parsing work for you. XML parsers exist for every language under the sun. The idea is that instead of writing your own code to manipulate the low-level structure of your data, you use someone else's standard code and worry only about the content of the data.
Let me say this again: There is no need to hit a remote server to parse or validate an XML document. You are just plain wrong.
Wrong. Your users will thank you for using XML, because it's human-readable: they can actually see the data that's being stored and used by your application. They will thank you because the format of the data is readily apparent, and the data can be used by other applications simply by parsing the XML document. What are you smokin', Joe?
Re:Does anybody read "real" comics anymore? (Score:2)
It's not intended to *replace* the comic; it's intended to sit alongside the markup and supply easily indexable information, such as the authorship details and even the dialogue, in much the same way that META tags on a site can store non-visible data for search engines.
-Ciaran
Re:Ten-fold, eh? (Score:1)
Anyone could have written a spec/grammar for some application domain and implemented a parser in lex/yacc very easily. I don't see what big step forward XML represents, except that you don't need to write the yacc grammar anymore. Instead you have to write the DTD and use XSL to transform the input, or use an XML parser to get some internal tree representation of your input (which is possible with yacc tools too).
Then the question is: is a yacc grammar so much more difficult to write than a DTD? I can't tell for sure, since I'm quite experienced with yacc and not with DTDs, but I doubt it.
Re:Usar Freindly? (Score:1)
OMG!
That was the funniest thing I've seen all day!
Re:ANOTHER grammar? (Score:1)
(i) you don't understand XML very well
(ii) you haven't done much with XML parsers
A few points... (I'm not going to bother copy/pasting your comment)
(i) DTDs don't have to be remote.
(ii) Most XML apps never touch the parser code, just its API (e.g. libxml or the like).
(iii) XML solves a lot of the problems that earlier, 'looser' SGML-based standards such as HTML brought on, for instance HTML's optionally closed tags. That's why HTML was reformulated in XML (XHTML): TO MAKE IT EASIER TO PARSE!! I'm not going to venture so far as to say that writing XML parsers (from scratch) is easy. I have written one in the past, and I'm working on one right now. It is not cake. But it is far from difficult, and the benefits of a very portable language are worth it.
(iv) Why the hell does an XML grammar have to be standardised by the W3C? There are tons of these grammars out there. And that's exactly what I think the W3C had in mind when they created XML: to have those specialized languages parsable WITH ONE PARSER.
I have a new bandwagon for you, "the bandwagon of people who know little about a technology but blindly bash it for exactly that".
I'm sorry, man, but I'd suggest you stop by http://www.w3.org/ and do some reading.
Not out of luck for Dilbert! (Score:1)
See http://www.bfmartin.com/finder/ [bfmartin.com]
It even includes a downloadable XML index, though it's not as sophisticated as what's proposed.
BFMartin
Overtired (Score:1)
I guess that makes HTML the charred-black pot.
"come off crisp and play up to the cynic
clean and schooled right down to the minute"
Do it yourself (Score:1)
That's why my startup page has a lot of comics obtained directly from several newspapers. All you have to do is put the URL of the strip you want in the SRC attribute of an IMG tag.
This can't be done with the many comics that change the image's name daily, unless the new name is a function of the date (month-day-year). In that case, you can use a little JavaScript to compute the name and grab the image.
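The date trick works because the image name can be computed from the day. The poster would do this in JavaScript inside the page; to stay with C, the one compiled language already in this thread, here is the same date-to-filename step as a small function. It is a sketch under loud assumptions: STRIP_BASE and the MM-DD-YYYY.gif naming pattern are made up for illustration, and strip_url is a hypothetical helper, not any newspaper's real naming scheme.

```c
#include <string.h>
#include <time.h>

/* Hypothetical site layout: the real base URL and date pattern differ
   from paper to paper, so check how each one names its daily images. */
#define STRIP_BASE "http://comics.example.com/strips/"

/* Write STRIP_BASE followed by "MM-DD-YYYY.gif" for `day` into buf.
   Returns the length written, or 0 if buf is too small. The base is
   copied verbatim rather than passed through strftime, so a '%' in the
   URL is never taken as a conversion specifier. */
size_t strip_url(char *buf, size_t len, const struct tm *day) {
    size_t base = strlen(STRIP_BASE);
    size_t n;
    if (base >= len)
        return 0;
    memcpy(buf, STRIP_BASE, base);
    n = strftime(buf + base, len - base, "%m-%d-%Y.gif", day);
    return n == 0 ? 0 : base + n;
}
```

For today's strip you would fill the struct tm with time() and localtime(), then drop the result into an IMG tag's SRC attribute.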
Re:comics in one place (Score:1)
Evan Reynolds evanthx@hotmail.com
Re:How about some Marvel? (Score:1)
Re:But can it do this...? (Score:1)
Re:ANOTHER grammar? (Score:1)
Comment time warped back to 1994 so author can understand the validity of what they wrote
I'm sorry to go off on such a rant, but I am SO tired of everything being done in an HTML format. It's not that it's a particularly great solution, it's just that it's the new hot standard. Furthermore, let's face it, HTML is real easy. So easy that very mediocre minds can grasp it and feel like they're "on top" of the current technological trend. Puh-leeze
I think the author of this rant has forgotten what happens when you democratize a technology and make it available to "very mediocre minds."
Re:fp? (Score:1)
-----
How about some Marvel? (Score:1)
Sure, we're good at illegally swapping songs, but what about other media formats?
Dilbert (Score:1)
Comics copyrights? (Score:3)
Unless we're talking only indy artists (I doubt United Features Syndicate would want Peanuts strips easily travelling, and then being searched, on the web).
Re:ANOTHER grammar? (Score:2)
You do realize that's what XML is about, right? By itself, XML is no more useful than plain old SGML (though the syntax is nicer). Without grammars, XML is pointless for sharing data. Sure, you can use XML to do menial little things like handling configuration for an application, but where it really shines is the ability to specify a set of rules for marking up different types of data. Having multiple grammars Just Makes Sense (tm), as a single grammar can't be expected to gracefully handle all the many different applications for XML.
As for server downtime causing parser problems, I see two ways around it -- either distribute your schema so that others can download it and use it locally with their parsers, or have some method of "certifying" schemas, which would then be hosted somewhere stable like w3c.org. As the latter most likely won't happen save for the most visible of schemas, I think the former has the most potential. Sure, there are potential versioning problems, but those can be worked around.
comics in one place (Score:3)
I can see it now... (Score:3)
(ActionBubble)POW!!!(/ActionBubble)
(ActionBubble)ZAP!!!(/ActionBubble)
(ActionBubble)BANG!!(/ActionBubble)
(SpeechBubble)You are no match for my Kung Fu skills!!!!(/SpeechBubble)
I am just waiting for... (Score:1)
'This is meant to be funny and is not a flame or troll...'
PLIF has the best search I've seen (Score:3)
--
Webcomics.com has hundreds of great comics (Score:1)
For over six years we have supported Netscape channels for both our top comics and our newest comics, along with other types of syndication, both inbound comics and outbound listings. We'd gladly support a ComicsML that makes sense.
We've already seen that there are lots of issues with providing a viable living for online cartoonists. I don't know if this is the answer, or even part of the answer, but I'm all for taking a look.
ANOTHER grammar? (Score:5)
I'm sorry to go off on such a rant, but I am SO tired of everything being done in an XML format. It's not that it's a particularly great solution, it's just that it's the new hot standard. Furthermore, let's face it, XML is real easy. So easy that very mediocre minds can grasp it and feel like they're "on top" of the current technological trend.
Puh-leeze
As a result we now have a plethora of half-baked, almost-finished grammar specifications littering the internet landscape and plugging up the W3C standards pipelines.
I'm making a prediction. Most of these standards will either (1) be forgotten or (2) be rushed through and signed off as standards. I hope and meditate for the first.
XML is great for some types of data, but its advocates are so blinded by its simplicity and consistency that they overlook flaws immediately obvious to more experienced developers. Despite the press, XML is NOT that easy to parse. The same hassles we experience with HTML parsers are magnified tenfold. Furthermore, it often depends on grammar definitions that reside on remote servers. This introduces all the hassles of network-based programming into what should be simple standalone client applications. Finally, it's big. I mean REAL big. Oh, you can zip it? Great, let me run out and link the zip libraries into my application. What? There's licensing issues? Well, what do I do now?
Please, for pete's sake, when you feel the temptation to create another XML grammar, think about what you are doing. Just say no. Your users will thank you.
Somewhat offtopic -- why UF? (Score:1)
Give me Sluggy Freelance.
----
Dope Dragon (Score:1)
I would write the DTD to define backgrounds or layers of images and then plot their movement. If you could define songs or sounds for elements, and then define the timing of each element, you could put animations together in a format that could be easier to write than, say, a POV-Ray file, with pretty cool and consistent content in less time.
It's a bit like scripting a scene rather than drawing it, but it makes me think more of the Sandman [mayhem.com] comic than of Piranha Club. [piranhaclub.com]
Re:ANOTHER grammar? (Score:1)
FWIW (not much, really), I did exactly one hobby project in XML. Yech.
I switched to Guile [gnu.org] immediately afterward. You can use it just like a markup language if that's all you want to do, and you'll find it way easy to parse. As an added bonus, you can embed code in your data/documents/stylesheets if that's what you want. (Watch out for viruses, though.)
--
XML Schema/DTD Proliferation (Score:1)
--
Peanuts no... Duplex yes! (Score:3)
United Media may not want that, but the other major comic syndicate (Universal Press Syndicate, whose site is Uexpress) seems to have a good attitude about it...
Both syndicates have always kept one month of each strip available, but last year the Uexpress website (www.uexpress.com) made a drastic change.
Last November, they put all of their comics online in a 'back issue' format. Instead of only showing one month of strips, you can go all the way back to 1996 (or whenever their website started carrying the strip; Duplex goes back to August 1996). Calvin and Hobbes is being carried in its entirety (more or less: they are revealing one strip at a time, offset by 11 years from the original strip date, so today's strip is from April 18, 1990, but it starts at November 17, 1985).
Contrast this with BC or Meg, which are so paranoid that they obfuscate the strip filename in a lame attempt to prevent anyone from using a robot to download the strip.
You may not be able to get Dilbert or Peanuts, but it wouldn't surprise me if Uexpress.com indexed their comics like this.
Hmm... this pretty blatantly ignores manga. (Score:4)
I think this line pretty much speaks for itself, but I will raise a few more points. The internet has allowed comics to pretty firmly break the traditional limitations of print. This DTD seems to want to codify everything inside those old limitations. That's a pretty limiting point of view, I think.
Where are the tags to show art that crosses multiple panels? Where are the tags to show 'visual' thought bubbles? Where is the anime-style giant sweatdrop tag? Where are the tags to show 'emotional' sound effects, as are often displayed in manga and manga-based comics?
Unfortunately, this DTD pretty firmly ignores everything that doesn't fit western newspaper-style comics, despite the fact that the author wants to let people break out of those old traditions.
Bzzt! So sorry, but you lose! Please play again, McIntosh-san!
XML doc types are not a standard! (Score:1)
So you're right, this isn't any different than proprietary browser tags, but why would you expect it to be? Using XML to describe your data doesn't mean that you need to submit yourself to a standards body for approval.
I hear all sorts of people make the absurd statement that their application is XML compliant and therefore somehow more "open". Why someone would believe this is beyond me.
XML-based standards are another thing altogether. For example, SOAP is a (proposed) standard that is represented using XML, but the fact that it uses XML doesn't make it a standard.
Invisible Agent
But can it do this...? (Score:2)
Re:comics in one place (Score:2)
You can do this with a neat Perl program called NewsClipper [newsclipper.com]. There's support for a ton of comics [newsclipper.com] and it's really easy to use.
It's not just for comics, actually. It can grab headlines from tons of sites - news, weather, games, technology (/. included). It's a nice program.
Re:ANOTHER grammar? (Score:5)
> By the Great Spirit, do we really need another XML grammar?
It's a schema (unfortunately, apparently just a DTD). Some background, which is probably unnecessary for you but may help others who will misread your message and assume you are implying that there are all these crazy "standards" for XML: XML is merely the basic set of rules by which data is encapsulated. Without agreeing on standard ways of organizing the data (schemas), you really haven't achieved much, and this is something most XML zealots and detractors fail to understand. If I said "Give me your resume in that new whiz-bang `XML format'", I would have achieved nothing and would get a huge mess of sloppy data piles that would have to be analyzed, etc. One might be a binary Word document enclosed in the root tag, while another might be hierarchical with tonnes of attributes, etc. ("object oriented"). If, on the other hand, I said "Give me your resume conforming to the schema blah blah" (giving a namespace like http://www.hr-xml.org/blahblah.xsd), you would know EXACTLY the format your data should be encapsulated in: that the xsd:timeInstant field is ISO 8601, that the character set can be encapsulated just so, that the character field must follow specific rules, that it must have this set of fields in this order. I can then build a validation engine that asks "Does this conform to the rules?", and from there it can parse through the document, sucking out the values into the HR database. (The engine will obviously have a local copy of that schema. I can't see any situation where you'd be working with remote schemas; at most you will view the location as a namespace. Once published, schemas become like a COM interface: immutable. When I say that my program conforms to MonkeySchema 1.0 at the namespace location blah blah, that is intrinsically part of the logic of the program, and there's no way I would fetch that schema every time I wanted to parse something.)
XML + XSD is the standard, and an absolutely brilliant one, that despite the frothy rantings of critics, is incredibly valuable. XML+XSD+XSLT is offering a solution that the industry has never had, certainly not this evolved.
As mentioned, though, the true power of XML is really in the schemas: the standard way of defining the data (see http://www.w3.org/XML/Schema [w3.org]). When people have such a clearly defined, standard method of describing communications between two products, that is of immeasurable value. The idea behind each of these schema standards is to do exactly that: start agreeing on some basic standard schemas. See Biztalk.org [biztalk.org] for examples.
Your opposition sounds primarily to be fear of change (which is one of the most imposing software development problems). XML+XPath+XSLT+XSD is quite a load to learn, so firstly I take issue with your claim that XML is "simple" : Can it be simple to start into? Absolutely. Just like C can be easy to start into.
#include <stdio.h>

int main(int argc, char *argv[]) {
  printf("Hello world!\n");
  return 0;
}
Does that diminish the power of C? Not in the slightest. XML represents a great leap forward in our ability to describe the data that we pass back and forth, and the standards are of enormous value.
BTW: Obviously, some fully spec'd XML compression will eventually be standardized, such as XMill [att.com]. XML is heavily compressible: often, paradoxically, more so than the same data stored in a proprietary format. However, it is usually a moot point, because XML usually comes into play where no communication was taking place before.
Cheers!