Uhm...to say that there is no unified protocol for video and voice on XMPP just doesn't match reality.
The Jingle specs are fairly universal in the XMPP world. Google's implementation, interestingly enough, is actually a bit out of date at this point, but they've promised to update to the Jingle specs once the XSF has settled them, which has only happened pretty recently.
Other clients that support some level of Jingle A/V, some of them audio only (and remember, there's basically no support needed at the server level for any of this), are Psi, Coccinella, Spark (on Windows), and now Pidgin. Talkonaut is a mobile (WinMo and Symbian) client that does Jingle voice. More niche clients with support include some of the IP PBX systems like Asterisk and FreeSWITCH. Other clients are listed in various places as supporting it, but I don't know the degree of that support, so I'm not going to list them...others can speak up if they know better.
iChat is definitely the outlier in the XMPP world for not supporting Jingle, or at least something Jingle-like (Google hasn't moved up to the standard as specified yet, as I said).
Oh, and just to knock down a bit of bias...I'm typing this on a Mac, so ostensibly, I'm one of those snobby iChat users as well, except that I don't use it.