Why was my characterization of their approach "hardly fair"?
You called it cheating.
Someone -- either the researchers or their press people -- decided to hype it as finding a general failing in DNNs (or "AI" as a whole).
It pretty much is. If you feed in data far away from the training set, you'll wind up at an essentially arbitrary point relative to the decision boundary.
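A toy illustration of that point (the weights and inputs here are made up; a linear logistic scorer stands in for a real network): far from the data, the score saturates, so the model is maximally "confident" about garbage.

```python
import numpy as np

# A toy linear classifier, conceptually trained on data near the origin.
w, b = np.array([1.0, -0.5]), 0.1

def confidence(x):
    # Logistic "confidence" for the positive class.
    return 1.0 / (1.0 + np.exp(-(w @ x + b)))

near = np.array([0.5, 0.2])        # plausible input, near the training data
far  = np.array([500.0, -300.0])   # garbage, far outside it
print(confidence(near))            # moderate
print(confidence(far))             # saturates near 1.0: confident on arbitrary input
```

The same saturation happens with deep nets, just with a more contorted boundary.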
The research is not at all what it is sold as.
The research shows very nicely that the much-hyped deep learning systems are no different in many ways from everything that's come before. They have a few lovely illustrations of things that fool them, some of which are what you'd get if you follow the decision boundary a good way from the data, rather than jumping in at a random point.
I'd say there's not a huge amount novel in the research, but it's certainly not cheating.
Don't multi-class identification networks typically have independent output ANNs, so that several can have high scores?
My understanding is that they usually have one output node per class, but the previous layers are all common to the different classes.
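A minimal numpy sketch of that layout (layer sizes and weights are placeholders, not anything from the paper): one shared hidden layer feeds one output node per class, and a softmax couples the outputs, so a 99+% score on one class forces the rest to be low.

```python
import numpy as np

rng = np.random.default_rng(0)

# Shared trunk: a hidden layer common to all classes
# (random placeholder weights; a real net would learn these).
W_hidden = rng.normal(size=(784, 128))
# One output node per class, all fed by the same hidden layer.
W_out = rng.normal(size=(128, 10))

def class_probs(x):
    hidden = np.maximum(0, x @ W_hidden)   # shared ReLU features
    logits = hidden @ W_out                # one score per class
    # Softmax: scores become "confidences" that sum to 1,
    # so the outputs are coupled, not independent.
    e = np.exp(logits - logits.max())
    return e / e.sum()

x = rng.normal(size=784)    # a dummy "image"
p = class_probs(x)
print(p.sum())              # 1.0
```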
I assumed, perhaps incorrectly, that the 99+% measures they cited were cases where only one output class had a high score, and the rest were low.
I'd expect that too.
If they were effectively using single-class identifiers, either in fact or by considering only the maximum score in a multi-class identifier, ...
Isn't that usually how it's done? You have a bunch of outputs whose strengths indicate class/not-class for a bunch of classes, then you take the max over them to find out which class is dominant. Most ML algorithms are generalised to multiclass by using a one-versus-all or one-versus-one scheme like that (usually the former, since the latter has a quadratic cost).
Only relatively few (e.g. trees, and therefore forests) naturally support multiple classes.
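The one-versus-all scheme described above can be sketched like this (the scorers here are random linear weights standing in for K separately trained binary classifiers):

```python
import numpy as np

rng = np.random.default_rng(1)
n_classes, n_features = 4, 8

# One binary "this class vs. rest" scorer per class.
# One-vs-all needs K of these; one-vs-one would need K*(K-1)/2 pairs.
binary_scorers = rng.normal(size=(n_classes, n_features))

def predict(x):
    # Score x against every one-vs-all classifier...
    class_scores = binary_scorers @ x
    # ...then take the max: the most confident scorer's class wins.
    return int(np.argmax(class_scores))

x = rng.normal(size=n_features)
print(predict(x))   # index of the dominant class, in 0..3
```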