"It is not acceptable to assume that it "just works" in the absence of evidence." No argument there, but where's the evidence that standardized tests are a valid metric for students, let alone entire schools? There are piles of studies showing that increased pressure results in poorer-than-normal performance. I don't see any reason to believe that high-stakes testing is a valuable metric for anything; its use also relies on unconfirmed and dubious assumptions.
As to sorting out where things are applicable and where they are not, again I can't argue, but... how do we separate cases where something is useful by itself from cases where it's only useful in conjunction with other factors? For example, what if smaller class sizes are only useful if you take advantage of the fact that they enable differentiated instruction? I'm reluctant to rely on a gut feeling of what "should" work, but how much better is a study that doesn't look at its data in context? (And of course, I don't know that it didn't, but off the top of my head, these are some of the things I'd like to see addressed.)