[Michael Mann's] original hockey stick graph [...] has been substantially borne out by any number of subsequent studies using different data sets.
No it hasn't. At least, not if by "different data sets" you mean data sets that don't intersect with his. The long-term hockey-stickness of the result comes from just a few of the same cherry-picked proxy sets reused over and over again in slightly different combinations.
Suppose Mann does a study using ten data sources, including one that goes back a long time and has a hockey-stick signal - say, the Graybill bristlecone series - and nine others that just contribute noise to the mix. If some other researcher does a study that *also* includes Graybill's strip-bark bristlecone pines (a type of data the National Academy of Sciences said "should be avoided") but swaps out one or more of the nine "random noise" series, it'll probably still have a similar shape. That doesn't mean Mann's conclusion was correct; it might just be GIGO. The output follows from the shared input.
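The arithmetic behind this can be checked with a toy simulation - a sketch with made-up numbers, not any study's actual data or method. Average one shared "hockey stick" series with nine pure-noise series, then swap the noise out for a different batch: the blade survives either way, and disappears only when the shared series is removed.

```python
import random

N = 1000  # years, hypothetical

# Invented "hockey stick" proxy: flat for 900 years, sharp rise in the last 100.
stick = [0.0] * 900 + [5.0 * i / 99 for i in range(100)]

def composite(seed, include_stick=True):
    """Mean of ten 'proxy' series: nine pure-noise series plus (optionally) the shared stick."""
    rng = random.Random(seed)
    series = [[rng.gauss(0, 1) for _ in range(N)] for _ in range(9)]
    if include_stick:
        series.append(stick)
    n = len(series)
    return [sum(vals) / n for vals in zip(*series)]

def blade(recon):
    """Modern-era uplift: mean of the last 100 years minus mean of the first 900."""
    return sum(recon[900:]) / 100 - sum(recon[:900]) / 900

a = composite(1)                        # one researcher's mix of proxies
b = composite(2)                        # different noise proxies, same shared stick
c = composite(3, include_stick=False)   # the stick series removed entirely

print(blade(a), blade(b), blade(c))
```

Both composites that share the stick show a clear blade of similar size, because the independent noise averages toward zero while the one shared shape survives; the composite without the stick shows essentially none. Agreement between the two "studies" tells you about the shared input, not about the climate.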
One problem here is that we don't have a lot of really long data sets. The few we do have have been snooped and massaged a hundred ways: researchers *already know* what shape the inputs have before they do the analysis. Reusing data sets this way invalidates a lot of the standard verification statistics.
Another problem is that the people doing these studies don't seem to specify clear and objective input criteria. The researchers just pick a bunch of series they happen to have convenient access to, so there's no way to exclude the possibility that unconscious bias steered the selection toward input sets that produce the results they wanted. The fact that they keep reusing the *same* sets over and over, even when others are available, does tend to argue in that direction.
Another problem is that different people are using different definitions of "hockey stick" and of what it means for one result to be "like" another. RealClimate likes to emphasize the "blade" part, so for them even borehole studies that only go back to 1600 are said to "confirm" a HS - even though all that shows is that it's warmer now than it was in the Little Ice Age, which no skeptic ever doubted. For the skeptics the main issue is the "shaft" part: how large was the variance over the last two thousand years? Long-term proxy studies that get their main shape from tree rings tend to be flat in the past, because tree rings aren't good long-term temperature proxies. They tend to have a sudden upswing in the modern era, because some "calibration" step or arbitrary ad-hoc selection data-snooped the choice of a particular set of trees that happen to have a recent growth pulse. And they don't drop again at the end, because the last bit of data ("regression towards the mean" in a set that suddenly jumped due to a random growth pulse) is arbitrarily discarded and replaced with the instrumental record ("Mann's trick" to "hide the decline").
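That ad-hoc selection step can also be illustrated with a toy simulation - invented numbers again, not any study's actual procedure. Start from pseudo-proxies that are pure random walks containing no temperature signal at all, keep only the ones that happen to correlate with a rising "instrumental" record over the last century, and average the survivors: the composite acquires a modern upswing purely from the screening.

```python
import random

n_series, n_years, cal = 1000, 600, 100  # hypothetical counts; last 100 years = "calibration"

def random_walk(rng, n):
    """A pure-noise pseudo-proxy: cumulative sum of unit Gaussian steps."""
    level, out = 0.0, []
    for _ in range(n):
        level += rng.gauss(0, 1)
        out.append(level)
    return out

def corr(xs, ys):
    """Pearson correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

rng = random.Random(0)
walks = [random_walk(rng, n_years) for _ in range(n_series)]

# Express each series as an anomaly from its own pre-calibration mean,
# the way reconstructions use a base period.
walks = [[v - sum(w[:-cal]) / (n_years - cal) for v in w] for w in walks]

# A rising "instrumental record" covering only the calibration window.
instrumental = [i / (cal - 1) for i in range(cal)]

# Screening step: keep only the pseudo-proxies that happen to match it.
selected = [w for w in walks if corr(w[-cal:], instrumental) > 0.5]

recon = [sum(vals) / len(selected) for vals in zip(*selected)]
uptick = sum(recon[-cal:]) / cal - sum(recon[:-cal]) / (n_years - cal)
print(len(selected), uptick)
```

A nontrivial fraction of trendless random walks passes the screen, and their average rises in the calibration window by construction, even though no individual series contains any temperature information. Screening on the target, then averaging, manufactures a blade.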
Long-term (thousand-year or longer) proxy studies that don't get their primary shape from tree rings tend to show larger variance in the past: a warmer Medieval Warm Period (roughly as warm in AD 1000-1100 as in recent decades) and a colder Little Ice Age than Mann found.
One example of a study that doesn't rely on tree rings for its fundamental shape is Moberg et al., "Highly variable Northern Hemisphere temperatures reconstructed from low- and high-resolution proxy data", Nature, Vol. 433, No. 7026, pp. 613-617, 10 February 2005. Quote: "According to our reconstruction, high temperatures - similar to those observed in the twentieth century before 1990 - occurred around AD 1000 to 1100, and minimum temperatures that are about 0.7 K below the average of 1961-90 occurred around AD 1600. This large natural variability in the past suggests an important role of natural multicentennial variability that is likely to continue." Here's the reconstructed shape found by Moberg (gif). This study was criticized by the RealClimate gang; Moberg's response to those criticisms is this followup study. Quote: "Hence, the M05 approach does not routinely inflate low-frequency variance."
Another example is the Loehle study. RC also criticized that one, and Loehle wrote a followup study which addressed all their concerns and had substantially similar findings, but the corrected version seems to be behind a paywall - e.g., here. Here's the reconstructed shape found by Loehle (gif).