You are correct that the temperature observations in the four datasets are not all statistically independent. There are only so many weather observation stations in the world, so some overlap in the raw data behind the datasets is inevitable. That's why I described them as "methodologically independent datasets, all derived from raw temperature data from land and ocean surface temperature observations" -- the State of the Climate report uses similar language. In other words, they may share input data, but the methods used to generate the final datasets (e.g., how to perform data quality control, how to interpolate missing data, etc.) are independent. That was all I meant. The semantics here are messy and annoying, so I'm sorry if what I meant wasn't clear.
ALL the datasets they use in this study are directly controlled by the NOAA. They are each adjusted and calibrated... by the NOAA...
But that's objectively not true. Look at the methods used to generate the datasets. Yes, all four datasets use GHCN, but some of them use other data sources in addition to GHCN, and they don't all use ERSST. They take very different approaches to deciding which stations to include, how to correct for missing data, and so on. So yes, there is overlap in source data, which is probably inevitable if you are trying to compile a global dataset, but the final products certainly do not "come from the NOAA" or even rely exclusively on NOAA data. (Caveat: This is based on my non-expert readings of dataset summaries and descriptions.)
Look, I definitely see your point about the datasets not being statistically independent. That is absolutely correct. But claiming that they are all "directly controlled by the NOAA" and "adjusted and calibrated by the NOAA" comes across as disingenuous. It's probably best just to say that they're not statistically independent and leave it at that.
my bias corrects from 35 to 30. The figures based on the math alone were showing something around 35 to 38 percent. But given that we've had corrections to the models and the figures going on for years and they always correct them DOWN... I personally decide to read the numbers as being slightly lower than cited if only in anticipation of the next correction.
Whatever works for you, I guess, but applying an arbitrary five-percentage-point downward adjustment because your gut tells you the numbers might be biased is not very defensible. Unless you have actual evidence that the station readings are biased upward, or that the datasets are fudging the anomalies upward, you really have no idea whether a correction is needed, let alone how large a correction to apply. You could be right -- I don't know, and neither do you. Arbitrarily changing the study's results because of a hunch is sketchy at best. Consider the opposite case: some have argued that the JMA dataset underestimates the true extent of warming, but I doubt you'd accept an arbitrary five-point upward correction on that basis, and neither would I.
Regardless, I suspect we can both agree that, in the end, the precise probability that 2014 was actually the warmest year isn't all that important. The overall trend matters more, and no one is disputing that 2014 was one of the hottest five or ten years on record.
Anyway, thanks for the interesting (and civil) discussion.