Petabytes? Are you serious? WTF were they storing? If you have 256 sites, each logging 100MB a day of climate data (a ridiculous amount of data), that's roughly 107 years of storage per petabyte.
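As a back-of-the-envelope check (assuming decimal units, i.e. 1 MB = 10^6 bytes and 1 PB = 10^15 bytes — the site count and logging rate are just the hypothetical numbers above):

```python
# Back-of-the-envelope storage estimate, decimal units assumed.
sites = 256
mb_per_site_per_day = 100                              # generous per-site logging rate
bytes_per_day = sites * mb_per_site_per_day * 10**6    # 25.6 GB/day across all sites

petabyte = 10**15
days_per_petabyte = petabyte / bytes_per_day           # ~39,000 days
years_per_petabyte = days_per_petabyte / 365.25

print(f"{years_per_petabyte:.0f} years per petabyte")  # → 107 years per petabyte
```

Use binary units (1 PiB = 2^50 bytes) and the figure climbs closer to 115 years; either way, it's over a century per petabyte.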
I would also point out that it's not as simple as getting the data from the original source.
1) First of all, there are a lot of original sources. Which ones, exactly, did they get the data from?
2) The data was on magnetic tape and in paper form; who knows what is still readable or available?
3) Data gets skipped, overlooked, entered in error, etc.
4) Often there are several steps involved in post-processing the data.
This is why you keep your raw data every step along the way. Do you really think a person now, 20+ years after the fact, can go through the process (accumulating, organizing, ingesting, analyzing, and processing data from disparate data sets) and come up with the same data set? Really?
I would seriously doubt UEA could reconstruct 90% of the original data set with a high degree of certainty.
I agree that your methods and results should be reproducible. However, in this case they destroyed the original temperature data for a bunch of stations for many years. This data is not reproducible (unless you have a time machine). You can’t rerun this simulation.
In their own words: "We do not hold the original raw data but only the value-added (quality controlled and homogenized) data." Thus, all anybody has access to is their post-processed data, after it has been cleaned up and who knows what (if anything) removed. Even if you trust them not to massage the data in any way (suspect, given their recent history), there is still no way to verify the assumptions, methods, validity, and correctness of the post-processing.
This is at best catastrophic and unbelievably sloppy scientific work.
Thank you!
I have never understood why some IT people have to be dicks. Yes, sometimes you will have to deal with morons, the clueless, the malicious, and everyone in between. Welcome to the real world; put on your big-boy pants and deal with it! The vast majority of people are just trying to do their job and made an honest mistake. I think it's time IT realized that their jobs depend on 'DFUs'. Do you know what you call an IT guy without any users? Unemployed.
There is also the issue of paper thickness, ink layer, sharp edges, and (god forbid) staples. These combine to make a material less than ideal for wiping one's ass. As I always say: stick with TP for your ass, a magazine for your brain in the loo.
Words to live by!
How is that? Are you trolling or is my 'humor' button stuck?
I guess it depends on the type of scientific computing you are doing. If you need a cluster to crunch numbers, don't use python. However, there are huge areas in scientific computing where: 1) speed isn't the primary concern or 2) languages like python are fast enough. Also, python has some pretty significant scientific computing tools like scipy (see http://www.scipy.org/), visualization using matplotlib (see http://matplotlib.sourceforge.net/ ), etc. I personally know a lot of people doing scientific computing and general research who use python.
If speed were the only concern, people wouldn't be using tools like Matlab, IDL, python, and the like. Obviously, a significant number of people doing scientific computing find these tools fast enough.
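For what it's worth, here's a minimal sketch of the kind of workflow the scipy/matplotlib comment is talking about: fit a model to some noisy synthetic "readings" with scipy, then plot the data and the fit with matplotlib. The linear model and all the numbers here are made up for illustration.

```python
# Minimal scientific-python example: curve fitting (scipy) + plotting (matplotlib).
import numpy as np
from scipy.optimize import curve_fit
import matplotlib
matplotlib.use("Agg")              # non-interactive backend, safe for headless runs
import matplotlib.pyplot as plt

def model(x, a, b):
    """Simple linear model; curve_fit accepts any callable f(x, *params)."""
    return a * x + b

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=x.size)  # synthetic noisy readings

popt, pcov = curve_fit(model, x, y)  # least-squares estimates of a and b

plt.plot(x, y, ".", label="readings")
plt.plot(x, model(x, *popt), label=f"fit: a={popt[0]:.2f}, b={popt[1]:.2f}")
plt.legend()
plt.savefig("fit.png")
```

A dozen lines, and you get a least-squares fit plus a publication-quality figure; for a lot of day-to-day research code, that convenience matters far more than raw speed.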
Always draw your curves, then plot your readings.