Comment Re:TDD (Score 4, Insightful) 156

For the purpose of release testing, though, the only thing you care about is whether or not there was a crash. If there was a crash, don't release. Back out the busted patch and release the working version. Then you can spend your time debugging the busted patch, which requires the logs and all.

Comment Re:Five Star (Score 1) 627

It is extremely reactive, so I'd imagine refining it is pretty difficult as well. Also, it is not "right up there on the most abundent element on this planet". It's actually only the 33rd most abundant element in the Earth's crust (out of 78 elements occurring naturally in the crust). Its occurrence is only about 20 parts per million. http://en.wikipedia.org/wiki/Abundance_of_elements_in_Earth's_crust

Comment Re:At least they're not rolling their own. (Score 2) 138

You should not write a C++ interpreter. You especially shouldn't write an interpreter for a language that looks almost exactly like C++ but differs from it in unpredictable ways, some of which encourage bad coding habits and/or make normal C++ harder to learn.

Strictly sequential files are a bad model for data if most of your time is spent constructing more and more elaborate subsets of that data. When we want to examine a subset, we practically have to make a complete copy of all the data falling into that subset. You want to make a small tweak to your selection? Make a new copy all over again.
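
A minimal sketch of the alternative (hypothetical names, plain Python, not our actual framework): keep the events in one shared store and represent a "subset" as a list of entry indices. Tweaking the selection then means rebuilding a small index list, not recopying the events themselves.

import random

# Hypothetical sketch: the full dataset is written once and shared;
# a "subset" is just a list of entry indices into it.
events = [{"n_muons": random.randint(0, 3), "pt": random.uniform(0.0, 100.0)}
          for _ in range(100_000)]

def select(indices, predicate):
    # Refine an existing selection without copying any event payloads.
    return [i for i in indices if predicate(events[i])]

all_events = range(len(events))
dimuon  = select(all_events, lambda e: e["n_muons"] >= 2)
high_pt = select(dimuon, lambda e: e["pt"] > 30.0)
# Changing the pt cut rebuilds only the small index list, not the events.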

Comment Re:At least they're not rolling their own. (Score 2) 138

Cycles are rarely the issue for us in HEP, and when they are, all we need is more nodes to split the problem into smaller pieces (wiki: embarrassingly parallel problem). The actual computational needs are (typically) pretty small. The main bottleneck is usually data throughput. We discard enormous amounts of data (that may or may not be useful, depending on who you ask) simply because we can't store it anywhere close to as fast as we can produce it (many orders of magnitude difference between the data production rate and the data storage rate). And then, when we're analyzing the data we've taken, our CPUs tend to sit idle while they wait on the disk to read another block of events, which then take only a few cycles to add into the necessary histograms. It only gets worse when the data is somewhere far away on the network. And it gets even worse when you want to select a subset of the data -- with our systems you have to make a full copy of the subset.

There are two big wins that modern big data has developed that we could benefit greatly from if the switchover costs weren't too high. The first is distributing data over many disks on many nodes and bringing the code to the data instead of bringing the data to the code. The more disks your data is on, the less you have to wait on seek times. The second is storing the data in a way that is not strictly sequential in a single set of files, so that if you want to look at a subset of the data, you can do so efficiently without having to make a copy of that subset.
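
A rough illustration of the first win (names and numbers are made up; this is a toy sketch, not how any production HEP system works): ship a small histogramming function to each node that holds a shard of the events on its local disk, and only send the tiny partial histograms back to be merged.

from collections import Counter

def fill_histogram(shard, bin_width=10.0):
    # Runs on the node that owns the shard; returns only a tiny summary.
    hist = Counter()
    for event in shard:
        hist[int(event["pt"] // bin_width)] += 1
    return hist

def merge(histograms):
    # Runs centrally; merging histograms is cheap no matter how big the shards were.
    total = Counter()
    for h in histograms:
        total.update(h)
    return total

# Stand-ins for three nodes, each holding its own shard of events.
shards = [
    [{"pt": 12.0}, {"pt": 48.5}],
    [{"pt": 33.3}, {"pt": 95.1}],
    [{"pt": 7.7}],
]
partials = [fill_histogram(s) for s in shards]  # would run remotely, one per node
print(merge(partials))                          # only the small partial histograms travel

Only the code and the merged summaries cross the network; the bulk event data never moves, which is the whole point.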
