
Comment Re:Oh, Please... (Score 5, Informative) 485

> I'm curious if anyone else in the hard sciences
> has had the same problem. Is there anyone out
> there that can comment on number crunching in
> fields like high energy physics, where Excel would
> surely run out of memory before spitting out
> anything useful? I just want to make sure that my
> bias is justified.

Well, for my undergrad physics labs I've been able to use the StarOffice spreadsheet, but that's with small data sets. For serious number-crunching I would never use any sort of "Office" product; they're not designed for that task.

As for high-energy physics: you're talking about very large amounts of data. Running a detector simulation on tens of thousands of Monte Carlo events produces several gigabytes of data, and we need many terabytes of storage for both real data and Monte Carlo simulations. Excel isn't going to handle that, especially given the amount of analysis you have to do on the data. Essentially all of the code is written by physicists expressly for data analysis. Have you heard of "grid computing"? It's where the LHC is headed. High-energy physics really pushes the frontiers of computing: we need lots of storage and lots of processing power, and no commercial tool does the job. I'm often not satisfied with the state of the code I work with for CDF (the Collider Detector at Fermilab), but for what we do it's the only option.

Keep in mind that far more data comes into the detector than we can process, so we use triggers to select the potentially good events. Then we analyze those, looking for tracks, calorimeter clusters and the like; from there we build up physics objects (electrons, photons, taus, jets). At the end we keep only the important parts - what we really need to do analysis - and put them into "ntuples" that we can work with more easily.

At that point you have a fairly manageable amount of data from which you can extract the events that interest you (for some channels, there might be only 4 interesting events out of all those that are recorded!). So eventually you're working with a data sample that you can afford to process in detail. The analysis is done on that sample, but you may always have to go back to the original data.
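To make the chain described above a bit more concrete (trigger, reconstruction, ntuple, final selection), here is a toy sketch in Python. It is only an illustration of the data flow, not CDF code: every name in it (`RawEvent`, `NtupleRow`, `passes_trigger`, `reconstruct`, `build_ntuple`, `select_channel`), every threshold, and the randomly generated "hits" are assumptions made up for the example. Real reconstruction is vastly more involved.

```python
import random
from collections import namedtuple

# Hypothetical toy structures; none of these names come from real CDF software.
RawEvent = namedtuple("RawEvent", ["event_id", "hits"])  # hits: list of energies
NtupleRow = namedtuple("NtupleRow", ["event_id", "n_clusters", "total_energy"])


def generate_raw_events(n, seed=0):
    """Yield n fake events, each with 1-20 random hit energies (mean 5.0)."""
    rng = random.Random(seed)
    for event_id in range(n):
        n_hits = rng.randint(1, 20)
        hits = [rng.expovariate(1.0 / 5.0) for _ in range(n_hits)]
        yield RawEvent(event_id, hits)


def passes_trigger(event, min_hit_energy=20.0):
    """Cheap first cut: keep the event only if some hit is energetic enough."""
    return max(event.hits) >= min_hit_energy


def reconstruct(event, cluster_threshold=10.0):
    """Stand-in for reconstruction: treat each hit above threshold as a cluster."""
    clusters = [e for e in event.hits if e >= cluster_threshold]
    return NtupleRow(event.event_id, len(clusters), sum(clusters))


def build_ntuple(raw_events):
    """Trigger, then reconstruct, keeping only a compact summary per event."""
    return [reconstruct(ev) for ev in raw_events if passes_trigger(ev)]


def select_channel(ntuple, min_clusters=3, min_total_energy=60.0):
    """Final analysis cut applied to the small ntuple, not the raw data."""
    return [
        row for row in ntuple
        if row.n_clusters >= min_clusters and row.total_energy >= min_total_energy
    ]


if __name__ == "__main__":
    ntuple = build_ntuple(generate_raw_events(100_000))
    selected = select_channel(ntuple)
    print(f"events in ntuple: {len(ntuple)}, events after selection: {len(selected)}")
```

The point of the design is the one made above: the expensive raw events are streamed through once, only a few numbers per surviving event are kept, and the final cuts can then be re-run cheaply on the ntuple as often as needed. If a cut needs a quantity that was not stored, you are back to the original data.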

Sorry this has been kind of disjointed, but I don't have time to re-organize it now.
