
Software to Make Blue Gene Top 200 Teraflops

An anonymous reader writes "New Scientist has a story about the most intensive computer program ever created. It runs on IBM's big beast, Blue Gene/L, at Lawrence Livermore National Laboratory in California and carries out 207.3 teraflops (trillion floating-point operations per second). The program, called Qbox, performs very complex quantum calculations to simulate the behaviour of thousands of atoms in three dimensions. Wow."

  • by LiquidCoooled ( 634315 ) on Friday June 23, 2006 @04:34PM (#15592323) Homepage Journal
    It does not perform "very complex quantum calculations"; instead,
    it simulates interactions between 1000 molybdenum atoms under high pressure, using equations that take the quantum behaviour of electrons into account.

    Also, when it's not being used to dynamically model atomic structures, the IRS uses it to calculate Bill Gates's taxes.
    • by rolfwind ( 528248 ) on Friday June 23, 2006 @05:06PM (#15592565)
      And it almost makes the requirements for Vista!
    • by Memnos ( 937795 )
      At the unfortunate risk of repeating myself on Slashdot (Oh, the Humanity!), you are correct. It is intrinsically impossible for a discrete-state system to model quantum mechanical events, unless you somehow sneaked under the Planck limit (there is no spoon). So, they're faking it. However, if it is a good model of "reality", then it is good science. If it can predict, it is useful.
      • It's not "fake" so much as it's an approximation. I guarantee you they know by exactly how much they are in error (but not in what direction!). The Schroedinger Equation that is at the heart of this represents the probability (well, its squared modulus does, at least) of something as a continuous function of space and time. These scientists make errors in that the equations that they use are discrete (in terms of mathematical degrees of freedom, strictly speaking, by discretizing space and time directly) models of th
      • by mfago ( 514801 )
        impossible for a discrete-state system to model quantum mechanical events
        Huh? QM was a while ago, but I'm afraid you'll have to give a reference or two. You're saying that Density Functional Theory is impossible? The authors (of DFT) did win the Nobel Prize a while ago, so I'm sure I'm missing something. Mind you, any implementation is only an approximation, but that's true of almost any computational method.
        • Yes. I am saying that a discrete-state system, such as a Markov chain, cannot follow quantum mechanical events. QM state reduction is not deterministic beforehand because it follows a wave function that cannot be known beforehand in its full vector state (e.g. position and velocity). If you wish references I would need to look them up, except for my remembrances of Richard Feynman and Stephen Hawking lecturing to me on this subject, and my own experiments. But I can find them. That neither obviates your
      • > However, if it is a good model of "reality", then it is good science. If it can predict, it is useful.

        Only if it is open source. Otherwise, it belongs in the Journal of Irreproducible Results. Unless I can reproduce the numerical experiment, the predictions are as meaningful as a call to the psychic friends network.
  • by wiz31337 ( 154231 ) on Friday June 23, 2006 @04:34PM (#15592325)
    Yeah, but can it beat Kasparov at chess?
    • by elrous0 ( 869638 ) * on Friday June 23, 2006 @04:53PM (#15592476)
      It's so powerful, it can beat Kasparov in chess and monitor millions of phone calls for the NSA *at the same time*!


      • by elrous0 ( 869638 ) * on Friday June 23, 2006 @05:14PM (#15592617)
        Geez, I am sick of getting modded down for this. Can /. please stop giving the White House unlimited mod points?


    • No, but it's already mapped his genome and is working on a clone that will be completely under its control.
  • by vishbar ( 862440 ) on Friday June 23, 2006 @04:38PM (#15592353)
    New Scientist has a story about the most intensive computer program ever created.

    Too bad for Q-Box that its title will be stripped from it so soon. Vista's almost here.

    Wait a minute, Vista? Never mind... Q-Box should hold it for a long while.

    • Since QBox's title is for requiring the most computing power to carry out its intended application, Vista may well unseat it. It's just that QBox's intended application is extremely complex quantum physics calculations, and Vista's intended application is letting people check their email. So... not quite a victory for Vista.
  • "Wow."

    More importantly, at what FPS does it play WoW?

    Though I wouldn't be surprised if it needs a new graphics card for Crysis...
  • by stratjakt ( 596332 ) on Friday June 23, 2006 @04:42PM (#15592383) Journal
    I mean, I'm sure I could use up more than 200 teraflops with my "while (1);" program.
    • while(1); uses no FLOPS. OTOH, if you used while (1.0);...

      (And for those of you who are humor-impaired, I do realize that neither would use any FLOPS because they would both be optimized into L1: jmp L1).

    • Don't be silly.

      Everyone knows Linux can finish that loop in 5 seconds.
      • Linux can finish that loop in 5 seconds

        Not *that* infinite loop. The "infinite" loop that Linux and any other OS can finish in 5 seconds (if the CPU speed is right) is:

        int n;
        for (n = 1; n > 0; n++) ;

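        This loop will actually finish on typical hardware because n will wrap around and become negative after it reaches the largest value that can be represented as an int on the machine it's running on. (Strictly speaking, signed overflow is undefined behavior in C, so an optimizing compiler is allowed to turn this into a true infinite loop or do something else entirely.)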
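To make the wraparound claim concrete without relying on undefined behavior, here is a minimal sketch that mimics the loop on a w-bit two's-complement integer using unsigned arithmetic, which C defines to wrap. The helper name iterations_for_width is hypothetical, and the sketch assumes the int being modelled is an ordinary two's-complement type.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: how many times does "for (n = 1; n > 0; n++);"
   run on a w-bit two's-complement integer? Counting with unsigned
   arithmetic, we stop at the first value whose top (sign) bit is set,
   i.e. the value that would read back as negative. */
static uint64_t iterations_for_width(unsigned w)
{
    assert(w >= 1 && w <= 64);                /* shift amount must be valid */
    uint64_t sign_bit = (uint64_t)1 << (w - 1);
    uint64_t count = 0;
    for (uint64_t n = 1; n < sign_bit; n++)   /* n = 1 .. 2^(w-1) - 1 */
        count++;
    return count;                             /* equals 2^(w-1) - 1 */
}
```

iterations_for_width(16) returns 32767. For a 32-bit int the same count is 2^31 - 1 = 2147483647 (INT_MAX), about two billion empty iterations, which is consistent with the loop taking a few seconds.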

        • yarbo@oxygen /crap/src/temp/inf $ cat inf.c
          int main(){
          int n;
          for (n = 1; n > 0; n++) ;
          return 0;
          }

          yarbo@oxygen /crap/src/temp/inf $ time ./inf

          real 0m6.761s
          user 0m6.748s
          sys 0m0.003s

          #and just for fun
          yarbo@oxygen /crap/src/temp/inf $ gcc -O2 -fomit-frame-pointer inf.c -o infO2
          yarbo@oxygen /crap/src/temp/inf $ time ./infO2

    • It's pretty funny to see that mentioned, because just the other night I was thinking about the "while" loop. I'd never really thought about it before, but I realized you could just put a "1" in the parentheses and the condition would stay true indefinitely, and you could get some pretty fun results, perfect for prank-related endeavours and so on. Then I thought, "oh, like all those texts from the 90s [that I didn't understand]"... ;)
    • Sorry, but I imagine you'd keep one of the many processors very busy, with the rest left idling away.
      Now, spawn a thread for each processor running this, and you might have something =-)
      • with a good compiler designed for the machine, something like:

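        #define NUMPROCS 4   /* "x" in the original: placeholder for the processor count */
        int array[NUMPROCS];

        int getval(int indx) {
            return array[indx];
        }

        int main(void) {
            int i;
            while (1) {
                for (i = 0; i < NUMPROCS; i++) {
                    array[i] ^= getval(i);   /* x ^ x, so this always stores 0 */
                }
            }
        }
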
        should probably be optimised for multiple processors. I'm not sure how fine-grained the optimisation is, but I doubt you have to manually launch threads to get
    • Hopefully that would use exactly ZERO FLOPS given that only the integer unit would be used and no floating point calculations would be made :-p
  • (Score:5, Interesting)

    by sarlos ( 903082 ) on Friday June 23, 2006 @04:43PM (#15592402)
    So in essence, it takes about .2 teraflops per atom... And that was only after spending a lot of time condensing the algorithms. This makes me wonder two things. First, what do these equations look like such that it takes 200 gigaflops just to model one atom. Second, over what timeframe does this simulation take place? Are we talking real-time, calculating for 50 years, what?

    Regardless, as a computer scientist, I say way to go to these guys, this is damn impressive.
    • I can't imagine it's real time. From what I understand, most chaotic simulations are far far slower than real time.
    • by tpjunkie ( 911544 ) on Friday June 23, 2006 @04:52PM (#15592468) Journal
      It doesn't take .2 teraflops to model one atom, or even two atoms, even accounting for effects on the quantum level. However, when you take into account that each atom will more or less interact with every other atom, you have a massive number of interactions to model. That's what takes so much processing power.
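The all-pairs growth can be sketched as below. This is a toy count, not Qbox's algorithm (the article doesn't describe it); a real code would evaluate a force or energy term where the counter is incremented. Both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: count distinct pairs the way a naive all-pairs
   interaction loop visits them, pairing every i with every j > i. */
static uint64_t count_pairs_loop(uint64_t n)
{
    uint64_t pairs = 0;
    for (uint64_t i = 0; i < n; i++)
        for (uint64_t j = i + 1; j < n; j++)
            pairs++;  /* a real code would compute the i-j interaction here */
    return pairs;
}

/* Closed form for the same count: n(n-1)/2, which grows as O(n^2). */
static uint64_t count_pairs_formula(uint64_t n)
{
    assert(n < ((uint64_t)1 << 32));  /* keep n*(n-1) inside 64 bits */
    return n == 0 ? 0 : n * (n - 1) / 2;
}
```

For 1000 atoms both functions give 499,500 pairs; for 42,000 electrons the formula gives 881,979,000. Doubling the particle count roughly quadruples the work, which is why dividing the flop rate evenly per atom is misleading.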
    • (Score:3, Insightful)

      by MustardMan ( 52102 )
      So you're a computer scientist, but you apparently don't understand Big-O notation or the concept that algorithms don't necessarily scale linearly with the number of elements.
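A short illustration of why per-atom division misleads: if an algorithm's cost grows as N^p, scaling from N_small to N_big multiplies the work by (N_big/N_small)^p, not by N_big/N_small. The exponent p is an assumption you plug in, and the helper name cost_ratio is hypothetical; it uses integer arithmetic so the result is exact.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: cost ratio between problem sizes n_big and n_small
   for an algorithm whose cost grows as n^p (constant prefactors cancel).
   Assumes n_big is a multiple of n_small. Returns 0 on 64-bit overflow. */
static uint64_t cost_ratio(uint64_t n_small, uint64_t n_big, unsigned p)
{
    assert(n_small > 0 && n_big % n_small == 0);
    uint64_t base = n_big / n_small;
    uint64_t ratio = 1;
    for (unsigned k = 0; k < p; k++) {
        if (base != 0 && ratio > UINT64_MAX / base)
            return 0;  /* too large to represent */
        ratio *= base;
    }
    return ratio;
}
```

cost_ratio(1, 1000, 3) returns 1,000,000,000 (cubic scaling) and cost_ratio(1, 1000, 2) returns 1,000,000; only p = 1 gives the linear factor of 1000 that splitting 207.3 teraflops evenly among atoms implicitly assumes.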
    • (Score:5, Informative)

      by mhore ( 582354 ) on Friday June 23, 2006 @04:57PM (#15592507)
      So in essence, it takes about .2 teraflops per atom... And that was only after spending a lot of time condensing the algorithms. This makes me wonder two things. First, what do these equations look like such that it takes 200 gigaflops just to model one atom. Second, over what timeframe does this simulation take place? Are we talking real-time, calculating for 50 years, what?

      0.2 TFlops per atom, yes. But there are 1000 atoms, and it's molybdenum, which has 42 electrons... so that's 42,000 particles that all interact with each other. Still... that's not too many. But maybe they're considering interactions between nuclei, too. Who knows...

      As for your question about what the equations look like? They're probably very nasty integrals of sines and cosines and what not to various odd (read: strange) powers and stuff. I do fairly computationally intensive simulations on some big IBM machines and just simple equations can amount to quite a bit of calculations. Nothing like what these guys are doing, though.

      Finally... what time frame is the simulation over? I'd wager VERY SHORT times, maybe nanoseconds or something like that. Even casual "molecular dynamics" simulations can only probe very short timeframes. Their coarse-grained cousins can maybe do microseconds or milliseconds.
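To see why MD runs cover such short spans: classical integrators typically advance in timesteps of roughly a femtosecond, so 1 ns already means about a million force evaluations. Below is a minimal velocity Verlet sketch for a single particle in a 1-D harmonic well. The force law and helper names are assumptions for illustration; this is a classical toy integrator, not a description of what Qbox does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy force law: 1-D harmonic well, a(x) = -omega^2 * x. */
static double accel(double x, double omega)
{
    return -omega * omega * x;
}

/* Advance position x and velocity v by nsteps velocity Verlet steps. */
static void velocity_verlet(double *x, double *v, double omega,
                            double dt, size_t nsteps)
{
    assert(dt > 0.0);
    double a = accel(*x, omega);
    for (size_t s = 0; s < nsteps; s++) {
        *x += *v * dt + 0.5 * a * dt * dt;   /* position update: x + v dt + a dt^2 / 2 */
        double a_new = accel(*x, omega);     /* force at the new position */
        *v += 0.5 * (a + a_new) * dt;        /* velocity uses the average of old and new force */
        a = a_new;
    }
}

/* Hypothetical helper: steps of dt_fs femtoseconds needed to cover
   span_fs femtoseconds, rounded up. */
static unsigned long long steps_for_span_fs(unsigned long long span_fs,
                                            unsigned long long dt_fs)
{
    assert(dt_fs > 0);
    return (span_fs + dt_fs - 1) / dt_fs;
}
```

steps_for_span_fs(1000000, 1) gives 1,000,000 steps for 1 ns at a 1 fs timestep. For the oscillator, the energy 0.5*v*v + 0.5*omega^2*x*x stays close to its starting value over many steps, which is one reason velocity Verlet is a popular choice for long MD runs.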


    • (Score:3, Informative)

      In a classical physical system the time to compute what happens to N particles typically grows as a polynomial in N. The positions and velocities of the particles form a 6N-dimensional space (3 position and 3 velocity coordinates per particle) and you're typically trying to trace a path through that 6N-dimensional space.

      In quantum mechanics the state of the system is defined by a wavefunction on a 3N-dimensional space. The state of a system is no longer a point; it's a *function* on a 3N-dimensional space. That means that at any

      • Actually, in classical molecular dynamics, the algorithm is usually N^2. However, in this case "N" is the number of _electrons_, not atoms, i.e. 42000 electrons.

        Oh, and this is not classical physics, but QM. Thus each electron's wave function has to be represented by a (possibly substantial) set of basis functions. Not sure if anyone's been able to get Density Functional Theory (DFT) to scale that high, but if so, DFT scales as (IIRC) either N^7 or N^9. Ouch! Sure there are tricks, such as pseudopotentials th
        • the algorithm is usually N^2

          As I say, modulo a polynomial. The complexity of quantum systems typically grows exponentially because we're looking at the tensor product of the subsystems.

          I'd love to find out a bit more about the algorithms used here. And I'd be interested to know what kind of validation there is for the methods. I guess I can start here []. (My background is more particle physics than many-body systems.)
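The gap between the classical and quantum pictures shows up directly in storage. A classical state needs 6 numbers per particle, while naively tabulating an N-particle wavefunction on a grid with g points along each of the 3N coordinates needs g^(3N) values. The sketch below only counts; the brute-force grid representation and the function names are illustrative assumptions, and practical electronic-structure methods such as DFT are designed to avoid exactly this growth.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: reals needed for a classical state of n particles
   (3 position + 3 velocity components each). Linear in n. */
static uint64_t classical_state_size(uint64_t n_particles)
{
    return 6 * n_particles;
}

/* Hypothetical helper: grid values needed to tabulate an n-particle
   wavefunction with g points along each of the 3n coordinates, g^(3n).
   Returns 0 if the count does not fit in 64 bits. */
static uint64_t quantum_grid_size(uint64_t n_particles, uint64_t g)
{
    assert(g >= 1);
    uint64_t size = 1;
    for (uint64_t k = 0; k < 3 * n_particles; k++) {
        if (size > UINT64_MAX / g)
            return 0;  /* overflow: exponential growth outran 64 bits */
        size *= g;
    }
    return size;
}
```

With a coarse 10-point grid, two particles already need 10^6 values against 12 classical numbers, and 1000 particles overflow a 64-bit count long before the loop finishes.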

    • Quantum Monte Carlo (Score:3, Informative)

      by poszi ( 698272 )
      First, what do these equations look like such that it takes 200 gigaflops just to model one atom.

      The article is light on details, but I suppose the only quantum algorithm that can handle 1000 atoms is Quantum Monte Carlo. The problem is that the algorithm is cubic in the number of particles (and has a huge prefactor). So in essence 1000 atoms is 1000^3 = 10^9 times more time-consuming than one. And I'm sure they still use dramatic simplifications, even though they have the most powerful computer. They probably

    • As another computer scientist (specializing in algorithms), I think this is inefficient and needs further research :)
  • by Weaselmancer ( 533834 ) on Friday June 23, 2006 @04:44PM (#15592407)

    The program, called Qbox, performs very complex quantum calculations to simulate the behaviour of thousands of atoms in three dimensions.

    "Molest me not with this pocket calculator stuff."

  • by ScottLindner ( 954299 ) on Friday June 23, 2006 @04:46PM (#15592418)
    How do they know they got it right?
  • by fred_sanford ( 678924 ) on Friday June 23, 2006 @04:48PM (#15592434)
    Oblig. H2G2. "Here I am, brain the size of a planet and they ask me to take you down to the bridge. Call that job satisfaction? 'Cos I don't." - Marvin
  • Just wait... (Score:4, Informative)

    by Raul654 ( 453029 ) on Friday June 23, 2006 @04:48PM (#15592436) Homepage
    BlueGene/L has a sister project, Cyclops64 (formerly known as BlueGene/C), due out sometime in late 2006 or early 2007. My research group is (a) helping IBM do hardware verification on it and (b) developing the systems software for it [esp. the compiler]. Cyclops64 could very well blow BlueGene/L out of the water.
    • The compiler sounds like about as much fun as the one for Cell.
      Sounds like a very interesting project. I guess you have no problem writing and debugging multithreaded code?
      • Re:Just wait... (Score:5, Interesting)

        by Raul654 ( 453029 ) on Friday June 23, 2006 @05:41PM (#15592802) Homepage
        Cell was designed around one single objective: to get a clock rate as sickeningly high as possible, because clock speed sells. Trust me when I say that programmability was not (at all) a consideration. (I should mention that my research group got one of the very first Cell processors sent to the US. We are currently in the process of implementing OpenMP on it to make it a little nicer to program.)

        As far as writing multi-threaded code, I've spent the last 5 months rewriting the NAS CG benchmark to work efficiently on Cyclops64, which will probably play some part in my PhD thesis. (A sidenote: All of NASA's NAS implementations are written in Fortran (except Integer Sort), which would have meant rewriting NAS-CG in C myself. Fortunately, I didn't have to start from scratch, because the Japanese had already done the hard part.)
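For readers who haven't met it, CG refers to the conjugate gradient method. The NAS benchmark applies it to a large sparse matrix; the sketch below is a minimal dense version for a small symmetric positive definite system, written only to show the core iteration. It is not the NAS code or the Cyclops64 port, and the function names and the tolerance convention (tol bounds the residual norm) are assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* y = A x for a dense n-by-n matrix stored row-major. */
static void matvec(size_t n, const double *a, const double *x, double *y)
{
    for (size_t i = 0; i < n; i++) {
        double s = 0.0;
        for (size_t j = 0; j < n; j++)
            s += a[i * n + j] * x[j];
        y[i] = s;
    }
}

/* Inner product of two length-n vectors. */
static double dot(size_t n, const double *u, const double *v)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += u[i] * v[i];
    return s;
}

/* Hypothetical helper: solve A x = b for symmetric positive definite A.
   x holds the initial guess on entry and the solution on exit; r, p and
   ap are caller-supplied scratch vectors of length n. Stops when the
   residual norm drops below tol. Returns the iteration count. */
static int conjugate_gradient(size_t n, const double *a, const double *b,
                              double *x, double *r, double *p, double *ap,
                              int max_iter, double tol)
{
    assert(n > 0 && max_iter >= 0 && tol >= 0.0);
    matvec(n, a, x, ap);
    for (size_t i = 0; i < n; i++) {
        r[i] = b[i] - ap[i];  /* initial residual */
        p[i] = r[i];          /* first search direction */
    }
    double rr = dot(n, r, r);
    int k = 0;
    while (k < max_iter && rr > tol * tol) {  /* compare squared norms */
        matvec(n, a, p, ap);
        double alpha = rr / dot(n, p, ap);    /* exact line search along p */
        for (size_t i = 0; i < n; i++) {
            x[i] += alpha * p[i];
            r[i] -= alpha * ap[i];
        }
        double rr_new = dot(n, r, r);
        double beta = rr_new / rr;            /* keeps directions A-conjugate */
        for (size_t i = 0; i < n; i++)
            p[i] = r[i] + beta * p[i];
        rr = rr_new;
        k++;
    }
    return k;
}
```

For A = [[4, 1], [1, 3]] and b = (1, 2), starting from x = 0, the iteration reaches x = (1/11, 7/11) in two steps, as CG should for a 2-by-2 system in exact arithmetic. The dense matvec is the part a real benchmark replaces with a sparse product, and it is also where most of the memory traffic goes.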
        • I did notice when I read the description of the Cyclops64 that the CPU seemed a bit more balanced than the Cell. It almost seemed like the inverse of the Cell with multiple threaded units tied to an FPU. I would guess that the FPU is optimized for double precision operations vs the Cell being optimized for single.
          Does the Cyclops64 support out-of-order execution?
          Just kind of wondering. My programming is limited to XScale, Intel, and AMD CPUs. The big cool toys fascinate me.
          • Re:Just wait... (Score:3, Insightful)

            by Raul654 ( 453029 )
            The compiler [the current version, at any rate] is based on gcc. So it sports the same out-of-order execution you would expect to get from compile-time optimization. I am not sure if it has hardware-based re-ordering. My guess would be that no, it does not, but without the Principles of Operation in front of me, I couldn't say (the advisor borrowed my paper copy for IPDPS 2006 and hasn't given it back yet).
              • If the hardware doesn't support reordering, wouldn't you get a big performance hit if you use gcc's standard optimizations? I am just a compiler user, not a writer, so I could be totally wrong, but if I don't ask I will never know. Overall it looks very cool but very programmer-dependent. For a supercomputer that isn't a terrible thing.
              I also assume that the integer units are based on the Power ISA.
              • I don't think the integer units are based on anything. The whole chip is being custom designed from scratch. (Interestingly enough, the VHDL code for the chip is being written by only one guy - the project leader himself).

                As far as instruction re-ordering goes: for parallel computation, the big performance hits occur with waits, synchronizations/barriers, and locks/mutexes. Making these cheap and reducing the number of them is the biggest way to increase performance.
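One common way to cut synchronization cost, sketched below with POSIX threads: each thread accumulates into a private variable and takes the shared lock once at the end, rather than once per element. The thread count, names, and workload are assumptions for illustration; this is generic pthreads code, not anything specific to Cyclops64. Link with -pthread on systems that need it.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NTHREADS 4  /* assumed thread count for the sketch */

struct shared_total {
    pthread_mutex_t lock;
    long long value;
};

struct slice {
    const int *data;
    size_t begin, end;
    struct shared_total *total;
};

/* Sum one slice privately, then lock exactly once to publish the result. */
static void *sum_slice(void *arg)
{
    struct slice *s = arg;
    long long local = 0;
    for (size_t i = s->begin; i < s->end; i++)
        local += s->data[i];
    pthread_mutex_lock(&s->total->lock);
    s->total->value += local;
    pthread_mutex_unlock(&s->total->lock);
    return NULL;
}

/* Hypothetical helper: sum n ints using NTHREADS threads, one lock each. */
static long long parallel_sum(const int *data, size_t n)
{
    assert(data != NULL || n == 0);
    struct shared_total total;
    pthread_mutex_init(&total.lock, NULL);
    total.value = 0;

    pthread_t tid[NTHREADS];
    struct slice parts[NTHREADS];
    int started[NTHREADS];
    for (size_t t = 0; t < NTHREADS; t++) {
        parts[t].data = data;
        parts[t].begin = n * t / NTHREADS;
        parts[t].end = n * (t + 1) / NTHREADS;
        parts[t].total = &total;
        started[t] = pthread_create(&tid[t], NULL, sum_slice, &parts[t]) == 0;
        if (!started[t])
            sum_slice(&parts[t]);  /* fall back to running the slice here */
    }
    for (size_t t = 0; t < NTHREADS; t++)
        if (started[t])
            pthread_join(tid[t], NULL);

    pthread_mutex_destroy(&total.lock);
    return total.value;
}
```

Locking per element would take the mutex n times; this version takes it NTHREADS times no matter how large n is, which is the "reduce the number of them" half of the advice.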
                • "for parallel computation, the big performance hits occur with waits, synchronizations/barriers, and locks/mutexes"
                  Is Cyclops64 using a shared memory system? Most clusters I have seen used message passing. On those systems your bottlenecks tend to be in message passing.
                  Yes, mutexes are a lot of fun. I tend to use mutexes in my code just long enough to make a copy of the data structure for the thread to use. Yes, it is cheating and relatively inefficient, but it is also pretty safe and keeps blocking to a minim
    • How does it compare to ? []
      • AHH! I forgot to close the HTML tag properly (Note to self: USE PREVIEW!)

        The question mark should have the word "Roadrunner" before it.

        Also for those who don't want to follow the link. Roadrunner is a supercomputer being developed at Los Alamos National Laboratory with aims to run at a sustained petaflop.
      • The most important sentence in that article: "If a 'go' decision is made to pursue the goal of a sustained petaflop, a final phase would be executed, with plans for completion at the end of 2007" The whole world is racing to build the world's first computer to sustain one petaflop. It's only a matter of time. I'm told the Japanese project (which is already underway) is expected to finish sometime around 2008/2009. Our project, C64, has been going since 1999, and I think it's got a really good shot of being
  • by jhw539 ( 982431 ) on Friday June 23, 2006 @04:52PM (#15592460)
  • Imagine a _________ cluster of those.

    Well done, you may now enter. Gaming room to the right, pron cubicles left, and crazy linux hardware center up ahead.
    We hope you enjoy your stay at Geek Heaven.
  • HPCWire Interview (Score:4, Informative)

    by multimediavt ( 965608 ) on Friday June 23, 2006 @04:56PM (#15592499)

    There's some additional info about BlueGene and what Livermore thinks of it here. What this interview neglects to mention is the millions of dollars being spent on IBM and internal developers to get this code (and any others) working on BlueGene. I was briefed by the hardware and software teams that built BlueGene and I can tell you, it's no easy task to bring apps to that platform. Kuznezov seems to trivialize it in the interview and I'm gonna have to go back and review the process again. Maybe it has changed since my briefing in early 2004, but somehow I doubt it.
  • I thought there were more dimensions in the subatomic world o_O
  • by ultramk ( 470198 ) <ultramk&pacbell,net> on Friday June 23, 2006 @05:16PM (#15592630)
    I wonder what the cubes represent?

    Oh, wait. Qbox. Nevermind.

  • Like finding The Answer to The Ultimate Question Of Life, the Universe and Everything
  • Does anyone know what these calculations are trying to determine? In essence, what's the central problem to determining the reliability of old nuclear weapons? I would have thought they're doing simulations of detonation of these aged weapons, but the article talks about using molybdenum, which isn't a fissile material.
  • performs very complex quantum calculations to simulate the behaviour of thousands of atoms in three dimensions.

    Sounds impressive, but that's only about 10 atoms on a side.

    • that's only about 10 atoms on a side

      Exactly. That only goes to show how much CPUs still have to evolve. Every time someone mentions a new more powerful CPU here in /. there are people who ask "why, what's the use?". For many types of physical simulations, the most powerful CPUs in the world are still pathetically slow.

      And that's also a reason why carefully optimized code in C or Fortran with the inner loops written in assembler is still needed. Java, or Ruby, or Python, or any other interpreted language

  • by ender_ ( 131275 ) on Friday June 23, 2006 @06:28PM (#15593088) Homepage
    Imagine, if you will, taking this super-computing ability out a few years. Can the U.S. justify the invasion of a country X because X successfully simulated an attack on the U.S? Or maybe they just had the computing power to simulate it.

    To the UN: We'd like you to look at these satellite images that clearly show a super computer simulating the destruction of the U.S. We have to take out these terrorists and we're willing to go it alone.

    Afterward: Well it turns out that they didn't have the computing power at all, the images we had were of a mobile home park.
  • Wow indeed (Score:3, Interesting)

    by mnmn ( 145599 ) on Friday June 23, 2006 @09:47PM (#15594037) Homepage
    Thousands of atoms. Schrödinger's/Bohr's equations for all of them.

    This has interesting consequences for the study of plastics, DNA, viruses and other complex molecules.

    Perhaps the program can run in a loop trying every possible atomic combination to produce the best of certain attributes, as in give me the hardest material or give me an easy to manufacture room temp superconductor. It bypasses the whole invention/discovery step.
