The Potential of Science With the Cell Processor

prostoalex writes "High Performance Computing Newswire is running an article on a paper by computer scientists at the U.S. Department of Energy's Lawrence Berkeley National Laboratory. They evaluated the Cell processor's performance on several scientific application kernels, then compared it against other processor architectures. The full paper is available from the Computer Science department at Berkeley."

  • by Anonymous Coward on Sunday May 28, 2006 @08:58AM (#15420063)
    Insightful? Ah... no.

    Scientific users code to the bleeding edge. You give them hardware that blows their hair back and they will figure out how to use it. You give them crappy painful hardware (Maspar, CM*) that is hard to optimize for, then they probably won't use it.

    Assembly language optimization is not a big deal. Right now the biggest thing bugging me is that I have to rewrite a core portion of a code to use SSE, since SSE is so limited for integer support. As this is a small amount of work, and the potential gains are so large (about 4x), it doesn't make sense not to do this. Some of it will be hand coded and optimized assembler. This is how we have to program. Scientists need the fastest possible cycles, and as many of them as possible ... at least the ones I know need this. There are a few who do all their analysis on Excel spreadsheets. They don't need much in the way of speed. The rest of us do.
  • by JanneM ( 7445 ) on Sunday May 28, 2006 @10:09AM (#15420285) Homepage
    Hand optimizing code is what I do as a game developer and I can assure you that it is very relevant to my job.

    It makes sense for a game developer, and even more for an embedded developer. You spend the time to optimize once, and then the code is run on hundreds of thousands or millions of sites, over years. The time you spend can effectively be amortized over all those customers.

    For scientific software the calculation generally changes. You write code, and that code is typically used in one single place (the lab where the code was written), and run comparatively few times, indeed sometimes only once.

    For a game developer to spend three months extra to shave a few seconds off one run of a piece of code makes perfect sense. For an embedded developer, using a couple of months' worth of development cost to be able to use a slower, cheaper chip, shaving a dollar off the production cost of perhaps tens of millions of gadgets, makes sense.

    For a graduate student (cheap as they are in the funny-mirror economics of science) to spend three months to make one single run of a piece of software run a few hours faster does not make sense at all.

    Disregarding the inherent coolness factor of custom hardware, in most situations it just doesn't pay to make custom stuff for science when you can run it a little longer and get the same result. I have heard more than once about labs spending the time and effort to make custom stuff, only to find that by the time they were done, off-the-shelf hardware had already caught up.
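
    The amortization argument above can be put as a rough break-even check. This is only a sketch: the function name, the weighting parameter, and every number in the usage lines are made up for illustration (three months is taken as roughly 480 working hours), and it ignores things like people waiting on results.

```python
def optimization_pays_off(dev_hours, hours_saved_per_run, runs,
                          programmer_hour_weight=1.0):
    """Return True when total run time saved exceeds the developer time spent.

    programmer_hour_weight is a hypothetical knob: how many machine hours
    one programmer hour is considered to be worth.
    """
    cost = dev_hours * programmer_hour_weight
    saved = hours_saved_per_run * runs
    return saved > cost


# Illustrative inputs only, not measurements.
# Graduate student: ~3 months of work to save 3 hours on a single run.
print(optimization_pays_off(dev_hours=480, hours_saved_per_run=3, runs=1))

# Game developer: same effort, 5 seconds saved, a million runs in the field.
print(optimization_pays_off(dev_hours=480, hours_saved_per_run=5 / 3600,
                            runs=1_000_000))
```

    With these made-up inputs the first case does not pay off and the second does, which is the whole point: the number of runs dominates.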

  • bang, buck, effort (Score:4, Informative)

    by penguin-collective ( 932038 ) on Sunday May 28, 2006 @10:35AM (#15420361)
    Over the last several decades, there have been lots of parallel architectures, many significantly more innovative and powerful than Cell. If Cell succeeds, it's not because of any innovation, but because it contains fairly little innovation and therefore doesn't require people to change their code too much.

    One thing that Cell has that previous processors didn't is that the PS3 tie-in and IBM's backing may convince people that it's going to be around for a while; most previous efforts suffered from the problem that nobody wanted to invest time in adapting their code to an architecture that was not going to be around in a few years anyway.
  • by FromWithin ( 627720 ) <`moc.nihtiwmorf' `ta' `ffuts'> on Sunday May 28, 2006 @10:55AM (#15420435) Homepage

    So the Cell is great because there's going to be millions of them sold in PS3's so they'll be cheap. But it's only really great if a new custom variant is built. Sounds kind of contradictory.

    Did you not read the last bit?

    On average, Cell is eight times faster and at least eight times more power efficient than current Opteron and Itanium processors, despite the fact that Cell's peak double precision performance is fourteen times slower than its peak single precision performance. If Cell were to include at least one fully utilizable pipelined double precision floating point unit, as proposed in their Cell+ implementation, these speedups would easily double.

    So it's really great already. If it was tweaked a bit, it would be ludicrously great.

  • by infolib ( 618234 ) on Sunday May 28, 2006 @11:09AM (#15420465)
    The fact is that most scientists use high-level software (MATLAB, Femlab, ...) to do their simulations.

    Indeed, most scientists. They also know very little about profiling but since the simulation is used only maybe a hundred times that hardly matters.

    The cases we're talking about here are where thousands of processors grind the same program (or evolved versions of it) for years as the terabytes of data roll in. Such is the situation in weather modelling, high energy physics and several other disciplines. That's not a "program" in the usual sense, but rather a "research program" occupying a whole department including everyone from "domain-knowledge" scientists down to some very long haired programmers who will not shy away from a bit of ASM. If you're a developer good at optimization and parallelism there might just be a job for you.
  • by Darkfred ( 245270 ) on Sunday May 28, 2006 @11:13AM (#15420482) Homepage Journal
    Did Sony pay you or did Mr. Kutaragi come over to your house and type it for you?

    Have you seriously never seen anything like this before? As a professional ps2/360/ps3 developer I have to say that I was seriously underwhelmed by this demo. Every one of the effects has been used before. The original xbox has every effect he mentioned. And HL2 has a significantly more complex lighting system and postprocessing effects.
    The demo appears to be a single high-poly character in a texture mapped box. The demoer admits that this is a cut-scene quality model. I believe this scene could be rendered on an original xbox with similar 'visual' quality. Why not use some of those polys to make a realistic background? Black on PS2 looked better. And they couldn't even show a solid second of actual gameplay.
    I think it will be an amazing game, but the demo was no technical achievement. It was a hurried render test for an obviously incomplete engine. Bragging about poly count when your competition can push 1.5x-3x as many is not going to win them any points either.

  • by adam31 ( 817930 ) <adam31 AT gmail DOT com> on Sunday May 28, 2006 @12:42PM (#15420807)
    Actually bullshit.

    Actually, it's not bullshit. Simple C intrinsics code is the way to go to program the Cell... there's just no need for hand-optimized asm. Intrinsics have a poor rep on x86 because SSE sucks: 8 registers, a source operand that gets overwritten on every instruction, no MADD, MSUB, etc.

    But Cell has 128 registers and a full set of vector instructions. There's no danger of stack spills. As long as the compiler doesn't freak out about aliasing (which is easy), and it can inline everything, and you present it enough independent execution streams at once... the SPE compiler writes really, really nice code.

    The thing that does need to be hand-optimized still is the memory transfer. DMA can be overlapped with execution, but it has to be done explicitly. In fact, algorithms typically need to be designed from the start so that accesses are predictable and coherent and fit within ~180kb. (Generally, someone seeking performance would do this step long before asm code on any platform anyway...)
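
    To make the "overlap DMA with execution" point concrete, here is a rough planning sketch of double buffering. It only models the ordering and the buffer sizes; it does not issue any real DMA. The function names, the 16-byte alignment, and reading "~180kb" as 180 * 1024 bytes are all assumptions for illustration, and real hardware limits on individual transfers are not modelled.

```python
LOCAL_STORE_BUDGET = 180 * 1024  # assumption: "~180kb" read as 180 KiB


def plan_double_buffer(total_bytes, budget=LOCAL_STORE_BUDGET, align=16):
    """Split total_bytes into chunks so that two chunk buffers fit in budget."""
    # Each of the two buffers gets half the budget, rounded down to the alignment.
    chunk = (budget // 2) // align * align
    sizes = []
    remaining = total_bytes
    while remaining > 0:
        size = min(chunk, remaining)
        sizes.append(size)
        remaining -= size
    return sizes


def schedule(sizes):
    """List steps in issue order: fetch of chunk i+1 is started before compute on chunk i."""
    steps = []
    if sizes:
        steps.append(("fetch", 0))
    for i in range(len(sizes)):
        if i + 1 < len(sizes):
            # Issued early so the transfer can overlap with compute on chunk i.
            steps.append(("fetch", i + 1))
        steps.append(("compute", i))
    return steps


sizes = plan_double_buffer(200_000)
print(sizes)
print(schedule(sizes))
```

    The explicit part is exactly the interleaving in `schedule`: nothing overlaps unless the next fetch is issued before the current chunk is processed.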

  • by adam31 ( 817930 ) <adam31 AT gmail DOT com> on Sunday May 28, 2006 @04:10PM (#15421537)
    I am also an experienced assembly programmer, and I too shared your mistrust of the compiler. However, I started SPE programming several months ago and I promise you that the compiler can work magic with intrinsics now. Knowledge of assembly is still helpful, because you need to have in mind what you want the compiler to generate... make sure it sees enough independent execution clumps that it can cover latencies and fill both the integer pipe and FP pipe, understand SoA vs AoS, etc. But you get to write with real variable names, not worry about scheduling/pairing of individual instructions or loop unrolling issues.
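
    For anyone who hasn't met the SoA vs AoS distinction, a tiny sketch in plain Python (no SIMD here, just the data layout; the function names and the sample vertices are made up). In array-of-structures form each record's fields sit together; in structure-of-arrays form each field gets its own contiguous array, so four consecutive x values can be fed to one 4-wide vector operation.

```python
# AoS: one (x, y, z, w) record per vertex, fields interleaved.
aos = [
    (1.0, 2.0, 3.0, 1.0),
    (5.0, 6.0, 7.0, 1.0),
    (9.0, 10.0, 11.0, 1.0),
    (13.0, 14.0, 15.0, 1.0),
]


def aos_to_soa(records):
    """Transpose a list of (x, y, z, w) tuples into four per-field lists."""
    xs = [r[0] for r in records]
    ys = [r[1] for r in records]
    zs = [r[2] for r in records]
    ws = [r[3] for r in records]
    return xs, ys, zs, ws


def scale_field(values, factor):
    """Scale one field; in SoA form every group of 4 values maps to one vector multiply."""
    return [v * factor for v in values]


xs, ys, zs, ws = aos_to_soa(aos)
print(scale_field(xs, 2.0))
```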

    Some of my best VU routines that I spent a couple weeks hand-optimizing, I re-wrote with SPE intrinsics in an afternoon. After some initial time figuring out exactly how the compiler likes to see things, it was a total breeze. My VU code ran in 700 usec while my SPE code ran in 30 usec (@ ~1.3 IPC! Good work, compiler).

    The real worry now is becoming DMA-bound. For example, assume you're running all 8 SPEs full-bore and you write as much data as you read. At 25.6 GB/s total, you get 3.2 GB/s per SPE, so 1.6 GB/s in each direction (assuming perfect bus utilization). At 3.2 GHz, that's 0.5 bytes/cycle. So for every 16-byte register's worth of data, you have 32 cycles of transfer time to fill: at roughly one instruction per cycle you need to execute 32 instructions minimum or you're DMA-bound!
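
    The same back-of-the-envelope arithmetic as a small sketch, using only the figures from the paragraph above as defaults. The function name is made up, and treating "cycles" as "instructions" assumes about one instruction per cycle.

```python
def dma_budget(bus_gb_per_s=25.6, n_spes=8, clock_ghz=3.2, register_bytes=16):
    """Return (bytes per cycle in each direction, cycles per register of data)."""
    per_spe = bus_gb_per_s / n_spes              # bandwidth share per SPE, GB/s
    per_direction = per_spe / 2                  # reads and writes split evenly
    bytes_per_cycle = per_direction / clock_ghz  # (GB/s) / (Gcycles/s) = bytes/cycle
    cycles_per_register = register_bytes / bytes_per_cycle
    return bytes_per_cycle, cycles_per_register


bytes_per_cycle, cycles_per_register = dma_budget()
print(bytes_per_cycle, cycles_per_register)
```

    Running fewer SPEs, or reading more than you write, shifts the budget; changing the arguments shows by how much.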

    Food for thought.

  • by jericho4.0 ( 565125 ) on Monday May 29, 2006 @12:46AM (#15423171)
    Maybe true on our computers, but not on supercomputers.
  • by Tough Love ( 215404 ) on Monday May 29, 2006 @04:57PM (#15425803)
    A programmer hour is much more valuable than a machine hour

    You forgot to take into account the team of scientists waiting for the machine to produce a result.
