
The Potential of Science With the Cell Processor

prostoalex writes "High Performance Computing Newswire is running an article on a paper by computer scientists at the U.S. Department of Energy's Lawrence Berkeley National Laboratory. They have evaluated the processor's performance in running several scientific application kernels, then compared this performance against other processor architectures. The full paper is available from the Computer Science department at Berkeley."
  • by Watson Ladd ( 955755 ) on Sunday May 28, 2006 @07:43AM (#15419901)
    The paper did a lot of hand-optimization, which is irrelevant to most programmers. What gcc -O3 does is way more important than what an assembly wizard can do for most projects.
  • by Anonymous Coward on Sunday May 28, 2006 @07:50AM (#15419919)
    "The paper did a lot of hand-optimization, which is irrelevant to most programmers."

    But not to programmers who do science.

    "What gcc -O3 does is way more important than what an assembly wizard can do for most projects."

    Not an insurmountable problem.
  • by Anonymous Coward on Sunday May 28, 2006 @07:55AM (#15419925)
    Hand optimization _is_ relevant to scientific programmers
  • by TommyBear ( 317561 ) <> on Sunday May 28, 2006 @08:07AM (#15419945) Homepage
    Hand optimizing code is what I do as a game developer and I can assure you that it is very relevant to my job.
  • by MooUK ( 905450 ) on Sunday May 28, 2006 @08:27AM (#15419988)
    I think you misunderstand what HPC actually is.

    High performance computing is that which you'd want to throw a huge Beowulf cluster at, or possibly a supercomputer or twenty. Not three small pathetic cores.
  • WTF? (Score:5, Insightful)

    by SmallFurryCreature ( 593017 ) on Sunday May 28, 2006 @08:57AM (#15420057) Journal
    First off, you are talking about consoles being sold at a loss. NOT their components.

    IF IBM were the maker of the chip, they would most certainly not sell it at a loss. Why should they? Sony might sell the console at a loss to recoup the loss from game sales, but IBM has no way to recoup any losses.

    Then again, IBM is in a partnership with Sony and Toshiba, so the chip is probably owned by this partnership and Sony will just be making the chips it needs itself.

    So any idea that IBM is selling Cells at a loss is insane.

    Then the cost of the PS3 is mostly claimed to be in the Blu-ray drive tech. Not going to be of much interest to a science setup, is it? Even if they want to use a Blu-ray drive, they need just 1 in a 1000-Cell rig. Not going to break the bank.

    No, the Cell will be cheap because when you run an order of millions of identical CPUs, prices drop rapidly. There might even be a very real market for cheap Cells. Regular CPUs always have lesser-quality versions. Not a problem for an Intel or AMD, who just badge them Celeron or whatever, but you can't do that with a console processor. All Cell processors destined for the PS3 must be of similar spec.

    So what to do with a Cell chip that has one of the cores defective? Throw it away, OR rebadge it and sell it for blade servers? That is where Celerons come from (defective cache).

    We already know that the Cell processor is going to be sold for other purposes than the PS3. IBM has a line of blade servers coming up that will use the Cell.

    No, I am afraid that it will be perfectly possible to buy Cells, and they will be sold at a profit just like any other CPU. Nothing special about it. They will, however, benefit greatly from the fact that they already have a large customer lined up. Regular CPUs need to recover their costs as quickly as possible because their success is uncertain. This is why regular top-end CPUs are so expensive. But the Cell already has an order for millions, meaning the costs can be spread out in advance over all those units.

  • by stengah ( 853601 ) on Sunday May 28, 2006 @09:46AM (#15420200)
    The fact is that most scientists use high-level software (MATLAB, Femlab, ...) to do their simulations. Although these scientists may be interested in any potential speed-up to their workflow, they are not willing to invest any of their time translating their entire codebase into asm-optimized C. Thus, the ball is in the hands of software developers, not scientists.
  • Re:WTF? (Score:4, Insightful)

    by Kjella ( 173770 ) on Sunday May 28, 2006 @10:03AM (#15420261) Homepage
    So what to do with a cell chip that has one of the cores defective? Throw it away OR rebadge it and sell it for blade servers?

    Use it. Seriously, that's why the PS3 uses the central core plus 7 SPEs, not 8. One is actually a spare, so unless the chip is flawed in the central logic or in two separate cores, it is still good. Good way to keep the yields up...
  • by samkass ( 174571 ) on Sunday May 28, 2006 @10:08AM (#15420280) Homepage Journal
    What seems to be more important than that is:

    "According to the authors, the current implementation of Cell is most often noted for its extremely high performance single-precision (32-bit) floating performance, but the majority of scientific applications require double precision (64-bit). Although Cell's peak double precision performance is still impressive relative to its commodity peers (eight SPEs at 3.2GHz = 14.6 Gflop/s), the group quantified how modest hardware changes, which they named Cell+, could improve double precision performance."

    So the Cell is great because there are going to be millions of them sold in PS3s, so they'll be cheap. But it's only really great if a new custom variant is built. Sounds kind of contradictory.
  • by Anonymous Coward on Sunday May 28, 2006 @10:09AM (#15420284)
    Methinks that the point was that if a GAME development company is going to fork over the cash for ASM wizards, a company spending a few hundred mil. building a super-computer might just consider doing the same. Maybe.

    And I know from Uni that many profs WILL hand optimize code for complex, much used algorithms. Then again, some will just use matlab.
  • by Anonymous Coward on Sunday May 28, 2006 @10:22AM (#15420330)
    x86, the commodity architecture, has few registers, dating from the days when RAM was faster than the CPU (i.e. the 8-bit days).

    The tacked-on FPU, MMX, and SSE SIMD stuff, while welcome, still leaves few registers for program use.

    The PowerPC, on the other hand, has a nice collection of registers, and SIMD as good if not better. The Cell goes a big step further.

    More registers = more variables held in the CPU = higher bandwidth of calculation, be they regular registers or SIMD registers. That, plus the way it handles cache, could make it a pig to program without the right kind of optimizing compiler. Would that mean game developers using Fortran 95?
  • by penguin-collective ( 932038 ) on Sunday May 28, 2006 @10:27AM (#15420340)
    Except for a tiny minority of specialists, most scientific programmers, even those working on large-scale problems, have neither the time nor the expertise to hand-optimize. Many of them don't even know how to use optimized library routines properly.
  • by jmichaelg ( 148257 ) on Sunday May 28, 2006 @11:41AM (#15420570) Journal
    Lest anyone think they actually ran "several scientific application kernels" on the Cell/AMD/Intel chips, what they actually did was run simulations of several different tasks such as FFT and matrix multiplication. Since they didn't actually run the code, they had to guess as to some parameters like DMA overhead. They also came up with a couple of hypothetical Cell processors that dispatched double precision instructions differently than how the Cell actually does it and present those results as well. They also said that IBM ran some prototype hardware that came within 2% of their simulation results, though they didn't say which hypothetical Cell the prototype hardware was implementing.

    By the end of the article, I was looking for their idea of a hypothetical best-case pony.

  • by zCyl ( 14362 ) on Sunday May 28, 2006 @03:59PM (#15421501)
    Hand optimization or writing portions of code in assembler is the last thing 85% of these people want to do. They don't want to be computing experts to do their science/research.

    When you're talking about reuseable modules like an FFT or matrix multiplication, then many scientists doing simulations would love to have a hand optimized FFT or matrix module to plug in as a simulation component. Even if they don't know a drop of assembly themselves, having the optimized module available can make a large difference in running time for big simulations.
  • by Sycraft-fu ( 314770 ) on Sunday May 28, 2006 @04:40PM (#15421648)
    Hey, it makes a real difference. There's a great quote that shows up on /. from time to time that goes along the lines of "The difference between theory and reality is that in theory there's no difference, but in reality there is."

    Researchers are very good at simulating things that have little or nothing to do with reality. It all looks good in theory according to their formulas, but they fail to take something into account. As an example, take the defunct Elbrus E2K computer chip. It was supposed to be an awesome processor that would kick the crap out of anything Intel or AMD offered. It was being designed by people with real computer experience; Elbrus made several Soviet supercomputers. Basically, the chip was to be their Elbrus 3 supercomputer reimplemented on one chip.

    Everything looked good in simulations... but obviously nothing has ever come of it. The E2K never hit the market, and it and its follow-ups have been nothing but vapourware. Why? Well, again, because of the difference between theory and reality. The design was all well and good on a VHDL simulator, but the hard part of chip design is not developing some powerful stuff in VHDL, it's developing powerful stuff that can actually be fabbed into a real chip.

    So as with anything like this, I reserve judgement until I see real silicon. To me this looks like people getting overly excited about something that doesn't exist yet. Yes, the Cell is good in theory, we know that, that's not the issue. The issue is how it will really perform against other chips running real code. That we don't know, and won't know for some time. One simple issue that will have to be dealt with is compiler inefficiencies. Most scientific code isn't written in assembly; often it's Fortran. Well, if there's one thing Intel's got, it's a rockin' Fortran compiler. So even if the Cell's units are actually more powerful in theory, if the code it gets isn't optimized it may not matter.

    Either way, any time I hear things about what an amazing jump forward some new tech will be, I am skeptical. It just generally seems that doesn't happen. Improvements happen in small jumps, not nearly an order of magnitude of increase (which is what they are claiming with the 8x faster stat).
  • by MonaLisa ( 190059 ) on Sunday May 28, 2006 @05:18PM (#15421788)
    The authors discuss hand tuning and assembler coding for Cell, but not necessarily for the other processors. Their 2D FFT results, for example, are a factor of 10 slower than others I have seen. Also, for the IA64 and Opteron, the performance of many of these numerical kernels is highly dependent on the compiler used. The IA64 especially is very sensitive to compiler optimization to keep the 6 pipeline slots busy and also generate memory prefetch instructions at the right time to prevent stalling.

    As often seems to occur in these sorts of HPC comparisons, they spend a lot of time hand optimizing for a particular platform, and compare it to other platforms that have not necessarily received the equivalent effort. As has been noted above, how much time you have to spend developing, debugging, and tuning a code matters a lot. This is particularly true for research codes.

    Finally, who uses single precision for scientific computing anymore? Any field that I am aware of that would use large FFTs, large linear algebra solvers, etc. requires at least double precision to get anything meaningful.
  • by Anonymous Coward on Sunday May 28, 2006 @05:24PM (#15421811)
    I heard about labs spending the time and effort to make custom stuff, but by the time they're done, the off-the-shelf hardware had already caught up

    Haha, dude, have you ever run tests that take weeks to complete? The FLOPS improvement shown in that paper is around a factor of 8 compared to AMD64 machines. You jump from weeks to days in simulation time. That is HUGE.

    As for the development time, doing a basic optimisation will already give you a great boost in performance. You do not have to handcode each and every instruction/function. As a side note, we already spend weeks optimising pieces of code for SSE/SSE2/SSE3. I would guess using another assembler instruction set would not delay us too much. Especially if we can gain 8x performance.

    Our lab also does video coding, processing 8 times faster would mean that we can go from demonstrating our technology on 352x288 (CIF) sequences to demonstrating it on 720p (HD) sequences. That is if we keep it realtime, or we could process 8 CIF streams at once. Now that is WAY impressive.
  • by Anonymous Coward on Sunday May 28, 2006 @08:52PM (#15422549)
    If a simulation will run for several months, saving a week's worth of run time is advantageous. That could translate into more time to do analysis, publishing sooner than a competitor, reduced overhead, etc.

    But as with any of the examples you gave, a cost-benefit analysis needs to be considered.

    And with any optimization strategy, it is often better to use better data structures than to tune serial instruction streams.

    For the Cell, this might translate into reshaping data chunks to better fit the local processor environment.
