Japan's Petaflop Supercomputer 161
slashthedot writes "Japan has built the fastest supercomputer in the world. While the BlueGene/L contains 130,000 processors, Japan has managed to create the first Petaflop supercomputer, called MDGrape-3, with just 4808 chips, and it cost just $9 million to develop."
Wow (Score:5, Funny)
Re:Wow (Score:2)
Re:Apparent source page for device data (Score:4, Funny)
I've seen the videos of it a few times and stumbled across entire collections of them! They call it something like bukkake.
Progress (Score:5, Informative)
1,500 $ (Score:2)
Re:1,500 $ (Score:3, Insightful)
Re:1,500 $ (Score:2)
And calling this a Petaflop supercomputer is similarly misleading, for roughly the same reason. The PS3 gets its 2 TF from the GPU, which can process 384 flops per cycle in an architecture built specifically to shade pixels. Likewise this MDGrape-3 is built at the hardware level to solve the n-Body problem, and that's it.
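For concreteness, the n-Body kernel that the parent says MDGrape-3 is hard-wired for is essentially the following pairwise loop (a naive Python sketch, purely illustrative -- the real hardware pipelines this in fixed-function silicon):

```python
import math

def pairwise_forces(positions, masses, G=1.0):
    """Naive O(n^2) n-body force evaluation -- the one kernel that
    special-purpose hardware like MDGrape-3 accelerates."""
    n = len(positions)
    forces = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            dx = positions[j][0] - positions[i][0]
            dy = positions[j][1] - positions[i][1]
            dz = positions[j][2] - positions[i][2]
            r2 = dx * dx + dy * dy + dz * dz
            inv_r3 = 1.0 / (r2 * math.sqrt(r2))
            f = G * masses[i] * masses[j] * inv_r3
            forces[i][0] += f * dx
            forces[i][1] += f * dy
            forces[i][2] += f * dz
    return forces
```

Every flop the machine is rated for comes from exactly this inner loop; ask it to do anything else and the rating is meaningless.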
Re:1,500 $ (Score:2, Interesting)
Re:1,500 $ (Score:2)
True - but we're talking general purpose operations here.
Re:1,500 $ (Score:2)
Re: (Score:1)
Re:Progress (Score:2)
Re:Progress (Score:2)
Sure.
RSX: You could put a more powerful GPU into a PC and get better performance numbers, so why count GPU performance? Also, you cannot do 64-bit floating-point math with ANY GPU at the moment, and GPU accuracy is non-IEEE-standard, so remove it from the equation.
Each SPE can do 25.6 Gflop/s theoretical (180 Gflop/s for all 7), but only for 32-bit (non-IEEE-standard) values. For 64-bit accuracy, tests have shown the thorou
500Gflop with one computer chip for cheap... (Score:1)
Not Quite Progress Yet (Score:3, Informative)
machines like this (Score:2, Interesting)
Everyone is so concerned with internet safety, one would
Re:machines like this (Score:5, Insightful)
Re:machines like this (Score:1)
If the resources are available to crack RC5, to do distributed work on a cure for cancer, and to crunch data captured from radio antennas in search of little green men from Mars, then I think we have the know-how necessary to get something like this up and running.
It mak
Re:machines like this (Score:4, Informative)
Well, the examples that you mention are not really the same as "attempting to break software and search for problems long before release." If I understand these issues correctly: (1) (with apologies to crypto specialists) RC5 cracking required lots of CPU time to brute-force a huge key space, (2) projects like Folding@Home aren't "looking for a cure for cancer," they're running (I think) quantum chemistry simulations to find out how certain molecules can act in certain situations, and (3) SETI@Home is looking for specific patterns in signal data. In all three of these cases, there are a few common (maybe not so simple) operations that need to be applied to a large set of data or initial conditions, and that's why they need lots of machines, or fast machines.
Figuring out how clever people will take advantage of a particular implementation of a web browser or TCP/IP stack is a completely different class of problem IMHO. Yeah, maybe there's some clever AI techniques that may simulate attack attempts, and maybe they could come up with attacks that nobody has thought of yet, but a really fast computer will not somehow magically solve these kinds of problems for us. There's a lot of hard science and software engineering that needs to be done first.
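The common structure identified above -- one expensive function applied independently to many work units -- is what makes those projects distributable at all. A minimal sketch (the work function is hypothetical, standing in for a key trial, folding trajectory, or signal-pattern match):

```python
from multiprocessing import Pool

def score_candidate(x):
    """Stand-in for the expensive per-work-unit computation.
    Each call is independent of every other call."""
    return x * x % 97

if __name__ == "__main__":
    with Pool() as pool:
        # Because work units share nothing, they scale out to
        # thousands of volunteer machines as easily as to one box.
        results = pool.map(score_candidate, range(1000))
    print(max(results))
```

Fuzzing a browser or a TCP/IP stack has no such clean decomposition, which is the point being made: throwing flops at it doesn't help without the science.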
Re:machines like this (Score:2)
It's not about speed.. (Score:1)
It's not about being fast. It's about creative ways to do things that interfaces weren't intended for.
Your idea would work out as soon as you have a way to replace artists with computers.
Actually (Score:1)
In fact, you could put thousands of these machines together for less than 10 billion. For 10 billion dollars you could crack any reversible cryptographic algorithm in the universe on a weekend. I call that world domination.
Maybe Gates still has interesting things to do with his life after all.
Efficiency (Score:3, Interesting)
Re:Efficiency (Score:1)
Re:Efficiency (Score:1, Offtopic)
Re:Efficiency (Score:1)
Re:Efficiency (Score:3, Interesting)
This computer is efficient at what it does largely because it's extremely specialized. It's built specifically for working on molecular dynamics, but from the looks of things, it's probably close to useless for nearly anything else.
As such, it would probably work quite nicely for Stanford's folding@home project (which studies protein folding, i.e. molecular dynamics). It probably would not work very well for seti@home, bec
Re:Efficiency (Score:4, Insightful)
Of course, if you compare USEFUL results, it's Folding@home: lots (over 50 papers), SETI: 0
The Japan box will be faster than Folding@home for a little while, but will also likely produce RESULTS instead of just a lot of global warming.
Re:Efficiency (Score:2)
Ah, I wasn't aware of that -- I mentioned SETI primarily because the OP did. My own spare cycles all go to F@H...
Quite true -- and IMO, likely to remain that way (and thus, my decision about where to contribute...)
Incorrect chip count (Score:5, Informative)
Re:Incorrect chip count (Score:3, Informative)
Are there any good articles on this machine that anyone would care to share?
Re:Incorrect chip count (Score:5, Informative)
http://mdgrape.gsc.riken.jp/modules/tinyd0/index.
MOD PARENT UP (Score:2)
As someone else already said, and as mentioned in Parent's link, this is a very specific machine for Molecular Dynamics simulations; everything from memory handling to processing is optimized only for handling particles and doing force calculations on them. Therefore, it'll serve a relatively small market.
That said, I'm very curious to see how fast it'll run gromacs [gromacs.org], the MD program I use. This is pretty optimized for parallel simulations already, and I'm able to do the calculations I
Re:Incorrect chip count (Score:2)
I wonder how much lower they could have pushed the power draw by using a 90nm or 65nm fab?
note: The system "cost" $9 mil because... that's what their budget was. The chip builders ate some of the cost.
Re:Incorrect chip count (Score:2)
OK, since your use of "we" suggests you are somehow involved (which I doubt), I checked the Riken site (http://www.rikenresearch.riken.jp/roundup/31/) which states
"MDGRAPE-3 is a large system that consists of 201 units of 24 MDGRAPE-3 chips, 64 parallel servers each containing 256 of Intel's newest Xeo
Purchasing Advice (Score:5, Funny)
Uses a large walk-in closet? (Score:5, Interesting)
Re:Uses a large walk-in closet? (Score:2)
Because nobody is writing parallelisable code, or if you like, computer languages don't readily support multi-threaded code. It's always a construct verging on a hack that frequently goes horribly, horribly wrong. Until multi-threading in languages is as seamless and usable as calling a subroutine, parallel computing will never take off.
Re:Uses a large walk-in closet? (Score:2)
Typically, they use libraries (not built-in language features) to do it.
And it's not done using multi-threading.
What isn't that common yet is consumer apps that are parallelized. Scientific apps got there a decade ago.
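The scatter/gather pattern those scientific codes use can be sketched with a stock library rather than any language feature (here Python's `concurrent.futures`, with processes instead of threads, as a rough stand-in for MPI-style decomposition):

```python
from concurrent.futures import ProcessPoolExecutor

def partial_sum(chunk):
    """Each worker computes its piece independently; results are
    combined afterwards -- the scatter/gather shape of MPI codes."""
    return sum(chunk)

if __name__ == "__main__":
    data = list(range(1_000_000))
    # Scatter: deal the data out across 4 workers.
    chunks = [data[i::4] for i in range(4)]
    with ProcessPoolExecutor(max_workers=4) as ex:
        # Gather + reduce: combine the partial results.
        total = sum(ex.map(partial_sum, chunks))
    print(total)  # same answer as sum(data)
```

No shared state, no locks -- which is exactly why this style predates "seamless" language-level threading by a decade.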
Re:Uses a large walk-in closet? (Score:2)
Re:Uses a large walk-in closet? (Score:1)
The cost of this computer is actually much higher than $9 million. If you rtfa, you'll see that much of the computer was effectively donated by outside companies. The CPU design was done by Hitachi. Intel supplied other hardware, as did SGI Japan. None of this is factored into the $9 mil. It's likely that the actual cost was many multiples of that.
Not just a flop (Score:4, Funny)
"Not just a flop, but a flop a million billion times over."
cheaper and more efficient (Score:3, Insightful)
But the good thing about it is that it is more energy efficient. It seems the trend in desktops/servers right now is also reaching the supercomputers. Maybe they could include a performance-per-watt ratio in the Top500 list as well.
Specialised (Score:3, Informative)
Say what?!? (Score:3, Informative)
Japan has managed to create the first Petaflop supercomputer, called MDGrape-3, with just 4808 chips...
FLOPS = floating-point operations per second.
PETA = 10 ^ 15, or "a quadrillion".
(10 ^ 15) / 4808 = about 207,986,688,852, which would indicate that each chip is running at several hundred TERA-hertz [and, even then, the machine would have to possess an operating system so efficient that it could consistently perform one floating point operation per clock increment, which seems extraordinarily unlikely].
Or is this an "analog" computer and are these "analog" FLOPS?
And no, I did not RTFA.
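The division above is easy to check (a sketch; "one flop per clock" is this comment's simplifying assumption, and the clock it implies is in the giga, not tera, range):

```python
PETAFLOP = 10**15   # flops/s claimed for the full machine
CHIPS = 4808

per_chip = PETAFLOP / CHIPS
print(f"{per_chip:.4e} flops per chip per second")  # ~2.0799e11

# At one flop per clock cycle this would demand a ~208 GHz clock;
# real chips instead retire many flops per cycle through parallel
# pipelines, which is how the rating is actually reached.
```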
Re:Say what?!? (Score:5, Informative)
The Cell processor is not running at 200GHz. There's this concept called 'parallelisation', it's how your graphics card can do dozens, if not hundreds, of operations per clock cycle. In Cell's case it can do 8 (number of SPUs) * 4 (128-bit registers, SIMD) * 2 (units) = 64 SP FLOPS per clock cycle, and that's not including the PPU which has VMX128 and an FPU itself.
However make the Cell processor calculate IEEE conformant FLOPS, and it gets a double precision score of around 20GFLOPS. Still good though.
The above was from memory, details may vary, figures are roughly correct, YMMV, etc.
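Taking the parent's own figures at face value, the per-cycle arithmetic checks out (a sketch; the 3.2 GHz clock is the commonly cited Cell figure, not from the comment):

```python
SPUS = 8         # SPUs counted by the parent
SIMD_WIDTH = 4   # 32-bit lanes in a 128-bit register
UNITS = 2        # two flop-capable units per SPU, per the parent

flops_per_cycle = SPUS * SIMD_WIDTH * UNITS
print(flops_per_cycle)  # 64, matching the comment

CLOCK_HZ = 3.2e9
print(flops_per_cycle * CLOCK_HZ / 1e9)  # ~204.8 single-precision GFLOPS peak
```

Which is exactly why the double-precision, IEEE-conformant figure of ~20 GFLOPS looks so modest next to the headline number.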
Point (Score:2)
Re:Say what?!? (Score:2)
Re:Say what?!? (Score:3, Interesting)
I think this was an IBM/Fujitsu collaboration; IBM had MD-GRAPE and dropped it because of the market, and Fujitsu is still making the GRAPE.
FYI the reaso
Re:Say what?!? (Score:4, Informative)
Quoting another link you can see how they reached these numbers (which I take issue with):
- http://mdgrape.gsc.riken.jp/modules/tinyd0/index.
With that answered, I'm confused. Another poster sent along that link which explains what Riken will do. I'm confused about that actually. Reading the page, based on the verb usage, either someone didn't understand future and past tense (possible, but unlikely), or they haven't built the entire box yet. Perhaps I'm reading a bit too much into it... it's quite possible that someone simply hasn't updated the website.
Based on the webpage, all of the calculations to reach 1 petaflop are based on theoretical peak performance measurements, extrapolated from the theoretical peak of a single special-purpose ASIC which has been built, but may or may not have been actually placed into a fully configured system. Nothing talks about measured benchmarks, and the OP's article contains the same theoretical extrapolated numbers.
Anyone know if they've actually built it?
~ Mike
giga not tera (Score:4, Insightful)
Furthermore, many processor architectures have instructions that do several basic floating-point operations in one step. For instance, PowerPC has a one-cycle multiply-accumulate instruction (multiply and add in one step), so for marketing purposes, a PowerPC has twice the flops. Now, imagine if you have a vector processor that has a highly-optimized instruction for taking square roots or doing trig in one cycle. A square root operation will translate into dozens of basic flops (add, multiply, subtract). Such a processor might therefore be rated at 208 gigaflops even though its operating frequency is <1GHz.
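The marketing arithmetic described above -- one instruction credited at its full flop weight, every cycle -- can be made concrete (a sketch; the flop weights are hypothetical examples, not measured figures):

```python
def rated_gflops(clock_ghz, flops_per_instr, instrs_per_cycle=1):
    """Peak rating as marketing computes it: every instruction
    counted at its flop weight, issued every single cycle."""
    return clock_ghz * flops_per_instr * instrs_per_cycle

# A 1 GHz core doing plain adds: rated 1 GFLOPS.
print(rated_gflops(1.0, 1))
# Same core with a one-cycle multiply-accumulate: rated 2 GFLOPS.
print(rated_gflops(1.0, 2))
# A hypothetical vector unit whose square-root instruction is
# credited as 26 basic flops: rated 26 GFLOPS at the same clock.
print(rated_gflops(1.0, 26))
```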
Re:giga not tera (Score:2)
I love acronyms that are explained incorrectly..
Floating Point Operations Per Second per cycle
If you assume that the reader doesn't know the meaning, then just write it out to begin with.
incorrect chip count... (Score:2)
19,122 Xeons.
(1 * 10 ^ 15) / (2 * 10 ^ 4 ) = 5 * 10^10.
That's 50 billion floating-point operations per second. If each Xeon is dual-core, it's 25 billion ops per core per second. If they're 4GHz processors, that's about 6.25 ops/cycle. I'm not sure how it achieves that. Even multiply-add fused instructions only do 2 ops per cycle.
I still have to ask if this is achievable.
Re:incorrect chip count... (Score:2)
Re:Say what?!? (Score:2)
It implies nothing of the sort. A single chip could have several floating point pipelines.
Re:Say what?!? (Score:2)
floating-point operations per cycle, for 1.15e15 flops/sec. The Xeons do not contribute to the total; they essentially act as the microcode program that tells the vector units what to do next.
While optimized for moldyn, it would be readily repurposed for a wide range of large-scale computations, including solving massive ensembles of linear systems. Indeed, I would be quite pleased to write a Fortran-2005 compiler or a Matlab compiler for this beast, if anyo
Petaflop? (Score:2)
PETA flop (Score:2)
9 million? (Score:4, Insightful)
Re:9 million? (Score:3, Insightful)
Remember the green cross code: Stop, Read, then Post.
Our penis so small, your american penis so large.. (Score:3, Insightful)
Where are the really neato results we should be getting from these? I'm tired of "Country X builds massive TeraWatt computer system." I want to read about "Country X mapped the cancer genome" or some such.
Besides, these are relatively unimpressive. Sure, in the 50s, 60s, 70s, 80s we were maturing the technology: inventing new technology, analyzing it, etc. Now it's more of the same. Huge budget, lots of space and InfiniBand connections...
Show me the MFlops/Watt rating of this. Are they improving it? Are we wasting fewer resources? The irony is that they pollute by burning tons of energy, all so we can predict global warming or whatever.
Tom
Re:Our penis so small, your american penis so larg (Score:5, Informative)
"Show me the MFlops/Watt rating of this?"
No problemo!
The number of flops: (10 ^ 15) / 4808 = about 207,986,688,852 flops per chip - from a previous poster.
The number of watts: 300,000 - from the manufacturer's site - which works out to about 62 watts/chip.
207,986,688,852 / 62 ≈ 3,354,624,014 flops (about 3.35 GFLOPS) / watt.
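Running the division as a quick check (a sketch; the 300 kW system figure is the one quoted above, and dividing totals directly avoids per-chip rounding):

```python
TOTAL_FLOPS = 10**15    # the petaflop rating
CHIPS = 4808
TOTAL_WATTS = 300_000   # system draw from the manufacturer's site

flops_per_chip = TOTAL_FLOPS / CHIPS   # ~2.08e11
watts_per_chip = TOTAL_WATTS / CHIPS   # ~62.4

# Per-chip or total, the ratio is the same: ~3.33 GFLOPS per watt.
print(flops_per_chip / watts_per_chip / 1e9)
```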
Re:Our penis so small, your american penis so larg (Score:1)
Re:Our penis so small, your american penis so larg (Score:5, Insightful)
And if you're trolling, yeah, you got me, so congratulations.
Re:Our penis so small, your american penis so larg (Score:2)
Re:Our penis so small, your american penis so larg (Score:2)
Or was it a case of having loads of money, room and a friendly merchant at Fry's?
That's my complaint. It was different with the first Crays. Nothing like it existed before. They had to invent new technology to accomplish it. This is more a case of networking via GigE and optical, then stacking box upon box.
Tom
Re:Our penis so small, your american penis so larg (Score:2)
magnitude difference in flops between that kind of commodity system and MDGRAPE-3.
Gig-E is a pretty sad sort of MPP inte
4808 chips -- Alas, it is still bottlenecked by... (Score:2, Funny)
Vector Processing? (Score:1)
What specialized hardware? I would really like to read a more technical article about this machine. I would guess that the Japanese focused on vector processing like they did in the design of the Earth-Simulator [wired.com].
The best supporting evidence I have for this conclusion is the comparison of Japan's last two supercomputers:
Sun Fire X64 Cluster [top500.org]
Earth-Simulator [top500.org]
Sun Fire has 10,368 processors with a Rmax(GFlops) of 38,180.
Earth-Simulator
Re:Vector Processing? (Score:2)
Actually, it probably could be used in protein folding -- but not the others.
Maybe -- but IBM has at least talked about a Blue Gene/P (P for petaFLOP). I haven't seen much about it recently, so it may be open to some question. OTOH, IB
Weather Predictions Explained! (Score:1)
It all makes sense now. When they predict a 90% chance of rain three days in a row and we don't see a drop, they really meant that it will rain sometime between now and thirty or forty years from now.
Re:Weather Predictions not explained (Score:2)
Predicting long-term weather trends is easier than predicting daily conditions in your area.
Even when fluid dynamics and computers reach the level needed to handle compressible fluids at the required scale, the predictions will still be off for places that aren't the focus. Frequently the predictions for my city only come true for part of the city.
"Computer" ? (Score:2)
Re:"Computer" ? (Score:2)
Oh just... (Score:1)
glxgears (Score:2, Funny)
Not even close! (Score:2, Insightful)
Re:Not even close! (Score:2)
Re:Not even close! (Score:2)
Hmm - this is interesting in and of itself. What I mean is that here is a very specialized (and, I assume, Turing-complete) computer, doing one particular job, and doing it amazingly well. Now, let us suppose the simulation of particles it does according to known physics is complete (I know it isn't). If it were, then in theory it
Re:Not even close! (Score:2)
Now, let us suppose the simulation of particles it does according to known physics is complete (I know it isn't).
When I said "complete", I really meant "complete", and I also know that this isn't (currently) attainable, if it ever is (maybe with quantum computing, but maybe not). I have known about chaos since I first encountered it studying fractal algorithms as a high school student in 1989 or so (played around with them on an Apple IIe and a Tandy Color Computer 3).
I am not trying to
MadDog Grape is my favorite flavor too! (Score:2)
Does this deserve Top 500? (Score:2)
Personally, I don't think it should qualify. Otherwise the EFF's $250,000 Deep Crack, which could only crack DES (although faster than tens of thousands of regular computers at that time), would qualify too.
New Blue? (Score:2)
Not comparable (Score:3, Informative)
Darn algorithms! (Score:2)
Not all progress needs to be brute force. But brute force is much more fun to brag about.
-
Precision? (Score:2)
Comparison MDGrape-3, BlueGene/L & Earth Simul (Score:2, Informative)
http://www.bloglines.com/blog/ITnomad?id=126 [bloglines.com]
Cheers, Alex.
Idiotic summary. (Score:3, Informative)
it does nothing but calculate 1/sqrt(dx^2+dy^2+dz^2)*variable, but really, really often.
GRAPE-6, five years or so ago, was already running at 200MHz, with a throughput of one force calculation per pipeline and 6 pipelines on one chip. So it counts as 1.2 billion force calculations per second, each being (1 inverse, 1 sqrt, 3 adds, 3 squares, 2 fmul, etc.).
A lot of flops, but totally useless as general purpose computers.
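That single hard-wired operation, and the throughput tally above, look like this in plain Python (a sketch; the flop weight credited per interaction is the commenter's own accounting):

```python
import math

def grape_style_force(dx, dy, dz, coeff):
    """One pairwise interaction: coeff / sqrt(dx^2 + dy^2 + dz^2),
    the single operation a GRAPE pipeline is hard-wired to perform."""
    return coeff / math.sqrt(dx * dx + dy * dy + dz * dz)

# GRAPE-6 figures from the comment above:
PIPELINES = 6
CLOCK_HZ = 200e6   # one force calculation per pipeline per cycle

interactions_per_sec = PIPELINES * CLOCK_HZ
print(interactions_per_sec)  # 1.2e9, the "1.2 billion force calculations"
```

Multiply each interaction by its credited flop count and the marketing number appears; ask the chip to do anything other than this one formula and it does nothing at all.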
Singularity (Score:2)
Re:Singularity (Score:2, Funny)
Re:Singularity (Score:2)
What do you mean? That some super-intelligent AI created somewhere in the world, would want to have an active part in human
If you think this... (Score:2)
The honest answer is "we don't know", and that we should continue on (for whatever that means) doing what we do...
It did not cost $9m to develop. (Score:2)
In short, Riken had almost nothing to do with the process, except for the design of the single custom chip involved, and even then, most of the work was done by outside firms who wanted the press. And even then, it still cost the host organization $9 million!
Re:Yeah (Score:3, Funny)
Y'know, I have a feeling I should really post this as anonymous coward.
For once the subtitle is right on (Score:2, Informative)
Re:Imagine... (Score:5, Funny)
With a side order of hot grits!
A tip: if you can fit your message in the subject line, then do it, particularly when you
I remember back when that comment would have gotten +5 "Whoa duuuuude" mods.
Yet you can still get good mods if you say:
"A petaflop that fits in a closet for just $9M for the first one? You could make more for a couple million, at least by the time you got your [impressive knowledgeable-sounding ultra-tech adjectives] cluster interconnect together - why not spend a quarter of a billion and push the limits of computing out another couple orders of magnitude? This thing can do protein folding, so it can likely do bomb physics and a bunch of other big-money problems that can be represented in similar math."
Which translates to:
"Imagine a Beowulf cluster of these!"
Re:Imagine... (Score:2)
Re:Imagine... (Score:2)
Re:bullshit alert!! (Score:2)
Firstly, it has been a long time since processors managed only one instruction per clock. Modern chips do about 8. That alone means that 200GFLOPS equates to about 25GHz.
Next, you get SIMD instructions. This lets a single instruction work on multiple data elements in parallel. Most modern CPUs have 4-way SIMD, but 8-way is not unheard of. This brings it down to 3.125GHz.
Now, factor in the fact that you can get
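The step-by-step division in this comment is simple to verify (a sketch using the comment's own figures; real chips rarely sustain full issue width, so these are best-case equivalents):

```python
TARGET_FLOPS = 200e9     # the per-chip rate under discussion

INSTRS_PER_CYCLE = 8     # superscalar issue width claimed above
print(TARGET_FLOPS / INSTRS_PER_CYCLE / 1e9)  # 25.0 "GHz equivalent"

SIMD_LANES = 8           # 8-wide SIMD, the widest the comment allows
print(TARGET_FLOPS / (INSTRS_PER_CYCLE * SIMD_LANES) / 1e9)  # 3.125
```

So superscalar issue plus wide SIMD brings the required clock from an absurd 200 GHz down into the range of shipping parts.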
Re:It is petaflops not petaflop (Score:2)