Japan's Petaflop Supercomputer
Posted by
CmdrTaco
on Sun Jul 30, 2006 08:38 AM
from the renders-a-million-tentacles-a-minute dept.
slashthedot writes "Japan has built the fastest supercomputer in the world. While the BlueGene/L contains 130,000 processors, Japan has managed to create the first Petaflop supercomputer, called MDGrape-3, with just 4808 chips, and it cost just $9 million to develop."
This discussion has been archived.
No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.

Wow (Score:5, Funny)
Re:Apparent source page for device data (Score:4, Funny)
(http://www.lwacaw.com/)
I've seen the videos of it a few times and stumbled across entire collections of them! they call it something like bukkake.
Progress (Score:5, Informative)
(http://godgab.org/)
machines like this (Score:2, Interesting)
Everyone is so concerned with internet safety, one would think that at some point massive resources will be set forth in order to effectively deal with the flaw-finding few out there making it difficult for the rest of us to simply enjoy the benefits of the internet.
Re:machines like this (Score:5, Insightful)
Re:machines like this (Score:4, Informative)
Well the examples that you mention are not really the same as "attempting to break software and search for problems long before release." If I understand these issues correctly: (1) (with apologies to crypto specialists) RC5 cracking required lots of CPU time to factor a big-ass number, (2) projects like Folding@Home aren't "looking for a cure for cancer," they're running (I think) quantum chemistry simulations to find out how certain molecules can act in certain situations, and (3) SETI@Home is looking for specific patterns in signal data. In all three of these cases, there's a few common (maybe not so simple) operations that need to be applied to a large set of data or initial conditions, and that's why they need lots of machines, or fast machines.
Figuring out how clever people will take advantage of a particular implementation of a web browser or TCP/IP stack is a completely different class of problem IMHO. Yeah, maybe there's some clever AI techniques that may simulate attack attempts, and maybe they could come up with attacks that nobody has thought of yet, but a really fast computer will not somehow magically solve these kinds of problems for us. There's a lot of hard science and software engineering that needs to be done first.
Efficiency (Score:3, Interesting)
(http://godgab.org/)
Re:Efficiency (Score:4, Insightful)
(http://www.mithral.com/~beberg/)
Of course, if you compare USEFUL results, it's Folding@home: lots (over 50 papers), SETI: 0
The Japan box will be faster for a little while than Folding@home, but will also likely produce RESULTS instead of just a lot of global warming.
Incorrect chip count (Score:5, Informative)
Re:Incorrect chip count (Score:5, Informative)
http://mdgrape.gsc.riken.jp/modules/tinyd0/index.
Purchasing Advice (Score:5, Funny)
Uses a large walk-in closet? (Score:5, Interesting)
(http://members.tripod.com/RomanaImperia | Last Journal: Friday April 22 2005, @03:20PM)
Not just a flop (Score:4, Funny)
(http://slashdot.org/~davidwr/journal/ | Last Journal: Friday November 09, @09:19PM)
"Not just a flop, but a flop a million billion times over."
cheaper and more efficient (Score:3, Insightful)
But the good thing about it is that it is more energy efficient. It seems the trend in desktops/servers right now is also reaching supercomputers. Maybe they could include a performance-per-watt ratio in the Top500 list as well.
Say what?!? (Score:3, Informative)
Japan has managed to create the first Petaflop supercomputer, called MDGrape-3, with just 4808 chips...
FLOP = floating point operation [per second].
PETA = 10 ^ 15, or "a quadrillion".
(10 ^ 15) / 4808 = about 207,986,688,852, which would indicate that each chip is running at several hundred TERA-hertz [and, even then, the machine would have to possess an operating system so efficient that it could consistently perform one floating point operation per clock increment, which seems extraordinarily unlikely].
Or is this an "analog" computer and are these "analog" FLOPS?
And no, I did not RTFA.
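For anyone who wants to re-run the division above, here is a minimal Python sketch. It uses only the two inputs quoted in the comment (10^15 operations per second and 4808 chips) and, like the comment, assumes one floating point operation per clock. The variable names are made up for illustration.

```python
# Inputs as quoted in the comment; nothing here is a measured figure.
PETA = 10**15          # operations per second claimed for the whole machine
CHIPS = 4808           # chip count from the story summary

# Share of the claimed peak that each chip would have to deliver.
flops_per_chip = PETA / CHIPS

# If each chip did exactly one operation per clock (the comment's assumption),
# the clock rate in Hz would equal flops_per_chip.
implied_clock_ghz = flops_per_chip / 1e9
implied_clock_thz = flops_per_chip / 1e12

print(f"per chip: {flops_per_chip:,.0f} operations per second")
print(f"implied clock at 1 op/cycle: {implied_clock_ghz:.1f} GHz "
      f"({implied_clock_thz:.3f} THz)")
```

Printing the result in both GHz and THz makes the unit prefix easy to read off.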
Re:Say what?!? (Score:5, Informative)
(Last Journal: Tuesday May 04 2004, @09:18PM)
The Cell processor is not running at 200GHz. There's this concept called 'parallelisation', it's how your graphics card can do dozens, if not hundreds, of operations per clock cycle. In Cell's case it can do 8 (number of SPUs) * 4 (128-bit registers, SIMD) * 2 (units) = 64 SP FLOPS per clock cycle, and that's not including the PPU which has VMX128 and an FPU itself.
However make the Cell processor calculate IEEE conformant FLOPS, and it gets a double precision score of around 20GFLOPS. Still good though.
The above was from memory, details may vary, figures are roughly correct, YMMV, etc.
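The per-cycle count above is just a product of three factors, which a short Python sketch makes explicit. The three factors are the ones given in the comment (quoted from memory by its author); the clock rate passed in at the end is an assumption for illustration and is not stated anywhere in this thread.

```python
# Factors as given in the comment above.
SPUS = 8          # number of SPUs
SIMD_LANES = 4    # single-precision values per 128-bit register
UNITS = 2         # the "* 2 (units)" factor from the comment

flops_per_cycle = SPUS * SIMD_LANES * UNITS


def peak_gflops(clock_ghz: float) -> float:
    """Theoretical single-precision peak for a given clock, PPU excluded."""
    return flops_per_cycle * clock_ghz


# ASSUMPTION: 3.2 GHz is an illustrative clock, not a figure from the thread.
assumed_clock_ghz = 3.2
print(flops_per_cycle, "SP FLOPs per cycle")
print(f"{peak_gflops(assumed_clock_ghz):.1f} GFLOPS at {assumed_clock_ghz} GHz")
```

This is a theoretical peak only; it says nothing about sustained throughput or the double-precision figure mentioned above.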
Re:Say what?!? (Score:4, Informative)
(http://www.hollinger.net/ | Last Journal: Thursday April 14 2005, @03:43PM)
Quoting another link you can see how they reached these numbers (which I take issue with):
- http://mdgrape.gsc.riken.jp/modules/tinyd0/index.
With that answered, I'm confused. Another poster sent along that link which explains what Riken will do. I'm confused about that actually. Reading the page, based on the verb usage, either someone didn't understand future and past tense (possible, but unlikely), or they haven't built the entire box yet. Perhaps I'm reading a bit too much into it... it's quite possible that someone simply hasn't updated the website.
Based on the webpage, all of the calculations to reach 1 petaflop are based on theoretical peak performance measurements, extrapolated from the theoretical peak of a single special-purpose ASIC which has been built, but may or may not have been actually placed into a fully configured system. Nothing talks about measured benchmarks, and the OP's article contains the same theoretical extrapolated numbers.
Anyone know if they've actually built it?
~ Mike
giga not tera (Score:4, Insightful)
Furthermore, many processor architectures have instructions that do several basic floating point operations in one step. For instance, PowerPC has a one-cycle multiply-accumulate instruction (multiply and add in one step), so for marketing purposes, a PowerPC has twice the flops. Now, imagine if you have a vector processor that has a highly-optimized instruction for taking square roots or doing trig in one cycle. A square root operation will translate into dozens of basic flops (add, multiply, subtract). Such a processor might therefore be rated at 208 gigaflops even though its operating frequency is <1GHz.
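To put a number on that last point, here is a rough Python sketch of how many basic flops would have to be counted per clock cycle to reach a 208 gigaflops rating. The clock rates in the loop are hypothetical; the comment only says the frequency is below 1 GHz.

```python
def ops_per_cycle_needed(rated_gflops: float, clock_ghz: float) -> float:
    """Basic flops that must be counted each cycle to reach the rating."""
    return rated_gflops / clock_ghz


RATED_GFLOPS = 208  # per-chip figure discussed in this thread

# ASSUMPTION: these clock rates are illustrative, not taken from the article.
for clock_ghz in (0.25, 0.5, 1.0):
    needed = ops_per_cycle_needed(RATED_GFLOPS, clock_ghz)
    print(f"{clock_ghz:.2f} GHz -> {needed:.0f} counted flops per cycle")
```

The count per cycle can come from parallel pipelines, from wide instructions, or from crediting one complex operation as many basic flops, as described above.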
Petaflop? (Score:2)
9 million? (Score:4, Insightful)
Our penis so small, your american penis so large.. (Score:3, Insightful)
(http://libtom.org/)
Where are the really neato results we should be getting from these? I'm tired of "Country X builds massive TeraWatt computer system." I want to read about "Country X mapped the cancer genome" or some such.
Besides, these are relatively not impressive. Sure in the 50s, 60s, 70s, 80s we were maturing the technology. Inventing new technology, analyzing it, etc. Now it's more of the same. Huge budget, lots of space and infiniband connections...
Show me the MFlops/Watt rating of this? Are they improving it? Are we wasting less resources? The irony of this is they pollute by wasting tons of energy, all so we can predict global warming or whatever.
Tom
Re:Our penis so small, your american penis so larg (Score:5, Informative)
(http://trolltalk.com/ | Last Journal: Sunday November 11, @07:43PM)
"Show me the MFlops/Watt rating of this?"
No problemo!
The number of flops: (10 ^ 15) / 4808 = about 207,986,688,852 flops per chip, - from a previous poster.
The number of watts: 300,000 - from the manufacturers' site = 62 watts/chip
207,986,688,852 / 62 = about 3,354,624,000 flops (about 3,355 MFlops) / watt.
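The same ratio can be recomputed from the totals quoted in this comment. This is only a sketch: 10^15 is the theoretical peak discussed elsewhere in the thread, not a measured benchmark, and the variable names are invented for illustration. Dividing the totals directly avoids rounding the per-chip wattage first.

```python
# Totals as quoted in the comment above (theoretical peak, not measured).
PEAK_FLOPS = 10**15
CHIPS = 4808
TOTAL_WATTS = 300_000

watts_per_chip = TOTAL_WATTS / CHIPS
flops_per_chip = PEAK_FLOPS / CHIPS

# Efficiency for the whole machine; the chip count cancels out.
flops_per_watt = PEAK_FLOPS / TOTAL_WATTS
mflops_per_watt = flops_per_watt / 1e6

print(f"{watts_per_chip:.1f} W per chip")
print(f"{flops_per_chip:,.0f} flops per chip")
print(f"{mflops_per_watt:,.0f} MFlops per watt")
```

The printed per-chip and per-watt values can be compared line by line with the figures in the comment.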
Re:Our penis so small, your american penis so larg (Score:5, Insightful)
And if you're trolling, yeah, you got me, so congratulations.
4808 chips -- Alas, it is still bottlenecked by... (Score:2, Funny)
Vector Processing? (Score:1)
(http://slashdot.org/~NousCS | Last Journal: Saturday March 12 2005, @03:14PM)
What specialized hardware? I would really like to read a more technical article about this machine. I would guess that the Japanese focused on vector processing like they did in the design of the Earth-Simulator [wired.com].
The best supporting evidence I have for this conclusion is the comparison of Japan's last two supercomputers:
Sun Fire X64 Cluster [top500.org]
Earth-Simulator [top500.org]
Sun Fire has 10,368 processors with a Rmax(GFlops) of 38,180.
Earth-Simulator has 5,120 processors with a Rmax(GFlops) of 35,860.
That's about 51% fewer processors (49% as many) with 94% of the processor power*.
Here's the original article link:
http://www.businessweek.com/globalbiz/content/jul
*Only comparing one aspect of performance.
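The comparison above can be checked with a few lines of Python. The processor counts and Rmax values are the ones quoted in the comment; the dictionary layout and names are just for illustration, and, as the footnote says, this looks at one aspect of performance only.

```python
# Figures as quoted in the comment (Rmax in GFlops).
sun_fire = {"processors": 10_368, "rmax_gflops": 38_180}
earth_sim = {"processors": 5_120, "rmax_gflops": 35_860}

# Earth-Simulator relative to the Sun Fire cluster.
processor_ratio = earth_sim["processors"] / sun_fire["processors"]
rmax_ratio = earth_sim["rmax_gflops"] / sun_fire["rmax_gflops"]

# Rmax delivered per processor on each machine.
sun_fire_per_proc = sun_fire["rmax_gflops"] / sun_fire["processors"]
earth_sim_per_proc = earth_sim["rmax_gflops"] / earth_sim["processors"]

print(f"processors: {processor_ratio:.0%} as many")
print(f"Rmax:       {rmax_ratio:.0%} as much")
print(f"GFlops per processor: {sun_fire_per_proc:.2f} vs {earth_sim_per_proc:.2f}")
```

Per-processor Rmax is the figure that speaks most directly to the vector-processing guess.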
Weather Predictions Explained! (Score:1)
It all makes sense now. When they predict 90% chance of rain three days in a row and we don't see a drop, they really meant that it will rain sometime between now and thirty or forty years from now.
"Computer" ? (Score:2)
Oh just... (Score:1)
glxgears (Score:2, Funny)
Not even close! (Score:2, Insightful)
MadDog Grape is my favorite flavor too! (Score:2)
(http://www.slashdot.org/ | Last Journal: Wednesday December 20 2006, @03:29PM)
Does this deserve Top 500? (Score:2)
Personally, I don't think it should qualify. Otherwise the EFF's $250,000 Deep Crack, which could only crack DES (although faster than tens of thousands of regular computers at that time), would qualify too.
New Blue? (Score:2)
(http://slashdot.org/~Doc%20Ruby/journal | Last Journal: Thursday March 31 2005, @01:48PM)
Yes but........ (Score:1)
Not comparable (Score:3, Informative)
(http://zzz.zggg.com/)
Darn algorithms! (Score:2)
(http://www.mithral.com/~beberg/)
Not all progress needs to be brute force. But brute force is much more fun to brag about.
-
flops don't replace skill (Score:1)
(http://some.where.else/)
Experts believe that the nation with the most machines near the top of the ranking generally has the most competitive economy.
Oh come on - were these American experts by chance? How about flops/head? But let's think for a moment. Do raw flops count, or is it what you do with them? Once you have a big computer, it's easy to generate lots of numbers. The art of science, though, is to abstract your question, so you can make some useful predictions. Otherwise you might as well just measure the world that's out there, in all its complexity.
More tech specs (Score:1)
This is the future for supercomputing. (Score:1)
Let's do an order-of-magnitude computation here. A pair of general-purpose CPU cores uses about 100M transistors, not counting cache. An adder takes about 1000 transistors. So with the CPU's transistor budget you get 100,000 adders running in parallel. Overall, the performance difference would be about 1000x for the ASIC design over the general-purpose solution. Not counting cache is important, since you probably want the on-die storage for temporary values anyway, and cache's transistor density is far higher than logic's. And that's neither the best-case nor the worst-case scenario, but more or less what to expect as a general rule if you don't saturate the memory. If you do, you should add more or faster memory channels, or change to a less bandwidth-limited algorithm; you can still make trade-offs that no off-the-shelf CPU could reasonably make. Overall you still get at least a 10x performance increase over going with standard CPUs. So expect 10x to 1000x, even against code that runs EXTREMELY optimally on a general-purpose chip. Of course you CAN construct a case where a general-purpose computer beats the special-purpose one. But more often than not that case cannot use lots of processors, because once you can parallelize, the special-purpose design wins.
The problem with special purpose is that you cannot do everything, you can do one thing and that thing VERY WELL.
You just change the control logic to a logic solving the problem.
AI (Score:1)
With such great power and so few processors, this will cause other (but not all) computing technology to migrate in that direction.
I can see the average PC doing 15 teraflops within the next 5 years. This, if I am accurate, would put the home PC in the processing realm of the human brain. Is it possible that an AI which could pass the Turing test with near 100% of the subjects is not long behind? Humanoid robots and robotic transportation?
Should we put a "Three Laws Treaty" on the international table?
Better Anime? (Score:1)
Precision? (Score:2)
Comparison MDGrape-3, BlueGene/L & Earth Simul (Score:2, Informative)
(http://itnomad.wordpress.com/ | Last Journal: Monday August 25 2003, @02:21PM)
http://www.bloglines.com/blog/ITnomad?id=126 [bloglines.com]
Cheers, Alex.
It's the topology, sillypants! (Score:1)
It is theorized that a complex topology resembling a four-dimensional Hello Kitty will run roughly twenty times as fast.
~
Idiotic summary. (Score:3, Informative)
It does nothing but calculate 1/sqrt(dx^2+dy^2+dz^2)*variable, but really really often.
Grape 6, 5 years or so ago, was already running at 200 MHz, with a throughput of one force calculation per pipeline per clock cycle and 6 pipelines on one chip. So it counts as 1.2 billion force calculations per second, each being (1 inverse, 1 sqrt, 3 adds, 3 squares, 2 fmul, etc.).
A lot of flops, but totally useless as general purpose computers.
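As a rough illustration of the two points above, here is a Python sketch of the expression exactly as the commenter wrote it, plus the throughput arithmetic for Grape 6. The function and variable names are hypothetical, and this is not a description of the chip's actual datapath.

```python
import math


def pairwise_term(dx: float, dy: float, dz: float, variable: float) -> float:
    """Evaluate 1/sqrt(dx^2 + dy^2 + dz^2) * variable for one particle pair."""
    r_squared = dx * dx + dy * dy + dz * dz   # 3 squares, 2 adds
    return variable / math.sqrt(r_squared)    # 1 sqrt, 1 inverse, 1 multiply


# Throughput as described in the comment: one result per pipeline per clock.
CLOCK_HZ = 200_000_000   # 200 MHz
PIPELINES = 6            # pipelines per chip
interactions_per_second = CLOCK_HZ * PIPELINES

print(pairwise_term(3.0, 4.0, 0.0, 10.0))
print(f"{interactions_per_second:,} force calculations per second per chip")
```

Multiplying the per-second count by the number of basic operations credited to each calculation is how a modest clock turns into a large flops rating.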
Singularity (Score:2)
(http://www.pontifier.com/)
Google's facility qualify as a super computer? (Score:1)
It did not cost $9m to develop. (Score:2)
(http://sc.tri-bit.com/ | Last Journal: Sunday July 08, @02:36AM)
In short, Riken had almost nothing to do with the process, except for the design of the single custom chip involved, and even then, most of the work was done by outside firms who wanted the press. And even then, it still cost the host organization $9 million!
It is petaflops not petaflop (Score:1)
From the article.. (Score:1)
No, but they should be worried when a 'technology magazine' sees the need to explain that 298 is a larger number than 250.. Yes, this might be shocking, but after you subtract 298 from 500, you are only left with 202. And no, 202 is not larger than 298, even if you take the whole of it. So, yes, if you have 298 apples of a total of 500, no one will be able to have more than you. Next, we will have a closer look at the letter 'G'.
cluster (Score:1)
(http://mattbrundage.com/)
Re:Yeah (Score:3, Funny)
(http://www.modthemovies.com/ | Last Journal: Saturday October 27, @11:59PM)
Y'know, I have a feeling I should really post this as anonymous coward.
Re:Imagine... (Score:5, Funny)
(Last Journal: Tuesday May 16 2006, @08:46PM)
With a side order of hot grits!
A tip: if you can fit your message in the subject line, then do it, particularly when you
I remember back when that comment would have gotten +5 "Whoa duuuuude" mods.
Yet you can still get good mods if you say:
"A petaflop that fits in a closet for just $9M for the first one? You could make more for a couple million, at least by the time you got your [impressive knowlegeable-sounding ultra-tech adjectives] cluster interconnect together - why not spend a quarter of a billion and push the limits of computing out another couple orders of magnitude? This thing can do protein folding, so it can likely do bomb physics and a bunch of other big-money problems that can be represented in similar math."
Which translates to:
"Imagine a Beowulf cluster of these!"
Re:bullshit alert!! (Score:1)
(http://www.osgeek.blogspot.com/)
Re:bullshit alert!! (Score:2)
(http://theravensnest.org/ | Last Journal: Sunday October 07, @07:05AM)
Firstly, it has been a long time since processors only managed to do one instruction per clock. Modern chips do about 8. That alone means that 200GFLOPS equates to about 25GHz.
Next, you get SIMD instructions. This lets a single instruction work on multiple data elements in parallel. Most modern CPUs have 4-way SIMD, but 8-way is not unheard of. With 8-way, this brings it down to 3.125GHz.
Now, factor in the fact that you can get 2-4 cores in a single chip. This brings it to between 800MHz and 1.5 GHz. This is hardly a spectacular clock speed. If the chip is optimised for a particular operation (as these are) then it is hardly beyond the realms of possibility. Oh, and by the way, the NVIDIA 7800 GTX gets 200GFLOPS, so it's not even that unusual.
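The chain of divisions above can be written out as a small Python sketch. All factors are the round numbers from this comment; the 3.125 GHz step corresponds to the 8-way SIMD case, and the names are illustrative only.

```python
TARGET_GFLOPS = 200        # per-chip figure being explained
INSTR_PER_CLOCK = 8        # "Modern chips do about 8"
SIMD_WIDTH = 8             # 8-way SIMD, the case that gives 3.125 GHz

# Clock needed if every instruction did a single flop.
after_ipc_ghz = TARGET_GFLOPS / INSTR_PER_CLOCK

# Clock needed once each instruction works on SIMD_WIDTH elements.
after_simd_ghz = after_ipc_ghz / SIMD_WIDTH

# Clock needed per core when the work is spread over 2 or 4 cores.
per_core_ghz = {cores: after_simd_ghz / cores for cores in (2, 4)}

print(f"after multiple issue: {after_ipc_ghz} GHz")
print(f"after SIMD:           {after_simd_ghz} GHz")
for cores, ghz in per_core_ghz.items():
    print(f"with {cores} cores:        {ghz} GHz")
```

With 4-way SIMD instead, every figure after the first simply doubles.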
Re:Imagine... (Score:2)
(http://www.faqs.org/rfcs/rfc3675.html)