
Comment Cores per memory controller. (Score 1) 251

I'd love to see each core on a massively multicore design get its own memory controller. I'm not holding my breath, however. Think of a 32-core CPU: it's pretty unlikely that most supercomputer or cluster vendors are going to pay for 32 DIMMs per CPU socket. So then you're talking about multiple memory channels per memory stick. You can still get ECC using 5 memory chips per channel, so you can imagine 4 channels fitting on a memory riser; Cray does this on the X2. Then 32 channels would only require 8 DIMMs, which is reasonable. But what do you do for a 64-core CPU?
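
To put rough numbers on the riser idea (the 4-channels-per-stick packing is my reading of the X2 layout, not an official spec), here's a quick sketch:

    # Back-of-envelope: DIMMs needed if every core gets its own memory channel,
    # assuming 4 channels can be packed onto one riser/DIMM (an X2-style layout).
    def dimms_needed(cores, channels_per_dimm=4):
        channels = cores                             # one dedicated channel per core
        return -(-channels // channels_per_dimm)     # round up to whole DIMMs

    for cores in (32, 64):
        print(cores, "cores ->", dimms_needed(cores), "DIMMs per socket")
    # 32 cores ->  8 DIMMs per socket (reasonable)
    # 64 cores -> 16 DIMMs per socket (starting to hurt)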

It's tricky, and the problem for the market is that it's expensive. Can you get the commodity CPU vendors interested in such a thing, given that most of their addressable market is not in the supercomputing space?

I think we're going to see more cores in a CPU than there's bandwidth to feed. They might increase the bandwidth a bit, but probably just enough to get good Linpack numbers.

Comment Vectors yes, but the bandwidth to use them? (Score 1) 251

Vector processing in commodity designs isn't enough. Of course we're going to see it; at this point it's not very expensive to add. Adding vector processing for increased flops is easy. The hard part is the bandwidth. One of the reasons the X1 processors were expensive was that they were custom, but so are the network chips in commodity-CPU supers, and those only add about $1000/node. The real cost of X1-style memory is that you have 64 channels of memory, which is a lot of wires, DIMMs, memory parts, etc. There's a very real cost to all the memory components needed to get the kind of bandwidth it takes to keep a high-throughput vector pipeline fed.
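
To make the component count concrete, a rough sketch (the per-channel rate is an assumed DDR2-800-class figure, and 5 chips per channel is just my assumption for ECC coverage; neither is a quoted Cray spec):

    # Rough aggregate bandwidth and parts count for a 64-channel memory system.
    # Assumptions: ~6.4 GB/s per channel (DDR2-800 class), 5 chips per channel
    # to cover ECC. Not vendor specs; the point is how the parts add up.
    channels = 64
    gb_per_s_per_channel = 6.4
    chips_per_channel = 5

    print("aggregate bandwidth:", channels * gb_per_s_per_channel, "GB/s")  # ~410 GB/s
    print("memory chips per node:", channels * chips_per_channel)           # 320 parts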

The commodity processor vendors aren't going to do this sort of thing, as it adds to the cost of the chip but provides nothing to the bulk of their customers, who are running MySQL, Apache, or Half-Life.

The one hope I have is something like the Core 2 architecture, where DDR3 is used for the desktop processors and FB-DIMM is used for the server parts. The two share a lot of the architecture, and only a few of the ASIC cells are different. If a CPU vendor were interested in the HPC market, they could design a CPU that uses a standard memory channel for desktop/low-end server parts, and something more expensive but higher bandwidth for the HPC space. It would mean HPC-specific processors, but ones sharing most of the engineering with the commodity part. Maybe Cray could license them the design of the Weaver memory controller in the X2. It's kind of like the AMB on an FB-DIMM, but it puts 4 channels of DDR2 on each stick of memory.

Comment Re:Time for vector processing again (Score 1) 251

The problem is that no idea doubles the rate at which supercomputers advance. Most of the ideas out there jump forward, but they do it once. Vectors, streams, reconfigurable computing: each of these buzzwords was once the next big thing in supercomputing. Today everyone is talking about GPGPUs. None of them go very far. How much engineering goes into the systems? How long does it take to get to market? How difficult is it to rewrite all the algorithms to take advantage of the new machine? What proportion of the codes see a real advantage on the new machine? Can your company stay afloat long enough to reap the rewards? (Remember that supercomputing is a tiny niche market compared to computing in general.)

I've seen a lot of "game changing ideas" come along in the supercomputing world. Commodity computing is the only one left.

Comment Re:Time for vector processing again (Score 1) 251

Back in the 90s, there were custom supercomputer processors (both vector and scalar) that were faster than desktop processors for all supercomputing tasks. This hit a wall as the desktop processors became faster than the custom processors, at least for some tasks. If you can get a processor that's faster for some tasks and slower for others, but costs 1/10th the price of the other, you're probably going to go with the cheap one. The world has petaflop computers because of the move to commodity parts. No one could afford to build 160,000-processor systems out of Y-MP processors.

BTW, multi-core is pretty terrible for desktop applications. It really excels at server transaction processing, but most desktop users have no use for more than 2 cores. A radical shift in programming is going to be needed before massively multi-core processors are any use to a desktop user.

Comment There are still vector processors out there. (Score 2, Insightful) 251

NEC still makes the SX-9 vector system, and Cray still sells X2 blades that can be installed in their XT5 super. So vector processors are available; they just aren't very popular, mostly due to cost per flop.

A vector processor implements an instruction set that is slightly better than a scalar processor at doing math, considerably worse than a scalar processor at branch-heavy code, but orders of magnitude better in terms of memory bandwidth. The X2, for example, has four 25-gigaflop cores per node, which share 64 channels of DDR2 memory. Compare that to the newest Xeons, where six 12-gigaflop cores share 3 channels of DDR3 memory. While the vector instruction set is well suited to using this memory bandwidth, a massively multi-core scalar processor could also make use of a 64-channel memory controller.
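
To put that in bytes-per-flop terms (the per-channel rates are my assumptions, roughly DDR2-800 and DDR3-1333; the flop and channel counts are the ones above):

    # Bytes per flop implied by the numbers above. Per-channel rates are
    # assumptions (DDR2-800 ~6.4 GB/s, DDR3-1333 ~10.7 GB/s), not vendor specs.
    def bytes_per_flop(cores, gflops_per_core, channels, gb_s_per_channel):
        bandwidth = channels * gb_s_per_channel      # GB/s
        peak = cores * gflops_per_core               # GFLOP/s
        return bandwidth / peak

    print("X2 node:     %.2f bytes/flop" % bytes_per_flop(4, 25, 64, 6.4))   # ~4.1
    print("Xeon socket: %.2f bytes/flop" % bytes_per_flop(6, 12, 3, 10.7))   # ~0.45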

The problem is money. These multicore processors are coming from the server industry. Web-hosting, database, and middleware-crunching jobs tend to be very cache-friendly. Occasionally they benefit from more bandwidth to real memory, but usually they just want a larger L3 cache. Cache is much less useful to supercomputing tasks, which have really large data sets. The server-processor makers aren't going to add a 64-channel memory controller to server processors; it wouldn't do any good for their primary market, and it would cost a lot.

Of course, you could just buy real vector processors, right? Not exactly. Many supercomputing tasks work acceptably on quad-core processors with 2 memory channels. It's not ideal, but they get along. This has put a lot of negative market pressure on the vector machines, and they are dying away again. It's not clear whether Cray will make a successor to the X2, and NEC has priced itself into a tiny niche in weather forecasting that, purely for price reasons, is unapproachable by other supercomputer users.

Comment Re:Attempt to sensationalize? (Score 1) 138

Right you are. The contracts for these machines were signed a couple of years ago. They might have sped things up in order to get on the Top500 list, but they didn't add hardware just for a little showmanship. These labs can afford to put out a bunch of press releases related to the Top500, but they don't care enough about it to spend many millions of dollars.

The list reflects the computers; the computers don't exist for the sake of the list.

Comment Not New. (Score 1) 366

I think it's amazing power workstations lasted as long as they did. SGI quit the biz years ago, DEC is lost in the annals of history, and Sun sells workstations, but they are just PCs. The workstation market is gone, history, no more. For almost 10 years now, an Intel/Linux box with a good gamer graphics card has been outrunning dedicated workstations costing ten times as much.

Comment Re:Looks like Cray jumped the gun... (Score 1) 138

The press releases from Oak Ridge and Cray (unlike the summary on Slashdot) were careful to claim Jaguar as the fastest computer for "open science". They were, no doubt, aware that Los Alamos might have bought more hardware since June.

The machine at Los Alamos is used for classified Department of Energy projects, probably simulating nuclear warhead functionality on the aging pile of B83s and B61s sitting in the US arsenal.

The machine at Oak Ridge gets used for unclassified research on a variety of topics that ends up in peer-reviewed journals: climate models, fusion energy research, protein synthesis models, cosmology. I'm sure there's a little competition between the labs, but they really have different missions, so I bet they don't put too much stock in it.

Comment Re:Looks like Cray jumped the gun... (Score 1) 138

I suspect Cray is much more interested in getting paid to build really complex supercomputers. Supercomputer vendors don't compete in the Top500; supercomputer buyers do. Cray could build a 2-petaflop computer, or 5, or 20, if a customer came along with a large enough check. So could IBM, or HP, or SGI. It's really up to the customer.

That said, customers are only marginally interested in getting to the top of the Top500. Those computers have a job to do, and that job isn't getting to the top of some artificial benchmark that isn't all that representative of the real jobs run by the users.

IBM's Roadrunner has a lot fewer nodes, but I'm not sure I'd say it has fewer processors. Each node has 2 Opterons and 4 Cell processors. Do you program one MPI rank on the node, or on an Opteron core plus a Cell, or on each Cell SPU? If you're using pure MPI programming, that's really 36 processors per node. If you use hybrid MPI/OpenMP, then you greatly reduce the number of MPI ranks. The hybrid approach requires a smaller order of MPI parallelism, but a much higher order of thread parallelism and vector parallelism.

There are some codes that are really going to fly on the Cell processors of Roadrunner, and some that will crawl. Los Alamos obviously decided that the codes they care about are likely to run well on that machine. Oak Ridge, when taking bids for Jaguar, decided that more traditional processors with a lot of memory capacity and interconnect bandwidth were better suited to their codes. I'm sure that neither lab used Linpack to decide which machine to build.
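
For what it's worth, here's how the 36-per-node figure works out, assuming dual-core Opterons and 8 SPUs per Cell (my assumptions; that's the only way I can make the arithmetic come out):

    # Rank counting for a Roadrunner-style node: 2 Opterons + 4 Cells.
    # Assumes dual-core Opterons and 8 SPUs per Cell.
    opterons, cores_per_opteron = 2, 2
    cells, spus_per_cell = 4, 8

    pure_mpi = opterons * cores_per_opteron + cells * spus_per_cell
    print("pure MPI ranks per node:", pure_mpi)    # 36

    # Hybrid MPI/OpenMP: e.g. one rank per Opteron core, each driving one Cell,
    # with the SPUs handled by threads and vector code inside the rank.
    hybrid = opterons * cores_per_opteron
    print("hybrid ranks per node:", hybrid)        # 4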

Comment Re:Folding@Home Contribution? (Score 1) 154

There's no reason whatsoever to use a highly connected, high-bandwidth HPC machine like Jaguar for distributed-computing jobs. There are other very worthy jobs that can be run on such a system but can't be run on a pile of desktops all over the internet. Use the real supercomputers for real supercomputer jobs; there are plenty of idle Xboxes in the world for distributed computing.
