Comment Re:Low Power Supercomputer (Score 1) 109

That said, there are a lot of tasks that do parallelize well. There's a large market for machines with >5k cores, often with a significant share of the jobs running on >1k cores. The big HPC sites (weather, energy, defense, research, seismic) have spent the last three decades developing parallel algorithms to solve their problems; first with shared-memory parallelism, but massively parallel has been the name of the game for at least 15 years.

Just because your algorithm doesn't scale doesn't mean there is no market for parallel machines. Cray, HP, and IBM seem to be making a lot of money selling them. Sicortex just couldn't make their architecture awesome enough to take sales away from the entrenched players.
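
To put a rough number on "doesn't scale": here's a back-of-the-envelope Amdahl's law sketch in Python. The serial fractions are made-up illustrative values, not measurements of any real code.

    # Amdahl's law: speedup on p processors when a fraction s of the work stays serial.
    def speedup(p, s):
        return 1.0 / (s + (1.0 - s) / p)

    print(speedup(5000, 0.10))    # ~10x   -- a 10% serial code wastes a 5000-core machine
    print(speedup(5000, 0.0001))  # ~3300x -- a well-tuned HPC code still gets real mileage

A code with even a modest serial fraction tops out quickly, while the weather/energy/defense codes mentioned above have spent decades driving that fraction down, which is why the >1k-core market exists.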

Sicortex isn't the only vendor to fail in the HPC space. With or without a low-power architecture, it's a hard market to make a lot of money in. It's an easy market to get into, so a lot of people try, but it's not easy to stay profitable, and the investors wanted to lower their risks.

Comment Re:Lesson learned (Score 1) 109

Looking at the top 100 is pretty misleading, however. The TAM for a low-end cluster is still several times larger than the market for massive supers. A very small number of customers can adapt to weird architectures; everyone else uses x86 + Linux. Also, just about everything non-x86 has failed to gain much market, apart from IBM, and IBM manages to keep going by sharing development costs with non-HPC products: Cell is a video game processor, Power6 is the foundation of their database servers, and Blue Gene is a close derivative of their embedded-systems IP.

I'd call the high end of the market a duopoly of IBM and x86 (mostly Intel; AMD mostly because of Cray). The mid-range and low end: all x86.

Comment Re:Lesson learned (Score 1) 109

I'd like to agree with this one. The bulk of the market is in the low end, but the low end is going to be reluctant to embrace anything unusual. Sicortex uses MIPS processors, which means you can't run your off-the-shelf applications. Even if a rack of Intel blades uses more power and takes up more space, a low-end cluster still isn't that large or that power-hungry; you're not talking about building a whole new building to house the machine.

The high end, where custom applications rule, is more likely to embrace a custom architecture; Cray vector, IBM Power, and Itanium still play in this arena. However, the largest Sicortex machine really can't play in the big leagues: 5,000 low-power MIPS processors is a pretty modest cluster, even if the network is good. The big leagues also mean you're dealing with the established HPC customers, who are very demanding on the software and service front.

The low end has a lot of market, but the competition is fierce and the margins small. The high end requires a lot more infrastructure than an 80-person company can provide. In all cases, developing a new processor is very expensive. Intel and AMD spend billions of dollars designing each generation of chips, and they have the tools to build them with full-custom logic instead of ASIC designs. Once Sicortex invests all that money in designing the processors, they still have to build a machine around them, then build a software stack and a service organization, and then sell the thing into a competitive marketplace.

Tough row to hoe.

The low end is a larger market.

Comment Numalink vs. infiniband. (Score 1) 159

If you look at the history of InfiniBand, it was always intended to be something like NUMAlink + XIO. Originally you were supposed to connect peripherals, storage, and processor nodes onto this big network and add and remove them all dynamically. It got scaled down from that, and now it's pretty much used as a high-speed network, with the occasional RAID attached directly to it. NUMAlink can be used that way too; one does not need to make a single NUMA domain out of an Altix.

The NUMAlink chip also has the extended cache-directory logic in it, which is what allows large NUMA machines. Importantly, "large" here means large on the scale of NUMA database servers but rather small on the scale of supercomputers. Even SGI has to fall back to InfiniBand for the really large machines, such as the two big systems at NASA. It's not as feature-rich as NUMAlink, but it'll scale to tens of thousands of nodes, sort of affordably. I should note that there's no reason the cache-directory chips couldn't talk to one another over an InfiniBand network. No one has built that chip, but the network can be an independent piece.

I agree that SGI has long had great technology and useful products. (I reserve the term "great products", as theirs have tended to have great strengths coupled with great weaknesses.) But I would not say that their products have been successful. If they had been, SGI wouldn't have been circling the bowl for the last ten years. SGI learned how to make a lot of money when they were at the top of a growing market. They never learned how to make money in a shrinking market, or how to transition to a profitable spot in a different market.

Comment I think you are mistaken about lustre. (Score 1) 159

Actually, I know you are mistaken about Lustre. Lustre is a regular kernel filesystem just like CXFS, StorNext, GFS, or GPFS. In the case of Puma or Blue Gene you have to link it into the application, but not on Linux. The point, however, remains: SGI has used CXFS to sell its hardware, which was awesome at the time, but it limits Rackable's ability to make a business out of selling CXFS as a stand-alone product. Ideally you would want to sell CXFS licenses on every commodity cluster out there, your own or the competition's. Sun has that now with Lustre: it runs on IBM clusters, HP clusters, even SGI clusters. I doubt Rackable could turn CXFS into that kind of product and displace Lustre from very many machines.

Comment Innovator's dilemma. (Score 1) 159

SGI has long suffered from the classic innovator's dilemma. They invented really cool graphics technology in the '80s and early '90s. It performed very well, but they charged an enormous amount of money for it. Then along came the first generation of 3D graphics cards for PCs. At that point, SGI had the option of putting out a top-notch PC graphics card. They could have become a dominant player in that market; some business unit at SGI would now sit in the place occupied by Nvidia. That's hard to do, as the graphics card division would have undercut sales from the workstation division, but it's better to be undercut by an internal competitor than by an external one.

Comment Re:"little cooler than an SGI workstation..." (Score 4, Informative) 159

This is why SGI finally fell apart; you guys are all talking about SGI workstations. SGI hasn't been in the workstation business for years. There hasn't been a workstation business for years. HP, IBM, and Sun sell workstations, but they are just rebranded PCs. DEC, DG, EnS, Intergraph, Apollo: all defunct.

Lately SGI has been selling low-end HPC clusters and a few mid-range Altix machines (and one really big one at NASA). The HPC business is a really difficult place to make money, and SGI has never been good at keeping their operating costs down. Compared to their competition, they always seemed to employ a lot of people and have a lot of irons in the fire, most of which never panned out.

SGI has always loved to engineer their way around problems. In a mature market you make money by engineering a solution to a problem and then licensing it out to the rest of the world until it becomes an industry standard. NUMAlink could have been what InfiniBand is now. InfiniteReality could have been what GeForce is now. CXFS could have been what Lustre is. XIO could have been PCIe. But SGI wanted to control it all; they tried to keep everything under the tent.

Comment Re:It's real (Score 1) 159

I'm sure they're not picking up the debt; Rackable doesn't have the assets to pick up that debt. They are picking up the company for essentially nothing, but SGI has lost money every quarter for years, so they can expect to take on those losses for at least a couple of quarters. They won't owe the creditors, but they still have to pay some sort of severance to all the people they let go, and they have to figure out how to turn SGI's customer list into new Rackable business.

There are valuable parts to SGI. The CXFS/DMF business is valuable all by itself. NUMAlink is a good technology, but Rackable isn't really in a position to productize it in a profitable way, and I'm not sure who they could sell or license it to either. Apart from that, they are buying the customer list, the sales team, and a few government contracts.

Comment Cores per memory controller. (Score 1) 251

I'd love to see each core on a massively multicore design get its own memory controller, but I'm not holding my breath. Think of a 32-core CPU: it's pretty unlikely that most supercomputer or cluster vendors are going to pay for 32 DIMMs per CPU socket. So then you're talking about multiple memory channels per memory stick. You can still get ECC using 5 memory chips per channel, so you can imagine 4 channels fitting on a memory riser; Cray does this on the X2. Then 32 channels would only require 8 DIMMs, which is reasonable. But then what do you do for 64-core CPUs?
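
Just to make that riser arithmetic concrete, here's a trivial sketch (Python; one channel per core and 4 channels per riser are the assumptions from this comment, not vendor specs):

    # One memory channel per core, packed onto risers that carry 4 channels each.
    def risers_needed(cores, channels_per_core=1, channels_per_riser=4):
        return cores * channels_per_core / channels_per_riser

    print(risers_needed(32))  # 8 risers per socket -- still reasonable
    print(risers_needed(64))  # 16 risers per socket -- hard to justify on cost and board space

And with 5 memory chips per channel for ECC, the 64-core case is already 64 * 5 = 320 memory parts hanging off one socket.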

It's tricky, and the problem for the market is that it's expensive. Can you get the commodity CPU vendors interested in such a thing, given that most of their addressable market is not in the supercomputing space?

I think we're gonna see more cores in a CPU than there's bandwidth to use. They might increase the bandwidth a bit, but probably just enough to get good Linpack numbers.

Comment Vectors yes, but the bandwidth to use them? (Score 1) 251

Vector processing in commodity designs isn't enough. Of course we are going to see it; at this point it's not very expensive to add. Adding vector units for more flops is easy; the hard part is the bandwidth. One of the reasons the X1 processors were expensive was that they were custom, but so are the network chips in commodity-CPU supers, and those only add about $1000/node. The real cost of X1-style memory is that you have 64 channels of memory, which is a lot of wires, DIMMs, memory parts, etc. There's a very real cost to all the memory components needed to get the kind of bandwidth required to support a high-throughput vector pipeline.

The commodity processor vendors aren't going to do this sort of thing, as it adds to the cost of the chip but provides nothing to the bulk of their customers, who are running MySQL, Apache, or Half-Life.

The one hope I have is something like the Core 2 architecture, where DDR3 is used for desktop processors and FB-DIMM is used for server parts. The two parts share a lot of the design, and only a few of the ASIC cells are different. If a CPU vendor were interested in the HPC market, they could design a CPU to use a standard memory channel for desktop and low-end server parts, and something more expensive but higher-bandwidth for the HPC space. It would mean HPC-specific processors, but ones sharing most of the engineering with the commodity part. Maybe Cray could license them the design for the Weaver memory controller in the X2; it's kind of like the AMB on an FB-DIMM, but it includes 4 channels of DDR2 on each stick of memory.

Comment Re:Time for vector processing again (Score 1) 251

The problem is that no single idea doubles the rate at which supercomputers advance. Most of the ideas out there jump forward, but they do it once. Vectors, streams, reconfigurable computing: all of these buzzwords were once the next big thing in supercomputing, and today everyone is talking about GPGPUs. None of them go very far. How much engineering goes into the systems? How long does it take to get to market? How difficult is it to rewrite all the algorithms to take advantage of the new machine? What proportion of the codes see a real advantage on the new machine? Can your company stay afloat long enough to reap the rewards? (Remember that supercomputing is a tiny niche market compared to computing in general.)

I've seen a lot of "game changing ideas" come along in the supercomputing world. Commodity computing is the only one left.

Comment Re:Time for vector processing again (Score 1) 251

Back in the '90s, there were custom supercomputer processors (both vector and scalar) that were faster than desktop processors for all supercomputing tasks. This hit a wall as desktop processors became faster than the custom processors, at least for some tasks. If you can get a processor that's faster for some tasks and slower for others, but costs 1/10th the price of the other, you're probably going to go with the cheap one. The world has petaflop computers because of the move to commodity parts; no one could afford to build a 160,000-processor system out of YMP processors.

BTW, multi-core processors are pretty terrible for desktop applications. They really excel at server transaction processing, but most desktop users don't have any use for more than 2 cores. A radical shift in programming is going to be needed before massively multi-core processors are of any use to a desktop user.

Comment There are still vector processors out there. (Score 2, Insightful) 251

NEC still makes the SX-9 vector system, and Cray still sells X2 blades that can be installed in their XT5 super. So vector processors are available; they just aren't very popular, mostly due to cost per flop.

A vector processor implements an instruction set that is slightly better than a scalar processor at doing math, considerably worse at branch-heavy code, but orders of magnitude better in terms of memory bandwidth. The X2, for example, has four 25-GFLOP cores per node sharing 64 channels of DDR2 memory. Compare that to the newest Xeons, where six 12-GFLOP cores share 3 channels of DDR3 memory. While the vector instruction set is well suited to using this memory bandwidth, a massively multi-core scalar processor could also make use of a 64-channel memory controller.
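
To put rough numbers on that bandwidth gap, here's a quick bytes-per-flop comparison using the core and channel counts above. The per-channel bandwidths are my assumptions (roughly DDR2-800 and DDR3-1333), so treat the output as ballpark only:

    # Bytes of memory bandwidth available per flop of peak compute.
    def bytes_per_flop(cores, gflops_per_core, channels, gbps_per_channel):
        bandwidth_gbs = channels * gbps_per_channel   # GB/s shared by all the cores
        peak_gflops = cores * gflops_per_core
        return bandwidth_gbs / peak_gflops

    print(bytes_per_flop(4, 25, 64, 6.4))   # Cray X2 node: ~4 bytes/flop
    print(bytes_per_flop(6, 12, 3, 10.7))   # Xeon socket:  ~0.45 bytes/flop

That gap in bytes per flop is what lets a vector machine keep its pipelines fed on large, cache-unfriendly data sets, and it's exactly the part that costs all the money.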

The problem is money. These multicore processors are coming from the server industry, and web-hosting, database-serving, and middleware-crunching jobs tend to be very cache-friendly. Occasionally they benefit from more bandwidth to real memory, but usually they just want a larger L3 cache. Cache is much less useful to supercomputing tasks, which have really large data sets. The server-processor makers aren't going to add a 64-channel memory controller to their parts; it wouldn't do any good for their primary market, and it would cost a lot.

Of course, you could just buy real vector processors, right? Not exactly. Many supercomputing tasks work acceptably on quad-core processors with 2 memory channels; it's not ideal, but they get along. This has put a lot of negative market pressure on the vector machines, and they are dying away again. It's not clear whether Cray will make a successor to the X2, and NEC has priced itself into a tiny niche market in weather forecasting that other supercomputer users simply can't afford to approach.
