flaming-opus - Slashdot User

Comment Re:Completely different contract/machine/goals (Score 1) 99

by flaming-opus on Tuesday November 15, 2011 @06:36PM (#38066870) Attached to: Cray Replaces IBM To Build $188M Supercomputer

The bulk of the new blue waters will be pure opteron nodes, with only 35 of the 270ish cabinets using GPUs. They obviously are assuming that most users will be running with only x86 cores. They ordered a few dozen cabs of GPUs, probably with an eye to where the industry will be heading over the lifetime of the machine, not where the users are today.

It's true that interlagos cores are a poor competitor to power7 core to core. However, they fare much better if you use the entire module. Think of interlagos as an 8-core processor with 2 threads per core, and all of a sudden it looks a lot better. Power7 is probably still better, but at ten times the cost.

Furthermore, just because a single node of power7 is an awesome node, does not mean that a many-thousand node supercomputer, composed of those nodes, is also awesome. If the IBM blue waters machine were just about the processors, they would have clustered together a bunch of bladecenter704s. They would not have bothered to bid the p775 system. If you want really fat SMP nodes, then they would have bid a bunch of p795s. Obviously they tried to make a really high-bandwidth shared-memory interconnect for the p775, and they failed. Either it didn't work reliably, wasn't fast enough, or cost too much. IBM didn't step away from the deal because the clock speed was 10% low, or because their costs rose enough to make their margin slim. You don't screw a customer in such a high-profile way unless you're going to lose a LOT of money on the deal.

I notice, looking at the top500 list, that IBM managed to sell the 10 cabinets of p775 that they built for NCSA. A few weather forecasting sites in canada and the UK are running 5 systems at 2 cabinets each. Tellingly, a couple of those customers have a pair of 2 cab systems, rather than a single 4 cab system. That tells me that the interconnect isn't scaling to larger numbers of nodes. A problem for a really big system like blue waters.

Comment Re:Aha. Bulldozer sucks my ass. (Score 1, Informative) 99

by flaming-opus on Monday November 14, 2011 @12:07PM (#38048948) Attached to: Cray Replaces IBM To Build $188M Supercomputer

It is true that it costs a lot to switch processors, but let's remember that HPC systems are also very price sensitive. Blue waters will have more than 50,000 processor sockets in it. Xeon processors may be better than opterons, but they also cost a LOT more. Multiply that by 50,000. In the benchmarks I've seen, 16 core opterons almost compete with 8 core xeons in raw performance, but blow the xeons away in price/performance.
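To put rough numbers on that, here's a back-of-the-envelope sketch. The per-socket prices and relative performance figures are placeholders I made up for illustration, not real list prices or benchmark results; only the 50,000 socket count comes from the comment above.

```python
# Back-of-the-envelope price/performance comparison.
# All prices and performance figures below are hypothetical placeholders.

SOCKETS = 50_000  # approximate socket count mentioned above


def total_cost(price_per_socket: float, sockets: int = SOCKETS) -> float:
    """Processor cost for the whole machine, in dollars."""
    return price_per_socket * sockets


def perf_per_dollar(relative_perf: float, price_per_socket: float) -> float:
    """Relative performance delivered per dollar spent on one socket."""
    return relative_perf / price_per_socket


# Hypothetical inputs: the Xeon is a bit faster but costs twice as much.
opteron = {"price": 800.0, "perf": 0.9}
xeon = {"price": 1600.0, "perf": 1.0}

extra = total_cost(xeon["price"]) - total_cost(opteron["price"])
ratio = (perf_per_dollar(opteron["perf"], opteron["price"])
         / perf_per_dollar(xeon["perf"], xeon["price"]))

print(f"Extra cost of choosing Xeon: ${extra:,.0f}")
print(f"Opteron price/performance advantage: {ratio:.2f}x")
```

Even a modest per-socket price gap turns into tens of millions of dollars at this socket count, which is the whole point.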

Comment 2012 will be a big year for supercomputers. (Score 2) 89

by flaming-opus on Tuesday October 11, 2011 @03:20PM (#37682440) Attached to: Jaguar Supercomputer Being Upgraded To Regain Fastest Cluster Crown

Titan will be a hugely powerful computer. However, fastest supercomputer might be just out of reach. 2012 is also the year that Lawrence Livermore labs, also part of the Department of Energy, is planning to unveil their 20 petaflop BlueGene/Q computer named Sequoia. [http://www.hpcwire.com/hpcwire/2009-02-03/lawrence_livermore_prepares_for_20_petaflop_blue_gene_q.html]

That said, Sequoia will be a classified system for nuclear stockpile simulations. Titan will be a comparatively open system for wide ranging scientific discovery: government, academic, and industrial.

Comment Impressive if it were built today. (Score 3, Informative) 55

by flaming-opus on Friday September 23, 2011 @03:44PM (#37495504) Attached to: 10-Petaflops Supercomputer Being Built For Open Science Community

By 2013, 10 petaflops will be a competent, but not astonishing system. Probably top 10-ish on the top500 list.

The interesting part here will be the MIC parts, from intel, to see if they perform better than the graphics cards everyone is putting into super computers in 2011 and 2012. The thought is that the MIC (Many Integrated Cores) design of knights corner is easier to program. Part of this is because they are x86-based, though you get little performance out of them without using vector extensions. The more likely advantage is that the cores are more similar to CPU cores than what one finds on GPUs. Their ability to deal with branching code, and scalar operations is likely to be better than GPUs, though far worse than contemporary CPU cores. (The MIC cores are derived from the Pentium P54C pipeline.)

In the 2013 generation, I don't think the distinction between MIC and GPU solutions will be very large. The MIC will still be a coprocessor attached to a fairly small pool of GDDR5 memory, and connected to the CPU across a fairly high-latency PCIe bus. Thus, it will face most of the same issues GPGPUs face now; I fear that this will only work on codes with huge regions of branchless parallel data, which is not many of them. I think the subsequent generation of MIC processors may be much more interesting. If they can base the MIC core off of atom, then you have a core that might be plausible as a self-hosting processor. Even better, if they can place a large pool of MIC cores on the same die as a couple of proper Xeon cores. If the CPU cores and coprocessor cores could share the memory controllers, or even the last cache level, one could reasonably work on more complex applications. I've seen some slides floating around the HPC world, which hint at intel heading in this direction, but it's hard to tell what will really happen, and when.

Comment Re:Translation (Score 1) 76

by flaming-opus on Tuesday August 09, 2011 @03:01PM (#37035798) Attached to: NCSA and IBM Part Ways Over Blue Waters

Not sell a system as big as Blue Waters, but using the same technology. The power 775, of which blue waters was supposed to be the prime example, is very powerful per node, has a lot of bandwidth in node, and between nodes, and could be quite useful in much smaller configurations. Tim Morgan at The Register indicates that IBM will still be selling smaller configurations of this machine. It's just hard to keep up that level of per-node performance across so large a machine, for the agreed upon cost.

Comment Re:Why did IBM do this, and what next for NCSA? (Score 1) 76

by flaming-opus on Tuesday August 09, 2011 @02:52PM (#37035676) Attached to: NCSA and IBM Part Ways Over Blue Waters

Yes. Good find. However, that sort of system speaks to the Altix' strengths. You program it like it's a SMP, you have one coherent memory space, and several hundred processor cores. This is the perfect use of an Altix. Of course SGI would rather you use your pre/post processing Altix next to a big ICE cluster, rather than a big IBM.

Comment Re:Translation (Score 1) 76

by flaming-opus on Tuesday August 09, 2011 @10:32AM (#37032552) Attached to: NCSA and IBM Part Ways Over Blue Waters

One of the big problems here is that this system was a one-off, that was not meant to be. IBM developed the system under the DARPA HPCS contract. They made a very capable system that is also very expensive. They hoped to sell a bunch of them; it looks like they sold just one. As such, all of the engineering costs are being amortised across just one machine. They couldn't leverage a bunch of smaller systems at other customer sites to stabilize the technology before deploying the monster big one at ncsa. Some of this is due to the success of their idataplex offerings, which have stolen the smaller sites away from Power7 machines.

I agree, though, that vendor lock-in is the name of the game in these sorts of systems. However, vendors do care about competing for the next contract, and try to keep engineering costs down. One of the ways you do that, of course, is to not make one-off systems.

Comment Re:Why did IBM do this, and what next for NCSA? (Score 1) 76

by flaming-opus on Tuesday August 09, 2011 @10:23AM (#37032476) Attached to: NCSA and IBM Part Ways Over Blue Waters

NSF already has a big cray XT5: Kraken at UofTenn. So the risk averse would probably say get a next generation XE6. Cray has announced an integrated GPGPU option, so NCSA could get a few cabinets of GPUs to play with, but integrated into a more traditional x86 super. The fact that NSF is already familiar with the machine could make this less risky.

However, this machine is not run by NSF, it's run by NCSA, who have no recent experience with Crays. Mostly they've been running whitebox clusters. They had SGI stuff half a decade ago, but nothing on the scale of what we're talking about here. I'd rule out SGI Altix, because it is not built to compete on price/performance, and not designed to scale this large, as a single system. If SGI is in the running, it's probably an ICE cluster that would be used. If the problem with the IBM was cost, I don't think altix is going to fix that problem.

Comment Re:Why did IBM do this, and what next for NCSA? (Score 1) 76

by flaming-opus on Tuesday August 09, 2011 @10:12AM (#37032364) Attached to: NCSA and IBM Part Ways Over Blue Waters

I'm sure Cray can get up to speed in this time frame. They've done it before for the jaguar deployment. However, if they go with Cray, why install it at NCSA? The NSF already has a big Cray running at University of Tennessee (Kraken). Why not just upgrade the existing cray? They already have the bugs worked out, they would just have to add more cabinets, and probably upgrade the processors.

Comment Re:light speed lag leads to higher latency (Score 1) 187

by flaming-opus on Wednesday August 04, 2010 @03:34PM (#33142210) Attached to: Rethinking Computer Design For an Optical World

Absolutely. I think the more likely case is that we're going to see RAM on the compute device, or at least on-package. In the world of cache, even traversing the processor die is a latency worth worrying about.

That said, how about optical numa? With HT or QPI the latency is already up above 100 ns, so adding an optical hop may be reasonable. How about using an optical cable to string together 2 single-socket motherboards into a dual-socket SMP? Not that you need optics to do this, but they make it possible to have nodes 3 meters apart instead of half a meter.
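For scale, here's a quick sketch of what the extra cable length costs in propagation delay. It assumes light travels through fiber at roughly two thirds of its vacuum speed (a refractive index near 1.5), which is a typical textbook figure and not a measurement of any particular cable. It also ignores whatever the optical transceivers add at each end.

```python
# Propagation delay added by a longer optical link.
# Assumes a fiber refractive index of about 1.5 (typical textbook value).

C_VACUUM_M_PER_S = 299_792_458.0
REFRACTIVE_INDEX = 1.5  # assumption; varies by fiber type


def one_way_delay_ns(length_m: float) -> float:
    """One-way propagation delay in nanoseconds over a fiber of length_m."""
    speed_m_per_s = C_VACUUM_M_PER_S / REFRACTIVE_INDEX
    return length_m / speed_m_per_s * 1e9


for length in (0.5, 3.0):
    print(f"{length} m: {one_way_delay_ns(length):.1f} ns one way")
```

Under that assumption a 3 meter run costs on the order of 15 ns each way, which is small next to a remote-memory latency that's already above 100 ns.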

Comment different problem size in linpack. (Score 4, Informative) 247

by flaming-opus on Thursday June 03, 2010 @03:24PM (#32449304) Attached to: Mobile Phones vs. Supercomputers of the Past

I thought it was strange that the article author reported that a cray 1, which had a peak performance around 130 mflops, only produced 3.4 mflops on linpack. Looks like the author doesn't understand the benchmark very well.

If you look at the data quoted in the article, the n=100 result gives the Cray1 a score of either 3 or 12 mflops, depending on which entry you look at. There is no n=1000 result listed for the Cray 1, but one can expect, looking at the Cray XMP results, that it would be around 100, given the peak performance. The ETA10 would likely get a couple thousand mflops on linpack with n=1000.
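To show why the problem size matters so much, here's a small sketch using the conventional linpack operation count of 2n^3/3 + 2n^2 for solving an n-by-n system. The run time passed to mflops() is whatever you measured; I'm not supplying any real timings here.

```python
# How much arithmetic the linpack benchmark does at a given problem size,
# using the conventional operation count 2/3*n^3 + 2*n^2.


def linpack_flops(n: int) -> float:
    """Floating point operations credited for solving an n x n system."""
    return (2.0 / 3.0) * n**3 + 2.0 * n**2


def mflops(n: int, seconds: float) -> float:
    """Reported rate in mflops for a run of size n that took `seconds`."""
    return linpack_flops(n) / seconds / 1e6


for n in (100, 1000):
    print(f"n={n}: {linpack_flops(n):,.0f} operations")

ratio = linpack_flops(1000) / linpack_flops(100)
print(f"n=1000 does about {ratio:.0f} times the work of n=100")
```

The n=100 case is well under a million operations on short vectors, so a vector machine never gets a chance to approach its peak rate. The n=1000 case is a much fairer test of that kind of hardware.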

The Cray 1 is more than a little dated. That said, if you look at supers from the early 90's, they still can do things that modern commodity hardware can't. As fast as a xeon or opteron is, it doesn't have 300Gbytes/second of memory bandwidth. Even late-80's supercomputers exceed desktops in some metrics, though probably not in raw ALU performance if the data all fits into L1 cache. The cost to run a desktop, however, is pretty compelling, and they don't leak freon when they run.

Comment Re:Low Power Supercomputer (Score 1) 109

by flaming-opus on Wednesday November 04, 2009 @03:33PM (#29983740) Attached to: The Story Behind a Failed HPC Startup

That said, there are a lot of tasks that do parallelize well. There's a large market for machines with >5k cores, often with a significant share of the jobs running on >1k cores. The big HPC sites (weather, energy, defense, research, seismic) have invested the last 3 decades into making parallel algorithms to solve their problems; first with shared memory parallel, but massively parallel has been the name of the game for at least 15 years.

Just because your algorithm doesn't scale does not mean that there is no market for parallel machines. Cray, HP, IBM seem to be making a lot of money selling parallel machines. Sicortex just couldn't make their architecture awesome enough to take sales away from the entrenched players.
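The scaling argument is basically Amdahl's law. A minimal sketch follows; the serial fractions are invented examples, not measurements of any real code.

```python
# Amdahl's law: the best-case speedup on `cores` processors when a fixed
# fraction of the work cannot be parallelized.


def amdahl_speedup(serial_fraction: float, cores: int) -> float:
    """Ideal speedup given the serial fraction of the work (0.0 to 1.0)."""
    parallel_fraction = 1.0 - serial_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Hypothetical codes: one that scales poorly, one that scales well.
for serial in (0.10, 0.001):
    for cores in (1_000, 5_000):
        speedup = amdahl_speedup(serial, cores)
        print(f"serial={serial:.3f} cores={cores}: {speedup:.1f}x")
```

A code that is 10% serial never gets past 10x no matter how many cores you throw at it, while one that is 0.1% serial still gets real benefit from thousands. The big sites have spent decades pushing their codes toward the second kind.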

Sicortex isn't the only vendor to fail in the HPC space. With or without a low power architecture, it's a hard market to make a lot of money in. It's an easy market to get into, so a lot of people try, but it's not easy to stay profitable, and the investors wanted to lower their risks.

Comment Re:Lesson learned (Score 1) 109

by flaming-opus on Wednesday November 04, 2009 @10:34AM (#29977846) Attached to: The Story Behind a Failed HPC Startup

Looking at the top 100 is pretty misleading, however. The TAM for a low end cluster is still several times larger than the market for massive supers. A very small number of customers can still adapt to weird architectures and everyone else uses x86 + linux. Also, just about everything non-x86 has failed to gain much market, apart from IBM. IBM manages to keep this going by sharing their development costs with non-HPC products. Cell is a video game processor; Power6 is the foundation of their database servers; Blue Gene is a close derivative of their embedded systems IP.

I'd call the high-end of the market a duopoly of IBM and x86 (mostly intel; AMD mostly because of Cray). The mid-range and low-end: all x86.

Comment Re:Lesson learned (Score 1) 109

by flaming-opus on Wednesday November 04, 2009 @10:26AM (#29977740) Attached to: The Story Behind a Failed HPC Startup

I'd like to agree with this one. The bulk of the market is in the low-end, but the low end is going to be reluctant to embrace anything unusual. Sicortex uses mips processors, which means you can't use your off-the-shelf applications. Even if the rack of intel blades uses more power, and takes up more space, a low end cluster still isn't that large, or that power-hungry. You're not talking about building a whole new building to house the machine.

The high end, where custom applications rule, is more likely to embrace a custom architecture; Cray vector, IBM power, Itanium still play in this arena. However, the largest sicortex machine really can't play in the big leagues. 5000 low-power mips processors is a pretty modest cluster, even if the network is good. The big leagues also means you're dealing with the established HPC customers, who are very demanding on the software and service front.

The low end has a lot of market, but the competition is fierce, and the margins small. The high end requires a lot more infrastructure than an 80 person company can provide. In all cases, developing a new processor is very expensive. Intel and AMD spend billions of dollars designing each generation of chips, and have the tools to build them with full custom logic, instead of asic designs. Once sicortex invests all that money in designing the processors, they still have to build a machine around that. Then you have to build a software stack and service organization. Then, you have to sell the thing into a competitive marketplace.

Tough row to hoe.

The low end is a larger market.

Comment Numalink vs. infiniband. (Score 1) 159

by flaming-opus on Monday April 06, 2009 @02:59PM (#27479561) Attached to: Rackable Buying SGI Assets For $25M?

If you look at the history of infiniband, it was always intended to be something like numalink+xio. Originally you were supposed to connect peripherals, storage, and processor nodes onto this big network and add and remove them all dynamically. It got scaled down from that, and now is pretty much used as a high-speed network, with the occasional RAID attached directly to it. Numalink can be used in this way too. One does not need to make a single numa domain from an altix.

The numalink chip also has the extended cache directory logic in it, which allows large numa machines. Importantly, that version of large is large on the scale of numa database servers, but rather small on the scale of supercomputers. Even SGI has to fall back to infiniband for the really large machines, such as the two big systems at nasa. It's not as feature-rich as numalink, but it'll scale to tens of thousands of nodes, sorta affordably. I should note that there's no reason that the cache directory chips can't talk to one another over an infiniband network. No one has invented this chip, but the network can be an independent piece.

I agree that SGI has long had great technology, and useful products. (I reserve the term great products, as they have tended to have great strengths coupled with great weaknesses) But I would not say that their products have been successful. If they had, SGI wouldn't have been circling the bowl for the last ten years. SGI learned how to make a lot of money when they were at the top of a growing market. They never learned how to make money in a shrinking market, or how to transition to a profitable spot in a different market.
