Cray XT-3 Ships

anzha writes "Cray's XT-3 has shipped. Using AMD's Opteron processor, it scales to a total of 30,580 CPUs. The starting price is $2 million for a 200 processor system. One of its strongest advantages over the standard Linux cluster is that it has an excellent interconnect built by Cray. Sandia National Labs and Oak Ridge National Labs are among the very first customers. Read more here."
This discussion has been archived. No new comments can be posted.

  • by Anonymous Coward on Tuesday October 26, 2004 @03:35AM (#10629026)
    single node of those.
    • Re:imagine a... (Score:2, Interesting)

      by catch23 ( 97972 )
      Man, a single node is $10,000?? I could build a pretty good Opteron system for $2000.... so the other 8 goes to the interconnect? $2 million is a lot for only 200 processors.
      • Re:imagine a... (Score:5, Insightful)

        by Anonymous Coward on Tuesday October 26, 2004 @05:33AM (#10629347)
        *rolls eyes*

        When you have a single CPU, designing the system to be pretty fast is easy. There's no major contention to deal with.

        Two CPUs? Slightly harder, but reasonably straightforward. You don't see a 2x improvement in speed over one CPU, but it's around 1.95x, give or take a bit.

        Four CPUs? Now you're starting to see less improvement ... probably around 3.2x, because of all the contention issues.

        Sixty-four CPUs? You'll be lucky to get a 50x speed up over a single CPU.

        When you get to 200 CPUs, the issue of access to shared memory and other shared resources becomes critically important. It's also an issue that most computer buyers don't need to worry about, because they don't have 200 CPUs in their system. This means that you have a lot of highly specialised research going on, and relatively few buyers to spread the cost of that research over.

        Two million for a 200 CPU box which has low latency, low contention, and solid reliability is not a lot at all. You might not buy it. That doesn't mean nobody will.

        • Last time I bought a Cray super-computer, I was kicking myself for weeks about the 2 million dollars I wasted.

          Next time, I'm just gonna build a beowulf cluster out of 200 overclocked AMD Barton 2500s. I shall NOT be suckered again!
        • Re:imagine a... (Score:5, Informative)

          by crimsun ( 4771 ) * <crimsun@ubu[ ].com ['ntu' in gap]> on Tuesday October 26, 2004 @08:35AM (#10629819) Homepage
          It's not just hardware: the amount of non-parallelizable code in parallel applications impacts scalability most tremendously.

          The upper bound on speedup is generally Amdahl's law. Plainly, the efficiency approaches zero as the number of processes is increased. Generally we consider the major sources of overhead to be communication, idle time, and extra computation. Interprocess communication is considered negligible for serial programs in this context (we consider message passing). Idle time ends up contributing to overhead, because processes idle awaiting information from others. Extra computation is virtually unavoidable at some point; for instance in MPI's Single Program Multiple Data model, each process in tree-structured communication other than the root is eventually idled prior to the completion of computation, and each process determines IPC at some point based on rank.

          There are notable exceptions to Amdahl's law, however; Gustafson, Montry and Benner wrote about such in "Development of parallel methods for a 1024-processor hypercube", SIAM Journal on Scientific and Statistical Computing 9(4):609-638, 1988.
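To put rough numbers on the two views above, here is a minimal sketch. The 5% serial fraction is an assumption chosen purely for illustration, and the function names are made up for this example. Amdahl's law keeps the problem size fixed, while Gustafson's scaled speedup lets the problem grow with the processor count (its serial fraction is measured on the parallel run).

```python
def amdahl_speedup(serial_fraction: float, n_procs: int) -> float:
    """Amdahl's law: fixed problem size, the serial part never shrinks."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_procs)


def gustafson_speedup(serial_fraction: float, n_procs: int) -> float:
    """Gustafson's scaled speedup: the parallel part grows with n_procs."""
    return n_procs - serial_fraction * (n_procs - 1)


def efficiency(speedup: float, n_procs: int) -> float:
    """Fraction of the ideal n_procs-fold speedup actually achieved."""
    return speedup / n_procs


if __name__ == "__main__":
    s = 0.05  # assumed serial fraction, illustrative only
    for n in (1, 2, 4, 64, 200, 1024):
        a = amdahl_speedup(s, n)
        g = gustafson_speedup(s, n)
        print(f"{n:5d} procs  Amdahl {a:7.2f}x (eff {efficiency(a, n):.3f})  "
              f"Gustafson {g:8.2f}x")
```

Under Amdahl's law the speedup can never exceed 1 / serial_fraction no matter how many processors are added, which is why the efficiency heads toward zero.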
        • Re:imagine a... (Score:2, Informative)

          by ant_slayer ( 516684 )
          My apologies, but I couldn't help but think that you'd be *really* lucky to get 50x out of 64 CPUs. Examine the following:

          1 CPU @ 1.00x -> 1.00 / 1 = 1.000
          2 CPUs @ 1.95x -> 1.95 / 2 = 0.975
          4 CPUs @ 3.20x -> 3.20 / 4 = 0.800
          64 CPUs @ 50.0x -> 50.0 / 64 = 0.781

          Pop that into a spreadsheet and look at the graph.

          That is not linear; in fact, it's non-linear in the direction that *helps* more and more processors. If the decline from 4 CPUs to 64 CPUs is a mere 1.9% efficienc
  • Gotta save up for this gamer's dream machine. Now taking donations...
  • How big is it? (Score:3, Interesting)

    by rooijan ( 746599 ) on Tuesday October 26, 2004 @03:36AM (#10629031) Homepage
    I read the article (okay, so I kinda read it :-) ) and it has the speed and specs to be a geek's improvement on sliced bread. But how big is it, physically?

    The article doesn't appear to mention its dimensions, and I'm curious to know what kind of space you need to install this baby. Anyone got any idea?
    • Re:How big is it? (Score:4, Informative)

      by Anonymous Coward on Tuesday October 26, 2004 @03:41AM (#10629056)
      Dimensions (cabinet): H 80.50 in. (2045 mm) x W 22.50 in. (572 mm) x D 56.75 in. (1441 mm)

      Weight (maximum): 1529 lbs per cabinet (694 kg)
      • by Anonymous Coward

        Dimensions (cabinet): H 80.50 in. (2045 mm)

        Wow... for the first time in my life, I couldn't picture 80 inches, but I could picture 2 meters. I think there may be hope for the metric system after all.

    • The weight is 1529 lbs per cabinet (694 kg). Imagine lugging that up to your 5th floor walkup apartment....
    • How big it is (Score:2, Informative)

      from TFA -

      Dimensions (cabinet):

      H 80.50 in. (2045 mm) x W 22.50 in. (572 mm) x D 56.75 in. (1441 mm)

      Sorry to reply twice but I forgot this detail.
    • Sibling posts already gave the spec, but I believe the size of a system like this is rather insignificant.

      If you're paying 2 million and upwards for a thing like this you probably can afford an appropriate space with appropriate climate control.

      (OK, so some people cram ${car_price * 10} worth of HIFI equipment into a ${small_japanese_car}, but I doubt anyone would want one of these installed in their closet...)
    • Obviously they aren't using one of these for their webserver. Or if they are, they need something more than a modem for their internet connection!
  • by mrjb ( 547783 ) on Tuesday October 26, 2004 @03:37AM (#10629033)
    This is only the XT-3. I'll wait for the Pentium-3-4.
  • by nilbog ( 732352 ) on Tuesday October 26, 2004 @03:37AM (#10629035) Homepage Journal
    A few more years of advances like this and we might have a machine capable of running Longhorn!
  • by commodoresloat ( 172735 ) on Tuesday October 26, 2004 @03:37AM (#10629036)
    It better have a lot of good games. How many mouse buttons does it have?

    I can't believe people complain about the price of iMacs....

  • real FPU operations (Score:5, Interesting)

    by Barbarian ( 9467 ) on Tuesday October 26, 2004 @03:38AM (#10629039)
    How are the Opterons at standard FPU operations in double precision? SSE2 and friends are nice, unless you have to make compromises in your simulations.

    I ask, because I remember that the Athlons beat the pants off the Pentium 4's in FPU operations, so all the benchmarks were rewritten to use SSE2.

    • by jmv ( 93421 ) on Tuesday October 26, 2004 @03:44AM (#10629071) Homepage
      Opterons beat the pants off the Pentium 4s in x87 (i.e. old) FPU operations. If you want to get good performance, you need SSE/SSE2. Both for AMD and Intel. For pure SSE, the Pentium 4s beat the Opterons mainly because of the clock speed, but for multi-processor systems, the hyper-transport and all more than makes up for that.
        Yeah, yeah, SSE... Nice, if you can keep your data properly aligned and don't mind hand-coding assembler (even Intel's compiler does a so-so job of vectorizing). Still, even then you don't get trigonometry, for instance.
        • by jmv ( 93421 ) on Tuesday October 26, 2004 @05:07AM (#10629300) Homepage
          Couple facts about SSE:
          1) You can use it in scalar mode, in which case it's almost like x87, only a bit faster because:
          a) It doesn't use a braindead register model (stack)
          b) On P4, you can do a mul and an add in parallel with SSE, but not with x87
          2) You can use SSE intrinsics. It's not as easy as "normal" programming, but easier than assembly and almost the same speed.
          3) Unaligned access is possible. It's slower than aligned access, but overall better than non-vectorized code.
          4) Trig is so slow that SSE/x87 doesn't matter (unless you write approximations, in which case SSE will also be faster).
          You're supposed to use lookup tables, recurrence relations and iterative refinement to get the precision you need when you need it.
          The only time you should be using fsincos (SLOW) is when you need to build a table or populate variables accurately before a loop.
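As a rough illustration of the table-plus-recurrence idea, the sketch below fills a sine/cosine table with only one real sin call and one real cos call; every other entry comes from the angle-addition identities. It is written in Python for readability, not speed, and the function name is made up for this example.

```python
import math


def sincos_table(step: float, count: int) -> list:
    """Return [(sin(k*step), cos(k*step)) for k in range(count)].

    Only one sin and one cos are evaluated directly; each further entry
    is produced by the angle-addition recurrence.
    """
    s_step, c_step = math.sin(step), math.cos(step)
    s, c = 0.0, 1.0  # sin(0), cos(0)
    table = []
    for _ in range(count):
        table.append((s, c))
        # sin(a+d) = sin a cos d + cos a sin d
        # cos(a+d) = cos a cos d - sin a sin d
        s, c = s * c_step + c * s_step, c * c_step - s * s_step
    return table
```

Rounding error builds up slowly along the recurrence, so for long tables it may be worth re-seeding from a direct sin/cos call every so often, which is where the iterative refinement comes in.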
    • I seem to remember it being the other way around.

      Anyway, it depends on how you're using the floating point numbers: the standard 387 FPU instructions are faster, but the superscalar operations are more efficient when used in their intended role of vector calculations.

      Or so I've heard. YMMV.

      (BTW: nice sig...)
  • by Dancin_Santa ( 265275 ) <> on Tuesday October 26, 2004 @03:42AM (#10629058) Journal
    In this day and age of very fast computers and clusters built in our basements, there sometimes comes along a story that whispers of the computing age of days long past. Cray is one of those names that can drop a jaw by its mere utterance.

    The name is synonymous with speed and power and the unwillingness to cut corners in order to shave a few dollars off the final product. When you buy a Cray, you know you are getting top of the line hardware.

    It looks like Sandia wants to build the fastest supercomputer in the world by clustering a few of these monsters, and I have no doubt that they will. Looks like more fun articles about this in the future. :-D

    There are two prominent applications for these machines. The first is nuclear weapons simulation. Personally, I don't see the point to that. The other application is in weather prediction. By feeding current weather variables into a well-written model, a supercomputer is able to predict the future weather to a large degree of accuracy. Such an application will always be welcome.

    I think I'm going to have to fire up the old ][e, the nostalgia is killing me!
    • by joib ( 70841 ) on Tuesday October 26, 2004 @05:59AM (#10629406)

      There are two prominent applications for these machines. The first is nuclear weapons simulation. Personally, I don't see the point to that. The other application is in weather prediction.

      Oh, please. Buy a clue, will ya? There's lots and lots and lots of applications that use supercomputers, or could use if they were more affordable. A few examples from the top of my head:

      Materials science, that is ab initio simulations, moldyn, you name it. This alone probably uses > 50 % of all supercomputer cpu time in the world. By comparison, weather prediction and nuke simulations are small potatoes (or shall we say, the simulations as such are big, but the number of people engaged in weather prediction or nuke simulation is really small compared to all the supercomputing materials scientists).

      CFD, the automobile and aerospace sectors are big users.

      Electronic design.

      Seismic surveys, the oil industry uses lots and lots of supercomputers to find oil deposits.

      Biology. Gene sequencing, moldyn simulations of lipid layers and whatever.

      Climate prediction, somewhat related to weather prediction. Official purpose of the Earth Simulator.

      All of the examples above could easily use almost any amount of cpu power you can throw at them. The only thing that stands between a lot of scientists and improved understanding of the world is computing power.
      • by Moraelin ( 679338 ) on Tuesday October 26, 2004 @07:48AM (#10629668) Journal
        The real problem that stands between scientists and them having lots of shiny toys is funding.

        E.g., yeah, having a 30,000 CPU super-computer to simulate your gene model on would be nice. Forking over half a billion for it, well, it's suddenly not that nice any more.

        Having one of those to simulate an electronic circuit, now that would probably rock. Again, paying half a billion for it, suddenly isn't that attractive.

        The real question isn't how nice a toy you'd like to have, it's ROI. (Unless you work for the government, and just have a budget you _have_ to blow on stuff, whether you need that stuff or not.)

        And in that context, you'd be surprised what you _can_ do with a lot less expensive toys.

        Having Cray's custom interconnects sure is impressive, but for a lot of problems they're not even needed any more. _That_ is what killed Cray.

        Most RL problems are not really the kind described as "_one_ huge indivisible data set, that you have to process in _one_ huge batch process." They're more like "we have this process with a small data set that we have to run 100,000,000 times." Most design problems or biology problems are really of that kind: run the same thing 100,000,000 times with different parameters.

        And as Seti@Home or Folding@Home proved, a helluva lot of those don't really need _any_ kind of shared memory or fancy interconnects. The real ticket is noting that instead of accelerating the batch run 200 times, you could just split it into 200 smaller batches run on 200 single-CPU machines.

        The super-computer solution costs 2,000,000 just for the machine alone, while the 200 PCs solution costs 200,000 or so. I.e., 10 times cheaper. Better yet, the 200 PCs solution is also far cheaper to program. (Anyone can program a non-threaded batch app.) _And_ for that kind of a problem the 200 PCs solution would actually finish faster, since it has no contention issues whatsoever.

        Again, that's what really killed Cray and the super-computers. They're technologically impressive, they're a geek's wet dream, but... for 99.9% of the problems out there they're just not worth the price any more.
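The "split it into 200 smaller batches" approach from this comment can be sketched as below. Everything here is hypothetical: `simulate` stands in for one small independent run, and threads stand in for the separate PCs. The point is only that no batch ever needs another batch's data, so no shared memory or fast interconnect is involved.

```python
from concurrent.futures import ThreadPoolExecutor


def simulate(param: int) -> int:
    """Hypothetical stand-in for one small, independent run."""
    return param * param


def split_into_batches(params: list, n_batches: int) -> list:
    """Deal the parameters round-robin into n_batches independent batches."""
    return [params[i::n_batches] for i in range(n_batches)]


def run_batch(batch: list) -> list:
    """What one single-CPU machine would do: a plain serial loop."""
    return [simulate(p) for p in batch]


def run_sweep(params: list, n_workers: int) -> list:
    """Run every parameter once, spread over n_workers independent workers."""
    batches = split_into_batches(params, n_workers)
    # Threads stand in for the separate PCs; no batch reads another's data.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = pool.map(run_batch, batches)
    # Flatten the per-batch results (order follows the batches, not params).
    return [r for batch_result in results for r in batch_result]
```

Because `run_batch` is an ordinary serial loop, the same function could in principle be shipped unchanged to 200 separate machines, which is the "anyone can program a non-threaded batch app" argument.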
    • There are two prominent applications for these machines.

      Wrong! There is a third, more used application: Solitaire.

      Even super computer coders have to wait for results.

      I also asked this recently but didn't get a reasonable answer: do these beasts have screen savers? If so, are they just the blackout type, or busy 3D-rendered whizbang super cool ones "Just because we can"?

      (I realise you may not be able to answer that, but someone might)
    • There are two prominent applications for these machines. The first is nuclear weapons simulation. Personally, I don't see the point to that.

      Well, when you nuke the site from orbit, you do want to be sure don't you?

    • by flaming-opus ( 8186 ) on Tuesday October 26, 2004 @10:48AM (#10630803)
      Actually, there is no reason to cluster a few of these. If you have a 2000 node xt3 (or t3e, paragon, blue-gene, cm5, insert mesh-structured mpp here) and a 4000 node xt3, you stick them together and make a 6000 node xt3. But that's just picking nits.

      Curiously the xt3 IS about shaving dollars off the price. If you go read the original whitepapers on the system, they go through EXTENSIVE cost-return analysis. They studied their (then-) current generation of cluster systems, as well as future linux/solaris/aix clusters, and rejected them as (interestingly) FAR TOO EXPENSIVE, once the administrative costs are factored in. They then looked at, and rejected, cray's vector solution, the X1. They then decided that the (amazingly) most cost-effective solution was to underwrite cray's product development cycle on a wholly new product. Basically they asked for an update to the system they already had (ASCI Red, i.e. Intel Paragon++). Nobody was building such a thing. Since cray had a really strong similar product in the 90s (T3D, T3E), the Department of Energy asked them to create an update. Some designs never die.

      What I'm most interested in is the reliability. One of the biggest difficulties in the T3D engineering cycle was dealing with memory failure. red-storm is going to have 10,000 processors. Let's assume each has 2 banks times 3 DIMMs (chip-kill) of memory. That means there are 10,000 x 6 x 18 = 1 million+ memory chips in the system. If 1/100th of a percent of these fail, that's still a lot of memory failures. How well are faults isolated? That's the big question for systems this big.
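Spelling out the memory-chip estimate above: the sketch below just redoes the arithmetic with the comment's own assumed figures (10,000 processors, 2 banks of 3 DIMMs, 18 chips per DIMM, a failure fraction of 1/100th of a percent). None of these are measured values.

```python
PROCESSORS = 10_000
DIMMS_PER_PROCESSOR = 2 * 3      # 2 banks times 3 DIMMs (assumed above)
CHIPS_PER_DIMM = 18              # the "x 18" in the estimate above
FAILURE_FRACTION = 0.01 / 100    # 1/100th of a percent

total_chips = PROCESSORS * DIMMS_PER_PROCESSOR * CHIPS_PER_DIMM
expected_failures = total_chips * FAILURE_FRACTION

print(total_chips)                 # 1080000
print(round(expected_failures))    # 108
```

So even at that tiny failure fraction, the estimate works out to roughly a hundred failed chips somewhere in the machine, which is why fault isolation matters so much at this scale.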

      I'm also a little wary of cray's use of lustre. I've used lustre before, as well as other cluster-FSes. While I'm not aware of other filesystems that will scale to 700+ i/o nodes, I'm not confident in lustre. It's an immature product at best. (I don't mean to disparage the people working on it, it's a neat architecture, but it's a hard problem, and I'm not sure it's ready for prime-time.)
  • by Henriok ( 6762 ) on Tuesday October 26, 2004 @03:48AM (#10629085)
    It seems that the XT-3 not only uses Opteron processors but also uses PowerPC 440 co-processors from IBM to offload inter-processor communication from the main computing CPUs. Quite an interesting setup.

    The XT-3's biggest competitor in this segment must be the BlueGene/L type supercomputer made by IBM. The processor in Blue Gene/L is a custom-built dual-core version of the PowerPC 440 with built-in high-speed interconnects.

    Just like IBM have a finger in all the future game consoles, they seem to have a finger in several of the next generation super computers also. Nice going IBM.
    • No. The biggest competitor to the XT3 will be machines like the NEC SX-8, their own X1 family or the IBM p690's. They are all shared memory systems, while the Blue Gene family is not. And therein lies a whole world of difference.
    • Just like IBM have a finger in all the future game consoles, they seem to have a finger in several of the next generation super computers also. Nice going IBM.

      It's not that they're the best thing since sliced bread, it's mainly that all their competition went down the chute for one reason or another.

      HP/Compaq/DEC was the king of supercomputers. Now they're only supporting their formerly glorious products, with practically nothing new coming to replace them.

      Sun seems to really be sitting on their ass.


      • let's see what you're missing:

        * first, sgi still makes and sells supercomputers, they are far from faded. they also own cray (or did).
        * tandem, bought by compaq, we all know what happened there.
        * hp sells a superdome once in a while. but nobody seems excited about their itanic systems.
        * sun, rotting with their out of date cpus.
        * fujitsu is doing well in the supercomputer market.
        * nec is also successful.
        * ibm, of course.

        and you mentioned motorola? you're joking, i hope.

        the largest purchasers of superco
      • etc.

        Sounds like an opening for...


        Now that I think about it... they have massive experience with huge data systems!
    • Basically this is the same off-load as blue-gene. One processor with an MPI off-load engine. In the case of blue gene, the main cpu is another 440, while xt3 uses a much stronger opteron. (of course the IBM solution is less expensive, and much denser).

      The real difference in this system is the high bandwidth shared memory. Blue Gene has hardware support for shared memory, but the software appears to be strictly MPI based. (at least in the first revision, and according to what I've read, this may be prelimina
  • It seems like Cray is not capable of sustaining its heritage. Buying cheap AMD processors and connecting them with a customized HT interconnect is not enough to build a machine capable of the record-breaking single-task performance that old Crays exhibited. Whereas with the Cray XMP one could be sure of having the best machine money could buy (with outstanding scalar and vector abilities), the new Cray is just another loosely-coupled AMD cluster. Thank god it's not a NEC clone (at least).
    • by Anonymous Coward
      It's not a customized HT interconnect. There's a dedicated SeaStar router chip that connects via HT to the uniprocessor Opteron + RAM node, but the actual fabric connecting the SeaStars is proprietary (each SeaStar connecting to six others via 7.6 GB/s interconnects, forming a 3D grid fabric expandable to 30K+ nodes).

      That's why they use mere 100-series Opterons: they need only one HT link per CPU. Because the whole is not based on HT interconnects.

      Really, loosely-coupled cluster my ass. This machine *is*
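For a feel of the "each SeaStar connecting to six others" fabric described in this comment, here is a minimal sketch of neighbour lookup in a 3D grid. The wraparound at the edges (a torus) is an assumption made so that every node really has six neighbours, and the function name and grid sizes are illustrative only, not Cray's actual configuration.

```python
def torus_neighbors(node, dims):
    """Return the six neighbours of node (x, y, z) in a 3D grid.

    Wraparound at the edges is assumed, so every node has exactly six
    links: +/-1 along each of the three axes.
    """
    x, y, z = node
    nx, ny, nz = dims
    return [
        ((x + 1) % nx, y, z), ((x - 1) % nx, y, z),
        (x, (y + 1) % ny, z), (x, (y - 1) % ny, z),
        (x, y, (z + 1) % nz), (x, y, (z - 1) % nz),
    ]
```

For example, `torus_neighbors((0, 0, 0), (4, 4, 4))` includes `(3, 0, 0)` thanks to the wraparound. The six neighbours are all distinct as long as each dimension has at least three nodes.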
    • Cray does have its own fully custom system still, it's called the X1 - a highly scalable Vector machine (that's far from a NEC clone...)

      I'd link it, but the site is down...
  • by teamhasnoi ( 554944 ) <> on Tuesday October 26, 2004 @04:02AM (#10629119) Journal
    they simulated a woman who posts to Slashdot and is waiting for her Centris running PearPC on Debian to boot OS X.

    Strangely, it took roughly a week. The second test was a simulation of the moderation results of this post.

    It received a +5 Funny, which puzzled researchers, as it is currently modded -1 Offtopic.

    Damn you Schroedinger!

  • Do they have leather seats for the operators like the 1980s models did?
  • by BrookHarty ( 9119 ) on Tuesday October 26, 2004 @04:25AM (#10629190) Homepage Journal
    So 96 processors, AMD gets about 144K per PE node at 1500 per cpu, or does Cray get a discount?

    Also, a 30,000 cpu complex, AMD must be making a tidy sum.
    • I highly doubt Cray are paying retail prices for these CPUs. I'd imagine they buy them wholesale from AMD directly, and probably a further bulk discount on top of that.
      • AMD may even be giving Cray an extra discount/kickback for the publicity value - not uncommon for this kind of cutting edge stuff.

        My old company managed to get some seriously expensive enterprise software for just 10% of the retail price because we were able to convince them that having us as a client would be a publicity coup for them....
  • Interesting note (Score:3, Interesting)

    by floydman ( 179924 ) <> on Tuesday October 26, 2004 @04:46AM (#10629234)
    From their tech sheet, they are using the Lustre file system.

    This is the first time I've seen a shipped Linux system with this file system. Now the interesting part is that Lustre is made for Linux clusters, but this monster is not a cluster... can anybody shed some light?
    • It's not a cluster.....

      well, sort of.

      There are thousands of compute nodes, all of which get i/o services from dozens, or hundreds of i/o nodes. These i/o nodes run linux, several instances of linux. Basically the i/o nodes ARE a cluster, though not a compute cluster, and not necessarily a symmetric cluster. The i/o nodes run lustre in very much the same way that a cluster system would (though they can take advantage of hardware features not present on commodity clusters).

      The real difference in this syste
  • I can't find this price anywhere.
    And it seems _really_ low.
    I would expect a price at least twice higher.

    Ok, $2 million is starting price, but on Cray's website they say the configuration can be as "small" as 96 CPUs.

    So it's maybe $2 million for 96 CPUs.
    (Still fairly cheap for a Cray, if you ask me)
  • by Alkonaut ( 604183 ) on Tuesday October 26, 2004 @07:09AM (#10629556)
    ...Sadly I think that beats my Volkswagen on all three
  • Finally ... (Score:2, Funny)

    by Zurd3 ( 574979 )
    We'll be able now to install Gentoo in just a few days !
  • by ebooher ( 187230 ) on Tuesday October 26, 2004 @07:52AM (#10629678) Homepage Journal

    So come on, ante up. How many remember being awed at the mere sight of old Crays back in the day? Like the Cray-3? I remember the first time I saw a Cray .... thing was in an anti-static environment. To access it, one had to pass through an airlock and be "decharged" or "depolarized" etc. Basically they somehow charged the air to get rid of static electricity. Then you had this system that was running *in* liquid! Take that, "Oh I'm so cool cause I have a l337 haX0r water cooled CPU" overclockers!

    They (Cray) were so proud of this accomplishment that the upper portion of the cabinet was some kind of plexiglass so you could see the fluid as it moved, and moved wiring and what not with it. Very surreal feeling, almost like the thing was breathing.

    And what about the Cray-1? Wasn't that a true testament to 70's *art* and sculpture? The thing looks like some kind of freaky bus station bench with its odd red and white panels and black base. Though, I don't know if they all looked like that, maybe you could get them in other colors?

    Ahh .... those were the days.

    • Because, IIRC, that was the one that they were only building one of, and when the govt cancelled the order, that's when Cray Research went under.

      • Well, anything is possible. It may not have been the 3 I saw. Though I seem to remember being able to see over it, and I think almost every other Cray was at least 6 foot tall. But I don't know, it could have been recessed into the floor. I was quite a bit younger at the time, obviously. I was more focused on the tour guide guy showing off the fact that electronics were in liquid. At the time I thought if it was wet, it conducted electricity, so that kind of blew my mind.

        I'd love to find an old Cray, li

      • wrong company - Cray 3 was being built by "Cray Computer", a different company that Seymour started after he left the original Cray Research. He's most likely talking about the Cray 2, which even came with its own Waterfall
      • Looked around on the net, as well as a couple other /.'rs here, and someone posted a link here to a 2 and I found a pic of a 2 with the waterfall system that was mentioned by another person, and I must accept defeat within the loosened strands of my unraveling mind.

        It was indeed a Cray-2 that I remember so vividly. Nevertheless, still an extremely exotic machine. Very much the Ferrari F40 or McLaren F1 of super computing. You've seen pics, maybe even seen one at a car show, but you know you'll never be

      • Cray-3 memories by Steve Gombosi From a comp.unix.cray posting

        Graywolf ("S5") was installed at NCAR. Like all NCAR supercomputers, until fairly recently, it was named after a Colorado locale.

        This was the *only* Cray-3 shipment, installed in May 1993, the machine was a 4-processor, 128 Megaword system.

        Two problems in the Cray-3 system were uncovered as a result of running NCAR's production climate codes (particularly MM5): a problem with the "D" module causing intermittent problems with parallel co

    • well, when the computer costs 10-15 million dollars, you can afford to spend twenty thousand on making it look really cool.

      compared to ASCI-red, the system that red-storm is replacing, xt3 looks incredible. Yes it's a long row of rectangular racks, but at least they are stylish racks. Intel built ASCI-red in beige box style. Oh well. function over form I suppose.
    • Then you had this system that was running *in* liquid!

      Before that was the Cray-2 (a.k.a. "World's most expensive aquarium")? In case anybody's interested, I believe they used Fluorinert as the liquid, as it wouldn't swell the PC boards, short anything out, or cause anything to corrode.

      A note, the Cray-3 was created by Cray Computer Corporation of Colorado, whereas the Cray-1 was made by Cray Research of Wisconsin. In ~1990, Seymour wanted to start working on computers using gallium arsenide instead of s

  • ... and when you turn it on, a crackly computer generated voice says, "Would you like to play a game?"

"The eleventh commandment was `Thou Shalt Compute' or `Thou Shalt Not Compute' -- I forget which." -- Epigrams in Programming, ACM SIGPLAN Sept. 1982