AMD Technology

Supercomputer Breaks the $100/GFLOPS Barrier (281 comments)

Hank Dietz writes "At the University of Kentucky, KASY0, a Linux cluster of 128+4 AMD Athlon XP 2600+ nodes, achieved 471 GFLOPS on 32-bit HPL. At a cost of less than $39,500, that makes it the first supercomputer to break $100/GFLOPS. It is also the new record holder for POV-Ray 3.5 render speed. The reason this 'Beowulf' is so cost-effective is a new network architecture that achieves high performance using standard hardware: the asymmetric Sparse Flat Neighborhood Network (SFNN)." Because this was a university project, KASY0 was assembled entirely by university students, which, while being a source of cheap labor, is also a good way to get a lot of students involved in a great project.
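For a quick check of the headline figure, a minimal sketch using only the totals quoted above:

    # Sanity check of the $/GFLOPS claim from the reported figures.
    cost_usd = 39500      # "less than $39,500"
    gflops = 471          # 32-bit HPL result
    print("$%.2f per GFLOPS" % (cost_usd / gflops))   # ~$83.86, under the $100 barrier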
This discussion has been archived. No new comments can be posted.

  • Wow! (Score:5, Funny)

    by fryguy451 ( 592591 ) on Saturday August 23, 2003 @10:23AM (#6772656)
    Imagine a Beowu... errr... Oh..
  • by Anonymous Coward on Saturday August 23, 2003 @10:24AM (#6772660)
    Note to moderators, Beowulf cluster jokes CANNOT be offtopic.

    Imagine a Beowulf cluster of Beowulf cluster jokes!
  • Also I wonder (Score:5, Interesting)

    by HanzoSan ( 251665 ) on Saturday August 23, 2003 @10:25AM (#6772662) Homepage Journal


    How much electricity will these super computers use up?

    All those wires, it looks like it takes up a lot of juice.
    • by gremlin_591002 ( 548935 ) on Saturday August 23, 2003 @10:29AM (#6772683) Journal
      Ponders why there are no pictures of university students in the National Geographic article on slavery....
    • by jd ( 1658 )
      That depends on how fast the students can operate the pedal-powered generators.
    • I mean these things are Athlons! Heck, they're saving money just from the fact that they'll never have to turn on the furnace again!

      Did you guys notice from the pics [aggregate.org] that there doesn't seem to be any fans in the holes on the sides? Are they crazy? These are Athlons. I hope they put enough fans in those things.
      • As much heat as power.
      • Actually, many very early supercomputers were built into the basement/cellar for this very reason. Pack your computers as low as possible, and use the convection currents to carry the heat around the building.

        (Those familiar with the University of Manchester's Department of Computation, in the UK, will understand what I mean. The architecture is designed around the computer room. Even after the truly massive lumps of iron were removed, it still wasn't until the mid 1990s that the building had a ground-flo

      • Did you guys notice from the pics [aggregate.org] that there doesn't seem to be any fans in the holes on the sides?

        See here:

        For example, each case came with two side fans, which we converted into a redundant stack venting out the back. [aggregate.org]

      • My Athlon XP 1700+ overclocked to 2000+ dissipates approximately 46W as heat. With cooling moving only from the front of the case to the back (though including one pretty fast and loud fan) it reliably stays below 104 degrees Fahrenheit. I'm guessing they're not using overclocking, and they're using the new CPUs with the higher speed bus. ZDNet claims [zdnet.co.uk] that the 2600+ dissipates 62.0 watts as heat, so there's a bit of a bump there, but since I know from experience that Athlon chips can run at 140 degrees with
      • The Athlon actually has a pretty average Watt/FLOPS ratio for a modern processor. The only one that really trounces it is the POWER series, including PowerPC. The Athlon 2600+ only uses 68.3W; compare this to a 2.4GHz P4, which uses 66.2W, and you see that they are in the exact same neighborhood. And if you include price in the equation, the Athlon becomes the leader. Also, if you had RTFA, they explain that the side fans were moved to a stacked rear configuration for better airflow and redundancy.
    • Re:Also I wonder (Score:3, Informative)

      by rusty0101 ( 565565 )
      Per the FAQ on the site, the supercomputer draws 210A. The yearly power cost is roughly equivalent to the cost of the network equipment connecting the nodes.

      210A at 120Vac via the power law comes to 25.2kw/hr. Triple that to allow for cooling (It takes approx 2 watts of power to remove the heat generated by 1 watt of power usage) and you come to almost 76kw/hr. Take a look at your utility bill to come up with the hourly cost for electricity while this thing is on.

      The equipment does not have cool
      • There was work, at one point, on a single-stage high-voltage amplifier. The idea was to reduce the unwanted distortion by reducing the stages you needed to go through.

        I think these guys need a way to tell if the computer has crashed or lost power. Y'know, UPS' have those mini alarms, but people aren't going to be around the computer all the time, and the UPS will only detect a power outage.

        I think they need a watchdog circuit, linked to a 25.2 kilowatt amplifier and a suitable speaker. That way, no matt

      • First of all, your units are all screwed up. 25.2kW was right; forget the /hr thing. Second, I refuse to believe that you need 2W of electricity to move 1W of heat. Air conditioning seems to be more in the range of 1W of electricity needed to move 2W of heat. So let us say 40kW total, which in the silly units used for electricity billing comes to 350MWh/year.

        Anyway, if you want to see stuff that really draws power, go look for the high energy physics stuff. Power cables that are liquid cooled through tubes
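        A rough sketch of the corrected arithmetic; the 40 kW total and the $0.07/kWh rate are assumed figures, not measurements from the cluster:

            # Annual energy and cost for the cluster plus cooling overhead.
            it_load_kw = 210 * 120 / 1000.0      # 210 A at 120 V = 25.2 kW of compute load
            total_kw = 40.0                      # round figure including cooling
            hours_per_year = 24 * 365
            print("%.1f kW of compute load" % it_load_kw)
            print("%.0f MWh/year" % (total_kw * hours_per_year / 1000))            # ~350 MWh/year
            print("$%.0f/year at $0.07/kWh" % (total_kw * hours_per_year * 0.07))  # ~$24,500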

      • 210A at 120Vac via the power law comes to 25.2kw/hr. Triple that to allow for cooling (It takes approx 2 watts of power to remove the heat generated by 1 watt of power usage) and you come to almost 76kw/hr. Take a look at your utility bill to come up with the hourly cost for electricity while this thing is on.

        On what planet? I cool my 60 watt or so Athlon XP 2000 using a 4 watt, 80mm fan. Add an 8 watt, 120mm fan on the intake that is WAY overkill, and a 4 watt PS exhaust fan, and I'm using 16 watts to
        • Which is fine for ONE PC. And, if you don't use air conditioning, then yes, a few watts in fans is all the cooling energy cost that you'll need. But try dissipating 25Kw. Your room and then house will heat up really quickly. Let's see, after the Big Blackout I think I saw an estimate that running a household oven is about 12Kw - so this cluster would be comparable to running two ovens 24/7. This heat has to go somewhere... and in most non-residential buildings, it'll be the air con. that has to get rid
    • Someone needs to work on their cable management skills too. One word: velcro
  • gigaflop

    As a measure of computer speed, a gigaflop is a billion floating-point operations per second (FLOPS).
    • If you're going to try to be informative, at least be accurate. There's no such thing as a "gigaflop". That would mean "Billions of Floating point Operations Per..." without the unit of time.

      It's a gigaflops (singular). The 's' is very important. It's how we know how long it takes to perform a billion floating point operations.

      It's like when people say "I had my engine up to 6000 rpms". What's an rpms? Is it a plural rpm? If so, what is pluralized? The acronym expansion yields "revolutions per

  • by CGP314 ( 672613 ) <CGP@ColinGregor y P a lmer.net> on Saturday August 23, 2003 @10:25AM (#6772665) Homepage
    Supercomputer Breaks the $100/GFLOPS Barrier

    Not after you factor in the SCO license fees.
  • by Anonymous Coward on Saturday August 23, 2003 @10:29AM (#6772681)
    Remember, everyone, this was a university project. *BSD was also a university project originally, and now *BSD is dying. So obviously university projects are not of very high quality.
  • by FreeLinux ( 555387 ) on Saturday August 23, 2003 @10:29AM (#6772682)
    Obviously, I don't get it. This doesn't look any different than redundant backbones or what is frequently done with VLANs. Multiple paths between hosts is what I see. How is this "new"?
    • by flymolo ( 28723 ) <flymolo@NOspAM.gmail.com> on Saturday August 23, 2003 @10:38AM (#6772715)
      Due to "creative" (computed) wiring, if all switchs are functioning, no node is more than one hop from each other node. This requires a routing table written for each pc. It could be used for redunancy, but it is being used to minimize latency, and collisions, which are both killers in clusters.
      • no node is more than one hop from each other node. This requires a routing table written for each PC.

        Admittedly, I understand that no node is more than one hop away. But, how is this different than all nodes plugged into a large switch like a Cisco 6500 or a Nortel Passport 8600? These switches can have ~128 ports and can switch 256Gbps aggregate throughput at wire speed. Add another switch and then add a second NIC to each host and you increase the capacity even further. Additionally, this does not requi
        • But, how is this different than all nodes plugged into a large switch like a Cisco 6500 or a Nortel Passport 8600?

          It's cheaper.

        • Here's a quote from the site:

          Does The World Need Yet Another Network Topology?

          One would think (well, we did ;-) that the latest round of Gb/s network hardware would have made the design of a high-bandwidth cluster network a trivial exercise. However, that isn't the case when the prices are considered:

          • When we invented FNNs in 2000, the cheapest of the Gb/s NICs available were PCI Ethernet cards priced under $300 each; now they are $50-$100. Prices have continued to drop. Prices on custom high
        • The technique that was used seems to be more of a mental exercise in making spaghetti, I don't see it reducing latency or increasing performance beyond the currently used techniques.

          It significantly reduces cost. In wire speed switches (FastE or GigE) there will typically be a sweet spot for price/performance. Beyond that point, switch prices jump into the stratosphere.

          For larger clusters, there simply aren't any switches big enough at any price (just try to get a 256 port GigE wire speed switch for e

        • (they have 64 machines, not 128, so I have done the numbers with this).

          You can increase performance. Rather than one Gb port into a very expensive 64-port switch, giving you a maximum of 128Gb of bandwidth (bidirectional 64x1Gb), you can (if you use the calculator) stick 4 Gb ports in each machine, buy 11 cheapo Dell 24-port gigabit switches (about $3k each), have one switch of latency, and have 4 times the total non-blocking bandwidth available. And the switches will still cost you less than one 64-port gig switch.
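          A quick check of that port math, taking the figures in this comment (64 machines, $3k switches) at face value:

              # Port count and rough cost for the 4-NIC-per-node layout described above.
              nodes, nics_per_node = 64, 4
              switches, ports_per_switch = 11, 24
              print(nodes * nics_per_node, "node ports needed")           # 256
              print(switches * ports_per_switch, "switch ports bought")   # 264, enough to wire it
              print("switch cost ~ $%d" % (switches * 3000))              # ~$33,000 at ~$3k each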
    • Traditionally, people have tried to keep their routing tables small. When you're routing in hardware, the larger your routing table, the slower (or more expensive) your routing hardware is. As a result, you want to have single routes which apply to entire groups of hosts (eg, "packets for nodes 0-127 go through port 0, packets for nodes 128-255 go through port 1").

      Because the routing is being done in software instead, the cost driver is dramatically reduced; consequently, it becomes cost-effective to hav
      • Because the routing is being done in software instead, the cost driver is dramatically reduced; consequently, it becomes cost-effective to have a routing table with an entry for each node.

        I was actually wondering how well Linux would handle this. The obvious algorithm to find the correct entry in the routing table is linear in the number of entries. That doesn't sound efficient to me, but it might be that 100 entries is still so small a number that it doesn't matter. However this particular cas
  • this is nice (Score:2, Interesting)

    but supercomputers, as in giant iron, are becoming more specialized and as such would whoop the pants off a Beowulf cluster when competing in their specialty.

    of course, if you just need a lot of general purpose super computing, it is obvious that you cannot compete with this.
    • Wrong (Score:3, Informative)

      by imsabbel ( 611519 )
      In reality, Beowulf clusters are good for only a subset of supercomputing tasks and the "real" supercomputers are still best at general purpose supercomputing.

      If you can parallelize your application well enough, Beowulf rules, but if you need a lot of node-to-node communication, the network cost quickly surpasses the CPU cost of the system
      • Re:Wrong (Score:4, Insightful)

        by sjames ( 1099 ) on Saturday August 23, 2003 @02:23PM (#6773733) Homepage Journal

        Really, it's a spectrum. On one end you have fully commodity Beowulf, in the middle you see things like Dolphin and Myrinet, and on the high end you see fully custom backplanes, and sometimes RAM and I/O controllers as well. Purpose-built CPUs are becoming less common now, but not unheard of.

        Each step up the spectrum widens the domain of problems that the machine can work on efficiently, and raises the price for the machine. In many cases, a 'real' supercomputer is more or less a cluster with a specialized network and OS and mounted in a single cabinet so it doesn't look like a cluster.

        In general when a lower end machine can efficiently run your program, there is no benefit to using a more expensive machine.

        As server hardware improves and 'exotic' hardware becomes more mainstream, the gap between the low and the high end narrows. There will probably always be a small but existent set of problems that call for the 'real' supercomputer, but that set is shrinking.

        There are other considerations as well. If the Beowulf in your lab can solve the problem in 1 week and is available now, while the 'real' supercomputer on the other campus can solve it in 4 hours and will have a timeslot available in 2 weeks, the Beowulf is 'faster' from your point of view.

  • by gorim ( 700913 ) on Saturday August 23, 2003 @10:31AM (#6772691)
    And it was introduced to consumers just a couple years
    ago. Sorry, the AMD beowulf cluster at $100/GFLOP just
    isn't that impressive.
    • by Sycraft-fu ( 314770 ) on Saturday August 23, 2003 @10:48AM (#6772764)
      I'm guessing the latter. You see all sorts of BSified numbers from marketing departments on processors, but they have little to do with reality. The number for this AMD cluster is a real, actual, measured-using-a-real-world-app number. To give you some idea of BS console numbers, the Xbox has a PIII 733 processor in it (ok, technically it's a little different, but it's a P3 core). Now the Gflop claim is 2.93. Out of a P3 733? Ya right, on paper perhaps but never in the real world, much less on a real app.

      Then, of course, there is the issue of specialised chips vs normal chips. A GeForce 4 4400 can claim, roughly, 80 Gflops peak. That sure beats the hell out of any single CPU I've ever heard of, including the Power4. Thing is, the GeForce 4 is a graphics DSP, it isn't a general purpose CPU. It can do that kind of math when all its units are working at what they do best, but try to reprogram it to do something else and it will slow to a crawl (for that matter I'm not even sure that it is Turing complete).

      So don't take any hype on a console to equate to real performance in a general task. Oh, and the BS marketing number I see for the PS2's Emotion Engine is 6.2Gflops.
      • Of course, assuming it's only half the parent comment's assertion, thus 2.25 GFlops, at $180 it's still cheaper than $100/GFlop. However, as others (should?) have pointed out by now, it's useless as a supercomputing node for all but the smallest tasks since it has no local storage and extremely limited main memory. You will have to spend another $200 for a linux kit to get storage and networking, bringing it up to $380 for the system. If it were actually 5.5 GFlops in the real world, then that would still b
    • 5.5 Gflops, I dunno if it can really do that, but ...uh..the point is that it's the first *supercomputer* to break the $100/GFLOP barrier. The Playstation2, last I checked, isn't a supercomputer, it's a videogame platform.

    • In cache maybe (Score:3, Informative)

      by msgmonkey ( 599753 )
      These numbers for microprocessors etc. mean nothing because they are usually referring to operations on data in cache... you'll find that real life performance is 10-20x slower because that's how much slower accessing main memory is.
    • Nice how you take the numbers from a marketing press release and treat them as if they are the absolute, indisputable truth. Can you show me the actual, reproducible benchmark that produced those numbers?

      Also, the PS2 is not a supercomputer. It has a slow processor and very little RAM, so it wouldn't be able to do much number-crunching. You can't hook PS2s together, anyway, so comparing a single specialized machine to a cluster is absolutely meaningless.
    • The key word here is supercomputer. A PS2 is not a supercomputer.
    • The previous price/performance champ was in fact a PS/2 cluster, mentioned here, but this AMD cluster is roughly three times the performance for the dollar. You can check the stats with different assumptions on their FAQ [aggregate.org] page, particularly the section labeled 'Is KASY0 really the first supercomputer under $100/GFLOPS?'

    • Gah feel free to mod the previous version of this comment into oblivion, I hit submit accidentally.

      The numbers you're looking at are marketing numbers first off, and overly generous. Second you don't scale for free - you never get anything like 100 times the performance of a single box when you wire 100 together, for the same reason that you don't get twice the horsepower out of an engine twice the size.

      The previous price/performance champ was in fact a PS/2 cluster, mentioned here [com.com], but this AMD cluster

  • though is how many mp3's are these students sharing on this monster ?


    • People don't share mp3s anymore; if they do, the FBI, NSA, Secret Service, CIA, and Homeland Security Dept. will swarm them and put them in the bay.

      I mean, I wish we could crack down like this on organized crime, or on domestic terrorists. I'm surprised we are so aggressive at arresting teenagers who download music, but the KKK and Neo-Nazis can collect a million guns and spread their crazy hate speech and it's protected by freedom of speech.

      I'd think that hate speech does more harm than copyright infringement
  • each node has two side case fans! that's gotta be the most dedicated case modding job i've ever seen! 132 pc's with 2 fans! too bad they didn't put fan guards ... or interior lights.. or blue led's... but i guess all that junk about a supercomputer makes up for it...
  • by krahd ( 106540 ) on Saturday August 23, 2003 @10:41AM (#6772730) Homepage Journal
    and it still can't run Doom III at a decent rate.

    --krahd

    mod me up, scottie!
  • Comment removed (Score:4, Interesting)

    by account_deleted ( 4530225 ) on Saturday August 23, 2003 @10:44AM (#6772750)
    Comment removed based on user account deletion
  • Cooling (Score:4, Informative)

    by bengoerz ( 581218 ) on Saturday August 23, 2003 @10:45AM (#6772756)
    I toured the previous cluster these guys did (KLAT2) and was very impressed. However, using AMD Athlon Thunderbirds last time, it did get quite hot. I remember standing by the cluster looking at all the wiring and being bombarded by an overhead cooling vent. I'm also assuming that these cooling issues are the reason that each case has two blow-holes. I'd also like to see these guys post in-depth specs of each machine. Being a hardware nut, I'd like to see how they got so many machines so cheap, and maybe even what vendor they used. As I remember, they worked REALLY hard on their last cluster to keep costs to an absolute minimum.
  • by borgdows ( 599861 ) on Saturday August 23, 2003 @10:50AM (#6772776)
    Dear customer,

    At the cheap introductory price of 699$ for 80 lines of code in the Linux kernel, it will cost you 8,377,500$ by kernel since we have discovered that in fact 1000000 lines of SCO IP were copied into Linux.

    Designation .. Price .. Qty .. Total
    Linux kernel .. 8,377,500$ .. 128 .. 1,118,400,000$

    So you must pay us only 1,118,400,000$, and in my almighty kindness I will offer you a discount of 118,400,000$ so you only have to pay ONE BILLION DOLLARS if you pay before tomorrow!

    Please send your credit card number to darl@sco.com

    Sincerely yours,

    -- Darl McBride
  • Nice wiring! (Score:2, Insightful)

    by nate.sammons ( 22484 )
    Looks like most of the wiring jobs I've seen done by students: kasy0core.jpg [aggregate.org].

    God forbid they use cable gutters ;-)

    Other than that, kick ass job guys!

    -nate
  • Hey! I used to work there.

    Way to go Dr. Dietz!

    So, mod me anyway you want, karma to burn.
  • by SilverSun ( 114725 ) on Saturday August 23, 2003 @11:04AM (#6772823) Homepage
    I wonder which universities/institutes have larger and maybe cheaper clusters, but just don't bother with running benchmarks. I for one am sitting next to a tiny cluster with 40 dual-CPU nodes, which is connected (GRID-like) to a 340 dual-node cluster in a nearby town. None of us high energy physicists bothers with running any benchmarks on our clusters, other than our own applications. I wonder how many "linux-cluster-supercomputers" are out there which would easily make it into the top 500, but no one has ever heard of....

    Cheers.
  • by SuperBanana ( 662181 ) on Saturday August 23, 2003 @11:07AM (#6772834)
    Because this was a university project, KASY0 was assembled entirely by university students, which, while being a source of cheap labor, is also a good way to get a lot of students involved in a great project.

    At the risk of being flamebait- No. Using university students is almost always purely a way of getting cheap labor to do semi-mindless, or completely mindless, stuff the staff doesn't want to do- it's a common myth that students 'learn' by doing grunt work. I should know- I have several grad student friends, and they've thus far spent a large part of their academic careers working in labs doing mind-numbingly boring stuff (according to them.)

    Imagine if a Bio lab did this. The following would sound pretty absurd: "Help us move our lab, you'll learn about cellular recombination!". No. You'll learn what a bunch of lab equipment looks like, how eccentric the professors are, and how expensive/fragile/heavy the equipment is, and the next morning what sore muscles are like. Let's get a reality check here.

    (from the site):Our group develops the systems technology for cluster supercomputing; the more people we can show how to apply these technologies, the better.

    Huh? What cluster supercomputing "technology" does assembling a PC and plugging it into ethernet teach you? Did they give a presentation about how clustering technology works, for example? Did they explain to each person, as they put a machine in a particular place and wired it to a particular switch, WHY it was going there etc? Obviously I wasn't there, so perhaps someone from the group can contribute on this point.

    • by panda ( 10044 ) on Saturday August 23, 2003 @11:46AM (#6772963) Homepage Journal
      Having worked there, and knowing what Hank Dietz and his students are doing, I can tell you that it is different from just slapping PCs together, stringing wire between them and installing clustering software.

      Dietz specializes in networking and all the wiring that you see in the photos is charted out by custom software that he's written just for this purpose.

      He works in the realm of optimizing communications among the nodes to avoid network latency and so on. If you read the POVRay benchmarks, you'll notice that the author comments that several clusters' CPUs spend most of their time idle due to network latency. Dietz is researching the best ways to eliminate much of that latency so that the CPUs in the cluster can spend more of their time crunching data rather than just throwing off heat. To my knowledge, he is succeeding at this and better than most other researchers in the field.

      As for what his students learned from this, I don't know exactly which students helped him on this. For KLAT2, there were several undergrad volunteers who helped with wiring and assembly, mostly from the campus Linux Users' Group. I know his grad students and research assistants are learning a lot about how clustering and network tech works, and a couple are doing their Ph.D. disserts in this very subfield of E.E.
      • It seems to me that the next step is to get some big switches with VLAN support and reconfigure them dynamically as the workload changes in order to maximally utilize all nodes. I wrote some pathetic little software once to log into some switches and make vlan changes from a web interface (no security or anything, what a bad idea eh? worked though. this was before cisco included the "why bother" ssh1 in ios) so at least THAT part is trivial :D
      • I worked there as well, back during the KLAT-2 days. Sure, I don't remember getting any actual mention on the project (even if I did help coordinate the student help, help build it, assemble it, etc.), but that's ok. It was a great project to work on (I still even have the GaLUGtica videos I made in 3dsmax) and I did learn quite a few things. Deitz's harpy of darkness, aka Tim Mattox, knows his stuff, and he, along with Dieter, was very helpful to the students, answering whatever questions they had.

        Petty 1 of
    • At the risk of being flamebait- No.
      He who moderates you has been infected with the reverse psychology bug! See sig.
    • On the other hand, to build a supercomputer for less than $89 per GFLOPS you still have to actually "build it". I mean, who else is going to put it all together? Someone has to build it if you want CS students to use it.

      And yes microbiology students will still have to build their own apparatus for experiments they conduct - I only know this because I took a class in microbiology a while back and I had to build the apparatus for all the required experiments I had to do.

      I'm guessing in this case they not only
  • In other news... (Score:2, Insightful)

    by rmdyer ( 267137 )
    Now that the university students have graduated and moved on, there isn't any documentation, nor do they know how to use the darn thing...

    -1
  • why not DSP? (Score:5, Interesting)

    by mike_g ( 24445 ) on Saturday August 23, 2003 @11:11AM (#6772842) Homepage
    Why aren't DSPs used in configurations such as this? The TI 67xx series are able to perform about 1 GFLOP/s running at only 150 MHz and cost only about $40 per chip.

    This price/performance ratio seems to make them very attractive compared to general purpose CPUs. According to the NASA G5 Study [cox.net], the P4 2.66 GHz is only able to achieve 255 MFLOP/s. And the P4 costs about 4x the price of the 6711 DSP.

    It seems that DSPs should be the clear winner in supercomputer applications, what are their disadvantages and why are they not used? Granted there is a lack of mass produced hardware such as motherboards for DSPs, but that alone should not exclude them from the supercomputer realm.
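    Taking the quoted figures at face value, a rough price/performance comparison; the P4 price below is just the poster's "about 4x the price" claim, not a verified number:

        # Dollars per GFLOPS from the figures quoted above.
        dsp_cost, dsp_gflops = 40.0, 1.0        # TI C67xx-class DSP, per the comment
        p4_cost, p4_gflops = 4 * 40.0, 0.255    # "about 4x the price", 255 MFLOP/s per the cited study
        print("DSP: $%.0f/GFLOPS" % (dsp_cost / dsp_gflops))   # $40/GFLOPS
        print("P4:  $%.0f/GFLOPS" % (p4_cost / p4_gflops))     # ~$627/GFLOPS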

    • I guess it's the lack of operating system support on the DSPs themselves. Plus their instruction sets and I/O don't lend themselves well to general purpose computing. The cost of developing a node consisting of a DSP plus a general purpose processor, plus the efficient I/O to the DSP, might be too high for the relatively restricted usage on supercomputers.

      That said, my Palm Tungsten is a combo of a GP processor and a DSP, as I believe are several Sony variants. Perhaps as I/O on handhelds improves (?) the

    • There's a lot of OS essentials that could be moved easily into hardware. By using programmable gate arrays, or by just etching the kernel directly onto silicon, they should be able to reduce the energy requirements and thereby the actual cost.


      Further, it would also accelerate the product enormously - Linux on a Chip would be blazingly fast, as it wouldn't take any processing power away from what it was running - thereby also reducing the cost per GFLOP.

    • Re:why not DSP? (Score:4, Informative)

      by SmackCrackandPot ( 641205 ) on Saturday August 23, 2003 @12:36PM (#6773200)
      Actually, they do, but they are referred to as vector processors rather than DSPs. Probably the most famous, and the first, was the Cray supercomputer [cray-cyber.org]. And there was also the INMOS "Transputer" [ox.ac.uk]

      DSPs are optimised to handle streamed data of a particular maximum size (e.g. 4-element floating-point variables). Useful for image processing (red, green, blue, alpha) and 3D graphics (XYZW), but if you're modelling something like ocean currents or global weather, where every data element is more than likely going to have more than four variables (e.g. temperature, humidity, velocity, pressure, salinity, ground temperature), you may not get full optimisation.

      Plus, you also need a means of getting all these processors to talk to each other. DSPs are nearly always optimised to operate in single pipelines, so don't need much communication support (e.g. Sony Playstation 2). However, if you're designing a supercomputer system, the major bottleneck is the communication between processors (network topology). Some applications might only need adjacent processors to talk to each other (global weather simulation usually represents the atmosphere as a single large block of air, with sub-blocks assigned to separate processors). Other applications might assign individual processors to different tasks, which complete at different rates (e.g. the Mandelbrot set). A configurable network architecture allows the system to be used for many more different applications.
    • Why aren't DSPs used in configurations such as this?
      1. Non-commodity hardware has high one-time expenses for design.

      2. DSPs tend to not have a lot of RAM, whilst big modelling apps crave RAM (esp. raytracing).

  • by prof_bart ( 637876 ) on Saturday August 23, 2003 @11:33AM (#6772910)
    Hmmm...

    Nice machine, but this January, CITA and the astro department at the University of Toronto brought a 256-node dual Xeon system on line: "1.2 trillion floating point mathematical operations per second (Tflops) on the standard LINPACK linear algebra benchmark." Total cost: CDN$900K (including tax) (at January prices, that's $600K U.S., or about $500 USD/GFLOPS.) It's being used for some very cool astro simulations...

    See http://www.cita.utoronto.ca/webpages/mckenzie

  • Am I missing something? They say:

    KASY0 nodes are completely diskless; there isn't even a floppy. (from the FAQ [aggregate.org])

    So how are the nodes booted? Are there BIOSes out there that can netboot?

    -c

    • Many ethernet cards have a socket for a programmable chip that allows netbooting. Pretty much all you need is the address of the server from where to retrieve the rest of the software. Usually the kernel is loaded via tftp then the rest of the os is NFS mounted. I don't know if this is how the article is doing it, but the netboot stuff is pretty common and easy to configure.
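      A minimal sketch of what serving such diskless nodes could look like with ISC dhcpd plus tftp and an NFS root, as described above; the MAC addresses, server IP, and paths are placeholders, not KASY0's actual configuration:

          # Emit ISC dhcpd host stanzas so each diskless node PXE-boots a kernel over tftp
          # and mounts its root filesystem over NFS.  All values below are made up.
          nodes = {"node%02d" % i: "00:30:48:00:00:%02x" % i for i in range(4)}
          for name, mac in sorted(nodes.items()):
              print('host %s {' % name)
              print('  hardware ethernet %s;' % mac)
              print('  next-server 10.0.0.1;                  # tftp server')
              print('  filename "pxelinux.0";                 # network boot loader')
              print('  option root-path "10.0.0.1:/export/roots/%s";   # NFS root' % name)
              print('}')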
    • Yep, it's called IBA or Intel Boot Agent; it allows booting of a diskless system through PXE. It's actually where Palladium came from originally. In order to pull a boot image over the network and be sure it was not tampered with on the wire through a man-in-the-middle attack, you need hardware crypto with a signed boot image. Basically every PC made in the last 4-5 years or so supports it (there are exceptions, but they are usually consumer-oriented-only PCs; corporate PCs almost all support it). It's
  • by Tiosman ( 614633 ) on Saturday August 23, 2003 @12:14PM (#6773086)
    It's not the first time that these folks in KY have worked around the definition of the acronym "FLOP". A FLOP is a floating point operation on 64 bits, not 32 bits. All entries in the Top500 used results with 64-bit HPL; nobody else in the world is running HPL on 32 bits. So claiming the moon on 32 bits is easy, useless for the sake of comparison, and almost unethical. I cannot believe that Dr. Dietz does not know the difference by now.

    The same machine would yield average results on 64 bits. Difficult to draw attention without headline numbers...

  • ... they're going to have the largest Quake LAN party ever!
  • overclocking (Score:2, Insightful)

    by snooo53 ( 663796 )
    Looking at the specs, I'm curious if anyone thought of overclocking the machines to get an even bigger performance increase. It seems that with most Athlons you can get at least a good 100 MHz of extra speed, even with a stock cooler, by increasing the FSB/multiplier and not even touching the voltage. Even a modest increase like that would yield an extra 12.8 GHz of aggregate clock across the cluster, dropping that price figure even further. Depending on what type of computing they're doing, increasing the FSB might have an even bigg
    • That would be stupid. The entire point is, the nodes are so freaking cheap, that if you really want an extra 5% performance you just buy a few more nodes. Gee, what do I choose, buy a few more nodes, or spend two weeks overclocking all these finicky chips and trying to get them to run correctly?

      Besides, nobody in their right mind would run a parallel program of any importance on a "rigged" setup like that.

      • Not to mention if you can only overclock, say, 50% of them, would you run into problems with nodes running at different speeds?
        • It depends what sort of cluster it is. If you have a standard network of workstations, and you're running something like PVM or MPI, then each node can run at a different speed. In fact, they don't even have to be the same kind of nodes (you can have different platforms, say Solaris and Linux, both running in the same virtual parallel machine). Usually you will have to adjust your algorithms to account for nodes running at different speeds, but it doesn't make it impossible (see the sketch below).

          MOSIX is a parallel cluster oper
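          A minimal self-scheduling sketch using mpi4py (an assumption; the poster only mentions PVM/MPI generally): faster nodes simply ask for work more often, so mixed-speed nodes stay busy instead of waiting on the slowest machine. Run with e.g. "mpiexec -n 4 python selfsched.py".

              # Self-scheduling master/worker pattern for heterogeneous node speeds.
              from mpi4py import MPI

              comm = MPI.COMM_WORLD
              rank = comm.Get_rank()
              TASKS = list(range(100))       # stand-in work units
              STOP = None                    # sentinel telling a worker to quit

              if rank == 0:                  # master: hand out tasks on demand
                  status = MPI.Status()
                  handed_out, stopped = 0, 0
                  while stopped < comm.Get_size() - 1:
                      comm.recv(source=MPI.ANY_SOURCE, tag=1, status=status)   # "give me work"
                      worker = status.Get_source()
                      if handed_out < len(TASKS):
                          comm.send(TASKS[handed_out], dest=worker, tag=2)
                          handed_out += 1
                      else:
                          comm.send(STOP, dest=worker, tag=2)
                          stopped += 1
              else:                          # worker: loop at whatever speed this node manages
                  while True:
                      comm.send(None, dest=0, tag=1)
                      task = comm.recv(source=0, tag=2)
                      if task is STOP:
                          break
                      _ = task * task        # stand-in for the real computation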

  • But the "FA" says $1000 per gflop not $100

    Did you RTFA?
  • by Axe ( 11122 ) on Saturday August 23, 2003 @02:20PM (#6773722)
    That they owe to SCO, those damn commies? Did they at least acknowledge using stolen property?

    What a shame. Freeloaders. They would never be able to achieve such performance if not for the fruits of labour of SCO... eeeh... lawyers?

  • Er...you can do that with parts from ebay or craigslist without too much trouble.
  • You have to include people time, building overhead, etc. A research grant may be billing $500 - $1000 a day for this. If this takes 50 man-days to set up, then the cost is another $50,000.
