Silicon Graphics

SGI Demos 64-Proc Linux Box

foobar104 writes "Details are scarce, but SGI announced this morning that their prototype Itanium 2 system has demonstrated more than 120 GB/s to and from main memory on the STREAM TRIAD benchmark, which is the fourth best result in the world. For comparison, the Cray C90 sustains 105 GB/s, while an even larger Sun Fire 15K clocks a measly 55 GB/s. The interesting part? The system wasn't running IRIX, SGI's proprietary version of UNIX. It was running Linux. More information on STREAM TRIAD, including results from other systems, is available here. The system, incidentally, was an Origin 3800 straight out of manufacturing equipped with Itanium 2 processor modules. SGI will start selling the systems early next year."
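For context, the TRIAD kernel that gives the benchmark its name is tiny; it is essentially the loop below, shown here as a minimal C sketch (the array length is a placeholder, and this is not SGI's code or the official STREAM source). The arrays are sized well past any cache, so the number reported is almost purely sustained memory bandwidth.

    #include <stdio.h>
    #include <stdlib.h>

    #define N 20000000UL   /* placeholder size; real runs pick N far larger than the caches */

    int main(void)
    {
        double *a = malloc(N * sizeof *a);
        double *b = malloc(N * sizeof *b);
        double *c = malloc(N * sizeof *c);
        double scalar = 3.0;
        if (!a || !b || !c) return 1;

        for (unsigned long j = 0; j < N; j++) { b[j] = 1.0; c[j] = 2.0; }

        /* The TRIAD kernel: one multiply-add per element, with
           24 bytes of memory traffic (two reads, one write). */
        for (unsigned long j = 0; j < N; j++)
            a[j] = b[j] + scalar * c[j];

        printf("a[0] = %f\n", a[0]);
        free(a); free(b); free(c);
        return 0;
    }

The reported GB/s figure is just those 24 bytes per iteration times the sustained iteration rate, which is why the benchmark is so sensitive to the memory system and so insensitive to everything else.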
  • by Anonymous Coward on Monday September 09, 2002 @12:57PM (#4221514)
    To me, it would seem that being able to push info that fast to and from memory is useful for very few problems these days. I was under the impression that the majority of "super-computing" problems were of the sort that required lots of calculations, not lots of parsing of information in storage.

    Am I wrong about what this benchmark means? Or am I missing something basic?
    • Usually these number crunching exercises have large datasets. Too big to fit in the CPU registers, or cache, so you need quick access to RAM too.
    • Perhaps the super-computing problems are approached in the way they are, because of the limitations on bandwidth to the CPU(s).

      Most of the super-computing problems are simulations, and I would have thought that being able to simulate more of the environment (therefore, more data to crunch) would be an advantage.

      Simon
    • This is very dependent on the type of application that is being done. A big use of supercomputing power these days is genome research, and from what I've seen in this field, these applications are very data intensive, moving around massive amounts of data related to the sequences being processed. I'm also rather sure that applications like nuclear explosion and earth motion modeling require manipulating very large amounts of data that have a need for lots of memory bandwidth.
    • Thin client!!!!

      That's about the only use I can see for it. I could easily replace every workstation and server in our building with one of these.

      I guess colo could be another use, but I'd have to question what you're hosting that needs 64 Itanium processors. More importantly, how well does it handle VM?

      • That's about the only use I can see for it. I could easily replace every workstation and server in our building with one of these.

        Wow, that's going to be expensive, and how will they fit those in their cubicles?

        Imagine an openMosix cluster of these though. :)
      • Yep :-)

        Throw out that old Z/390 or AS/400 and replace it with centrally managed GUI terminals. Saves the company $$$. My understanding is the only thing stopping Intel from taking over the mainframe space was a stable OS and memory bandwidth; looks like that's solved.

        But given SGI's history this is probably destined for running simulations and large factoring jobs.
    • by Falrick ( 528 )
      It's good for, as another poster put it, simulations. Specifically simulations with lots of tightly coupled entities. If you are simulating, say, 100 different entities, and the action of each of those entities has an effect on all of the other 99 entities, you gain greatly from a massively parallel shared memory environment. Sending state changes through a cluster can kill these kinds of applications.
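      A rough sketch of that all-to-all pattern, assuming a toy entity type (the names are made up, not from any real simulation code): every entity reads the state of every other entity on each step, which is just loads from shared memory on a machine like this, but repeated network traffic on a cluster.

        #include <stdio.h>

        #define NENT 100

        struct entity { double x, fx; };   /* toy state: position and force */

        /* One step: each entity's update reads every other entity's state,
           so the whole state set is touched NENT times per step. */
        static void step(struct entity e[NENT])
        {
            for (int i = 0; i < NENT; i++)
                for (int j = 0; j < NENT; j++)
                    if (i != j)
                        e[i].fx += e[j].x - e[i].x;   /* toy interaction */
        }

        int main(void)
        {
            struct entity e[NENT] = {0};
            for (int i = 0; i < NENT; i++) e[i].x = i;
            step(e);
            printf("force on entity 0: %f\n", e[0].fx);
            return 0;
        }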
    • by Jhan ( 542783 ) on Monday September 09, 2002 @01:17PM (#4221661) Homepage

      Typical super-computing problems are weather prediction, air flow computations and nuclear reaction modelling. Physical models in other words.

      Generally, you attack these kinds of problem by partitioning 3-d space into many small cells, and then running relatively simple calculations on every cell. The better the resolution, the better the model.

      The thing about three dimensions is, storage space increases with resolution^3... For instance, I believe the weather guys are currently pushing 1 km x 1 km x 100 m resolutions. That means about 3.2e11 cells. If each cell has 1 kB of state, the total memory usage would be about 320 TB.

      Super computing problems eat memory like Takeru Kobayashi [cnn.com] eats hot dogs. In many (most?) cases the calculations are simple. Hence, bandwidth is King.
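      The per-cell work in these models is often just a neighbour-averaging stencil; a minimal C sketch (made-up grid dimensions, not taken from any real weather code) looks like this:

        #include <stddef.h>

        #define NX 64
        #define NY 64
        #define NZ 64
        #define IDX(i, j, k) ((size_t)(i) * NY * NZ + (size_t)(j) * NZ + (k))

        /* One sweep of a 7-point averaging stencil over a 3-D grid:
           each interior cell becomes the mean of itself and its six
           face neighbours. */
        void sweep(const double *in, double *out)
        {
            for (int i = 1; i < NX - 1; i++)
                for (int j = 1; j < NY - 1; j++)
                    for (int k = 1; k < NZ - 1; k++)
                        out[IDX(i, j, k)] =
                            (in[IDX(i - 1, j, k)] + in[IDX(i + 1, j, k)] +
                             in[IDX(i, j - 1, k)] + in[IDX(i, j + 1, k)] +
                             in[IDX(i, j, k - 1)] + in[IDX(i, j, k + 1)] +
                             in[IDX(i, j, k)]) / 7.0;
        }

      Each cell update is a handful of flops against seven loads and one store, which is exactly the shape of problem where memory bandwidth, not CPU speed, sets the pace.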

      • by Anonymous Coward
        1 km x 1 km x 100 m for Numerical Weather Prediction is a bit much for today's (affordable) supers.

        We use a 22 km x 22 km horizontal grid for predicting the weather 48 hours ahead over the North Atlantic + Europe (406 x 324 cells).

        We use 31 layers in the vertical (from ~30 meters thick in the lowest level to ~2 km for the few in the stratosphere).

        This is for a so-called "limited area" model. A global model such as the model of the European Centre uses about half the resolution (40 km) over the entire globe.

        Toon Moene.
    • by foobar104 ( 206452 ) on Monday September 09, 2002 @01:19PM (#4221675) Journal
      Am I wrong about what this benchmark means? Or am I missing something basic?

      With no disrespect intended, I think you might be missing something basic.

      Any activity that involves moving data into and out of RAM will benefit from the ability to do it faster. That includes such disparate things as database processing (if you're lucky, you can cache your indices in RAM), media encoding, hell, even compiling. Memory bandwidth is one of the few aspects of computer design that touches just about every application, with the exception of those that are small enough-- or sufficiently well optimized-- to fit into cache.
    • by ericman31 ( 596268 ) on Monday September 09, 2002 @01:31PM (#4221771) Journal

      One of the areas this is meaningful is data warehousing. There are three major competitors in the very large data warehousing environment and one wannabe competitor:

      • NCR Teradata and Worldmark MPP servers
      • IBM DB2 and IBM pSeries clusters (MPP again)
      • Sun SunFire 15K and Sybase IQ Multiplex (SMP)
      • Oracle is trying to compete in this space and not really succeeding. Their model is sort of MPP, based on Oracle Real Application Clusters
      MPP, or massively parallel processing, is the typical solution for very large (generally anything over 3 or 4 terabytes) data warehouses. Sun and Sybase are trying hard to crack the market with their SMP (symmetric multi-processing) solution, which is actually very promising. The major benefit of SMP processing is simplicity: one server to maintain, one OS, no cluster, no cluster interconnect. With Linux potentially pushing into the large SMP space we will have the potential for competition to the MPP data warehouse solutions, which are incredibly expensive to purchase and maintain.

      One of the biggest drawbacks to Linux adoption in the commercial Enterprise space is its lack of SMP scalability. If the SGI platform works out we will start seeing Linux scaling into an arena that will allow for acceptance in the Enterprise.

    • There are lots of applications that will benefit from this. But what I would like to see are faster disk storage systems, not faster memory... But then my main work over the last years has been huge mail systems (entirely disk IO bound) and extremely fault tolerant database distribution (.name TLD resolution system, also almost entirely disk IO bound).

      I'd be very happy to find a storage solution that gave us transfer rates that would get us anywhere near utilizing the full CPU capacity even with entry level servers these days for non-computing intensive processes such as mail delivery, serving DNS queries or fault tolerant message queueing... (and preferably one that doesn't cost ten times more than any potential savings from reducing the number of servers...)

    • Primarily this is good for marketing, company image, press releases, and selling potential customers on smaller systems.

      Chances are good that they will build very few full scale machines. Those that are built go towards data-warehousing, research (atmospheric, oceanic and space science, nuclear modeling, etc) and to the government. Factoring large numbers is a use, for instance, as it's a problem that can be performed in parallel.

      But they will have the ability to say that x, y, and z companies/gov't agencies have our equipment, it can't be exported (so it must be good), and our lower end machines will suit your job until you need an upgrade - in other words we can be with you for the whole ride and promise application compatibility.

      -Adam
    • Actually, it's precisely because of the lack of superfast mem-IO machines that many people tried to work around the problem and create algorithms that are CPU-bound.
      In fact, most of the computationally-intensive problems require LOTS of mem-IO.

      And there's one more thing: there's a huge difference between the 64-CPU SGI machine, and a Mosix cluster of 64 1-CPU nodes: the SGI has one single memory space contiguous on the same machine. That means you can actually use a very large matrix to process your data, instead of shoving bits of it over the network back and forth.
      There are entire classes of problems that will be solved orders of magnitude faster on the SGI server than on a network-distributed Mosix cluster (or any other kind of cluster, Beowulf, etc.). That's the advantage of true SMP systems (all CPUs on the same hardware) as opposed to networked clusters.
  • by Lxy ( 80823 )
    Like any good press writeup, it lacks any details that are useful to techies. I want to see a dmesg from this thing, as well as pretty pictures of what's under the hood.
  • by Neon Spiral Injector ( 21234 ) on Monday September 09, 2002 @01:01PM (#4221549)
    That was my first thought. So it beats a C90, but what is faster?

    Found the answer here [virginia.edu].

    And if you were wondering about a Beowulf cluster of these, the top ten ranking excludes "cluster results".
    • Interesting... Looks like a T932 gets about 3x its performance, and the NECs (understandably, since they are the most modern) get like 5x. Still pretty impressive for an MPP machine, I would think. Were you able to find stats on MPP systems (such as the T3E or SP) anywhere?
    • These results are quite old. The SGI MIPS based machines [sgi.com] seem to be much faster.
      512 processor Origin 3000 quoted as 716 GB/sec.
      I have no idea why they are using Itanics for this but it's not because they are better processors.
      • It is strange about the Itaniums. Originally SGI was looking towards Windows NT to replace IRIX, and thus they started working with the Intel platform. But now they have turned that interest to Linux, which runs just fine on the MIPS.
        • Windows? No way! (Score:2, Interesting)

          by halfelven ( 207781 )
          SGI never thought to replace Irix with Windows! That's ridiculous.
          Irix can scale up to 1024 CPUs and beyond. Solaris can scale up to 100. Here's Linux, now it's scaling close to 100. How much do you think Windows can scale? 10 CPUs? 20? :-)
          SGI's thing was always that it had machines running one single copy of the OS across hundreds (or thousands) of CPUs on the same machine (not in a cluster). You simply cannot do that with Windows, period.
          They had some graphics workstations running Windows, but that was on the lowest end of things, and now those systems are not available anymore.
          • Comment removed based on user account deletion
            • Back in the early days of Windows NT, it was not known what its capabilities would be. SGI nearly bought the farm by betting on NT to replace IRIX.

              Then SGI realized NT wasn't going to be for big machines, and let that bad dream fade away.


              Dude, that's simply not true. SGI never even built a prototype Windows system that ran any version of NT before 4.0. By the time their Windows NT workstations made it out the door, Windows 2000 was very nearly a reality. So close that rather than adding support for certain features, SGI just punted. For example, dual monitor support was never offered on the NT systems until Windows 2000 came out, because NT 4.0 didn't support running two graphics pipes with different drivers.

              SGI never, ever, spent any time or effort on Windows NT for anything other than workstations.

              But likewise, finding x86 hardware with more processors is probably the largest reason that x86 Linux, Windows, whatever isn't running on bigger machines.

              So, let me get this straight. They don't make big x86 boxes, and that's "probably the largest reason" why there are no OSs for big x86 boxes? Brilliant!
              • Comment removed based on user account deletion
                • During development of Windows NT 3.1, the first version (god bless starting counting at 3) MS made a strong pitch to SGI to get behind it and work with them to replace IRIX. SGI turned it down, and later signed up for limited workstation production.

                  If true, this is the first I've heard of this. Can you back this up with some sort of evidence?
              • So it was more like "we got a moron for a CEO who was totally in love with Bill Gates". He got SGI to commit to running Windows, even on big systems (in fact, the internal hardware manuals for a large system I worked with there actually mentioned booting "an OS such as Irix or Windows", even though that plan had long since been dropped). This caused SGI to lose a large number of top-notch developers who didn't really want anything to do with Windows. The same moron CEO, "Rocket Rick" Belluzo (who recently got the axe at Microsoft), and his yes-men made a number of other dumb decisions that nearly killed the company. Finally, he quit (before he could be tarred, feathered, and run out of town on a rail) in August of '99 and was replaced by competent management (the present CEO, Bob Bishop, who brought in a bunch of other people who are a lot better than Rocket Rick's crowd).


                In any case, the large systems capable of running Windows didn't appear until after he left, and by the time they did, the decision had been made to unload Windows and go with a useful OS (Linux). This is what triggered SGI to get involved heavily in Linux development. Irix was another OS that was considered for the Itanium platform, but there were a variety of reasons why that wasn't picked.


                So, that's the short story on why SGI is presently making this system with Linux and why some people have mentioned Windows on large SGIs.

      • 512 processor Origin 3000 quoted as 716 GB/sec.

        That's a peak speed, not a STREAM speed. Some of these machines (like the NEC SX-6) have peak speeds that are *much* higher. STREAM is an attempt at showing how a system performs on a somewhat more realistic workload.

      • If they use someone else's parts for a portion of the solution, that's one less chunk of the R&D that they have to bankroll. HP is dropping its own CPU line over such concerns. Besides, on highend RISC based machines it is the memory busses that are most impressive (not the CPUs). A Sun or SGI bus is what Intel CPUs need to look really respectable.
  • impressive w/Linux (Score:5, Interesting)

    by d3xt3r ( 527989 ) on Monday September 09, 2002 @01:04PM (#4221571)
    What is most impressive about this to me is that they did it using Linux over IRIX. Why? Because this has proven to be Linux's weakest point: scalability. Most of the changes in 2.5 are concentrating on scalability, so could this be reaping those benefits?

    Linux running at 120 GB/s with 64 processors is impressive for an OS that has been criticized as inefficient when running on more than 8 processors.

    I would be very interested to know what version of the kernel they are using.

    • by tempest303 ( 259600 ) <jensknutson@@@yahoo...com> on Monday September 09, 2002 @01:15PM (#4221640) Homepage
      I'm wondering the same thing - I wouldn't be surprised if this was a very customised 2.4/2.5 hybrid or some such.

      What I'm more curious about is what the licensing of all this will be like... are they just doing standard kernel patching, in which case the changes might get rolled back into the vanilla kernel? I'm a little worried that they might be doing it all via binary-only modules, which means that Linux proper gets none of the changes rolled back in... :-( I'd be somewhat surprised if SGI did this, though - they seem to have been pretty damn OSS friendly. (XFS!)
      • by Angry White Guy ( 521337 ) <CaptainBurly[AT]goodbadmovies.com> on Monday September 09, 2002 @01:28PM (#4221742)
        I think that the big question is will this get Big Iron back into the rendering farms, and what will be the effect?
        With the major animation companies going to Linux server farms to save cost and get better performance, moving back away from x86 architecture to these large machines may be beneficial cost/productivity-wise.
        • Nope. Rendering motion picture frames is "embarrassingly parallel" as my boss likes to say. For a feature length movie, you have circa 120000 frames that each can be rendered without any communication through memory to other frames being rendered.

          You would be foolish to pay for interprocessor memory bandwidth when clusters are just as fast for that task.

          • Heck, not only do the frames not need to communicate with any other frame to be rendered, the same is also true for most of the pixels. (Absolutely so for classic ray tracing, less so for other rendering techniques.)

            On the other hand, however, many of the modelling techniques used to generate/animate the scenes to be rendered are memory bandwidth intensive as they basically amount to physics simulations in themselves. (Think particle systems, water effects (fluid dynamics), motion of things like hair and fabric, etc.)
      • by CMonk ( 20789 ) on Monday September 09, 2002 @01:39PM (#4221833)
        Given that they list "scalability" as one of the open source projects that they contribute to I would say they are playing nice with the community. (http://oss.sgi.com/projects/).

        They are working hard to get a number of their changes into the official kernel; I imagine this is one of them.
    • I would be very interested to know what version of the kernel they are using.

      I tried really hard to find that info this morning before submitting, but to no avail. But the test was demonstrated at the Intel Developer's Conference, according to the press release, so maybe we could find somebody who knows somebody?
    • Like everyone else here, I don't know either. But I'd say it's quite a different kernel from the stock 2.4/2.5 kernel. I'd guess something like

      1) A K42 [ibm.com] -like exokernel with some parts of the linux kernel bolted on.

      2) Something like Larry McVoy's idea of OsLets [bitmover.com], i.e. many kernels running on the system collaborating to provide a single system image to the user.

      3) The traditional way, i.e. implementing super-fine-grained locking in the linux kernel. This would of course make linux hard to maintain and slow on "normal" hardware, just like, say, Solaris.
    • What is most impressive about this to me is that they did it using Linux over IRIX. Why? Because this has proven to be Linux's weakest point: scalability.
      Maybe that was true three years ago [sgi.com] when SGI announced its Itanium/Linux strategy. But I imagine they've put a little effort into it since then.

      This new system is news, but it's hardly groundbreaking news. Back in '99, SGI spun off MIPS [mips.com] and announced they would do commodity systems -- including supercomputers with commodity processors. At that point they had a choice: port IRIX to the Itanium, or teach Linux to scale so they could use it on their supercomputers. It's been no secret that they chose the latter. Or why: it was less expensive, and catered to an established user community.

      Note that Itanium/Linux systems are not meant to replace MIPS/Irix systems. Unless they've changed their strategy since I worked there, SGI plans to keep developing Irix systems for another 10 years, at least. Of course, that depends on maintaining loyalty to Irix solutions, and the buzz is that they're having trouble with that.

  • by Durinia ( 72612 ) on Monday September 09, 2002 @01:05PM (#4221575)
    ...interesting that SGI chose the Cray C90 - a system released in *1991* - to compare against. It's nice to know that it's only taken them 10+ years to catch up. :)

    They also mention the SV1, which is a "low-end" Cray. I'm curious how the new X1 (nee SV2) does on the STREAM suite.

    It's good to see that their "scalable linux" work seems to be doing pretty well! I'm sure it was much easier for them to use the IA-64 port of Linux than to port IRIX...

    • SGI didn't choose to compare this to a C90; the Slashdot submitter did. SGI primarily compared it to the "IBM® eServer p690 and Sun Microsystems Sun Fire"

      The part that I really find interesting is that the top three in the list all outperform this by a factor of two or more, the #1 spot being held by a machine that can do over 500GB/sec.

      It's still over 12x faster than the quad Itaniums I used to work with, and probably much cheaper than the NEC machines and the Cray...
    • ...interesting that SGI chose the Cray C90 - a system released in *1991* - to compare against. It's nice to know that it's only taken them 10+ years to catch up. :)

      If you read the STREAM TRIAD web site linked above, you'll see that SGI didn't compare itself to the C90 exactly; it just ran a benchmark and published the results. Also in that approximate range are other machines from NEC and Cray and, further down, Sun.

      But you're right. Cray was way ahead of their time when it came to things like memory bandwidth. I remember a friend (ex-Crayon) telling me once that access to main memory on the T-90 was faster than access to the on-chip cache on the Pentium III. That sounds implausible, though, so he might have been exaggerating.

      I'm curious how the new X1 (nee SV2) does on the STREAM suite.

      The last word I got is that X1 is still in the PCB design phase. It's only running as a simulator right now. So it'll be a while before you see those numbers. ;-)

      (That info is several months old, so I may be wrong.)
  • Hmm. I wonder if I can parallelize the app I work on enough to use all those 64 processors? I know my bosses would wet themselves if I did. Of course, I am mainly disk bound. Anyone got a disk system to match?
  • Does the current 2.4.x series kernel scale to 64 procs effectively, or are they using some "enterprise patch" to fine tune for this particular hardware? I was under the impression that since most kernel developers don't have access to this kind of ultra-high end hardware that Linux isn't really optimized for it. Correct me if I'm wrong.
    • Re:Stock Kernel? (Score:2, Informative)

      by Jobe_br ( 27348 )
      I, too, was wondering if SGI has produced a patch for this or if it's running a Linus kernel. Chances are, though, it isn't 2.4.x, which is in maintenance mode, but rather the 2.5.x series, which is concentrating on enhancing scalability. Surprising, however, that the 2.5.x line would have gotten such impressive results so early. 2.5.x has only been in the works for a short time now, right?!?
    • Re:Stock Kernel? (Score:2, Informative)

      by GigsVT ( 208848 )
      SGI is actually the driving force behind a lot of work on linux scalability. SGI submits patches to the kernel, everyone benefits, etc.

      Linux isn't really optimized for a lot of processors, but companies like SGI are working to change that, and contributing a lot to the community in the process.
  • SGI makes loads of 64-processor machines. And I believe Linux runs fine on multiprocessor MIPS 14000s.
  • Why can't it run Windows XP?

    Ow!... ow, ow, ow, OW! Stop throwing rocks at me!

    Ok so it was a bad joke....
    • by Anonymous Coward
      Why can't it run Windows XP?

      Well, Windows is notorious for demanding a lot from the hardware. You have to expect it to be a dog on a low-end machine like this one.

      NT once ran on MIPS machines, as I recall. I don't have my NT4 disks handy, but I think that I recall that they included binaries for Alpha and Mips. Wouldn't it be nifty to be able to boot NT on that and see it run one cpu, straight into a bluescreen? After all, a computer without MS Windows is like a person without cancer.

      • At one time, NT was on x86, MIPS, and PowerPC. I remember all the "It runs NT" ads for MIPS based comps in the Ziff-Davis rags. I think for NT 3.51 only, then all but Alpha was dropped for NT 4, and then not even Alpha was supported past NT4.

        I may be wrong...
    • I was going to say, "Wow, finally a machine that can handle the resource requirements of GNOME." but I didn't have the gnads.
  • If we could work together (plus Mr Perens who is currently looking for a good cause to lead) we could take the demo to greater heights.

    What is to say that the demo's code isn't buggy and shoddy, holding the powerful Itanium processors back?

    If we realize the vast potential that the Open Source developer community provides then we can tackle such complex tasks as this Itanium performance measurement.
  • Two things (Score:2, Interesting)

    by _damnit_ ( 1143 )
    This sounds very cool, but I would really like more info than this. Plus, it isn't going to be released until next year. Within that time frame there will be the usual delays and then final release to a couple customers. Don't get me wrong, I think this is cool. Especially the linux part. This could go a long way to helping Linux scale better on massive machines.
    The second thought is: can it be partitioned? This is a rather big machine and goes against the trend I have witnessed to use many smaller machines to accomplish your goal. I'll have to ask some of the guys at Oracle if they've looked at Linux installs of this size, but as far as I know they only make x86 ports right now. So, I wonder what linux apps would someone run on a system this big? (I know. Insert obligatory Quake, Beowulf and porn server reference here.)

    Disclaimer: I work for an SGI competitor. But I have personally installed Linux on every piece of hardware I can get my hands on. Just to play usually, but still. They just pay my mortgage.
    • Re:Two things (Score:4, Interesting)

      by foobar104 ( 206452 ) on Monday September 09, 2002 @01:40PM (#4221838) Journal
      The second thought is: can it be partitioned?

      Since this machine is a standard Origin 3000 with McKinley processor modules, I'm going to assume the answer will be yes. You can partition an O3000 down to a single processor brick + base IO brick, so I imagine that SGI will implement the necessary software bits to make that happen on the SN1-IA systems. I know there are both user space bits (mkpart, partmgr) and kernel space bits (the TCP-over-NUMAlink driver).

      I personally have only seen partitioning used on HA systems and lab systems. For a fully fault-tolerant N-processor system, you can buy one 2N-processor Origin and partition it down the middle. The two nodes can run in parallel, passing data back and forth over the NUMAlink via TCP/IP, until one goes down. Also, partitioning is great in a lab environment. It's nice to be able to carve up a big multiprocessor system and give each user a 4-processor (or multiple of 4) node.

      I wonder what linux apps would someone run on a system this big?

      Anything you'd run on an IRIX system of that size, I'd imagine. I believe-- not positive-- that MSC has already released Nastran for Itanium 2 Linux. (Nastran is a computer-aided engineering tool used extensively in the automotive industry, and other manufacturing industries. It's used for things like stress, heat transfer, and vibration analysis.)

      And, as long as the Fortran compilers are worth a damn, you can run just about any other scientific, analytical, or technical software, I'd imagine.
  • I saw a few comments along the lines of "wowee, powerful!". I'm just curious what somebody'd want with a machine that powerful.

    Me, personally, I do lotsa 3D stuff and would love to see what it'd take to bring that machine to its knees. However, I get the impression I'm but one of a few 3D dudes here. So what would you non-3D dudes wanna do with it?
    • At a company I worked for in 1997, we used an SGI box of comparable power (well, not _that_ much power) to do real-time rendering of geological reservoir data. Typical datasets were about 40MB, directly measured from the field of study. The purpose was a "fly through" for geologists to tell where oil could be found.

      Everyone on the team used SGIs (I used an Indigo 2, arguably the slowest box in the office) running IRIX. The Origin system sat two floors below us, with the 3D programmer only having the keyboard, mouse and monitor in his office. It made it difficult when we wanted to run a game of Quake, as everyone could easily sneak up on him.
    • I've been doing ab initio calculations, i.e. calculating properties of some atomic system starting from quantum mechanics. Last month I used about 3000 CPU-hours of IBM POWER4 1.1GHz juice. And my calculations weren't extremely complicated either...
    • Model weather with smaller cells.
  • from Maddog:

    "For those applications that need to scale, SGI has just proven that Linux need not be synonymous with clutter."

    cluster? or clutter? a good cluster is not cluttered :)
  • by RicochetRita ( 581914 ) on Monday September 09, 2002 @01:21PM (#4221686) Homepage
    ... SGI will start selling the systems early next year.

    to meet the system requirements for Doom III.

    -R

  • ...but seriously, what are the applications for boxes like these? I mean - other than uses for Lawrence Livermore labs etc... big-ass iron like this seems to only really be useful for 1. nuclear modelling 2. benchmark-testing press releases.

    I know that someone somewhere is going to use a box like this - but tell me for what real world application will you use it. (Serious question - curious. I want to know the real apps these are used for.)
    • ... weather modeling, for one. Here in the US, NCEP (National Centers for Environmental Prediction) runs all the forecast weather models on an IBM-SP (used to run on a Cray C90, I think). In Europe, the ECMWF model is run on a Fujitsu supercomputer, I think.

      Models for plasma dynamics and astrophysics are also run on these heavy-duty machines. I'm sure others have had some experience running other things, but I know that the NCEP IBM-SP gets a workout at least 2 times a day running at least four different weather models that have average runtimes around an hour each.

      -Jellisky
    • In the health insurance industry, which I happen to work in, large SMP or MPP machines are used for data warehousing and fraud and abuse detection. Machines range from 16 to 64 CPUs (generally UltraSPARC or IBM Power). When you are dealing with claims records for 5 or 10 million beneficiaries over a 5 or 10 year time span you need a lot of processing power and disk space. The data warehouses are used for trend analysis, fraud investigation and the like. Anyone with a background in statistics knows just how much number crunching we are talking about.

  • by dprice ( 74762 ) <daprice.pobox@com> on Monday September 09, 2002 @01:28PM (#4221739) Homepage
    It's not surprising that the SGI machine runs STREAM well. Back in the mid-1990's, John McCalpin, who worked for SGI at that time, was a regular contributor to comp.sys.super, and he would frequently brag about the superiority of SGI running STREAM. McCalpin is one of the primary advocates for STREAM. You can optimize a computer architecture to run a particular benchmark well. The question is whether the SGI machine runs a wider variety of real-world problems well.
  • This sure would run a select statement on a database of all of our info pretty damn fast. But who would believe we'd ever adopt any kind of national ID, you know, like drivers licenses, social security cards, membership cards at grocery stores, etc.
  • ...and it's called 64 CPUs.
    Perhaps they should update the song [zdnet.co.uk]
  • by Animats ( 122034 ) on Monday September 09, 2002 @01:43PM (#4221866) Homepage
    First of all, the OS doesn't matter for this benchmark. This is a memory-to-memory copying test.

    That said, it's an impressive result. And it's done in an unusual way. SGI has a 1.6GB/s channel running through routers [sgi.com] connecting the processors and memory. A computer is made up of multiple rackmount "bricks" connected by cables and routers. The "router" is a 2U rackmount device.

    Processors and memory reside in rackmount boxes with 4 CPUs and 8 GB (max) of local memory. These boxes interconnect through a single 1.6GB/s link per box, which, in a big system, goes through several layers of routers. So a memory access to another box is routed through what is essentially a fast LAN. All this is cached, of course.

    It's not clear to what extent application programs have to be aware of this. Clearly, if you lay things out in memory badly, with lots of CPUs reading and writing the same memory from all over the memory net, the system will bottleneck. (Everybody reading the same stuff is OK; it's cached. But writes have to propagate back to the home location of the data.)

    Since the whole monster crashes all at once, you don't want to build your web server farm this way. It's for applications that really need all that crunch power in one machine.
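    A common ccNUMA idiom (assuming the usual first-touch page placement policy applies here, which I haven't verified for this machine) is to initialize data with the same threads that will later work on it, so each page lands on the node that will mostly use it. A minimal sketch of the idea in C with OpenMP; OpenMP is just a convenient way to spread the threads, and whatever placement tools SGI ships aren't shown:

      #include <stdlib.h>

      #define N 100000000L   /* placeholder: one big array spread across nodes */

      int main(void)
      {
          double *a = malloc(N * sizeof *a);
          if (!a) return 1;

          /* Parallel first touch: each thread faults "its" pages into
             memory on its own node. */
          #pragma omp parallel for schedule(static)
          for (long i = 0; i < N; i++)
              a[i] = 0.0;

          /* Later passes with the same static schedule then hit mostly
             local memory instead of being routed to a remote node. */
          #pragma omp parallel for schedule(static)
          for (long i = 0; i < N; i++)
              a[i] += 1.0;

          free(a);
          return 0;
      }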

    • by foobar104 ( 206452 ) on Monday September 09, 2002 @01:59PM (#4222022) Journal
      It's not clear to what extent application programs have to be aware of this. Clearly, if you lay things out in memory badly, with lots of CPUs reading and writing the same memory from all over the memory net, the system will bottleneck.

      Speaking as somebody who's done his share of IRIX programming, I'd say "none at all."

      In some cases, on Origin 2000 hardware with older versions of IRIX, you could see notable performance differences if you went out of your way to place memory in banks adjacent to the running processors. But the Origin 3000 architecture, with its significant reductions in memory latency, and newer versions of IRIX, with their improved page replication algorithms, have made manual memory placement almost obsolete. Almost.

      SGI spent a lot of time and trouble trying to reduce the impact of accessing remote memory. The caching mechanisms and page replication stuff are really well thought-out.
    • Packet sniffing?

      arp who-has cpu53 tell cpu4
      arp who-has ram1G-2G tell cpu3
    • First of all, the OS doesn't matter for this benchmark. This is a memory-to-memory copying test.

      Even the relatively simple uniprocessor x86 architecture offers OS implementors numerous ways to kill performance (shameless plug: a benchmark example [www.enyo.de]). I would be surprised if SGI achieved this result without some tweaking.
  • by AtariDatacenter ( 31657 ) on Monday September 09, 2002 @01:46PM (#4221885)
    I think it is pretty interesting that the benchmark that they used measured memory throughput, as opposed to, say, an actual workload. In other words, this is a synthetic benchmark, versus a real-world benchmark. They say, "Look! We can do memory transfers really really fast!"

    Unfortunately, memory transfers are not the world when it comes to large multiprocessor boxes. The overhead comes in when you're trying to synchronize a large number of threads/CPUs to do a large task. For example, an Oracle database.

    Sun has proven that it scales very well to large numbers of processors. But from my understanding, Linux is more efficient with a low processor count, and less and less efficient with more processors.

    I question its ability to do anything with a real workload. And I'm even more suspicious because they use a benchmark I've never heard of (STREAM TRIAD) to push its superiority on a single-aspect synthetic benchmark.

    Good. The machine looks like it has a decent memory bus, and memory modules with a good configuration and speed rating. Now, what can the machine actually do well that makes it a real winner?
    • by foobar104 ( 206452 ) on Monday September 09, 2002 @02:04PM (#4222065) Journal
      Good. The machine looks like it has a decent memory bus, and memory modules with a good configuration and speed rating.

      You know, before you piss in SGI's Cheerios, you might want to do a little reading. The Origin 3000 architecture, on which this prototype system was based, has no memory bus at all. It uses a fabric of switched multi-gigabyte-per-second interconnects to attach CPUs to RAM and to other CPU nodes.

      CPU benchmarks (like SPEC) are synthetic and irrelevant, because they fit in cache. Virtually no real application fits in cache, and the sort of applications you run on a machine this big deal with data sets on the order of tens or even hundreds of gigabytes. Memory-to-CPU bandwidth is probably the only real indicator of the ability of the system to handle real-world workloads.

      It's also the only thing-- other than the dimensions and the color of the plastics-- that differentiates SGI's big Itanium 2 server from everybody else's big Itanium 2 servers.
      • CPU benchmarks (like SPEC) are synthetic and irrelevant, because they fit in cache. Virtually no real application fits in cache, and the sort of applications you run on a machine this big deal with data sets on the order of tens or even hundreds of gigabytes. Memory-to-CPU bandwidth is probably the only real indicator of the ability of the system to handle real-world workloads.

        You say that synthetic benchmarks are irrelevant. Then you go on to say that this particular synthetic benchmark is highly relevant. It can't be both. I'd like to see this run a TPC variant, which is closer to real-world than it is synthetic.

        The Origin 3000 architecture, on which this prototype system was based, has no memory bus at all. It uses a fabric of switched multi-gigabyte-per-second interconnects to attach CPUs to RAM and to other CPU nodes.

        What, do I have to explicitly call out the components and subcomponents? It is a memory bus, for the purpose of this discussion.
        • You say that synthetic benchmarks are irrelevant. Then you go on to say that this particular synthetic benchmark is highly relevant.

          No, I don't. I really can't emphasize this enough: read. I said, "SPEC is synthetic and irrelevant." Big difference.

          I'd like to see this run a TPC variant, which is closer to real-world than it is synthetic.

          The TPC benchmarks are measurements of database performance. Since SGI was trying to demonstrate the features and capabilities of their hardware, it would have been completely inappropriate for them to use a database benchmark. STREAM TRIAD is great because it measures only one thing: the rate at which data can be moved from memory to the CPU or vice versa. The TPCs measure aggregate systems, including hardware, storage, OS, database software, and so on. They may be relevant if you're looking for a fast database server system, but they're hardly useful for evaluating one hardware architecture over another.

          What, do I have to explicitly call out the components and subcomponents? It is a memory bus, for the purpose of this discussion.

          The whole point of this discussion is that the SGI system can outperform virtually everything else on STREAM TRIAD because it has no memory bus. Memory busses are bottlenecks, and pumping a lot of data through them is very hard. The SGI system eliminates the bottleneck and thus demonstrates amazing bandwidth. When you miss the whole point of the discussion, I'm going to call you on it.
            No, I don't. I really can't emphasize this enough: read. I said, "SPEC is synthetic and irrelevant." Big difference.

            No. You said... "CPU benchmarks (like SPEC) are synthetic and irrelevant, because they fit in cache." You also said "Memory-to-CPU bandwidth is probably the only real indicator of the ability of the system to handle real-world workloads." I'd call that even more synthetic and irrelevant than SPEC.

            The TPCs measure aggregate systems, including hardware, storage, OS, database software, and so on.

            No argument there. But I was saying that it was MORE relevant than SPEC. And extremely more relevant than that STREAM TRIAD test they're pushing.

            The whole point of this discussion is that the SGI system can outperform virtually everything else on STREAM TRIAD because it has no memory bus.

            Really? I don't recall reading that in the story introduction or SGI's Press Release. Only the link to the STREAM TRIAD itself pointed out that it was talking about memory bandwidth. In fact, that is what my original message was trying to point out.

            So they've got a machine that gets great ratings on this synthetic benchmark? Who cares. It doesn't mean much if you've bolted a kernel on top of it which isn't mature in a large CPU environment. (And other hardware issues, as you mentioned which the TPCs would bring into play.)
  • The Cray C90 came out like in 1990 or 1991, and this newfangled SGI box just barely beats it? wow!
  • by deal ( 145150 ) on Monday September 09, 2002 @02:01PM (#4222037)
    "Through its experience and expertise in high-performance computing, SGI will offer customers of the highest quality 64-bit operating environments."
    Well, Hmph! The rest of us low-life customers wouldn't want it anyways!

  • Intel must be pleased. If SGI could manage to sell one of these that would double the number of Itaniums that Intel has managed to flog.
  • The poster on this is wrong. An Origin 3800 has MIPS processors and runs Irix (although there was a "toy" Linux port to Origin 2000 machines that would be fairly easy to adapt to the 3000 series). This is "the upcoming Itanium 2 system from SGI" that the press release mentions (what the marketing department at SGI will ultimately come up with for a name, I have no clue). While they are similar systems (both use ccNUMA and similar in other ways that I can't go into here), they use different memory control ASICs.


    In any case, the poster made it sound like you can just plug Itanium 2's into an Origin 3000 and *bang* you've got a Linux system which is not correct.
