

Silicon Graphics

SGI And /Massive/ Linux Machine

Thanks to some of the folks from SGI for sending us some information about their latest project. Pretty interesting stuff -- the largest configuration has 10 PCI busses (busi?) with 24 scsi controllers and 10 disks. And wait'll you see the rest of the stats.

Hi all,

Just thought I would send out a note outlining the state of the mips64 port. Ralf, Ulf and I have been actively working over the past few months to bring up Linux on the SGI ccNUMA machines.

The executive summary: we have achieved multiuser boot on o200 and o2000s. The largest configuration is a 32p, 16-node machine (only approx 4G worth of memory was populated over the 16 nodes; the system can take 4G * 16 nodes worth of memory). This machine has 10 PCI busses, with 24 scsi controllers and 10 disks. (Sample output is at


If you are interested in the system architecture and details of the port, read on. The o2000s use the R10000 series of MIPS processors. Each machine is comprised of modules; each module has 4 node boards (max 2 cpus and 4G memory on each node), plus IO boards and routers. In a module, the two alternate node boards are each connected to an XBOW. Each XBOW may in turn be connected on the other side to a number of PCI busses, which is what the IO boards connect to. Apart from this, there are routers in the system that provide connection paths between all memory and all cpus, to create a true CC-NUMA architecture.

On the software side, we are still struggling with compiler and binutils issues. The kernel itself is 64 bits, created by cross compiling on an ia32 box. We have not attempted 64 bit user program compilation or execution. The root disk is currently very close to the MIPS/Indy root disks. The architecture specific code uses the CONFIG_DISCONTIGMEM code to support memory on all nodes. The architecture specific NUMA features currently are: 1. replicate the kernel text on all nodes, so that no one node becomes a memory hot spot (unfortunately, the kernel data has to reside on only one node); 2. replicate the low level exception handler code on all nodes. The architecture code also turns on CONFIG_NUMA to take advantage of node-local page allocations. (A CONFIG_NUMA patch that I have been submitting to Linus was put into the kernel in test6-pre1.) For more information on NUMA and ongoing work, refer to

this document
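The node-local memory support described above comes down to two kernel options; a minimal sketch of the relevant .config fragment, using only the option names given in the note (exact dependencies vary between kernel trees):

```text
# Discontiguous memory: one physical memory region per node
CONFIG_DISCONTIGMEM=y
# Node-local page allocation (the patch merged around 2.4.0-test6-pre1)
CONFIG_NUMA=y
```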

The purpose of doing this port is to boot Linux on the bigger systems that we have, in order to do cpu/memory scalability studies. This also lets us do NUMA performance work in the future. Another advantage is being able to leverage this work on the upcoming SGI CC-NUMA Itanium boxes, which will be an SGI supported product. Initial results from scalability studies using mips64 are documented at

The OSS SGI site.


  • While Linux has a way to go, the Linux project is getting help from all comers and all corners... including many corporations with extensive knowledge of developing OSes and related elements (not least of all SGI to whom the Linux-using public at large owes a debt of gratitude) and this should allow it to come up to the level of commercial OSes more rapidly than those commercial OSes were developed themselves. Look at it this way... it's now like IBM and SGI are co-developing a capable OS that can be run on many platforms, with a third party moderating so (hopefully) no nonsense occurs like the OS/2 - Windows dustup.

  • by Anonymous Coward
    The perception that SGI got anything of value from Cray for the O200/O2000 is a myth perpetuated by SGI marketing (which I'm sure they regret now). The design was finished long before Cray was acquired. Naming the link CrayLink was a marketing gimmick that _really_ pissed off the engineers at SGI who designed it.

    Even better, the engineers who did the due diligence on the Cray acquisition said "don't do it". McCracken was obsessed though and the disaster happened.

  • They are calling it the "NUMA Link" now. Maybe those engineers will be just a little bit happier..

  • Change Linux to any OS in your above statement and it will hold true. See Solaris - dog slow on 4 CPUs. The way you write the kernel becomes inefficient, with a barrier at around 4 CPUs, if you want low end performance; high end performance is the complete opposite. It's a tradeoff - no OS can get around it. We might have a high end kernel or a low end kernel VM option in Linux soon; recompiling plugs in the VM you want.

  • Huh? This system can do genetic pattern matching, but it's far less cost effective than a pile of small machines. Fortunately, the people who actually spend millions of dollars on machines to solve problems like gene matching investigate the problem more carefully than your friend.

    Two companies doing this problem are Celera Genomics and Incyte. Incyte has a cluster of 1,200 x86 machines (3,000 cpus) running Linux. Celera Genomics has a cluster of 1000 Alpha cpus in 250 nodes; Celera purchased their machines before it had been shown that Linux could handle that kind of task.

    And a company that specializes in getting fast storage for the movie industry is MountainGate.
    I'm not so sure that even the rendering example is really valid. Much of the industry treats rendering as an embarrassingly parallel problem: individual frames render slowly, but the entire movie finishes fast. That's much more cost-effective.

  • Sadly i misread it the same way :)
    But it takes one to know one :)
  • Right on.
    It looks like the major actors finally realize that developing and using systems for the future has a lot in common with clearing a field of land mines. There are too many gotchas that bite in places you didn't know you had places. While I cannot imagine IBM and whatever has replaced the seven dwarves totally embracing open source, I think that they realize it is suicidal not to utilize the advantages of open source in the core basic areas where it has an extreme advantage. I have nothing to back it up, but I think the interest of the big boys comes from the anecdotal tales of long uptime: not the well-configured system that does its job, but the ill-configured, reconfigured, abused system that managed to stay up when it would be excused for dying after being hacked to death.
  • by mikeee ( 137160 )
    Are you sure the E10K isn't UMA? That's what I've always seen documented... Ironically, it's actually a Cray-designed system that SGI sold to Sun before spinning off Cray. Sun probably makes more money selling E10Ks than SGI makes in total.

    According to rumor, Sun's next-generation hardware is ccNUMA, though.
  • You have an absolute top of the line Indigo then. The majority of them on the market are R3000 33 MHz or R4000 100 MHz. These are slow, especially the R3k. There's still a bit of life in the R4k boxes, but the R3ks are truly dead.

    Don't waste $500 on an x86 chip. Just get a REAL COMPUTER for the same price and see!

    Amen, brother.

  • Pretty poor troll there, friend. Linux != x86. How many times must I tell you this? Linux is a nice OS. Not perfect, just nice. x86 is a shitty architecture. Not merely bad, shitty. There are many things that peecees cannot do and will never do. There are a few things that Linux cannot do; perhaps it will do them in the future. *thwap*clue stick*
  • Dood, Im gonna play pong with it :)

    -Rick S

    "Charlie Don't Surf!"
    -Lt Colonel Kilgore
  • I second that. "CrayLink" was just marketing. They wanted to leverage the Cray name on SGI's highest-end systems to win sales at the national supercomputer centers. Cray was not involved with o2000 engineering. I was at SGI/Cray at the time too, in Minnesota.
  • you are probably right. Though maybe some of them read slash. Maybe they know now. :) Maybe not.

  • AC wrote:
    linux linuces unix unices
    linucis linucum unicis unicum
    linuci linucibus unici unicibus
    linucem linuces unicem unices
    linuce linucibus unice unicibus

    But Latin has 6 cases!

    Didn't you forget the Vocativ?


  • If I remember my Latin correctly... the vocative is hardly ever used, and if it is used it is only to address people. Flame me if I'm wrong...
  • Don't use 'buses.' It means kisses.

    Unless you're talking about this. []

  • You can put the monitors on peecees. Besides, in many ways those machines are much faster than any peecee. Well, the Indy anyway; Indigos are hopelessly old. Never throw away workstations; they retain value for at least 10 years after manufacture. Peecees may be in the "10 bucks and you have to haul it away" category after 2 or 3 years, but real computers are useful and worth money for many years.
  • by Chuck Chunder ( 21021 ) on Sunday August 06, 2000 @06:39AM (#876560) Homepage Journal
    If you look at the linked info you will see that:
    a) There are in fact 14 scsi devices attached. (13 drives and a cdrom).
    b) Even so only 4 of the 24 scsi hosts are actually used (So 20 scsi hosts are being 'wasted', not 10).

    Your initial question ('There isnt anything special about 10 drives, so why have 24 scsi buses?') was backwards. They are developing on a big-arse piece of machinery here. The point here isn't making efficient use of 14 scsi devices, it's showing that Linux can run and access 24 scsi buses. Your question should probably have been 'If they want to really show that you can use 24 scsi hosts shouldn't they have a shitload more drives'. Quite possibly for a proper demonstration, but for a dev box then scattering a few drives over a few hosts is probably satisfactory.
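The host-versus-device arithmetic above is easy to check from /proc/scsi/scsi. A minimal sketch: the excerpt below is made up for illustration (the real box's output is in the linked sample), but counting works the same way on any Linux box.

```python
# Count SCSI hosts and attached devices from /proc/scsi/scsi-style output.
# The excerpt below is hypothetical; on a real machine you would read the
# file itself.
sample = """Attached devices:
Host: scsi0 Channel: 00 Id: 01 Lun: 00
  Vendor: SGI      Model: IBM DDRS-34560   Rev: S96A
Host: scsi0 Channel: 00 Id: 02 Lun: 00
  Vendor: SGI      Model: IBM DDRS-34560   Rev: S96A
Host: scsi2 Channel: 00 Id: 06 Lun: 00
  Vendor: TOSHIBA  Model: CD-ROM XM-5701TA Rev: 0167
"""

hosts = set()
devices = 0
for line in sample.splitlines():
    if line.startswith("Host:"):
        hosts.add(line.split()[1])  # host name, e.g. "scsi0"
        devices += 1

print(f"{devices} devices on {len(hosts)} SCSI hosts")
```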
  • the plural of bus is buses.
    • Um, their Cray division did a lot of the work for the O2000! In fact at release the >64CPU configs were only avail from cray. Oh, and the frame to frame comm channel? It's named the "CrayLink".

    Uh, no. The O2000 was ready to go when the Cray thing (merger/mistake) happened. It pissed a LARGE number of hard working SN0/LEGO engineers off that the Origin interconnects were called "CrayLink" when Cray had zero involvement (at least at that point). I left soon after, so I dunno what Cray ever did ultimately for SN0 or SN1


  • I agree with the remark that most places do slow frames on a massively parallel system. We made the same choice: 40 dual 600MHz Linux boxes.
    But... we also had lighting tests and renders for marketing that needed to be done quicker than a nightly turnaround. That is when we used the O2000. We turned this machine into our file server, so now we can only dream of when we could render single frames faster.
    In a perfect world, both systems would exist: a bunch of Linux boxes for 24x7 renders, and a massively parallel box for large or quick-turnaround single frames that could be used for "normal" renders 24x7.
  • Only 10 disks? What are they doing with the other 22 Controllers?
  • Amen. I have a hand-me-down Indy that runs like it was just pulled from the crate. You just have to diligently patch that SHITTY OS. Absolutely no reason to throw it away. Makes a great workstation. Now if I can get a port of Linux for it -- great.
  • by enneff ( 135842 ) on Sunday August 06, 2000 @03:51AM (#876566) Homepage
    ...give this guy a fat ip pipe and a gnutella node! This machine has 10 PCI busses, with 24 scsi controllers and 10 disks. !!!!!


  • by GrEp ( 89884 ) <crb002@gmai l . com> on Sunday August 06, 2000 @03:52AM (#876567) Homepage Journal
    You can see why they ditched Cray Supercomputers. They noticed that businesses want cheap processing power, and they don't care how they get it. If you want economical, "Beowulf" clusters are the way to go nowadays.

    I am surprised it has taken them this long to get some deals like this out the door.
  • by DarkClown ( 7673 ) on Sunday August 06, 2000 @03:52AM (#876568) Homepage
    SGI really does seem to be going after linux. I recently took an rhce test in Dallas and out of 13 people in the class, 11 were from sgi. Kind of a trip.
  • ... from autobi
  • What the hell does this have to do with Beowulf?

    This is about running Linux on BIG machines which can be called small supercomputers in and of themselves.
  • I'm not trying to undermine the efforts of these guys, because I'm sure what they're doing is valid and is actually quite interesting in itself, but I'm having problems trying to see the commercial benefit of doing this. Above we are told that the purpose of this project is to "boot Linux on bigger systems", but that doesn't really seem like a viable piece of research in many ways.

    Commercially, if I want lots of nodes (16 nodes here) with Linux, I'm more likely to think Beowulf. If I want them all to appear as one machine, to be honest, if I'm spending this sort of money I can see the benefits of going with Sun and Solaris. If I want lots of virtual Linux machines running on one large, easily manageable system, then we already have Linux on S/390.

    Can anybody tell me what the real commercial incentive is to run Linux on bigger systems? I'm just curious that's all. Perhaps I'm missing something here (almost certainly I'm sure). :-)
  • There isn't anything special about 10 drives, so why have 24 scsi buses?

    I can't see why you would need more than 3 scsi buses to run 10 drives, which leaves 21 free.

    It would be nice to see how fast they could get a software raid0 array going.
  • by tolldog ( 1571 ) on Sunday August 06, 2000 @04:25AM (#876573) Homepage Journal
    NUMA != Beowulf

    NUMA is not even close to how Beowulf works. NUMA allows the procs to actually work together with shared memory, instead of the near-shared memory that Beowulf provides.

    Trust me, the CrayLink is much more efficient than a fibre connection between Beowulf boxes would be, even if you went all out and did some sort of cube configuration.

    If Beowulf were better, people wouldn't be shelling out the money for the SGI boxes when they need the horsepower; they would have some "wulf" farm working on the problem.
  • "... the largest configuration has 10 PCI busses (busi?) "

    OED [] 2nd edition gives "busses" and "buses" with the former appearing in older citations.

  • by stuce ( 81089 ) on Sunday August 06, 2000 @04:25AM (#876575)
    SGI has already committed to producing huge NUMA servers based on the Itanium processors. Porting IRIX to this new architecture would be a huge undertaking, as it has been tied to the MIPS architecture forever. Linux, on the other hand, ports quite easily. SGI is doing research into what it would take to get Linux to run well on massive boxes like these.

    If Linux can cut the mustard there will be no need to port IRIX, and that will save SGI one huge headache.

  • by Fluffy the Cat ( 29157 ) on Sunday August 06, 2000 @04:02AM (#876576) Homepage
    For SGI, the incentive is pretty obvious. At the moment they produce machines with massive numbers of processors (we've a 256 node SGI here) and need an operating system to run on them. IRIX is massively better than Linux for this sort of thing at the moment, but using IRIX means that they have to deal with everything else associated with programming an OS rather than just the bits they're good at. By improving Linux sufficiently so that it has the same sort of level of performance as IRIX on massively parallel machines they can drop IRIX development and let someone else deal with most Linux bugs, saving themselves rather a lot of time and money in the process.
  • This is a great machine for rendering or any other application that is both CPU and memory bound.

    Some jobs do not parallelize well, such as individual frame rendering. With 24 boxes, the 5+ minute overhead of loading the scene file, plus the memory spent on loading the textures and the geometry, would be incurred on each machine, costing you 24x the overhead of doing it on one machine. Trying to do this with a "quasi" shared memory system would kill the network, but would remove that hideous overhead.
    Doing this on a NUMA box fixes all of those problems. The memory is shared. The procs all look like one machine. The system runs smoothly and well.

    This is why SGI is still in the large graphics server environment. People want individual frames done fast.

    The benefit of this being a Linux box and not Irix....
    I, a huge Linux vs. Irix advocate, struggle to see why this would be good. Most of the apps that I would use are built for Irix first and then Linux (like Maya's renderer). I can see where others might have custom apps to use this, but the code would probably port to Irix just as easily as it would to Linux on MIPS.

    It is a step in the right direction, IA64 NUMA boxes running linux. The ultimate in render farm machines.
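The overhead argument in that comment is simple arithmetic; a toy sketch, where the 24 boxes and the roughly 5-minute scene load are the figures quoted above (the comparison itself is illustrative, not a benchmark):

```python
# Back-of-envelope cost of per-machine scene loading on a render cluster
# versus a shared-memory (NUMA) box, using the figures from the comment:
# 24 render boxes, ~5 minutes of scene/texture load each.
boxes = 24
load_minutes = 5

cluster_overhead = boxes * load_minutes  # every box loads its own copy
numa_overhead = load_minutes             # shared memory: loaded once

print(f"cluster: {cluster_overhead} machine-minutes of load overhead")
print(f"shared-memory box: {numa_overhead} machine-minutes")
```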
  • You can play Quake 3 on it!
  • by pointwood ( 14018 ) <(jramskov) (at) (> on Sunday August 06, 2000 @04:46AM (#876579) Homepage
    > Discovered 32 cpus on 16 nodes

    Why does my kernel not discover something like that? ;-)
  • That's more than there are in the whole city of Edinburgh. I think they even deliberately put fewer of them on when it rains.

    I want one :( an SGI - not a bus
  • Point taken; I didn't read the article until after posting. In actual production you know that they are going to cluster these babies anyway.
  • I misread this the first time. I read it as:
    ...give this guy a fat pipe and a gnutella node.
    Guess we know where my mind is right now.
  • Sort of cute, in an ed(1) sort of way, to see that sort of machine being UUCP capable.
  • It's cool /. still using the old sgi logo. The new one doesn't compare.
  • They didn't "ditch" them per se... They bought them so they could gain access to the stuff they wanted and incorporate that into their machines, and then resold them once there was nothing left to gain... Supercomputers aren't a dying business, they're just not SGI's target market right now... And they really need to focus on having a few profitable lines rather than many loss-creating lines... Just as Apple did.
  • Linux runs quite well on Indy. My Distro [] is available, and there is Debian port [] in the works. See also The SGI site [] and The Unofficial site []. There's even X available now for some configurations. This port isn't production-ready but it's certainly ok for casual use.
  • The vocative is identical to the accusative for most declension forms.

    The only difference is the us-declension, where the ending is replaced by an o.

    Thus the nominative "Marcus" becomes "Marco" or even "Marc".

    Vocative means "calling form"; it is only used for names and titles in direct speech to the person being addressed.

  • I disagree. I've read some stuff about the O3000 last week and it's maybe a nice box from the technical point of view, but as an ISP I wouldn't buy one.

    • You can get COD (capacity on demand) for most Sun servers now, not only the E10k. I haven't found anything similar on SGI's website. They seem to have the more flexible hardware, but if your demands grow, you have to order bricks, wait for delivery, shut down, and install them. With Sun, you have the hardware right there and only need an additional licence. IMHO Sun has the better solution here.
    • Repartitioning works with our E10k nicely and dynamically. No downtime. I think this is an important feature if you have to run 24/7.

    I haven't ever worked with SGI boxes and this is just a quick shot. I think the SGI boxes are nice, but not tailored to our demands. Maybe they are nice for universities or somewhere else where availability isn't that important.

  • Nice machines though, even if a bit long in the tooth (the O2000 is fourish years old, the O3000's should be announced anytime now, go look at comp.arch)
    It happened roughly a week ago, during SIGGRAPH 2000. The Origin 3000 is a beauty. See here [] for details.
  • Uhm, no.

    They're just proud as hell that they've actually booted a o2000 with Linux... All those SCSI buses are probably just there because they were in the machine when they got it :)

  • by battjt ( 9342 )
    Wanting only one OS would be like wanting only one tool in the tool box. Use the right tool for the job.

    From a less abstract perspective, I would rather want just NT than just one of any Unix OS. NT has more apps and is generally a better fit for that lowest common denominator spot. Of course, I don't like using sporks; I prefer a spoon and a fork.

    Then again, to finish out my wishy washy opinion this morning, it might be best for SGI to get out of the OS business. If SGI, IBM, and Compaq get out of the OS business (transitioning to Linux or some other common code base), then they might be able to leverage each other and focus on less redundant tasks.

    need coffee....
  • Think about it. The plural of fungus is fungi. You take the 'us' and replace it with an 'i'. So therefore, wouldn't the plural of bus, be bi?

    Just a comment. I can't think of any other things that end with 'us' so I can't verify this.


    Daniel Zeaiter
    ICQ: 16889511

  • Ok, but then at least boxen ok! (?)

    It's in two of my dictionaries at least:

    From The Free On-line Dictionary of Computing (07Oct99) [foldoc]:


    /bok'sn/ (By analogy with {VAXen}) A fanciful plural of {box} often encountered in the phrase "Unix boxen", used to describe commodity {Unix} hardware. The connotation is that any two Unix boxen are interchangeable.


    From Jargon File (4.0.0/24 July 1996) [jargon]:

    boxen /bok'sn/ /pl.n./ [by analogy with {VAXen}]
    Fanciful plural of {box} often encountered in the phrase `Unix boxen', used to describe commodity {{Unix}} hardware. The connotation is that any two Unix boxen are interchangeable.

  • Hippopotamus -> Hippopotami . . .
  • Why have 24 SCSI controllers for only 10 disks?
  • I do think Boxen pronounces better than Boxes (Boxeseses); the typo is intentional. However, Virii just sounds awful.

  • The benefit of this being a Linux box and not Irix.... I, a huge Linux vs. Irix advocate, struggle to see why this would be good. Most of the apps that I would use are built for Irix first and then Linux (like Maya's renderer). I can see where others might have custom apps to use this, but the code would probably port to Irix just as easily as it would to Linux on MIPS.

    If they can get Linux to run on most of their machines, then they will get access to all (or most of) the Linux stuff, and they don't have to maintain the kernel themselves (which I believe is not a cheap thing to do).
    Furthermore, wouldn't it be nice to have an "IT infrastructure" with only one OS to support -- no Irix, AIX, Windows, etc. to support -- only Linux.

    If this is completely wrong, then it is most likely because I don't know much about system administration or Linux/Unix generally :-)
  • because this is just a development system where they are trying to stretch/test the capabilities of the OS and not a production system that needs to be useful/cost-effective?

    Some people are way too quick to criticise (or if that wasn't criticism, some people are way too quick to ask stupid questions...).
  • Actually SGI just scavenged from Cray most of what it considered useful technology. The beauty of the Origin boxes is that they play both sides. They can do the graphics heavy lifting for movie producers or they can do number crunching for accountants.

    Either way this is a lot of horsepower to throw at any problem. Beowulf is cheaper for the graphics work however. Nothing else can approach the economies of scale piled up behind Linux on midrange PCs.
  • I mostly agree with all of your above points. I hesitate only because Linux has a way to go to match Irix (or some other *nix OS) in some instances. SGI boxes are hanging around as servers for a reason.
    I would love to have one server OS and tune the platform to the job. But... it will be a few years before places are doing this. This port is a step in the right direction.
  • by NumberCruncher ( 19377 ) on Sunday August 06, 2000 @05:11AM (#876601) Homepage
    The commercial benefit is several-fold:

    • Large memory/IO capable systems running a standard OS, with standard tools, and a well known ABI/API
    • Highly scalable and reconfigurable modular computing. If you need more power, add more C-Bricks. If you need more IO, add more P or X bricks.
    • Large application base: Linux has captured the mindshare of developers. Applications are being ported at a furious rate. It is becoming the dominant platform for software development (over Solaris and other similar Unices).

    There are many other reasons as well, but frankly this type of machine is what many people have been waiting for. The total cost of ownership of all those Sun machines is far larger than of this machine. The performance of this machine is significantly ahead of your typical Sun machine.

    One of the nicest features of this machine is that you can reconfigure it with a reboot (no recabling) to come up as a single large machine, as multiple medium machines, or as many single machines. You can configure the computer to your needs, not shoehorn the problem to fit within a Solaris box's limitations. And unlike on other OSes, the partitioning actually works here.

  • Can anybody tell me what the real commercial incentive is to run Linux on bigger systems? I'm just curious that's all. Perhaps I'm missing something here (almost certainly I'm sure).

    It's a research project. Maybe there's a commercial incentive to run Linux on bigger systems. Maybe there isn't. You have to try and find out. And to try, you first have to get Linux to run on bigger systems. That's what they've done - running Linux on a small bigger system. (And that's all the information the article gives - they've booted it on a 32 CPU machine; it didn't say anything about performance.)

    -- Abigail

  • by joshv ( 13017 ) on Sunday August 06, 2000 @05:15AM (#876603)
    I am glad to see some work being done on Linux to add real support for truly massively parallel systems. It has always been said that Linux does not scale well past a few processors (perhaps 4 at most) because modifying Linux to support systems with larger processor counts would hurt performance on low end hardware. Additionally, one can assume that the kernel developers in general don't have access to such massively parallel architectures.

    This little project hopefully will prove that it can be done, and one might hope its results will be applicable to less exotic multiprocessor hardware (say, an 8 or 16 way x86 server).


  • Hey, next time you do something like that, email me and i'll pay the shipping :)

    I'll always love those old Indy's...
  • by stripes ( 3681 ) on Sunday August 06, 2000 @08:48AM (#876605) Homepage Journal
    You can see why they ditched Cray Supercomputers.

    Um, their Cray division did a lot of the work for the O2000! In fact at release the >64CPU configs were only avail from cray. Oh, and the frame to frame comm channel? It's named the "CrayLink".

    Nice machines though, even if a bit long in the tooth (the O2000 is fourish years old, the O3000's should be announced anytime now, go look at comp.arch)

    If you want economical, "Beowulf" clusters are the way to go now a days.

    Sure, if you need very little communication between machines Beowulf is great, and the O2000's expensive comms (the XBOW and CrayLink) are wasted. If you need a lot of comm, but not a lot-lot, an O2000 is great. If you need a lot-lot of comm, maybe you are out of luck until the O3000, HP SuperDome, or IBM Power5 show up.

    Quick MP break down:

    • NORMA - NO Remote Memory Access - Beowulf is a NORMA; to get at memory on other systems you need to make OS calls, or at least use really expensive mmap'ed (NFS) files (i.e. non-local memory costs 1000x more than local memory to access).
    • NUMA - Non-Uniform Memory Access - remote memory costs maybe 10x more than local memory access, and caching has to be handled specially (i.e. each system has to know when to flush its cache "magically"). Very few examples; some IBM research machines do this.
    • ccNUMA - Cache Coherent Non-Uniform Memory Access - remote memory access costs maybe 10x local, but the caches work. Most large multiprocessor machines work this way, e.g. the O2000 and the Sun E10000. The E10000 has much less than a 10x penalty for remote memory, but its local accesses cost more than the O2000's local accesses. On both, the OS can move pages from CPU board to CPU board depending on access. The E10000 comes closest to giving the impression that it is a UMA machine (and the O2000 isn't bad at it).
    • UMA - Uniform Memory Access - there is no remote memory, or no penalty for accessing it. A typical multiprocessor PC works this way. Typically easy to build for small numbers of CPUs, increasingly impossible for larger numbers (or useless - you could make a UMA for 1024 CPUs by accepting really shitty memory access times, but there is no known way to make one with good access times, even if you had a literally unlimited budget!).

      A NORMA (or better) is great for raytracing, crypto cracking, and the like. A UMA is great for N-body simulation (with large N). I wouldn't want to track the flow of air molecules over a wing with a Beowulf, but I wouldn't want to pay for a ccNUMA if I was "just" running PR-Renderman.
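The cost ratios in that breakdown can be folded into a toy model of average access cost. A sketch: the 10x and 1000x penalties are the rough figures from the list above, while the 90% locality figure is a made-up workload, not from the post.

```python
# Toy model of average memory-access cost using the rough penalties from
# the MP breakdown: local = 1x, ccNUMA remote ~ 10x, NORMA remote ~ 1000x.
def effective_cost(local_fraction, remote_penalty):
    """Average cost per access, in units of one local access."""
    return local_fraction * 1.0 + (1.0 - local_fraction) * remote_penalty

# A (hypothetical) workload touching 90% local / 10% remote memory:
ccnuma = effective_cost(0.9, 10)    # ccNUMA box
norma = effective_cost(0.9, 1000)   # cluster of separate machines

print(f"ccNUMA: {ccnuma:.1f}x local cost; NORMA: {norma:.1f}x")
```

This is why the comment's conclusion holds: a workload with almost no remote traffic (raytracing) barely notices the penalty, while a tightly coupled one is dominated by it.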

  • by LL ( 20038 )
    DrWiggy [] wrote Can anybody tell me what the real commercial incentive is to run Linux on bigger systems? I'm just curious that's all. Perhaps I'm missing something here (almost certainly I'm sure). :-)

    What are the inherent limitations of the information infrastructure? Certainly not hardware (vendors are keen to flog as many boxes as possible) or software (internet distribution costs are pretty much zero). The inherent bottlenecks are development time and trained staff. Linux attracts a pool of talent, or at least the opportunity to learn the guts of a system without paying a hideous cost in acquiring system tools and development environments (cough*Microsoft*cough). Distribution boils down to boring physical aspects: you don't put datacentres and six-9s reliable systems in unstable locations (e.g. earthquake zones) or where there is a limit in the technology level. Given inherent limits in manpower you will see a move towards consolidation once the complexity passes a critical point, as the cost of training exceeds the value for a small site. Currently web sites are still pretty primitive, but once the next generation of tools are refined (e.g. Zope, ACS, etc) expect to see more critical mass building up. It's not so much the CPU but the storage, as a couple of petabytes tends to be rather difficult to shift :-).

    Given that latency is a big problem and getting bigger with faster CPU clock speeds (e.g. 1 disk read = x megainstructions), it then makes sense for CPUs to be sited really, really close, and from a management/administration viewpoint, a clump of systems is easier to manage than a zillion boxes (again, issues to do with backups and system security). Why do we have petrol stations and gas farms (at ports/airports/etc)? Because you put the resources near where they will be used. Again, if you use the motoring analogy, Cisco builds the highways, SGI builds the gas farms, Microsoft builds the tollroads, etc ... The skills required to create highly scalable *systems* are scarce, which is why even PCs have yet to go beyond dual processors, and outside niches you might as well forget about parallel software that can use the horsepower.

    So SGI is taking a strategic look at their markets and IMHO rebranding themselves as the BMW of the internet world focusing on industrial-strength trucks. The trends they wish to ride on are commodity OS (large pool of development to overcome constraints of talent) plus increasing outsourcing of specialist internet services (consolidated servers to achieve economy of scale - see their O3000 brick concept to understand their direction). In order to play you have to pay (in terms of a learning curve).


  • Maybe your kernel just isn't adventurous enough? Get it a subscription to National Geographic.
  • Beowulf clusters are great for cheap, fast multinode processing. However, the big SGI servers blow Beowulf clusters away in the bandwidth category, even within a node. For example, the new Origin 3000 series can hold up to 512 R14000 MIPS processors and 1 TB of RAM, with 5.6 MB/sec between RAM on different nodes; total system bandwidth is up to 716 MB/sec. Even though the speed between nodes in a Beowulf cluster can reach 1 Gbit/sec over Gigabit Ethernet, the speed between RAM is dog slow. Because Beowulf is tied, for the most part, to inferior Intel technology, the bandwidth between RAM is very limited.

    The other big difference is IO bandwidth. The big SGI boxes blow almost everything else away when it comes to IO bandwidth, and with SGI's port of XFS to Linux you can bet you will get much better disk IO on an SGI box than on Intel.

    With all of that said, I do believe the effort of porting Linux to this architecture is useless. Why would someone spend all that money on a "supercomputer" like the O3000 only to put Linux on it? You lose so much performance it just wouldn't be cost effective. SGI has to realize that they have a great operating system in IRIX and a great hardware architecture. They have their niche, "supercomputing" and graphics rendering, and for the foreseeable future nobody can take that away from them. Right now people are buying their products not because they are the best for the money, but because they are the best at what they do. They should stick with it and make small advances. SGI working on Linux for the desktop is great; I can't wait to start using XFS. But they will die if they start trying to change their bread-and-butter products into cheap and inefficient ones.
  • I don't completely agree with that.

    Beowulf is not good for rendering. Each job can use up to 500-700 megs of memory. Try sharing that over 100BaseT or Fibre or some other network transport; it won't work.

    We use other approaches for rendering: we spread the shot over a machine, not the frame, and eat the overhead of starting the renderer and reading the file. For those users who needed one frame done fast, we would, if possible, throw it on our 4-proc O2000. That machine was taken from me, so now they just have to wait 4x longer.

    Beowulf has its uses. Production rendering is not really one of them.
  • Doing this on a NUMA box fixes all of those problems. The memory is shared.

    Agreed. The memory bandwidth makes a big difference in being able to handle fine-grained parallelism. (Beowulf, OTOH, is better suited for coarse-grained parallelism.)

    Using a system such as this DRAMATICALLY reduces the delay required to propagate interim results to other processors. (Memory is so much faster than disk, and, if memory serves me, shared memory greatly reduces the performance-robbing overhead of cache synchronization.)

    Another perfect application for this kind of system would be investigating the human genome. It's pattern matching on a huge scale. And, from discussions with a friend who decides on and purchases the hardware for just such a company, his challenge is getting enough storage, and being able to get to it FAST. This looks pretty fast to me! :)

  • Nah. See if we can get them to run a simple little Java program on Netscape! <grin>

    cf: Java Security Hole Makes Netscape Into Web Server

    The only testing I can imagine being done on the unused SCSI buses is hardware detection.

    How can you test the capabilities of a SCSI bus if nothing is plugged into it? They may as well leave the controllers in their boxes on the shelf for all the testing they will do.

    If it makes any difference, I asked purely out of curiosity; I would wonder the same thing if there were 24 IDE buses and 10 drives. But if you must know, I do consider SCSI to be something that only has merit in a few situations; SCSI is mostly used by stupid people looking for an expensive toy.

  • You can see why they ditched Cray Supercomputers.

    Not exactly. They found a way with NUMA to leverage Cray's technology... to get high-performance; they now just link a bunch of "Cray-ons" together! <grin>

Last yeer I kudn't spel Engineer. Now I are won.