Building a Linux Cluster from the Ground Up?

dooling asks: "How would one go about building a Linux cluster from the ground up? I read a lot about Linux clusters on /. and have been able to find some information on configuring the cluster, but have found little on how to assemble the hardware, i.e., what is necessary, how the machines should be connected, etc. Does anyone have reliable information on hardware assembly and configuration? Also, if you've never done this before, is it worth building your own, or is it better to just buy one prebuilt and preconfigured? If you want specifics: 20-40 machines running Linux (probably Red Hat 6.x). Disk or diskless? We do not need video cards, but should we have them anyway? Switch or hub: what is the best way to hook them up? We will be doing fairly straightforward scientific computing (floating-point number crunching)."
  • "My question is: would it be possible to build a general-purpose Beowulf computer that runs apps like Netscape, WordPerfect, and the GIMP really fast?"
    Beowulf would not increase the speed of any of these apps, since they are not coded with a multiprocessor platform in mind.
    -xyster
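
Amdahl's law puts a number on this. If only a fraction p of a program's run time can be spread over N nodes, the best possible speedup is 1 / ((1 - p) + p/N), so a program with p close to 0 (an ordinary single-threaded desktop app) gains essentially nothing from extra nodes. A minimal Python sketch, assuming perfect load balance and zero network overhead (a real cluster does worse):

```python
def amdahl_speedup(parallel_fraction: float, nodes: int) -> float:
    """Best-case speedup on `nodes` machines when only `parallel_fraction`
    (0.0 to 1.0) of the run time can be split up; the rest stays serial.
    Network and load-balancing overhead are ignored."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    serial_part = 1.0 - parallel_fraction
    return 1.0 / (serial_part + parallel_fraction / nodes)


if __name__ == "__main__":
    # Single-threaded desktop app: nothing to parallelize, so no gain.
    print(amdahl_speedup(0.0, 40))             # 1.0
    # Number crunching that is 95% parallel, spread over 40 nodes.
    print(round(amdahl_speedup(0.95, 40), 2))  # 13.56
```

Even the 95%-parallel job gets nowhere near 40x, which is why the software has to be written with the cluster in mind.
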
  • I would suggest putting floppy drives in all of these if you are short on cash; this way you can save some money on expensive NIC boot ROMs. If you do have the money, look at getting NICs with boot ROM chips on them so you can boot them all using BOOTP (from what I last saw, each ROM is ~$40).

    The first cluster I set up didn't have hard drives, and I NFS-mounted the filesystems across the network. It worked pretty well, but it was a pain to get working.
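
For the diskless/boot-ROM route, the server needs one BOOTP entry and one NFS export per node. The Python sketch below just prints those lines. The node names, MAC addresses, IP addresses and the /tftpboot layout (one root tree per node IP) are hypothetical, and the bootptab tags (ht, ha, ip, sm, hd, bf, rp) and exports options used here should be checked against the bootptab(5) and exports(5) man pages for your bootpd and NFS server:

```python
# Hypothetical cluster layout: every name, MAC and IP here is made up.
NODES = [
    ("node01", "00:a0:c9:00:00:01", "192.168.1.101"),
    ("node02", "00:a0:c9:00:00:02", "192.168.1.102"),
]
TFTP_ROOT = "/tftpboot"  # assumed location of the kernel and per-node root trees


def bootptab_entry(name: str, mac: str, ip: str) -> str:
    """One bootptab line: hardware type/address, IP, netmask,
    boot file directory and name (hd, bf), and NFS root path (rp)."""
    hw = mac.replace(":", "").upper()  # bootptab wants bare hex digits
    return (f"{name}:ht=ethernet:ha={hw}:ip={ip}:sm=255.255.255.0:"
            f"hd={TFTP_ROOT}:bf=vmlinuz:rp={TFTP_ROOT}/{ip}:")


def exports_entry(ip: str) -> str:
    """One /etc/exports line exporting a node's private root tree to that node only."""
    return f"{TFTP_ROOT}/{ip} {ip}(rw,no_root_squash)"


if __name__ == "__main__":
    print("# bootptab")
    for name, mac, ip in NODES:
        print(bootptab_entry(name, mac, ip))
    print("# /etc/exports")
    for _, _, ip in NODES:
        print(exports_entry(ip))
```

The node kernels also need NFS-root support compiled in for the root path to be usable.
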

  • I'd suggest the following:

    1. Custom-built computers, as you don't need graphics cards. I wouldn't bother with putting them in - you're better off spending the money on more memory, a faster processor, or a faster hub.
    2. HUB. Definitely. They're faster for this than routers, and are a good choice.
    3. The NEWEST versions of MPI and PVM. Shop around: there are many implementations of each, and some are faster than others. Don't assume that Debian's is the latest OR the greatest. It's probably the most generic, or simply the first one that was cleared for inclusion.
    4. Optimise your software. Compile everything to be fast, small, and to do exactly what you want. That includes all the network code, the kernel, etc.
    5. For maths stuff, go for a CPU that's good at the maths you're going to do. The raw MHz rate is no use to you. If you're doing FPU stuff, what's the FLOPS rating? THAT is what matters.
    6. Go for the fastest networking cards your software will drive -usefully-. E.g., there -are- Gigabit Ethernet cards around, but since you can't drive them at that speed on a PC, there's not much point.
    7. Streamline hardware AND software. Minimise distances of cable. (Yes, that DOES make a difference.) Don't install =anything= you don't absolutely need.
    8. If you're using Pentiums or better, install PGCC and compile everything to work with the best optimisation you can get away with.
    9. If you've got the space, the guts, and the technical know-how, don't be afraid of supercooling and overclocking. Again, it DOES make a difference.
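
To see why the clock rate alone misleads, note that theoretical peak floating-point throughput is the clock rate times the number of floating-point results per cycle times the number of CPUs. The two chips in this Python sketch are hypothetical, not real parts, and sustained FLOPS on your own code, measured with a benchmark, is what really counts:

```python
def peak_mflops(clock_mhz: float, flops_per_cycle: float, cpus: int = 1) -> float:
    """Theoretical peak in MFLOPS; real code sustains only part of this."""
    return clock_mhz * flops_per_cycle * cpus


if __name__ == "__main__":
    # Hypothetical chips: A has the higher clock, B does more FP work per cycle.
    chip_a = peak_mflops(clock_mhz=450, flops_per_cycle=1)
    chip_b = peak_mflops(clock_mhz=300, flops_per_cycle=2)
    print(chip_a, chip_b)  # 450 600
```

The lower-clocked chip wins here, which is exactly the point about MHz versus FLOPS.
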
  • A friend of mine put together a cluster at our high school. He did things a bit differently:

    1. Custom-built
    Definitely the way to go. You can get nice machines for a few hundred bucks each. But put cheap video cards in; it makes maintenance much easier, and some motherboards may not boot without them. Spare ISA ones that are lying around should do the trick, since you'll never be taking them out of text mode. The machines we got had cheapie video cards on the motherboard, which was fine.

    2. HUB
    Switched all the way.

    3. Versions
    If you want the latest versions, use the ones from Debian potato (unstable) :)

    5. CPU
    I recommend Celeron 450a's (300a's OC'd to 450MHz). You'll need a nice motherboard that will let you set core CPU voltages (or some Celerons may not OC, which happened to us). But there are some relatively inexpensive dual motherboards that let you set core voltage, and Celerons and slockets are still pretty cheap (our machines were before the slockets came out, so they're single CPUs).
    Celerons are far faster than equivalently clocked K6's, so go with Intel unless you want to spring for K7's (now that would be slick!).

    6. NICs
    I'm not sure what the gigabit advantage would be; it probably depends on what you're crunching. Obviously, if computation time is high relative to data quantity, you're fine. But gigabit equipment is expensive. Consider ATM equipment: it's fast and cheap. ATM switches are way cheaper AFAIK, and there are a couple of ATM boards supported by Linux. It's ideal for this kind of application, since only the head node (which would then need another NIC) needs to talk to the outside world.

    8. PGCC
    I've seen no indication that
    a) Pentium-optimized code is particularly better (and I suspect its stability ...), or
    b) it's faster at all on PPro-based chips. Optimizing for the PPro and for the Pentium are two very different things. I'd just go with a standard Linux distro; it'll make your life easier.
    It's irrelevant anyhow, since your code will make all the difference. It might be worth playing with different compilers to see what makes your stuff go fastest. Post the results for the rest of us!

    9. Overclocking
    Any moron can OC a Celeron 300a to 450MHz with a decent motherboard. Beyond that takes guts and skill -- and may not be worth it, since a decently sized cluster (ours was 16 machines) will start to show some variance in chips -- as we found out. On the upside, the load balancing software should be able to compensate just fine if you have a few dud nodes. Many Beowulf clusters are heterogeneous.
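
One simple way to keep a mixed set of nodes busy is to hand each node a share of the work proportional to its measured speed. This is a static scheme chosen for illustration; real load-balancing packages are usually more dynamic. In the Python sketch below the node names and speed figures are hypothetical (for example, the result of timing one work unit on each node):

```python
def split_work(total_items: int, node_speeds: dict[str, float]) -> dict[str, int]:
    """Divide total_items among nodes in proportion to their speeds.
    Largest-remainder rounding makes the shares add up exactly."""
    if not node_speeds or any(s <= 0 for s in node_speeds.values()):
        raise ValueError("need at least one node, all with positive speed")
    total_speed = sum(node_speeds.values())
    exact = {n: total_items * s / total_speed for n, s in node_speeds.items()}
    shares = {n: int(x) for n, x in exact.items()}
    leftover = total_items - sum(shares.values())
    # Give the items lost to rounding down to the nodes with the largest remainders.
    by_remainder = sorted(exact, key=lambda n: exact[n] - shares[n], reverse=True)
    for n in by_remainder[:leftover]:
        shares[n] += 1
    return shares


if __name__ == "__main__":
    # node03 is a "dud" that would not overclock and runs at stock speed.
    speeds = {"node01": 450.0, "node02": 450.0, "node03": 300.0}
    print(split_work(10, speeds))  # {'node01': 4, 'node02': 4, 'node03': 2}
```

The slow node simply gets less work, so all three should finish at roughly the same time.
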
  • "I recommend a Cisco Catalyst 100Mbit switch"

    While the Cisco products are damned good, they are fairly expensive. There are other 100Mbit switches out there that will work fine and are considerably cheaper.

    "Run Fast Ethernet (preferably 3Com) cards in EVERY node"

    While your advice to use 100Mbit NICs is correct, you might want to rethink your choice of brand. Look at what the people building big Beowulf clusters are using: it's usually cards based on the Tulip chipset. I don't know whether more recent models are better or the drivers have improved lately, but not that long ago the general consensus seemed to be that 3Com 100Mbit cards were reliable but disappointing performers. At any rate, you can get a Tulip- (or PNIC-) based card like the D-Link DFE-500TX, the Bay Networks NetGear FA310TX, or the Linksys (I forget the model number; they make two 10/100 cards, one of which is a semi-NE2000 clone and should be avoided, while the other uses a PNIC2 and is the one to get) for under $50, whereas the 3C905 series cards are generally $70+.

  • "2. HUB. Definitely. They're faster for this than routers, and are a good choice."

    Hubs allow packet collisions, which will KILL a Beowulf. Use a switch. Switches don't -always- eliminate collisions, but at least more than one node can transmit at a time. Hubs are totally shared, so if more than one computer tries to transmit at once, it'll go ackbewm.

    I recommend a Cisco Catalyst 100Mbit switch. Run Fast Ethernet (preferably 3Com) cards in EVERY node. Channel bond if necessary.

    Having come from a 450-resident dorm with a counted 600 active computers on a flat network (totally shared, no switches or routers), I can tell you hubs are -slow- for more than 3 or 4 systems.
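
A back-of-the-envelope model shows why the shared medium hurts. In the Python sketch below, several nodes each send the same amount of data to a different partner at the same moment: on a hub they all share one 100 Mbit/s segment, so the transfers effectively take turns, while a switch gives each port its own link. This is an idealized assumption that ignores collisions (which make a busy hub even worse), protocol overhead and switch backplane limits:

```python
def transfer_seconds(senders: int, megabytes_each: float,
                     link_mbit: float = 100.0, switched: bool = True) -> float:
    """Idealized time for `senders` nodes to each push `megabytes_each` MB
    (10**6 bytes) to distinct partner nodes at the same moment."""
    seconds_alone = megabytes_each * 8e6 / (link_mbit * 1e6)
    # Switch: disjoint pairs transfer in parallel. Hub: one shared segment.
    return seconds_alone if switched else seconds_alone * senders


if __name__ == "__main__":
    for label, switched in (("switch", True), ("hub", False)):
        print(label, transfer_seconds(16, 10.0, switched=switched))
```

With 16 senders pushing 10 MB each, this prints 0.8 seconds for the switch and 12.8 seconds for the hub, and the gap grows linearly with the number of busy nodes.
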

  • Have you worked with the 2.2.12 kernel? I had no problem (that patience wouldn't solve) getting the clients to boot over NFS with Slackware 4.0, but I have been unsuccessful under Slackware 7.0.

    I know the problem is not in the server setup: I can still connect with the old boot disks, but I get a failed root mount with the Slackware 7.0 boot disks.

    I upgraded because MOSIX will run under Slackware 7.0 with very few alterations to paths.
  • If you are looking at using your existing apps, a traditional Beowulf setup will not help.

    Try using MOSIX: it can spread the load across the various machines even if the software is not written for it. That is, if you can convince it to run; it is very picky about kernel versions, file locations, etc.
  • I noticed that everyone thinks "Beowulf" when someone mentions clusters. What about other forms of clustering?
    What about MOSIX, for instance? Does it work as well as advertised? It sounds very cool; doesn't anyone experiment with it and share their conclusions?
    And what about clustering for 24/7 operation?
    No, I can't spell!
    -"Run to that wall until I tell you to stop"
    (tagadum,tagadum,tagadum .... *CRUNCH*)
    -"stop...."
  • Oops... I just read the post more carefully. Well, I have no experience with diskless nodes; that certainly would make a lot of sense for a number of reasons...

    Video cards: if you want to use a KVM switch for administration, you'll need them. In that case, go for cheap cards for the computing nodes and get a top-of-the-line one for the front node.

    I recently saw [linuxshowcase.com] VA Linux's [valinux.com] setup for a Linux cluster. Their kick-ass features are administration through a serial console (no need for more expensive KVM switching) and direct manipulation of each node's BIOS from the front node (no CPU cycles spent on that). OTOH, I was not impressed by the top specs of their 2U systems (2 PIIIs, 1 GB RAM total): some of the code I am porting needs ~1 GB per CPU, and that's not top-of-the-line code either. Plus, 2-CPU systems mean more rack space (as opposed to quads); however, SMP performance under Linux still has a long way to go, so YMMV...

    A final note: be careful about disk I/O. The common solution on MPP machines is to have each process write to disk; in Linux clusters, this usually means writing to some NFS-mounted partition. Well, the NFS implementation (and I am not sure here whether it's NFS in general or Linux's version) cannot keep up with a lot of reads/writes to the same space... Either change the way you do I/O (one process handles everything) or clone the filesystems and use local space (a bitch to maintain, though)...

    Beowulfs are cool, but we are not nearly commercial-level usability here... what did the old maps say, "monsters here"? ;-)...
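
The "one process handles everything" option is a gather-to-one-writer pattern. In this Python sketch, threads stand in for the compute nodes and a queue stands in for the network; on a real cluster the workers would send their results to a head-node process (for example over MPI). The function names and the toy computation are hypothetical:

```python
import queue
import threading


def compute_node(rank: int, results: queue.Queue) -> None:
    """Stand-in for one compute process: crunch numbers, then hand the
    result to the writer instead of opening the shared output file itself."""
    value = sum(i * i for i in range(rank * 1000, (rank + 1) * 1000))
    results.put((rank, value))


def run_single_writer(n_nodes: int, path: str) -> list[tuple[int, int]]:
    results: queue.Queue = queue.Queue()
    workers = [threading.Thread(target=compute_node, args=(r, results))
               for r in range(n_nodes)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    # Gather everything, then only this function touches the file:
    # one writer, so no contention on a shared (e.g. NFS) mount.
    rows = sorted(results.get() for _ in range(n_nodes))
    with open(path, "w") as f:
        for rank, value in rows:
            f.write(f"{rank} {value}\n")
    return rows
```

Calling run_single_writer(4, "results.txt") writes four lines, one per node, all from a single writer.
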


  • Look at the Beowulf Underground [beowulf-underground.org] for an excellent compilation of links, resources and software.

    Also, if you're not going to need some of the more specialized software (kernel patches, Ethernet channel bonding and the like, which usually come as RPMs) but just want to run generic MPI or PVM, I'd go with Debian next time instead of RH 6.0, purely for maintenance reasons.

    The big question you have to ask yourself, though, is what kind of application you want to run; build the cluster to match it. If you don't have an application in mind, then you probably don't need a Beowulf...

  • Here's a book for you.
    How to Build a Beowulf (a guide to the implementation and application of PC clusters)
    By Thomas L. Sterling, John Salmon, Donald J. Becker and Daniel F. Savarese. ISBN 0-262-69218-X
    From MIT Press
    Published 1999

    I've read this book, and although some of it is very basic (such as how to install Linux), it is a very informative book, well referenced throughout, if you are interested in building clusters on other OSes as well.
  • SETI@home and distributed.net are pretty useless for a Beowulf: they're already designed to be distributed over multiple machines. You wouldn't see any performance increase from a Beowulf-aware version of either of them.
  • Hello, I recently came across 40 Compaq 486s (hey, they were free :). I have a leftover 12-port 10 Mbit/s 3Com hub, lots of Cat 5 cable, and all the 486s have NICs. I would like to mess with Beowulf to get some experience. My question is: would it be possible to build a general-purpose Beowulf computer that runs apps like Netscape, WordPerfect, and the GIMP really fast, or should I stick to doing something like SETI@home or RC5? Eric
