
Building Your First Cluster?

An anonymous reader asks: "I'm interested in building a DIY cluster using Linux and will be using conventional Linux software. However, the number of possible ways to do this is huge. Aside from Beowulf, there's Mosix, OpenMosix, Kerrighed, Score, OpenSSI and countless others. Therein lies the problem. There are so many ways of clustering, development seems to be in fits and starts, most won't work on recent Linux kernels and there's no obvious way to mix-and-match. What have other people used? How good are the solutions out there?"
This discussion has been archived. No new comments can be posted.

  • Re:Well, uh... (Score:5, Insightful)

    by kimvette ( 919543 ) on Wednesday July 26, 2006 @09:25PM (#15788091) Homepage Journal
    "massively SMP" does not provide fault tolerance and does not eliminate certain bottlenecks such as disk I/O and network throughput, so if it's for an extremely high volume/high availability fileserver, mail server, or web server, massive SMP isn't going to cut it.

    Also, I'd go with a render farm (if that's the task) if I had to choose between clustering and SMP, because if one node dies (depending on the managing application) the job just continues, whereas on one single monster machine with no fault tolerance, if the job dies you often have to start rendering again from the beginning. Not fun.

    So let's back up and ask:

      1. What problem are you trying to solve?

      2. If it's a learning experience, try them all, and take notes on which suits you best for tasks a, b, and c.

      3. What are your priorities?
  • by Fishbulb ( 32296 ) on Wednesday July 26, 2006 @09:55PM (#15788243)
    Clusters need special software to take advantage of distributed computing. They are built with a specific task in mind. Or do you already have a need and just failed to tell us?

    And specifically, is this a processing cluster or a failsafe cluster? I kind of assume a processing cluster, since that's what most people on slashdot refer to as a cluster, but in my experience most of the clusters out there are failsafe clusters ("5 9's" of service versus raw horsepower). Two rather different applications of clustering, requiring different design philosophy and sometimes different clustering applications.

    OB Monty Python ref:
    "I don't know what you mean, an African cluster or a European cluster?"

  • by tcpipgeek ( 991310 ) on Wednesday July 26, 2006 @10:26PM (#15788414)
    I am pretty much summarizing what has already been mentioned. "Cluster" is an ambiguous term that means tons of different things. You really need to specify what you want to achieve before any meaningful suggestion can be provided.

    This is a good reference:

    http://linuxclusters.com/compute_clusters.html [linuxclusters.com]

  • better reason (Score:2, Insightful)

    by tezbobobo ( 879983 ) on Wednesday July 26, 2006 @11:27PM (#15788721) Homepage Journal
    Come on, people. It really saddens me that the only reasons people can think of for doing this are rendering, compiling, and coolness. Maybe, and I'm wishing more than expecting, the guy is compiling a new breed of kernel for super gaming. I think the most fun thing to do now is assume he is doing it from a gaming point of view and move into fun, speculative hypothesising. If it doesn't help the poor guy, then at least it may give him some much cooler ideas.
  • Beowulf! (Score:2, Insightful)

    by sp1nm0nkey ( 869235 ) on Thursday July 27, 2006 @03:32AM (#15789425)
    What do you want to DO with it? To get one thing straight, Beowulf is a distribution, and bproc/mosix/lam/mpich are ways of getting apps to communicate over a cluster. Which technology you use is going to depend on the app. If the app is written for MPICH, you have to use MPICH. If it's written for Bproc, you need to use Bproc. If you're writing it yourself, look around at the various technologies and see which one you like the most. MPI is a smallish layer above sockets that allows you to explicitly pass messages from cluster node to cluster node. Bproc allows a program to fork across the nodes of a cluster and then join back together.

    For just getting started (and I'm probably biased since I work for Scyld), Beowulf is awesome! The latest distro is about to go beta fairly soon. It installs on top of Red Hat, and right after you install you can power up the nodes; if they're set to PXE boot, they will all come up as compute nodes. It comes with MPICH, Bproc, a few interesting demos (tachyon, a raytracer; and a fractals program), and Linpack. The only bad thing about Bproc is that it has to patch the kernel. However, it works very well.

    I've heard bad things about OpenMosix: it does some fairly questionable things like migrating file descriptors. It's pretty much a hack to get threaded applications never intended to be clustered to distribute across nodes, which is just... not a good thing. Applications should be written to work in a clustered environment. Anyway, have fun!
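    The explicit pass-messages-from-node-to-node model the comment above describes can be sketched in miniature with Python's standard library; this is a hypothetical illustration of the scatter/gather pattern MPI formalizes, not MPICH itself, and the function names are made up for the example:

```python
# Minimal sketch of explicit message passing between "nodes" (local
# processes standing in for cluster nodes). Real MPI adds collectives,
# topologies, and runs across machines; this only shows the pattern.
from multiprocessing import Process, Pipe

def worker(rank, conn):
    # Each "node" receives its chunk of work, computes, and sends back a result.
    chunk = conn.recv()                               # blocking receive, like MPI_Recv
    conn.send((rank, sum(x * x for x in chunk)))      # explicit send, like MPI_Send

def scatter_gather(data, nworkers=4):
    # "Rank 0" scatters interleaved slices to workers, then gathers partial sums.
    conns, procs = [], []
    for rank in range(nworkers):
        parent, child = Pipe()
        p = Process(target=worker, args=(rank, child))
        p.start()
        parent.send(data[rank::nworkers])             # scatter one slice per node
        conns.append(parent)
        procs.append(p)
    total = sum(conn.recv()[1] for conn in conns)     # gather results
    for p in procs:
        p.join()
    return total

if __name__ == "__main__":
    print(scatter_gather(list(range(100))))           # sum of squares 0..99
```

    The point of the sketch is the division of labor: the programmer decides exactly who sends what to whom, which is why an app written against one message-passing layer can't just be moved to another.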
  • by TheRaven64 ( 641858 ) on Thursday July 27, 2006 @08:07AM (#15790022) Journal
    The article submitter seems to have decided two things:

    1. They want to use Linux.
    2. They want a cluster.
    In general, these are the two things that should be decided last. Other posters have addressed the 'why do you actually think you need a cluster' issue, so I will take a look at the 'why do you want to run Linux' bit.

    If what you want is reliability, then nothing beats OpenVMS. You have to pay a premium for hardware that can run it (VAX, Alpha or Itanium only), but if you really need that much reliability then it might well be worth it.

    If you want a compute cluster, then Solaris might be a better bet. Most of the Linux clustering distributions still use (heavily patched) 2.4-series kernels, which leaves you without things like AIO. The Solaris cluster management tools have been progressively refined for well over a decade, while the Linux equivalents are in their infancy.

    Finally, if what you really want is a big computer, then take a look at DragonFly BSD in the next few months. Their aim is to build a system designed for Single System Image clustering (i.e. pretending you have a big NUMA computer, rather than a lot of little nodes) and they have made a lot of progress recently.

    Building a cluster is not too hard. Maintaining one is. Our OpenMOSIX Linux cluster really needs a full-time technician to dedicate 40-60% of their time to it.

  • Warewulf (Score:1, Insightful)

    by Anonymous Coward on Thursday July 27, 2006 @10:01AM (#15790553)

    So far, no one has mentioned Warewulf [warewulf-cluster.org].

    I have built three Warewulf clusters in the past year. I like how lightweight and customizable WW is. It consists of a bunch of scripts that netboot/etherboot/PXE-boot a custom RAM disk as your root file system from a TFTP server (in my case the head node). (The smallest RAM disk we have built is around 10 MB. Everything else can be NFS-mounted, so each of the nodes has the capabilities of a standalone workstation.) From there you can configure it to do whatever you want. (Corollary: if you want to do it all, it may be better to start with one of the heavier cluster distributions like Rocks. I prefer my systems to be lean and mean.) By default it assumes that you are going to run "diskless" [1], but that can be overridden. I use the local disks as swap and /scratch.

    [1] One of the biggest headaches in running a cluster is ensuring that all of the nodes are identical. Configuration creep is one of the banes of administering a cluster. Running diskless is a major advantage because it is easy to bring a node up to date: just reboot. It only takes 30 seconds for my nodes to reboot and be back up. (It would be about 10 seconds shorter if the BIOS would let me turn off the checks for the Promise SATA RAID that I am not using.) Another advantage is that there is no local state, so I can swap out a failed node fast with no hassle.
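    The netboot flow described above (DHCP assigns an address, PXE fetches a bootloader, then kernel and RAM-disk root come over TFTP) can be sketched as a head-node config; this assumes dnsmasq as the DHCP/TFTP server, and the interface name, address range, and paths are hypothetical, not Warewulf defaults:

```
# /etc/dnsmasq.conf sketch: head node serves DHCP + TFTP to compute nodes.
# Interface, range, and paths are made up for illustration.
interface=eth1                       # cluster-facing NIC
dhcp-range=192.168.1.100,192.168.1.200,12h
dhcp-boot=pxelinux.0                 # PXE bootloader handed to nodes
enable-tftp
tftp-root=/var/lib/tftpboot          # holds pxelinux.0, kernel, RAM disk
```

    With something like this in place, a node's entire identity lives on the head node, which is what makes the "just reboot to update" and hot-swap properties in the footnote possible.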

    WW currently works best on Red Hat systems, but ports for Debian and other Linux distributions are in the works. (My last two have been Debian-based, but I had to do a lot of tweaking to get it right.) The community is very active and the principals are very involved in helping you be successful.

    Give it a try.

"If I do not want others to quote me, I do not speak." -- Phil Wayne

Working...