
Is There a Use for a Public Beowulf?

Anonymous Coward asks: "If the average Slashdot reader had access to a Beowulf cluster, what would they use it for? Everyone seems to think that Beowulf clusters are fairly interesting, but does anyone have any particular job they would assign to one? If someone were to create a publicly accessible Beowulf cluster, what would you do with it? Is there even a demand for such a beast?" Now, this would be a neat hack, but the logistics behind running such a thing would be immense. Still, even if something like this isn't needed now, that may not hold true in the future. Something like this might be a great tool for that novice astronomer in the neighborhood ... or aspiring mathematicians in high school.
  • by Anonymous Coward
    I've been working on building a free public-access Beowulf cluster for some time now. It's only a 32-node P133 system, but it will be free.

    I'll be submitting a post to slashdot as soon as it's up, but you can keep current by looking at www.ultrax.co.uk [ultrax.co.uk].

    cheers,
    Tim
    --
    tim@ultrax.co.uk
  • "If the average Slashdot reader had access to a Beowulf cluster..."
    they'd want to get together with other Slashdotters with Beowulf clusters and make a Beowulf cluster out of them :)
  • by DaveF ( 13373 )
    This would be expensive, but it could work out. Figure at least $2500 for a decent node: 128MB of RAM or more, maybe a PIII-600 or higher, although you could get off cheap by using Celerons, but there's not a whole lot of L2 cache there. So if you want a huge cluster, let's say 300 nodes, that's $750,000. Now add in racks, switches, etc., and we're talking over $1,000,000. Of course, if the programs just run on the cluster and don't interact with the outside world, as little as a T1 could be sufficient. You also have to figure in monthly administration costs.

    Okay, so you get funding, and let's say it goes over big and you have, say, 50 paying subscribers (we'll shoot small), each paying $10/mo. That's $500/mo. total, which will take a very long time to pay off the investment. I do think, though, that it could work with investment plus some paying subscribers; maybe they get priority over non-paying subscribers.
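
    To put a rough number on "a very long time," here's a back-of-the-envelope sketch in Python using only the figures above (it ignores the monthly admin and bandwidth costs, which only make things worse):

        total_cost = 1_000_000          # 300 nodes plus racks, switches, etc.
        monthly_revenue = 50 * 10       # 50 subscribers at $10/mo

        months = total_cost / monthly_revenue
        print(f"{months:.0f} months, or about {months / 12:.0f} years")
        # -> 2000 months, or about 167 years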
  • If I had a college student build the cases, I could probably see 500MHz nodes with 128MB of RAM and ~3GB disks (100Mbit Ethernet) for about $400 each.
    So a 10-node system would be about $5k,
    and a 20-node system would be $10k
    (upper limits on pricing).
    Hmmmm.
  • Well, I can't really think of anything besides the normal uses, but if a university were to set one up, that would be cool. I set up a temporary 16-node cluster using a CD/Zip drive and NFS for a high school research project (best experience with Linux yet). I just used it for an hour or two to get timings for generating images on different numbers of computers (for usability stats). This all cost me the price of a few CD-Rs, a Zip drive, $2 Linux CDs (which I had) and A LOT OF TIME!! but the experience was great. I might put up the details on my web page once I'm done presenting the project (3wks). It would be cool if I could have access to one 24/7.
    I would probably use it for generating images or something. I have no clue what the average /. user would use it for. It still would be cool to say I have access to computer number 150 on the top 500 supercomputer list.

    Kenny

  • by KevF ( 23545 )
    Sure, you could finish off the distributed.net RC5-64 challenge pretty quickly on one.

    All Ford, All The Time

    FordTalk [fordtalk.co.uk]

  • As I understand it, it is not easy to take full advantage of a Beowulf system. It's not like you just recompile Netscape and run it on a 128-node cluster to get bitchin' performance. No, the app you're running must, first of all, be suited to parallel computing.

    This means that it must have computation-intensive segments which need not be run in a sequential fashion. Anything with large amounts of number crunching is usually a candidate.

    Second, you must rewrite the application using a parallel computing interface (MPI or PVM, typically) to take advantage of this. I've seen it done, and it's not exactly trivial; there's a small sketch of what that looks like at the end of this comment.

    Given all of this, however, I think a public Beowulf could be a wonderful way to introduce high-school students to parallel computing early on. Most high schools could not afford to dedicate even 16 machines to such a task, so having a public cluster available would make it easier. If there were an instructor who understood how to teach it, this could give students who are interested in scientific computing a great head start.

    Having scored at an HS programming contest just yesterday, however, I realize that not all programming instructors are all that clued in.
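
    For the curious, here's a minimal sketch of that kind of rewrite using mpi4py, a Python binding to MPI. The prime-counting job is just a stand-in; the point is that the work has to be split across the nodes explicitly and the partial results combined. It would be run with something like "mpirun -np 16 python count_primes.py":

        from mpi4py import MPI

        def is_prime(n):
            if n < 2:
                return False
            d = 2
            while d * d <= n:
                if n % d == 0:
                    return False
                d += 1
            return True

        comm = MPI.COMM_WORLD
        rank = comm.Get_rank()   # which process this is
        size = comm.Get_size()   # total number of processes in the cluster

        N = 1_000_000
        # Each process takes an interleaved slice of the range...
        local_count = sum(1 for n in range(rank, N, size) if is_prime(n))
        # ...and process 0 sums the partial counts from every node.
        total = comm.reduce(local_count, op=MPI.SUM, root=0)

        if rank == 0:
            print(f"primes below {N}: {total}")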

  • Well, to find a use for a public supercomputer, you would have to think about what large computer clusters are used for. I can think of a few:
    • Mathematical Simulations
    • Video/Image processing
    • Educational tool
    That's all I can really come up with off the top of my head...

    In my opinion, maybe such a system (or project) shouldn't focus so much on supercomputing, but on computing in general. Imagine a system that provides shell accounts with full access to languages, compilers, etc. Perhaps it could give students an easy way to get at programming facilities, UNIX mail, and so on.

    The problem is, most of the current desktop machines are powerful enough to eliminate the need for time-sharing computer power. (At least, for what a "free" public computer would be used for)

    *shrug* Just my opinion...

  • I actually do a lot of numerical modeling, and I do some rather cheesy little things to take advantage of as many CPUs as I can find. But because I'm not a CS student I don't get access to many fast machines. I've been lucky enough that I can break up my problem and fit the data onto a floppy disk, drag it to the computer lab late on a Friday night, beg for 6 machines and let it run a few hours. But to be honest it'd be very nice to get my processing done in minutes, not hours. Right now, my data set takes about 20 hours to process on a Pentium III 550. I've profiled it a lot and that's about the best I can do.
  • You're probably right if you go for unrestrained scheduling. But it is true that scheduling for computer time is about that strict on the resources I've been exposed to here at the U. (Mmmm, Cray T3E)

    Perhaps if some form of subscription service were implemented to help subsidize the costs and regulate user usage times, this would be more feasible. Something like, "$20/month for x hours of time." And there are plenty of batch scheduler systems out there that can queue your job and run it so you don't have to actually be up at 4am on a Monday morning... You could probably even write one that emails you the results when it gets done, which would be spiffy. I know I wouldn't mind spending that little to have access to that kind of computing power. Of course, also offering a mechanism to get free or subsidized time for those who can't afford the subscription cost would be necessary.
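
    As a crude sketch of the "queue it and email me the results" part, something like the following Python would do (the job command, the addresses and the mail server are made-up placeholders; a real installation would sit this behind a proper batch scheduler):

        import smtplib
        import subprocess
        from email.message import EmailMessage

        JOB_CMD = ["mpirun", "-np", "16", "./my_simulation"]   # hypothetical job
        USER = "user@example.com"                              # whoever submitted it

        # Run the job on the cluster and capture its output (blocks until done).
        result = subprocess.run(JOB_CMD, capture_output=True, text=True)

        # Mail the results back to the user once the job finishes.
        msg = EmailMessage()
        msg["Subject"] = f"Beowulf job finished (exit code {result.returncode})"
        msg["From"] = "beowulf@example.com"
        msg["To"] = USER
        msg.set_content(result.stdout[-10000:] or "(no output)")

        with smtplib.SMTP("localhost") as smtp:
            smtp.send_message(msg)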



  • First up, you need a stock of (let's say 32) reasonably powerful PCs that already earn revenue, or whose cost is already covered by some funding. They would typically be dual-boot WinDoze/Linux boxes: 400MHz, 128MB of RAM, 6GB for Win, 12GB for Linux.

    Examples I can think of or have seen are:

    • those in the basement of a university library where students type end-of-semester papers
    • those used for general office/secretarial work in a company
    • those in an internet cafe

    You get the idea.

    These PCs would be in use from, say, 9h00 to 21h00 running WinDoze. Even a careless or ill-intentioned user would probably be incapable of damaging the Linux partitions from within Billy Boy's "OS". Then, at 21h00, the night shift takes over: the machines are rebooted into Linux and become nodes in a Beowulf cluster.

    There's a job-queueing server behind the firewall; users connect via ftp and drop their jobs in the incoming directory at any time. During the night, the job-queueing server submits these jobs to the Beowulf, and users later collect their results from the outgoing directory. (A rough sketch of such a queue runner is at the end of this comment.)

    I'm sure there are many, many establishments with PCs sitting around idle most of the time. I work in a company where the desktop PCs are used 8h00 - 20h00, Monday to Friday. That leaves 12 hours per night, plus all 48 hours at the weekend, i.e. 108 hours per week available.

    The point is, these scenarios take PCs that are already paid for and use time that is usually lost. The only extra cost is the time it takes to set up the Beowulf cluster and implement the job-queueing server. There might not be any need to have an operator there during Beowulf operations, just during the boot phase.
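
    Here's roughly what that nightly queue runner might look like, as a Python sketch (the directory names, and the assumption that each submitted script invokes mpirun itself, are placeholders rather than a finished design):

        import shutil
        import subprocess
        from pathlib import Path

        INCOMING = Path("/var/beowulf/incoming")   # filled via ftp during the day
        OUTGOING = Path("/var/beowulf/outgoing")   # users fetch their results here

        for job_script in sorted(INCOMING.glob("*.sh")):
            log_file = OUTGOING / (job_script.stem + ".log")
            # Run the submitted script; it is expected to call mpirun itself.
            with log_file.open("w") as log:
                subprocess.run(["sh", str(job_script)],
                               stdout=log, stderr=subprocess.STDOUT)
            # Move the script aside so it isn't run again the next night.
            shutil.move(str(job_script), str(OUTGOING / job_script.name))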

  • You'd need to develop some sort of queueing system so that users could submit their jobs, specifying the number of processors, length of calculation, etc., and then the batch system could run the jobs optimally. At least that's how they do it on the supercomputers I have used.

    What would be neat is if a LUG teamed up with a school district and developed a system at a particular high school that would be a resource for the whole district to use for science projects or whatever. The LUG could also help develop a curriculum that could be taught in workshops at the different schools.

  • My junior/senior year, the local LUG set up a 5-node Beowulf. Nice little system, but I never saw more than one person at a time use it.

    If 10 people all want to use it at the same time, what happens? They're all fighting for resources (RAM/CPU). May as well do it on your desktop. Or am I missing something?

    A public-access Beowulf would need either a LOT of nodes, or very strict user scheduling (user A gets 2 hours on Tuesday, user B gets 4 hours on Wednesday, etc.).

  • Well, in addition to being a computer geek, I love chemistry. Problem is, most chemical simulation software (as opposed to pretty opengl visualization software) is either very expensive or very memory/cpu cycle hungry (model water on your PC, no problem, model a 40,000 carbon biopolymer, watch your Athlon go up in smoke along with your RAM... ;^) ), or (very frequently) both.

    But if some entity (.gov or .edu) had an open access beowulf with things like NAMD, Gaussian, Molpac, Moldy, (etc etc etc) loaded on it, that would allow the chemically-inclined members of the populace to actually get real data right now instead of having to get a PhD in order to have access to a {Beowulf | Cray | whatever}.

    Another option immediately presents itself: massively parallelized POV-Ray. :^) For making pretty pictures of the molecule you just spent 5,000 CPU-hours modeling. Or 3d renderings of Natalie Portman's ass covered in hot grits, if you want to skip the chemistry bit...


