
All about Clustering...
King Monkey asks: "Over the past year or so I have seen several mentions on the Internet about connecting computers together in order to pool processing power and resources. I have not yet, however, seen anywhere that explains the differences between the various implementations. What is the difference (if any) between clustering, Beowulf and Parametric processing? These are just the ones I have heard about. I am sure there are more I have not heard about. I would also like to learn about these."
Clusters (Score:2)
The high-availability cluster is something else entirely. These clusters are built not for speed but for reliability and load distribution. The term usually means a group of machines that behave to the user as if they were one. Kind of like a certain major website that we're on. You generally have one or two traffic servers whose job is to send each request to a computer that meets certain criteria. Perhaps you want load-balancing web servers: the traffic computers would send some requests to one server, some to another, and so on, based on some predetermined criteria. The same mechanism can also be used to make sure no requests go to a dead machine. There is some really good information on this out there, but the most easily digestible is probably at TurboLinux [turbolinux.com] with their High Availability Cluster solution and RedHat [redhat.com] with their Piranha solution.
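A toy sketch of what such a traffic server does, round-robining requests over backends and skipping dead machines (the backend addresses here are made up for illustration; a real product like Piranha does this in the kernel with health checks):

```python
import itertools

class TrafficServer:
    """Toy round-robin dispatcher, like the 'traffic server' described above."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.alive = set(self.backends)          # machines currently healthy
        self._cycle = itertools.cycle(self.backends)

    def mark_dead(self, backend):
        # Stop sending requests to a failed machine.
        self.alive.discard(backend)

    def mark_alive(self, backend):
        self.alive.add(backend)

    def pick(self):
        # Round-robin over all backends, skipping the dead ones.
        for _ in range(len(self.backends)):
            b = next(self._cycle)
            if b in self.alive:
                return b
        raise RuntimeError("no live backends")

# Requests alternate between web1 and web2; if web2 dies, web1 gets them all.
ts = TrafficServer(["web1:80", "web2:80"])
ts.mark_dead("web2:80")
print(ts.pick())  # web1:80
```

The user never sees which backend answered, which is what makes the group of machines "behave as one".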
I know that was oversimplified, but I hope that it helps.
4 Types of Clusters according to LinuxWorld (Score:1)
First off, it disappoints me that these types of questions get posted here, considering this was answered a little over two weeks ago with this article on Slashdot itself: Linux Clusters Explained [slashdot.org]. But hey, what are you going to do?
The reason I can see this coming up on Slashdot is that there really isn't a definitive guide out there that you can just kick back and read. <tangent>You would actually have to look. You know, with a search engine or something. That's right, "King Monkey", search engines exist.</tangent>
Dun Dun DUN!!! That is, until now. According to O'Reilly's site http://www.oreilly.com/catalog/clusterlinux/ [oreilly.com], they have a book in the works, and it should be out sometime in August.
But hey, don't let that stop anyone from actually searching for the information.
Okay, okay I'll stop picking on King Monkey and Cliff. We all love Slashdot anyhow.
High Reliability. (Score:2)
As others have mentioned, there are clusters for supercomputing and for high reliability. I know nothing about the former, so I'll concentrate on the latter:
There are two main ways to deal with failures. One is to keep the data on both hosts consistent at all times; the other is to discover the needed data when the other host fails. There are tricky issues with both, but normally it is obvious which to choose. Sometimes a mixture works, which is what we chose for the product I'm working on.
An example of the all-data-on-both approach is a database. You probably cannot discover at failover time who has appointments when. So you write software to send every write command to two computers, each running your database. (Or, more commonly, you have a dual-ported RAID disk, so that when one computer fails the backup can work with the master's disks.) If your primary computer fails, you can shift quickly and transparently to the backup. Some places will divide reads between the backup and the master (this doesn't work so well with shared disks, but works great with the two-databases approach), so that you never know which computer will get your request. It doesn't matter, though, as both are up to date.
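The "send every write to two computers" idea looks roughly like this (the Replica objects are made-up stand-ins for two database servers; a real setup would use network connections and have to handle partial failures during a write):

```python
class Replica:
    """Made-up stand-in for one database server."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.up = True

    def write(self, key, value):
        if self.up:
            self.data[key] = value

    def read(self, key):
        if not self.up:
            raise ConnectionError(f"{self.name} is down")
        return self.data[key]


class MirroredDB:
    """Every write goes to both replicas; reads fail over to the survivor."""

    def __init__(self, primary, backup):
        self.primary = primary
        self.backup = backup

    def write(self, key, value):
        # Keep both copies consistent at all times.
        self.primary.write(key, value)
        self.backup.write(key, value)

    def read(self, key):
        # Read from whichever machine is up; the caller never knows which.
        for replica in (self.primary, self.backup):
            if replica.up:
                return replica.read(key)
        raise ConnectionError("both replicas down")
```

If the primary dies after a write, the appointment is still readable from the backup, which is the whole point: nothing has to be discovered at failover time.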
An example of discovery is Internet routers. If your Cisco 7000 fails, you pull a backup off the shelf, configure it, and connect the cables. (I don't know about the 7000 series, but for the smaller Cisco routers it is very common to buy two identical ones at a time, configure them identically, connect cables to one, and set the second on top of the first with no cables attached - not even power.) When you connect the backup, it uses the standard routing protocols to figure out the network.
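The contrast with the mirrored approach can be sketched like this: the cold spare keeps no state at all and rediscovers it when brought online (the Network class and its route advertisements are invented for illustration; real routers learn this via protocols like RIP or OSPF):

```python
class Network:
    """Made-up stand-in for the live network and its routing protocol."""

    def __init__(self, routes):
        self.routes = dict(routes)  # destination -> next hop

    def advertise(self):
        # Stand-in for routing-protocol announcements heard on the wire.
        return dict(self.routes)


class SpareRouter:
    """The boxed spare: empty until connected, then learns everything."""

    def __init__(self):
        self.table = {}

    def connect(self, network):
        # On power-up, discover the routing table from the live network.
        self.table = network.advertise()
```

No data is kept in sync while the spare sits on the shelf; the cost is that failover takes as long as discovery does.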
There are more examples. Like I said, in the project I'm working on now both make sense in different areas. Discovery takes longer to take over, but it doesn't have to worry about corrupted data.
Get Greg Pfister's book (Score:1)
It's not Linux specific, but it is a superb overview of the problems and solutions in low-end parallel computing. It also discusses the three favourite solutions (SMP, NUMA and clusters) in depth and goes over their strengths and weaknesses.
--
Cheers