I assume this is an epic troll, but am going to give an honest answer anyway, because there are some legitimate questions buried in there.
I work with aggregate.org, a university research group which has a decent claim to having built the very first Linux PC cluster, set some records with them (KLAT2 and KASY0 were both ours), and still operates a number of Linux clusters, including some containing GPUs, so I feel like I have some idea of the lay of cluster technology. We also maintain TLDP's widely circulated Parallel Processing HOWTO, which was the go-to resource for this kind of question for some time; it is *way* overdue for an update (and one is in progress, we swear!).
In a cluster of any size, you do _not_ want to be handling nodes individually. Every organization with a large number of machines needs a tool to avoid doing so, and there are several popular provisioning and administration systems. The clusters I deal with are mostly provisioned with Perceus, with a few ROCKS holdovers, and I'm aware of a number of other solutions (xCat is the most popular one that I've never tinkered with). Perceus can pass out pretty much any correctly configured Linux image to the machines, although it is specifically tailored to work with Caos NSA (Red Hat-like) or GravityOS (a Debian derivative) payloads. Infiscale, the company that supports Perceus, releases the basic tools and some sample modifiable OS images for free, and makes its money off support and custom images, so it is a pretty flexible option in terms of required financial and/or personnel commitment. The various provisioning and administration tools are generally designed to interact with various monitoring tools (e.g. Warewulf) and job management systems (see next paragraph).
Accounting and billing users is largely about your job management system. Our clusters aren't billed this way, so I can't claim to be closely familiar with the tools, but most of the established job management systems, like Slurm and GridEngine (to name two of many), have accounting systems built in.
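
To give a feel for what that accounting boils down to, here is a minimal Python sketch that charges users by core-hours. Everything in it is an assumption for illustration: the record layout `(user, cores, elapsed_seconds)`, the sample jobs, and the flat rate are all made up, and a real scheduler's accounting export will look different.

```python
from collections import defaultdict

# Hypothetical accounting records: (user, cores, elapsed_seconds).
# Real job managers record similar fields, but the export format differs.
JOBS = [
    ("alice", 16, 7200),
    ("bob", 64, 1800),
    ("alice", 8, 3600),
]

RATE_PER_CORE_HOUR = 0.05  # assumed flat price, purely illustrative


def core_hours_by_user(jobs):
    """Sum cores * elapsed hours for each user."""
    totals = defaultdict(float)
    for user, cores, elapsed_s in jobs:
        totals[user] += cores * elapsed_s / 3600.0
    return dict(totals)


def invoice(jobs, rate=RATE_PER_CORE_HOUR):
    """Turn per-user core-hours into a charge at a flat rate."""
    usage = core_hours_by_user(jobs)
    return {user: round(hours * rate, 2) for user, hours in usage.items()}


for user, charge in sorted(invoice(JOBS).items()):
    print(f"{user}: ${charge:.2f}")
```

In practice you would feed this from whatever your job manager's accounting database exports rather than a hard-coded list, and the pricing policy (per core, per node, per GPU) is a local decision.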
The "standard" images or image-building tools provided with the provisioning systems generally provide for a few nicely integrated combinations of tools, which make it remarkably easy to throw a functioning cluster stack together.
As for GPUs... be aware that the claimed performance for GPUs, especially in clusters, is virtually unattainable. You have to write code in their nasty domain-specific languages (CUDA or OpenCL for Nvidia, just OpenCL for AMD) and there isn't really any concept of IPC baked into the tools to allow for distributed operations. Furthermore, GPUs are generally extraordinarily starved for memory and memory bandwidth (remember, the speed comes from there being hundreds of processing elements on the card, all sharing the same memory and interface), so simply keeping them fed with data is challenging.
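
The bandwidth starvation is easy to put numbers on with a back-of-the-envelope roofline bound. The sketch below is Python, and the hardware figures are assumptions: they are the commonly quoted spec-sheet numbers for a Radeon HD 5770-class consumer card (about 1360 GFLOPS single precision, about 76.8 GB/s memory bandwidth), not measurements. The SAXPY kernel is just a convenient stand-in for a streaming workload.

```python
def attainable_gflops(peak_gflops, bandwidth_gb_s, flops_per_byte):
    """Simple roofline bound: capped by compute or by memory traffic."""
    return min(peak_gflops, bandwidth_gb_s * flops_per_byte)


def break_even_intensity(peak_gflops, bandwidth_gb_s):
    """Flops per byte a kernel needs before compute becomes the limit."""
    return peak_gflops / bandwidth_gb_s


# Assumed spec-sheet figures for one consumer card (not measured).
PEAK_GFLOPS = 1360.0
BANDWIDTH_GB_S = 76.8

# Single-precision SAXPY (y = a*x + y): 2 flops per element,
# 12 bytes of traffic per element (read x, read y, write y).
SAXPY_FLOPS_PER_BYTE = 2 / 12

bound = attainable_gflops(PEAK_GFLOPS, BANDWIDTH_GB_S, SAXPY_FLOPS_PER_BYTE)
needed = break_even_intensity(PEAK_GFLOPS, BANDWIDTH_GB_S)

print(f"SAXPY bound: {bound:.1f} GFLOPS ({100 * bound / PEAK_GFLOPS:.1f}% of peak)")
print(f"Intensity needed to reach peak: {needed:.1f} flops/byte")
```

Under those assumptions a streaming kernel tops out around 1% of the advertised peak, and that is before any data has to cross the PCIe bus or the cluster network.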
GPGPU is also an unstable area in both relevant senses: the GPGPU software itself has a nasty tendency to hang the host when something goes wrong (which is extra fun in clusters without BMCs), and the platforms are changing at an alarming clip. AMD is somewhat worse in the "moving target" regard: they recently dropped GPGPU tool support for all 4000 series cards, and abandoned their CTM, CAL, and Brook+ environments before settling on OpenCL, and only OpenCL. Nvidia still supports both their CUDA and OpenCL environments, and (with some caveats) all the cards they have ever claimed to work for compute can still be used. Offsetting the somewhat easier and more flexible software situation on the Nvidia side, the AMD cards tend to offer peak FLOPS/dollar numbers something like 4x what the Nvidia cards can provide, which makes the various parts surprisingly well matched.
Note that the difference between the special compute hardware ("Tesla" and "Firestream") and consumer cards tends to be that the compute parts have a little more memory and are enormously more expensive, so the consumer cards are way ahead in terms of FLOPS per dollar. We're currently speccing out a 64-node cluster hosting Radeon HD5770s that will (in theory) peak a little above 85 TeraFLOPS of GPU performance for less than $10k in GPUs.
To head off a common "oops" moment: it sounds as though your machines will be "server style" (i.e. rackmount, high reliability PSUs, etc.), which can be a challenge, since that kind of system is generally not designed for hosting physically enormous, power hungry PCIe cards like GPUs.
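
For anyone wanting to redo the peak arithmetic for their own parts list, here is a small Python sketch. The 64 nodes and the $10k ceiling come from the figures above; one GPU per node and 1.36 TFLOPS per card (the commonly quoted single-precision peak for an HD 5770) are assumptions, so swap in your own numbers.

```python
NODES = 64                  # from the cluster described above
GPU_BUDGET_USD = 10_000     # upper bound quoted above
GPUS_PER_NODE = 1           # assumption: not stated
PEAK_TFLOPS_PER_GPU = 1.36  # assumption: vendor single-precision peak


def cluster_peak_tflops(nodes, gpus_per_node, tflops_per_gpu):
    """Theoretical aggregate peak; real code will see far less."""
    return nodes * gpus_per_node * tflops_per_gpu


def max_price_per_gpu(budget_usd, nodes, gpus_per_node):
    """Most each card can cost while staying within the budget."""
    return budget_usd / (nodes * gpus_per_node)


peak = cluster_peak_tflops(NODES, GPUS_PER_NODE, PEAK_TFLOPS_PER_GPU)
price_cap = max_price_per_gpu(GPU_BUDGET_USD, NODES, GPUS_PER_NODE)
gflops_per_dollar = peak * 1000 / GPU_BUDGET_USD

print(f"Theoretical peak: {peak:.2f} TFLOPS")
print(f"Price cap per GPU: ${price_cap:.2f}")
print(f"At least {gflops_per_dollar:.1f} peak GFLOPS per dollar")
```

Keep in mind this is the marketing number; sustained throughput on real cluster workloads is a small fraction of it.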
The questions posed in the OP are *very* early issues in the planning process for setting up a cluster, but enjoy your journey into the woods; this stuff is fun.