They are going to have to get their electricity from somewhere & generating capacity don't grow on trees.
Unless they burn, um, err, apples? Yes, APPLES!
Man, I'm awesome!
Compositing using OpenGL 1.x
So what is suggested here is to delete support for compositing using OpenGL 1.x.
Personally, I can hardly blame the developer for wanting to prune that list a bit.
And, if you don't want to see this feature deleted, now is your opportunity to step up to the plate and contribute!
What you're looking for is the Green500 list [green500.org]
Indeed, but the site was down when I wrote my previous reply so I had to resort to the top500 list and calculating flops/watt for the few top entries manually.
In any case, as one can see from the list, the best GPU machine manages to beat the K machines by a factor of 1.66, a far cry from the factor of 3-6 you originally claimed. And most GPU machines fall behind the K.
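For anyone who wants to redo that kind of comparison by hand, here is a rough sketch of the arithmetic in Python. The machine figures below are made-up placeholders, not actual Top500 or Green500 entries, chosen only so the ratio comes out near the 1.66 mentioned above; the function and variable names are mine.

```python
def gflops_per_watt(rmax_tflops: float, power_kw: float) -> float:
    """Energy efficiency in GFLOPS/W, given Rmax in TFLOPS and power in kW.

    TFLOPS / kW = (1e12 FLOPS) / (1e3 W) = 1e9 FLOPS per W, i.e. GFLOPS/W,
    so no extra scaling factor is needed.
    """
    return rmax_tflops / power_kw


# Placeholder numbers for illustration only (NOT real list entries).
gpu_machine = gflops_per_watt(rmax_tflops=1000.0, power_kw=500.0)
cpu_machine = gflops_per_watt(rmax_tflops=2000.0, power_kw=1660.0)

# How many times more efficient the first machine is than the second.
ratio = gpu_machine / cpu_machine
print(f"{gpu_machine:.2f} vs {cpu_machine:.2f} GFLOPS/W, ratio {ratio:.2f}")
```

Plug in the Rmax and power columns from the list for the machines you care about and compare the ratios.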
I think the SPARC64 VIIIfx is quite impressive; it gets very good flops/watt without being a particularly exotic design. Basically it's just a standard OoO CPU with a couple of extra FP units and lots of registers, clocked a little lower than usual. No long vectors with scatter/gather memory ops, no GPUs, no low-power, very slow embedded CPUs like the Blue Genes, etc.
I have no knowledge of the design tradeoffs of the individual systems, but I'd say that it's fairly impressive that both the top500 and the Green500 have so many GPUs in the top 10, given that they're both CPU-dominated lists.
Large GPGPU clusters are still a relatively new phenomenon; give it a few years and I suspect you'll see a lot more of them.
How much does computer time on these things cost? How is the cost calculated? Is time divided up something like how it's done on a large telescope, where the controlling organization gets proposals from scientists, then divvies up the computer's available time according to what's been accepted?
At the supercomputer centers I'm familiar with, scientists write proposals which are evaluated by some kind of scientific steering committee that meets regularly (say, once per month) and gives out a certain amount of CPU-hours depending on the application.
Do they multi-task (run more than one scientist's program at one time)?
Yes. Typically the users write batch scripts requesting the amount of resources their job needs. E.g. "512 cores with at least 2 GB RAM/core, max runtime 3 days", and then they submit the batch job to a queue. At some point when there are enough free resources in the system, the batch scheduler launches the job. When the job finishes (or during its runtime) the usage is then subtracted from the quota they were awarded in the application process.
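To make the accounting concrete, here is a small Python sketch of how a request like the one above might be charged against a quota. This is only an illustration: the class and function names are hypothetical, the 100,000 core-hour quota and 50-hour runtime are invented, and the billing rule (actual runtime, capped at the requested limit) is an assumption. Real centers differ, and some bill the requested walltime instead.

```python
from dataclasses import dataclass


@dataclass
class JobRequest:
    """Resources a batch script asks for (field names are illustrative)."""
    cores: int
    mem_gb_per_core: float
    max_runtime_hours: float

    def total_mem_gb(self) -> float:
        # Aggregate memory the scheduler has to find across all cores.
        return self.cores * self.mem_gb_per_core

    def max_core_hours(self) -> float:
        # Worst case charge if the job runs right up to its time limit.
        return self.cores * self.max_runtime_hours


def charge_quota(quota_core_hours: float, job: JobRequest,
                 actual_runtime_hours: float) -> float:
    """Return the quota left after the job has run.

    Assumption: usage is billed on actual runtime, capped at the requested
    maximum (the scheduler kills the job at that point).
    """
    billed_hours = min(actual_runtime_hours, job.max_runtime_hours)
    return quota_core_hours - job.cores * billed_hours


# "512 cores with at least 2 GB RAM/core, max runtime 3 days"
job = JobRequest(cores=512, mem_gb_per_core=2.0, max_runtime_hours=3 * 24)

# Hypothetical award of 100,000 core-hours; the job finishes after 50 hours.
remaining = charge_quota(100_000, job, actual_runtime_hours=50)
print(job.total_mem_gb(), job.max_core_hours(), remaining)
```

The point is mostly that a single three-day, 512-core job can eat a large share of a modest allocation, which is why the proposals get reviewed.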
Does the computer run at top power (10 PFLOPS) at all times, or does the resource usage go up and down?
Usually all functioning nodes are running and available for use, yes. Typically load is around 80-90% of maximum, due to scheduling inefficiencies etc. (e.g. a large parallel job needs to wait until there are enough idle cores before it can start, and so forth).
And lastly, how hard is it to write programs to run on these things? Do the scientists do it themselves, and if so, do the people who run the supercomputer audit the code before it runs?
Pretty tricky. Usually they use the MPI library. The programs are either written by the scientists themselves, or by other scientists working in the same field. The supercomputing center typically doesn't audit code, but may require the user to submit scalability benchmarks before allowing the user to submit large jobs. For some popular applications the supercomputing center may maintain a version themselves (so each user doesn't need to recompile it) and provide some more or less rudimentary support.
There are a few efforts to bring similar capability to Ethernet as well, namely TRILL and 802.1aq; AFAIK neither is ratified at the time of writing.
The way I read it, it means that the nodes have four 1 GbE ports built into the MB. If you're going to use IB, you'll buy separate PCIe IB cards for each node. The 1 GbE ports can then be used to run management traffic etc. Or left unused; there's no law saying you have to use them all, and since 1 GbE ports are practically free it's not like you're leaving any money on the table either.
Wrt RDMA over Ethernet, iWARP this and that, yes I know it exists. My point was that RDMA has been supported on IB since day 1, and the software stack is mature and widely used, which can't be said for Ethernet RDMA. Since IB infrastructure is so far cheaper, there's really no reason to go with 10GbE.
That's not to say that 10GbE is useless. Of course it's useful, e.g. if you run high-bandwidth services over TCP/IP accessible from outside the cluster. But that's not what you're doing on a cluster. A cluster interconnect is typically used for MPI and storage, both of which can run over RDMA, avoiding the comparatively heavyweight TCP/IP stack. And of course, at some point 10GbE will replace 1GbE as the cheap built-in stuff on MBs.
"The four building blocks of the universe are fire, water, gravel and vinyl." -- Dave Barry