What you're looking for is the Green500 list [green500.org]
Indeed, but the site was down when I wrote my previous reply, so I had to resort to the top500 list and calculate flops/watt for the top few entries manually.
In any case, as one can see from the list, the best GPU machine manages to beat the K machine by a factor of only 1.66, a far cry from the factor of 3-6 you originally claimed. And most GPU machines fall behind the K.
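To put numbers on that: flops/watt is just Rmax divided by power draw. If memory serves, K does about 8.2 PFLOPS while drawing roughly 9.9 MW, which works out to around 830 MFLOPS/W, and the best GPU entry on the list sits somewhere around 1.37 GFLOPS/W; that's where the factor of 1.66 comes from.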
I think the SPARC64 VIIIfx is quite impressive; it gets very good flops/watt without being a particularly exotic design. Basically it's just a standard OoO CPU with a couple of extra FP units and lots of registers, clocked at a somewhat lower frequency than usual. No long vectors with scatter/gather memory ops, no GPUs, no low-power, very slow embedded CPUs like the Blue Genes, etc.
I have no knowledge of the design tradeoffs of the individual systems, but I'd say that it's fairly impressive that both the top500 and the Green500 have so many GPU machines in the top 10, given that they're both CPU-dominated lists.
Large GPGPU clusters are still a relatively new phenomenon, give it a few years and I suspect you'll see a lot more of them.
How much does computer time on these things cost? How is the cost calculated? Is time divided up something like how it's done on a large telescope, where the controlling organization gets proposals from scientists, then divvies up the computer's available time according to what's been accepted?
On the supercomputer centers I'm familiar with, scientists write proposals which are evaluated by some kind of scientific steering committee. The committee meets regularly (say, once per month) and awards a certain number of CPU-hours depending on the application.
Do they multi-task (run more than one scientist's program at one time)?
Yes. Typically the users write batch scripts requesting the amount of resources their job needs. E.g. "512 cores with at least 2 GB RAM/core, max runtime 3 days", and then they submit the batch job to a queue. At some point when there are enough free resources in the system, the batch scheduler launches the job. When the job finishes (or during its runtime) the usage is then subtracted from the quota they were awarded in the application process.
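To make that concrete, here's a minimal sketch of what such a batch script might look like on a SLURM-managed cluster (the job name and binary are made up for illustration; other schedulers like PBS or LSF have their own syntax):

    #!/bin/bash
    #SBATCH --job-name=mysim       # name shown in the queue
    #SBATCH --ntasks=512           # 512 MPI tasks, i.e. 512 cores
    #SBATCH --mem-per-cpu=2G       # at least 2 GB RAM per core
    #SBATCH --time=3-00:00:00      # max runtime 3 days

    # launched on the allocated cores once the scheduler finds room
    srun ./mysim

You'd submit it with "sbatch mysim.sh", and the scheduler then holds it in the queue until 512 cores with enough free memory become available.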
Does the computer run at top power (10 PFLOPS) at all times, or does the resource usage go up and down?
Usually all functioning nodes are running and available for use, yes. Typically the load is around 80-90% of maximum due to scheduling inefficiencies, e.g. a large parallel job has to wait until there are enough idle cores before it can start, and so forth.
And lastly, how hard is it to write programs to run on these things? Do the scientists do it themselves, and if so, do the people who run the supercomputer audit the code before it runs?
Pretty tricky. Usually they use the MPI library. The programs are either written by the scientists themselves or by other scientists working in the same field. The supercomputing center typically doesn't audit code, but may require the user to submit scalability benchmarks before allowing them to submit large jobs. For some popular applications the supercomputing center may maintain a version itself (so each user doesn't need to recompile it) and provide some more or less rudimentary support.
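For a taste of what MPI code looks like, here's the classic minimal C example where each process just reports its rank (real production codes are of course vastly more involved):

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        int rank, size;

        MPI_Init(&argc, &argv);               /* start up the MPI runtime */
        MPI_Comm_rank(MPI_COMM_WORLD, &rank); /* which process am I? */
        MPI_Comm_size(MPI_COMM_WORLD, &size); /* how many processes total? */

        printf("Hello from rank %d of %d\n", rank, size);

        MPI_Finalize();                       /* shut down cleanly */
        return 0;
    }

Compile with mpicc and launch with something like "mpirun -np 512 ./a.out" (or via the batch scheduler as above); the hard part is decomposing your actual problem across those 512 ranks.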
There are a few efforts to bring similar capability to Ethernet as well, namely TRILL and 802.1aq; AFAIK neither of them has been ratified at the time of writing this.
The way I read it, it means that the nodes have four 1 GbE ports built into the motherboard. If you're going to use IB, you'll buy separate PCIe IB cards for each node. The 1 GbE ports can then be used to run management traffic etc., or left unused; there's no law saying you have to use them all, and since 1 GbE ports are practically free, it's not like you're leaving any money on the table either.
Wrt RDMA over Ethernet, iWARP and so on: yes, I know it exists. My point was that RDMA has been supported on IB since day one; the software stack is mature and widely used, which can't be said for Ethernet RDMA. Since IB infrastructure is, so far, also cheaper, there's really no reason to go with 10GbE.
That's not to say that 10GbE is useless. Of course it's useful, e.g. if you run high-bandwidth services over TCP/IP accessible from outside the cluster. But that's not what you're doing on a cluster. A cluster interconnect is typically used for MPI and storage, both of which can run over RDMA, avoiding the comparatively heavyweight TCP/IP protocol. And of course, at some point 10GbE will replace 1GbE as the cheap builtin stuff on motherboards.
If someone's goals don't agree with yours, then the polite thing to do is refuse to give them advice, not give them bad advice.
So the FSF should not put up a web page explaining which licenses they recommend and why, because someone on the Internet might disagree? Seriously?
Persistency: once eth0, always eth0 - this is what most commentators here seem to think this is all about, but it's already taken care of by udev with most modern distributions.
To some extent. The persistency is taken care of by adding state to the system, that is, by storing the MACs somewhere. That fails e.g. if you swap out a broken NIC, or if, due to some hardware failure, you move the disk to another, otherwise identical server.
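For reference, that stored state is typically a generated rules file such as /etc/udev/rules.d/70-persistent-net.rules, with one line per MAC ever seen (the address below is made up):

    # PCI device 0x8086:0x10d3 (e1000e)
    SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="00:1b:21:aa:bb:cc", ATTR{type}=="1", KERNEL=="eth*", NAME="eth0"

Swap the NIC and the new card's MAC matches nothing in the file, so it gets enumerated as eth1 while eth0 keeps pointing at hardware that no longer exists.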
Naming: The article says they're changing the naming. This is what makes no sense. It's not "required." ethX is just fine, as long as the names are enumerated consistently (meaning that on two "identical" boxes, the order is identical based on physical port).
IIRC the justification for this is that using ethX would race with the original kernel names. This thingy is based on udev: when the kernel boots, devices are given ethX names, and udev rules then rename them according to BIOS names, PCI bus order, etc.
It seems that nowadays quite a few of the pro(sumer) sound cards are external units connected via USB.
Presumably the idea being to isolate the DAC from all the electrical noise inside the case?
What about latency on these things? One would imagine that the extra protocol hop adds latency, and then the traffic has to be shared with other traffic on the same bus? I mean, people doing audio production seem to be sensitive to latency, to the point that Linux users use the RT kernel. Is USB really up to it?
Replying to myself, TFA contains some info about this. Hey, this is Slashdot, who has time to read TFA?
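As a rough sanity check on the bus itself: high-speed USB schedules transfers in 125 us microframes, so the hop over the bus should only add latency on the order of a fraction of a millisecond, which is small compared to the multi-millisecond buffers audio people usually fight with.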
"Virtual" means never knowing where your next byte is coming from.