Comment Re:Meanwhile... (Score 2) 77

Some additional points:

- FWIW, Linux finally got rid of the BKL in the 3.0 release or thereabouts.

- Many (most?) 10Gb NICs are multiqueue, meaning that the interrupt and packet-processing load can be spread over multiple cores.

- Linux, and presumably other OSes, have mechanisms to switch to polling mode when processing large numbers of incoming network packets (see the sketch below).

That being said, your basic points about interrupt latency being an issue still stand, of course.
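
To illustrate the polling point: here's a rough sketch, in C, of how a NIC driver hooks into Linux's NAPI mechanism, using roughly the 3.x-era kernel API. The "foo" driver and its foo_* hardware helpers are hypothetical placeholders, not real kernel functions; this is a sketch of the pattern, not any actual driver.

    /* Rough NAPI sketch for a hypothetical "foo" NIC driver (3.x-era kernel API).
     * The foo_* hardware helpers are placeholders assumed to be defined elsewhere
     * in the driver; they would touch device registers and the RX ring. */
    #include <linux/netdevice.h>
    #include <linux/interrupt.h>

    #define FOO_NAPI_WEIGHT 64   /* max packets handled per poll invocation */

    struct foo_priv {
        struct net_device *ndev;
        struct napi_struct napi;
        /* ... ring buffers, register mappings, etc. ... */
    };

    /* Placeholder hardware helpers (hypothetical). */
    void foo_disable_rx_irq(struct foo_priv *priv);
    void foo_enable_rx_irq(struct foo_priv *priv);
    int  foo_rx_ring_process(struct foo_priv *priv, int budget);

    /* Interrupt handler: mask further RX interrupts and defer work to polling. */
    static irqreturn_t foo_isr(int irq, void *dev_id)
    {
        struct foo_priv *priv = dev_id;

        foo_disable_rx_irq(priv);
        napi_schedule(&priv->napi);     /* foo_poll() will run from softirq context */
        return IRQ_HANDLED;
    }

    /* Poll function: process up to 'budget' packets without taking interrupts. */
    static int foo_poll(struct napi_struct *napi, int budget)
    {
        struct foo_priv *priv = container_of(napi, struct foo_priv, napi);
        int done = foo_rx_ring_process(priv, budget);

        if (done < budget) {
            /* Ring drained: leave polling mode and re-enable interrupts. */
            napi_complete(napi);
            foo_enable_rx_irq(priv);
        }
        return done;
    }

    static void foo_setup_napi(struct foo_priv *priv)
    {
        netif_napi_add(priv->ndev, &priv->napi, foo_poll, FOO_NAPI_WEIGHT);
        napi_enable(&priv->napi);
    }

Under sustained load foo_poll() keeps returning the full budget, so the kernel stays in polling mode and the RX interrupt stays masked, which is exactly the interrupt-mitigation behaviour referred to above.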

Comment Teh sky, it's falling!!111 (Score 5, Informative) 148

To recap, KWin currently supports:
  • No compositing
  • Compositing using the 2D XRender interface
  • Compositing using OpenGL 1.x
  • Compositing using OpenGL 2.x
  • Compositing using OpenGL ES 2 (code mostly shared with the OpenGL 2.x code path)

So what is suggested here is to delete support for compositing using OpenGL 1.x.

Personally, I can hardly blame the developer for wanting to prune that list a bit.

And, if you don't want to see this feature deleted, now is your opportunity to step up to the plate and contribute!

Comment Re:Pricing would be interesting! (Score 1) 68

What you're looking for is the Green500 list [green500.org]

Indeed, but the site was down when I wrote my previous reply, so I had to resort to the top500 list and calculate flops/watt for the top few entries manually. :)

In any case, as one can see from the list, the best GPU machine manages to beat the K machines by a factor of 1.66, a far cry from the factor of 3-6 you originally claimed. And most GPU machines fall behind the K.
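
For what it's worth, the manual calculation is just Rmax divided by power; a throwaway C version looks something like the following. The figures in it are placeholders for illustration, not actual top500/Green500 entries.

    /* Back-of-the-envelope GFLOPS/W calculation; the figures below are
     * placeholders, not actual top500/Green500 entries. */
    #include <stdio.h>

    static double gflops_per_watt(double rmax_tflops, double power_kw)
    {
        /* TFLOPS -> GFLOPS and kW -> W are both a factor of 1000, so they cancel. */
        return rmax_tflops / power_kw;
    }

    int main(void)
    {
        double a = gflops_per_watt(8000.0, 10000.0);   /* hypothetical machine A */
        double b = gflops_per_watt(1500.0, 1100.0);    /* hypothetical machine B */

        printf("A: %.2f GFLOPS/W  B: %.2f GFLOPS/W  ratio B/A: %.2f\n", a, b, b / a);
        return 0;
    }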

I think the SPARC64 VIIIfx is quite impressive; it gets very good flops/watt without being a particularly exotic design. Basically it's just a standard OoO CPU with a couple of extra FP units and lots of registers, clocked at a somewhat lower frequency than usual. No long vectors with scatter/gather memory ops, no GPUs, no low-power, relatively slow embedded CPUs like the Blue Genes, etc.

I have no knowledge of the design tradeoffs of the individual systems, but I'd say that it's fairly impressive that both the top500 and the Green500 have so many GPUs in the top 10, given that they're both CPU-dominated lists.

Large GPGPU clusters are still a relatively new phenomenon; give it a few years and I suspect you'll see a lot more of them.

Comment Re:Pricing would be interesting! (Score 1) 68

Oh? So how come the VIIIfx-based "K computer", apart from being the current #1 in performance, also beats the GPGPU clusters (with the latest Nvidia Fermi cards) in flops/watt on the latest top500 list: http://top500.org/list/2011/06/100 ? And heck, that's on Linpack, which should be pretty much the optimal workload for a GPU.

Comment Re:How is it used? (Score 1) 125

How much does computer time on these things cost? How is the cost calculated? Is time divided up something like how it's done on a large telescope, where the controlling organization gets proposals from scientists, then divvies up the computer's available time according to what's been accepted?

At the supercomputer centers I'm familiar with, scientists write proposals, which are evaluated by some kind of scientific steering committee that meets regularly (say, once per month) and awards a certain number of CPU hours depending on the application.

Do they multi-task (run more than one scientist's program at one time)?

Yes. Typically the users write batch scripts requesting the resources their job needs, e.g. "512 cores with at least 2 GB RAM/core, max runtime 3 days", and then submit the batch job to a queue. When there are enough free resources in the system, the batch scheduler launches the job. When the job finishes (or while it's running), the usage is subtracted from the quota awarded in the application process.

Does the computer run at top power (10pf) at all times, or does the resource usage go up and down?

Usually all functioning nodes are running and available for use, yes. Typically utilization is around 80-90% of maximum due to scheduling inefficiencies (e.g. a large parallel job has to wait until there are enough idle cores before it can start, and so forth).

And lastly, how hard is it to write programs to run on these things? Do the scientists do it themselves, and if so, do the people who run the supercomputer audit the code before it runs?

Pretty tricky. Usually they use the MPI library. The programs are written either by the scientists themselves or by other scientists working in the same field. The supercomputing center typically doesn't audit code, but may require the user to submit scalability benchmarks before allowing them to submit large jobs. For some popular applications the supercomputing center may maintain an installation itself (so each user doesn't need to recompile it) and provide more or less rudimentary support.
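
To give a flavor of what such programs look like, here's a minimal MPI example in C (my own toy example, not from any particular center): each rank computes a piece of work and the pieces are combined with a collective reduction.

    /* Minimal MPI example: each rank contributes a value, rank 0 prints the sum. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, size;
        double local, global;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Each rank computes its piece of the work... */
        local = (double) rank;

        /* ...and the pieces are combined with a collective reduction on rank 0. */
        MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("sum over %d ranks = %g\n", size, global);

        MPI_Finalize();
        return 0;
    }

In practice this would be compiled with the site's MPI wrapper (e.g. mpicc) and launched through the batch system described above rather than started by hand.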

Comment Re:Uh oh.. (Score 1) 387

One major advantage of IB here is that it natively supports multipathing; there's no need to avoid loops in the graph either by topology or by using spanning trees. This allows one to build networks with decent bisection BW without needing big and expensive über-switches.

There are a few efforts to bring similar capabilities to Ethernet as well, namely TRILL and 802.1aq, neither of which AFAIK has been ratified at the time of writing.

Comment Re:Uh oh.. (Score 1) 387

I'll assume you know more about this than me, but he did say that the nodes are going to be wired with 4x GigE. Might there be a penalty bridging from that to IB rather than 10GigE?

The way I read it, it means that the nodes have four 1 GbE ports built into the motherboard. If you're going to use IB, you'll buy separate PCIe IB cards for each node. The 1 GbE ports can then be used for management traffic etc., or left unused; there's no law saying you have to use them all, and since 1 GbE ports are practically free it's not like you're leaving any money on the table either.

Wrt RDMA over Ethernet (iWARP and the like), yes, I know it exists. My point was that RDMA has been supported on IB since day one, and the software stack is mature and widely used, which can't be said for Ethernet RDMA. Since IB infrastructure is so far also cheaper, there's really no reason to go with 10GbE.

That's not to say that 10GbE is useless. Of course it's useful, e.g. if you run high-bandwidth services over TCP/IP that are accessible from outside the cluster. But that's not what the cluster interconnect is for: it's typically used for MPI and storage traffic, both of which can run over RDMA, avoiding the comparatively heavyweight TCP/IP protocol. And of course, at some point 10GbE will replace 1GbE as the cheap built-in stuff on motherboards.
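
As an aside, the user-space side of the IB stack is easy to poke at. Here's a minimal libibverbs sketch (assuming libibverbs is installed; compile with something like "gcc -o ibdevs ibdevs.c -libverbs") that just enumerates the IB devices visible on a node:

    /* Minimal libibverbs sketch: list the InfiniBand devices on this node. */
    #include <infiniband/verbs.h>
    #include <stdio.h>

    int main(void)
    {
        int num = 0, i;
        struct ibv_device **list = ibv_get_device_list(&num);

        if (!list) {
            perror("ibv_get_device_list");
            return 1;
        }

        printf("found %d IB device(s)\n", num);
        for (i = 0; i < num; i++)
            printf("  %s\n", ibv_get_device_name(list[i]));

        ibv_free_device_list(list);
        return 0;
    }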

Comment Uh oh.. (Score 4, Insightful) 387

Shouldn't you have figured out the answers to all these (simple) questions before ordering several million $$$ worth of hardware? Sheesh.. As for your specific questions:

- IB vs. 10GbE: IB, hands down. Much better latency and more mature RDMA software stacks (e.g. for MPI and Lustre). Cheaper and higher BW as well.

- GPU: Nvidia Fermi 2090 cards. CUDA is far ahead of everything else at the moment.

Comment Re:Sparc based (Score 1) 179

This one uses SPARC chips designed and fabbed (IIRC?) by Fujitsu; Sun/Oracle has nothing to do with it. AFAICT the politics behind this machine is that a few years ago NEC pulled out of the project to design a next-generation vector chip for use in a Japanese Earth Simulator follow-up. Hence the project resorted to the Fujitsu SPARC chips, which are not really designed for HPC but are at least a domestic design. I wouldn't expect this machine design to become popular outside Japan.
