
Comment They already co-design the hard-/software (Score 1) 223

Basically, the procurement process for supercomputers works like this: the buyer (e.g. a DOE lab) prepares a portfolio of apps (mostly simulation codes) with a specified target performance. Vendors then bid on how little money they'd need to meet that target. And of course the vendors will use the most cost- and power-efficient hardware they can get.

The reason we're no longer seeing custom-built CPUs in the supercomputing arena, but rather COTS chips or just slightly modified versions, is that chip design has become exceedingly expensive, while the supercomputer market is tiny compared to today's mainstream market.

Also, the simulation codes running on these machines generally far outlive most supercomputers. The stereotypical supercomputer simulation code is a Fortran program written 20 years ago, which has received constant maintenance over the years, but for which no serious rewrite is viable (the cost would exceed the price of the hardware). So vendors will look for low-effort ways of tuning these codes for their proposed designs. Sticking with general-purpose CPUs is in most cases the most cost-efficient way.

Comment Capacity vs. capability (Score 1) 223

So, what you describe is essentially the difference between capacity and capability machines. The national labs have both, as there are use cases for both. But the flagship machines, e.g. Titan at the Oak Ridge Leadership Computing Facility (OLCF), are always capability machines -- built to run full-system jobs, jobs that scale to tens or hundreds of thousands of nodes.

Comment Exascale machines are for scientific computing (Score 2, Informative) 223

These Peta/Exascale supercomputers are built for computer simulations (climate change, nuclear weapons stewardship, computational drug design, etc.), not for breaking encryption. That's also one reason no one is using them to mine Bitcoins: they're just not efficient at that job. To compute lots of hashes, dedicated hardware designs (read: ASICs) far outpace "general purpose" supercomputers.

Comment Supercomputers already do drug design (Score 1) 57

Computational drug design is already a big topic in supercomputing, although it's much more focused on interactions of individual molecules. That's currently so complex that it's more efficient to build specialized machines (e.g. http://en.wikipedia.org/wiki/A... ).

Comment Why is the hardware so complex/expensive? (Score 2) 217

From what I read the dongle is merely the interface from the camera (USB) to the smartphone (USB). That should be trivial. (For my setup a USB OTG cable + adapter to mini USB is sufficient, there are tons of apps to control cameras).

The article states that they had to use a beefier microcontroller etc., but I wonder: why not do all the processing on the smartphone? These days our phones have so much processing power AND so many sensors that there should be no need to do any kind of non-trivial logic outside, especially when you're just trying to launch your first product.

Comment C++14 != C++98 (Score 3, Interesting) 407

I wish people would stop treating modern C++ as if time had stood still for the past decades. Yes, C++ is complex, but it is also expressive. Modern features (e.g. lambdas+auto+templates) often let you write code which is just as concise as its Ruby counterpart, but much more efficient.
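To make that concrete, here is a minimal sketch of the lambdas+auto+templates combination in C++14. The helper names `map_to_vector` and `sum_of_even_lengths` are made up for this example, and the Ruby one-liner in the comment is only a rough equivalent, not a benchmark.

```cpp
#include <cassert>
#include <numeric>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical helper: applies f to every element and collects the results.
// The return type is deduced by the compiler (a C++14 feature).
template <typename Container, typename F>
auto map_to_vector(const Container& in, F f) {
    using Result = std::decay_t<decltype(f(*in.begin()))>;
    std::vector<Result> out;
    out.reserve(in.size());
    for (const auto& x : in) out.push_back(f(x));
    return out;
}

// Roughly what `words.map(&:length).select(&:even?).sum` does in Ruby.
inline int sum_of_even_lengths(const std::vector<std::string>& words) {
    // Generic lambda (auto parameter), also new in C++14.
    auto lengths = map_to_vector(
        words, [](const auto& w) { return static_cast<int>(w.size()); });
    // Add up only the even lengths.
    return std::accumulate(lengths.begin(), lengths.end(), 0,
                           [](int acc, int n) { return n % 2 == 0 ? acc + n : acc; });
}
```

Calling `sum_of_even_lengths` on a vector holding "ab", "abc" and "abcd" adds up the two even lengths (2 and 4). Everything is resolved at compile time, so there is no dynamic dispatch involved.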

Comment We're no longer at the origin (Score 1) 181

Architectural improvements for general-purpose CPUs yield fewer and fewer benefits: Even more registers? Even better branch prediction? Even larger caches? It'll all yield but a few percent, at least for current Intel designs. So, the way to go is currently more and more cores, but what good is it to have many cores that can't all fire simultaneously?

Comment Clickbait Caption, but Valid Arguments (Score 3, Insightful) 181

Of course general purpose CPUs exist, simply because that's what we call them. But it is also true that each design has its own strengths, and "dark silicon" is another driver for special purpose hardware. Efficiency is another. Andrew Chien has published some interesting research on this subject. In his 10x10 approach he suggests using 10 different types of domain-specific compute units (e.g. for n-body, graphics, tree-walking...), each of which is 10x more efficient than "general purpose CPUs" in its domain (YMMV). Those compute units, bundled together, make up one core of the 10x10 design. Multiple cores can be connected via a NoC.

Let's see how software will cope with this development...

ps: can special purpose hardware exist if general purpose hardware doesn't?

Comment Re:Hai! (Score 1) 111

One reason might be that railways are more efficient in densely populated areas. There, express trains can even compete with airplanes. Yesterday we went from Tokyo to Osaka. Flight time would have been ~1h, plus 1h for check-in and ~45min. each for the transfers to and from the airport, so roughly 3.5h door to door. The Nozomi Shinkansen took us there in 2:30, and both stations were directly at the center of the cities.

Most of Japan's population is situated in coastal regions, so just a handful of routes can service all major cities. Imagine how many connections you'd need in the US...

Comment Yesno? (Score 1) 89

It's not that specialized. It's just plenty of DSPs strapped together on a torus.

Actually, Anton uses ASICs whose cores are specifically geared toward MD codes. This goes way beyond just "strapping together DSPs". They have IIRC ~70 hardware engineers on site. (Source: I visited DE Shaw Research last year.)

Unlike what wikipedia claims, you could probably achieve comparable performance using a more classical and general-purpose supercomputer setup with GPU or Xeon Phi accelerators, provided the network topology is well tuned to address this sort of communication scheme

No, you can't, and here is why: Anton is built for strong scaling of smallish, long-running simulations. If you ran the same simulations on an "x86 + accelerator" system (think ORNL's Titan) then you'd observe two effects:

  • The GPU itself might idle a lot, as each timestep involves only a few computations, leaving many shaders idle or waiting for the DRAM.
  • Anton's network is insanely efficient for this use case. IIRC it's got a mechanism equivalent to Active Messages, so when data arrives, the CPU can immediately forward it to the computation which is waiting for it. That leads to a very low latency compared to a mainstream "InfiniBand + GPU" setup.
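
For anyone unfamiliar with Active Messages, the idea is that each message names the handler to run on arrival, so the receiver dispatches straight into the waiting computation instead of queueing the data and polling for it. Below is a minimal single-process sketch of that dispatch pattern. It is not Anton's actual mechanism, and `ActiveMessage`, `Node`, `register_handler` and `on_arrival` are hypothetical names invented for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical message format: the sender picks the handler to run on arrival.
struct ActiveMessage {
    std::size_t handler_id;  // index into the receiver's handler table
    double payload;          // e.g. a force contribution from a neighboring node
};

// Hypothetical receiving node holding a table of registered handlers.
class Node {
public:
    using Handler = std::function<void(double)>;

    // Registers a handler and returns the id that senders put into messages.
    std::size_t register_handler(Handler h) {
        handlers_.push_back(std::move(h));
        return handlers_.size() - 1;
    }

    // Invoked when a message arrives: the payload goes directly to the
    // computation waiting for it, with no intermediate receive queue.
    void on_arrival(const ActiveMessage& msg) {
        assert(msg.handler_id < handlers_.size());
        handlers_[msg.handler_id](msg.payload);
    }

private:
    std::vector<Handler> handlers_;
};
```

In a real machine the `on_arrival` step would be triggered by the network hardware. A conventional setup would instead copy the data into a buffer and wait for the application to ask for it, and that extra hop is where the latency goes.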

(most recent supercomputers don't use tori)

Let's take a look at the current Top 500:

  • #1 Tianhe-2: Fat Tree
  • #2 Titan: 3D Torus
  • #3 Sequoia: 5D Torus
  • #4 K Computer: 6D Torus
  • #5 Mira: 5D Torus
  • #6 Piz Daint: 3D Torus
  • #7 Stampede: Fat Tree
  • #8 JUQUEEN: 5D Torus
  • #9 Vulcan: 5D Torus
  • #10 nn: 3D Torus

So, with 8 of the top 10 machines using one, torus networks are the predominant topology for current supercomputers.
