It's not that specialized. It's just plenty of DSPs strapped together on a torus.
Actually, Anton uses ASICs whose cores are specially geared toward MD codes. This goes way beyond just "strapping together DSPs". They have IIRC ~70 hardware engineers on site. (Source: I visited D. E. Shaw Research last year.)
Contrary to what Wikipedia claims, you could probably achieve comparable performance with a more classical, general-purpose supercomputer setup using GPU or Xeon Phi accelerators, provided the network topology is well tuned for this sort of communication scheme.
No, you can't, and here is why: Anton is built for strong scaling of smallish, long-running simulations. If you ran the same simulations on an "x86 + accelerator" system (think ORNL's Titan), you'd observe two effects:
- The GPU itself would idle a lot, as each timestep involves only a little computation, leaving many shaders idle or waiting on DRAM.
- Anton's network is insanely efficient for this use case. IIRC it has a mechanism equivalent to Active Messages, so when data arrives, the CPU can immediately forward it to the computation waiting for it. That gives much lower latency than a mainstream "InfiniBand + GPU" setup.
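The strong-scaling argument above can be made concrete with a toy model: the compute share of each timestep shrinks as you add nodes, but the per-step communication latency does not. A minimal sketch (all constants here are made-up assumptions for illustration, not Anton or Titan figures):

```python
# Toy strong-scaling model for a fixed-size MD timestep.
# work_us, latency_us and hops_per_step are invented numbers.

def timestep_us(nodes, work_us=1000.0, latency_us=1.0, hops_per_step=3):
    """Wall time per timestep: compute divides across nodes,
    but the fixed latency cost per step does not."""
    compute = work_us / nodes          # perfectly parallel compute part
    comm = latency_us * hops_per_step  # fixed communication latency
    return compute + comm

# Adding nodes stops helping once latency dominates:
for n in (1, 10, 100, 1000, 10000):
    print(n, timestep_us(n))
```

With these numbers, going from 1,000 to 10,000 nodes barely moves the needle: the run time has hit the latency floor. Lowering that floor is exactly what Anton's network hardware is designed to do.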
(most recent supercomputers don't use tori)
Let's take a look at the current Top 500:
- #1 Tianhe-2: Fat Tree
- #2 Titan: 3D Torus
- #3 Sequoia: 5D Torus
- #4 K Computer: 6D Torus
- #5 Mira: 5D Torus
- #6 Piz Daint: 3D Torus
- #7 Stampede: Fat Tree
- #8 JUQUEEN: 5D Torus
- #9 Vulcan: 5D Torus
- #10 nn: 3D Torus
So, with eight of the top ten using a torus, torus networks are the predominant topology among current supercomputers.
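Part of why tori suit this workload: every node has the same small, fixed set of neighbors (six in 3D), with wraparound links keeping hop counts uniform, which maps naturally onto nearest-neighbor particle exchange. A minimal sketch (the 4x4x4 dimensions are an arbitrary example):

```python
# Compute the six neighbors of a node on a 3D torus.
# dims=(4, 4, 4) is an arbitrary example size, not any real machine's.

def torus_neighbors(coord, dims=(4, 4, 4)):
    """Return the 6 neighbor coordinates of `coord` on a 3D torus."""
    neighbors = []
    for axis, size in enumerate(dims):
        for delta in (-1, 1):
            n = list(coord)
            n[axis] = (n[axis] + delta) % size  # modular wraparound
            neighbors.append(tuple(n))
    return neighbors

# Even a "corner" node has 6 neighbors thanks to the wraparound links:
print(torus_neighbors((0, 0, 0)))
```

Contrast this with a fat tree, where messages between nearby nodes may still climb up and down switch levels; on a torus, neighbor exchange is always one direct hop.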