I would go with Haswell-EP Xeons -- probably the E5-2697 v3 (14 cores @ 2.6-3.6 GHz): a two-socket motherboard gives you 28 physical cores per board, for prices in the $12K range. Just one of these is quite a powerful system. If you can get by with a 2-node system, then a 10GE interconnect is good enough (AND MUCH CHEAPER); for more nodes, you will need InfiniBand (since 10GE does not scale well). A 4-node/IB cluster will be on the order of $60K, and will offer more performance than a $160K solution of a couple of years ago.
These will offer far better performance than the Opteron solution.
Can you compile your own application? If so, use the Intel compilers, and make sure you compile targeting the Haswell instruction set (-O3 -xHost when building on the Haswell node itself, or -O3 -xCORE-AVX2 / -march=core-avx2 when building elsewhere, if I recall correctly): the full AVX2 Haswell instruction set is rather more powerful for your app than its predecessor, the "AVX" SandyBridge/IvyBridge instruction set, which is far more powerful than the earlier Nehalem/Westmere SSE4.2 instruction set, which in turn is somewhat more powerful than the baseline code a plain "-O3" produces. If you can't compile it yourself, try to make sure the vendor's executables target AVX2; the right compile flags can double your performance over a plain "-O3"...
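
Before asking a vendor for an AVX2 build, it can help to confirm what the machine you are testing on actually supports. Here is a minimal sketch, assuming a Linux node where /proc/cpuinfo exposes the usual "flags" line; the function names and labels are my own illustration, not part of any vendor tool:

```python
# Sketch: report the highest SIMD level from the list above that a CPU advertises.
# Assumes Linux-style flag names (avx2, avx, sse4_2) as listed in /proc/cpuinfo.

# Ordered from newest to oldest, matching the generations discussed above.
SIMD_LEVELS = [
    ("avx2", "AVX2 (Haswell)"),
    ("avx", "AVX (SandyBridge/IvyBridge)"),
    ("sse4_2", "SSE4.2 (Nehalem/Westmere)"),
]


def best_simd_level(flags_line):
    """Return a label for the newest instruction set found in a flags string."""
    flags = set(flags_line.split())  # exact tokens, so "avx2" does not count as "avx"
    for flag, label in SIMD_LEVELS:
        if flag in flags:
            return label
    return "baseline (SSE2 or older)"


def read_cpu_flags(path="/proc/cpuinfo"):
    """Return the first 'flags' entry from /proc/cpuinfo, or '' if unavailable."""
    try:
        with open(path) as f:
            for line in f:
                if line.startswith("flags"):
                    return line.split(":", 1)[1]
    except OSError:
        pass  # not Linux, or the file is unreadable
    return ""


if __name__ == "__main__":
    print(best_simd_level(read_cpu_flags()))
```

If this reports anything below AVX2 on the box where you benchmark, an AVX2 executable will not run there, so compare timings on the Haswell node itself.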