Your options really depend on what sort of 'high performance' you have in mind. When it comes to performance per core, Xeons typically crush Opterons, but the pricing reflects that, especially if you need 4-8 socket support and RAS features. If what you need is a large amount of RAM with the lowest possible spending on the system around it, Opterons have tepid performance per core but are likely to be the cheapest option that still supports ECC, more than one socket, buffered DIMMs, and the other niceties you wouldn't get from desktop parts. If your application is one that can be made to fit, GPUs are enormously powerful for the range of things that they do well.
They also depend on how big you need your system to be and how tightly coupled it has to be. If your intended application handles its own network-level parallelization and doesn't depend on very low latency, blessed are you: price per core skyrockets if you go above 2 sockets, while GbE is effectively free (at least in the sense that you pretty much can't buy a system or motherboard that doesn't come with at least 2 NICs by default, often more) and relatively cheap to switch. If you need lower latency, this will hurt more, and you are looking at Myrinet or InfiniBand. If your application needs a cluster that presents a single system image, especially one that also has genuinely low latency, you probably need to fortify your checkbook and consult an expert. You can get systems with more than 8 sockets and the appropriate custom interconnect, but you won't like paying for one.
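To make the coupling point concrete, here is a rough back-of-the-envelope sketch, not a benchmark. It assumes compute divides perfectly across nodes and that every message pays the full interconnect latency in series. The function names and the latency figures are illustrative assumptions (order-of-magnitude guesses, not measurements of any particular hardware).

```python
def step_time(compute_s, nodes, messages_per_step, latency_s):
    """Estimated wall time for one step of a parallel job.

    Toy model: compute divides evenly across nodes, and each message
    pays the full interconnect latency, one after another.
    """
    return compute_s / nodes + messages_per_step * latency_s


def speedup(compute_s, nodes, messages_per_step, latency_s):
    """Speedup over a single node that needs no messages at all."""
    return compute_s / step_time(compute_s, nodes, messages_per_step, latency_s)


# Assumed, order-of-magnitude latencies in seconds (hypothetical values).
LATENCY_S = {
    "GbE": 50e-6,
    "low-latency interconnect": 2e-6,
}

if __name__ == "__main__":
    # One second of single-node compute per step, spread over 16 nodes.
    for name, latency in LATENCY_S.items():
        for msgs in (10, 10_000):
            s = speedup(1.0, 16, msgs, latency)
            print(f"{name}, {msgs} msgs/step: ~{s:.1f}x on 16 nodes")
```

With few messages per step the two interconnects come out nearly identical in this model; with many small messages the latency term dominates, which is the case where the expensive interconnect starts to earn its price.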
Unless you value your time at a surprisingly low rate, you probably won't want to build your own systems from parts. Depending on how tightly coupled it needs to be, though, this may be something you need to purchase as a single system or something you construct from multiple computers you purchase.
Can you use either hardware you already have or AWS (or one of its competitors) to better characterize what your application actually needs?
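As a starting point for that kind of characterization, a minimal sketch along these lines could be run on whatever Unix-like machine is handy. It uses only the Python standard library; `profile_command` is a hypothetical helper name, and the numbers it reports are coarse (wall time, CPU time, and peak resident memory of the child process), so treat it as a first look rather than a real profiler.

```python
import resource
import subprocess
import sys
import time


def profile_command(argv):
    """Run a command and report coarse resource usage (Unix only)."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.perf_counter()
    subprocess.run(argv, check=True)
    wall = time.perf_counter() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)

    # CPU time (user + system) consumed by the child we just waited for.
    cpu = (after.ru_utime - before.ru_utime) + (after.ru_stime - before.ru_stime)
    return {
        "wall_s": wall,
        "cpu_s": cpu,
        # Roughly how many cores were kept busy on average.
        "cpu_per_wall": cpu / wall if wall > 0 else 0.0,
        # Peak RSS of the largest child waited for so far.
        # Units are platform dependent: KiB on Linux, bytes on macOS.
        "max_rss": after.ru_maxrss,
    }


if __name__ == "__main__":
    # Usage: python profile.py your_program arg1 arg2 ...
    command = sys.argv[1:] or [sys.executable, "-c", "sum(range(10**6))"]
    for key, value in profile_command(command).items():
        print(f"{key}: {value}")
```

A `cpu_per_wall` well above 1 suggests the job already uses several cores; a value well below 1 suggests it spends its time waiting on disk or network, in which case faster cores are unlikely to be what you need. Peak memory gives a first hint as to whether the large-RAM route is relevant at all.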