i did this once before, on a very small scale, with a budget of GBP 1,000, about 10 years ago. sales teams kept offering me 2ghz dual-core machines at GBP 300 each, and i had to tell them this:
"look, i have a budget of GBP 1,000. you're offering me a 2ghz system for GBP 300, so i can only buy 3 machines, right? that's a total of 6 ghz of computing power. on the other hand, if i buy this GBP 125 machine, which has only a 1ghz processor, i can get 8 of those, which gives a total of 8 ghz of computing power. so _why_ would i want FASTER?"
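the arithmetic in that quote, as a tiny python sketch. the function name is made up for illustration, and "total ghz" is the same crude metric used above: it ignores core count, instructions per clock and everything else.

```python
def aggregate_clock(budget_gbp: float, price_gbp: float, ghz_per_machine: float) -> tuple[int, float]:
    """Return (machines affordable, total GHz) for a fixed budget.

    Crude "sum of clock speeds" metric: ignores cores, IPC, memory, etc.
    """
    machines = int(budget_gbp // price_gbp)  # you can only buy whole machines
    return machines, machines * ghz_per_machine


for label, price, ghz in [("2ghz @ GBP 300", 300, 2.0), ("1ghz @ GBP 125", 125, 1.0)]:
    n, total = aggregate_clock(1000, price, ghz)
    print(f"{label}: {n} machines, {total:.0f} ghz total")
# 2ghz @ GBP 300: 3 machines, 6 ghz total
# 1ghz @ GBP 125: 8 machines, 8 ghz total
```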
so i bought 8 sets of motherboard, CPU, 128mb RAM and a low-cost case with a PSU already fitted. i also needlessly put a 3com network card in each one, because i didn't realise that the motherboard's built-in ethernet could do PXE boot... but still, all-in that came to GBP 125 per machine, and each one took 15 minutes to assemble, so it was no big deal. that got me 8ghz of raw computing power, which was the best i could get for the money i had.
and that's the question you have to ask yourself: what's the highest performance/price ratio that can be achieved?
the highly specific problem i was trying to parallelise was a task with a very small memory footprint that wasn't I/O-bound: running the NIST.gov Statistical Test Suite (STS). i booted all 8 machines off my laptop via PXE, with a read-only NFS root filesystem. i had to wait 30 seconds between powering on each one, because my 800mhz P3 laptop with 256mb of RAM reaaallly couldn't cope with 8 machines hammering it at once... not over a 100mbit/sec link, anyway.
once they were all up, i used a script that ssh'd into each one and left it running the STS for a day at a time. very little data was actually generated: just a report.
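the original script isn't shown, so here's a minimal sketch of that kind of launcher. the hostnames `node1`..`node8`, the wrapper `run-sts.sh` and the report path are all hypothetical, and it assumes passwordless ssh keys and a writable `/tmp` on each node (e.g. a tmpfs, since the root filesystem was read-only NFS). it defaults to a dry run that only prints the commands.

```python
import shlex
import subprocess

# hypothetical hostnames: the post doesn't say how the nodes were named
HOSTS = [f"node{i}" for i in range(1, 9)]

# hypothetical remote command: start an STS wrapper script in the background,
# detached from the ssh session, writing its report to /tmp (assumed writable,
# because the NFS root filesystem is mounted read-only)
REMOTE_CMD = "nohup ./run-sts.sh > /tmp/sts-report.txt 2>&1 &"


def ssh_argv(host: str, remote_cmd: str) -> list[str]:
    # -n: take stdin from /dev/null so ssh doesn't sit waiting on our terminal
    return ["ssh", "-n", host, remote_cmd]


def launch_all(hosts: list[str], remote_cmd: str, dry_run: bool = True) -> list[list[str]]:
    cmds = [ssh_argv(h, remote_cmd) for h in hosts]
    for argv in cmds:
        if dry_run:
            print(shlex.join(argv))  # show what would be run
        else:
            # ssh returns once the remote job is backgrounded and its output redirected
            subprocess.run(argv, check=True)
    return cmds


if __name__ == "__main__":
    launch_all(HOSTS, REMOTE_CMD)  # pass dry_run=False to actually launch
```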
but the problem you're solving may involve huge amounts of disk I/O, or huge amounts of inter-connectivity (inter-dependence between the parallel tasks). you may even have to use a GPU (OpenCL) if it's that computationally expensive... and that's where anyone's advice really ends, because unless you know exactly what you need to do, in real, concrete terms of I/O operations per second, GFLOP/s, GMAC/s and inter-node communication per second, you really can't, and shouldn't even remotely consider, spending any money.
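one rough way to turn those "real, concrete terms" into an answer is to work out which resource a single task would be limited by. this is a minimal roofline-style sketch, assuming compute, disk and network overlap perfectly (so a task takes as long as its slowest part); the class names and every figure in the usage lines are hypothetical, and you'd substitute numbers you've actually measured.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    # per-task requirements (hypothetical: measure these for your own job)
    gflop: float    # floating-point work, in GFLOP
    disk_gb: float  # data read/written from disk, in GB
    net_gb: float   # data exchanged with other nodes, in GB


@dataclass
class Node:
    # per-machine sustained capabilities (hypothetical: take them from benchmarks)
    gflops: float     # GFLOP/s
    disk_gbps: float  # disk throughput, GB/s
    net_gbps: float   # network throughput, GB/s


def bottleneck(w: Workload, n: Node) -> tuple[str, float]:
    """Return the slowest resource and its time in seconds for one task."""
    times = {
        "compute": w.gflop / n.gflops,
        "disk": w.disk_gb / n.disk_gbps,
        "network": w.net_gb / n.net_gbps,
    }
    name = max(times, key=times.get)
    return name, times[name]


# 100mbit/sec is 0.0125 GB/s; the other numbers are made up
node = Node(gflops=1.0, disk_gbps=0.05, net_gbps=0.0125)
print(bottleneck(Workload(gflop=500, disk_gb=0.01, net_gb=0.001), node)[0])  # compute
print(bottleneck(Workload(gflop=10, disk_gb=50, net_gb=1), node)[0])  # disk
```

an STS-style job lands firmly in "compute", which is why a pile of cheap CPUs worked; a job that lands in "disk" or "network" needs a very different shopping list.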
so please consider writing a spreadsheet based on the performance/price metric, extended to cover whichever domain(s) you're interested in optimising. then the answer about what to buy should be fairly self-evident.
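a minimal sketch of what that spreadsheet could compute, framed as "what's the cheapest way to hit every requirement at once". the machine names, the per-domain throughput figures and the required totals are all hypothetical; the point is that the worst-served domain decides how many machines of each type you need.

```python
import math

# hypothetical candidates: price in GBP plus per-machine throughput in each
# domain you care about (use whatever units you measured: tasks/hour, MB/s, ...)
CANDIDATES = {
    "fast box": {"price": 300, "compute": 2.0, "disk": 60.0},
    "cheap box": {"price": 125, "compute": 1.0, "disk": 40.0},
}

# hypothetical throughput the whole cluster must deliver, per domain
REQUIRED = {"compute": 8.0, "disk": 200.0}


def machines_needed(spec: dict, required: dict) -> int:
    # each domain needs enough machines on its own; the worst domain decides
    return max(math.ceil(required[d] / spec[d]) for d in required)


def rank(candidates: dict, required: dict) -> list[tuple[str, int, int]]:
    rows = []
    for name, spec in candidates.items():
        n = machines_needed(spec, required)
        rows.append((name, n, n * spec["price"]))
    return sorted(rows, key=lambda row: row[2])  # cheapest total cost first


for name, n, cost in rank(CANDIDATES, REQUIRED):
    print(f"{name}: {n} machines, GBP {cost}")
# cheap box: 8 machines, GBP 1000
# fast box: 4 machines, GBP 1200
```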
oh, and don't forget to include the power budget (and cooling), because i think it will shock the hell out of you. remember that you need to use the maximum specs, not the "average" or "scenario design power" (SDP) figures.
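a rough sketch of the power side of the spreadsheet, assuming the machines run flat out 24/7 at their maximum rated draw. the 100 W per machine and GBP 0.15/kWh in the usage lines are made-up example numbers; the only fixed constant is the standard conversion of 1 W of heat to about 3.412 BTU/h, which is what the cooling has to remove.

```python
HOURS_PER_YEAR = 24 * 365
BTU_PER_HOUR_PER_WATT = 3.412  # 1 W of heat is roughly 3.412 BTU/h


def power_budget(machines: int, max_watts_each: float, gbp_per_kwh: float) -> dict:
    """Worst-case draw, yearly electricity cost and heat load.

    Uses the maximum per-machine draw, not an average or SDP figure,
    and assumes the cluster runs at full load all year.
    """
    total_w = machines * max_watts_each
    kwh_per_year = total_w * HOURS_PER_YEAR / 1000
    return {
        "total_watts": total_w,
        "kwh_per_year": kwh_per_year,
        "gbp_per_year": kwh_per_year * gbp_per_kwh,
        "btu_per_hour": total_w * BTU_PER_HOUR_PER_WATT,  # heat the cooling must remove
    }


# hypothetical: 8 machines at 100 W max each, electricity at GBP 0.15/kWh
b = power_budget(8, 100, 0.15)
print(f"{b['total_watts']} W, GBP {b['gbp_per_year']:.0f}/year, {b['btu_per_hour']:.0f} BTU/h")
# 800 W, GBP 1051/year, 2730 BTU/h
```

with those made-up numbers, a single year of electricity costs about as much as the entire hardware budget, before you've paid anything to cool the room.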