I agree. A 32-bit PGAS memory model is silly. Giving each core its own 32-bit address space and using MPI for communication would be much more useful; then it could at least be a good learning tool for HPC programming techniques. Right now it looks pretty useless.
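To make the distinction concrete: the MPI style means each worker has a completely private address space and all sharing is an explicit message, rather than everyone reading and writing one shared (PGAS-style) memory. A rough sketch of that model, using Python's multiprocessing (separate OS processes, queues as message channels) purely as a stand-in for MPI ranks — `parallel_sum` and the queue layout here are my own illustration, not anything from the Parallella toolchain:

```python
from multiprocessing import Process, Queue

def worker(rank, inbox, outbox):
    # Each process sees only its own memory; data arrives by message,
    # roughly like MPI_Recv, and results leave roughly like MPI_Send.
    chunk = inbox.get()
    outbox.put((rank, sum(chunk)))

def parallel_sum(data, nprocs=4):
    inboxes = [Queue() for _ in range(nprocs)]
    results = Queue()
    procs = [Process(target=worker, args=(r, inboxes[r], results))
             for r in range(nprocs)]
    for p in procs:
        p.start()
    # "Scatter": hand each rank its private slice of the data.
    for r in range(nprocs):
        inboxes[r].put(data[r::nprocs])
    # "Gather"/reduce: collect the partial sums.
    total = sum(results.get()[1] for _ in range(nprocs))
    for p in procs:
        p.join()
    return total

if __name__ == "__main__":
    print(parallel_sum(list(range(100))))  # 4950
```

The point is that nothing in `worker` can accidentally touch another rank's data, which is exactly what makes the model a good teaching vehicle for HPC communication patterns.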
Even GPGPU is limited in what it can do for HPC. There's a lot more to HPC than raw mathematical power: memory is often the bottleneck, not the FPUs. The only reason we deal with multiple processors at all is that single-core performance has nearly stalled, forcing the use of parallelism. Communication between cores/processors is complicated too, and getting good performance takes a lot more than hooking up a bunch of processors in a grid.

For example, the supercomputer I work with has 90,112 2.3 GHz cores and 90 TB of RAM: 16 cores per chip in 704 blades, interconnected with a 3D torus network topology. It's the memory/cache size and speed and the network topology that make it a supercomputer. You could get the 800 TFLOP/s in a much smaller package using GPUs, but real-world performance would be drastically lower.

Even with the 64 cores Parallella could have, distributing a workload across a 64-core grid isn't easy. GPGPUs use work groups of smaller numbers of cores to make this sharing easier to manage. They should at least have made the interconnect a 2D torus rather than a plain grid, which would cut the maximum path length roughly in half. For stuff like quantum mechanics codes, a 5D torus is optimal. Memory access is the key. This is a bit like comparing apples to oranges, but that's exactly my point: the thing is not a supercomputer.