You are right that our current algorithms will have to change. That's one of the major problems in exascale research. Even debugging is changing, with tools leaning more on visualization to help sort through millions of logs. Algorithms may become non-deterministic to reduce the need to communicate, for example. Of course, I'm referring to millions of cores here. Desktop applications that use a few cores are a much simpler problem, though still one that a lot of developers lack good training in. At least the methods have largely been figured out at the consumer and server level.
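To make the non-determinism trade-off a bit more concrete, here's a toy sketch (my own illustration, OpenMP, arbitrary sizes; not anything from a real exascale code) of a relaxation sweep where dropping the barrier between sweeps lets threads run ahead on possibly stale neighbor values:

    #include <stdio.h>

    #define N      1024
    #define SWEEPS 500

    int main(void)
    {
        static double x[N];
        for (int i = 0; i < N; i++)
            x[i] = (i == 0 || i == N - 1) ? 1.0 : 0.0;   /* fixed boundary values */

        #pragma omp parallel
        for (int s = 0; s < SWEEPS; s++) {
            /* "nowait" drops the barrier after each sweep, so threads race
               ahead using whatever neighbor values happen to be in memory.
               The exact result varies a little from run to run, but no
               thread ever blocks waiting for the others.                  */
            #pragma omp for nowait
            for (int i = 1; i < N - 1; i++)
                x[i] = 0.5 * (x[i - 1] + x[i + 1]);
        }

        printf("x[N/2] after %d sweeps: %f\n", SWEEPS, x[N / 2]);
        return 0;
    }

Scaled way up, that's the same trade asynchronous iterative solvers make: you accept slightly fuzzy intermediate state in exchange for not having to synchronize.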
Supercomputers are usually measured just by their floating-point performance, but that's not really what makes a supercomputer a supercomputer. You can build a cluster of machines with high-end graphics cards, but that doesn't make it a supercomputer; such clusters have a more limited scope because of their limited interconnect bandwidth. There was even debate about how useful GPUs would really be in supercomputers, since memory bandwidth is the most common bottleneck. Supercomputers tend to have things like InfiniBand networking in multidimensional torus configurations. Those fast interconnects make it possible to work efficiently on problems that depend on neighboring regions, and even then they remain a leading bottleneck. When you get to millions of processors, even things like the FFT, which have historically parallelized well, start to become a problem.
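As a rough illustration of the neighbor-dependent communication those interconnects exist for, here's a minimal MPI halo-exchange sketch on a periodic 1-D "torus" of ranks (my own toy example; the array size and tags are arbitrary):

    #include <mpi.h>
    #include <stdio.h>

    #define LOCAL_N 8                      /* interior cells owned by each rank */

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int nranks, dims[1] = {0}, periods[1] = {1};
        MPI_Comm_size(MPI_COMM_WORLD, &nranks);
        MPI_Dims_create(nranks, 1, dims);

        MPI_Comm ring;                     /* periodic 1-D Cartesian topology */
        MPI_Cart_create(MPI_COMM_WORLD, 1, dims, periods, 1, &ring);

        int rank, left, right;
        MPI_Comm_rank(ring, &rank);
        MPI_Cart_shift(ring, 0, 1, &left, &right);

        double u[LOCAL_N + 2];             /* local data plus one ghost cell per side */
        for (int i = 1; i <= LOCAL_N; i++)
            u[i] = (double)rank;

        /* halo exchange: rightmost cell goes right, leftmost cell goes left */
        MPI_Sendrecv(&u[LOCAL_N], 1, MPI_DOUBLE, right, 0,
                     &u[0],       1, MPI_DOUBLE, left,  0, ring, MPI_STATUS_IGNORE);
        MPI_Sendrecv(&u[1],           1, MPI_DOUBLE, left,  1,
                     &u[LOCAL_N + 1], 1, MPI_DOUBLE, right, 1, ring, MPI_STATUS_IGNORE);

        printf("rank %d: left ghost = %.0f, right ghost = %.0f\n",
               rank, u[0], u[LOCAL_N + 1]);

        MPI_Finalize();
        return 0;
    }

Real codes do this in 2-D or 3-D every timestep, often trying to overlap it with computation, and that's exactly where interconnect latency and bandwidth start to dominate.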
Things like Parallella could be decent learning tools, but having tons of very weak cores isn't desirable for most applications.