The Pi draws about 4 watts, so a cluster of 64 Pis will draw around 256 watts. An NVIDIA GTX 960 provides 2,308 GFLOPS at 120 watts, or about 19 GFLOPS per watt. The GTX 980 is even better at 28 GFLOPS per watt. The Adapteva Epiphany-IV is supposed to do 100 GFLOPS at 2 watts.
The Tegra X1 can do 512 GFLOPS, likely at somewhere between 5 and 10 watts.
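To make the comparison concrete, here is a small sketch of the arithmetic using only the figures quoted above. The two Tegra X1 entries simply bracket the guessed 5 to 10 watt range; the dictionary and function names are made up for illustration.

```python
# (peak GFLOPS, power draw in watts), taken from the figures quoted above.
# The two Tegra X1 rows are the ends of the *guessed* 5-10 W range.
chips = {
    "GTX 960": (2308, 120),
    "Epiphany-IV": (100, 2),
    "Tegra X1 @ 5 W": (512, 5),
    "Tegra X1 @ 10 W": (512, 10),
}


def gflops_per_watt(gflops, watts):
    """Power efficiency as peak GFLOPS divided by power draw."""
    return gflops / watts


efficiency = {name: gflops_per_watt(g, w) for name, (g, w) in chips.items()}

# Power budget of the Pi cluster: 64 boards at about 4 W each.
cluster_watts = 64 * 4

for name, value in efficiency.items():
    print(f"{name}: {value:.1f} GFLOPS/W")
print(f"64 Pis: {cluster_watts} W")
```

These are peak numbers on paper, so they only give an upper bound on what a real application would see.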
But even if you built a Tegra X1 cluster, for many applications it would still be less power-efficient than a smaller number of more powerful machines with a good interconnect:
Most parallel applications still need some communication and exchange of results between the different threads. This will be very slow on the Raspberry Pi cluster.
But a Raspberry Pi cluster should be a good educational tool for teaching cluster programming. Processing is slow and communication is also slow, but the ratio between communication bandwidth and processing speed is likely quite similar to that of real clusters. So the skills you learn when mapping small problems to a Raspberry Pi cluster can also be applied when mapping big problems to real clusters. At the same time, building one of these clusters costs about the same as a single compute node in a real cluster, so you can easily give students access to one.
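A toy model shows why the ratio matters more than the absolute speeds. It assumes each step does some computation and then exchanges some data, with no overlap between the two. All node specs below are hypothetical round numbers picked only so that both machines have the same compute-to-bandwidth ratio; they are not measurements of a Pi or of any real cluster.

```python
def comm_fraction(flops_per_step, bytes_per_step, node_flops_per_s, link_bytes_per_s):
    """Fraction of each step spent communicating (compute and comm not overlapped)."""
    compute_s = flops_per_step / node_flops_per_s
    comm_s = bytes_per_step / link_bytes_per_s
    return comm_s / (compute_s + comm_s)


# HYPOTHETICAL specs: the "big" node is 1000x faster at both compute and
# communication, so the ratio between the two is identical.
small_node = dict(node_flops_per_s=1e9, link_bytes_per_s=1e7)
big_node = dict(node_flops_per_s=1e12, link_bytes_per_s=1e10)

# The same workload per step on both machines (also made-up numbers).
work = dict(flops_per_step=1e9, bytes_per_step=1e6)

small = comm_fraction(**work, **small_node)
big = comm_fraction(**work, **big_node)

print(f"small node: {small:.1%} of each step is communication")
print(f"big node:   {big:.1%} of each step is communication")
```

In this model the fraction depends only on the bytes-per-FLOP of the problem and the FLOPS-per-bandwidth of the machine, so a decomposition that balances well on the slow cluster balances the same way on the fast one.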
You could solve the small problems far more efficiently on a single GPU, but if you want to solve the big problems, a single machine is not going to be enough and you will have to deal with the limited communication bandwidth between the nodes.