If you use a hyper-cube, then the processors on the outside edges have no one to talk to. For a single dimension example, imagine a series of processors where every processor in a line has two communication links, one to talk to its neighbour on the left, and one to talk to its neighbour on the right. This is great for all the processors in the middle of the arrangement. However, in a one-dimensional straight-line arrangement, the processors on the end are either missing a left (or a right) neighbour. The solution to this problem is to connect the processors on the ends to each other, making the line a circle or ring.
A one-dimensional hypercube is a line. In supercomputing, it is often desirable to avoid any topology where the there is a flat (non-connected surface) on the side of the cube. Connecting the opposite edges of the cube to each other results in the torus topology in higher dimensions, and the ring topology in 1-D. For a picture of this effect, see the torus interconnect article on wikipedia.
While it is theoretically possible preferable to have really high-order interconnects, in practice wiring considerations limit the maximum number of interconnects. As such, most practical torus architectures are limited in the number of neighbours they can support.
FYI: The tree architecture is avoided in supercomputing for a different reason. Typically, each node has the fastest interconnect that can be provided, as interconnect speed affects system speed for many algorithms. Imagine if each leaf at the bottom of the tree needs 1X bandwidth. Then the parent node one-element up needs 2X bandwidth. The next parent node up requires 4X bandwidth, and so on. With tens of thousands of nodes in the supercomputer, it quickly becomes impossible to make fabricate interconnects fast enough for the parent nodes of the tree.
A practical application of the tree problem occurs on small Ethernet clusters. It is easy to make a 16-node 10Gb Ethernet cluster, because standard switches are readily available. As the system approaches hundreds of nodes, it becomes difficult to find fast enough switches. Even if the data communication speed to each node is reduced to 1Gb, for sufficiently large numbers of nodes, the backplane switches will be overwhelmed.