On the other hand, designs with less energy loss will open up the potential of higher speeds, once the techniques get refined.
No, and here is why.
For a CPU of a given complexity, a given process requires a specific area for transistors, routing, etc. If the process density goes up, the chip gets smaller, and the power has to be lowered to maintain the same power per unit area. That is because the area largely determines the thermal resistance, and for the past few generations high-performance CPUs have already been operating with the junction temperature as high as is reliable. So maximum power is proportional to chip area, and since higher-density processes yield smaller chips, the power has to be lower.
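To put rough numbers on that argument, here is a minimal sketch in Python. It assumes the simplest possible model: junction-to-ambient thermal resistance is inversely proportional to die area, and junction temperature is ambient plus power times thermal resistance. The function name, the temperatures, and the 75 C*mm^2/W figure are all made-up illustrative values, not data for any real part.

```python
def max_power_watts(area_mm2,
                    t_junction_max_c=100.0,   # hypothetical reliability limit
                    t_ambient_c=40.0,         # hypothetical ambient temperature
                    specific_resistance=75.0  # hypothetical, in C*mm^2/W
                    ):
    """Highest sustainable power for a die of the given area.

    Assumed model (illustration only):
        theta = specific_resistance / area      (C/W)
        T_junction = T_ambient + P * theta
    Solving for P at T_junction = T_junction_max gives the limit.
    """
    theta = specific_resistance / area_mm2
    return (t_junction_max_c - t_ambient_c) / theta


# Same design on two processes: the denser one halves the die area.
for area in (200.0, 100.0):
    power = max_power_watts(area)
    print(f"{area:.0f} mm^2 -> {power:.0f} W ({power / area:.2f} W/mm^2)")
```

Under these assumptions, halving the area halves the allowable power while the power per unit area stays the same, which is the proportionality described above.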
You can see this trend in Intel processors since about the Core 2. The highest-power models all have a power rating proportional to die area, and since more recent models are smaller, they have lower power ratings.
This is also why stacking memory on top of logic is not going to happen for anything except low-performance logic: the stack adds heat and thermal resistance without adding any area to get the heat out through.