Don't you worry, the GHz war is not done! There's talk of exotic materials (SiC, diamond, etc.) going to 10 GHz. If someone figures out how to make Rapid Single Flux Quantum [wikipedia.org] digital chips with high temperature superconductors, then we may seriously start to see 1 THz clock speeds in practical computers, using extreme Peltier cooling to get the CPU core down to cryogenic temps.
The GHz war is over. The speed of light won. A long time ago, it stopped being "all about the transistor" and started being "all about the wires". IBM won the race to copper at 180nm (back when it was called 0.18um), and that helped make those technologies even better, but about the time we hit 90nm, the semiconductors themselves were "fast enough", or by some measurements had stopped getting faster at all. Since then, speed increases have come largely (but not exclusively) from the transistors getting smaller, which reduces the distance the wires need to go.
The RC delay of wires is the major problem. R isn't going to get much better than copper. Silver has slightly lower resistance, but it's too reactive to be used anywhere real. At these geometries, any alloy would be insufficiently mixable to be reliable, to say nothing of more exotic materials (like ceramics). There's some room for improvement in the dielectric (the "C"), but by the time you box yourself in with all the requirements (low water permeability, a thermal coefficient of expansion close to that of the wires, mechanical properties friendly to submicron manufacturing), you have to concede you're not going to get more than 20% faster there (and even that we could dispute separately).
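To put rough numbers on the RC argument, here is a minimal sketch using the textbook approximation that the 50% delay of a distributed RC line is about 0.38 times total R times total C. The per-micron resistance and capacitance are made-up round numbers for illustration, not values from any real process. The point is only the scaling: delay grows with the square of the length, while a 20% better dielectric buys exactly 20%.

```python
def wire_delay_ps(length_um, r_ohm_per_um, c_ff_per_um):
    """Approximate 50% delay of an unbuffered distributed RC wire, in ps.

    Uses the standard 0.38 * R_total * C_total approximation.
    """
    r_total_ohm = r_ohm_per_um * length_um
    c_total_farad = c_ff_per_um * length_um * 1e-15
    return 0.38 * r_total_ohm * c_total_farad * 1e12


# Assumed, illustrative wire parameters (not from a real process).
R_PER_UM = 1.0   # ohm per micron
C_PER_UM = 0.2   # femtofarad per micron

if __name__ == "__main__":
    for length_um in (100, 500, 1000, 2000):
        base = wire_delay_ps(length_um, R_PER_UM, C_PER_UM)
        # Same wire with a dielectric that cuts capacitance by 20%.
        better = wire_delay_ps(length_um, R_PER_UM, 0.8 * C_PER_UM)
        print(f"{length_um:5d} um: {base:8.2f} ps, "
              f"with 20% lower C: {better:8.2f} ps")
```

Doubling the wire length quadruples the delay in this model, which is why shrinking the distance matters far more than tweaking the materials.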
Take a cache. The slowest path is reading a memory cell. That tiny little device needs to produce a measurable change in voltage on the bitlines, which is then picked up by a sensing structure. That sensing structure has nothing to do with storage, so it's pure overhead, and thus you want as few of them as possible. Can you have it 16 bits away? 32? The days when it could be 64 bits away with any meaningful performance are gone. There's nothing you can do to the characteristics of that little device (which needs to be minimum feature size to maximize the density of the cache) to make it dominate over the characteristics of the bitline it's trying to move.
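A back-of-the-envelope sketch of the bitline problem, assuming the cell acts as a constant current source discharging the bitline capacitance, so the time to develop a sensing margin is t = C * dV / I. Every number here (per-cell capacitance, fixed capacitance, cell current, required voltage swing) is a hypothetical placeholder, not data from a real SRAM.

```python
def bitline_sense_time_ps(cells_per_bitline,
                          c_per_cell_ff=0.1,   # assumed bitline load per cell
                          c_fixed_ff=2.0,      # assumed sense amp + wiring load
                          i_cell_ua=20.0,      # assumed cell read current
                          dv_mv=100.0):        # assumed swing the sense amp needs
    """Time for one cell to move the bitline by dv_mv, in picoseconds.

    t = C * dV / I; with C in fF, dV in mV and I in uA the result is in ps.
    """
    c_bitline_ff = c_fixed_ff + cells_per_bitline * c_per_cell_ff
    return c_bitline_ff * dv_mv / i_cell_ua


if __name__ == "__main__":
    for cells in (16, 32, 64, 128):
        print(f"{cells:4d} cells per sense amp: "
              f"{bitline_sense_time_ps(cells):6.1f} ps")
```

The cell current is pinned by the minimum-size device, so the only knob left is the number of cells hanging on each bitline, which trades speed directly against sense amp overhead.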
Take a data path. Even if 95% of your data is highly predictable, easily pipelined stuff with local signals, your critical path is going to involve signals from other areas of the chip, and they're going to have to be rebuffered and trucked from hundreds of microns away. No giant buffer in the history of man can dominate over a long distance wire. The signal will show up "eventually".
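Here is a rough sketch of why rebuffering helps but never wins, under a deliberately simple model: the wire is cut into equal segments, each driven by a repeater with a fixed delay, and each segment contributes 0.38 * R * C of distributed wire delay. The repeater delay and wire parameters are assumed values for illustration, and the model ignores driver resistance and repeater input capacitance.

```python
def repeated_wire_delay_ps(length_um, n_segments,
                           buf_delay_ps=15.0,   # assumed fixed repeater delay
                           r_ohm_per_um=1.0,    # assumed wire resistance
                           c_ff_per_um=0.2):    # assumed wire capacitance
    """Total delay of a wire split into n_segments repeated segments, in ps."""
    seg_um = length_um / n_segments
    # ohm * fF = 1e-15 s = 1e-3 ps
    seg_wire_ps = 0.38 * (r_ohm_per_um * seg_um) * (c_ff_per_um * seg_um) * 1e-3
    return n_segments * (buf_delay_ps + seg_wire_ps)


def best_segments(length_um, max_segments=64):
    """Brute-force the segment count that minimizes total delay."""
    return min(range(1, max_segments + 1),
               key=lambda n: repeated_wire_delay_ps(length_um, n))


if __name__ == "__main__":
    for length_um in (500, 1000, 2000, 4000):
        n = best_segments(length_um)
        print(f"{length_um:5d} um: unbuffered "
              f"{repeated_wire_delay_ps(length_um, 1):7.1f} ps, "
              f"best with {n} segments "
              f"{repeated_wire_delay_ps(length_um, n):7.1f} ps")
```

Repeaters turn the quadratic growth into roughly linear growth, but the delay still scales with distance, and every extra repeater costs area and power.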
3GHz is a good place to stop. We can make it to 4GHz with compromises in power, but beyond that you're dedicating so much of your chip to rebuffering that you're blowing a lot of power on it. At that point, your pipeline has so many stages that branch mispredicts are very painful. You're devoting so much of your cycle time to setup and hold for your latches that you're going to be embarrassed at how little work you can do in each cycle.
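The latch overhead and mispredict pain can be sketched with a toy pipeline model. It assumes a fixed amount of logic to get through, a fixed per-stage latch overhead (setup, hold margin, clock skew), and a mispredict penalty roughly equal to the pipeline depth. The logic depth, overhead, and mispredict rate below are invented for illustration only.

```python
import math


def pipeline_model(freq_ghz,
                   logic_ps=2000.0,             # assumed total logic delay
                   overhead_ps=60.0,            # assumed latch/skew cost per stage
                   mispredicts_per_instr=0.01):  # assumed mispredict rate
    """Return (stages, useful fraction of cycle, average ps per instruction)."""
    cycle_ps = 1000.0 / freq_ghz
    useful_ps = cycle_ps - overhead_ps
    if useful_ps <= 0:
        raise ValueError("latch overhead eats the entire cycle")
    stages = math.ceil(logic_ps / useful_ps)
    # Each mispredict flushes the pipe, costing roughly one cycle per stage.
    cycles_per_instr = 1.0 + mispredicts_per_instr * stages
    return stages, useful_ps / cycle_ps, cycle_ps * cycles_per_instr


if __name__ == "__main__":
    for freq_ghz in (2, 3, 4, 6, 10):
        stages, useful, ps_per_instr = pipeline_model(freq_ghz)
        print(f"{freq_ghz:3d} GHz: {stages:3d} stages, "
              f"{useful:5.1%} of cycle is real work, "
              f"{ps_per_instr:6.1f} ps per instruction")
```

With these assumed numbers, pushing the clock up keeps shrinking the share of each cycle that does real work and keeps deepening the pipeline, so the delivered speedup falls well short of the clock ratio.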
1 THz clock speeds are on their way, and maybe even higher. But they're not useful to CPUs or GPUs. They're useful for more exotic applications, primarily technology demonstrations.