It's not as obvious as it sounds. Some things get easier if you're basically still building a 2D chip but with one extra z layer for shorter routing. It quickly gets difficult if you decide you want your 6-core chip to now be a 6-layer one-core-per-layer chip. Three or four issues come to mind.
First is heat. Volume (a cubic function) grows faster than surface area (a square function). It's hard enough as it is to manage the hotspots on a 2D chip with a heatsink and fan on its largest side. With a small number of z layers, you would at the very least need to make sure the hotspots don't stack. For a more powerful chip, you'll have more gates, and therefore more heat. You may need to dedicate large regions of the chip for some kind of heat transfer, but this comes at the price of more complicated routing around it. You may need to redesign the entire structure of motherboards and cases to accommodate heatsinks and fans on both large sides of the CPU. Unfortunately, the shortest path between any two points is going to be through the center, but the hottest spot is also going to be the center, and the place that most needs some kind of chunk of metal to dissipate that heat is going to have to go through the center. In other words, nothing is going to scale as nicely as we like.
Second is delivering power and clock pulses everywhere. This is already a problem in 2D, despite the fact that radius (a linear function) scales slower than area and volume. There's so MUCH hardware on the chip that it's actually easier to have different parts run at different clock speeds and just translate where the parts meet, even though that means we get less speed than we could in an ideal machine. IIRC some of the benefit of the multiple clocking scheme is also to reduce heat generated, too. The more gates you add, the harder it gets to deliver a steady clock to each one, and the whole point of adding layers is so that we can add gates to make more powerful chips. Again, this means nothing will scale as nicely as we like (it already isn't going as nicely as we'd like in 2D). And you need to solve this at the same time as the heat problems.
Third is an insurmountable law of physics: the speed of light in our CPU and RAM wiring will never exceed the speed of light in vacuum. Since we're already slicing every second into 1-4 billion pieces, the amazing high speed of light ends up meaning that signals only travel a single-digit number of centimeters of wire per clock cycle. Adding z layers in order to add more gates means adding more wire, which is more distance, which means losing cycles just waiting for stuff to propagate through the chip. Oh, and with the added complexity of more layers and more gates, there's a higher number of possible paths through the chip, and they're going to be different lengths, and chip designers will need to juggle it all. Again, this means things won't scale nicely. And it's not the sort of problem that you can solve with longer pipelines - that actually adds more gates and more wiring. And trying to stuff more of the system into the same package as the CPU antagonizes the heat and power issues (while reducing our choices in buying stuff and in upgrading. Also, if the GPU and main memory performance *depend* on being inside the CPU package, replacement parts plugged into sockets on the motherboard are going to have inherent insurmountable disadvantages).