AMD has been developing a new microarchitecture, Zen, which will replace the horribly-designed Bulldozer. It's rumored to be made on a 14nm node, and they re-hired the guy who designed the K10 architecture (aka the last good CPUs AMD made), so I expect it to be reasonably competitive with Intel. I really hope it is, at least.
Your terminology is completely out of whack ("stacked single-die CrossFire GPU" is a phrase with more contradictions than whitespace characters), but I'll analyze what you were trying to say instead of what you actually said:
#1: Current chip-stacking tech doesn't allow for all that much bandwidth between chips, especially when going above two layers. CPU cores need a pretty hefty amount of bandwidth to their cache, so that's already problematic. Stacking dies also limits thermal performance - if you stack two dies, you have 2x the heat in 1x the heat-conducting surface area. For low-power stuff, that's fine, but CPU cores get pretty hot. Many high-performance dies are already performance-constrained by how much heat they can conduct to their cooler.
#2. This is a good idea. Or rather, the good idea is "APU on an interposer using HBM for main memory". You'd need bigger CPU caches - HBM is ridiculously high-latency even by VRAM standards, it will really hurt CPU performance otherwise. And it will limit upgradability - no way to just pop another DIMM of DDR3 in there. But the GPU gains should be worth it.
#3. Again, thermals will absolutely prevent you from stacking GPU dies. HBM and stacking doesn't do ANYTHING for the power efficiency of the chips you're stacking, so that's two 100W+ dies on top of each other. Not gonna happen. You could stack them side-by-side on an interposer, but at that point why not just fabricate them as one die?
#4. The cost of an interposer is significantly greater than that of a printed circuit board, and a lot of stuff won't benefit from the greater bandwidth to the CPU - stuff like a USB controller or audio chipset. Stacking the dies is also more expensive than just using a PCB - it's done in phones where space is REALLY constrained, but even the smallest desktops aren't that tight for space yet. So all that's left is putting everything onto one die - which runs into yield problems, because with bigger individual dies, a single defect will wipe out a lot more silicon. AMD actually *is* already doing this with their lowest-end laptop/desktop parts - look at Socket AM1, there's not much on the motherboard besides external connectors and power-delivery circuits. But they're also pretty low-end in performance.