Now that they can extract pure silicon-28 with a simple linear accelerator (which should have been obvious), it should be possible to use much larger dies without running into imperfection problems. That doesn't keep to Moore's Law, admittedly, but it does mean that doubling the transistors takes about half the space it otherwise would, since you're eliminating a lot of packaging. Over the space of the motherboard, it would more than work out, especially if they moved to wafer-scale integration. Want to know how many cores they put onto a wafer using regular dies? Instead of chopping the wafer up, you throw on interconnects Transputer-style.
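To put rough numbers on the die-size point, here's a minimal sketch using the textbook Poisson yield model, where the fraction of defect-free dies is exp(-area x defect density). The defect densities and die areas are made-up illustrative values, not measurements, and whether isotopically pure silicon actually lowers defect density by this much is the assumption being tested, not a given.

```python
import math

def poisson_yield(die_area_cm2, defect_density_per_cm2):
    """Expected fraction of defect-free dies under a simple Poisson model."""
    return math.exp(-die_area_cm2 * defect_density_per_cm2)

# Hypothetical defect densities (defects per cm^2), for illustration only.
for density in (0.5, 0.05):
    # Hypothetical die areas in cm^2, from ordinary to very large.
    for area in (1.0, 4.0, 16.0):
        y = poisson_yield(area, density)
        print(f"D0={density}/cm^2  area={area} cm^2  yield={y:.3f}")
```

The thing to notice is that yield falls off exponentially with area, so cutting the defect density by some factor lets you grow the die by the same factor for the same yield.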
Graphene is troublesome, yes, but there are lots of places where you need regular conductors. If you replace copper interconnects and the gold links to the pins, you should be able to reduce the heat generated and therefore increase the speed at which you can run the chips. Graphene might also help with 3D chip technology, as you're going to be generating less heat between the layers. That would let you double the number of transistors per unit area occupied, even if not per unit area utilized.
Gallium arsenide is still an option. If you can sort pure isotopes then it may be possible to overcome many of the limitations that have existed so far on the technology. It has been nasty to utilize, due to pollution, but we're well into the age where you can just convert the pollution into plasma and again separate out what's in it. It might be a little expensive, but cleanup will always cost more, and you can sell the results from the separation. It's much harder to sell polluted mud.
In the end, because people want compute power rather than a specific transistor count, Processor-in-Memory is always an option: simply move logic into RAM, so those functions no longer have to go through support chips, a bus and all the layers of a CPU to get carried out. DDR4 is nice and all that, but main memory is still a slow part of the system and the caches on the CPU are easily flooded, since code always expands to fill the space available. There is also far too much work going on in managing memory. The current Linux memory manager is probably one of the best around. Take that and all the memory support chips, put it on an oversized ASIC and give it some cache. The POWER8 processor has 96 megabytes of L3 cache. I hate odd amounts and the memory logic won't be nearly as complex as the POWER8's, so let's increase it to 128 megabytes. Since the cache will be running at close to the speed of the CPU, exhaustion and stalling won't be nearly so common.
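For a feel of why a big, fast cache on the memory side matters, here's a rough sketch using the standard average memory access time formula (hit time plus miss rate times miss penalty). Every number in it is a hypothetical placeholder picked for illustration; none of them are POWER8 or DDR4 figures.

```python
def average_access_time(hit_time_ns, miss_rate, miss_penalty_ns):
    """Average memory access time: hit time + miss rate * miss penalty."""
    return hit_time_ns + miss_rate * miss_penalty_ns

# Hypothetical scenarios (all values assumed, not measured):
# a small cache that floods easily vs. a large cache that rarely misses.
scenarios = {
    "small cache, floods easily": (1.0, 0.10, 100.0),
    "large cache, rarely misses": (1.5, 0.01, 100.0),
}

for name, (hit_ns, miss_rate, penalty_ns) in scenarios.items():
    t = average_access_time(hit_ns, miss_rate, penalty_ns)
    print(f"{name}: {t:.2f} ns average")
```

Even if the bigger cache has a slightly slower hit time, the miss penalty term dominates, which is the whole argument for keeping the data close to the logic.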
Actually, the best thing would be for the IMF (since it's not doing anything useful with its money) to buy millions of POWER8 and MIPS64 processors and offer them for free to individual geeks on daughter boards that can be plugged in as expansion cards. At worst, it would make life very interesting.