Also, multicore designs can have separate memory.
NUMA comes to mind, but it adds complexity to both the OS and the application. Accessing another CPU's memory is expensive, so the OS needs to keep threads close to their data. Applications need to manage cross-socket communication themselves, using a system API (assuming one exists) to find out which socket a thread is running on and trying to limit communication to threads on the current socket. Cross-socket communication is probably best done by passing messages rather than references, because copying and storing the data locally, even if it ends up duplicated, will be faster than constantly reaching into the other socket's memory.
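As a rough illustration of the kind of system API involved, here is a minimal sketch assuming Linux with libnuma; the specific calls (sched_getcpu, numa_node_of_cpu, numa_alloc_local) are just one way to do this, not something the comment above prescribes.

```c
/* Sketch (Linux + libnuma, link with -lnuma): find out which NUMA node
 * the calling thread is on and allocate memory local to that node. */
#define _GNU_SOURCE
#include <sched.h>   /* sched_getcpu() */
#include <numa.h>    /* numa_available(), numa_node_of_cpu(), numa_alloc_local() */
#include <stdio.h>

int main(void) {
    if (numa_available() < 0) {
        fprintf(stderr, "NUMA not supported on this system\n");
        return 1;
    }

    int cpu  = sched_getcpu();          /* which CPU is this thread on right now? */
    int node = numa_node_of_cpu(cpu);   /* which socket/node does that CPU belong to? */
    printf("thread on cpu %d, NUMA node %d\n", cpu, node);

    /* Allocate a buffer backed by memory on the local node, so this
     * thread's hot data stays on its own socket. */
    size_t len = 1 << 20;
    void *buf = numa_alloc_local(len);
    if (!buf)
        return 1;

    /* ... use buf from threads pinned to this node ... */

    numa_free(buf, len);
    return 0;
}
```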
Then you have the issue of load balancing memory allocations. It may not always be a problem, but it can become one if you consume all of one socket's memory. There are other issues too: one socket may have direct access to the NIC while the other has direct access to the hard drives. Topology matters.
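One way to avoid exhausting a single socket's memory is to spread a large shared allocation across all nodes. A hedged sketch, again assuming libnuma (the helper name is mine):

```c
/* Sketch: interleave one large, shared buffer across all NUMA nodes so
 * no single socket's memory gets exhausted (libnuma, link with -lnuma). */
#include <numa.h>

void *alloc_balanced(size_t len) {
    if (numa_available() < 0)
        return NULL;
    /* Pages are placed round-robin across all allowed nodes. */
    return numa_alloc_interleaved(len);
}
```

That trades worst-case locality for balance, which is often the right call for data every socket touches anyway.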
As soon as you step outside a cache-coherent system, you run into even more fun problems. Stale data is a huge issue: you need to make sure you're always getting the current data and not some local copy. At the same time, without cache coherency, cross-core communication is very high latency. Most x86 CPUs can remain cache-coherent down to single-cycle latencies. While copying the data may not be any faster, you find out very quickly whether the data changed. If the data is read a lot and rarely changed, you get nice guarantees about how quickly you learn of a change, and you only incur the access cost when that event happens. Without coherency, you are forced to go out to memory every time, incurring a high latency cost on every access.
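To make the "only pay when it changes" point concrete, here is a small C11 sketch (the names and structure are my own illustration, not from the comment): readers poll a generation counter, which on a cache-coherent machine stays a cheap local-cache hit until the writer actually publishes a change.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Shared state: a payload plus a generation counter the writer bumps
 * after every update. */
typedef struct {
    _Atomic uint64_t gen;
    _Atomic int      value;
} shared_t;

void publish(shared_t *s, int v) {
    atomic_store_explicit(&s->value, v, memory_order_relaxed);
    /* Release: make the new value visible before the new generation. */
    atomic_fetch_add_explicit(&s->gen, 1, memory_order_release);
}

/* Returns 1 and fills *out only if the data changed since *last_gen.
 * On a coherent system the unchanged-case check hits our local cache;
 * without coherency, this check would have to go to memory every time. */
int read_if_changed(shared_t *s, uint64_t *last_gen, int *out) {
    uint64_t g = atomic_load_explicit(&s->gen, memory_order_acquire);
    if (g == *last_gen)
        return 0;
    *out = atomic_load_explicit(&s->value, memory_order_relaxed);
    *last_gen = g;
    return 1;
}
```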
With multi-core systems, cache coherency has an N^2 problem. I'm sure someone will come up with the idea of "channels" to facilitate low-latency inter-core communication while keeping normal memory access separate. Possibly even islands of cache coherency in an ocean of cores, where each island is a small group. Some of the many-core designs with 80+ cores have heavy locality issues: adjacent cores are fast to access, and far-away cores are very expensive. Pretty much think of each core as only able to talk to its neighbors, so requests to far-away cores need to make many "hops". Even worse, cores physically nearer the memory controller have faster access to memory, and all memory requests have to go through those cores. Lots of fun issues that require custom OS designs.
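A toy cost model of such a mesh (entirely my own illustration, with made-up parameters) shows why placement matters so much on these designs: latency scales with hop count, and every memory access also has to reach a core adjacent to the memory controller first.

```c
#include <stdlib.h>

/* Toy model of an N x N mesh of cores. */
typedef struct { int x, y; } core_t;

/* Manhattan distance = number of hops between two cores on the mesh. */
static int hops(core_t a, core_t b) {
    return abs(a.x - b.x) + abs(a.y - b.y);
}

/* Core-to-core message: per-hop routing latency plus a fixed endpoint cost. */
static int msg_latency(core_t from, core_t to, int per_hop, int fixed) {
    return hops(from, to) * per_hop + fixed;
}

/* Memory access: hop to the core sitting next to the memory controller
 * (e.g. at the mesh edge), then pay the DRAM latency itself. */
static int mem_latency(core_t from, core_t mem_ctrl, int per_hop, int dram) {
    return hops(from, mem_ctrl) * per_hop + dram;
}
```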
The main problem Intel has is that CPUs have not had significant speed improvements for years, and yet Intel still asks high prices for them.
And yet Intel is still the fastest and best value. Here's hoping for some competition.