The P5600 core is being touted as supporting up to six cores in a cache-coherent link, most likely similar to ARM's CCI-400.
The CCI-400 is not relevant here. In both the MIPS and ARM worlds, CPUs are now multi-core capable out of the box. A cluster can typically be configured with 1 to 4 cores, and up to 6 for this latest MIPS core. L2 management is handled as part of the cluster, which also typically supports coherency with external hardware accessing the L2 through one or several coherency ports. The L1 caches, the L2 and that external hardware are kept coherent inside a cluster (with some limitations at times on the low end; there are variants). All this can be taken for granted at the high end, as here.
Now what the CCI-400 does is different: it extends coherency management across several clusters. This is very important in the ARM world because of the big.LITTLE scheme: you want the big and little clusters to be kept coherent to speed up and ease the transition between the low-power and high-performance modes (it also helps when all cores are used at once, as the OS can migrate tasks between cores more efficiently).
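To make the coherency-port point concrete, here is a minimal sketch in C of what that buys software. The helper names (buf_clean_cache, buf_invalidate_cache, accel_run) are hypothetical, not any real driver API; only the shape of the two code paths matters.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers, assumed to exist elsewhere. */
void buf_clean_cache(void *buf, size_t len);       /* write dirty CPU cache lines back to memory */
void buf_invalidate_cache(void *buf, size_t len);  /* drop stale CPU cache lines                 */
void accel_run(void *in, void *out, size_t len);   /* kick the external hardware block           */

/* Accelerator NOT behind a coherency port: software must do explicit cache
 * maintenance around every transfer, or the device sees/produces stale data. */
void process_non_coherent(uint8_t *in, uint8_t *out, size_t len)
{
    buf_clean_cache(in, len);          /* make CPU writes visible to the device */
    accel_run(in, out, len);
    buf_invalidate_cache(out, len);    /* make device writes visible to the CPU */
}

/* Accelerator on the cluster's coherency port (or, between clusters, linked by a
 * CCI-400-style interconnect): the hardware snoops the caches, so plain loads
 * and stores are enough. */
void process_coherent(uint8_t *in, uint8_t *out, size_t len)
{
    accel_run(in, out, len);
}
```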
Rising defect densities have created a situation where — for the first time ever — 20nm wafers won't be cheaper than the 28nm processors they're supposed to replace.
The economics are often left out of discussions on tech sites, but they matter a lot. Up to now we had a sustainable situation: the cost of each new process increased steadily, but the cost per transistor on the new process eventually ended up lower. This got everyone on board and also expanded the reachable market, bringing in more revenue. That's why we have small microcontrollers everywhere nowadays.
Now when the cost of a new process increases, only the part of the market that truly needs the improved density and performance will move to it. And that's only a small part of the whole market. So we get increasing costs with a shrinking addressable market. Double whammy. Expect end prices for high-performance parts to rise quickly. That may slow things down significantly.
We'll see how it develops soon, but I would expect the economics to bite before we hit the technical limits.
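To make the cost argument concrete, here is a back-of-the-envelope sketch: cost per good die is wafer cost divided by (dies per wafer × yield), with a simple Poisson-style yield model. Every number below is invented for illustration; only the shape of the calculation matters.

```c
#include <math.h>
#include <stdio.h>

int main(void)
{
    /* All numbers are made up for illustration only. */
    double wafer_mm2 = 3.14159 * 150.0 * 150.0;  /* ~300 mm wafer, ignoring edge loss */

    double cost_28 = 5000.0, area_28 = 100.0, d0_28 = 0.10;  /* $/wafer, mm^2, defects/cm^2 */
    double cost_20 = 9000.0, area_20 = 55.0,  d0_20 = 0.40;  /* same design shrunk to 20 nm */

    double yield_28 = exp(-d0_28 * area_28 / 100.0);  /* simple Poisson yield model */
    double yield_20 = exp(-d0_20 * area_20 / 100.0);

    printf("28 nm: yield %.0f%%, cost per good die $%.2f\n",
           100.0 * yield_28, cost_28 / (wafer_mm2 / area_28 * yield_28));
    printf("20 nm: yield %.0f%%, cost per good die $%.2f\n",
           100.0 * yield_20, cost_20 / (wafer_mm2 / area_20 * yield_20));
    return 0;
}
```

With those invented numbers the 20 nm die comes out more expensive (about $8.7 versus $7.8) even though it is nearly half the size: a pricier wafer plus a higher defect density eat the whole density gain, which is exactly the situation described above.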
Owning its own fab means that Intel can tweak process technology to match the particulars of a given architecture (and vice-versa)
That may be understood as an Intel exclusive, but it's not entirely true. Even in the fabless world, the big players (Qualcomm, NVidia, AMD & co) have very early access to new process nodes and can certainly tune their designs to them, and have their own specific tweaks made. So they can do both kinds of adaptation too, although it's not as integrated as it is for Intel. If you draw a line, Intel is at one extreme, with close integration; the small fabless companies are at the other, taking the stock TSMC, GF or UMC offering as-is. The big fabless players sit somewhere in the middle.
ARM, in contrast, is limited by the decisions of the foundry manufacturers it partners with.
It's also a bit misleading. ARM has early access to all the big fabs (GlobalFoundries and TSMC), and because ARM is so pervasive there is very high pressure on a fab to provide the best ARM implementations on its process. So sure, the fab makes the decisions on its process in the end. But you can bet they pay a lot of attention to any ARM feedback gained during that early-access collaboration.
ARM doesn't only provide processor IP; they now cover the whole range, from memory cells to GPUs to interconnects to memory controllers. And they work with the fabs to optimize their designs for each process and provide their customers with Processor Optimization Packs (POPs) that summarize how to get the best out of a process for their IP. So ARM has the know-how, the access and the pull to have a big say in what happens on the fabs' roadmaps.
This is why all fabs start a new process node with the high-power variant first: it's easier, and you can get away with so-so yields. Then some move on to the trickier low-power variant, which is both more technically challenging and has to hit lower costs on top of that.
Now Intel is certainly the king of high-performance processes. But for low-power mobile parts, TSMC has been producing 28 nm for some time now, while what you can buy from Intel is still 32 nm. The 22 nm low-power Atoms from Intel are supposed to be available real soon now. But the key question is their cost: Intel could afford to take a hit to grab some market share initially, but in the long run they need to be competitive at the lower price points seen in mobile. Whether they will be able to do that is still an open question. Getting good performance from hand-picked 22 nm Atoms and getting 22 nm parts into products are good steps, but not sufficient. In the long run Intel must be able to make enough money on low-cost parts. TSMC knows how to do that and will be there. For Intel, it remains to be proven.
Another important point is that TSMC has experience supporting fabless customers, as that's its core business, while for Intel this is still very new. People may not appreciate how difficult all this is and how key good support is. There's a lot of know-how and many procedures built on painful experiences, and even with a good manufacturing process it takes time to build up the service side. Intel has started down this path, but mostly with simple designs (FPGAs are about the simplest design you can get; that's why they're always first on any new process node), and most of those parts are not out yet.
Lastly, Apple has massive volumes and can only take limited risks on the production of its application processors. TSMC looks like a much safer bet at this point. If Intel can prove they really can deliver in volume, at low cost, with good support (it's about economics and process maturity here, not good technical metrics on a few samples), then it will still be possible to switch in the future. At this time it would be a bit reckless, IMHO. And I'm sure Apple did its due diligence on all this (with people who do understand the details of this complex business).
If blobs were ONLY firmware, they could run ONLY on the device, and could be loaded once at installation time. Very few fall into this category. (Some wifi chips do load this way upon every boot).
Even when a firmware blob runs only on the device, I would expect it to be loaded every time the device is reset, particularly for a WiFi chip. If you want the blob to be persistent you must add local flash to the WiFi subsystem, which increases the BOM cost. At the very low prices WiFi chips sell for, this is just not acceptable anymore; the chipmaker would sell nothing. So there's no such local storage (except maybe a minimal bootloader, which can sit in on-chip ROM), and the chip loads its executable from the host at each reset. I've worked on systems (cellular, not WiFi) that do just that.
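As a rough sketch of what that looks like on the host side, here's the shape of the download path in a Linux-style driver. request_firmware() is the real kernel API for pulling a blob from the host filesystem; wifi_hw_reset(), wifi_hw_write() and the "wifi-chip.bin" name are hypothetical, device-specific assumptions.

```c
#include <linux/firmware.h>
#include <linux/device.h>
#include <linux/types.h>

/* Hypothetical device-specific helpers, assumed to exist elsewhere in the driver. */
void wifi_hw_reset(struct device *dev);
int  wifi_hw_write(struct device *dev, const u8 *data, size_t len);

/* Called after every chip reset: the part has no local flash, so the host
 * fetches the blob from its own filesystem and streams it over the bus. */
static int wifi_load_blob(struct device *dev)
{
    const struct firmware *fw;
    int ret;

    ret = request_firmware(&fw, "wifi-chip.bin", dev);  /* e.g. from /lib/firmware */
    if (ret)
        return ret;

    wifi_hw_reset(dev);                            /* on-chip ROM bootloader now waits for code */
    ret = wifi_hw_write(dev, fw->data, fw->size);  /* push the executable into chip RAM         */

    release_firmware(fw);
    return ret;
}
```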
You are in a maze of little twisting passages, all alike.