What differentiates them is the number of instructions per cycle each core can execute, which is informed by the amount of on-die cache (L1, L2, and in the higher-end chips, L3) available. The cores are the same and the glue logic is effectively the same, apart from the changes needed to accommodate the larger or smaller caches; the bigger caches simply allow more instructions to be queued for execution...
First, I will freely admit that I couldn't design a modern CPU with a gun pressed to my head. BUT I'm not sure the design of the "Core" series has much to do with ARM design. Yes, I get what you're saying about larger caches being generally faster/better; but the length of the execution pipeline makes an even bigger difference, as do niceties like lookahead, out-of-order execution, etc. And I seem to remember an Intel document where they SHORTENED the instruction pipeline, because the "cost" of refilling a long pipeline after a mispredicted branch was higher than refilling a shorter one. So more isn't always better in CPU design.
It wasn't until the A7, which is a 64-bit chip, that Apple's CPUs became competitive with the high end of the Android market; and at that point they were completely destroying anything in the Android world. That is owed in large part to the larger 64-bit instruction set...
Ok, I'll give you that; especially with ARM, where opcode and operand are often combined in the same instruction. (Extrapolating from my experience writing 32-bit ARM assembly language.)
A dual-core chip can only work on two different things at once and, as a result, will switch execution context roughly twice as often as a quad-core chip when more than two processes are simultaneously demanding CPU time. Context switching is expensive, and that's why benchmarks that don't account for it don't matter in the real world.
You are right that context switching is VERY important. And I seem to remember a recent article complaining that Android handled multiple threads much worse than iOS (which I found amazing, actually, because all the Linux people whine about OS X's context-switch overhead). And since benchmarks actually test SYSTEMS, not CPUs, it kinda all gets lost in the sauce.
Bottom line: I would really like to know what someone on the A[x] Chip Development Team has to say about all this. Do you have any of those friends in your back pocket?