So even if Google doesn't index, say, the Wall Street Journal, can't Google still get the same news contributions from the AP newswire?
Or is there something special about AP license terms or something?
Having shifts in the address calculation is fine for ARM7 where you're trying to squeeze every possible functionality out of a tiny number of gates, and don't really care about performance. But for even a reasonably high-performance design, you need to have a consistent pipeline.
Probably the most important pipeline is the Decode->RegRead->AddressFormation->Dcache->Writeback pipeline. The latency of this pipeline is critical for performance. ARM has some advantages here: a uniform encoding (or, somewhat less so, a semi-uniform one, a la Thumb2) is easier to decode than x86, which is variable-length at the byte level. Most architectures have an adder in the AddressFormation part (though notably not ia64). If you add two registers (which you can't in MIPS) you probably want to be able to shift by the access size because you're doing something like indexing into an array. So a small left shifter before the adder isn't uncommon, and it's usually about a 4:1 mux in terms of delay.
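To make the "small left shifter before the adder" concrete, here is a rough sketch of scaled-index address formation. The function name, the 32-bit wraparound, and the set of access sizes (1, 2, 4, 8 bytes) are my own assumptions for illustration, not any particular architecture's definition.

```python
MASK32 = 0xFFFFFFFF  # assumption: 32-bit addresses that wrap around

def scaled_index_address(base, index, access_size):
    """Return base + index * access_size, the way a scaled-index load forms it.

    Only four shift amounts are possible (0..3), so in hardware the shifter
    is roughly a 4:1 mux sitting in front of the address adder.
    """
    shift = {1: 0, 2: 1, 4: 2, 8: 3}[access_size]  # hypothetical size set
    return (base + (index << shift)) & MASK32

# Element 3 of an array of 4-byte words starting at 0x1000
addr = scaled_index_address(0x1000, 3, 4)
```

The point is that the shift amount comes from a tiny fixed set, so it costs about one mux level ahead of the adder.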
But ARM allows you to do full rotations in front of the adder. This means you need more levels of logic in front of the address calculation adder, which hurts your memory latency. You can make it a multicycle instruction or split it up into multiple instructions (and many implementations do), but that of course adds significant complexity.
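For contrast, here is a simplified sketch of what a general shifter in front of the address adder has to be able to do. It is not a faithful model of the ARM encoding (which has special cases for a zero shift amount and an RRX form that I'm leaving out); the names and the 0..31 amount range are assumptions for illustration.

```python
MASK32 = 0xFFFFFFFF  # assumption: 32-bit registers

def barrel_shift(value, kind, amount):
    """Shift or rotate a 32-bit value by 0..31 bits (simplified, no RRX)."""
    value &= MASK32
    if not 0 <= amount <= 31:
        raise ValueError("amount must be in 0..31 in this sketch")
    if amount == 0:
        return value
    if kind == "LSL":  # logical shift left
        return (value << amount) & MASK32
    if kind == "LSR":  # logical shift right
        return value >> amount
    if kind == "ASR":  # arithmetic shift right, sign bit replicated
        signed = value - (1 << 32) if value & 0x80000000 else value
        return (signed >> amount) & MASK32
    if kind == "ROR":  # rotate right
        return ((value >> amount) | (value << (32 - amount))) & MASK32
    raise ValueError("unknown shift kind: " + kind)

def shifted_index_address(base, index, kind, amount):
    """Address = base + shifted index; the shift sits on the critical path."""
    return (base + barrel_shift(index, kind, amount)) & MASK32
```

Compared with the four-way scale case, this is 32 possible amounts times four shift kinds, which is where the extra logic levels in front of the adder come from.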
The page table formats are kind of kooky. Most 32-bit architectures choose 4K pages as the minimal page size. A 4K L1 table and a 4K L2 table together translate all 20 bits you need. The page tables are a multiple of the page size, which is handy. It's so clean, it's pretty obviously the "right" thing to do.
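The arithmetic behind the "clean" layout can be sketched like this, assuming 4-byte page table entries (my assumption; it is what makes a 4K table hold 1024 entries, i.e. 10 index bits per level). The names are made up for the example.

```python
PAGE_SHIFT = 12                                       # 4K pages: 12 offset bits
ENTRY_SIZE = 4                                        # assumption: 4-byte entries
ENTRIES_PER_TABLE = (1 << PAGE_SHIFT) // ENTRY_SIZE   # 1024 entries per 4K table
INDEX_BITS = ENTRIES_PER_TABLE.bit_length() - 1       # 10 bits per level

def split_virtual_address(va):
    """Split a 32-bit virtual address into (L1 index, L2 index, page offset)."""
    offset = va & ((1 << PAGE_SHIFT) - 1)
    l2_index = (va >> PAGE_SHIFT) & (ENTRIES_PER_TABLE - 1)
    l1_index = (va >> (PAGE_SHIFT + INDEX_BITS)) & (ENTRIES_PER_TABLE - 1)
    return l1_index, l2_index, offset

l1, l2, off = split_virtual_address(0xC0101234)
```

Two levels of 10 bits plus a 12-bit offset cover the full 32-bit address, and every table is exactly one page.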
ARM has a 16KB L1 translation, because they used to support 1KB pages, but no longer do. They have strange attributes that move around the format, which makes it more difficult to manipulate the page table entries. They also have no free bits, which makes it a pain in Linux to keep information like how new or clean the page is.
I will say that the page tables are getting cleaner as they deprecate things like 1KB pages, but they're still pretty painful compared with other architectures.
The Alpha Architecture Handbook is a good read, and Alpha is my very favorite RISC. Not that it's magical, either, but it's a lot cleaner than ARM. And it's less than half the length of the ARM Architecture Reference Manual (ARM ARM, which I must admit is a clever acronym).
ARM has ARM mode, Thumb Mode, Jazelle Mode, and ThumbEE mode. FOUR instruction sets. Multiple different floating point unit specs that are incompatible with each other. Crazy page table formats. The architecture spec is over 2000 pages long, for pete's sake!
ARM has a more uniform encoding, but actually has a large number of instructions, and does crazy things like put a rotating shifter in the load address path. Not good from a modern pipeline perspective. You can get around it by breaking up the operation, but then you're getting into complex instruction decode like x86.
I'm not saying ARM is bad. I'm just saying they have no magic. You're right, Intel doesn't either (though they do have manufacturing and an army of engineers to do hand-layout). Nor does MIPS or PPC. But MIPS does make energy efficient cores, roughly as good as ARM. They haven't been as popular as ARM, but they're around.
And I'm certainly not saying x86 is great -- it's certainly not. I don't think it's quite as bad as people make it out to be...
Look, I wish the architecture made a difference. For one, we'd all probably be using Alpha. That was a great, elegant, beautiful processor architecture. For another, I'd have much better job prospects. But it doesn't matter that much. Scalar architectures are scalar architectures. Instruction set makes some difference, but not very much.
Yes, ARM marketing (notoriously overoptimistic) says they will have a 2GHz A9 in a 28nm, relatively high-performance process.
But A9, in terms of efficiency, is not substantially better than where Atom will be. That shouldn't be surprising. They're both scalar architectures. They both have a little less than 15 useful registers. They both have similarly deep pipelines. They both rely on branch prediction for performance. Neither company has magic, so it's not surprising that they're similar on the curve of performance / efficiency.
Put another way, your instruction encoding doesn't really buy you all that much.
Now ARM has some lower-end cores (ARM9, ARM11, Sparrow/Cortex-A5) that are much more energy efficient than Atom. But they're also much lower performance.
But this is how ARM's marketing plays it out: we have super-efficient cores (ARM9)! We have higher-performance cores (Theoretically, A9)! You think that ARM cores are somehow both high performance and much more efficient than Atom will be in the same technology... but this will probably turn out to be false.
Put another way... are MIPS or PowerPC cores dramatically more efficient than x86 at similar performance levels? No. They have most of the same architecture benefits that ARM does... more, in many ways, because they have about double the number of useful registers. But they're on basically the same efficiency/performance curve as everyone else.
You could probably do an x86 implementation that was similar to ARM11/A5... no floating point, no SSE, just the basic 386 instruction set. Give it a short pipeline and turn down the frequency, and it will probably compete relatively well on energy efficiency with those low-end ARMs.
The thing I DON'T understand... why does ARM marketing get an article on slashdot every week or so?
VLIW will be back soon enough
Sooner than you know.
"Your mother was a hamster, and your father smelt of elderberrys!" -- Monty Python and the Holy Grail