All the reasons you list could be "fixed in software".
The quotes around "software" mean that I am referring to the firmware/microcode as a piece of software designed to run on top of the actual execution units of a CPU.
No, they cannot. Or the software will be terribly slow, like a 2-10 times slowdown.
Slow: yes, indeed. But not impossible to do.
What matters are the differences in the semantics of the instructions.
X86 instructions update flags. This adds dependencies between instructions. Most RISC processors do not have flags at all.
This is the semantics of the instructions, and they differ between ISAs.
Yeah, I know pretty well that not all RISCs have flags.
Now, again, how does that prevent the microcode swap that dinkypoo refers to (and that was actually done on Transmeta's Crusoe)?
You'll just end up with a bigger, clunkier firmware that translates a given front-end instruction from the same ISA into a big bunch of back-end micro-ops.
Yup. A RISC's ALU won't update flags. But what's preventing the firmware from dispatching *SEVERAL* micro-ops: first one to do the base operation, and then additional ones to update some register emulating the flags?
Yes, it's slower. But no, that doesn't make a microcode-based change of the supported ISA impossible, only less efficient.
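To make the "several micro-ops" idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the micro-op names (add, sltu, seqz, srl, mov), the register names (tmp, cf, zf, sf) and the expand_add/run helpers are hypothetical and do not describe any real microcode. It only shows one flag-updating ADD being expanded into plain flag-less operations, with the flags kept in ordinary registers. The overflow flag is left out to keep it short.

```python
MASK32 = 0xFFFFFFFF

def expand_add(dst, src):
    """Hypothetical front-end: expand one x86-style 'ADD dst, src' into
    several flag-less micro-ops. cf/zf/sf are ordinary registers that
    stand in for the x86 carry, zero and sign flags."""
    return [
        ("add",  "tmp", dst, src),    # base operation, touches no flags
        ("sltu", "cf", "tmp", dst),   # carry: unsigned result wrapped below an operand
        ("seqz", "zf", "tmp"),        # zero flag: result == 0
        ("srl",  "sf", "tmp", 31),    # sign flag: top bit of the 32-bit result
        ("mov",  dst, "tmp"),         # commit the result
    ]

def run(micro_ops, regs):
    """Toy flag-less back-end: executes micro-ops on a dict of registers."""
    for op, d, *s in micro_ops:
        if op == "add":
            regs[d] = (regs[s[0]] + regs[s[1]]) & MASK32
        elif op == "sltu":
            regs[d] = int(regs[s[0]] < regs[s[1]])
        elif op == "seqz":
            regs[d] = int(regs[s[0]] == 0)
        elif op == "srl":
            regs[d] = regs[s[0]] >> s[1]
        elif op == "mov":
            regs[d] = regs[s[0]]
        else:
            raise ValueError(f"unknown micro-op: {op}")
    return regs

# One front-end instruction, five back-end micro-ops.
regs = run(expand_add("eax", "ebx"), {"eax": 0xFFFFFFFF, "ebx": 1})
print(regs["eax"], regs["cf"], regs["zf"], regs["sf"])  # 0 1 1 0
```

Five micro-ops instead of one is exactly the "slower, but not impossible" trade-off: the back-end never needs a flags register, it just does more work per front-end instruction.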
The backend, the micro-instructions in x86 CPUs are different than the instructions in RISC CPUs. They differ in the small details I tried to explain.
Yes, and please explain how that makes it *definitely impossible* to run x86 instructions, and not merely *somewhat slower*?
Intel did this, they added an x86 decoder to their first Itanium chips. {...} But the performance was still so terrible that nobody ever used it to run x86 code, and then they created a software translator that translated x86 code into Itanium code, and that was faster, though still too slow.
Slow, but still doable and done.
Now, keep in mind that:
- Itanium is a VLIW processor. That's an entirely different beast, with an entirely different approach to optimisation, and back during Itanium's development the logic was "the compiler will handle the optimising". But back then such a magical compiler didn't exist, and anyway it wouldn't have had the necessary information at compile time (some types of optimisation require information that is only available at run time, hence doable in microcode but not in a compiler).
Given the compilers available back then, VLIW sucked for almost anything except highly repetitive tasks. Thus it was a bit popular for cluster nodes running massively parallel algorithms (and at some point in time VLIW was also popular in Radeon GFX cards). But VLIW sucks for pretty much anything else.
(Remember that, for example, GCC has only recently gained auto-vectorisation and well-performing profile-guided optimisation.)
So "supporting an alternate x86 instruction set on Itanium was slow" has as much to do with "supporting an instruction set on a back-end that's not tailored for the front-end is slow" as it has to do with "Itanic sucks for pretty much everything which isn't a highly optimized kernel function in HPC".
But it still proves that running a different ISA on a completely alien back-end is doable.
The weirdness of the back-end won't prevent it, only slow it down.
Luckily, by the time Transmeta Crusoe arrived:
- knowledge of how to handle VLIW had advanced a bit; Crusoe had a back-end better tuned to run a CISC ISA
Then by the time Radeon arrived:
- compilers had gotten even better; GPUs are used for exactly the (only) class of tasks at which VLIW excels.
The backend of Crusoe was designed completely with x86 in mind, all the execution units contained the small quirks in a manner which made it easy to emulate x86 with it. The backend of Crusoe contains things like {...} All these were made to make binary translation from x86 easy and reasonably fast.
Yeah, sure, of course they are going to optimise the back-end for what the CPU is most likely to run.
- Itanium was designed to run mainly IA-64 code, with a bit of support for older IA-32 just in case, in order to offer a little backward compatibility to help the transition. It was never thought of as an IA-32 workhorse. Thus the back-end was designed to run mostly IA-64 (although even that wasn't stellar, due to the weirdness of the VLIW architecture). But that didn't prevent the front-end from accepting IA-32 instructions.
- Crusoe was designed to run mainly IA-32 code. Thus the back-end was designed to run IA-32 code a bit better.
BUT
To go back to what dinkypoo says: the back-end doesn't directly eat IA-32 instructions in any of the mentioned processors (neither recent Intel, nor AMD, nor Itanium, nor Crusoe, etc.). They all have back-ends that only consume the special micro-ops that the front-end feeds them after expanding the ISA (or the software front-end in the case of Crusoe, or the compiler on the CPU in the case of Radeon). That's true, and you haven't disproved it (of course).
Also, according to dinkypoo, you should be able to replace the front-end, without touching the back-end, and you will be able to get a chip that supports a different instruction set (because the actual execution units never see it directly).
The Crusoe is a chip that did exactly that (thanks to the fact that its front-end was pure software), and in fact HAD its microcode swapped, either as an in-lab experiment to run PowerPC instructions, or as a tool to help test x86_64 instructions.
So no, you're wrong: the claim that "you can't simply change the decoder" is false. YOU CAN simply change the decoder, it has been done.
Just don't expect stellar performance: that depends on how much the way your back-end works differs from the target instruction set (x86_64 on a Crusoe vs. IA-32 on an Itanium).
It might be slow, but it works. You're wrong, dinkypoo is right, accept it.
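A toy sketch of what "changing the decoder" means, in Python. This is only an illustration under loud assumptions: the two decoders handle made-up two-operand and three-operand mini-ISAs (not real x86 or PowerPC encodings), the micro-op format is invented, and all names (backend, frontend_two_operand, frontend_three_operand, run) are hypothetical. It does not describe how Crusoe's software front-end actually worked. The point is just that the back-end function is never touched when the front-end is swapped.

```python
def backend(micro_ops, regs):
    """Shared back-end: only understands 'li' and 'add' micro-ops."""
    for op, dst, a, b in micro_ops:
        if op == "li":
            regs[dst] = a                              # load immediate
        elif op == "add":
            regs[dst] = (regs[a] + regs[b]) & 0xFFFFFFFF
        else:
            raise ValueError(f"unknown micro-op: {op}")
    return regs

def frontend_two_operand(insn):
    """Toy CISC-like decoder: 'add dst, src' means dst = dst + src."""
    op, dst, src = insn
    if op == "mov_imm":
        return [("li", dst, src, None)]
    if op == "add":
        return [("add", dst, dst, src)]
    raise ValueError(f"unknown instruction: {op}")

def frontend_three_operand(insn):
    """Toy RISC-like decoder: 'add rd, ra, rb' means rd = ra + rb."""
    op, *args = insn
    if op == "li":
        rd, imm = args
        return [("li", rd, imm, None)]
    if op == "add":
        rd, ra, rb = args
        return [("add", rd, ra, rb)]
    raise ValueError(f"unknown instruction: {op}")

def run(frontend, program):
    """Decode each instruction with the chosen front-end, execute on the same back-end."""
    regs = {}
    for insn in program:
        backend(frontend(insn), regs)
    return regs

# Same back-end, two different instruction sets.
a = run(frontend_two_operand, [("mov_imm", "a", 2), ("mov_imm", "b", 3), ("add", "a", "b")])
b = run(frontend_three_operand, [("li", "r1", 2), ("li", "r2", 3), ("add", "r3", "r1", "r2")])
print(a["a"], b["r3"])  # 5 5
```

How well a given front-end maps onto the back-end decides how many micro-ops each instruction costs, which is where the performance difference comes from.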