This same myth keeps being repeated by people who don't really understand the details on how processors internally work.
Actually, YOU are wrong.
You cannot just change the decoder, the instruction set affect the internals a lot:
All the reason you list could all be "fixed in software".
No, they cannot. OR the software will be terible slow , like 2-10 times slowdown.
The fact that silicon designed by Intel handles opcode in a way a little bit better optimized toward being fed from a x86-compatible frontend is just specific optimisation.
Opcodes are irrelevant. They are easy to translate. What matters are the differences in the semantics of the instructions.
X86 instructions update flags. This adds dependencies between instructions. Most RISC processoers do not have flags at all.
This is semantics of instructions, and they differ between ISA's.
Simply doing the same stuff with another RISCy back-end, i.e: interpreting the same ISA fed to the front-end, will simply require each x86 ISA being executed as a different set of micro-instructions. (some that are handled as single ALU opcode on Intel's silicon might require a few more instruction, but that's about the different).
The backend, the micro-instrucions in x86 CPUs are different than the instructions in RISC CPU's. They differ in the small details I tried to explain.
You could switch the frontend and speak a completely different instruction set. Simply if the two ISA are radically different, the result wouldn't be as efficient as a chip designed with that ISA in mind. (You would need a much bigger and less efficient microcode, because of all the reasons you list. They won't STOP intel from making a chip that speaks something else.
Intel did this, they added x86 decoder to their first itanium chips. And. They did not only add the frontend, they added some small pieces to their backend so that it could handle those strange x86 semantic cases nicely.
But the perfromance was still so terrible that nobody ever used it to run x86 code, and then they created a software translator that translated x86 code into itanium code, and that was faster, though still too slow.
Not only is this possible, but this was INDEED done.
There was an entire company called "Transmeta" whose business was centered around exactly that:
Their chip, the "Crusoe" was compatible with x86.
- But their chip was actually a VLIW chips, with the front-end being 100% pure software. Absolutely as remote from a pure x86 core as possible.'
The backend of Crusoe was designed completely x86 on mind, all the execution units contained the small quirks in a manner which made it easy to emulate x86 with it. The backend of Crusoe contains things like:
* 80-bit FPU,
* x86-compatible virtual memory page table format(one very important thing I forgot from my original list couple of posts ago; Memory accesses get VERY SLOW if you have to emulate virtual memory)
* support for partial register writes(to emulate 8- and 16-bit subregisters like al, ah,ax )
All these were made to make binary translation from x86 easy and reasonable fast.