Yet that makes what happened even more strange. A long-touted advantage of RISC was that, because of its simplicity, it could be clocked so much faster than CISC that doing less per instruction would still yield higher net throughput. What happened instead was that CISC (in the hands of Intel) matched, and even outdid, all the optimizations of RISC, including clock speed.
As you say, the key advantage of RISC is simplicity and speed, but the tradeoff is that software needs to be more complex to work around the simplified instruction set. Intel recognised early the risk RISC posed to their business, particularly noting that there would be a once-off cost to develop the microcode that would enable the switch to RISC, after which their x86 advantage would be lost.
Cleverly, instead of trying to fight the RISC upstarts, Intel chose to develop that microcode themselves and bake it into hardware. They first implemented the decoder in their P6 architecture, which presented the raw x86 instruction set on the surface (lots of complex instructions), but under the hood it was all RISC, with the decoder replacing those complex instructions with sequences of simpler micro-operations.
So an x86 CPU works by having a quick, heavy-duty decoder in the frontend, which takes x86 instructions and converts them into an optimized internal format (micro-ops, or µops) for the backend to process.
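To make that concrete, here's a toy sketch of the idea. The instruction chosen, the µop format, and the decode rule are all made up for illustration; real x86 decoders and actual Intel µop encodings are far more involved. It shows the classic textbook case: a CISC read-modify-write instruction like `add [rax], rbx` cracking into three RISC-like micro-ops.

```c
#include <stdio.h>

/* Toy illustration only -- the uop format and decode rule below are
 * invented for this sketch, not real Intel microcode or uop encodings. */

typedef enum { UOP_LOAD, UOP_ADD, UOP_STORE } uop_kind;

typedef struct {
    uop_kind kind;
    const char *dst, *src;  /* register or memory operand, as text */
} uop;

/* Decode a CISC-style "add [mem], reg" (read-modify-write on memory)
 * into three simple micro-ops: load, add, store. */
static int decode_add_mem_reg(const char *mem, const char *reg,
                              uop out[], int max) {
    if (max < 3) return 0;
    out[0] = (uop){ UOP_LOAD,  "tmp", mem };   /* tmp   <- [mem]     */
    out[1] = (uop){ UOP_ADD,   "tmp", reg };   /* tmp   <- tmp + reg */
    out[2] = (uop){ UOP_STORE, mem,   "tmp" }; /* [mem] <- tmp       */
    return 3;
}

int main(void) {
    static const char *names[] = { "load", "add", "store" };
    uop uops[3];
    /* One x86-style instruction in, three RISC-like micro-ops out. */
    int n = decode_add_mem_reg("[rax]", "rbx", uops, 3);
    printf("add [rax], rbx decodes to %d uops:\n", n);
    for (int i = 0; i < n; i++)
        printf("  %-5s %s, %s\n", names[uops[i].kind],
               uops[i].dst, uops[i].src);
    return 0;
}
```

Once the decoder has cracked the instruction like this, the backend is free to schedule those three µops independently, which is exactly the decoupling described next.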
What Intel has done is settle on a fixed, stable CISC instruction format for the frontend, and a decoupled RISC backend they can tweak and modify to their heart's content without fear of losing compatibility. It's not quite the perfect solution, but on today's huge, complex CPUs the decoder is a relatively small part of the silicon.