That is more or less accurate. The stated goal of the original RISC work was a Reduced Instruction Set Computer, but what was actually produced was a Reduced Instruction Set Complexity CPU. By restricting memory access to loads and stores alone, every other instruction that could execute in one clock would always execute in one clock. Whereas some CISC instructions involving arrays could kick off 10+ memory touches as a side effect, RISC instructions could never do that (exceptions aside). So when all 10 of those memory touches weren't required, compilers for a RISC could optimize away the unnecessary ones (which was a bitch in 1990, but commonplace by 2000 and exceedingly trivial by 2010, to put it roughly).
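To make that concrete, here is a rough C sketch of my own (purely illustrative; the function and the compiler behavior described in the comments are my example, not anything measured):

    /* Illustrative only: how a load/store machine breaks the work into
     * explicit, single-cycle pieces. */
    void scale_all(long *v, long n, long scale)
    {
        for (long i = 0; i < n; i++) {
            /* This line becomes one load of v[i], one multiply, and one
             * store of v[i]: three separate one-cycle instructions.  The
             * pointer, the counter, and scale all stay in registers, so
             * they never touch memory; and because every memory touch is
             * its own instruction, any unnecessary one can simply be
             * omitted. */
            v[i] = v[i] * scale;
        }
    }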
I taught CISC architectures (mostly 68K) and was a minor architect for PowerPC (I helped work on the early EABI, the embedded application binary interface).
But this leads to a problem: cache. That CISC operation that made 10 memory touches took roughly 10-18 bytes of instruction storage (68K example), plus 10 data cache accesses that would either hit or miss. The equivalent sequence on a RISC with 16-bit instructions would take 22 bytes (and didn't double the number of useful registers available), while one with 32-bit instructions would take 44 bytes (but generally did double the number of useful registers, reducing the need for so many loads and stores). Thank goodness the instruction pipeline took fewer transistors to implement, because you needed them all back to make the Icache bigger! The hope was that, with more registers, those 10 memory touches were rarely needed and you could cut back on other loads elsewhere (we didn't get really good at doing that automatically until the late '90s, by which time we could show that the RISC penalty was effectively negated; the specific numbers remain the property of my name-changed employer, but the differences were down to single-digit percentages). The Dcache would see the same hits and misses, unless you could also spend some of the saved transistors on a bigger Dcache, which might improve hit rates by a few percentage points.
But with complicated instructions come pipeline clocking challenges. Implementing the entire x86 pipeline in 5 stages would leave you with a sub-200 MHz part today; the P4 push toward 4 GHz required up to 19 stages (and who knows how many designers) in the worst case, IIRC! Meanwhile, most RISC architectures zoom along happily with 5-7 stages, and only manufacturing nodes or target design decisions keep them from clocking up to x86 frequencies.
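Rough arithmetic, just for intuition (my numbers, not anyone's real timing data): if fetch-through-writeback amounts to roughly 5 ns of logic, splitting it into 5 stages gives about 1 ns per stage, so on the order of 1 GHz; chopping the same work into roughly 20 stages pushes each stage toward 0.25 ns and the clock toward 4 GHz, but every added stage also adds latch overhead and stretches the branch misprediction penalty, which is the price the P4 paid for its clock rate.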
Hands down, it was never any 'benefits' of CISC (or, specifically, the x86 architecture) that allowed Intel to take the field; it was market forces and manufacturing might. A win is a win.
BTW, to the AC GP: even if an instruction appears complex (most SIMD operations, MADDs, FPSQRTRES, etc.), it still counts as RISC if it can either execute in one clock or at least be pipelined to deliver nominally one result per clock, provided it doesn't impact the pipeline for all the other commonly executed instructions. After all, we could make a divide instruction execute in 1 clock, too, as long as you don't mind your add instructions taking 16x longer (though still one clock), but that is cheating.
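A quick worked example of what 'nominally one result per clock' buys (the latency number is made up for illustration): a fully pipelined MADD with, say, a 4-cycle latency still retires one result every clock once the pipe is full, so 1000 independent MADDs finish in roughly 1003 cycles rather than 4000, and the plain adds issuing alongside them are unaffected. That throughput test, not how scary the instruction looks in the manual, is what decides whether it belongs in a RISC.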