One of the biggest things that prevent a superscalar out-of-order processor from being fully utilized is data dependencies between instructions. For instance, assume you have a 4-way out-of-order processor and want to calculate:
A = B*C + D*E
There are three math operations in there, and you can do up to 4 at a time, so you ought to be able to issue all three operations at the same time, right? And the answer, of course, is "nope", because you can't issue the addition until the results from the two multiplications are available.
So, let's assume you have an idealized 4-way superscalar processor with out-of-order execution, where every instruction takes just 1 clock cycle. But the average code the processor sees can execute at least 1 instruction 100% of the time, at least 2 instructions 50% of the time, at least 3 instructions 25% of the time, at least 4 instructions 12.5% of the time, and so forth, halving the probability for each additional instruction. So, on average, how many instructions per clock cycle will this processor handle?
Doing the math (1 + 0.5 + 0.25 + 0.125), you'll see the answer is 1.875 instructions per clock cycle. In other words, less than half the theoretical capacity of the chip. Now, there is some specialized code, such as matrix multiplication, that can saturate all 4 execution units. But the average code has that 100/50/25/12.5/... pattern.
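As a quick sanity check on that 1.875 figure, here's a small sketch that sums the probabilities above. The distribution (at least n instructions available with probability 2^-(n-1)) is the halving pattern from the text; the function name is just for illustration:

```python
# Expected IPC of a W-wide machine where the code can issue at least n
# independent instructions with probability 2**-(n-1) (the 100/50/25/... pattern).
# E[IPC] = sum over n = 1..W of P(at least n instructions available).

def expected_ipc(width):
    return sum(0.5 ** (n - 1) for n in range(1, width + 1))

print(expected_ipc(4))  # 1.875
```

This works because for a nonnegative integer random variable, the expectation equals the sum of the tail probabilities P(X >= n).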
Now, assume you want a faster processor. Adding the ability to execute 5 instead of just 4 instructions at a time would speed things up, but only slightly: the extra issue slot is usable just 6.25% of the time, so the average goes from 1.875 to 1.9375 instructions per clock, and we waste an even larger fraction of the chip's potential. No matter how wide you make the machine, this workload can never average more than 2 instructions per clock.
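The diminishing returns have a closed form under the same halving pattern: the tail probabilities are a geometric series, so E[IPC] = 2 - 2^-(W-1) for issue width W, which approaches 2 as the machine gets wider. A brief sketch (function name is illustrative):

```python
# Expected IPC as issue width grows, for the halving ILP pattern.
# The tail sum is geometric: E[IPC] = 2 - 2**-(width - 1).

def expected_ipc(width):
    return 2 - 0.5 ** (width - 1)

for w in (4, 5, 6, 8, 16):
    print(w, expected_ipc(w))
```

Width 4 gives 1.875, width 5 gives 1.9375, and the values crawl toward 2 without ever reaching it.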
But if you add a completely separate set of registers and execute a completely independent thread, the two threads don't share any dependencies with each other, and each cycle the 4 issue slots can be filled from both threads' ready instructions. With the same code base, we get on average 3.25 instructions per clock, or 1.625 instructions per clock per thread. So, by going to hyperthreading, we increase the system performance from 1.875 instructions per clock to 3.25 instructions per clock (a 73% increase), but decrease the per-thread performance, since the two threads are competing with each other for execution resources.
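That 3.25 figure can be reproduced exactly by enumerating the two threads' independent ILP distributions and capping the combined issue at 4. This is a sketch under the assumption that the core issues min(X1 + X2, 4) each cycle, where each thread's available ILP follows the halving pattern (clamping each thread at 4 doesn't change the result, since the cap already binds):

```python
# Exact expected IPC for two independent threads sharing a 4-wide core.
# Per-thread available ILP, clamped at the issue width:
# P(1) = 0.5, P(2) = 0.25, P(3) = 0.125, P(4 or more) = 0.125.
CAP = 4
p = {1: 0.5, 2: 0.25, 3: 0.125, 4: 0.125}

# Each cycle the core issues min(X1 + X2, CAP) instructions.
two_thread_ipc = sum(p[a] * p[b] * min(a + b, CAP) for a in p for b in p)
print(two_thread_ipc)  # 3.25
```

Every probability here is a power of two, so the floating-point arithmetic is exact, and 3.25 / 1.875 ≈ 1.73 gives the 73% increase mentioned above.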
So, yes, dropping hyperthreading support will increase single-thread performance. It will also lower power requirements, since without a second thread to fill the gaps, data dependencies between instructions leave more of the chip idle. That seems a reasonable compromise, since a laptop isn't a server handling multiple high-computation tasks simultaneously.