Exponential compute resource does not mean exponentially more capable.
In fact, capability is more logarithmic, just obscene investments in resource produce marginally better results. The biggest gain is how long the models can keep up generating tokens in a coherent looking way, but the quality of the product represented by those tokens remains flawed in now familiar ways. Most painfully the flaw are non obvious.
Gave claude opus 4.6 a cakewalk of a task today, maybe 100 lines of code to generate. The output was a failure, "passing" test cases by adding clauses to swallow errors instead of fixing the mistakes, like invoking methods and resources that didn't exist. The output was salvageable insofar that it did get some tedious plumbing roughly right, apart from swallowing all errors and I had to fix that and rip out the upper layer of stuff and do it myself. But all this was, in the scheme of stuff I do, a super easy task. Usually I wouldn't trust it even this far, but it was so mind numbingly easy I thought it might just do it and save me tedium.
Tech execs in the midst of a hype cycle are always just insufferable.