Complex puzzles require deep reasoning.
True in spirit, but misleading in implication. "Deep reasoning" isn’t synonymous with explicit, stepwise logic. Much of human problem-solving relies on heuristics, pattern recognition, and compressed experience. We often simulate solutions rather than derive them. The complexity of a puzzle doesn’t necessarily demand conscious logic—it demands a good internal model that can make the right inferences efficiently, which is a broader and deeper capability than just reasoning.
As humans, we are programmed to use our brains and multi-paradigm experience to quickly trim down the decision tree of obviously-wrong solutions.
That’s not how cognition works. There is a wall of roughly 500 ms between reality and our perception of it: that’s about how long it takes for a photon striking the retina, or a vibration hitting the tympanic membrane, to be transduced into neural signals, processed through the thalamus and primary sensory cortices, and integrated into our conscious perception of the world. This gap is real and uncontroversial, and any theory of cognition has to account for it. Yours does not.
Our brains do not perceive reality in real time. Instead, they maintain a model of reality, generate predictions from that model, and update it when incoming sensory signals contradict those predictions. Our brains don’t wait for options to trim; they’re constantly generating predictions about what we’ll see, feel, and do next. For example, when you reach for your coffee cup, you don’t scan a decision tree of possible cups or grasp points. Your brain already predicts where the cup is, how it feels, and how your hand will move. Action flows from a continuous simulation, not from post hoc evaluation. You “just do it” because the model is already in place.
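To make the predict-then-correct idea concrete, here is a minimal sketch in Python. It is a toy alpha-beta-style tracking loop; the lag, gains, and cup trajectory are all invented for illustration, and nothing here is a claim about how neurons implement prediction.

```python
import random

LAG = 5                   # observations arrive 5 ticks late (stand-in for sensory delay)
ALPHA, BETA = 0.5, 0.1    # correction gains for position and velocity

def track(observations):
    pos, vel = observations[0], 0.0   # internal model of the *delayed* state
    now_estimates = []
    for obs in observations:
        # 1. Predict: advance the internal model one tick before any input is used.
        pos += vel
        # 2. Correct: fold the prediction error from the stale observation back in.
        error = obs - pos
        pos += ALPHA * error
        vel += BETA * error
        # 3. Extrapolate: project the model across the sensory lag, so behaviour
        #    tracks where the cup is now, not where it was when the light left it.
        now_estimates.append(pos + LAG * vel)
    return now_estimates

# A cup sliding across the table at constant speed; we only ever receive old, noisy samples.
true_now = [0.1 * t for t in range(60)]
stale_obs = [true_now[max(t - LAG, 0)] + random.gauss(0, 0.02) for t in range(60)]

estimates = track(stale_obs)
print(f"true position now: {true_now[-1]:.2f}  model's estimate: {estimates[-1]:.2f}")
```

The point of the sketch is that the loop never "sees" the present; it acts on a forward projection of its own model, and the stale observations only serve to correct that model.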
As we go down the complexity depth, we prune more silly solutions and just refine the end outcome; we become better at homing in on the solution.
This is a neat narrative, but not supported by cognitive science. Humans actually get worse at solving deeply complex problems unless they can offload structure to external tools (math, diagrams, language). We aren’t exhaustive tree-searchers—we’re satisficers, pattern matchers, and model-builders. Pruning silly options, as you put it, makes sense after you've internalized the structure of a domain, not as a general-purpose heuristic for all complex tasks.
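A toy contrast may help. The route-planning problem, the cities, and the aspiration threshold below are all invented for illustration; the only real idea borrowed is Simon's sense of satisficing: accept the first "good enough" answer, and only pay for exhaustive search when forced to.

```python
import itertools

# Invented toy problem: visit five points starting from "A" with a short total path.
CITIES = {"A": (0, 0), "B": (2, 1), "C": (3, 4), "D": (5, 2), "E": (6, 6)}

def route_length(route):
    """Total Euclidean length of visiting the points in the given order."""
    return sum(((CITIES[a][0] - CITIES[b][0]) ** 2 +
                (CITIES[a][1] - CITIES[b][1]) ** 2) ** 0.5
               for a, b in zip(route, route[1:]))

def exhaustive(start="A"):
    """Score every permutation: guaranteed optimal, factorial cost."""
    rest = [c for c in CITIES if c != start]
    return min(((start,) + p for p in itertools.permutations(rest)), key=route_length)

def satisfice(start="A", good_enough=13.0):
    """Greedy nearest-neighbour sweep, accepted if it clears an aspiration level.

    Satisficing in Simon's sense: stop at "good enough" rather than optimise,
    and fall back to the expensive search only when the cheap heuristic fails.
    """
    route = [start]
    while len(route) < len(CITIES):
        nxt = min((c for c in CITIES if c not in route),
                  key=lambda c: route_length([route[-1], c]))
        route.append(nxt)
    return route if route_length(route) <= good_enough else list(exhaustive(start))

best, quick = exhaustive(), satisfice()
print("exhaustive:", best, round(route_length(best), 2))    # scores all 24 permutations
print("satisficed:", quick, round(route_length(quick), 2))  # a handful of local comparisons
```

The satisficer's route comes out slightly longer, but it arrives after a few local comparisons instead of scoring every permutation; that trade-off, not exhaustive pruning, is what human problem-solving tends to look like.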
AI models are different in this regard. They are just statistical probability machines.
This assertion is true at the lowest level of description, and like all reductionist arguments against AGI, it misses emergence. Modern language models are trained as probabilistic predictors, yes—but what emerges from that process are latent internal representations that encode abstract relationships, causal inference patterns, and even planning behaviors. Saying they’re just statistical is like saying the brain is just a pile of neurons firing: true in a reductionist sense, but profoundly uninformative.
The greater the complexity depth, the more variables they need to consider in the equation, and without actual intelligence and perception of the problem, they are fundamentally unable to accurately and efficiently discriminate against obviously wrong solutions;
Hmmm. Did you read the paper? The paper shows that even advanced LLMs fail at deeper compositional problems not because they can’t process more variables, but because their reasoning effort actually decreases as complexity increases. That suggests a misalignment between internal representations and inference strategies—not a hard ceiling on intelligence. A good analogy is a student taking a math test. On familiar problems, they work step-by-step and usually get it right. But when the problem is longer or phrased differently—something novel—they sometimes give less effort, not more. They rush, guess, or bail out, even with time left on the clock. Not because they’re incapable, but because their usual strategy doesn’t apply, and they haven’t internalized how to adapt. The paper shows that something similar is emerging in LLM behavior. When tasks get harder, they don’t dig deeper—they think less. It’s not a failure of compute; it’s a failure of alignment between what the model knows and how it applies that knowledge under strain.
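To make "thinking less" operational, here is a rough sketch of the kind of harness that surfaces it. Everything model-facing is hypothetical: `ask_model`, the `.trace` and `.answer` fields, and the move-count check are stand-ins I am assuming for illustration, not the paper's actual evaluation code.

```python
from statistics import mean

def hanoi_prompt(n_disks: int) -> str:
    return (f"Solve the Tower of Hanoi with {n_disks} disks. "
            "List every move, then state the total number of moves.")

def reasoning_effort(trace: str) -> int:
    # Crude proxy for effort: how many whitespace-separated tokens the model
    # spent "thinking" before committing to an answer.
    return len(trace.split())

def measure(ask_model, max_disks=10, samples=5):
    """Average effort and accuracy per complexity level.

    `ask_model` is assumed to return an object with a `.trace` (reasoning text)
    and an `.answer` (reported move count); checking the count against 2**n - 1
    is a simplification of validating the full move sequence.
    """
    results = {}
    for n in range(3, max_disks + 1):
        runs = [ask_model(hanoi_prompt(n)) for _ in range(samples)]
        results[n] = {
            "effort": mean(reasoning_effort(r.trace) for r in runs),
            "accuracy": mean(1.0 if r.answer == 2 ** n - 1 else 0.0 for r in runs),
        }
    return results

# The telling signature is not low accuracy at high n (that much is expected),
# but effort peaking and then shrinking while the token budget is far from exhausted.
```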
paralysed and requiring more and more computational power with no guarantee of a good outcome.
Paralysis implies indecision; these models don’t dither—they confidently return incorrect answers, just like the math student above who circles “C” and moves on. That’s arguably worse than hesitation, because it hides failure behind fluency. And yes, deeper problems demand more compute—but that’s just as true for humans (we’re just better at concealing when we’re lost). Importantly, scaling does yield gains—until the model’s internal representations can no longer scaffold the task. That doesn’t mean AGI is doomed. It means we need architectures that can simulate structure, not just generate sequences—and we need to measure emergent behaviors, not just final answers.
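One way to "measure emergent behaviors, not just final answers" is to score the process: replay the model's stated moves and report where the trace first breaks, rather than grading only the end result. The (src, dst) move format and the helper below are my own illustration, staying with the Tower-of-Hanoi framing from above.

```python
def first_invalid_move(n_disks, moves):
    """Replay (src, dst) peg moves; return the index of the first illegal move,
    len(moves) if every move is legal but the puzzle is left unsolved, or None
    if the trace is fully valid and ends with the puzzle solved."""
    pegs = [list(range(n_disks, 0, -1)), [], []]   # peg 0 holds disks n..1, largest at bottom
    for i, (src, dst) in enumerate(moves):
        if not pegs[src]:
            return i                               # moving from an empty peg
        if pegs[dst] and pegs[dst][-1] < pegs[src][-1]:
            return i                               # larger disk placed on a smaller one
        pegs[dst].append(pegs[src].pop())
    return None if len(pegs[2]) == n_disks else len(moves)

good = [(0, 2), (0, 1), (2, 1), (0, 2), (1, 0), (1, 2), (0, 2)]
bad = [(0, 2), (0, 1), (1, 2), (0, 2)]   # third move stacks disk 2 on disk 1

print(first_invalid_move(3, good))  # None: every move legal, puzzle solved
print(first_invalid_move(3, bad))   # 2: the trace reads fluently right up to the move that breaks it
```

A confidently wrong trace can score zero on the final answer yet be locally valid for most of its length; scoring the process makes that visible instead of letting fluency hide it.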