Language does not exist in a vacuum. It is a result of the thought processes that create it. To create language, particularly about complex topics, you have to be able to recreate the logic, or at least *a* logic, that underlies those topics. You cannot build a LLM from a Markov model. If you could store one state transition probability per unit of Planck space, a different one at every unit of Planck time, across the entire universe, throughout the entire history of the universe, you could only represent the state transition probabilities for the first half of the first sentence of A Tale of Two Cities.
For LLMs to function, they have to "think", for some definition of thinking. You can debate over terminology, or how closely it matches our thinking, but what it's not doing is some sort of "the most recent states were X, so let's look up some statistical probability Y". Statistics doesn't even enter the system until the final softmax, and even then, only because you have to go from a high dimensional (latent) space down to a low-dimensional (linguistic) space, so you have to "round" your position to nearby tokens, and there's often many tokens nearby. It turns out that you get the best results if you add some noise into your roundings (indeed, biological neural networks are *extremely* noisy as well)
As for this article, it's just silly. It's a rant based on a single cherry picked contrarian paper from 2024, and he doesn't even represent it right. The paper's core premise is that intelligence is not lingistic - and we've known that for a long time. But LLMs don't operate on language. They operate on a latent space, and are entirely indifferent as to what modality feeds into and out from that latent space. The author takes the paper's further argument that LLMs do not operate in the same way as a human brain, and hallucinates that to "LLMs can't think". He goes from "not the same" to "literally nothing at all". Also, the end of the article isn't about science at all, it's an argument Riley makes from the work of two philosophers, and is a massive fallacy that not only misunderstands LLMs, but the brain as well (*you* are a next-everything prediction engine; to claim that being a predictive engine means you can't invent is to claim that humans cannot invent). And furthermore, that's Riley's own synthesis, not even a claim by his cited philosophers.
For anyone who cares about the (single, cherry-picked, old) Fedorenko paper, the argument is: language contains an "imprint" of reasoning, but not the full reasoning process, that it's a lower-dimensional space than the reasoning itself (nothing controversial there with regards to modern science). Fedorenko argues that this implies that the models don't build up a deeper structure of the underlying logic but only the surface logic, which is a far weaker argument. If the text leads "The odds of a national of Ghana conducting a terrorist attack in Ireland over the next 20 years are approximately...." and it is to continue with a percentage, that's not "surface logic" that the model needs to be able to perform well at the task. It's not just "what's the most likely word to come after 'approximately'". Fedorenko then extrapolates his reasoning to conclude that there will be a "cliff of novelty". But this isn't actually supported by the data; novelty metrics continue to rise, with no sign of his suppossed "cliff". Fedorenko argues notes that in many tasks, the surface logic between the model and a human will be identical and indistinguishable - but he expects that to generally fail with deeper tasks of greater complexity. He thinks that LLMs need to change architecture and combine "language models" with a "reasoning model" (ignoring that the language models *are* reasoning - heck, even under his own argument - and that LLMs have crushed the performance of formal symbolic reasoning engines, whose rigidity makes them too inflexible to deal with the real world)
But again, Riley doesn't just take Fedorenko at face value, but he runs even further with it. Fedorenko argues that you can actually get quite far just by modeling language. Riley by contrast argues - or should I say, next-word predicts with his human brain - that because LLMs are just predicting tokens, they are a "Large Language Mistake" and the bubble will burst. The latter does not follow from the former. Fedorenko's argument is actually that LLMs can substitute for humans in many things - just not everything.