What you're describing is the appearance of reasoning, which is not the same as actually reasoning. Joseph Weizenbaum's ELIZA program gave the appearance of understanding and empathizing with the user. That illusion was so convincing that even people who understood how the program worked were taken in, a fact that Weizenbaum found disturbing.
You're going to have to define "reasoning" if you want to make that argument. Otherwise it's a no true Scotsman fallacy.
This was easier to see with earlier models, where it took very little effort to show that the system was just producing text that looked like reasoning, not actually reasoning. For example, while the model would initially appear to be able to solve river-crossing puzzles, it would fail in amusing ways if you made small changes to the problem. Something as simple as changing the order of the items or the kinds of items would result in silly things like the risk of the cabbage eating the wolf or leaving the goat alone with the cabbage to spare the wolf. While newer models seem better, it's important to remember that nothing fundamental has changed.
The newer models, operating agentically, can generate code to solve the river-crossing puzzle, and that code is no longer vulnerable to more complex inputs or to reordering the items. Moreover, an example of an LLM making a mistake is not a good counterargument against intelligence. Humans make mistakes all the time. Given enough time and effort, I'm sure you could find a human who makes the exact same mistake as the LLM.
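To make that concrete, here's a minimal sketch of the kind of code an agent might emit for this puzzle (the names and structure here are my own illustration, not any particular model's output). It searches the state space directly, so reordering the items or swapping in a different set of conflicts doesn't trip it up:

```python
from collections import deque

def solve(items, conflicts):
    """Breadth-first search for a generic river-crossing puzzle.
    A state is the frozenset of things on the left bank ("farmer" included).
    `conflicts` lists pairs that may never be left together without the farmer."""
    everything = frozenset(items) | {"farmer"}

    def safe(state):
        # Neither bank may hold a conflicting pair unless the farmer is there too.
        for bank in (state, everything - state):
            if "farmer" not in bank and any(frozenset(p) <= bank for p in conflicts):
                return False
        return True

    queue = deque([(everything, [])])   # start: everyone on the left, no moves yet
    seen = {everything}
    while queue:
        state, path = queue.popleft()
        if not state:                   # left bank empty: everyone has crossed
            return path
        side = state if "farmer" in state else everything - state
        for cargo in [None] + [x for x in side if x != "farmer"]:
            moved = {"farmer"} | ({cargo} if cargo else set())
            new = frozenset(state - moved if "farmer" in state else state | moved)
            if new not in seen and safe(new):
                seen.add(new)
                queue.append((new, path + [cargo]))
    return None  # unsolvable variant

plan = solve(["wolf", "goat", "cabbage"],
             [("wolf", "goat"), ("goat", "cabbage")])
```

Because the constraints are data rather than memorized patterns, permuting the item list or changing which pairs conflict just produces a different (still correct) plan.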
This really should come as no surprise, since we designed these things to operate on statistical relationships between tokens, not on facts and concepts. They really do produce text one token at a time, functionally retaining no internal state between tokens. That is, they have no mechanism for planning a response beyond the current token. If that weren't enough, the model proper doesn't even select the next token; it only produces next-token probabilities from which the next token is ultimately selected. (Imagine trying to write a response when all you can do is roll the dice on a set of probable next words!) While they give the appearance of producing a well-considered, holistic response to your prompt, such a thing is very clearly not possible.
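Here's a toy sketch of that generation loop, so you can see the separation between the model and the dice roll (the "model" and its probabilities are hard-coded stand-ins for illustration, not a real LM):

```python
import random

def fake_model(context):
    # A real model computes this distribution from billions of parameters
    # conditioned on the context; here we just hard-code a tiny one.
    return {"cat": 0.5, "dog": 0.3, "fish": 0.2}

def sample_next_token(probs):
    # The sampling step lives OUTSIDE the model: roll the dice on the
    # probabilities the model handed back.
    r = random.random()
    cumulative = 0.0
    for token, p in probs.items():
        cumulative += p
        if r < cumulative:
            return token
    return token  # fallback for floating-point rounding at the top end

# Generation is just this loop: model -> probabilities -> dice roll -> append.
context = ["the"]
for _ in range(3):
    context.append(sample_next_token(fake_model(context)))
```

Note that the model function is pure: each call sees only the context tokens, with no hidden state carried between calls.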
Tokens represent facts and concepts. If something is able to operate on those, even statistically, then it is thinking in some sense. An agent built on top of an LLM can also leverage the reasoning capabilities of programming, so it is no longer simply selecting from statistical probabilities.
That's why I said that a quick look "behind the curtain" at how LLMs function should be more than enough to dispel the notion that anything remotely like factual reasoning is happening. Like Weizenbaum, I find it disturbing that people cling to those mistaken beliefs even when they should know better. The illusion is compelling, sure, but we know that it's just an illusion.
If you look "behind the curtain" at the human brain, you might come to the same conclusion. After all, your brain operates on a purely physical level. All of its neurons are subject to the laws of quantum mechanics. There is nothing particularly intelligent about that. However, when you put them all together, we have what people call intelligence. That's emergence: a complex system exhibits properties that its constituent parts do not have on their own.
It's true that the neurons in our brain operate under a different model than probabilistic inference. However, it's not at all obvious to me that emergence can only occur in one of these and not the other.