Good point. Obviously, a counter example is very simple in "proof structure", especially as it does not need to tell you anything about what an optimal result would look like.
I guess Mathematicians will continue to have good job opportunities after all.
As to what that LLM does, there are a lot of non-statistical tools it could be using. Obviously, if they tell us, say, "the LLM handed 100'000 possible counterexamples to Wolfram Alpha and Wolfram Alpha picked the single one that was not nonsense", that kind of would destroy the picture they are trying to paint of their product. On a related topic, I am beginning to suspect that Claude Mythic may be running very conventional tools to help it find bugs in code.