"write a six-word story about baby shoes," riffing on the famous (if apocryphal) Hemingway story. They failed but not in the way I expected. Bard gave me five words, and ChatGPT produced eight"
I dabbled with ChatGPT and first asked if it knew what a drabble was. It responded yes: a story of exactly 100 words.
However, when I then asked it to produce a drabble on "something vaguely scientific", it gave me anywhere from 85 to 130 words over multiple attempts. It never hit the magic 100, despite "exactly 100 words" being part of the core definition.
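Checking the count is trivial to do mechanically, which is what makes the failure striking. A minimal sketch (the function names are my own, and it uses the simplest rule of splitting on whitespace, which may differ slightly from how a human would count hyphenated words):

```python
def word_count(text: str) -> int:
    """Count whitespace-separated words -- the simplest reasonable rule."""
    return len(text.split())

def is_drabble(text: str) -> bool:
    """A drabble is a story of exactly 100 words."""
    return word_count(text) == 100

# The (apocryphal) Hemingway six-word story is easy to check:
story = "For sale: baby shoes, never worn."
print(word_count(story))   # 6
print(is_drabble(story))   # False
```

Any of the model's attempts could be verified the same way, and none of mine passed.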
Thinking about it, the best way to encourage random ideas in a group of people is to have one person spouting random stuff to a room of listeners. The listeners will then either rebut the original statement, or will go "Hmmm. Not thought of that. Sounds interesting". Either way, the LLM will be tuned either for improved accuracy (discounting the provably incorrect) or for better "insight" (keeping the random discoveries that actually work, like bacon and maple syrup, rather than the ones that don't, like bacon and wallpaper paste).
So perhaps the inability to do basic counting is a feature, not a bug, in that it is designed to provoke further thought. Which is fine when it is used among a bunch of fellow researchers, but makes for errors when the output is simply taken at face value.