But what if you had a similarly loose platform but it's running a kiosk and that kiosk software is purportedly designed to keep the user on acceptable rails.
There is a lot of leverage done by the "similarly".
Apple's computers run on 6502.
This was an insanely popular architecture. It's been used in metric shit tons of other hardware from roughly that era. There are insane amounts of resource about this architecture. It was usually programmed in assembly. There has been a lot of patching of binaries back then. These CPUs have also been used in courses and training for a very long time, most of which are easy to come by. So there's an insane amount of material about 6502 instructions , their binary encoding, and general debugging of software on that platform that could be gobbled by the training of the model. The architecture is also extremely simple and straightforward with very little weirdness. It could be possible for something that boils down to a "next word predictor" to not fumble too much.
Anything developed in the modern online era, where you would be interested in finding vulnerabilities is going to be multiple order of magnitude more complex (think more multiple megabytes of firmware not a 120 bytes patch), rely on very weird architecture (a kiosk running on some x86 derivative? one of the later embed architecture that uses multiple weird addressing mode?) and very poorly documented.
Also combine this with the fact that we're very far into the "dimishing returns" part of the AI development, where each minute improvement requires even vastly more resources (insanely large datacenter, power requirement of entire cities) and more training material than available (so "habsburg AI" ?), it's not going to get better easily.
The fact that a chat bot can find a fix a couple of grammar mistake in a short paragraph of English doesn't mean it could generate an entire epic poem in a some dead language like Etruscan (not Indo-European, not that many examples have survived, even less Etruscan-Latin or -Greek bilingual texts have survived to assist understanding).
The fact that a chat bot successfully reverse engineered and debugged a 120-byte snipped of one of the most well studied architecture doesn't mean it will easilly debug multi-mega bytes firmware of some obscure proprietary microcontroller.