I wonder if you wouldn't win if you just told ChatGPT to write a chess AI and then used that chess AI to beat the Atari. Writing code is something text models are good at. Playing chess is not.
The devil is in the detail.
Classic chess engines all work roughly the same way: they search a min-max tree (usually with alpha-beta pruning), but use a heuristic to prioritize some branches, a bit like A-star does, instead of going strictly breadth- or depth-first.
Generating a template of a standard chess algorithm would probably be easy for a chatbot (these are prominently featured in tons of "introduction to machine learning" courses that the LLM's training could have ingested). Writing the heuristic function that guides the search is more of an art form, and is probably where the chatbot is going to derail.
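To make the "template" part concrete, here is a minimal sketch of such a search: min-max with alpha-beta pruning, plus a heuristic that decides which branches to try first. To keep it self-contained it plays a toy game instead of chess (take 1 to 3 stones, whoever takes the last stone wins). The names `legal_moves`, `evaluate`, `priority` and `alphabeta` are made up for this illustration; a real engine would replace the first three with chess logic, and `evaluate`/`priority` are exactly the hard, art-form part.

```python
import math

# Toy stand-in for chess (an assumption, to keep the sketch self-contained):
# players alternately take 1 to 3 stones, whoever takes the last stone wins.
def legal_moves(stones):
    return [k for k in (1, 2, 3) if k <= stones]

def evaluate(stones, maximizing):
    """Static score from the maximizing player's point of view."""
    if stones == 0:
        # The side to move has nothing left to take, so it has already lost.
        return -1 if maximizing else 1
    return 0  # search horizon reached: outcome unknown

def priority(stones, move):
    """Move-ordering heuristic: leaving a multiple of 4 looks promising."""
    return 1 if (stones - move) % 4 == 0 else 0

def alphabeta(stones, depth, alpha=-math.inf, beta=math.inf, maximizing=True):
    """Return (score, best_move) using min-max with alpha-beta pruning."""
    moves = legal_moves(stones)
    if depth == 0 or not moves:
        return evaluate(stones, maximizing), None
    # Try the most promising moves first so that pruning cuts more branches.
    moves.sort(key=lambda m: priority(stones, m), reverse=True)
    best_score = -math.inf if maximizing else math.inf
    best_move = None
    for move in moves:
        score, _ = alphabeta(stones - move, depth - 1, alpha, beta, not maximizing)
        if maximizing and score > best_score:
            best_score, best_move = score, move
            alpha = max(alpha, score)
        elif not maximizing and score < best_score:
            best_score, best_move = score, move
            beta = min(beta, score)
        if alpha >= beta:
            break  # the opponent would never allow this line
    return best_score, best_move
```

With 10 stones, `alphabeta(10, depth=10)` returns `(1, 2)`: take 2, leave 8, and the first player wins. In this toy game the ordering heuristic happens to be perfect; in chess nobody has that luxury, which is the whole point.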
Funnily though, I would expect that if you used the chatbot AS the heuristic it wouldn't be a super bad player.
Have some classic chess software that keeps track of the board and lists all the legal moves, then prompt the chatbot with something like:
"This is the current chessboard: {state}, these are the last few moves: {history}, pick the most promising among the following: {list of legal moves}".
In fact, decades ago that's how some people applied hidden Markov models to chess.
Similarly, I would expect that during training the LLM would have been exposed to a large share of the games available online, and would have some vague idea of what a "winning move" looks like in a given context.
Not so much trying to simulate moves ahead as leveraging "experience" to know what's best next in a given context, exactly like the "chess engine + HMM" setups did in the past, but a lot less efficiently.
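A rough sketch of the chatbot-as-heuristic idea, to show where the pieces plug in. Everything here is an assumption for illustration: `pick_move` and `PROMPT` are made-up names, `ask_chatbot` stands for whatever function sends a prompt to your model and returns its text reply, and the board state, history and legal moves are expected to come from the classic chess software. The one design choice worth noting is that the reply is only accepted if it names a move from the legal list, otherwise the code falls back to the first legal move.

```python
from typing import Callable, Sequence

PROMPT = (
    "This is the current chessboard: {state}, "
    "these are the last few moves: {history}, "
    "pick the most promising among the following: {moves}"
)

def pick_move(state: str, history: Sequence[str], legal_moves: Sequence[str],
              ask_chatbot: Callable[[str], str]) -> str:
    """Let the chatbot choose among legal moves, without trusting it blindly."""
    if not legal_moves:
        raise ValueError("no legal moves: the game is over")
    prompt = PROMPT.format(
        state=state,
        history=" ".join(history) or "none",
        moves=", ".join(legal_moves),
    )
    reply = ask_chatbot(prompt)
    # Accept the first word of the reply that matches a legal move,
    # ignoring surrounding punctuation such as a trailing full stop.
    for token in reply.split():
        candidate = token.strip(".,;:!?\"'")
        if candidate in legal_moves:
            return candidate
    return legal_moves[0]  # the chatbot derailed: fall back to any legal move

# Stand-in for a real model call (hypothetical, for demonstration only).
def fake_chatbot(prompt: str) -> str:
    return "I would play Nf3."

move = pick_move("<board description>", ["e4", "e5"], ["d4", "Nf3", "Nc3"], fake_chatbot)
```

Here `move` ends up as `"Nf3"`. Note there is no look-ahead at all: the model is asked once per turn and its pick is played, so the quality of play rests entirely on its "experience".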