Can we please stop saying that LLMs are 'nondeterministic'... It's misinformed. And it matters, because some people associate nondeterminism with some sort of magic, and others associate the word with inaccuracy -- neither of which is true.
These models are clearly, architecturally, mathematically deterministic at a fundamental level, although this determinism is hidden from users.
Let me explain. If you take any recent LLM (GPT, Llama, Gemini, etc.) and run it with 'temperature' set to zero, then, for a given input, it will produce exactly the same token every single time it runs. That's not surprising: the weights are baked into the nodes of the model and calculate the same output every time (excluding very rare events like uncorrected cosmic-ray bit-flips).
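As a minimal sketch of temperature-zero decoding (using made-up logits in place of a real model's output), greedy selection just takes the highest-scoring token, so repeated runs can never disagree:

```python
import numpy as np

def greedy_next_token(logits):
    """Temperature-zero (greedy) decoding: always pick the highest-scoring token."""
    return int(np.argmax(logits))

# Hypothetical logits for a 5-token vocabulary (stand-ins for a model's output).
logits = np.array([1.2, 3.7, 0.4, 3.1, -0.8])

# Run the same "inference" step many times: the chosen token never varies.
picks = {greedy_next_token(logits) for _ in range(1000)}
print(picks)  # {1} -- the same token on every run
```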
Now, this has the side effect of introducing some undesirable behaviour (rigid responses, and looped outputs in certain contexts), so the 'temperature' parameter is used to mix things up a bit (this parameter is usually hidden from users in cloud models).
On each inference step, the model produces a probability for every token in its vocabulary -- effectively an array of candidate tokens ranked by how likely each is to be the 'right' one. A nonzero temperature just uses a pseudorandom algorithm to return less likely tokens a proportion of the time (giving the user the 2nd or 3rd most likely token some of the time instead of the most likely). To be clear, this has no effect whatsoever on the deterministic nature of the model itself; it's a fudge-factor applied during sampling, after the forward pass has produced its probabilities.
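A minimal sketch of that sampling step (again with made-up logits): the temperature rescales the distribution before a pseudorandom draw, and because the draw comes from a seeded pseudorandom generator, even the 'randomness' is reproducible:

```python
import numpy as np

def sample_token(logits, temperature, rng):
    """Scale logits by temperature, softmax into probabilities, draw one token."""
    scaled = np.asarray(logits, dtype=float) / temperature
    scaled -= scaled.max()  # subtract max for numerical stability
    probs = np.exp(scaled) / np.exp(scaled).sum()
    return int(rng.choice(len(probs), p=probs))

logits = [1.2, 3.7, 0.4, 3.1, -0.8]  # hypothetical model output

# Two generators with the same seed produce the same "random" token sequence.
rng_a = np.random.default_rng(seed=0)
run_a = [sample_token(logits, temperature=1.0, rng=rng_a) for _ in range(10)]

rng_b = np.random.default_rng(seed=0)
run_b = [sample_token(logits, temperature=1.0, rng=rng_b) for _ in range(10)]

print(run_a == run_b)  # True: the sampler is pseudorandom, i.e. deterministic
```

Higher temperatures flatten the distribution (less likely tokens get picked more often); as the temperature approaches zero, sampling collapses back to always picking the most likely token.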
LLMs don't do anything unpredictable or nondeterministic; they are just like any other computer algorithm run on a deterministic Turing machine.
The debate around what sort of informational structure is represented in these models is interesting, but please don't lean on nondeterminism to explain your biases (positive or negative) about LLMs... it's just incorrect.