But I guess using a 486 today is pretty unique anyway, so it doesn't really matter.
You said it has no memory and it does. Now you seem to be making a different argument.
It's not a different argument. I guess you're not really motivated to try to understand the difference between a KV cache, which exists to reduce compute, and what "stateless" means in this discussion. So I think I'll end the discussion here. You are of course welcome to think that Transformers can actually think and reason; enough people do, and that fuels the hype for the moment.
I'm hoping everyone can get a more realistic understanding of what the technology actually can do, and what it can't, so we can start to use it for the scenarios where it is useful. But that will take time.
Of course it does, that memory takes the form of the KV cache.
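To make the KV-cache point concrete, here is a minimal pure-Python sketch of single-head attention with toy embeddings and identity "projections" (all names and values are illustrative, not from any real model): caching past keys/values only skips recomputation, it does not change the result.

```python
import math

def attention(query, keys, values):
    # Scaled dot-product attention of one query over all past positions.
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(d)]

# Toy embeddings for 4 positions (identity "projections" for brevity).
tokens = [[0.1, 0.2], [0.3, -0.1], [-0.2, 0.4], [0.5, 0.0]]

# Without a cache: rebuild keys/values from the full input every step.
out_full = attention(tokens[-1], tokens, tokens)

# With a cache: the same keys/values, appended one step at a time.
cache = []
for t in tokens:
    cache.append(t)
out_cached = attention(tokens[-1], cache, cache)

assert out_full == out_cached  # identical output either way
```

So the cache is an optimization of *where* the per-token keys and values come from, not extra information the model consults.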
The Transformer itself has no state, so no. It has no clue whether those tokens were generated earlier by itself or are an entirely new input. It doesn't matter where the generated tokens are stored; the actual processing by the Transformer has no memory other than the input. Given the same input it will always generate the same output. It is stateless and deterministic.
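The stateless, deterministic claim can be sketched with a toy stand-in for the forward pass (this is not a real transformer; `model_logits` is just an arbitrary pure function of the input tokens): the model sees only the growing input sequence, so identical input always yields identical output.

```python
def model_logits(tokens):
    # Deterministic stand-in for a forward pass: a fixed function of
    # the input alone, with no state carried between calls.
    seed = sum((i + 1) * t for i, t in enumerate(tokens))
    return [(seed * 31 + v) % 97 for v in range(5)]  # 5-token toy vocab

def generate(prompt, steps):
    tokens = list(prompt)
    for _ in range(steps):
        logits = model_logits(tokens)             # sees only the input tokens
        tokens.append(logits.index(max(logits)))  # greedy pick
    return tokens

# Same input, same output, every time:
assert generate([1, 2, 3], 4) == generate([1, 2, 3], 4)
```

"Memory" across steps is nothing more than the previously generated tokens being fed back in as part of the next input.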
They very much can extrapolate, this technology would be rather pointless if they couldn't.
That depends on what "AI" you are talking about. LLMs certainly can't extrapolate. The underlying technology is called a Transformer, and it assigns a probability to each of a list of likely next tokens (a token being a word or part of a word). The Transformer (model) is stateless and deterministic. Each time it is run it only generates the probability list for a single next token, and it has no memory of previous runs. It has no clue whether the most probable token will be selected or the least probable one (unlikely, but still possible); that is controlled by the temperature setting. So no: it can select probable next tokens, which is not the same as extrapolating anything. It doesn't reason or think like that at all.
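The temperature point can be illustrated with a short sketch (the logits here are made-up numbers, not from any model): the model only emits scores per candidate token; the actual pick is a weighted draw done outside the model by the sampler.

```python
import math
import random

def softmax_with_temperature(logits, temperature):
    # Divide logits by the temperature, then normalize to probabilities.
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]  # hypothetical scores for three candidate tokens

# Low temperature sharpens the distribution toward the top token;
# high temperature flattens it, so unlikely tokens get picked more often.
cold = softmax_with_temperature(logits, 0.2)
hot = softmax_with_temperature(logits, 5.0)
assert cold[0] > hot[0]

# The selection itself is a weighted random draw, separate from the model.
choice = random.choices(range(len(logits)), weights=hot)[0]
```

So the "nondeterminism" people observe in LLM output lives in this sampling step, not in the forward pass that produced the scores.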
Other ML models can, if they are trained for that purpose.
16.5 feet in the Twilight Zone = 1 Rod Serling