> Personally I would be surprised if world models offered anything of value given they operate at such a low level.
You're thinking of the animal approach in the wrong way. Forget the "world model" framing and just think of it as a predictive model, a near cousin of an LLM, one that learns to predict the next perceptual input(s) rather than the next token from a historically gathered training set.
Let's also note that the input to an LLM isn't really text or symbolic sub-word tokens - it's the high-dimensional embeddings created at the input layer...
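To make that concrete, here's a minimal sketch of what the input layer does (the vocabulary and embedding sizes are illustrative, roughly GPT-2-ish, not taken from any specific model):

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, d_model = 50_000, 768           # assumed sizes, for illustration only

# The learned embedding table: one dense vector per token id.
embedding_table = rng.standard_normal((vocab_size, d_model))

token_ids = [15496, 995]                    # e.g. what a tokenizer might emit for some text
inputs = embedding_table[token_ids]         # this is what the transformer actually consumes

print(inputs.shape)                         # (2, 768)
```

The symbols are gone by the time the model proper sees anything - it's operating on points in a high-dimensional space from the very first layer.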
Now compare this to the animal (let's say human) visually scanning lines of text in a book, or street signs, or whatever. The input will also be high-dimensional embeddings, just ones that originated as visual input, and the what-follows-what structure is exactly the same whether you are learning in real time or from a frozen dataset. Given the same data, you would obviously learn exactly what an LLM would have learnt from that data frozen as a training set.
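The point that real-time and frozen-dataset learning coincide on the same data can be sketched with a toy predictor (everything here - the linear model, the SGD step, the names - is illustrative, not a claim about how brains or LLMs are actually implemented):

```python
import numpy as np

rng = np.random.default_rng(1)
d = 8
W0 = np.zeros((d, d))                       # toy linear next-input predictor

def sgd_step(W, x, y_next, lr=0.1):
    """One prediction-error update: nudge W so that W @ x is closer to y_next."""
    err = W @ x - y_next
    return W - lr * np.outer(err, x)

# Pairs of (current embedding, next embedding).
frozen_dataset = [(rng.standard_normal(d), rng.standard_normal(d)) for _ in range(100)]

def live_stream():
    """Stand-in for real-time perception delivering the same experience."""
    yield from frozen_dataset

W_offline = W0.copy()
for x, y in frozen_dataset:                 # "LLM-style": pass over a frozen training set
    W_offline = sgd_step(W_offline, x, y)

W_online = W0.copy()
for x, y in live_stream():                  # "animal-style": learning as inputs arrive
    W_online = sgd_step(W_online, x, y)

print(np.allclose(W_offline, W_online))     # True
```

Same data, same order, same update rule, so the learner ends up in the same place either way - the distinction between the two regimes is about where the data comes from, not what is learnt.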
So the animal can do (and we do!) exactly what an LLM does, but it can also do a lot more, so its capability is a superset of an LLM's.
Finally, remember that continual learning and human/animal-level intelligence (AGI) are the two holy grails of AI research, and there are no easy answers - so if you think there are, you should realize you are misunderstanding something. If LoRA were the answer to continual learning, then they'd be using it. If looping a model's output back into itself (re: your first reply in this thread) were all that was needed for animal intelligence, then we'd already have AGI.