Self-driving software recognizes (classifies) and understands the significance of objects such as people, animals, road signs, traffic lights, stationary and moving vehicles of various sizes, lamp posts, trees, curbs, speed bumps, buildings, walls, traffic cones, and so on. It models the predicted behaviours of those object types that move of their own accord, and then makes real-time driving plans accordingly.
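The perceive, predict, plan loop described above can be sketched in miniature. Everything here is an illustrative assumption, not any real driving stack: a handful of object kinds, constant-velocity prediction for agents that move of their own accord, and a one-dimensional speed planner.

```python
# A minimal sketch of the perceive -> predict -> plan loop described above.
# All class names, object kinds, and numbers are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class TrackedObject:
    kind: str        # classifier output, e.g. "pedestrian", "traffic_cone"
    position: float  # distance ahead along our lane, in metres
    speed: float     # metres/second along our lane (0 for static objects)

# Object types the planner treats as moving of their own accord.
SELF_MOVING = {"pedestrian", "cyclist", "car", "animal"}

def predict_position(obj: TrackedObject, horizon_s: float) -> float:
    """Constant-velocity prediction for agents; static objects stay put."""
    if obj.kind in SELF_MOVING:
        return obj.position + obj.speed * horizon_s
    return obj.position

def plan_speed(objects: list[TrackedObject], horizon_s: float = 2.0,
               safe_gap_m: float = 10.0, cruise_mps: float = 15.0) -> float:
    """Pick a speed that keeps a safe gap to the nearest predicted object."""
    nearest = min((predict_position(o, horizon_s) for o in objects),
                  default=float("inf"))
    if nearest < safe_gap_m:
        return 0.0  # predicted intrusion into the safe gap: brake
    return min(cruise_mps, (nearest - safe_gap_m) / horizon_s)

objs = [TrackedObject("traffic_cone", 40.0, 0.0),
        TrackedObject("pedestrian", 30.0, -1.5)]  # walking toward us
print(plan_speed(objs))  # slows below cruise speed for the pedestrian
```

A real system replans this loop many times per second, with learned behaviour models per object class rather than constant velocity; the sketch only shows where classification and prediction feed into planning.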
Large language models have been trained so extensively on the relationships of words to each other, across a large chunk of all human discourse, that their internal neural-net representations of these symbols can be said to model the situations that phrases, sentences, and paragraphs of those words describe.
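A toy version of this idea, assuming nothing about any real model: represent each word by the counts of the words that co-occur with it in a tiny made-up corpus, then compare words by cosine similarity. Words used in similar situations end up with similar vectors, a miniature of how relationships between words come to encode the situations they describe.

```python
# Toy distributional word vectors: a word is represented by the counts of
# words that co-occur with it. The corpus and words are invented examples.
from collections import Counter
from math import sqrt

corpus = [
    "the cat chased the mouse",
    "the dog chased the cat",
    "the cat ate the fish",
    "the dog ate the bone",
    "the king ruled the land",
    "the queen ruled the land",
]

def context_vector(word: str) -> Counter:
    """Count every word that appears in a sentence alongside `word`."""
    vec = Counter()
    for sentence in corpus:
        tokens = sentence.split()
        if word in tokens:
            vec.update(t for t in tokens if t != word)
    return vec

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = sqrt(sum(v * v for v in a.values())) * \
           sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# "cat" and "dog" occur in similar contexts, so their vectors are closer
# than "cat" and "king" are.
print(cosine(context_vector("cat"), context_vector("dog")) >
      cosine(context_vector("cat"), context_vector("king")))  # True
```

Real models learn dense vectors with billions of parameters rather than raw counts, but the underlying bet is the same: enough co-occurrence structure starts to stand in for the situations the words describe.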
The models are then able to generate novel language (novel descriptions of implicit or imagined plausible sub-situations) to answer questions. The situations described by the answers are sometimes only implicit in the trained-on situations, not explicit. The system "understands" that type of situation in general (the general kinds of relationships, and evolutions of relationships, that occur in situations of that type), so it gives you a plausible and sometimes creative answer.