I think the idea is not just to have cameras for normal camera things like taking photos and videos, but to generally have more sensors on you so the AI can get a continuous read of what you're doing and respond and act in context. Like, you can already wear headphones with a voice AI in your ear, but it doesn't know your situation beyond maybe a rough location and whatever it can pick up through the mics.
I can imagine pointing at things to refer to them, or asking "which one of these is best"-type questions without having to describe what I'm looking at. It would probably also be much better at proactively surfacing information, or even taking actions on your behalf.
I'm not sold on whether we actually need any of that, but I think that's the idea.