Some would argue the body-less entity would merely need a few volumes on physics to understand that.
No. Think about how, say, dogs understand physics. Obviously not via Newton's "laws" (or should I say, Newton's very useful mathematical approximations). Dogs navigate the world and 'understand' concepts like threats, prey, and mates well enough to survive and reproduce.
What LeCun is proposing is largely what self-driving cars already do. Waymo isn't driven by a Large "Language" Model that predicts word sequences based on what people wrote on Reddit. It is based on a model of its own physical interactions out in the world.
These big corpora of language and images scraped from the web are really just bootstrapping. AIs will rely more on their own experience as time passes. For example, call-center bots are presumably refined on all the data they collect interacting directly with people every day.