Google announced roughly the same thing a couple weeks ago at their developer conference: on-device models for phones. The 1B model is fine for basic tasks (turning on lights, checking email, social media notifications, etc.) and runs OK on midrange phone hardware. The 4B model technically runs, but at borderline-unusable speed; still, it can answer questions like "how does a microwave work?" with moderate accuracy at a semi-scientific level, which is impressive. I suspect most devices will be able to run a 1B model, and by the end of the decade most everything will run at least a 4B model at conversational speed. There's a notion that all AI processing will happen in the datacenter; I suspect 80%+ of consumer LLM usage will happen on-device, with more complex tasks routed to the cloud. For a lot of end users (high school students, etc.) 98%+ of requests will be on-device.
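The device-first-with-cloud-fallback split could be as simple as a heuristic router in front of the two backends. A minimal sketch (the function names and the length/keyword heuristic here are my own illustrative assumptions, not any real SDK's API):

```python
# Hypothetical on-device vs. cloud router: cheap heuristics decide whether
# a prompt stays on the local small model or gets escalated to the cloud.
# COMPLEX_HINTS and the token threshold are made-up placeholders.

COMPLEX_HINTS = ("prove", "summarize this document", "write code", "translate")

def route(prompt: str, max_local_tokens: int = 64) -> str:
    """Return 'device' for simple requests, 'cloud' for complex ones."""
    lowered = prompt.lower()
    if len(lowered.split()) > max_local_tokens:
        return "cloud"  # long context: beyond a small on-device model
    if any(hint in lowered for hint in COMPLEX_HINTS):
        return "cloud"  # task type that needs a bigger model
    return "device"     # the common case: lights, notifications, quick Q&A

print(route("turn on the living room lights"))  # device
print(route("write code to parse this CSV"))    # cloud
```

In practice the router itself could even be the small on-device model classifying its own confidence, but a keyword/length heuristic shows the shape of the idea.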