It's harder than you think. Those older systems performed poorly and couldn't handle natural language queries. The issue isn't processing power; it's having a large enough volume of training material and mimicking how the brain fills in gaps.
Training material isn't just a case of gathering samples. When the machine makes a mistake, it needs to be told why, so the collection needs careful curation and labelling to be useful. Such databases are extremely valuable, and historically open-source projects have often started with a donation from a commercial body rather than building one from scratch.
Mimicking the brain is also extremely hard. Often people don't hear things clearly or in full, due to environmental noise, poor pronunciation and the like. To compensate, the brain fills in the gaps or makes assumptions. People have been trying to program those assumptions into computers since the 1980s. Again, a database of that knowledge will be vast and valuable. Either you throw massive human resources at building it, or you crawl the web and look at trillions of search queries like Google does.
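To make the gap-filling idea concrete, here's a toy sketch of how statistical context can resolve a misheard word. The bigram counts are invented for illustration; a real recogniser would learn them from an enormous corpus, which is exactly the vast database being described.

```python
# Toy "fill in the gap" demo: given a garbled word, pick the candidate
# most likely to follow the previous word. All counts here are made up
# for the example; a real system derives them from huge training data.
from typing import Dict, List

# Hypothetical bigram counts: how often word B was seen after word A.
BIGRAMS: Dict[str, Dict[str, int]] = {
    "recognise": {"speech": 50, "beach": 1},
    "sandy": {"beach": 40, "speech": 1},
}

def fill_gap(prev_word: str, candidates: List[str]) -> str:
    """Choose the candidate most often observed after prev_word."""
    counts = BIGRAMS.get(prev_word, {})
    return max(candidates, key=lambda w: counts.get(w, 0))

# Heard "recognise ?eech" vs "sandy ?each": context decides the word.
print(fill_gap("recognise", ["speech", "beach"]))  # speech
print(fill_gap("sandy", ["speech", "beach"]))      # beach
```

The point isn't the two-entry table, it's that the table only becomes useful at enormous scale, and building that scale is the hard, expensive part.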
That's also why they need a cloud service to do this. The database is vast and proprietary, and querying it is far from a trivial SQL command.
It's not just a programming or AI training problem, which is why no one in the open-source world is doing it. The closest thing that world has is probably OpenStreetMap, but creating that data set was far less laborious and tedious than training a computer to have some common sense will be.