Because voice processing and searching on the scale of some of the applications such as SIRI require centralized processing.
I don't buy it. These sentiments jumble a number of separable components.
I have a 10-year-old device that was able to do local speech recognition, including arbitrary voice shortcuts and search, without training. I would tell it to play song x or anything from artist y, and most of the time it would get it right and just do it, all offline and all on hardware at least an order of magnitude less capable than what is available today.
There are PC software packages, such as Dragon and Sphinx, that can do free-form speech-to-text locally.
You don't need "the cloud" to control a TV. Recognizing a short list of commands to control a device is relatively trivial. There is nothing wrong with searching online databases if that is explicitly necessary... What is wrong is the generation of bullshit excuses to collect usage data by virtue of voice enablement. People have never really given enough of a shit about voice recognition to justify any serious R&D expenditure. Vendors push it because they want the revenue stream that goes with data collection.
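
To put something concrete behind the "short list of commands" point, here is a rough Python sketch of the last step only: mapping a transcript to one of a handful of device commands, entirely offline. It assumes some local recognizer has already produced the text. The command list, the action codes and the `match_command` helper are all made up for illustration, not taken from any real TV.

```python
import difflib

# Hypothetical command set for a TV; phrases and action codes are illustrative only.
COMMANDS = {
    "volume up": "VOL_UP",
    "volume down": "VOL_DOWN",
    "channel up": "CH_UP",
    "channel down": "CH_DOWN",
    "power off": "POWER_OFF",
    "mute": "MUTE",
}


def match_command(transcript, cutoff=0.75):
    """Map a possibly misrecognized transcript to an action code, or None."""
    # Normalize case and whitespace before comparing.
    text = " ".join(transcript.lower().split())
    # Fuzzy-match against the fixed phrase list; no network access involved.
    hits = difflib.get_close_matches(text, list(COMMANDS), n=1, cutoff=cutoff)
    return COMMANDS[hits[0]] if hits else None


if __name__ == "__main__":
    for heard in ["Volume up", "volume upp", "order a pizza"]:
        print(heard, "->", match_command(heard))
```

Anything that does not resemble a known phrase is simply rejected, which is all a device with a fixed vocabulary needs. The `cutoff` of 0.75 is an arbitrary starting value, not a tuned one.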