Regression testing
If I were in charge of Siri, I'd do the same thing. That kind of real-world data is vital for regression testing. Without a strong corpus of sample data, when you change the code you have no idea whether you're improving things for some cases while making them worse for others, and you end up with people complaining that "Siri used to work for X query, but now it doesn't". With this data, you can update the code, run the test suite, and see how many existing cases the change breaks.
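To make the idea concrete, here's a rough sketch of that kind of corpus-driven regression check. It's purely illustrative and is not how Apple actually does it. It assumes a hypothetical corpus mapping sample queries to the answer we expect, plus two hypothetical recognizers (old and new code). The point is that you compare both versions against the same corpus and see which cases got fixed and which got broken.

```python
from typing import Callable, Dict, List


def compare_runs(
    corpus: Dict[str, str],
    old: Callable[[str], str],
    new: Callable[[str], str],
) -> Dict[str, List[str]]:
    """Run old and new code over the same sample queries and bucket each
    query by whether the change fixed it, broke it, or left it alone."""
    report: Dict[str, List[str]] = {
        "fixed": [], "broken": [], "still_passing": [], "still_failing": [],
    }
    for query, expected in corpus.items():
        was_ok = old(query) == expected
        now_ok = new(query) == expected
        if now_ok and not was_ok:
            report["fixed"].append(query)
        elif was_ok and not now_ok:
            report["broken"].append(query)  # the "it used to work" complaints
        elif now_ok:
            report["still_passing"].append(query)
        else:
            report["still_failing"].append(query)
    return report


# Hypothetical toy corpus: query -> expected intent label.
corpus = {
    "what's the weather": "weather",
    "set a timer for ten minutes": "timer",
    "call mum": "call",
}


def old_model(q: str) -> str:
    # Hypothetical old code: only understands weather queries.
    return "weather" if "weather" in q else "unknown"


def new_model(q: str) -> str:
    # Hypothetical new code: learned timers and calls, but lost weather.
    if "timer" in q:
        return "timer"
    return "call" if q.startswith("call") else "unknown"


report = compare_runs(corpus, old_model, new_model)
print(report["fixed"])   # ['set a timer for ten minutes', 'call mum']
print(report["broken"])  # ["what's the weather"]
```

The new code looks like an improvement (two queries fixed), but the corpus shows that it quietly broke weather queries. Without the sample data, you'd only find that out from user complaints.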
If Apple do anything to mitigate this, it will probably be some form of opt-out, but they are unlikely to make opting out the default. I'd imagine that building a representative corpus of speech from a thousand different accents, covering tens of thousands of different subjects, is nigh on impossible any other way, especially as jargon comes and goes so quickly these days.