Re: "Rest assured, the data is going to be obscured" (Score: 5, Informative)
Disclosure:
I work extensively with Microsoft customer usage data (although on Visual Studio, not Windows).
Odds are, unless you've been very intentional about ticking the checkboxes the right way, Microsoft is already collecting usage data from you -- for a variety of products. Never without your consent, of course.
The issues around anonymizing your data and removing PII (personally identifiable information) are taken very seriously. It's damn frustrating, because I often look over the data for user 234209342349 and think, "I wish I could email this guy and ask why the hell he is doing that." But there is no way for me to recover PII for VS client customers.
For the Visual Studio products, a typical approach is that any data that might contain PII is one-way hashed on your local machine, so PII never goes over the wire and never reaches Microsoft in the first place.
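To make "one-way hashed on your local machine" concrete, here is a minimal sketch. The post doesn't say which hash VS actually uses; this assumes HMAC-SHA256 keyed with a per-installation secret, and the names `INSTALL_KEY` and `pseudonymize` are made up for illustration. The same input always maps to the same opaque ID on that machine, which is how an analyst can follow "user 234209342349" without ever seeing an email address.

```python
import hashlib
import hmac
import os

# Hypothetical per-installation secret. It stays on the local machine and is
# never uploaded, so the receiving side cannot reverse the hashes by hashing
# guesses (common usernames, email addresses) itself. A real client would
# persist this key; the sketch regenerates it on every run.
INSTALL_KEY = os.urandom(32)


def pseudonymize(value: str, key: bytes = INSTALL_KEY) -> str:
    """One-way hash a possibly-PII string before it leaves the machine."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


if __name__ == "__main__":
    event = {"command": "Build.Rebuild", "user": "alice@contoso.com"}
    # Only the hashed form of the user field would go over the wire.
    outgoing = {**event, "user": pseudonymize(event["user"])}
    print(outgoing)
```

The key is what keeps this one-way in practice: a plain unkeyed hash of a guessable value like an email address can be reversed by anyone who hashes a list of candidate addresses.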
You can use tools like FileMon to see where VS writes the usage data files it generates. I don't remember whether these look like a binary mess on disk, but they are written to disk, and some time later you can see them go over the wire. You could also use a packet sniffer to see the on-the-wire format and whether it differs from what is stored on disk.
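FileMon shows file activity live. A cruder alternative, sketched here as an assumption rather than anything VS-specific, is to snapshot file modification times under a directory before and after a VS session and diff them. The function names and the demo file names (`settings.xml`, `usage-001.dat`) are hypothetical; the post doesn't say where VS keeps its files.

```python
import os
import tempfile
from pathlib import Path


def snapshot(root: Path) -> dict[str, float]:
    """Map every file under root to its last-modified time."""
    result = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                result[str(path)] = path.stat().st_mtime
            except OSError:
                # File vanished or is locked between listing and stat; skip it.
                continue
    return result


def changed_files(before: dict[str, float], after: dict[str, float]) -> list[str]:
    """Files that are new, or whose modification time moved, between snapshots."""
    return sorted(p for p, mtime in after.items() if before.get(p) != mtime)


if __name__ == "__main__":
    # Self-contained demo in a throwaway directory. For real use, snapshot the
    # directory you suspect before a VS session and again afterwards.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "settings.xml").write_text("<existing/>")
        before = snapshot(root)
        (root / "usage-001.dat").write_bytes(b"\x01\x02")  # stands in for a new log
        print(changed_files(before, snapshot(root)))
```

Deleted files are not reported, and very fast rewrites can share a timestamp on coarse filesystems, so a live monitor like FileMon is still the more reliable tool.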
The data we scrub in VS covers the obvious things, such as account names and email addresses, but also some subtler ones: file paths (because these could contain your username, a company name, or anything else) and even VS Project Type names (because Company Foo can create its own Project Type and might put its company name in the Project Type name).
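A field-level scrub along those lines might look like the sketch below. The field names in `SENSITIVE_FIELDS` are hypothetical (the real VS telemetry schema isn't described here), and plain SHA-256 stands in for whatever local one-way hash the client really uses. Note that a file path is hashed whole rather than trimmed, since any segment of it could identify someone.

```python
import hashlib

# Hypothetical field names chosen to mirror the categories listed above.
SENSITIVE_FIELDS = {"account_name", "email", "file_path", "project_type_name"}


def one_way_hash(value: str) -> str:
    # Stand-in for the client's real local one-way hash.
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def scrub(event: dict[str, str]) -> dict[str, str]:
    """Return a copy of event with every sensitive field hashed.

    The whole path is hashed, because the user folder, a company share name,
    or a repo name anywhere in it might identify someone.
    """
    return {
        key: one_way_hash(value) if key in SENSITIVE_FIELDS else value
        for key, value in event.items()
    }


if __name__ == "__main__":
    event = {
        "command": "File.Open",
        "file_path": "C:\\Users\\jdoe\\Source\\FooCorp.Billing\\Billing.csproj",
        "project_type_name": "FooCorp Workflow Project",
    }
    print(scrub(event))
```

Hashing rather than deleting keeps the values countable (the same custom Project Type still shows up as the same opaque string) without revealing what they say.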
So anyway, there's not much of a story here. I can't comment on the truth or accuracy of what MJF is saying. What she is saying, in effect, is that the latency between usage data being captured or calculated locally and that data being sent to Microsoft (assuming the user has allowed usage data to be sent) is now much lower than it was in the past.
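As a toy illustration of what "lower latency" means here, the sketch below assumes events are queued locally and uploaded in fixed-interval batches; the post doesn't describe the real upload schedule, and `UsageQueue` and `worst_latency` are made-up names. Under that assumption, the worst-case delay between capture and upload is roughly the upload interval, so shrinking the interval shrinks the latency without changing what is collected.

```python
from dataclasses import dataclass, field


@dataclass
class UsageQueue:
    """Toy model: events are captured locally and uploaded in periodic batches."""

    upload_interval_s: float
    pending: list[float] = field(default_factory=list)  # capture timestamps
    last_upload_s: float = 0.0

    def capture(self, now_s: float) -> None:
        self.pending.append(now_s)

    def maybe_upload(self, now_s: float) -> list[float]:
        """Upload if the interval has elapsed; return each event's latency."""
        if now_s - self.last_upload_s < self.upload_interval_s:
            return []
        latencies = [now_s - t for t in self.pending]
        self.pending.clear()
        self.last_upload_s = now_s
        return latencies


def worst_latency(interval_s: float, duration_s: int = 7200) -> float:
    """Capture one event per second and report the longest capture-to-upload delay."""
    queue = UsageQueue(interval_s)
    worst = 0.0
    for t in range(1, duration_s + 1):
        queue.capture(float(t))
        latencies = queue.maybe_upload(float(t))
        if latencies:
            worst = max(worst, max(latencies))
    return worst


if __name__ == "__main__":
    for interval in (3600, 60):
        print(interval, worst_latency(interval))
```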
For VS, at least, I know what data we have available to us. I opt in to all of the MS data collection stuff because I see no evidence of it being used inappropriately, and because I know we use it to try to understand what users are doing and why they are doing it.
Opting into the data collection stuff effectively gives you "a vote" in how we do things in future releases.