Smart people are a threat to those who hold power. Especially the subset of smart people who are politically engaged and willing to put themselves at risk to protest and demand change. And among them, the subset who are world famous and therefore have easy access to the press, well, they are just beyond dangerous.
There is a long history of new dictatorial regimes wiping out the educated class, killing them or scaring them away, thus making the general populace less likely to organize, garner international attention, or outsmart anyone in the regime. This fits the pattern.
First of all, as someone who's worked in parallel computing for a while, I think it's actually quite hard to define tasks of real value that can be broken down into such small, easy sub-tasks. And within the set of problems where you can do that, there is a pretty large overlap between what a completely untrained person can do and what a Perl script can do. So the whole idea of an army of anonymous random humans adding microvalue that adds up to big value is problematic for me. Maybe there is theoretical value there, but so many things could go wrong.
Secondly, if you can clearly define a task like that, and what it is worth to you, why restrict your solution to humans? Provide an API and let me try to solve it algorithmically. If all you care about is getting the task done, what does it matter whether I get it done with a dozen Indian subcontractors, a thousand trained monkeys, or a clever little genetic algorithm?
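To make that concrete, here is a minimal sketch of what I mean, assuming a purely hypothetical HTTP task API (the endpoint, the JSON fields, and the toy price-extraction task are all invented for illustration). If the task is well-specified enough for a crowd, nothing stops the "worker" from being a script:

    import json
    import re
    import urllib.request

    API = "https://tasks.example.com"  # hypothetical endpoint, not a real service

    def fetch_task():
        # Pull the next open micro-task; assumes the API returns JSON like
        # {"id": 42, "type": "extract_price", "text": "Widget, $3.99 each"}.
        with urllib.request.urlopen(API + "/next") as resp:
            return json.load(resp)

    def solve(task):
        # The "worker" happens to be an algorithm: pull the first dollar
        # amount out of the text instead of asking a human to.
        m = re.search(r"\$(\d+(?:\.\d{2})?)", task["text"])
        return m.group(1) if m else None

    def submit(task_id, answer):
        body = json.dumps({"id": task_id, "answer": answer}).encode()
        req = urllib.request.Request(API + "/answer", data=body,
                                     headers={"Content-Type": "application/json"})
        urllib.request.urlopen(req)

    task = fetch_task()
    submit(task["id"], solve(task))

If the task-poster only pays for correct answers, it shouldn't matter to them which of those three options produced the answers.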
Actually you *can* do that kind of multi-dimensional filtering, equivalent to multiple AND statements followed by a GROUP BY. There are different data sets here, with different usage models. Perhaps most interesting is the Public Use Microdata Sample (PUMS). Docs here: http://www.census.gov/acs/www/data_documentation/public_use_microdata_sample/
PUMS contains records representing individual responses to the American Community Survey (ACS). These records include detailed housing data (# rooms, heating fuel, property value, mortgage, age of house, etc.) and personal data (family income, vehicles, employment status, # children, language spoken, etc.). Now, ACS is a sample, not a full enumeration like the decennial census, but the sampling is done carefully in an attempt to be representative. Full record definition here: http://www.census.gov/acs/www/Downloads/data_documentation/pums/DataDict/PUMS_Data_Dictionary_2006-2010.pdf
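To make the filtering claim concrete, here is a minimal sketch of the multiple-AND-plus-GROUP-BY query I mean, using Python and pandas against a downloaded PUMS extract. The column names (HINCP = household income, VEH = vehicles, ST = state, WGTP = housing weight) follow the PUMS data dictionary linked above, but verify them against the dictionary for your survey year:

    import pandas as pd

    # A PUMS housing-record extract (CSV download from the Census site);
    # the file path here is illustrative.
    df = pd.read_csv("pums_extract.csv")

    # Multiple AND conditions: households with income over $50k that own
    # two or more vehicles...
    subset = df[(df["HINCP"] > 50000) & (df["VEH"] >= 2)]

    # ...followed by a GROUP BY: respondent count and median income per state.
    # For real population estimates you would weight by WGTP, not count rows.
    print(subset.groupby("ST")["HINCP"].agg(["count", "median"]))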
Back to the confidentiality question: this detailed data is carefully altered to protect individual privacy while still being correct at an aggregate level. Here's what the site says about this protection:
"As required by federal law, the confidentiality of ACS respondents is protected through a variety of steps to disguise or suppress original data while making sure the results are still useful. The first means of protecting is the suppression of all personal identification, such as name and address, from each record. In addition, a small number of records are switched with similar records from a neighboring area or receive another collection of characteristics developed by using a modeling technique. Age perturbation is one example of procedures that disguise original data by randomly adjusting the reported ages for a subset of individuals. The answers to open-ended questions, where an extreme value might identify an individual, are top-coded. Top coded questions include age, income, and housing unit value. In addition to modifying the individual records, respondents' confidentiality is protected because only large geographic areas are identified in the PUMS."
The Census site has a little info about this: http://www.census.gov/privacy/data_protection/statistical_safeguards.html
But more relevant is this link to the American Statistical Association, which goes into significant depth on the techniques used to protect confidentiality: http://www.amstat.org/committees/pc/index.html
On this page http://www.fcsm.gov/working-papers/spwp22.html there is a working paper from the Federal Committee on Statistical Methodology that goes deeper into actual operations.
From that page, the "Statistical Disclosure Limitation: A Primer" document has an interesting section defining inferential disclosure, which "occurs when individual information can be inferred with high confidence from statistical properties of the released data."
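A toy example of why that matters (mine, not the paper's): two innocent-looking aggregate queries can be differenced to recover one person's exact value.

    incomes = {"alice": 52000, "bob": 48000, "carol": 250000}

    # Two aggregate queries, each over a seemingly safe group...
    total_all   = sum(incomes.values())
    total_minus = sum(v for k, v in incomes.items() if k != "carol")

    # ...whose difference exposes a single individual's exact income.
    print(total_all - total_minus)  # 250000: Carol's income, inferred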
And the "Current Federal Statistical Agency Practices" describes the multi-dimensional linear programming used to prevent that, along with other techniques including geographic thresholds, population thresholds and coarsening.
So the summary is: yes, it is a serious issue to be concerned about, but the Census Bureau is taking it seriously, applying some real science and math to it, and it looks like they are doing a good job.
1 + 1 = 3, for large values of 1.