It's called the "tyranny of dimensionality" (more commonly, the "curse of dimensionality"): the more variables you have, the more data points you need, exponentially more, to derive any meaningful partitioning from them, regardless of how clever your distance metrics are.
Indeed, but only if you insist on carrying all the irrelevant and correlated dimensions along in your analysis.
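To put a toy number on that: here's a quick numpy sketch (entirely made-up data, assumed shapes) where 50 nominal dimensions are mostly noisy copies of 3 informative ones, and a plain PCA-via-SVD variance check recovers the effective dimensionality.

```python
import numpy as np

rng = np.random.default_rng(1)

# Made-up data: 3 informative dimensions, plus 47 that are either
# noisy linear copies of them (correlated) or pure noise (irrelevant).
n = 1000
informative = rng.normal(size=(n, 3))
correlated = informative @ rng.normal(size=(3, 30)) + 0.1 * rng.normal(size=(n, 30))
irrelevant = 0.1 * rng.normal(size=(n, 17))
X = np.hstack([informative, correlated, irrelevant])  # 50 nominal dimensions

# PCA via SVD on centered data: how many dimensions carry the variance?
Xc = X - X.mean(axis=0)
s = np.linalg.svd(Xc, compute_uv=False)
explained = (s ** 2) / (s ** 2).sum()
k = int((np.cumsum(explained) < 0.99).sum()) + 1  # components for 99% variance
print(f"{X.shape[1]} nominal dimensions, ~{k} effective")
```

So the "curse" bites on the 50 columns you measured, not on the 3 you actually needed.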
And they have hundreds of questions, when a dozen would be about all that the entire population of Earth could support.
So do surveys, with significantly smaller sample sizes. I wouldn't be surprised if a non-trivial percentage of those questions were intentionally redundant - you know, to check *ahem* consistency, improve accuracy, etc. If, say, you have 100 questions grouped into 10 categories (10 questions per category), you have just dropped the dimensionality by an order of magnitude while at the same time having more confidence in your data. A rule of thumb in surveys is don't trust the user^W^W^W^W *ahem* trust, but verify.
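A minimal sketch of that 100-questions-into-10-categories idea, with made-up data: each category has one latent score per respondent, its 10 questions are noisy (deliberately redundant) readings of it, averaging collapses 100 dimensions to 10, and the within-category spread flags respondents who answered "redundant" questions inconsistently.

```python
import numpy as np

rng = np.random.default_rng(0)
n_respondents, n_categories, q_per_cat = 500, 10, 10

# Hypothetical survey: per respondent, each category has a latent "true"
# score; its 10 questions are noisy, intentionally redundant observations.
latent = rng.normal(size=(n_respondents, n_categories))
noise = rng.normal(scale=0.5, size=(n_respondents, n_categories, q_per_cat))
answers = latent[:, :, None] + noise
answers_flat = answers.reshape(n_respondents, -1)  # the raw 100 "dimensions"

# Drop 100 question dimensions to 10 category scores by averaging;
# averaging 10 noisy readings also shrinks the per-score noise.
category_scores = answers.mean(axis=2)

# "Trust, but verify": redundant questions within a category should agree.
# Flag respondents whose within-category spread is a 3-sigma outlier.
spread = answers.std(axis=2).mean(axis=1)
suspect = spread > spread.mean() + 3 * spread.std()

print(answers_flat.shape, "->", category_scores.shape)
print("flagged respondents:", int(suspect.sum()))
```

The names and thresholds here are invented for illustration; the point is just that the redundancy buys you both fewer dimensions and a built-in consistency check.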