I think we might have a different understanding of what "outlier" means. An outlier isn't a data point that is shown to be incorrect; it's a data point that is numerically distant from the rest of the points in a set. The difficulty with this data set is that it's not just the extraordinarily high values that are incorrect; the statistically average values are under suspicion as well. There might very well be one large company that actually did lose $30 million due to a security breach, and 100 small companies that reported losing $25,000 when they actually lost something closer to $2000. The problem is that the incorrect values aren't outliers; there's a whole bunch of them, so they don't look statistically different from the rest of the data.
No, I think we're on the same page as to what constitutes an outlier. The point the paper makes is that for some surveys 75% of the average comes from an outlier or two. This is exactly the case with the 2007 ID theft survey they mention in the intro: the answers from 2 people (in a survey of over 4000) made a 3x difference in the average (and were found to be fabricated). It's quite possible that some of the non-outlier answers were fabricated as well, but they don't have the same influence on the estimate.
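To make the arithmetic concrete, here is a rough sketch of how two answers out of 4000 can roughly triple an average. The dollar figures are made up for illustration; they are not the numbers from the 2007 survey or the paper.

```python
from statistics import mean, median

# Hypothetical data, NOT the real survey: 3,998 respondents each
# report a $100 loss, and 2 respondents each report $400,000.
typical = [100.0] * 3998
outliers = [400_000.0, 400_000.0]
responses = typical + outliers

mean_all = mean(responses)          # average with the two outliers
mean_trimmed = mean(typical)        # average without them
median_all = median(responses)      # the median barely notices them
share_from_outliers = sum(outliers) / sum(responses)

print(f"mean with outliers:    {mean_all:.2f}")
print(f"mean without outliers: {mean_trimmed:.2f}")
print(f"ratio:                 {mean_all / mean_trimmed:.2f}x")
print(f"median:                {median_all:.2f}")
print(f"share of total from 2 answers: {share_from_outliers:.0%}")
```

With these invented numbers the two answers supply about two thirds of the total, while the median stays at the typical value. That is why a handful of unverified extreme answers matters far more to the estimate than many mildly wrong ones.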
Can't you just exclude the outliers from the analysis?
It depends on whether the outlier data is correct. If you're surveying wealth and some guy claims to be worth $50 billion, you need to figure out if he's telling the truth or not. Outliers have a huge effect on the average; that's the point of the sex-survey example. The average number of partners reported by men is 5x higher than the number reported by women, but if you throw out the outliers among the men, the averages are almost the same. The point of the paper is that cyber-crime surveys never even examine outlier results carefully.
It's well enough established that men claim to have more female sexual partners in sex surveys than women claim male partners, a discrepancy that can't be explained by sampling error alone.
That can be explained by a few women I know. They can take on three men at a time. So unless you correct the survey for them, the numbers won't match.
No, it can't. Suppose one woman sleeps with 100 guys. One woman increased her count by 100, and 100 guys increased their count by 1 each. The average number of heterosexual sex-partners that men and women have had is the same. Do you need me to draw you a diagram?
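If a diagram won't do, a short counting sketch might. Every heterosexual pairing adds one to a man's count and one to a woman's count, so the two totals are always equal. The averages match as well under the assumption (mine, for this example) that the population has equal numbers of men and women. The names and population sizes below are hypothetical.

```python
def average_partners(pairs, men, women):
    """Return (average for men, average for women) given (man, woman) pairs."""
    men_counts = {m: 0 for m in men}
    women_counts = {w: 0 for w in women}
    # Each distinct pairing adds exactly one partner on each side.
    for m, w in set(pairs):
        men_counts[m] += 1
        women_counts[w] += 1
    return (sum(men_counts.values()) / len(men),
            sum(women_counts.values()) / len(women))

# Hypothetical population: 100 men and 100 women (equal sizes assumed).
men = [f"m{i}" for i in range(100)]
women = [f"w{i}" for i in range(100)]

# One woman sleeps with all 100 men; nobody else has any partners.
pairs = [(m, "w0") for m in men]

avg_men, avg_women = average_partners(pairs, men, women)
print(avg_men, avg_women)
```

Her count goes up by 100 and each man's goes up by 1, so both totals are 100 and both averages come out the same. What one prolific person changes is the median, not the mean.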