Randomizing Survey Answers For Accuracy
Saint Aardvark writes: "The New York Times reports that two researchers at IBM have come up with a way to persuade people to give correct answers to survey questions: randomize the results. Strangely enough, they can get accurate information out of the aggregate of enough answers -- but it's completely anonymized. Since conservative estimates say nearly half of all survey answers are bogus, there's an interest in persuading people to be more truthful. As ever, you can use the Random NY Times Registration Generator to falsify your registration details and read the article..."
How does this stop people from being false? (Score:1)
Re:How does this stop people from being false? (Score:1)
Even worse, it's a lot of fun just to screw the poll and prove statistics wrong
Re:How does this stop people from being false? (Score:3, Funny)
Re:How does this stop people from being false? (Score:2, Interesting)
When the answers are given in random order, each cycles into the different spots. The liars are actually cancelling out other liars who used the form before them. The differences in the answers are mostly based on how the truth tellers answered (I say mostly because some liars may have a different way of selecting a lie, such as the longest string offered in the answers), and so you can derive more meaningful statistics from them.
See this
Re:How does this stop people from being false? (Score:2, Informative)
Hrm (Score:2, Funny)
I don't get it. (Score:4, Insightful)
Re:I don't get it. (Score:3, Insightful)
If they don't trust the company, do you really think they'll believe some mumbo-jumbo about "randomizing"?
Fair point. One solution might be to perform the randomization on the client side and display the result. That way the user can see that the answers have been munged before they are sent.
Then again, if all you are interested in is aggregate data, just don't ask for any personally identifying information.
Re:I don't get it. (Score:2, Insightful)
But, again, why would a user bother? People resent being pestered for information. It's minimally more work to lie than to provide accurate information and much more satisfying.
Re:I don't get it. (Score:3, Interesting)
DennyK
Re:I don't get it. (Score:2)
I work in Marketing and part of my job is occasionally analyzing the results of our eval surveys. I think far more people provide false answers out of laziness than out of deliberate lying.
I think this because we use this data (in part) to analyze the effectiveness of our magazine ads in order to budget accordingly. For a long time, one of the magazines, beginning with the letter A, had been getting really good results from the eval survey. So we put more money into ads and articles for that magazine, but didn't notice any increase in evaluation downloads.
Then we changed the order of the magazine dropdown, and suddenly, no one was picking that magazine. (Now we regularly rotate the list.)
Yes, there is quite a bit of false data -- lots of people who work at Foo or Test -- but I think it's mostly people trying to get to the survey quickly and not people trying to protect their privacy.
Re:I don't get it. (Score:2, Interesting)
I find it takes a lot less time to fill in crap data than real data. What really pisses me off is places that correlate the state you select with the zip code. Places like that seem to be deliberately positioning themselves AGAINST me, so I intentionally fill it with erroneous data because they have become my adversary in the case of this page.
Filling in webforms doesn't become an issue of trust until I actually need them to have these data; in which case I try to be careful with who I give my credit card number, but don't care all that much about the rest.
I think the only reason people give out real data when presented with pointless web forms (ala NYT) is that they are unsure if it will operate properly if they enter the wrong information. I assume a goodly percentage of truthful answers come from a demographic that never intentionally fills erroneous answers into web forms; people who aren't very interested in where limitations exist in these computers that they just happen to use.
Re:I don't get it. (Score:3, Insightful)
You seem to have some sort of problem with this, as if they are somehow tricking you. No, it's just a validity check in an attempt to ensure accurate data. What I find interesting is that they would give you an error and ask you to fill in the form again.
Let me explain: let's say you've filled out a 10-question form asking for name, email, age, location, and a few "consumer behavior" questions. If you've done all this accurately, it files your data and lets you proceed. But if you've done it inaccurately (in this case, filled out a ZIP and state that don't match), it kicks you back and makes you correct it. So this time you put in a valid ZIP/state. You submit it, and it files your data away and lets you proceed.
The problem is that your data still isn't accurate, and therefore should be thrown out. Maybe your ZIP/state is correct now, but maybe you just put 90210/CA. A much better solution from a data integrity standpoint is to allow that user to enter junk data, but to not factor in that bad data when drawing conclusions.
I think there needs to be much more research in this area if anybody expects to get good data out of the internet. IBM's studies seem to be a step in the right direction. Not only do they want to improve data integrity for the company, they're also factoring in another important issue: privacy.
Re:I don't get it. (Score:2, Funny)
huh?
Re:I don't get it. (Score:3, Informative)
Frequently, the people that give input simply misread questions... for example 'How many males over the age of 18 in your household INCLUDING YOU' as opposed to 'NOT COUNTING YOURSELF'. Or they make typos. Error checking can fix that frequently. Saying that just because they mis-keyed their zip, the whole dataset is incorrect is not correct.
We've found that the most positive way to get good data is to get people that WANT to tell you their opinions to take the survey. Forcing someone to take the survey for free stuff or to take part in something just doesn't work. Giving them the free stuff then saying "Hey, would you like to give us your opinion" on the other hand, does. The only drawback is that you would assume you're tainting the respondent's opinion. Given the amount of research we've put in, we've actually found the opposite... people say "hey, I've already got my free shit, now I'll tell em how I REALLY feel". I don't see much of a purpose in what IBM has come up with.
Slashdot Poll? (Score:5, Funny)
O Yes
O No
O Cowboy Neal told me the answer
Re:Slashdot Poll? (Score:2)
Re:Slashdot Poll? (Score:2)
The fact is, those who are the "liars" must tick yes anyway, because it's the statement that cannot be true, hence it's a lie.
Then again, anybody who is a liar can also tick NO, since he's lying.
Re:Slashdot Poll? (Score:5, Funny)
Truth is often the most devious of lies.
Re:Slashdot Poll? (Score:2)
Truth is just an excuse for a lack of imagination
Re:Slashdot Poll? (Score:2)
>Truth is often the most devious of lies.
"If I were to ask you what your answer to the question 'will you lie when answering this question' would be, how would you answer?" ;-)
Circumvention (Score:1)
That's just way too wonderful to put into mere words.
This will not affect user behavior (Score:5, Insightful)
Re:This will not affect user behavior (Score:4, Insightful)
All they have to do is stop asking for my name and e-mail address, and I could be truthful about pretty much anything else they'd care to ask.
Re:This will not affect user behavior (Score:2)
Missing the Point (Score:3, Insightful)
Market researchers want information on YOU. They want generic info on your demographics, but this information has been available from other venues for a long time. When spyware and other information gathering techniques are employed against someone, they are being used to collect data to target marketing at that person specifically. Literally employed against that person.
As such, I'll still say that I'm female, in my 50's, from Yemen and making less than $12,000 a year. Randomize away.
Privacy? (Score:1)
My Advice (Score:1)
Re:My Advice (Score:2)
Dr. Ann Cavoukian sounds like she can help you with that too. Maybe that was her plan in the first place.
Of course (Score:2)
Typical session:
What is your age? (Results will be randomized)
23
OK, we're putting down you are 28 based on a random number we picked. Aren't we good to protect your privacy?
(Then behind the scenes the database gets the real age put into it, how will the user ever know?)
Even if the user can view their profile later on, the database can just store their real age + the so-called random modifier, and the user will be none the wiser.
What a pointless "technology".
Re:Of course (Score:1, Insightful)
Not at all, not at all. Like 80% of the stuff these days, it exists merely to get some nice paperwork for the students, after that it will be forgotten. Once they have their Masters/Doctorate in an incredibly narrow field, gotten themselves into debt, given money to textbook makers and given jobs to profs, they will have their paper that will get them a nice nice job, all the while perpetuating the myth of higher education and raising the bar for everyone else.
Hardly pointless, is it? I mean, it's the only way for a modern society to still use capitalism.
Re:Of course (Score:1)
You sir, are my hero.
This is how it would work: (Score:2, Interesting)
1 Age [28] *Will be randomized*
2 Age [56 (Randomized)] *28*
The value 56 gets submitted to the server, not the value 28 - which is my real age ;).
This is auditable because I can inspect the source code which is part of the web-page, and I can even monitor the network packets if I'm really paranoid.
Now I could still lie, or mess with the algorithms in the Javascript, but what would be the point?
80N
Re:This is how it would work: (Score:3, Interesting)
I don't really understand how SSL works, but I trust my browser (a bit) and when I see https in the URL then I'm comfortable with that. Not because I fully understand SSL, but because I listen to the opinions of people who do.
So if it became accepted practice that pressing the Randomize button on your browser (why not build it into the browser) made your response anonymous then nobody needs to understand it any more than they do SSL.
Actually, why not have a new http method: POST-RANDOM instead of POST so the server knows that the data has been randomized.
80N
Trust is the issue... (Score:1)
-CySurflex
5% Error on "reconstructed data" (Score:2)
Re:5% Error on "reconstructed data" (Score:3, Informative)
Re:5% Error on "reconstructed data" (Score:2, Insightful)
However, since the reconstruction error would depend on the number of respondents, which will vary dramatically from site to site, I might also guess the 5% number was rectally extracted, and only used to make a point for the article that it will still be better than the error due to respondents lying, despite not being perfect.
All of this, of course, under the dubious assumption that people will stop lying just because random numbers have been added to their information, as numerous other posts here have discussed...
*...yea..yea..I know, there's no such thing as a perfect random number generator, but those tests you hear about mathematicians running on RNG algorithms are for the truly anal-retentive who are worried about patterns showing up after the 2^64th repetition or whatever. I doubt that even a relatively low-tech random number algorithm would be taxed by this technique.
In reply to "you still have to trust the company" (Score:1)
Re:In reply to "you still have to trust the compan (Score:1)
However, it still doesn't fix the problem that people lie. Even if they know that their privacy is guaranteed, they'll still lie, simply because it's fun--after all, rules are made to be broken.
Re:In reply to "you still have to trust the compan (Score:2)
I'm always a bit skeptical when I'm told I'm about to be surveyed anonymously, and I can't think of a way that this can be implemented (or at least is likely to be implemented) that would reassure me. The non-skeptics are filling in their information already. Perhaps businesses could pick one in five to survey and offer the people who don't want to take it the ability to just skip it; I'll bet a good amount of crap in the databases is coming from people who have to fill in eighty mandatory fields for free e-mail or music or whatever.
optional vs. required (Score:5, Insightful)
For example, company XYZ has released a program called Widget. In order to download Widget, users are asked to fill out a survey so that XYZ may gauge the demographics of their target audience.
Some sites will allow you to bypass this step and proceed to download the software. Other sites require this information before revealing the download link. I think that the psychological difference between "required" and "optional" would heavily influence the honesty of the answers.
I know that I never honestly fill out required forms. I'll fill in a bunch of bogus details, get the link, and be on my way. However, if the form is optional, I may download first and, if I like the program, provide some details to the company. The difference? I'm not being forced to give anything up in advance.
Is this true in general? I don't know. But it makes sense to me.
I have an idea for something to replace the survey forms - an AI program to carry out a conversation with the user. Ah ha! We just have to watch out for users that say to the AI - "I am lying" - and hope the AI doesn't need therapy.
Re:optional vs. required (Score:2)
Type of questions, too (Score:2)
On the other hand, if the company really only requires me to answer questions of demographic importance, such as what country and state/province I am from and my age, I am likely to respond truthfully.
Re:Type of questions, too (Score:2)
Kind of like what dboyles said [slashdot.org]: by allowing the user to skip questions they don't want to answer, the questions they do answer are far more likely to be honest.
Re:optional vs. required (Score:2)
I agree with your theory, but I want to expound on it a little bit.
I don't think many people will be inclined to actually return to the site and voluntarily provide information. However, think about the people who would fill out optional forms in the first place. The demographic probably fits that of the casual internet user. That user is much more likely to provide accurate information - but just as importantly, they're unlikely to provide inaccurate information. So by making a form optional, you've seriously improved the integrity of your data.
Then, a company can look at that (supposedly very good) data and make assumptions about the users. However, they must be careful to not assume that the data is a full picture just because it is not inaccurate (I'm purposely not using the word "accurate"). In other words, if 40% of the respondents indicate that they like Murder She Wrote, you can't assume that that extrapolates to 40% of your user base. Instead, the company must associate that data only with the respondents. But since they have very accurate information about their respondents, they can assume that their conclusions are equally accurate.
So the question arises, "What about the non-respondents?" That's true, the company doesn't have accurate information about them. But what's better, good information about a small group, or bad information about a large one?
Re:optional vs. required (Score:2)
That's so backwards though. There is a difference between a Survey and a Census. Asking every single person that comes to your site what they think is a Census. Yes, that's obviously the best way, but not the most cost effective.
A Survey is talking to a percentage of your user base, and extrapolating the data. If done properly, you can interview a random group of around 15% of your user base and be statistically 95% accurate. Thus far, it's the best we have, if you discount cheating and poor data collection practices.
My point is, if you do your survey correctly, if 40% of your respondents indicate that they like Murder She Wrote, it's safe to say that 40% of your user base also does, plus or minus a small percentage. That's the whole point of statistics.
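As a rough illustration of the "plus or minus a small percentage" above, here is a minimal sketch of the textbook margin-of-error formula for a simple random sample. It assumes the sample really is random and representative, which is exactly what the replies dispute for optional online polls. The function name and the default z value of 1.96 (for roughly 95% confidence) are illustrative choices, not anything from the article.

```python
import math


def margin_of_error(sample_size, proportion=0.5, z=1.96, population=None):
    """Approximate margin of error for a proportion from a simple random sample.

    Assumes a normal approximation; z=1.96 corresponds to ~95% confidence.
    """
    std_error = math.sqrt(proportion * (1 - proportion) / sample_size)
    if population is not None:
        # Finite population correction: sampling a large share of a small
        # user base shrinks the error.
        std_error *= math.sqrt((population - sample_size) / (population - 1))
    return z * std_error


# 40% of 1,000 randomly chosen respondents like the show:
print(round(margin_of_error(1000, 0.4), 3))
```

Under these assumptions the error depends mostly on the absolute number of respondents, not on what fraction of the user base they make up.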
Re:optional vs. required (Score:2)
That's assuming that the respondents represent an accurate model of your population. My argument is that that's not the case in optional, online polling. Maybe 40% of the respondents like Murder She Wrote, but maybe 70% of respondents were between the ages of 50 and 70.
Re:optional vs. required (Score:2)
Basically, if they don't get a representative sample within reason, it's due to a poorly administered survey, not the seemingly arbitrary nature of their polling.
Re:optional vs. required (Score:3, Funny)
The New York Times thinks I'm a 146 year-old lady who makes less than $10,000 a year, has 3 children in high-school, and enjoys golf and motorsports in her spare time.
old but useful (Score:1)
Does this increase trust? (Score:5, Funny)
Personally, if I don't trust them enough to tell them how much I make, I'm not going to trust them to randomize my results. I don't see how this will increase accuracy -- especially if I keep telling everyone I'm a 108 year old female in Uganda making $100,000+ per year who works in the sales department of an Educational field and plans to make purchases of an SUV, a house, a console gaming system, and an optical mouse in the next six months and rates their internet experience as very low. My e-mail address is sjobs@mac.com and I would like to apply for your quarterly, monthly, weekly, daily, and hourly newsletters and I do give permission to pass this information to your affiliates.
Grandma? (Score:2)
Beef jerky?
LOL oh this is rich (Score:1)
That's just stupid (Score:4, Insightful)
Let me summarize:
1) People lie on surveys, most likely because they don't trust the taker - but probably also just because they like putting in other answers (yeah, I'm a millionaire, woohoo!, etc). This only addresses the trust issue, ignoring other potential sources of lying.
2) In order to work around the trust issue, they've developed a method of injecting random noise into the original answers as they are recorded and then extracting useful data in the end.
Notice their technology doesn't do anything to fix the underlying problem. The hope is that users will understand and trust the backend randomizer system, and that based on this trust they will answer more truthfully.
Without bothering with all this mumbo-jumbo, I can build a trustworthy system. I simply record survey statistics, and I promise not to use the individuals' personal data individually.
They can either trust me that I'm telling the truth about this, or they can lie. In the IBM researchers' scenario, the users are again asked to trust that the backend system doesn't compromise them, and again they can choose to trust it or choose to lie.
Given the above, why on earth would you bother with this research and unnecessary complexity? It's not going to make any difference over just promising your users that you don't invade their privacy. You could replace their research results with a banner on top of the survey that says "After you submit your data to us, we use Magical HibiJibi technology to prevent ourselves from invading your privacy, so please trust us and answer truthfully"
What a waste of research.
Re:That's just stupid (Score:1)
Re:That's just stupid (Score:2)
Yes, but surveys are targeted at and only work with masses of people, the common plural man. From this person's perspective, it doesn't matter that the randomization technically happened in their PC in a Java applet or JavaScript code. In either case they're entering personal data and trusting the company to not abuse them.
Re:That's just stupid (Score:2)
Re:That's just stupid (Score:1)
So if ppl will still lie, the accuracy won't change.... you can't have good accuracy with "wrong" data....
Re:That's just stupid (Score:2)
No, you're still stupid.
Surveys survey the masses, which means computer illiterate people. It doesn't matter to them the ins and outs of where the randomization happens. If they have to put their real data into a form, they will assume the company asking the survey can get it if they want it. Telling them "don't worry, a client-side javascript function is randomizing this before it gets sent in" doesn't reassure them any more than the HibiJibi statement.
Well , good read. (Score:1)
Like Benjamin Disraeli once said ... (Score:1)
Compared with Slashdot polls (Score:2, Interesting)
kind of like... (Score:1)
I can't remember where I read this. If someone has a link could you please post it?
How about... (Score:2)
Create a form with attached Javascript. You enter the real data and hit the "obfuscate" button. The script then locally adds noise to your answers. At this point, the "obfuscate" button turns into "submit", allowing you to send the visibly obfuscated responses to the remote site.
Of course, you'll probably want to read the source to make sure the real answers are not sent along with the obfuscated ones. Still, this scheme would go a ways toward creating the perception of honesty.
User Interface and Implementation (Score:4, Insightful)
Explaining the whole randomization process (how it protects privacy, how it provides useful info) will be a little much for most people I think, but a good user interface might alleviate this, perhaps with a 'randomize' button that is used before hitting the 'submit' button. This would take the user input and change it right in front of their eyes. Of course many would be rightfully concerned that the randomize button is just for show (or simply encodes but doesn't anonymize), but I think that enough people might buy into the false sense of security that demonstrated 'randomization' provides to at least partly improve the % of bona fide results. Also, the system could be set up so users who don't mind submitting traceable information could be encouraged ("extra 10% off") to submit without randomization, with a simple flag sorting data into randomized/anonymous and non-randomized/non-anonymous data.
This approach would be even better if the randomization approach becomes a ubiquitous standard backed by a consistent and legally accountable and well-known entity/brand (IBM for instance). I'm not sure how well an open solution would work unless there was a central group assuming responsibility and accountability for the system, enforcing trademarks, and suing spoofers. Also, people feel safer when they feel there's someone to blame for any abuse/mistakes (hence, giving their credit card freely to a waiter but not to a website).
Old trick (Score:4, Informative)
My college stats professor 10 years ago explained a simpler trick that puts control in the respondent's hands. It went something like this:
With each question, the respondent flips a coin and looks at the second hand of a clock. Only the respondent can see the coin or the clock.
If the second hand is between 1-30 seconds, they answer per the coin (e.g. heads=yes). If it's between 31-60, they tell the truth.
The surveyor knows very precisely the expected number of 'lies' and can extract accurate data, and the respondent has confidence and control over their privacy. All without a transistor.
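A minimal sketch of how the surveyor could back out the true rate under the coin-and-clock scheme, assuming a fair coin and a clock reading that falls in each half-minute equally often. The function names and the simulation are illustrative, not from the professor's original description.

```python
import random


def respond(truth: bool, rng: random.Random) -> bool:
    """One answer under the coin-and-clock scheme (fair coin assumed)."""
    coin_is_heads = rng.random() < 0.5
    second_hand = rng.randint(1, 60)
    if second_hand <= 30:
        return coin_is_heads  # answer per the coin (heads = yes)
    return truth              # otherwise tell the truth


def estimate_true_yes_rate(observed_yes_rate: float) -> float:
    # P(yes) = 0.5 * 0.5 + 0.5 * p   =>   p = 2 * P(yes) - 0.5
    return 2.0 * observed_yes_rate - 0.5


def simulate(true_rate: float, n: int, seed: int = 0) -> float:
    """Simulate n respondents and return the surveyor's estimate."""
    rng = random.Random(seed)
    answers = [respond(rng.random() < true_rate, rng) for _ in range(n)]
    return estimate_true_yes_rate(sum(answers) / n)


print(simulate(true_rate=0.2, n=200_000))
```

Half of the answers are pure noise, so the estimate is noisier than a direct survey of the same size; that is the price of the privacy.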
Re:Old trick (Score:3, Insightful)
Useful in theory? Very. Useful in practice? Not so much.
Re:Old trick (Score:3, Interesting)
A variation on this is to give the respondent a die (i.e., half a pair of dice), tell them to pick a number between one and six, and every time they roll that number, intentionally give a false answer on the survey. Thus, looking at any individual survey response, you don't know whether it's true or false, but you can factor in the 16.7% false responses into the statistical analysis.
Sure, that can be computerized, but as someone above pointed out, how does the respondent know he can trust it? The above old technique is entirely under the respondent's control.
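For the die variant, "factoring in the 16.7% false responses" could look like the sketch below. It assumes a yes/no question, a fair die, and respondents who follow the rule exactly; the function name is hypothetical.

```python
def estimate_true_yes_rate(observed_yes_rate: float, lie_prob: float = 1 / 6) -> float:
    """Invert the known lie rate to recover the true proportion of 'yes'.

    P(yes) = (1 - q) * p + q * (1 - p), with q = lie_prob
    =>  p = (P(yes) - q) / (1 - 2 * q)
    """
    if lie_prob == 0.5:
        # With a 50% lie rate every answer is a coin flip: nothing to recover.
        raise ValueError("a 50% lie rate carries no information")
    return (observed_yes_rate - lie_prob) / (1 - 2 * lie_prob)


# If 30% truly would say yes, the surveyor expects to observe:
observed = (5 / 6) * 0.3 + (1 / 6) * 0.7
print(estimate_true_yes_rate(observed))
```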
Drug abuse surveys etc (Score:2)
ISTR it's in Tanner's book on Gibbs sampling, as a method used to extract accurate population estimates about embarrassing, personal or even incriminating subjects, such as past exposure to STDs, sexual orientation, or the use of particular controlled drugs.
Of course, your survey has to be big enough so that the expected number of true positives (N.p) stands out above the expected uncertainty in the number of false positives, approx sqrt(N.p'.(1-p')). If p is small, N may have to be really quite big.
Re:Old trick (Score:2)
1) Have you ever shoplifted?
2) Do you have any siblings? (Or some other innocuous question.)
You tell the person "Roll a die (or just mentally choose a random number between 1 and 6). If you get a 5 or 6, answer question 1. Otherwise, answer question 2." You can use the fact that there is a 1/3 chance of answering question 1, together with Bayes' Theorem, to figure out the percentage of people who said yes to question 1. People feel more confident about answering honestly, because the experiment is simple enough that most people believe the researcher doesn't know which question they answered (although some people will still be suspicious, of course).
Note: if you have them mentally choose a number between 1 and 6, you first need to do another experiment to find the percentage of people who choose 5 or 6, since it probably is not 1/3.
I read a nice little article on this subject a while back called "How to ask sensitive questions without getting punched in the nose", I believe it was in volume 3 of a series called Modules in Applied Mathematics, but I don't have it handy on my shelf. But it's a very well-known example in statistics, I believe it's called a randomized response design.
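A sketch of the arithmetic for the two-question design above. It assumes the yes-rate for the innocuous question is already known from some other source (the comment does not say how it is obtained) and that the sensitive question is chosen with probability 1/3. The names are illustrative.

```python
def estimate_sensitive_yes_rate(
    observed_yes_rate: float,
    innocuous_yes_rate: float,
    prob_sensitive: float = 1 / 3,
) -> float:
    """Recover the yes-rate for the sensitive question (question 1).

    Law of total probability, with s = prob_sensitive:
        P(yes) = s * p1 + (1 - s) * p2
    =>  p1 = (P(yes) - (1 - s) * p2) / s
    """
    s = prob_sensitive
    return (observed_yes_rate - (1 - s) * innocuous_yes_rate) / s


# Example: 80% of people have siblings (assumed known), 10% have shoplifted.
observed = (1 / 3) * 0.10 + (2 / 3) * 0.80
print(estimate_sensitive_yes_rate(observed, innocuous_yes_rate=0.80))
```

If respondents pick the number mentally, prob_sensitive would be replaced by the measured share who pick 5 or 6, as the note above says.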
Re:Any statisticians in da 'ouse? (Score:2)
Reminds me of... (Score:2)
I'm innocent! (Score:3, Funny)
NYT Random Login Generator (Score:5, Informative)
I'd also like to remind anyone who wants to that they can download, copy, and mirror the source of that page on their own servers, or even keep it as an HTML page on their desktop or whatever. It's just javascript, so it's portable, and that way you'll still be able to use it when the NYT lawyers finally get around to noticing it or they start blocking requests from my page or something. (It will also help distribute my load, though I haven't had any real trouble yet...)
Re:NYT Random Login Generator (Score:2)
Re:NYT Random Login Generator (Score:4, Insightful)
Too bad there's no "Skip this crap" option in their registration screen, huh?
So, the only way to not give them your life story is to lie. I know! Let's make it easy and create a random login generator so I don't have to type more random crap on every computer I use!
And, BTW, if you think I'm paranoid, I'll let you know that I was able to make any changes I wanted [but only did what I asked, of course] to my grandmother's phone line by simply asking her age and full name -- ALL of which are sent to NYT on that page. They only asked to hear a lady's voice, which my mother happily provided. Armed with just a birthdate and name I can make all sorts of changes to your services -- anonymously.
Knowing that, do you want to give me your name and address? If you don't, you should know there's no reason why I'm not working at the NYT right now... I will tell you that where I do work I have access to many, many, many records including Full Names and Birthdates. Feeling uneasy yet? Well, if you trust me, I've never abused those privileges.
>When they change their registration process and perhaps charging for their online content, don't start bitching.
My only bitching will be the fact their site goes offline for everyone. You can't compete in a (literally) Free market by charging infinitely more than your competitors. With the amount of newspapers online right now, and the amount of good content that doesn't come from the NYT, I think they'll end up another salon.
Re:NYT Random Login Generator (Score:2)
Then I decided to make it part of my evil plot to take over the world by using it as my signature on slashdot (you have to start somewhere, you know).
It was funny to see that other people noticed it and started to use it as well.
Now that the first step of my mission is completed, I will have to think of another signature.
None of your business...... (Score:2, Insightful)
Why is it that online businesses feel they have the right to try and force so much personal information out of us? In brick 'n mortar stores, the worst info anyone asks me for is my zip code (or age to purchase alcohol). They can get my name if I use my credit card, but I can easily pay cash to avoid that.
It's very ironic that NYTimes would run this story.... Why do they expect me to tell them where I live, work, and what I make, just to read their articles? The paper version is nowhere near this invasive.
Not an entirely new idea (Score:2, Informative)
It seems that what the IBM folks are doing is a straightforward extension of this idea to a larger response domain (numerical ages as opposed to boolean questions) and to a more automated system in which the website flips the coin for the subject and amends his answer accordingly.
Data is already "randomized". (Score:2)
Or hadn't they thought of that?
Re:Data is already "randomized". (Score:2)
I bet if they applied their decommutation algorithm to the existing data, they'd wet their pants.
--Blair
Re:Data is already "randomized". (Score:2)
I question a basic assumption (Score:2, Interesting)
It was an interesting article, and I can see how this technique will work when the surveyors have the goodwill of the respondents, so that any respondent's primary concern is only that of keeping his individual privacy.
But is privacy the core issue in market research, or is it simply a label of convenience that a lot of people use for something else that we don't have easy words for? I will lie on many surveys even when I am fully confident of my personal anonymity-- though I prefer to avoid those surveys entirely when I can. OTOH, when a survey is done by a group that I have aligned myself with, I might well enthusiastically bare my soul without any regard to the privacy issue. And I know that I am not at all uncommon in these respects.
I suspect that my reactions stem from the same source as nationalism, patriotism, ethnic pride, and that whole mess of things where I'm not behaving as an individual protecting my privacy, but as a member of a group who feels called upon to defend my group.
Mostly I see marketing as an attempt by outsiders to mess with my group, to get us to buy stuff through conning us rather than letting us apply our own standards of value to the goods offered. I think I lie on surveys to protect my group from these subtle attacks; to misdirect and confound my group's enemy.
So I really don't think privacy has much to do with it. I think all this lying is a natural group reaction to consumerism, and its belief that it is perfectly okay to sell product by conning your customers into thinking that what you are pushing today is something they want.
Not in my group, buster. We don't need no steenkeeng pushers in our neighborhood.
They need to let the randomization be client-side (Score:2)
That way, the browser tells you that your entry will be randomized to tell the site your age +-30 years, or give your actual gender 20% more frequently. Based on the numbers the site is using, you can decide whether to answer accurately, knowing just how hard it would be to track you based on this information. The web site would then be able to remove the noise from the aggregate data, and have a confidence based on the distribution they ask for (aside from people who think the margin is too small and lie).
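A minimal sketch of the "remove the noise from the aggregate" step for the age example. It assumes the client adds uniform integer noise of up to +-30 years, which averages to zero, and it only recovers the mean; reconstructing the full age distribution, as the article describes, takes more work. The function names are hypothetical.

```python
import random


def randomize_age(age: int, spread: int, rng: random.Random) -> int:
    """Client side: add uniform integer noise in [-spread, +spread]."""
    return age + rng.randint(-spread, spread)


def estimate_mean_age(noisy_ages) -> float:
    """Server side: the noise is symmetric around zero, so its expected
    contribution to the mean is zero and the plain average is unbiased."""
    return sum(noisy_ages) / len(noisy_ages)


rng = random.Random(42)
true_ages = [rng.randint(18, 70) for _ in range(20_000)]
submitted = [randomize_age(a, 30, rng) for a in true_ages]

print(sum(true_ages) / len(true_ages))   # what the site never sees
print(estimate_mean_age(submitted))      # what it can still estimate
```

Any single submitted value can be off by up to 30 years, yet with enough respondents the two averages land close together.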
Re:They need to let the randomization be client-si (Score:2)
Why I don't trust survey takers (Score:3, Informative)
A couple of years ago, I received a survey in the mail that said the results would be kept completely confidential and anonymous. I thought it was odd that there was a mysterious seven-digit number in one corner, but anyway, I said to heck with it and pitched it. A week later I got a follow-up letter noting that I hadn't sent in my survey yet! Some anonymity!
Incidentally, this is not the only time I've gotten "anonymous, confidential" surveys with mysterious multi-digit numbers. In at least one case, it was at a big company and the survey involved things that nobody in their right mind would want their bosses to know about... and there were mysterious multi-digit numbers on the forms and, indeed, checking with colleagues confirmed that the numbers were different on each of our forms. Naturally, we all put down safe, inaccurate answers.
neat technique first published by Robyn Dawes (Score:2)
Asking people right out "Hey, did you have unprotected anal sex on your casual encounter?" was found to be not a particularly good way to elicit truthful answers. So what you do is give people a fair coin (or the equivalent) and have them flip the coin for each question. If the coin lands heads, they answer "yes". If the coin lands tails, they answer *truthfully*. Looking over an answer sheet, you have no idea which "yes" answers are real and which are not, and subjects did feel like nobody really could "get" any personal information off their answer sheet. In the statistical aggregate, however, you could get perfectly useful average rates for a given population. (Basically, you just adjust for the "yes answer background".)
A great idea, but its use in a wide-range study of this type was axed, I believe, when the study itself was blasted by certain members of congress...but that's another story.
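To make the "adjust for the yes answer background" step concrete, here is a small simulation sketch. It assumes a fair coin, so the expected share of "yes" answers is 0.5 + 0.5 * (true rate). The function names and the 10% true rate are invented for illustration.

```python
import random

def forced_yes_answer(truth: bool, rng: random.Random) -> bool:
    """Heads: answer "yes" regardless. Tails: answer truthfully."""
    heads = rng.random() < 0.5
    return True if heads else truth

def estimate_true_rate(answers: list[bool]) -> float:
    """Back out the true "yes" rate from forced-yes answers.

    Expected observed rate = 0.5 + 0.5 * true_rate, so
    true_rate = 2 * observed_rate - 1.
    """
    observed = sum(answers) / len(answers)
    return 2 * observed - 1

rng = random.Random(42)
true_rate = 0.10  # hypothetical share of people whose honest answer is "yes"
population = [rng.random() < true_rate for _ in range(100_000)]
answers = [forced_yes_answer(truth, rng) for truth in population]

estimate = estimate_true_rate(answers)
print(f"estimated true rate: {estimate:.3f}")
```

Roughly half the sheet is forced "yes" noise, which is why the sample has to be large before the estimate settles down.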
Completely missing the point (Score:2)
The forms are giant time-wasters.
If the folks giving these surveys would stick to EXACTLY what they NEED to know, we wouldn't balk at filling them out properly- especially since personal data is one thing they generally do NOT need to know for marketing!
Forget the name, address, interests (the BIGGEST time waster of all.) Generally, the most important information that you can get from site visitors is:
1) Zip code. This tells you the geographic area that your visitors are coming from. Useful for location-relevant information, but completely impersonal.
2) Age range. This is really the prime info that marketers want, as so much of their "science" is based on generational observation. Again, totally impersonal.
3) How you heard about the site. This is the most important thing you can learn from your visitors, as it gives you some information on which advertisements are performing!
If every site I signed up to asked me these three questions and these three questions ONLY, I'd answer them all truthfully. As it stands, I have to dig through a mountain of shit, and these days I generally just throw the shovel at the pile and move on.
So far, people seem to have missed the point... (Score:2, Insightful)
The idea of randomising answers is not new. It has been used in 'socially sensitive' surveys for years, if not decades.
Simple explanation:
Have a survey of 10 questions people don't like to reveal the truth of, each with a yes/no answer.
For each question, either
a) reply truthfully
b) flip a coin and record whatever the coin gives.
If challenged about your answer, you can always say that's the answer the coin required.
Analyse the results for a large population of completed surveys. Any significant deviation from 50% yes and 50% no answers tells you which way the population answered, without revealing who actually holds those views.
All you need is a coin to randomise your answers. This is independent of any web form, doctored answer sheet etc etc - so particular answers cannot be pinned on you.
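A sketch of the analysis step in Python. The description above doesn't say how a respondent picks between (a) and (b), so this assumes a first private coin flip decides it (truthful half the time, coin answer the other half). That assumption, the names, and the 30% true rate are mine, not part of the method as stated.

```python
import random

def randomized_answer(truth: bool, rng: random.Random, p_truth: float = 0.5) -> bool:
    """Answer truthfully with probability p_truth, otherwise record a coin flip."""
    if rng.random() < p_truth:
        return truth
    return rng.random() < 0.5

def estimate_yes_rate(observed_yes_rate: float, p_truth: float = 0.5) -> float:
    """Invert observed = p_truth * true_rate + (1 - p_truth) * 0.5."""
    return (observed_yes_rate - (1 - p_truth) * 0.5) / p_truth

rng = random.Random(1)
true_rate = 0.30  # hypothetical share of honest "yes" answers
population = [rng.random() < true_rate for _ in range(100_000)]
answers = [randomized_answer(truth, rng) for truth in population]

observed = sum(answers) / len(answers)
estimate = estimate_yes_rate(observed)
print(f"observed yes rate:   {observed:.3f}")
print(f"estimated true rate: {estimate:.3f}")
```

With these assumptions an observed 50% means the population really is split 50/50, and every point of deviation from 50% corresponds to two points in the true rate.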
It's fun administering the same survey to people with and without the randomisation - you get to see what people in general lie about!
Hope this gives a useful summary of the method.
Regards,
pgrb
not a new idea (Score:2)
I remember reading about something similar to this in a psychology class in 1988 or so. The idea was for people doing a door-to-door survey asking about things like sexual behavior. There are important public health reasons to have the data, but also strong reluctance to give honest answers.
What they did was give the person being polled a spinner, like from a board game. (Remember those, oh you young /.ers? Maybe not...) It was divided in two parts, 2/3 would say "yes" and 1/3 would be "no". The questioner would ask if the person's answer to some yes/no question matched that shown on the spinner (which the questioner couldn't see). You couldn't know what any single person's answer was, but you could do the math and get how many had done what.
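Here is what "do the math" might look like, as a sketch. It assumes the spinner lands on "yes" with probability 2/3 and that everyone reports honestly whether their answer matches it. Then P(match) = (2/3) * rate + (1/3) * (1 - rate), which can be solved for the rate. The names and the 20% true rate are illustrative only.

```python
import random

def spinner_report(truth: bool, rng: random.Random, p_spinner_yes: float = 2 / 3) -> bool:
    """Report whether the respondent's true answer matches the hidden spinner."""
    spinner_says_yes = rng.random() < p_spinner_yes
    return truth == spinner_says_yes

def estimate_yes_rate(match_rate: float, p_spinner_yes: float = 2 / 3) -> float:
    """Invert match = p * rate + (1 - p) * (1 - rate), where p = p_spinner_yes."""
    return (match_rate - (1 - p_spinner_yes)) / (2 * p_spinner_yes - 1)

rng = random.Random(3)
true_rate = 0.20  # hypothetical share whose honest answer is "yes"
population = [rng.random() < true_rate for _ in range(100_000)]
reports = [spinner_report(truth, rng) for truth in population]

match_rate = sum(reports) / len(reports)
estimate = estimate_yes_rate(match_rate)
print(f"match rate:          {match_rate:.3f}")
print(f"estimated true rate: {estimate:.3f}")
```

Note the spinner must not be split 50/50: the formula divides by (2p - 1), and an even split would make every match rate exactly 50% no matter what people had done.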
People lie because corporations lie (Score:2)
I used to work for a company whose customers had to provide accurate information in order to sign up -- the service wouldn't work with false info -- but the problem was getting people to sign up.
One of the main selling points was that customer data was completely secure: no one will ever be able to read your data, only an aggregate report of all our users. The company went to a lot of trouble to make this point convincing, going so far as to suggest that users had legal protections against abuse. There were people in the building who spent all their time trying to think of ways to convince more people to drop their defenses so we could exploit their information -- cold, calculating, 24-7, like WOPR spends all its time playing World War Three.
I believed their claims until the day I saw a user's sensitive data on an engineer's screen. And then that engineer showed me another user's data, and another. "We've always had the ability to do this," he said, "for, ahem, quality control purposes."
If a company tells you it isn't collecting the valuable data you provide, you need to assume it is lying (unless you can personally verify the claim or you are positive that the law protects you against abuse).
People are "lying" because corporations lie, as a matter of policy. This will never change because lies are more profitable than truth. Only corporations don't call their behavior "lying," they call it "marketing." So when I fill out an intrusive form with false information, I don't consider it lying either. I call it "standing up for my right to privacy." This system of "marketing" versus "standing up for my rights" is well-balanced, but this new masking technology is simply a marketing attempt to tip the scales in the corporations' favor by tricking consumers into volunteering information on false assumptions.
Randomizing for Accuracy (Score:4, Funny)
Real easy way to get honest answers (Score:2)
Going back to the example of the exit poll, if all you're going to do is try to make money by predicting who will win an election, it's much more satisfying for the voter to lie and watch them squirm when they get it wrong. Why should we tell the truth?
Random function (Score:2)
It is amazing how people justify lying. (Score:2)
The company is providing you with a product, often for free, and they request that for you to use their product you give them a little personal information. It is their product, so they get to make the rules. Your choices are to give what they want and take what you want, or you could just live without it. I don't understand the position of taking what you want and not leaving what they want.
Or you consider this tiny piece of personal information part of the price. Instead of giving them $5, you give them your age, salary, and email address. You don't try to trick the grocery store clerk when you think the bill was too high for what you bought, do you? Why would this be any different? If you don't like the invasion of privacy, then the cost is too high for you and you don't take the product.
I can see where people may say that capital is a required part of making the product and personal information isn't. Since they don't need my email address then I should feel free to not give it to them. However, this personal information often does translate into capital for them (the goal of business is to make money most of the time). Besides, that isn't your decision to make. The company wants your privates and they are giving you the product, so their desire carries more weight. If you were not receiving something back in return, then their desires would not override yours.
Saying it is only a "little" lie doesn't change the fundamental point that lying is a priori wrong.
Am I the only one that finds it ironic... (Score:2)
Helluva page btw, majcher. Thanks
Re:Mirror (Score:1)
Re:Mirror (Score:2)
First, I found this funny:
Programs like this one could lead to greater truthfulness in the answers people volunteer on the Web, she said, provided that they were willing to replace some of their native caution with a bit of good will toward a company and its need for data-mining.
Yes, they *need* to make even more money off your data.
Second, anyone find it interesting that they assume a distribution and then work towards it:
"When people lie randomly -- and that is what they do now when they answer questions -- we get very poor results," he said. But by "adding random values to true values," he said, "we can reconstruct a distribution that is very close to the actual one."
Using this information, Dr. Srikant said, the researchers make a first guess at what the true distribution should be. Then the program crunches through the analysis and produces a slightly better guess. This guess is crunched again, and the process is repeated over and over again, getting closer and closer to the actual distribution.
My guess is that they hope people don't truly lie randomly, and then use their random additions or subtractions to bring people closer to the actual distribution, i.e. I may say I make $0 or $50,000 (or whatever the low/high end is), but not pick a value just one step away from my real income. They are hoping that people, as a group, behave predictably even when any one individual doesn't. Which, if my org behavior prof is to be believed, is generally the case, and is the way people can shape others' responses and behaviors.
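For the curious, the guess-and-refine loop in the quoted passage can be sketched as a simple iterative Bayesian update. This is not IBM's actual code. It assumes whole-number values, noise drawn uniformly from a known range, and a known set of possible true values; every name and number below is made up for illustration.

```python
import random
from collections import Counter

def reconstruct(observed, support, noise_range, iterations=30):
    """Iteratively estimate the distribution of true values from noisy ones.

    Assumes observed = true + integer noise drawn uniformly from
    [-noise_range, +noise_range], with every true value inside `support`.
    """
    counts = Counter(observed)
    n = len(observed)
    # First guess: every value in the support is equally likely.
    estimate = {a: 1.0 / len(support) for a in support}
    for _ in range(iterations):
        updated = {a: 0.0 for a in support}
        for w, count in counts.items():
            # True values that could have produced w under the noise model.
            candidates = [a for a in support if abs(w - a) <= noise_range]
            total = sum(estimate[a] for a in candidates)
            if total == 0.0:
                continue  # observation unexplained by the current guess
            for a in candidates:
                # Share of observation w credited to value a; the uniform
                # noise density is the same for all candidates and cancels.
                updated[a] += count * estimate[a] / total
        # Slightly better guess, fed back into the next pass.
        estimate = {a: updated[a] / n for a in support}
    return estimate

random.seed(0)
support = range(18, 91)                                     # assumed possible ages
true_ages = [random.randint(30, 50) for _ in range(4000)]   # hidden from the site
noisy_ages = [age + random.randint(-15, 15) for age in true_ages]

estimate = reconstruct(noisy_ages, support, noise_range=15)
true_mean = sum(true_ages) / len(true_ages)
estimated_mean = sum(a * p for a, p in estimate.items())
print(f"true mean: {true_mean:.2f}, reconstructed mean: {estimated_mean:.2f}")
```

Each pass asks "given my current guess, which true values most plausibly produced each noisy report?" and reweights accordingly. It only works if people report their real value plus the known noise, which is exactly the assumption questioned above.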
Interestingly enough, randomization is a useful tool in surveys. If you are asking about very private information that people may lie about if they fear the answer will be leaked, you can tell them to flip a coin: heads, ask them to answer truthfully; tails, put down no (for a yes/no survey). With a large enough sample, you can back out the real results based on the 50/50 results of the coin toss, without knowing how anyone actually answered.
Of course, companies should probably ask themselves how many Josef Stalins live in Moscow, Idaho and were born on Oct 24, 1917?
Re:Dr.... (Score:2)
"hello mr. smith, you have a 1pm appointment with dr. 'ka-voa kee-an' today."
"no thanks, I feel better."
"nonsense- the doctor just got the death machine warmed up!"
(and yes, I know it's not the same guy)