

## tepples's Journal: Correlation and Causation

Journal by tepples
tepples wrote:

Correlation implies 25% likelihood of causation. Either A causes B, B causes A, C causes A and B, or chance.
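The "C causes A and B" case is easy to underestimate, so here is a minimal Python simulation of it. The model is purely illustrative and assumed, not taken from any study: a hypothetical hidden variable C feeds both A and B, each with unit-variance Gaussian noise, and A and B have no direct link. Under those made-up coefficients, theory predicts a correlation of 0.5 between A and B even though neither causes the other.

```python
import random


def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def simulate(n=20_000, seed=1, shared=True):
    """Hypothetical model: C drives A, and also drives B only if shared=True.
    A never influences B, and B never influences A."""
    rng = random.Random(seed)
    a, b = [], []
    for _ in range(n):
        c = rng.gauss(0, 1)                                   # hidden third cause C
        a.append(c + rng.gauss(0, 1))                         # A = C + noise
        b.append((c if shared else 0.0) + rng.gauss(0, 1))    # B = C + noise, or pure noise
    return a, b


r_common = pearson(*simulate(shared=True))   # theory: Cov/sqrt(Var*Var) = 1/sqrt(2*2) = 0.5
r_none = pearson(*simulate(shared=False))    # theory: 0
print(f"common cause: r = {r_common:.2f}; no shared cause: r = {r_none:.2f}")
```

The correlation appears or vanishes with the shared cause alone, which is exactly why the third possibility has to be ruled out by further observation rather than by assumption.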

In this post, Immerman wrote:

I *hate* seeing statistics abused. A 25% likelihood of causation is *not* implied. Yes, one of the four outcomes must be the case, but you don't know the relative probabilities of each. It's like grabbing a marble out of a bag containing red, green, blue, and yellow marbles: there are only four possibilities as to which color your marble is, but for all you know I filled the bag with blue marbles and just threw in a handful of the other colors, in which case it would be preposterous to claim a 25% chance of getting a red one.
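Immerman's marble point can be checked numerically. The sketch below assumes a hypothetical bag of 97 blue marbles plus one each of red, green, and yellow (numbers chosen only for illustration) and estimates how often each color comes out when drawing with replacement.

```python
import random
from collections import Counter

# Hypothetical bag in the spirit of the example: mostly blue, plus a few others.
BAG = ["blue"] * 97 + ["red", "green", "yellow"]


def draw_frequencies(bag, draws=100_000, seed=0):
    """Estimate each color's probability by drawing with replacement."""
    rng = random.Random(seed)
    counts = Counter(rng.choice(bag) for _ in range(draws))
    return {color: counts[color] / draws for color in sorted(set(bag))}


freqs = draw_frequencies(BAG)
print(len(freqs))      # 4 possible outcomes...
print(freqs["red"])    # ...yet red comes out roughly 1% of the time, not 25%
```

Counting the possible outcomes says nothing about their probabilities; only knowledge of the bag (or many draws from it) does.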

I'm aware of the hyperbole in my illustration. They're probably not equally probable, but absent other evidence, one has to assume so. My point is that just because the probability isn't 100 percent doesn't mean it can always be treated as 0 percent. So if you want to plead false cause more effectively, explain why they're not equally probable. Be willing to discuss what further observations would be needed to show which of the four possibilities is most likely. But don't say "correlation does not imply causation" as if it were "correlation implies lack of causation" without providing evidence, as that's close to the fallacy fallacy and the black or white fallacy.

This discussion has been automatically archived. Discussion continues in Daniel Dvorkin's journal.


## Correlation and Causation

• #### Still wrong. (Score:1)

C is an infinite class of possible "third-causes". Therefore there are infinitely more than 4 possible outcomes.
• #### Four infinities (Score:2)

And there are infinite classes of sets of intermediate steps through which A causes B or B causes A. There are also infinite chance mechanisms. Therefore, all four kinds of outcome are still infinite. I could be wrong about their being equally infinite, however, if one is countable [wikipedia.org] and the other not, like integers and reals [wikipedia.org].
• #### Re: (Score:2)

I'm afraid you cannot divide one infinity by another and get back a fraction, not ever. Read up on Hilbert's Hotel for an intuitive exploration of why this is so.

• #### Re: (Score:2)

I'm afraid you cannot divide one infinity by another and get back a fraction, not ever.

There is a bijection between rational numbers in lowest terms and the positive integers. Therefore, they are equally infinite. Calling the cardinalities equal in the sense that their ratio is 1 is hyperbole, as I have already admitted. But to more directly address the point: How else would you recommend colorfully expressing "just because causation hasn't already been proved doesn't necessarily mean we should drop the investigation of causation"?
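The bijection between positive rationals in lowest terms and positive integers can be made concrete. The sketch below uses the Calkin-Wilf sequence (computed with Newman's recurrence), a standard construction that the comment itself does not name, to list every positive rational exactly once; position n maps to the n-th term. `rational_to_index` is a hypothetical helper for the inverse map, implemented by brute-force search purely for illustration.

```python
import math
from fractions import Fraction
from itertools import islice


def calkin_wilf():
    """Yield every positive rational exactly once, in lowest terms.

    Newman's recurrence for the Calkin-Wilf sequence:
    q(1) = 1, q(n+1) = 1 / (2*floor(q(n)) - q(n) + 1).
    Position n -> n-th term is the map from positive integers to rationals."""
    q = Fraction(1)
    while True:
        yield q
        q = 1 / (2 * math.floor(q) - q + 1)


def rational_to_index(target):
    """Hypothetical helper for the inverse map: the 1-based position of a
    positive rational in the sequence, found by brute-force search
    (it terminates because every positive rational appears)."""
    target = Fraction(target)
    for n, q in enumerate(calkin_wilf(), start=1):
        if q == target:
            return n


print([str(q) for q in islice(calkin_wilf(), 10)])
# ['1', '1/2', '2', '1/3', '3/2', '2/3', '3', '1/4', '4/3', '3/5']
print(rational_to_index(Fraction(3, 5)))  # 10
```

A one-to-one, onto pairing like this is what "equally infinite" (same cardinality) means; it does not give a ratio between the two infinities, which is the Hilbert's Hotel objection above.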

• #### Re: (Score:2)

I think your current sig does so pretty succinctly, actually.

• #### Logical Fallacy (Score:1)

I can't say I expected to actually learn something by clicking through to your journal, but I did. Honestly, I expected more arguing, though perhaps more interesting arguing than usual, since this seemed to be about an interesting topic. Instead, you've posted that excellent website, which is now in my favorites. TYVM, and have a good day.

• #### o rly? (Score:1)

"probably not equally probable" "one has to assume so" Statistical theory of theories? Get out of here kid, you don't have a clue what you're talking about. The world doesn't run on bad logic and philosophy since Galileo. Wanna be useful? Tell me about the hidden link between quantum correlations (pick any) and physical cause and get a paper out and shake the foundations of modern physics.
• #### Re: (Score:2)

So what's the correct way to illustrate that "correlation does not imply causation" does not imply "one should not investigate the likelihood of causation"?
• #### Re: (Score:1)

Look son, if you have some sort of correlation (and I'm not talking about an imaginary one that exists in a fantasy world, but a real physical correlation), you gotta figure it out by guessing the theory and doing experiments to question its validity. Then you need to work your ass off and work out whether your theory has anything to do with causation. There are no such shortcuts. Saying "given a physical correlation and a randomly picked theory that fits the phenomenon, the theory will involve physical cause with x chance" is just bad science.
• #### Less precise than 25% yet fits in 120 characters (Score:2)

you gotta figure it out by guessing the theory and doing experiments to question its validity

And one needs to do the same thing to establish a lack of causation. But a lot of the arguments I've seen take the form "You haven't already proved causation; therefore, working one's ass off to prove it one way or the other is futile."

Saying "given a physical correlation and a randomly picked theory that fits the phenomenon, the theory will involve physical cause with x chance." is just bad science.

So is "92.7 percent of statistics are pulled out of someone's large intestine", despite it being ironically self-demonstrating. So what should I say that's less precise than "25%" but greater than zero? The intended meaning, "not greater or less than the other possibilities u

• #### Re: (Score:2)

Correlation implies one of four possibilities: ...
because really, that's all you can honestly say.

• #### Re: (Score:1)

You're missing the whole point by insisting on talking about the wrong question; there is no such thing as a "statistical theory of correlation theories"! It doesn't matter whether you state a weaker condition and say nonzero; physicists (or anyone doing real-world science) just don't work like this! There are no-go theorems in physics, and they're useful because they say exactly that something can't happen in a theory (zero chance). But saying "X may or may not imply Y in a theory" is just useless, pl
• #### Perhaps the right question is burden of proof (Score:2)

I'll grant that I have likely been talking about the wrong question. Perhaps the right question is where the burden of proof should lie. In a lot of Slashdot stories about studies showing correlation, the attitude I see in several comments is "It should be treated as chance until proven otherwise, and I refuse to endorse committing resources to prove otherwise." The former is innocent until proven guilty, which I'll grant for now. The latter corresponds to a desire to shut down the police and the prosecutio
• #### Re: (Score:1)

I'm not very familiar with Slashdot, but now I see your point. It's a tenet of pseudo-science, and it is unfortunately everywhere, even in this age.
• #### qualitative versus quantitative dishonesty (Score:2)

Oh, I completely agree that the use of "correlation does not imply causation" to dismiss the possibility of causation is a *huge* fallacy, and deserves to be called out. However, it's a qualitative fallacy, whereas yours is a quantitative one. To assign a numerical probability to something when you have absolutely zero understanding of what the actual probabilities are is to be intellectually dishonest in a manner that brings nothing meaningful to the discussion and is likely to confuse the issue even furth

• #### some notes (Score:2)

the most obvious problem with your postulate is that it doesn't take the p-value into account. if i find correlation with p-value 0.00001, then the "likelihood" of it being chance should be lower than if the p-value was 0.1.
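To make the p-value point concrete, here is a minimal permutation test for a Pearson correlation in plain Python. A permutation test is only one of several ways to obtain a p-value (swapped in here because it needs no distribution tables), and both data sets are hypothetical: one tracks x closely, the other is constructed so its sample correlation is exactly zero.

```python
import random


def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def permutation_p_value(xs, ys, trials=2000, seed=0):
    """Two-sided permutation test for a Pearson correlation.

    Shuffling ys destroys any real association, so the shuffled |r| values
    show what chance alone produces. The p-value is the fraction of shuffles
    at least as extreme as the observed |r|, with a +1 correction so it is
    never reported as exactly zero."""
    rng = random.Random(seed)
    observed = abs(pearson(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        if abs(pearson(xs, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (trials + 1)


# Hypothetical data set 1: y tracks x closely (r is about 0.99).
xs = list(range(20))
ys_strong = [x + (1 if x % 2 else -1) for x in xs]

# Hypothetical data set 2: y = x**2 on a symmetric range; its sample r is exactly 0.
xs_sym = [-2, -1, 0, 1, 2]
ys_zero = [x * x for x in xs_sym]

p_strong = permutation_p_value(xs, ys_strong)
p_zero = permutation_p_value(xs_sym, ys_zero)
print(p_strong)  # about 1/2001, the smallest value this estimator can report
print(p_zero)    # 1.0
```

The same four-way split of explanations would be absurd for both data sets, since chance is a far better fit for one than the other. Note also that Pearson's r measures only linear association: in the second data set y depends entirely on x, yet r is 0.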

anyway, you're not really saying anything new. if you thought through what you are saying, you'd probably end up with bayesian inference or a more esoteric variant such as the dempster-shafer theory of evidence [wikipedia.org].

in short, you need to establish the prior probability of each of your hypothe
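The prior-then-posterior workflow suggested here can be sketched for the journal's four categories. This is a minimal Python illustration of Bayes' rule over a finite set of hypotheses; the uniform 25% prior mirrors the journal's hyperbole, and the likelihood numbers are invented purely to show the mechanics, not estimated from any study.

```python
HYPOTHESES = ("A causes B", "B causes A", "C causes A and B", "chance")


def posterior(prior, likelihood):
    """Bayes' rule over a finite set of mutually exclusive hypotheses:
    P(H | E) = P(E | H) * P(H) / sum over h of P(E | h) * P(h)."""
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    evidence = sum(unnormalized.values())
    return {h: p / evidence for h, p in unnormalized.items()}


# Uniform prior, matching the journal's (admittedly hyperbolic) 25% each.
prior = {h: 0.25 for h in HYPOTHESES}

# Hypothetical P(data | hypothesis); e.g. a very small p-value would make the
# observed data unlikely under "chance".
likelihood = {"A causes B": 0.6, "B causes A": 0.1,
              "C causes A and B": 0.3, "chance": 0.01}

post = posterior(prior, likelihood)
for h in HYPOTHESES:
    print(f"{h:>18}: {post[h]:.3f}")
```

With a uniform prior the posterior is simply the likelihoods renormalized; a non-uniform prior would shift it. The sketch also treats the four categories as mutually exclusive, which is itself an assumption.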

• #### Probabilities pulled from posterior (Score:2)

the most obvious problem with your postulate is that it doesn't take the p-value into account.

Anything quantitative about it (the "25%") is hyperbole, I admit. It's mostly directed at people who abuse "correlation does not imply causation" to imply "if causation has not already been proved, and if investigating it costs more than zero, then it should not be investigated". In addition, news sources that aren't paywalled tend to forget to report p-values.

anyway, you're not really saying anything new

I'm aware of that. Sometimes I have to repeat old things because new users haven't yet seen the old works.

and then evaluate the posterior probability

Which a lot of people unfamiliar with Bayes

• #### Re: (Score:2)

re posterior: people familiar with bayesian inference have the same objection. still, it's at least slightly better to establish prior probabilities which are then updated by seeing the evidence. what you're doing is saying that, whatever the data was, it's 25% across the board. if you ever want to get past this, you'll need something like bayesian inference or dempster-shafer.
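For comparison with the Bayesian route, Dempster's rule of combination from Dempster-Shafer theory can be sketched in a few lines. Unlike a Bayesian prior, a mass function may leave belief uncommitted (mass on the whole frame), so nothing forces a 25% split. The frame labels and the two evidence sources below are hypothetical, and the rule assumes the sources are independent.

```python
from itertools import product

FRAME = frozenset({"A->B", "B->A", "C->A,B", "chance"})
CAUSAL = frozenset({"A->B", "B->A", "C->A,B"})


def combine(m1, m2):
    """Dempster's rule of combination for two mass functions.

    Each mass function maps subsets of FRAME (frozensets) to masses summing
    to 1. Products of masses on intersecting sets are pooled; mass landing on
    the empty set (the conflict K) is discarded and the rest divided by 1 - K."""
    combined = {}
    conflict = 0.0
    for (b, mb), (c, mc) in product(m1.items(), m2.items()):
        common = b & c
        if common:
            combined[common] = combined.get(common, 0.0) + mb * mc
        else:
            conflict += mb * mc
    if conflict >= 1.0:
        raise ValueError("the two sources are in total conflict")
    return {s: v / (1.0 - conflict) for s, v in combined.items()}


def belief(m, hypothesis):
    """Total mass committed to subsets of the hypothesis."""
    return sum(v for s, v in m.items() if s <= hypothesis)


def plausibility(m, hypothesis):
    """Total mass on sets that do not contradict the hypothesis."""
    return sum(v for s, v in m.items() if s & hypothesis)


# Hypothetical source 1: 60% "some causal link", 40% left uncommitted.
m1 = {CAUSAL: 0.6, FRAME: 0.4}
# Hypothetical source 2: 50% "A->B" specifically, 50% left uncommitted.
m2 = {frozenset({"A->B"}): 0.5, FRAME: 0.5}

m = combine(m1, m2)
chance = frozenset({"chance"})
print(belief(m, chance), plausibility(m, chance))  # belief 0, plausibility about 0.2
```

The gap between belief and plausibility is the explicit "we don't know" that the 25% figure papers over.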

re causality: my only point was that you have "A causes B" and "B causes A" as mutually exclusive categories. they aren't.

in total,

• #### Re: (Score:2)

i see you've changed your sig to something more reasonable; thank you.

i still don't like "chance," since the whole point of statistics is to rule out certain kinds of chance. there are also details like "A causes Z which causes B," and so on, and i think "A causes B and B causes A" is also possible.
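The point that "A causes B" and "B causes A" can both hold is easy to demonstrate with a feedback loop. The sketch below assumes a hypothetical linear system in which each variable is driven by the other's previous value (coefficient 0.5) plus independent noise, so causation runs both ways by construction. For that model the lag-one correlation in each direction is 0.5 in theory, and the simulation should land close to it.

```python
import random


def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def simulate_feedback(n=20_000, a=0.5, seed=2):
    """Hypothetical linear feedback loop:
    x[t] = a * y[t-1] + noise,  y[t] = a * x[t-1] + noise.
    Each variable causes the other, with a one-step delay."""
    rng = random.Random(seed)
    x, y = [0.0], [0.0]
    for _ in range(n):
        x_prev, y_prev = x[-1], y[-1]
        x.append(a * y_prev + rng.gauss(0, 1))
        y.append(a * x_prev + rng.gauss(0, 1))
    return x[1:], y[1:]  # drop the artificial zero starting point


x, y = simulate_feedback()
r_x_then_y = pearson(x[:-1], y[1:])  # x earlier, y later; theory: 0.5
r_y_then_x = pearson(y[:-1], x[1:])  # y earlier, x later; theory: 0.5
print(f"x->y: {r_x_then_y:.2f}, y->x: {r_y_then_x:.2f}")
```

Here both directional correlations are real effects of the model, so treating the categories as mutually exclusive would force a false choice between them.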

• #### Where can I find the original? (Score:2)

I'm here but I'm confused

Where's the original discussion that led to this thread?

• #### Read the summary (Score:2)

From the entry:

In this post [slashdot.org], Immerman wrote:

Taco Cowboy wrote:

Where's the original discussion that led to this thread?

It was a reply to a signature, and I had installed the signature after having seen numerous abuses of "correlation does not imply causation" in Slashdot comments. I apologize that I can't provide the URLs of all these comments.

• #### Not enough data (Score:2)

"but absent other evidence, one has to assume so"
and that assumption has been the downfall of many papers. It's also the same argument used to prop up things like acupuncture, homeopathy, chiropractic, and perpetual motion machines: "We observe X and can't explain it; therefore our pet solution must be the answer."

You simply do not have enough data to make any percentage guess.

• #### Or 0 percent (Score:2)

Yet too many people assume 0.00 percent for A->B, B->A, and C->A and C->B, and 100.00 percent for chance, even when there are data suggesting otherwise.
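The cost of assuming exactly 0 percent for causation can be shown with Bayes' rule: a prior of zero stays zero no matter how strongly the data favor the hypothesis (a consequence sometimes called Cromwell's rule). In this minimal sketch the two-hypothesis split and the likelihoods are hypothetical, chosen only to make the contrast visible.

```python
def posterior(prior, likelihood):
    """Bayes' rule over a finite set of hypotheses."""
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    evidence = sum(unnormalized.values())
    return {h: p / evidence for h, p in unnormalized.items()}


# Hypothetical: each new study is 999 times likelier under "causal" than "chance".
likelihood = {"causal": 0.999, "chance": 0.001}

dogmatic = {"causal": 0.0, "chance": 1.0}        # "0.00 percent" for causation
open_minded = {"causal": 0.01, "chance": 0.99}   # small but nonzero

for _ in range(3):  # three hypothetical studies in a row
    dogmatic = posterior(dogmatic, likelihood)
    open_minded = posterior(open_minded, likelihood)

print(dogmatic["causal"])     # 0.0, no matter how many studies arrive
print(open_minded["causal"])  # close to 1
```

Any nonzero prior, however small, lets the evidence do its work; a prior of exactly zero makes further investigation pointless by definition.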

