Comment ASA "Statement on p-Values" -- Feb 3, 2017 (Score 1) 331
R      max PPV (alpha = 0.05, power = 1)
------------------
0.5    0.91
0.2    0.80
0.1    0.67
0.05   0.50
0.01   0.17
0.001  0.02
- Even when R = 0.5 (one true relationship for every two false ones), at most 91 percent of published results are true. When R = 0.1, at most 67 percent of published results are true. This is abysmal. And what's more, why even investigate a topic when true relationships are common? Hypothesis testing then becomes a petty activity.
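The figures above can be checked with a few lines of Python. This is only a sketch: it assumes alpha = 0.05 and the best case of power 1 - beta = 1, where the PPV formula reduces to R / (R + alpha). The function name max_ppv is made up for illustration.

```python
def max_ppv(R, alpha=0.05):
    """Upper bound on PPV for pre-study odds R (assumes power 1 - beta = 1)."""
    return R / (R + alpha)

# Pre-study odds R of true to false relationships in a field.
for R in (0.5, 0.2, 0.1, 0.05, 0.01, 0.001):
    print(f"R = {R:<6} max PPV = {max_ppv(R):.2f}")
```

Any real study has power below 1, so its PPV falls below these bounds.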
What the statistician can't set, and what is never mentioned
-- the Background Probability -- is most important in most research!
"PPV depends a lot on the pre-study odds (R).
Thus, research findings are more likely true in confirmatory designs
... than in hypothesis-generating experiments."
The problem becomes obvious when research seeks from 30,000 genes the (at most 30) genes influencing a genetic disease, for which R = 30/30000 = 0.001 with a PPV of about 0.02! When the Background Probability (and so R) is moderate, a design with moderate power (1 - beta) can get a good PPV. But research often works in a field of previously unseen results, or uses data mining software (a good generator of false results), where R is 0.01 or even 0.001. In these many fields, the Background Probability (and so R) swamps any statistical design's alpha and beta.
"Most research findings are false for most research designs and for most fields... A PPV EXCEEDING 50% IS QUITE DIFFICULT TO GET." Indeed, a look at the PPV formula shows that whatever the alpha, even a power of 1 (a little thought reveals why more power hardly helps here) produces mostly false results if the pre-study odds R itself is less than alpha!
"Claimed effect sizes are in fact the most accurate estimates of the net bias. It even follows that between 'null fields' [fields with no true relationships], the fields that claim stronger effects ... are simply those that have sustained the worst biases." "This concept totally reverses the way we view scientific results. Traditionally, investigators have viewed large and highly significant effects with excitement, as signs of important discoveries. Too large and TOO HIGHLY SIGNIFICANT EFFECTS may actually be more likely to be SIGNS OF LARGE BIAS in most fields of modern research."
This article can solve the p-value problem by letting researchers continue standard hypothesis testing, but with smaller alpha levels. Each journal could assign an appropriate alpha level for rejecting the null hypothesis -- a large alpha for the social sciences (say, 0.1) and a smaller alpha (say, 0.0001) for genetic research.
Comment Probability a paper is correct in a "FIELD" = PPV (Score 1) 174
See PLOS's "most" viewed paper,
- "Why Most Published Research Findings Are False"
- by John Ioannidis
- August 30, 2005
- at PlosMedicine.org
Ioannidis's paper shows that
"After a research finding has been claimed based on achieving formal statistical significance,
the post-study probability that it is true is the Positive Predictive Value,"
PPV = (1 - beta) R / (R - beta * R + alpha)
= 1 / [1 + alpha / ((1 - beta) R)]
where
- alpha = 0.05 usually -- the probability of a Type I error
- beta is the probability of a Type II error (1 - beta is the power)
- R is the ratio of true relationships to no [false] relationships in that field
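As a quick sanity check, the two forms of the PPV formula can be written as Python functions and compared. This is a minimal sketch; the function names and the sample values of R and beta are chosen here purely for illustration.

```python
def ppv(R, alpha=0.05, beta=0.2):
    """PPV = (1 - beta) R / (R - beta * R + alpha)."""
    return (1.0 - beta) * R / (R - beta * R + alpha)

def ppv_alt(R, alpha=0.05, beta=0.2):
    """Equivalent form: 1 / [1 + alpha / ((1 - beta) R)]."""
    return 1.0 / (1.0 + alpha / ((1.0 - beta) * R))

# Illustrative values only: odds R = 0.1 at three levels of power.
for beta in (0.0, 0.5, 0.8):
    print(f"power = {1 - beta:.1f}  PPV = {ppv(0.1, beta=beta):.2f}  "
          f"(alt form {ppv_alt(0.1, beta=beta):.2f})")
```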
Here, for psychology, with alpha = 0.05,
PPV = 0.39 = 1 / [1 + 0.05 / ((1 - beta) R)]
so
R =
- 0.03 if 1-beta = 1 [the power for a very large sample]
- 0.06 if 1-beta = 0.5
- 0.16 if 1-beta = 0.2 [the power for a moderate sample].
That is, these psychology papers operate in a field
with around R = 0.16, about one true relationship for every six false ones.
Germany's pharmaceutical company Bayer found only 30 percent (PPV = 0.30) of all pharmaceutical papers verifiable,
corresponding to R = 0.11 (again with alpha = 0.05 and 1-beta = 0.2).
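The R values quoted for psychology and for Bayer come from solving the PPV formula for R. Here is a rough sketch of that inversion, assuming alpha = 0.05 throughout. The function name odds_from_ppv is hypothetical, and the power levels are the ones listed above.

```python
def odds_from_ppv(ppv, power, alpha=0.05):
    """Solve PPV = 1 / [1 + alpha / (power * R)] for the pre-study odds R."""
    return alpha * ppv / (power * (1.0 - ppv))

# Psychology: 39 percent of findings hold up (PPV = 0.39).
for power in (1.0, 0.5, 0.2):
    print(f"psychology, power = {power}: R = {odds_from_ppv(0.39, power):.3f}")

# Bayer: 30 percent verifiable (PPV = 0.30), assuming power 0.2.
print(f"Bayer, power = 0.2: R = {odds_from_ppv(0.30, 0.2):.3f}")
```

The lower the assumed power, the larger the R needed to explain the same observed PPV.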
You can change the ratio R to
R / (1+R),
the pre-study probability the relationship is true.
Call this the "Background Probability" of a true relationship.
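For odds R of true to false relationships, the matching probability is R / (1 + R), and going back from a probability p gives p / (1 - p). A small sketch of both conversions follows; the function names are invented for this example.

```python
def odds_to_probability(R):
    """Background Probability for pre-study odds R (true : false)."""
    return R / (1.0 + R)

def probability_to_odds(p):
    """Pre-study odds R for a Background Probability p."""
    return p / (1.0 - p)

for R in (1.0, 0.11, 0.001):
    p = odds_to_probability(R)
    print(f"R = {R:<6} Background Probability = {p:.4f}")
```

For small R the two numbers nearly coincide, which is why they are easy to mix up.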
In the extreme though not uncommon genetics field,
research seeks from 30,000 genes
the (at most) 30 genes that influence a genetic disease, for which
R = 30/30000 = 0.001
and at this small R, the PPV is at most about 0.02 (with alpha = 0.05).
Don't lose track. There are three fractions mentioned here,
(1) R (ratio of true relationships to false relationships in the field, before experiment)
(2) Background probability = R / (1+R)
(3) PPV (after an experiment and publication, this is the probability that a result claimed as significant is true)
While researchers/statisticians can set alpha = 0.05, and can design for power 1 - beta = 0.80, the probability meaning of these numbers is clouded by their frequentist interpretation. What the statistician can't set, and what is never mentioned -- the Background Probability -- differs between research fields and is important in each one!
When the Background Probability is moderate, a design with moderate power (1 - beta) can get a good PPV. But research often works in a field of previously unseen results, or uses data mining software (a good generator of false results and tool of charlatans), where R is 0.01 or even 0.001. In these many fields, the Background Probability swamps any statistical design's alpha and beta. "Most research findings are false for most research designs and for most fields... a PPV exceeding 50% is quite difficult to get." Indeed, a look at the PPV formula shows that whatever the alpha, even a power of 1 (a little thought reveals why more power hardly helps here) produces mostly false results if the pre-study odds R (nearly the same as the Background Probability when R is small) is less than alpha, since at power 1 the PPV is just R / (R + alpha)!
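Setting PPV = 0.5 in the formula gives the break-even point R = alpha / (1 - beta). Below that R, most significant results are false. The sketch below tabulates this threshold; the function name and the chosen alpha and power values are illustrative assumptions.

```python
def min_odds_for_majority_true(alpha, power):
    """Smallest pre-study odds R at which PPV reaches 0.5: R = alpha / power."""
    return alpha / power

for alpha in (0.05, 0.01, 0.001):
    for power in (1.0, 0.8, 0.2):
        R_min = min_odds_for_majority_true(alpha, power)
        print(f"alpha = {alpha:<6} power = {power:<4} need R >= {R_min:.4f}")
```

Even perfect power only brings the threshold down to R = alpha, so in a field with R = 0.001 and alpha = 0.05 no sample size rescues the PPV.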
If R must be relatively large in a "field" for published results to represent true relationships, then a large proportion of relationships considered in that field are true (significant). Such a research field should be exceedingly boring. At the other extreme, in a "field" with relatively few true relationships, research produces mostly false conclusions. However, in follow-up studies of published results (e.g., pharmaceutical companies check results with further studies), R becomes large (note the conditioning). When you see that the probability that published research represents a true relationship is smaller than the chance a fair coin lands heads, you quickly see the need for more follow-up research.
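The effect of follow-up studies can be sketched by feeding the post-study odds of one significant result in as the pre-study odds of the next. From the PPV formula, the post-study odds PPV / (1 - PPV) equal R (1 - beta) / alpha. The sketch assumes independent, unbiased studies with alpha = 0.05 and power 0.8, and a starting R of 0.01; all of these are illustrative choices, not figures from the paper.

```python
def posterior_odds(prior_odds, alpha=0.05, power=0.8):
    """Odds that a relationship is true after one significant result."""
    return prior_odds * power / alpha

def odds_to_probability(odds):
    return odds / (1.0 + odds)

odds = 0.01  # illustrative pre-study odds R for an exploratory field
for study in range(1, 4):
    odds = posterior_odds(odds)
    print(f"after significant study {study}: "
          f"R = {odds:.2f}, PPV = {odds_to_probability(odds):.2f}")
```

Under these assumptions each confirming study multiplies the odds by (1 - beta) / alpha = 16, which is the conditioning at work.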
It is important to refine these ideas by bounding the term "field": not all research, or even all biological research, but maybe research on cancer, which takes a careful choice of bounds. This is another case revealing the importance of conditioning, if not the Conditionality Principle itself. Here, the choice of "field" affects the Background Probability, equivalently R. Since each journal represents a "field", each journal could require its own level of evidence; e.g., genetics could require alpha = 0.001, and psychology could require alpha = 0.01.
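One way to make a per-field alpha concrete is to solve the PPV formula for alpha given a target PPV: alpha = (1 - beta) R (1 - PPV) / PPV. The sketch below does this for two fields. The R values are the ones quoted above; the power levels and the targets are assumptions picked only for illustration.

```python
def alpha_for_target_ppv(R, power, target_ppv=0.5):
    """Largest alpha at which the PPV still reaches target_ppv."""
    return power * R * (1.0 - target_ppv) / target_ppv

# (pre-study odds R, assumed power) per field; powers are illustrative guesses.
fields = {
    "genetics (R = 0.001)": (0.001, 0.8),
    "pharmaceutical papers (R = 0.11)": (0.11, 0.2),
}
for name, (R, power) in fields.items():
    for target in (0.5, 0.9):
        alpha = alpha_for_target_ppv(R, power, target)
        print(f"{name}: PPV >= {target} needs alpha <= {alpha:.5f}")
```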
The proud fool echoes that psychology is effete, full of the innumerate and pompous head cases. Almost everyone else looks at psychology's 0.39 reproducibility from a Classical (frequentist) perspective, a view less than 100 years old. On the other hand, Bayes' Theorem has been in use for 250 years. Moreover, whatever you do, you should not violate Bayes' Theorem. The august Bayes' Theorem has been mathematically proven true and confirmed over the centuries. The above PPV takes the background probability (prior) into account, a probability that is relevant though not exactly known. When you reduce Bayes' Theorem to an arena with two states -- true relationships and false relationships -- the results greatly simplify and wondrously clarify.
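The two-state reduction can be checked numerically. Take the prior probability of a true relationship as R / (1 + R), the chance of a significant result as 1 - beta when the relationship is true and alpha when it is false, and apply Bayes' Theorem. The result matches the PPV formula. This is a sketch with made-up function names and test values.

```python
def ppv_formula(R, alpha, beta):
    """PPV = (1 - beta) R / (R - beta * R + alpha)."""
    return (1.0 - beta) * R / (R - beta * R + alpha)

def ppv_bayes(R, alpha, beta):
    """Posterior P(true | significant) from Bayes' Theorem with two states."""
    prior_true = R / (1.0 + R)
    prior_false = 1.0 - prior_true
    p_sig_given_true = 1.0 - beta   # power
    p_sig_given_false = alpha       # Type I error rate
    evidence = p_sig_given_true * prior_true + p_sig_given_false * prior_false
    return p_sig_given_true * prior_true / evidence

for R in (1.0, 0.1, 0.001):
    print(f"R = {R:<6} formula = {ppv_formula(R, 0.05, 0.2):.4f}  "
          f"Bayes = {ppv_bayes(R, 0.05, 0.2):.4f}")
```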