He writes his paper and submits it for publication: "Rats prefer to turn left", P < 0.05, the effect is real, and all is good.
There's no realistic way that a reviewer can spot the flaw in this paper.
Actually, let's pose this as a puzzle to the readers. Can *you* spot the flaw in the methodology? And if so, can you describe it in a way that makes it obvious to other readers?
I guess I don't see it. While P < 0.05 isn't all that compelling, it does seem like prima facie evidence that the rats used in the sample prefer to turn left at that intersection for some reason. There's no hypothesis as to why, and thus no way to generalize and no testable prediction of how often rats turn left in different circumstances, but it's still an interesting measurement.
Another poster got this correct: with dozens of measurements, the chance that at least one of them will be unusual by chance alone is very high.
A proper study states the hypothesis *before* taking the data specifically to avoid this. If you have an anomaly in the data, you must state the hypothesis and do another study to make certain.
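The multiple-comparisons problem is easy to demonstrate numerically. Here is a quick simulation sketch (the specific numbers — 20 measurements per study, a 0.05 significance threshold — are illustrative assumptions, not from the original anecdote):

```python
import random

random.seed(1)

TRIALS = 10_000   # simulated studies
TESTS = 20        # measurements per study (e.g. 20 intersections observed)
ALPHA = 0.05

# For each simulated study, run TESTS experiments where the null is true
# (no real effect) and count how often at least one crosses the threshold.
studies_with_false_positive = 0
for _ in range(TRIALS):
    # Under the null hypothesis, a p-value is uniform on [0, 1].
    p_values = [random.random() for _ in range(TESTS)]
    if any(p < ALPHA for p in p_values):
        studies_with_false_positive += 1

rate = studies_with_false_positive / TRIALS
print(f"Chance of at least one 'significant' result: {rate:.2f}")
```

Analytically the rate is 1 − 0.95^20 ≈ 0.64, so roughly two out of three such studies would contain a spurious "discovery" even if nothing real is going on.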
You have a null hypothesis and some data with a very low probability. Let's say it's P < 0.01. This is such a good P-value that we can reject the null hypothesis and accept the alternative explanation.
Can you point out the flaw in this reasoning?
You have evidence that the null hypothesis is flawed, but none that the alternative hypothesis is the correct explanation.
The scientific method centers on making testable predictions that differ from the null hypothesis, then finding new data to see if the new hypothesis made correct predictions, or was falsified. Statistical methods can only support the new hypothesis once you have new data to evaluate.
The flaw is called the "fallacy of the reversed conditional".
The researcher has "probability of data, given hypothesis" and assumes this implies "probability of hypothesis, given data". These are two very different quantities, and one being small does not mean the other is.
Case 1: the probability that a person is a woman, given that they're carrying a pocketbook (high), versus the probability that a person is carrying a pocketbook, given that they are a woman (also high).
Case 2: the probability that John is dead, given that he was executed (high), versus the probability that John was executed, given that he is dead (low).
In case 1 it's OK to reverse the conditional, but in case 2 it's not. The difference stems from the relative populations, which are about equal in case 1 (women and pocketbook-carriers), and vastly unequal in case 2 (dead people versus executed people).
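Bayes' theorem makes case 2 concrete. The numbers below are made up purely for illustration (executions are rare; death eventually is not):

```python
# Illustrative, made-up base rates:
p_executed = 1e-5               # P(executed): executions are very rare
p_dead = 0.5                    # P(dead): dying is common
p_dead_given_executed = 0.9999  # P(dead | executed): executions are fatal

# Bayes' theorem: P(executed | dead) = P(dead | executed) * P(executed) / P(dead)
p_executed_given_dead = p_dead_given_executed * p_executed / p_dead
print(f"P(executed | dead) = {p_executed_given_dead:.6f}")
```

Reversing the conditional took a probability of 0.9999 down to about 0.00002, because the base rate P(executed) is tiny relative to P(dead). That base-rate term is exactly what the fallacious reasoning ignores.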
A low P value (P of data, given the null hypothesis) does not in general imply that the probability of the null hypothesis is also low (P of hypothesis, given data).
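The same base-rate effect applies to significance testing itself. A sketch of the idea, with assumed numbers (only 1% of tested hypotheses describe a real effect, 80% power for real effects, α = 0.05):

```python
import random

random.seed(2)

EXPERIMENTS = 100_000
TRUE_EFFECT_RATE = 0.01  # assumption: 1% of tested hypotheses are real
POWER = 0.80             # assumption: chance a real effect reaches p < ALPHA
ALPHA = 0.05

significant = 0
null_but_significant = 0
for _ in range(EXPERIMENTS):
    effect_is_real = random.random() < TRUE_EFFECT_RATE
    if effect_is_real:
        sig = random.random() < POWER    # true effect detected with given power
    else:
        sig = random.random() < ALPHA    # false positive under the null
    if sig:
        significant += 1
        if not effect_is_real:
            null_but_significant += 1

frac = null_but_significant / significant
print(f"Fraction of 'significant' results where the null was true: {frac:.2f}")
```

With these assumptions the expected fraction is 0.99·0.05 / (0.99·0.05 + 0.01·0.80) ≈ 0.86: most "significant" results come from true nulls, even though each individually cleared the p < 0.05 bar. The exact number depends entirely on the assumed base rate, which is the point.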