When I suggested it to my sister while she was doing her teacher training, her response was, "But it depends on the child: some are suited to phonics and some to whole language." That completely misses the point that, on *average*, one method might lead to higher reading ages than the other, and that you could run an experiment to determine which, if either, is statistically better.
The trouble is getting a sample size large enough for a meaningful result. In the case of phonics vs. whole language, the outcomes depend on the student (income, race, geography, parental engagement, preparation, etc.) and on the teacher. So you really need a national-scale study with many different teachers, each following the same syllabus, coordinated over years, with some way to make sure that students don't switch methods just because their parents move.
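A rough way to see why the sample has to be so large is a standard power calculation. Below is a minimal Python sketch. It assumes a two-sided comparison of mean reading scores, a normal approximation, and whole classrooms assigned to one method, so students who share a teacher are correlated. The effect size, class size, and intra-class correlation (ICC) are made-up illustrative values, not figures from any real study.

```python
import math
from statistics import NormalDist


def design_effect(class_size: int, icc: float) -> float:
    """Variance inflation when whole classes (not students) are randomized.

    icc (hypothetical here) is the share of outcome variation that students
    with the same teacher have in common.
    """
    return 1 + (class_size - 1) * icc


def students_per_arm(effect_size: float, alpha: float = 0.05,
                     power: float = 0.8, deff: float = 1.0) -> int:
    """Students needed per method for a two-sided two-sample test of means.

    effect_size is the standardized difference d = (mean_a - mean_b) / sd.
    Normal approximation: n = 2 * (z_{1-alpha/2} + z_{power})^2 / d^2,
    multiplied by the design effect deff before rounding up.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2 * deff)


if __name__ == "__main__":
    # Hypothetical inputs: a small effect (d = 0.1), classes of 25, ICC = 0.2.
    class_size = 25
    n_individual = students_per_arm(0.1)
    n_clustered = students_per_arm(0.1, deff=design_effect(class_size, 0.2))
    print(f"randomized by student:   {n_individual} per method")
    print(f"randomized by classroom: {n_clustered} per method")
    print(f"classrooms per method:   {math.ceil(n_clustered / class_size)}")
```

With those hypothetical inputs, detecting a 0.1 SD difference takes roughly 1,570 students per method if each child is randomized individually, but about 9,100 (around 365 classrooms) once clustering by teacher is accounted for, and that is before allowing for dropouts and families who move.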
No one's going to pay for that. No one's patient enough to wait for this year's first graders to graduate from high school to evaluate the techniques.
People will pay to summarize anonymized student data from a handful of school districts over a few years, but that gets you the kind of hand-wavy outcomes everyone's complaining about.
You can do great physics because you can make sure that every piece of steel you test is almost identical. Do a test 5 times, and you can be confident to 4 significant figures. You can do good biology because you can make sure that your mice all have the same genotype, diet, and general environment. You can usually pin an effect down to within +/-10% with a couple dozen animals. Humans? You can't control their genetics or their environment. You can't trust them to finish the experiment. You can't even make many reproducible measurements, because you can't take the subjects apart when the experiment is over, and none of the cognitive tests are objective.
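The precision argument is the usual square-root law: the confidence interval around a mean shrinks like the spread of individual measurements divided by √n, so subject-to-subject variability, not cleverness, sets how many subjects you need. A minimal Python sketch under a normal approximation; the coefficients of variation (CV) below are purely hypothetical stand-ins for "nearly identical steel", "inbred mice", and "people", not measured values.

```python
import math


def relative_half_width(cv: float, n: int, z: float = 1.96) -> float:
    """Approximate 95% confidence half-width of a mean, as a fraction of the mean.

    cv is the coefficient of variation (sd / mean) of one measurement.
    Normal approximation: z * cv / sqrt(n). For very small n, a Student's t
    critical value would be more appropriate than z = 1.96.
    """
    return z * cv / math.sqrt(n)


def replicates_needed(cv: float, target: float, z: float = 1.96) -> int:
    """Smallest n whose relative half-width is at most `target`."""
    return math.ceil((z * cv / target) ** 2)


if __name__ == "__main__":
    # Hypothetical CVs and sample sizes, chosen only to illustrate the scaling.
    cases = [("steel specimens", 0.00005, 5),
             ("inbred mice", 0.25, 24),
             ("people", 0.5, 24)]
    for label, cv, n in cases:
        hw = relative_half_width(cv, n)
        print(f"{label}: n={n} gives +/-{hw:.4%}; "
              f"n for +/-10%: {replicates_needed(cv, 0.10)}")
```

With a CV of 25%, two dozen replicates give a half-width of about 10%. Because the required n grows with the square of the CV, doubling the variability roughly quadruples the number of subjects needed for the same precision.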