Or, y'know, you're a white dude born March 29, 1955 in Raleigh, North Carolina. [duke.edu]. But nice try. I never said anybody got in "only" because of their family's wealth. But stress levels in the first six months of life have a huge impact on brain function. Affluence is strongly correlated to better education at all levels. Worse yet, the study is sampling from profs at US universities. And affluence in the US is strongly correlated to race. Not even bothering to look at institutions in India, Japan, China, Russia, or Korea. There's a huge amount of mathematical talent in these countries that is largely unknown to American mathematicians. Gonna be hard to tease "whitey with a Y-chromosome" out of the data.
Again, I don't argue with any of that, although it isn't 1970 any more and there's a lot more racial balance than you perhaps might think, especially in the top institutions that get their pick of the best people in the world, not just the US. Our department chair is female and Chinese, for example. In our department we have tenured faculty from China, Korea, India, Pakistan, Israel, as well as non-white and/or non-male faculty who are second-generation-or-later American citizens. We're way past the point where any of this is a "token" representation, and Duke has been actively trying to increase the diversity of its faculty with new hires for decades at this point. We still suffer somewhat from damage done back in the 70's and 80's (or earlier) -- it takes 25+ years to grow a physics Ph.D., maybe 30 to get them through a postdoc and onto a tenure track, and there is still a disparity in the numbers of women that are majoring in physics and pursuing a Ph.D., but the gap is gradually narrowing.
As for whether or not it is a shit study -- sure, maybe it is. One of many things I study is how the brain learns and what sort of factors might count as "intelligence". Intelligence itself is quite difficult to define or measure, and there are numerous studies that suggest that it is remarkably easy to obtain biased results when studying it, especially if one uses a comparatively narrow definition and ignores selection biases such as the ones you describe. It is also fairly well known at this point that a person's intelligence is governed by many factors, BOTH genetic AND environmental, and that for a person to attain the sort of peak accomplishments in math or science that the study is using as an inclusion criterion, one very likely has to HAVE both factors in some sort of fortuitous constellation.
But again, the only way you can properly conclude that it is a shit study is from looking well beyond the level of description in the top article. In particular, the only way you can know is to know how they are going to manage their Bayesian priors and how they are designing the selection process beyond "pulling from top math and physics departments". And statistics (especially Bayesian statistics) is a particular area of study of mine, BTW, so I know more than a bit about what I'm talking about. The quality of their results will depend on how they utilize that which is already known and control for the very biases you list. They CAN almost certainly arrange for their sample to be well-distributed across different races and/or country of origin and gender, depending on how many participants they end up with and how deep they go into the pool of top-ranked Universities even just in the US, as the US has been pulling a large fraction of the best mathematicians and scientists of the world for close to a century just because it is a comparatively nice and safe place to live. Whether they WILL do this depends on the motivations of the people conducting the study, whether they have some hypothesis stated or otherwise that they wish to prove, their competence in statistics and methodology, none of which I can speak to.
But again, all of these things are critical factors in ANY study of genetic traits desirable or undesirable that might be conducted. If you were looking at the genetics of extreme height and had a limited budget and wanted an easy way to get a pool of very tall persons who might be easy to recruit and track, starting out with the pool of basketball forwards and centers in college and/or the pros would likely give you an easy way to get a start. Sure, the pool might be dominated by African-Americans disproportionate to the population, the part of the pool that is female might be much smaller (but growing), one might well miss whole classes of very tall persons with e.g. Marfan syndrome or acromegaly whose associated disorders keep them off of sports teams, and countries like China or India or parts of Europe might be heavily underrepresented, but you would have enough of a pool such that -- with care -- you could correct for at least some of those things if you come up with alternative strategies for sampling and recruitment outside of this pool later and make some effort to balance your selection process now.
If your GOAL is to prove that "black males grow taller than white males" then of course you can probably manage it with this pool as well. If you want to prove that males get to be taller than females, you can prove that too. If you want to prove that Aleutians don't genetically grow as tall as Caucasians, I'm sure one can find a way. However, any competent designer of such a study will use Bayesian priors (note well, some of these assertions are TRUE as far as we know from enormous data -- males DO get to be taller, on average, than females, no matter how you control for diet and environment) to avoid making inferences that are not warranted on the basis of what is already known.
You openly acknowledge your ignorance about study methodology and general statistics, and from the sound of it you really don't know of or understand Bayes' theorem and the correct analysis of joint and conditional probabilities in multifactorial reasoning. So perhaps you could either take the time to learn them (which isn't really particularly difficult -- the derivation of Bayes' theorem is typically in chapter one or two of most decent statistics texts and is literally only one or two lines connecting joint, conditional, and marginal probabilities, although studying Bayesian statistics per se would probably require that you buy a decent book on it and skim through it to get a feel for how prior knowledge alters naive conclusions drawn from conditionally biased data) OR you could accept the possibility that while pulling from top US university physics and math departments COULD introduce biases, "teasing whitey with a Y-chromosome out of the data" could actually fairly easily be done with additional selection criteria and/or Bayesian corrections afterwards. THAT'S not necessarily the problem with this study.
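For the record, the derivation really is about two lines. A sketch, with A and B as generic placeholder events (my notation for illustration, nothing specific to the study): write the joint probability two ways using conditional and marginal probabilities, then divide through by P(B), assuming P(B) > 0.

```latex
P(A, B) = P(A \mid B)\,P(B) = P(B \mid A)\,P(A)
\quad\Longrightarrow\quad
P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}
```

Here P(A) plays the role of the prior: what you already knew about A before looking at the (possibly biased) data B.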
I'd be a lot more concerned with other weaknesses. For example, no matter how you slice it, the pool will be tiny simply because mathematical "genius" is rare and difficult to identify. Their pool is far more likely to contain representatives than the general population (which is good) just like the pool of basketball forwards and centers, but even if they grazed ALL US universities or all the universities in the world, they might come up with a pool containing only a few dozen actual "geniuses" and another few hundred people (worldwide!) of above average intelligence or mathematical accomplishment (for many reasons and of many types, because intelligence mathematical or otherwise is highly multifactorial). There are many genes. Perhaps there are indeed one or two genes (outside of obvious stupidity, e.g. Y-linked factors or racial factors with inadequate data and possible confounding factors as well) that are "different" for at least some mathematical geniuses, but it will be very difficult to get a pool large enough to be able to make such a statement as a believable conclusion unless the association is so strong it reaches out and smacks you upside the head (which I'm reasonably sure will not be the case, but if somebody wants to fund a looksee, more power to them).
In statistics there is something called "the curse of dimensionality". Even for a well-defined, narrow genetic disorder, identifying the specific genetic cofactors is difficult. For example, finding BRCA1 (one of many genes that are strongly correlated with breast cancer) is all well and good, but my wife's family has three known generations of women who have died or suffered from breast cancer, and they've tested negative for BRCA1. All that means is that they (probably, at this point) have some OTHER gene that is still unidentified in the family tree; the alternative explanation, that they are just very unlucky, has a very low probability at this point. Or it might be a constellation of six or seven genes, not one specific mutation. Intelligence in general (at the genetic level) is almost certainly a lot more about gene constellations, not single genes, and in order to identify a constellation of (say) five or ten genes one has to have sample sizes that are larger by five or ten POWERS to achieve equal statistical resolution, or one has to use truly serious (and somewhat fortuitous) math to reduce the effective dimensionality in the event that there are some projective dimensions where one can obtain partial resolution of the constellation and then perhaps refine it with somewhat less.
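To put rough numbers on that scaling, here is a minimal Python sketch. It assumes (my simplification, not anything from the study) that each gene has three possible genotypes, that every joint genotype is equally common, and that you want about ten people per joint genotype before you can say anything. Real analyses are cleverer than this, but the exponential growth is the point.

```python
# Toy illustration of the curse of dimensionality for gene constellations.
# ASSUMPTIONS (hypothetical, for illustration only):
#   - 3 genotypes per gene (e.g. AA, Aa, aa at a biallelic locus)
#   - all joint genotypes equally likely
#   - ~10 subjects wanted per joint genotype

def genotype_cells(num_genes: int, genotypes_per_gene: int = 3) -> int:
    """Number of distinct joint genotypes across num_genes genes."""
    return genotypes_per_gene ** num_genes


def samples_needed(num_genes: int, per_cell: int = 10,
                   genotypes_per_gene: int = 3) -> int:
    """Subjects needed to average per_cell subjects in every joint genotype."""
    return per_cell * genotype_cells(num_genes, genotypes_per_gene)


if __name__ == "__main__":
    for k in (1, 2, 5, 10):
        print(f"{k:2d} genes: {genotype_cells(k):6d} joint genotypes, "
              f"~{samples_needed(k):7d} subjects")
```

Under these assumptions one gene needs about 30 subjects, five genes about 2,430, and ten genes about 590,490, which is the sense in which each extra gene costs another power.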
That's what you really should be criticizing in this study. The claims made in the top article sounded overweening to me and extremely unlikely to be accomplished unless (as I said) one or two genes turn out to be strongly correlated indeed and fall right out at their feet, after a WELL DONE statistical analysis that accounts for trivial confounding stuff and well-known Bayesian priors. This would be equivalent to finding the math-genius equivalent of BRCA1 -- "a" causal factor, when it happens, but not sufficient to explain more than a small fraction of all of the cases, and the ones that they cannot find from such a small study with such a limited selection process might well be highly multifactorial. This struck me the instant I read the description.
The second "obvious" problem is one you've alluded to repeatedly and that is quite correct -- it is well known that intelligence is remarkably plastic and strongly correlated with numerous environmental factors, both in animal studies and in human studies. There is little doubt that there are ALSO genetic factors -- you cannot just "create a math genius" by raising a child in some sort of math-genius-optimized environment -- but on the other hand if you take a female child with math-genius-genes and raise them in a poor village in Afghanistan where they are prevented from even learning to read, you are probably not going to end up with a math genius and if you did, it would be difficult beyond all measure to detect it with any sort of test once the child reached adulthood or develop it if you could detect it. It is this that MOTIVATES their selection pool -- all or most of the people in the pool ex post facto had an adequately supportive developmental environment AND at least some genetic predisposition towards math intelligence, allowing them to at least partially control for environment. But this does not truly resolve the curse of dimensionality, because environmental factors are all additional dimensions and hence help dilute the selection pool even for single gene factors. That is, suppose there is a gene, let's call it MGG1 -- math genius gene 1 -- that is perfectly predictive of the capacity to become a math genius. However, let us also suppose that the prevalence in the population is (say) 1 in a billion, so that at the moment there are 7 individuals with this gene worldwide. Suppose further that one requires some degree of affluence in order to be able to "express" this gene and rise to actually become a mathematical genius in an academic setting (and grant that for those individuals, it is highly likely that they will do just this given the opportunity, although that is not really certain). 
Then you could easily lose (say) four out of the seven to global poverty before the study begins. Note well that you cannot recover these individuals by better selection criteria -- they are truly lost. Of the three that remain, perhaps two will end up in academia and have a comparatively high probability of being in the recruitment pool. So you might well have ONE OR TWO MGG1 individuals in your entire recruitment pool. Or three. Or even four. But you'll never have all seven, and given the degree of shared genetic homology across the population of humans in general, you are most unlikely to discover MGG1 even if you really do end up with (say) fifty or sixty "certified math geniuses" of all races and genders in your final selection pool.
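The attrition in this thought experiment is just a chain of multiplications. A minimal Python sketch, using only the made-up numbers from the MGG1 example (a world population of 7 billion is implied by "1 in a billion" giving 7 carriers; the 3/7 and 2/3 survival fractions are the "four out of seven lost" and "two of three in academia" guesses, not data):

```python
from fractions import Fraction

# All values are the hypothetical numbers from the MGG1 thought experiment.
WORLD_POPULATION = 7_000_000_000
PREVALENCE = Fraction(1, 1_000_000_000)   # 1 carrier per billion people
P_ENVIRONMENT = Fraction(3, 7)            # carriers NOT lost to global poverty
P_ACADEMIA = Fraction(2, 3)               # of those, fraction ending up in academia


def carrier_funnel(population, prevalence, p_env, p_acad):
    """Return (carriers worldwide, environmentally enabled, in recruitment pool)."""
    carriers = population * prevalence
    enabled = carriers * p_env
    in_pool = enabled * p_acad
    return carriers, enabled, in_pool


if __name__ == "__main__":
    carriers, enabled, in_pool = carrier_funnel(
        WORLD_POPULATION, PREVALENCE, P_ENVIRONMENT, P_ACADEMIA)
    print(f"carriers worldwide:      {carriers}")
    print(f"with adequate support:   {enabled}")
    print(f"expected in recruitment: {in_pool}")
```

With these inputs the funnel goes 7, then 3, then 2. The four carriers dropped at the environment step never show up in any academic pool, which is why better selection criteria cannot recover them.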
Then there is MGG2, MGG3, and four constellations that we might call MGGs1-MGGs4 (s for spectrum) all in this same population, each with a handful of environmentally enabled representatives in the pool. Resolving them from the general population/control group is going to be insanely difficult. What they are maybe HOPING for is that our imaginary MGG1 is present in 20-30 of the group of 50-60 that they end up with. That's really very close to what it would take to make a convincing case for it, one that would justify screening a control population for MGG1 and seeing if it is predictive for strong math skills if not genius.
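To see why 20-30 carriers out of 50-60 would be convincing while one or two would not, here is a minimal Python sketch of a one-sided binomial tail. The background carrier frequency of 1% is a number I made up purely for illustration (deliberately generous), and 5e-8 is the conventional genome-wide significance threshold used because a genome scan tests a huge number of variants at once. A real study would use a proper case/control design, so treat this as a back-of-envelope check only.

```python
from math import comb

GENOME_WIDE_ALPHA = 5e-8      # conventional genome-wide significance threshold
BACKGROUND_FREQ = 0.01        # HYPOTHETICAL carrier frequency in controls


def binomial_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p): chance of k or more carriers by luck."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))


if __name__ == "__main__":
    pool_size = 55  # midpoint of the 50-60 "certified math geniuses"
    for carriers in (2, 25):
        tail = binomial_tail(carriers, pool_size, BACKGROUND_FREQ)
        verdict = "clears" if tail < GENOME_WIDE_ALPHA else "does not clear"
        print(f"{carriers:2d}/{pool_size} carriers: P = {tail:.3g} "
              f"({verdict} the genome-wide threshold)")
```

Under these assumptions two carriers in 55 is roughly a one-in-ten fluke and would never survive the multiple-testing correction, while 25 in 55 is so improbable by chance that it would stand out regardless.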
So sure, their study could end up being shit. Maybe they have an axe to grind. Maybe they are complete idiots and will only employ incompetent geneticists and biostatisticians. Maybe they will have a hard time recruiting people and end up with a biased sample because the "real math geniuses" in their pool don't give a fuck and decline to participate. Most likely of all, the correct conclusion of the study will be that they cannot find any likely candidate genes for "math genius" given the low prevalence of math genius worldwide, with or without a favorable environment and the probable fact that it is genetically multifactorial (leading to the curse of dimensionality). If so, it will be in the excellent company of countless genetic searches for patterns associated with diseases or syndromes that have come up short (so far). It's a hard problem. But it might not be done badly, it might not try to prove something, it might be done using what is already known about intelligence intelligently, and it might even discover MGG1 if there is such a thing, and there might be such a thing.
So once again, be cynical -- I'm cynical. Be skeptical. I'm skeptical. But I don't think it is a priori obvious that the study is going to be irretrievably flawed just because it draws from a pool of people who have demonstrated that they have SOME constellation of genetics and environment that led them to become mathematically gifted instead of attempting to tackle the impossible task of identifying people who grew up without the environmental factors needed or the difficult task of identifying the people who have both genetic and environmental factors but are working at Starbucks or are selling grilled meats on Asian street corners or even those who became computer scientists, philosophers, or financial analysts instead. Does it introduce certain biases into the selection pool? Without any doubt. But the people running the study know that, unless they are so stupid that they are very much in the out-group of the study they are trying to conduct. People work with similar handicaps in genetics all the time. That doesn't mean that they don't, or can't, make any progress.