Comment Re:Holy shit, the logic fail here. (Score 1) 38
What you describe is essentially a form of bootstrapping, which is a legitimate statistical method. However, there are important limitations that cannot be overlooked.
First, the constructed data are still being created from real data. Ethics is not just about preserving patient privacy, although that is a very important aspect. It's also about taking into consideration how the data will be used. Does the patient consent to this use, and if they are unable to consent, how should this be taken into consideration? Medical science has not had a stellar track record with respect to ethical human experimentation (e.g., Henrietta Lacks, the Tuskegee syphilis study, MKUltra--and that's just in recent US history). There is a documented history of patient-collected data being used in ways that those patients never even conceived of, let alone anticipated or consented to. Caution must be exercised whenever any such data are used, even indirectly.
Second, this kind of simulated data is problematic to analyze from a statistical perspective, and any biostatistician should be aware of this: there is no such thing as a free lunch. The problem of missing data--in actual patients!--is itself difficult to address, since methods to deal with missingness invariably rely on various strong assumptions about the nature of that missingness. So to make inferences on data that is entirely simulated is, at the very least, as problematic as analyzing partially missing data.
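To make the "no free lunch" point concrete, here is a minimal sketch, not anyone's actual pipeline. It assumes a made-up set of 30 "real" measurements (the values, the seed, and the names `real`, `synthetic`, and `standard_error` are all hypothetical) and inflates them to 100,000 rows by resampling with replacement, the basic operation behind the bootstrap. Every synthetic row is still one of the original observations, and the apparent gain in precision is an artifact of the row count, not new information.

```python
import random
import statistics

def standard_error(values):
    """Naive standard error of the mean, treating every row as independent."""
    return statistics.stdev(values) / len(values) ** 0.5

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical stand-in for real patient measurements (n = 30).
real = [rng.gauss(120.0, 15.0) for _ in range(30)]

# "Synthetic" data set: 100,000 rows drawn with replacement from the real rows.
synthetic = rng.choices(real, k=100_000)

real_se = standard_error(real)
synthetic_se = standard_error(synthetic)

# Every synthetic value is literally one of the real observations.
print("synthetic values all come from real data:", set(synthetic) <= set(real))

# The naive standard error shrinks dramatically, yet only 30 patients exist.
print(f"SE from real data:      {real_se:.3f}")
print(f"SE from synthetic data: {synthetic_se:.3f}")
```

The smaller standard error in the second line of output says nothing about the underlying population. It only reflects that the same 30 observations were counted many times over, which is why an analysis that treats the inflated rows as independent patients would not survive review.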
Third, the current state of LLMs, and their demonstrated tendency to distort or invent features from noise (which is arguably the primary mechanism by which they operate), is such that any inferences from LLM-generated data would be questionable and should not be considered statistically meaningful. Such data could be used for hypothesis generation, but they would not satisfy any kind of statistical review.
It all comes back to what I said in another comment: you can't have it both ways. If you can draw some statistically meaningful conclusion from the data, then that data came from real-world patients and must pass ethical review. If you don't need ethical review because the data didn't come from any real patient, then any inferences are dubious at best, and are most likely just fabrications that cannot pass confirmatory analysis.