Comment Re:AI Incest (Score 2, Interesting) 40
Yes, "you've been told" that by people who have no clue what they're talking about. Meanwhile, models just keep getting better and better. AI images have been out for years now. There's tons on the net.
First off, old datasets don't just disappear. So the *very worst case* is that you just keep developing your new models on pre-AI datasets.
Secondly, there is human selection on things that get posted. If humans don't like the look of something, they don't post it. In many regards, an AI image is replacing what would have been a much crapper alternative choice.
Third, dataset gatherers don't just blindly use a dump of the internet. If there's a place that tends to be a source of crappy images, they'll just exclude or downrate it.
Fourth, images are scored with aesthetic gradients before they're used. That is, humans train models to assess how much they like images, and then those models look at all the images in the dataset and rate them. Once again, crappy images are excluded / downrated.
Fifth, trainers do comparative training and look at image loss rates, and an automatically exclude problematic ones. For example, if you have a thousand images labeled "watermelon" but one is actually a zebra, the zebra will have an anomalous loss spike that warrants more attention (either from humans or in an automated manner). Loss rates can also be compared between data +sources+ - whole websites or even whole datasets - and whatever is working best gets used.
Sixth, trainers also do direct blind human comparisons for evaluation.
This notion that AIs are just going to get worse and worse because of training on AI images is just ignorant. And demonstrably false.