People keep saying this, and meanwhile, AI keeps getting better, because, surprise surprise, (A) the data sources that get weighted the heaviest are those with the strictest quality filters**, (B) trainers impose their own filters, (C) preexisting datasets continue to exist and can be used at will, and (D) it's entirely a myth that synthetic data is harmful; some degree (indeed, an increasing degree) of synthetic data is quite useful, so long as some fresh data continues to enter the system.
Re: (D), put a group of scientists from around the world on a well-stocked desert island to debate an issue of interest for a month. Do you think they'll come out dumber? No, of course not. Synthesis, bouncing ideas off each other and learning from that, is absolutely a way to learn, to draw new conclusions from preexisting knowledge. You may know "blue whales are mammals" and "mammals make milk", and through synthesis deduce "blue whales make milk". Etc.
Now, if you put said scientists on a desert island for millennia (let's pretend they're immortal, and ignore the other issues with the analogy), with writing things down being forbidden, and no new sources of information: yes, there will be loss of information over time, eventually offsetting what they gain from synthesis. Their minds will still be coherent, but facts will slowly leak out of the system. New information input into the system is also important.
Re: **(A), much of the internet is in effect filtered. Look, for example, at this website, which isn't at all remarkable. Yes, sometimes spam bots make it into the comments, but they eventually get kicked out. Even when they get in, they get modded down. Article submissions are also moderated by editors. Now, an AI might do such a good job with its comments or submissions that it doesn't get noticed, but if so, so what? If it's doing as well as or better than humans - and it's disadvantaged, by probably coming from a limited subset of IPs, maybe having a recognizable personality, etc. - then GREAT, sounds like good training data.
Maybe I'm an AI right now that's been given old hacked Slashdot users' accounts as part of a botnet, tasked to try to mimic their past personalities while trying to convince other users to support AI development. And maybe I'm mentioning this fact to try to throw you off the mark so that you don't think it's true.