The same thing was said about personal data, and yet we have GDPR and it is effective at stopping abuse.
Different domain. You're talking about above-board, public, disclosed activity on data. It's (relatively) easy to regulate entities doing that.
This is about non-commercial, black-market activity on data. The right comparison for model generation and sharing is how effectively we've stopped private, illegal generation and sharing of digital music, digital images, digital (text) fiction, and digital video, even in the case of (nominally) protected content. Look at YouTube, for instance: anything played once on that site can be trivially converted into a local, 100% shareable media file. And that happens constantly, to (nominally) protected YouTube content. Same with audio. Play something on Pandora, Spotify, whatever... it's trivially recorded, duplicated, shared. Same with text. Creation and sharing of illegal copies of copyrighted books is rampant; there are whole websites out there full of them, too. Technical mitigations? HDCP (of course) failed to stop copying of high-resolution performances. The resolution, such as it was, was to stream content to consumers so inexpensively that copying became not worth the effort, although that's becoming less true. Still, illegal sharing of protection-broken movies persists.
Personal experience here: I'm a musician, and I love music. I also have a good sense of how paying for music, particularly by buying legitimate physical recordings, supports musicians, something I am all in on. So recently I undertook to convert my thousands of CDs to digital form to allow more ready and direct access to my music library. Huge job; it took me quite a few months and I even wore out a CD drive in the process, but it's done now, and it's much easier handling the new purchases as they arrive, of course. Anyway, the entire time I was doing this, every person who knew about it asked the same types of questions, and they all boiled down to "why don't you just rip these performances?" And yes, I could have. It's trivial to do so. It could have been automated, which would have saved me literally months of effort swapping CDs, archiving, and so on. I even had an up-to-date database of every song in my library. This wasn't really an eye-opener for me, since I know people illicitly copy and share music constantly and think nothing of it, but it is revealing with regard to illicit sharing of desirable digital content of any type, and it makes a point that aligns with the ones I'm making here: if it's easy, and there's a perceived benefit, significant numbers of interested people will do it and think nothing of it. Illegal or not.
Back on point: keep in mind that the cost of making these ML models drops as the process becomes more automated, as the modeler jumps through fewer processing hoops to keep lawyers from having a feeding frenzy (as in, zero), and as computers get faster and more capable (although they need go no further than they already have to be practical). Right now, practical model generation can be done on a moderately robust desktop machine, or several, if deeper pockets are at hand. Distribution can be anything from dark web hosting to entity one handing, or sending, a simple .zip to entities two through n. Online files. Flash drives. Encryption. Etc. Digital files can be and are duplicated with the stroke of a pointing device or keyboard. Put another way, the means of production and distribution simply aren't controllable.
...clearly, we can legislate to control how AI models are developed.
Not effectively in the case of non-commercial / non-public entities, which is where the actual issue lies.
I am aware of multiple privately-generated models that used decidedly dodgy data already in the wild. They run locally on uncensored, private engines. That's without much "it's illegal" or "oh no, lawsuits" stimulation as yet. They're very high quality. Development continues apace. That's where we are already.
Consider that the drug war (an almost-identical repeat of the complete failure of Prohibition, the alcohol one, because learning from history is apparently difficult, sigh) not only failed to reduce the various drug issues through legislation and draconian enforcement attempts, it created the cartels, caused gang wars, generated enormous and enormously profitable black markets, and spawned widespread illegal, sub-rosa drug manufacture everywhere from relatively sophisticated labs to bathtubs and flowerpots. Note the copious (stupid) body of law nominally aimed at controlling this, complete with horrific penalties. Note also the abject failure of said attempts, and the huge waste of money, in trying to control the issue. Note also that setting up drug manufacturing, even low-quality drug manufacturing, is a great deal more difficult, expensive, and skill-dependent than putting a desktop computer to work on data collection and processing. And keep in mind that, in large part, the data we're talking about is "the Internet", something designed to be easily and quickly accessible; that one software engine, created by one programmer or programming team, is all it takes for anyone who wants to get into this to do so; and that such software is already out there.
Same for sex work. Laws that attempt to control it abound, yet availability remains high and pervasive, and that's without the ease of private digital generation and distribution (although digital advertising and digital sex work are both common). Informed and consensual, and otherwise.
Consider the load of spam that ends up in our email inboxes, both from at least somewhat legal commerce and comprising phishing, viruses, worms, etc. Even with a completely known, fairly easy-to-regulate channel, the data collection and use, and the result, have grown from a minor flow to an incredibly pervasive one. Why? Because the people doing it want to do it, and they don't care what anyone else wants. And that's without the compliance of the people on the receiving end, unlike drug users, customers of sex workers, and of course users of uncensored, broadly trained ML models.
Consider the constant barrage of security attacks on websites. I maintain a number of web servers, and the logs are crammed with attack attempts, both automated and one-off, ranging from hilariously clueless to frighteningly sophisticated. The problem is huge despite being illegal in most venues. People (and governments) want to do this because... well, various motivations. It's easy. So they do it. It's really just that simple.
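For a concrete sense of what those crammed logs look like, here's a minimal sketch that scans access-log lines for common automated-probe fingerprints. The sample log lines and the probe signature list are hypothetical, and real attack traffic is far more varied; this just illustrates how routinely such probes show up.

```python
import re
from collections import Counter

# Hypothetical access-log excerpt in Common Log Format; the probed paths
# (wp-login.php, .env, phpmyadmin) are typical of opportunistic scanners.
SAMPLE_LOG = """\
203.0.113.5 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326
198.51.100.7 - - [10/Oct/2023:13:55:37 +0000] "GET /wp-login.php HTTP/1.1" 404 153
198.51.100.7 - - [10/Oct/2023:13:55:38 +0000] "GET /.env HTTP/1.1" 404 153
192.0.2.9 - - [10/Oct/2023:13:55:39 +0000] "GET /phpmyadmin/index.php HTTP/1.1" 404 153
203.0.113.5 - - [10/Oct/2023:13:55:40 +0000] "GET /about.html HTTP/1.1" 200 1024
"""

# Assumed signature list; a real one would be much longer.
PROBE_PATTERNS = [r"wp-login", r"\.env", r"phpmyadmin", r"\.git", r"cgi-bin"]

def flag_probes(log_text):
    """Return a Counter of client IPs whose requested paths match probe signatures."""
    hits = Counter()
    line_re = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "(\S+) (\S+)')
    for line in log_text.splitlines():
        m = line_re.match(line)
        if not m:
            continue
        ip, method, path = m.groups()
        if any(re.search(p, path) for p in PROBE_PATTERNS):
            hits[ip] += 1
    return hits

if __name__ == "__main__":
    for ip, count in flag_probes(SAMPLE_LOG).most_common():
        print(ip, count)
```

Run against real logs, even a naive filter like this lights up daily, which is the point: the attempts keep coming regardless of legality.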
And then there are the national actors. No country with competent leadership will allow itself to fall behind here. There is also the fact that governments are already deeply engaged in data collection from readily available sources: books, the Internet, financial data, images, motion recordings, live still and motion cameras, from doorbell cameras to every webcam to every government surveillance installation... we may be certain all of that will go into ML systems and come out as readily usable government power.
The beatings will continue until morale improves.