Comment What about the CC licensed exports? (Score 1) 32
StackOverflow had committed itself to giving back CC-licensed DB dumps. They stopped providing them some time ago, justifying it with LLM training when asked, and re-enabled them after they got a bit of a shitstorm for no longer giving back to the community. The site content itself is CC-licensed anyway.
This means that if they continue to provide the dumps, people can just train on the dumps. If they do not, people can legally (provided that attribution is given) crawl the site itself.
And finally, one would think that one of the points of "giving back to the community" would be allowing the users to train LLMs on their (own) data. The whole deal was users saying "We accept SO being a silo as long as the silo regularly provides us dumps of the data".