FSF Threatens Anthropic Over Infringed Copyright: Share Your LLMs Freely (fsf.org) 54

Posted by EditorDavid on Monday March 16, 2026 @01:43AM from the free-as-in-freedom dept.

In 2024 Anthropic was sued over claims it infringed copyrights when training LLMs.

But as they try to settle, they may have a problem. The Free Software Foundation announced Friday that Anthropic's training data apparently even included the book "Free as in Freedom: Richard Stallman's Crusade for Free Software" — for which the Free Software Foundation holds a copyright. It was published by O'Reilly and by the FSF under the GNU Free Documentation License (GNU FDL). This is a free license allowing use of the work for any purpose without payment.

Obviously, the right thing to do is protect computing freedom: share complete training inputs with every user of the LLM, together with the complete model, training configuration settings, and the accompanying software source code. Therefore, we urge Anthropic and other LLM developers that train models using huge datasets downloaded from the Internet to provide these LLMs to their users in freedom.

We are a small organization with limited resources and we have to pick our battles, but if the FSF were to participate in a lawsuit such as Bartz v. Anthropic and find our copyright and license violated, we would certainly request user freedom as compensation.
"The FSF doesn't usually sue for copyright infringement," reads the headline on the FSF's announcement, "but when we do, we settle for freedom."

FSF Threatens Anthropic Over Infringed Copyright: Share Your LLMs Freely

Post Load All Comments

Search 54 Comments Log In/Create an Account

Comments Filter:

- Re:Ducks (Score:5, Informative)
  
  by thecombatwombat ( 571826 ) writes: on Monday March 16, 2026 @02:04AM (#66043602)
  
  That is not a quote from Stallman.
  That is from a statement from Krzysztof Siewicz, and I would assume it's just an odd turn of phrase from someone who mostly speaks something other than English.
  RTFA is alive and well.
  
  Reply to This Parent Share
  Flag as Inappropriate
  - Re:Ducks (Score:5, Funny)
    
    by phantomfive ( 622387 ) writes: on Monday March 16, 2026 @08:52AM (#66043858) Journal
    
    RTFA is alive and well.
    Has right to read, doesn't use it. A true American. I salute him.
    
    Reply to This Parent Share
    Flag as Inappropriate
- Re:Ducks (Score:5, Insightful)
  
  by sg_oneill ( 159032 ) writes: on Monday March 16, 2026 @02:14AM (#66043604)
  
  Presumably it means they are demanding the models be released under a free license.
  Heres the thing with RMS. He's always tended to be the most "extreme" of the free/open source advocates, but he's had a history of being right as well. A lot of those "extreme" predictions have ended up being dead on the money.
  The only place I think the FSF ever really fucked up with the AGPL license which has been basically used as a sort of shareware license by server software devs. But with the gobsmacking amount of contributions the FSF has made to software, you can forgive maybe that one screw up.
  
  Reply to This Parent Share
  Flag as Inappropriate
  - Re:Ducks (Score:5, Insightful)
    
    by Tom ( 822 ) writes: on Monday March 16, 2026 @03:17AM (#66043640) Homepage Journal
    
    That is the problem. "Right to read" was visionary and will really soon be reality.
    Given how much capitalism insists on copyright and prosecution when it comes to THEIR works, how they get custom-made laws like the DMCA passed just to protect their rights... well, let's just say that if the big AI models weren't from the corporate sector but had been created by nerds on github, the copyright police would already have broken down our doors to arrest us all for copyright infringement.
    So please, please, pretty please, let them have a dose of their own medicine. Heck, let the courts classify LLMs as "software" and find just one instance of the training data containing GPL3 content. Whoopsie, all your code belongs to us.
    
    Reply to This Parent Share
    Flag as Inappropriate
    - Re:Ducks (Score:4, Insightful)
      
      by serafean ( 4896143 ) writes: on Monday March 16, 2026 @07:55AM (#66043784)
      
      > Heck, let the courts classify LLMs as "software" and find just one instance of the training data containing GPL3 content. Whoopsie, all your code belongs to us.
      This could get a bit more interesting: considering the current "human authorship requirement" for copyright, which currently stands [1], AI generated code might not be copyrightable at all. Essentially making every vibe coded file part of the public domain.
      One thing I don't think know is whether it is clear yet that LLMs sucking up everything is considered "fair use" and transformative.
      [1] https://www.morganlewis.com/pu... [morganlewis.com]
      
      Reply to This Parent Share
      Flag as Inappropriate
      - Re:Ducks (Score:4)
        
        by FirstNoel ( 113932 ) writes: on Monday March 16, 2026 @03:29PM (#66044558) Journal
        
        The ultimate endpoint of vibe-coding. No AI code is copyright-able. it's all GPL by default. That sounds like a great idea. I would support that.
        You can compile it, use it, copy it, sell it, improve it, release the source...keep going...If people want to compile and use it themselves..so be it.
        
        Reply to This Parent Share
        Flag as Inappropriate
        
        Re: (Score:2)
        
        by kwelch007 ( 197081 ) writes:
        
        Not quite. AI Generated code is not copyright-able at all. It's not GPL or anything else by default, and can't be under any license I'm aware of. It could still be intellectual property/trade secret though. Nobody is required to release AI generated code, but nothing is necessarily stopping the AI/LLM from generating that same code for someone else.
      - Re: (Score:2)
        
        by sg_oneill ( 159032 ) writes:
        
        Well kind of. It becomes *illegal* if its not GPLed. That doesnt give the end user the automatic right to GPL the model though. Its up to the model creator to either withdraw the model or GPL the model.
        It doesnt make the models outputs GPL either, as model outputs cant be copyrighted, and no copyright means no GPL. And public domain puts no obligation at all on the holder.
        Then it gets worse when you consider its all muddled up with GPL incompatible code.
        And to add to that giant hairball, is the result of th
  - Re: (Score:1)
    
    by CAIMLAS ( 41445 ) writes:
    
    Can you name one of his extreme (unreasonable) stances which was correct?
- Re: (Score:2)
  
  by DamnOregonian ( 963763 ) writes:
  
  I imagine you can infer, no?
  Presumably, the ask/demand is to release the weights of the model, and possibly its training regime so that it can be replicated.
  Frankly, it IS kind of a weird ask.
  
  So the lawsuit is about book piracy. It's not that Anthropic used copyrighted data to train it's models- it's that it pirated books (downloaded to a computer without license)
  If they had been legal copies of the books, what Anthropic did with them would have been legal (under current jurisprudence, it's fair use)
  T
- Re:Ducks (Score:4)
  
  by SumDog ( 466607 ) writes: on Monday March 16, 2026 @12:57PM (#66044318) Homepage Journal
  
  The OSI has allowed a lot of these models to be classified as "open source." You can download the weights for free off Hugging Face, and even add additional training and re-upload them. But the trouble is, what does that even mean in terms of "open source?"
  
  For a model to be truly open, you'd need to publish all the training data and the steps needed to reproduce the build (training). NONE of the current models can be called open source because nearly all of them are trained on proprietary data that can't be republished. That's the big issue with all models free (as in beer) and walled off behind subscriptions.
  
  Model developers are trying to claim these are some kind of "clean room." You train the model, it keeps a bunch of weights but can't reproduce the original training data, and it magically produces new stuff based off that data that's not exactly like it. It's what can allow open-source software to be "rewritten" as something less-open.
  
  The trouble is, some of the closed models can make reproductions of copyrighted works that are 90%+ the same as the original, like chapters from Harry Potter:
  
  https://arxiv.org/abs/2505.125... [arxiv.org]
  
  I have a suspicion the Anthropic models are much larger than people think; like 650GB+ in active vram on these megaclusters. If they do have that many weights/nodes ... are we just talking about mp3 compression and that point?
  
  We're in pretty dangerous waters here honestly. The AI-zealots don't seem to understand that, or even what they're using. When I say "Weighted Random Code Generator" a friend of mine refused to continue talking to be because I used a "slur."
  
  Reply to This Parent Share
  Flag as Inappropriate
  - Re: (Score:2)
    
    by Gilmoure ( 18428 ) writes:
    
    Wasn't one of the first open source scrapes of teh internets (text and images I think) around 80TB to download? Can see smaller IT businesses, labs, universities able to use something in that range. Lost of schools had decent size clusters.
  - Re: (Score:1, Insightful)
    
    by innocent_white_lamb ( 151825 ) writes:
    
    "some of the closed models can make reproductions of copyrighted works that are 90%+ the same as the original, like chapters from Harry Potter:"
    Some people can probably quote a chapter from Harry Potter too.
    It doesn't mean those people shouldn't be allowed to read any books.
âoeUse of the work for any purpose without pa (Score:5, Insightful)

by khb ( 266593 ) writes: on Monday March 16, 2026 @02:24AM (#66043612)

I think the relevant language actually is âoe This License is a kind of "copyleft", which means that derivative works of the document must themselves be free in the same sense. It complements the GNU General Public License, which is a copyleft license designed for free softwareâ because the LLM is a derived work, thus arguably must be free âoein the same. senseâ
If it really was permissive as described thereâ(TM)d be no basis to make the demands described.

Reply to This Share
Flag as Inappropriate
- Re:âoeUse of the work for any purpose without (Score:5, Funny)
  
  by 93 Escort Wagon ( 326346 ) writes: on Monday March 16, 2026 @02:49AM (#66043626)
  
  This License is a kind of "copyleft"
  As opposed to all of the LLMs, which use more of a "copytheft" license.
  
  Reply to This Parent Share
  Flag as Inappropriate
- Re: (Score:3, Interesting)
  
  by JaredOfEuropa ( 526365 ) writes:
  
  The real question is: is the output of an LLM trained on his work really a derivative work? If I read the book and use what I learned from it in my (paid) work, maybe even quoting from it, does that constitute a derivative work? Or did I just violate the terms "use of the work for any purpose without payment"? Neither part seems legally enforceable.
  - Re:âoeUse of the work for any purpose without (Score:4, Interesting)
    
    by Baron_Yam ( 643147 ) writes: on Monday March 16, 2026 @07:25AM (#66043772)
    
    IP has always been an attempt to have it both ways for our general benefit.
    Of course, greedy people had to come along and ruin that, and it would be extremely ironic if the attempt to prevent corporations having all the IP rights and average citizens having none achieved the opposite. Just imagine what happens if the current IP system gets extended into meat; if you study copyrighted material, you can never work again on anything that might be considered a product of that knowledge without paying a license to the IP owner.
    
    Reply to This Parent Share
    Flag as Inappropriate
  - Re:âoeUse of the work for any purpose without (Score:4, Interesting)
    
    by phantomfive ( 622387 ) writes: on Monday March 16, 2026 @08:57AM (#66043870) Journal
    
    If I read the book and use what I learned from it in my (paid) work, maybe even quoting from it, does that constitute a derivative work?
    
    The modern approach is to use the abstraction/filtration/comparison test [zerobugsan...faster.net] to figure out which parts are derived (including the quote) and which parts are original. Once the derived parts are determined, the defendant can assert a "fair use" defense if desired, and the courts will decide.
    
    Reply to This Parent Share
    Flag as Inappropriate
  - Re:âoeUse of the work for any purpose without (Score:5, Insightful)
    
    by Pinky's Brain ( 1158667 ) writes: on Monday March 16, 2026 @09:35AM (#66043928)
    
    That's not the real question, that's a silly distraction. There are a ton of literal copies made long before the LLM outputs anything to users.
    If training is fair use, the final output is too. Bartz v. Anthropic ruled it fair use, which I think was insane ... but what judge will cripple a multi-trillion dollar industry over sanity? Need some pretty big balls.
    
    Reply to This Parent Share
    Flag as Inappropriate
  - Re:âoeUse of the work for any purpose without (Score:5, Insightful)
    
    by DarkOx ( 621550 ) writes: on Monday March 16, 2026 @09:48AM (#66043944) Journal
    
    One thing to consider is when you quote/sample/cite facts from some other work, it static. You might have read the entire thing, but your paper will only ever have those two quotes, in it.
    The model it self continues to be used generate outputs over and over again, and may eventually write out quite a lot of the original work.
    but but but.. the model does not contain the originals works.. Well that is true and it isn't. Yes it might be just a bunch of tokens and weights, but PCM is just a bunch integer representations of amplitude values for a wave form at intervals, not the original wave form, nor can it produce the original analog wave as pickup by say a mic exactly; yet nobody would argue that if I feed my phono outputs to my PC sound card and produced a wav that it is not or less infringing then if I copied a cd directly.
    Just because you crank your mp3 compression down to 32kbps, and it sounds like crap does not magicly make your CD rip non-infringing either, even though it is very loss-y.
    A real question is how loss-y is so loss-y the original is no longer represented, because I think you could argue a lot of these ML models are effectively really really loss-y encodings of the the entire library they are trained on.
    Anyway fingers crossed the FSF wins this one. I can't think of few developments that would be more 'exciting' then for the courts to rule models fundementally infringe on their training content and can't be commercialized unless they are trained entirely on public domain and gratis licensed content, or on content entirely owned or appropriately licensed by the developer. Essentially ending frontier models would sell a ton of pop-corn!
    
    Reply to This Parent Share
    Flag as Inappropriate
    - Re: (Score:1)
      
      by innocent_white_lamb ( 151825 ) writes:
      
      "The model it self continues to be used generate outputs over and over again, and may eventually write out quite a lot of the original work."
      Sure. I might use a textbook over and over again to figure out various things over the course of time too.
      I have a bookcase for that exact purpose.
      - Re: (Score:2)
        
        by DarkOx ( 621550 ) writes:
        
        Right but the point is the model is more like the textbook than the paper or speech or whatever with the citations/quotes.
        If you bought a copy of a given textbook and produced a similar text or even broader text with most of a given topic using that original text as a principle source, you'd almost certainly violate the copyright.
  - Re: (Score:3, Insightful)
    
    by StormReaver ( 59959 ) writes:
    
    If I read the book and use what I learned from it in my (paid) work....
    You are making the common, now-classic, mistake of thinking that LLMs learn rather than copy verbatim. If you "learned" (memorized) Harry Potter, then regurgitate it for profit, that is most definitely a derivative work. That is how LLMs work, despite LLM-sellers' protestations to the contrary. They are storage/retrieval copyright infringement engines.
    - Re: (Score:2, Informative)
      
      by Zero__Kelvin ( 151819 ) writes:
      
      You are making the classic make of believing you have a clue about how LLMs work without having any actual expertise. Ironically you are merely parroting a bunch of misinformation you have read. LLMs are neural networks and they are trained. They don't "copy verbatim" by any stretch of the imagination. It is very clear that you have never actually used them, because nobody who has done so would ever make the egregious mistake of believing the bullshit you are selling. How do you suppose AI powered dron
      - Re: (Score:2)
        
        by StormReaver ( 59959 ) writes:
        
        They don't "copy verbatim" by any stretch of the imagination.
        
        That certainly explains why A.I. researchers were able to get one of the LLMs to emit almost the entirety of a book by prompting it with a few paragraphs. Oh wait. No it doesn't.
        
        Re: âoeUse of the work for any purpose withou (Score:2)
        
        by Zero__Kelvin ( 151819 ) writes:
        
        I never said that they aren't capable of doing that task. A school child can do that given access to the book. That doesn't mean a school child can't think or learn. It does mean that you are having quite a bit of doing so though.
  - Re: (Score:3)
    
    by tlhIngan ( 30335 ) writes:
    
    The real question is: is the output of an LLM trained on his work really a derivative work? If I read the book and use what I learned from it in my (paid) work, maybe even quoting from it, does that constitute a derivative work? Or did I just violate the terms "use of the work for any purpose without payment"? Neither part seems legally enforceable.
    Depends on how much you want to rely on AI to "launder" licensing.
    If I train an AI on the Linux source code, then ask it to produce a Linux-like OS based on what
  - Re: (Score:2)
    
    by Junta ( 36770 ) writes:
    
    You aren't an LLM, so you're reading and learning is *not* the same as LLM ingest of the material, regardless of what the AI companies want to say. Also, quoting is very specifically laid out in terms of what is 'fair use' or not.
  - Re: (Score:2)
    
    by allo ( 1728082 ) writes:
    
    That's not the question here. The question here is, if the model itself is a derivative work.
- Re: Go Anthropic! (Score:2)
  
  by madbrain ( 11432 ) writes:
  
  The lawsuit wasn't brought by Stallman.
  - Re: (Score:1)
    
    by ichthus ( 72442 ) writes:
    
    Stallman wrote the book in question, and is the founder of the FSF. It's entirely appropriate to reference him in commenting on this article. Don't be such an apologist.
    - Re: Go Anthropic! (Score:2)
      
      by madbrain ( 11432 ) writes:
      
      I don't know how you got any kind of apology from my comment.
    - Re: (Score:2)
      
      by unixisc ( 2429386 ) writes:
      
      No, the book was written by one Sam Williams. RMS had inputs into the second edition, but that still didn't make him the author. Not quite the same as ESR's "The Cathedral and the Bazaar"
- Re: (Score:2)
  
  by ichthus ( 72442 ) writes:
  
  Maybe grok didn't freely poach the free book that's free to use for any purpose without payment (you know, free).
- Re:Afraid to go after grok? (Score:4, Funny)
  
  by oumuamua ( 6173784 ) writes: on Monday March 16, 2026 @11:42AM (#66044164)
  
  Given their stand against the Department of War, Anthropic are the Good Guys among the big LLM providers (esp with Ilya Sutskever still in stealth mode). And since the War Department is trying to squash them - supply chain risk - they don't really need any other distractions. You're kicking them while they are down.
  
  Reply to This Parent Share
  Flag as Inappropriate
- Re: (Score:2)
  
  by tizan ( 925212 ) writes:
  
  Because grok is not as good as anthropic in many fields....If i get free anthropic i'll use it most probably ....free grok not so much I suspect.
Good luck with that (Score:1)

by diffract ( 7165501 ) writes:

If proprietary work is stolen willy nilly to train LLMs, what chance does a free foundation have against these AI giants?
- Re: (Score:2, Insightful)
  
  by ichthus ( 72442 ) writes:
  
  If a free work is freely used to freely train these LLMs about freedom, what chance does a "free" foundation have against these AI giants?
  
  ftfy
  
  Don't get me wrong -- I love Linux and all free software, and think the FSF has its place. But, they went a little coo-coo with GPL v3 (Linus is right), and this situation just further illustrates their chronic hypocrisy.
And just like that . . . (Score:4, Interesting)

by quonset ( 4839537 ) writes: on Monday March 16, 2026 @06:38AM (#66043726)

Copyright is good.
The book is over fifteen years old. How much longer should it be protected? At least that's the argument we hear on here all the time.

Reply to This Share
Flag as Inappropriate
- Re:And just like that . . . (Score:5, Insightful)
  
  by serafean ( 4896143 ) writes: on Monday March 16, 2026 @07:59AM (#66043790)
  
  Copyleft has always been about twisting/hacking copyright laws in favour of the end users/people instead of corporations.
  This is a case of playing by using the existing rules to win, even those rules that you campaign against.
  
  Reply to This Parent Share
  Flag as Inappropriate
- Re: (Score:3)
  
  by msauve ( 701917 ) writes:
  
  It doesn't matter what should be, only what is.
- Re: (Score:2)
  
  by SvnLyrBrto ( 62138 ) writes:
  
  Yes. Absolutely. Let's look at that. So... copyright 2002? Then it should have become public domain, 17 years after publication, seven years ago. And no, the term length for patents should NOT have been extended either.
  Any other questions?
  - Re: (Score:1)
    
    by wed128 ( 722152 ) writes:
    
    where do you get 17 years from publication? according to the copyright office, it's "life of the author + 70 years"
    
    https://www.copyright.gov/help... [copyright.gov]
    - Re: (Score:2)
      
      by iNaya ( 1049686 ) writes:
      
      Probably misremembering the original copyright statute: Statute of Anne [wikipedia.org]
      It was 14 years, extendable to 28 if the author was still alive at the end of the first 14 years.
GNU Virility Thought Experiment (Score:5, Insightful)

by thesandbender ( 911391 ) writes: on Monday March 16, 2026 @10:22AM (#66043992)

I'll preface this by saying I don't think LLM creators should be able to use content without permission/license. This is just an interesting discussion.

LLMs generally do not reproduce text. They can be made to do so with specifically crafted prompts but no current LLM is just going to regurgitate "Free as in Freedom" unless asked to do so. Instead it will use statistical matching to apply the text to probable matches, a very crude version of what we do. LLMs are starting to approach the way we meat sacks use books. We take in the information and then we apply it to problems. Where do we cross the line? Where do we say anything (or anyone) who is trained on (has read) this material is now required to do their work for free because they have the knowledge from that book as part of their training set?

It seems a little preposterous but that's where this is headed logically. It's shifting from "You can't reproduce this book." to closer to "You can't use the knowledge in this book except under the conditions we dictate." That's dangerous.

Reply to This Share
Flag as Inappropriate
They might not have a leg (Score:2)

by yurikhan ( 1922146 ) writes:

Taking things literally: GFDL allows the User to use the Original Work for free for any Purpose. Substitute User=Anthropic, Purpose=Training.
The training set, possibly the model weights blob, and maybe even the server that takes API requests and streams the responses back to the clients would be Derived Works. So any User2 who receives them may ask for Corresponding Source.
Problem is, that set of User2 is a singleton, namely, { Anthropic }. Actual users do not receive the weights or the server.
For PR (Score:2)

by allo ( 1728082 ) writes:

That's just PR. To enforce the license, OpenAI would need to be required to respect copyright. That means the "transformative use" defense would have to fall first. And then there are way bigger players who start sueing.
If the purpose is to FDL the model (Score:2)

by Digital Avatar ( 752673 ) writes:

They're going to fail miserably. Reason being that this has already been adjudicated when Facebook got caught hoovering up tons of books to train their own AI. In their case they had torrented a bunch of books so they committed copyright infringement, but the act of incorporating them as training data into an LLM was not copyright infringement, as that was fair use. The same happened with Anthropic where they downloaded a bunch of books and thus engaged in copyright infringement, but the incorporation into
May have epic implications (Score:2)

by Rudd-O ( 20139 ) writes:

If the Free Software Foundation wins this lawsuit, it would be cataclysmically game-changing for open artificial intelligence.
Of course, what is the likelihood that the license (that the lawsuit brings as a cause for dispute) prevails in court, when so many people with so much power and clout *want* copyright not to "be true" when it does not serve them? Another commenter rightfully pointed out that Facebook and Anthropic both committed blatant copyright infringement, but surprise surprise, when THEY do it
- Piracy is killing the industry (Score:1)
  
  by Rudd-O ( 20139 ) writes:
  
  Remember: only the little people go to prison or pay a fine for downloading vidya.

There may be more comments in this discussion. Without JavaScript enabled, you might want to turn on Classic Discussion System in your preferences instead.

FSF Threatens Anthropic Over Infringed Copyright: Share Your LLMs Freely More | Reply Login

Re:Ducks (Score:5, Informative)

Re:Ducks (Score:5, Funny)

Re:Ducks (Score:5, Insightful)

Re:Ducks (Score:5, Insightful)

Re:Ducks (Score:4, Insightful)

Re:Ducks (Score:4)

Re: (Score:2)

Re: (Score:2)

Re: (Score:1)

Re: (Score:2)

Re:Ducks (Score:4)

Re: (Score:2)

Re: (Score:1, Insightful)

âoeUse of the work for any purpose without pa (Score:5, Insightful)

Re:âoeUse of the work for any purpose without (Score:5, Funny)

Re: (Score:3, Interesting)

Re:âoeUse of the work for any purpose without (Score:4, Interesting)

Re:âoeUse of the work for any purpose without (Score:4, Interesting)

Re:âoeUse of the work for any purpose without (Score:5, Insightful)

Re:âoeUse of the work for any purpose without (Score:5, Insightful)

Re: (Score:1)

Re: (Score:2)

Re: (Score:3, Insightful)

Re: (Score:2, Informative)

Re: (Score:2)

Re: âoeUse of the work for any purpose withou (Score:2)

Re: (Score:3)

Re: (Score:2)

Re: (Score:2)

Re: Go Anthropic! (Score:2)

Re: (Score:1)

Re: Go Anthropic! (Score:2)

Re: (Score:2)

Re: (Score:2)

Re:Afraid to go after grok? (Score:4, Funny)

Re: (Score:2)

Good luck with that (Score:1)

Re: (Score:2, Insightful)

And just like that . . . (Score:4, Interesting)

Re:And just like that . . . (Score:5, Insightful)

Re: (Score:3)

Re: (Score:2)

Re: (Score:1)

Re: (Score:2)

GNU Virility Thought Experiment (Score:5, Insightful)

They might not have a leg (Score:2)

For PR (Score:2)

If the purpose is to FDL the model (Score:2)

May have epic implications (Score:2)

Piracy is killing the industry (Score:1)

Related Links Top of the: day, week, month.

Slashdot Top Deals