
FSF Threatens Anthropic Over Infringed Copyright: Share Your LLMs Freely (fsf.org)

In 2024 Anthropic was sued over claims it infringed copyrights when training LLMs.

But as they try to settle, they may have a problem. The Free Software Foundation announced Friday that Anthropic's training data apparently even included the book "Free as in Freedom: Richard Stallman's Crusade for Free Software" — for which the Free Software Foundation holds a copyright. It was published by O'Reilly and by the FSF under the GNU Free Documentation License (GNU FDL). This is a free license allowing use of the work for any purpose without payment.

Obviously, the right thing to do is protect computing freedom: share complete training inputs with every user of the LLM, together with the complete model, training configuration settings, and the accompanying software source code. Therefore, we urge Anthropic and other LLM developers that train models using huge datasets downloaded from the Internet to provide these LLMs to their users in freedom.

We are a small organization with limited resources and we have to pick our battles, but if the FSF were to participate in a lawsuit such as Bartz v. Anthropic and find our copyright and license violated, we would certainly request user freedom as compensation.

"The FSF doesn't usually sue for copyright infringement," reads the headline on the FSF's announcement, "but when we do, we settle for freedom."


  • by khb ( 266593 ) on Monday March 16, 2026 @02:24AM (#66043612)

    I think the relevant language actually is "This License is a kind of 'copyleft', which means that derivative works of the document must themselves be free in the same sense. It complements the GNU General Public License, which is a copyleft license designed for free software" because the LLM is a derived work, thus arguably must be free "in the same sense".
    If it really was as permissive as described, there'd be no basis to make the demands described.

    • by 93 Escort Wagon ( 326346 ) on Monday March 16, 2026 @02:49AM (#66043626)

      This License is a kind of "copyleft"

      As opposed to all of the LLMs, which use more of a "copytheft" license.

    • Re: (Score:3, Interesting)

      The real question is: is the output of an LLM trained on his work really a derivative work? If I read the book and use what I learned from it in my (paid) work, maybe even quoting from it, does that constitute a derivative work? Or did I just violate the terms "use of the work for any purpose without payment"? Neither part seems legally enforceable.
      • by Baron_Yam ( 643147 ) on Monday March 16, 2026 @07:25AM (#66043772)

        IP has always been an attempt to have it both ways for our general benefit.

        Of course, greedy people had to come along and ruin that, and it would be extremely ironic if the attempt to prevent corporations having all the IP rights and average citizens having none achieved the opposite. Just imagine what happens if the current IP system gets extended into meat; if you study copyrighted material, you can never work again on anything that might be considered a product of that knowledge without paying a license to the IP owner.

      • by phantomfive ( 622387 ) on Monday March 16, 2026 @08:57AM (#66043870) Journal

        If I read the book and use what I learned from it in my (paid) work, maybe even quoting from it, does that constitute a derivative work?

        The modern approach is to use the abstraction/filtration/comparison test [zerobugsan...faster.net] to figure out which parts are derived (including the quote) and which parts are original. Once the derived parts are determined, the defendant can assert a "fair use" defense if desired, and the courts will decide.

      • by Pinky's Brain ( 1158667 ) on Monday March 16, 2026 @09:35AM (#66043928)

        That's not the real question, that's a silly distraction. There are a ton of literal copies made long before the LLM outputs anything to users.

        If training is fair use, the final output is too. Bartz v. Anthropic ruled it fair use, which I think was insane ... but what judge will cripple a multi-trillion dollar industry over sanity? Need some pretty big balls.

      • by DarkOx ( 621550 ) on Monday March 16, 2026 @09:48AM (#66043944) Journal

        One thing to consider is that when you quote/sample/cite facts from some other work, it's static. You might have read the entire thing, but your paper will only ever have those two quotes in it.

        The model itself continues to be used to generate outputs over and over again, and may eventually write out quite a lot of the original work.

        But but but... the model does not contain the original works... Well, that is true and it isn't. Yes, it might be just a bunch of tokens and weights, but PCM is just a bunch of integer representations of amplitude values for a waveform at intervals, not the original waveform, nor can it reproduce exactly the original analog wave as picked up by, say, a mic; yet nobody would argue that if I fed my phono outputs to my PC sound card and produced a wav, it is not infringing, or less infringing, than if I copied a CD directly.

        Just because you crank your mp3 compression down to 32kbps and it sounds like crap does not magically make your CD rip non-infringing either, even though it is very lossy.

        A real question is how lossy is so lossy that the original is no longer represented, because I think you could argue a lot of these ML models are effectively really, really lossy encodings of the entire library they are trained on.
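The lossy-encoding point above can be made concrete. A minimal sketch (my own illustration, not from the comment): quantize a sampled sine wave down to 3 bits, a brutally lossy "rip", and check that the result still correlates strongly with the original signal, i.e. the original is still clearly represented despite the loss.

```python
import math

def quantize(samples, bits):
    """Snap each sample in [-1, 1] to the nearest of 2**bits evenly spaced levels."""
    levels = 2 ** bits
    return [round((s + 1) / 2 * (levels - 1)) / (levels - 1) * 2 - 1
            for s in samples]

def correlation(a, b):
    """Pearson correlation between two equal-length sample lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = math.sqrt(sum((x - ma) ** 2 for x in a))
    vb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (va * vb)

# One second of a 440 Hz sine at an 8 kHz sample rate, then a 3-bit "rip".
original = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(8000)]
lossy = quantize(original, 3)  # only 8 amplitude levels survive

print(round(correlation(original, lossy), 3))  # still very close to 1.0
```

Where the analogous threshold lies for model weights, if one exists at all, is exactly the open question the comment raises.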

        Anyway, fingers crossed the FSF wins this one. I can think of few developments that would be more 'exciting' than for the courts to rule that models fundamentally infringe on their training content and can't be commercialized unless they are trained entirely on public domain and gratis-licensed content, or on content entirely owned or appropriately licensed by the developer. Essentially ending frontier models would sell a ton of popcorn!

        • "The model it self continues to be used generate outputs over and over again, and may eventually write out quite a lot of the original work."

          Sure. I might use a textbook over and over again to figure out various things over the course of time too.

          I have a bookcase for that exact purpose.

          • by DarkOx ( 621550 )

            Right but the point is the model is more like the textbook than the paper or speech or whatever with the citations/quotes.

            If you bought a copy of a given textbook and produced a similar text, or even a broader text covering most of a given topic, using that original text as a principal source, you'd almost certainly violate the copyright.

      • Re: (Score:3, Insightful)

        by StormReaver ( 59959 )

        If I read the book and use what I learned from it in my (paid) work....

        You are making the common, now-classic, mistake of thinking that LLMs learn rather than copy verbatim. If you "learned" (memorized) Harry Potter, then regurgitate it for profit, that is most definitely a derivative work. That is how LLMs work, despite LLM-sellers' protestations to the contrary. They are storage/retrieval copyright infringement engines.

        • Re: (Score:2, Informative)

          You are making the classic mistake of believing you have a clue about how LLMs work without having any actual expertise. Ironically, you are merely parroting a bunch of misinformation you have read. LLMs are neural networks and they are trained. They don't "copy verbatim" by any stretch of the imagination. It is very clear that you have never actually used them, because nobody who has done so would ever make the egregious mistake of believing the bullshit you are selling. How do you suppose AI powered dron
          • They don't "copy verbatim" by any stretch of the imagination.

            That certainly explains why A.I. researchers were able to get one of the LLMs to emit almost the entirety of a book by prompting it with a few paragraphs. Oh wait. No it doesn't.
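For what it's worth, the extraction results the parent alludes to are usually quantified by measuring the longest verbatim token run a model's output shares with a training document. A rough sketch of that metric (my own helper names, hypothetical texts):

```python
def longest_common_run(training_tokens, output_tokens):
    """Length of the longest contiguous token sequence present in both lists."""
    best = 0
    # O(n*m) dynamic programming over suffix overlaps -- fine for a sketch.
    prev = [0] * (len(output_tokens) + 1)
    for t in training_tokens:
        cur = [0] * (len(output_tokens) + 1)
        for j, o in enumerate(output_tokens, start=1):
            if t == o:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

# Hypothetical example: a long shared run suggests regurgitation, not paraphrase.
book = "it was a bright cold day in april and the clocks were striking".split()
output = "the model wrote it was a bright cold day in april and then stopped".split()
print(longest_common_run(book, output))  # -> 9 ("it was a bright cold day in april and")
```

A run of a few words is unremarkable; runs of hundreds of tokens are what the extraction studies report.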

      • by tlhIngan ( 30335 )

        The real question is: is the output of an LLM trained on his work really a derivative work? If I read the book and use what I learned from it in my (paid) work, maybe even quoting from it, does that constitute a derivative work? Or did I just violate the terms "use of the work for any purpose without payment"? Neither part seems legally enforceable.

        Depends on how much you want to rely on AI to "launder" licensing.

        If I train an AI on the Linux source code, then ask it to produce a Linux-like OS based on what

      • by Junta ( 36770 )

        You aren't an LLM, so your reading and learning is *not* the same as LLM ingest of the material, regardless of what the AI companies want to say. Also, quoting is very specifically laid out in terms of what is 'fair use' or not.

      • by allo ( 1728082 )

        That's not the question here. The question here is whether the model itself is a derivative work.

  • If proprietary work is stolen willy nilly to train LLMs, what chance does a free foundation have against these AI giants?
    • Re: (Score:2, Insightful)

      by ichthus ( 72442 )
      If a free work is freely used to freely train these LLMs about freedom, what chance does a "free" foundation have against these AI giants?

      ftfy

      Don't get me wrong -- I love Linux and all free software, and think the FSF has its place. But, they went a little coo-coo with GPL v3 (Linus is right), and this situation just further illustrates their chronic hypocrisy.
  • by quonset ( 4839537 ) on Monday March 16, 2026 @06:38AM (#66043726)

    Copyright is good.

    The book is over fifteen years old. How much longer should it be protected? At least that's the argument we hear on here all the time.

  • by thesandbender ( 911391 ) on Monday March 16, 2026 @10:22AM (#66043992)
    I'll preface this by saying I don't think LLM creators should be able to use content without permission/license. This is just an interesting discussion.

    LLMs generally do not reproduce text. They can be made to do so with specifically crafted prompts but no current LLM is just going to regurgitate "Free as in Freedom" unless asked to do so. Instead it will use statistical matching to apply the text to probable matches, a very crude version of what we do. LLMs are starting to approach the way we meat sacks use books. We take in the information and then we apply it to problems. Where do we cross the line? Where do we say anything (or anyone) who is trained on (has read) this material is now required to do their work for free because they have the knowledge from that book as part of their training set?

    It seems a little preposterous but that's where this is headed logically. It's shifting from "You can't reproduce this book." to closer to "You can't use the knowledge in this book except under the conditions we dictate." That's dangerous.
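The "statistical matching" idea above can be sketched in miniature with a toy bigram model (my own illustration, and a deliberately crude one; real LLMs are vastly more complex): the model stores word-to-word transition counts from its training text, then samples continuations by frequency rather than looking up stored passages.

```python
import random
from collections import defaultdict

def train_bigrams(text):
    """Count, for each word, how often each following word appears."""
    counts = defaultdict(lambda: defaultdict(int))
    words = text.split()
    for a, b in zip(words, words[1:]):
        counts[a][b] += 1
    return counts

def generate(counts, start, length, rng):
    """Sample a continuation word-by-word from the transition counts."""
    out = [start]
    for _ in range(length):
        followers = counts.get(out[-1])
        if not followers:
            break
        words = list(followers)
        weights = [followers[w] for w in words]
        out.append(rng.choices(words, weights=weights)[0])
    return " ".join(out)

corpus = ("free software means the users have freedom "
          "the users run the software and the software respects freedom")
model = train_bigrams(corpus)
print(generate(model, "the", 6, random.Random(0)))
```

Every generated word pair occurred somewhere in the training text, yet the output as a whole need not match any stored passage, which is the tension the comment is pointing at.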
  • Taking things literally: GFDL allows the User to use the Original Work for free for any Purpose. Substitute User=Anthropic, Purpose=Training.

    The training set, possibly the model weights blob, and maybe even the server that takes API requests and streams the responses back to the clients would be Derived Works. So any User2 who receives them may ask for Corresponding Source.

    Problem is, that set of User2 is a singleton, namely, { Anthropic }. Actual users do not receive the weights or the server.

  • That's just PR. To enforce the license, OpenAI would need to be required to respect copyright. That means the "transformative use" defense would have to fall first. And then there are way bigger players who would start suing.

  • They're going to fail miserably. Reason being that this has already been adjudicated when Facebook got caught hoovering up tons of books to train their own AI. In their case they had torrented a bunch of books so they committed copyright infringement, but the act of incorporating them as training data into an LLM was not copyright infringement, as that was fair use. The same happened with Anthropic where they downloaded a bunch of books and thus engaged in copyright infringement, but the incorporation into

  • If the Free Software Foundation wins this lawsuit, it would be cataclysmically game-changing for open artificial intelligence.

    Of course, what is the likelihood that the license (that the lawsuit brings as a cause for dispute) prevails in court, when so many people with so much power and clout *want* copyright not to "be true" when it does not serve them? Another commenter rightfully pointed out that Facebook and Anthropic both committed blatant copyright infringement, but surprise surprise, when THEY do it

