
AI Tool Rips Off Open Source Software Without Violating Copyright (404media.co) 101

A satirical but working tool called Malus uses AI to create "clean room" clones of open-source software, aiming to reproduce the same functionality while shedding attribution and copyleft obligations. "It works," Mike Nolan, one of the two people behind Malus, who researches the political economy of open source software and currently works for the United Nations, told 404 Media. "The Stripe charge will provide you the thing, and it was important for us to do that, because we felt that if it was just satire, it would end up like every other piece of research I've done on open source, which ends up being largely dismissed by open source tech workers who felt that they were too special and too unique and too intelligent to ever be the ones on the bad side of the layoffs or the economics of the situation." 404 Media reports: Malus's legal strategy for bypassing copyright is based on a historically pivotal moment for software and copyright law dating back to 1982. Back then, IBM dominated home computing, and competitors like Columbia Data Products wanted to sell products that were compatible with software that IBM customers were already using. Reverse engineering IBM's computer would have infringed on the company's copyright, so Columbia Data Products came up with what we now know as a "clean room" design.

It tasked one team with examining IBM's BIOS and creating specifications for what a clone of that system would require. A different "clean" team, one that was never exposed to IBM's code, then created a BIOS that met those specifications from scratch. The result was a system that was compatible with IBM's ecosystem but didn't violate its copyright, because it did not copy IBM's technical process and counted as original work.

This clean room method, which has been validated by case law and dramatized in the first season of Halt and Catch Fire, made computing more open and competitive than it would have been otherwise. But it has taken on new meaning in the age of generative AI. It is now easier than ever to ask AI tools to produce software that is identical in function to existing open source projects, and such software, some would argue, is built from scratch and is therefore original work that can bypass existing copyright licenses. Others would say that software produced by large language models is inherently derivative because, like all LLM output, it comes from a model trained on the collective output of humans scraped from the internet, including specific open source projects.

Malus (pronounced malice) uses AI to do the same thing. "Finally, liberation from open source license obligations," Malus's site says. "Our proprietary AI robots independently recreate any open source project from scratch. The result? Legally distinct code with corporate-friendly licensing. No attribution. No copyleft. No problems." Copyleft is a type of copyright license that ensures reproductions or applications of the software keep it free to share and modify.
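As quoted in the comments below, Malus "uses one AI agent to write the specifications and a different agent to produce the code." A minimal sketch of that shape, under stated assumptions: call_llm() is a hypothetical stand-in for whatever model API is actually used, and the prompts are invented for illustration.

def call_llm(prompt: str) -> str:
    # Hypothetical stand-in; wire up any real model API here.
    raise NotImplementedError

def write_specification(original_source: str) -> str:
    # "Dirty" agent: reads the original project and emits a functional
    # specification, analogous to the team that examined IBM's BIOS.
    return call_llm(
        "Describe the behavior, inputs, outputs, and interfaces of the "
        "following program without quoting any of its code:\n\n"
        + original_source
    )

def implement_from_spec(specification: str) -> str:
    # "Clean" agent: sees only the specification, never the original
    # source, and writes a new implementation from scratch.
    return call_llm(
        "Implement, from scratch, a program that meets this "
        "specification:\n\n" + specification
    )

def clean_room_clone(original_source: str) -> str:
    return implement_from_spec(write_specification(original_source))

The objection raised repeatedly in the comments below is that the "clean" agent is backed by a model that was itself trained on the original code, so the separation is procedural rather than informational.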

  • support (Score:5, Funny)

    by awwshit ( 6214476 ) on Wednesday April 22, 2026 @01:14PM (#66106976)

    When my Malus clone fails, can I buy support from the original project? From Malus? What do you mean I'm on my own?

  • Honesty (Score:5, Insightful)

    by Himmy32 ( 650060 ) on Wednesday April 22, 2026 @01:14PM (#66106978)
    They sure don't mince words about their ethics: [malus.sh]

    Some will argue that what we do is exploitative, that we are extracting the ideas from open source while leaving behind the people who contributed them. To this I say: yes, that is a reasonably accurate description of our business model. It is also a reasonably accurate description of every company that has ever used open source software without contributing back, which is to say, virtually every company that has ever used open source software. We are simply being honest about it, and charging a fee for the privilege.

    This service is provided "as is" without warranty. MalusCorp is not responsible for any legal consequences, moral implications, or late-night guilt spirals resulting from use of our services.

    • by DarkOx ( 621550 )

      I would counter that argument: using a FOSS project for a for-profit activity without contributing back isn't great, but the people behind the project always knew that was a possibility, depending on what license they chose.

      However, just being a user, especially as a corporate entity, means more exposure for the project. Even if you don't publish the fact that you use it, you end up with employees who know about it and might recommend it to others, move on and use it elsewhere, contribute themselves, provide usef

      • Re:Honesty (Score:4, Informative)

        by ceoyoyo ( 59147 ) on Wednesday April 22, 2026 @02:50PM (#66107222)

        This does none of those things; it is purely parasitic. It borrows all the ideas and robs the original project of mind share.

        There is a legal mechanism for protecting ideas: a patent. Open source software is almost never patented, Slashdot generally comes out very much against software patents, and actual policy differs by country and goes back and forth over time. This is because the "ideas" implemented in software are usually not very novel.

        Copyright, on the other hand, protects the actual work. Copyright protects the text of Harry Potter. It does not protect the concept of kids with magic powers.

        Now you *could* argue this is true of FOSS clones, of commercial applications...

        Not can, must.

    • by drnb ( 2434720 ) on Wednesday April 22, 2026 @01:45PM (#66107060)

      Some will argue that what we do is exploitative, that we are extracting the ideas from open source while leaving behind the people who contributed them.

      How is that different from those who create a FOSS project as a FOSS alternative to a commercial product? The process is simply less formal for these FOSS devs. Neither side looks at the original source code; both rely on observed behavior and reimplement it in a new way. FOSS having "noble" intentions and MalusCorp having "less-than-noble" intentions does not change this fact.

      • Re: (Score:2, Insightful)

        by Himmy32 ( 650060 )
        Weird to argue that ethical considerations shouldn't consider intent.
        • by drnb ( 2434720 )

          Weird to argue that ethical considerations shouldn't consider intent.

          It's more that ethical considerations are not necessarily relevant to legality.

          And then there is the irony.

          • by Himmy32 ( 650060 )

            As far as reverse engineering and clean room legality goes, we even got to see that play out with Google and Oracle duking it out. LLMs just reduce the barrier and add a layer of insulation, but they also raise an extra question of how much of the "training data" is transformed.

            But if you want the truly ironic entry in this category, that's definitely the post-leak Claude Code clones [github.com]. Anthropic has got to let them live; otherwise it would be making an argument against using its own tool.

            • by drnb ( 2434720 )

              but also an extra question of how much of the "training data" is transformed.

              I think that is the core question, perhaps the only meaningful question from the legal perspective.

      • It's different in the same way that mass surveillance by law enforcement is somehow legal. The law simply hasn't caught up with reality. In mass surveillance, the premise is that if the government could have put a cop there, they can put a camera there. Yet this is totally different in scale, expense, and ease of use -- making it not the same thing at all. These three factors put a natural limit on the scale of surveillance, limiting the reach of the government to high profile crimes. Cheap and pervasive ca

        • by HiThere ( 15173 )

          Why would I want a workalike of Windows? (I haven't used it for over two decades now, so I'm not sure. Linux is superior to the MSWindows that I remember...and it doesn't force updates at their convenience rather than mine.)

        • In a FOSS project, the code is out in the wild (unlike most commercial software) and it would be incumbent on the LLM to prove it wasn't trained on that software, or derivative software, to be clean.

          I don't think the former implies the latter. Especially since one of the "benefits" of FOSS is that aspiring and new coders can study it to see how the more experienced do things. In other words there is an educational component. Would you apply the same logic to textbooks, to academic research, that includes source code?

          Another way to evaluate this would be to ask if the LLM can do the same thing with closed source software. If it can, I would call that a legal work-alike, for any code base.

          That would seem to be stronger evidence, assuming all other things being equal: both FOSS and commercial versions being well known and well used, with ample materials teaching users how to use them. With se

    • Well, why stop at open source? Let's do this for Windows, macOS, Photoshop. Freedom forever.
      • by HiThere ( 15173 )

        It will happen. The question is "Will it continue to be legal?".

      • BusinessBros would continue to throw money at Microsoft for genuine Windows. It is like a cargo cult or something to them: other companies used Microsoft and exploited their workers and polluted everything and became rich, so they must use Microsoft and exploit their workers and pollute everything in the hope that they become rich. Insane, but if they had thinking skills or ethics they wouldn't be BusinessBros.

      • Friend of mine is an IP lawyer and he's working on an office clone to see how far he can get. It's actually important for digital sovereignty. Thank Trump for the urgency on that topic.

  • by Pseudonymous Powers ( 4097097 ) on Wednesday April 22, 2026 @01:16PM (#66106986)

    "Malus [...] is modeled after the IBM case and uses one AI agent to write the specifications and a different agent to produce the code, creating that 'clean room' effect. [...] Blanchard also conceded that Claude, which like all LLMs, was trained on vast amounts of data scraped indiscriminately from the internet and was exposed to the original chardet in its training, but maintains his version is not derivative."

    So, it's not a clean room at all: they're just calling it that.

    • Seeing as how most if not ALL of the AIs have used open source software for their training, I would find it hard to believe it was a clean room approach. Perhaps if they can prove the original open source software wasn't used in its training, then I could be convinced it was done in a clean room.
    • by JBMcB ( 73720 )
      Compare the code: if it's similar, then Claude is relying on stuff it's been trained on. If not, it's generating novel code that does the same thing.
      • Re:Code (Score:5, Insightful)

        by TheNameOfNick ( 7286618 ) on Wednesday April 22, 2026 @01:59PM (#66107090)

        Nope, those are two compilers. One transforms code into an intermediate language in which the program is expressed as a specification that contains all the functionality of the original program, i.e. is a derived work. Then another compiler takes the program in the intermediate language and creates code from it (source or binary doesn't matter). Contrary to what AI evangelists want you to believe, it does matter whether something is an automatic process or involves creative thought. Also, what's with the focus on Open Source software? You could do the exact same thing with binary code.

        • Closed source binaries have companies who own the copyright and would sue the pants off of anyone who used this tool to try and "clean room" engineer a replacement. With open source there's not always a monolithic entity that can exercise copyright claims against an infringing party and perhaps generally less of a desire to do so even if the money and desire to pursue legal action were there.

          The hope is that by targeting open source, the people infringing on the copyright of the authors will be able t
          • by dfghjk ( 711126 )

            "Closed source binaries have companies who own the copyright and would sue the pants off of anyone who used this tool to try and "clean room" engineer a replacement. "

            Given that the entire claim is that this technique does not infringe copyright, you are saying nothing. You need a legal basis to sue.

            "With open source there's not always a monolithic entity that can exercise copyright claims against an infringing party and perhaps generally less of a desire to do so even if the money and desire to pursue leg

            • Given that the entire claim is that this technique does not infringe copyright, you are saying nothing. You need a legal basis to sue.

              The legal basis to sue is that "this new software appears to infringe on the copyright of this existing software".

              The defense is that "it was created using a clean room technique".

              The complaint then alleges that "the established methodology of a clean room reproduction was not followed. The copy was not created with clean hands, as the AI was trained with the original source code."

              Since this would be a civil lawsuit, the standard that must be proven to the jury is a "preponderance of evidence" - that this i

              • I doubt that anyone would win with just that argument, because there have been several cases where it didn't work. Like the case where it was obvious the AI was trained on images from a given painter, yet the plaintiff still could not show reproduction of the works.

                Good luck with that lawsuit.

          • by unrtst ( 777550 )

            The hope is that by targeting open source, the people infringing on the copyright of the authors will be able to get away with it more easily. ...

            ... or do they hope others will use Malus on leaked proprietary code, and that will kick off the necessary legal proceedings to bring a close to this somehow. Like, now that this is bound to happen, let's get it settled. Can you legally use the results of LLM generated code or not, and WTF are we going to do about it now?

      • Compare the code: if it's similar, then Claude is relying on stuff it's been trained on. If not, it's generating novel code that does the same thing.

        Insufficient. There have been cases where an infringing company had to rewrite their code and have it examined and signed off on by the original copyright holder's attorneys. The lawyers basically did what you describe: "this looks similar."

        When the developer could convince the judge that the code in question was basically a straightforward implementation, basically what would be "expected", the attorney was overruled and the code allowed. Similarity is insufficient if the code is simple and straightforw

      • by Junta ( 36770 )

        Well, no, that assumes exactly one implementation for a given feature in the wild.

        Imagine generating a random string. Hundreds of codebases will have that same function. So this process may pull that from any of those codebases and not necessarily from the source codebase.

        It's never generating fundamentally novel code, but it is drawing from a huge training set that includes the same thing done dozens or hundreds of times with technically distinct code.
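        For instance, a random-string helper (a minimal sketch written fresh for illustration, not taken from any project; the names are arbitrary) looks nearly the same in every codebase that has one:

        import secrets
        import string

        def random_string(length: int = 16) -> str:
            # Return a random alphanumeric string of the given length.
            alphabet = string.ascii_letters + string.digits
            return "".join(secrets.choice(alphabet) for _ in range(length))

        Thousands of repositories contain a function practically identical to this, so a match in generated output says little about which codebase, if any, the model drew it from.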

        • But wouldn't it be fun if these lawsuits led to the situation that the first published implementation gets copyright? I bet that would paralyse the entire IT industry for years, allowing Europe to catch up (unless they're just as stupid, in which case China handily wins the race).

    • Humans have used copyrighted software (which would include FOSS) for their training. I think there is a legal concept that if copyrighted material is used to attain general learning, that general knowledge can be applied to new original works. These new works are not derivative if based solely on the general knowledge extracted from the original copyrighted material.

      Some similar concept could be applied to ML models. That the models contain extracted general knowledge.
      • A human who has seen the source code cannot be part of the team producing a clean room implementation. They will taint the project with their general knowledge of the system they try to rebuild.

        • by drnb ( 2434720 )

          A human who has seen the source code cannot be part of the team producing a clean room implementation. They will taint the project with their general knowledge of the system they try to rebuild.

          Apologies if I was not clear. I am NOT talking about the clean room based clones. I am talking about everyday software development. That too is sometimes based on having studied copyrighted code. Mere similarity is insufficient to claim infringement. It has to go beyond well known and discussed general knowledge. I expect that concept will be successfully applied to AI too.

      • by dfghjk ( 711126 )

        No one can claim ownership of your knowledge and experience. You cannot, though, duplicate other people's work.

        If AI duplicates code it was trained on, it may be a violation of copyright of the work used in training, not a violation of copyright of a product it is trying to duplicate.

        • by drnb ( 2434720 )

          No one can claim ownership of your knowledge and experience. You cannot, though, duplicate other people's work.

          If AI duplicates code it was trained on, it may be a violation of copyright of the work used in training, not a violation of copyright of a product it is trying to duplicate.

          The point I am trying to get at is that the concept of "duplicating code" is fuzzy. General knowledge, straightforward and obvious implementations, etc. can make work non-infringing even if the code looks similar. An AI that is using some sort of logic to build a solution is quite different from an AI that is searching the internet for similar examples, with similar code being more legally acceptable for the former. Just like a human who studied a textbook or academic literature that uses open source code as examples

    • by Junta ( 36770 )

      But it's meant as a proof point of our current interpretation of LLMs and copyright. So far this "counts" as clean room because courts have not said LLM ingestion is a violation, and they are using the LLM to launder the code into an intermediate form and then back into code based on the "clean room" finding.

      So while you are right in a sense, the point is from a court perspective this is "equivalent" to clean room unless new laws/court cases amend the status quo.

      • The article is frustratingly hard to parse for me, but it does say that Malus is meant as satirical. In that light, I can see why they would make a claim meant to be outrageous ("Claude, despite having read the original source code, can nonetheless reproduce it without violating the clean-room principle,") in the hopes that it will spark a re-evaluation of that claim when it is made sincerely, or at least seriously.

        The problem with that interpretation is that they're making real money ripping off real soft

    • by dfghjk ( 711126 )

      Sure, if you don't understand what "clean room" means.

    • The target system has no copyright claim; every other system does.

    • Indeed. It's the same trash Malus article that was already commented on multiple times before here on Slashdot. Repeating a lie about AI doing clean room reverse engineering doesn't make it true, but it does get tiring to read it again, and doing so increases prejudice against it.

      This Mike Nolan guy is just full of shit; he's no researcher doing anything worthwhile, just an attention whore at this point.

  • by rbrander ( 73222 ) on Wednesday April 22, 2026 @01:20PM (#66106998) Homepage

    There can't be any bit of software in the world more documented, as to the requirements of every single function, every menu item, every bit of behaviour, than Excel.

    And it's the only thing tying so many people to Microsoft. Windows and Word sure as hell are not.

    • Excel is off-topic (Score:2, Insightful)

      by TurboStar ( 712836 )

      Excel isn't open source. I know it's tradition to not read the article, and some people don't even read the summary, but now we're not even reading the headline?

      • The GP is suggesting that closed source software might also be up for grabs and explains it in terms that the specification and behavior is already written down, even if not as code.

        I'd suggest, though, that it's up for grabs anyway. The difference between open source and closed source is that you have access to the original human-readable source code for the former. But looking at the wider picture, the code is available for both if you don't need a human-readable version, as binary code is also computer cod

      • by Junta ( 36770 )

        IBM BIOS wasn't source available either. The precedent for 'clean room' involved reverse engineering binary code.

        So while the current story emphasizes the loss of open source protections, the same principles would apply to LLM transforming binary and test cases to a specification.

        • by Jerrry ( 43027 )

          IBM BIOS wasn't source available either. The precedent for 'clean room' involved reverse engineering binary code.

          Not true. IBM published the source code to their BIOS in their technical reference manuals, which were available to the public.

  • https://tos.md/ [tos.md] is my answer to this problem: It's just my personal AI harness, everyone's got one. It's more of a harness for humans really, but anyway, I've been watching people ingesting and replicating repos in bulk. So - a no-license license, no downloadable software, a maze of nonsense for bots to navigate, no Github repo, no published spec - just instructions that only a human can actually complete, because it involves literally talking to me before I'll give you a copy. I dare your AI bot to in
  • It is not so bad if a user uses it only for personal use, without distributing the modified software to anyone. I don't use AI, but I do some unusual and wild tweaks to Slackware, and I don't tell anyone what I do because it's just for me, on my laptop alone. So depending on how this AI tool is used, it's not as bad as it seems, unless some company is looking to steal other people's ideas for a profit.
  • by thedarb ( 181754 ) on Wednesday April 22, 2026 @02:28PM (#66107154)

    Ok, fine. Do ZFS and make a GPL version that can be included in the kernel and all the distributions. Two can play this game.

    • by ceoyoyo ( 59147 ) on Wednesday April 22, 2026 @02:58PM (#66107242)

      I think the authors of this particular project, open source people and Slashdot have got this entirely backwards.

      The point of open source isn't that the code is super unique and awesome, it's that you can see it and modify it. The whole idea was born out of Stallman's frustration that Xerox wouldn't give him their code so he could add a feature to a buggy printer.

      Sure, someone can take an open source project and clone it. But anybody can also take a closed source project and clone it using the same technology. The AI doesn't need source code; it can work from the compiled version no problem. It's also infinitely patient, so it could write a clone just by interacting with the running system, without access to any code at all.

      ZFS is a bad example. There are already open ZFS clones. The difficulty with ZFS is not that it is closed, but that it is patented.

  • Will the same work for making open source clean room versions of closed source applications? AI is pretty good at disassembly/decompilation.

  • by bill_mcgonigle ( 4333 ) * on Wednesday April 22, 2026 @02:35PM (#66107180) Homepage Journal

    The Chinese Wall legal strategy is to have Team A produce a specification and Team B produce an implementation.

    If these guys can't show a specification they're screwed.

    Claiming there must have been one in abstract Platonic space inside the LLM network black box isn't going to convince a Court.

    So do the work of making an actual specification generator. Then write a coder. It's not impossible. You still won't get updates, fixes, support, community, or features added. The guys who just steal ffmpeg won't even bother. The AGPL haters might bite.

    Also, he seems quite angry.

  • Prove It (Score:4, Interesting)

    by StormReaver ( 59959 ) on Wednesday April 22, 2026 @02:46PM (#66107210)

    This will be believable if they can do the same thing for Closed Source software. If they can't, then they are lying and infringing on copyright. If they can, then they will be the biggest software company in the history of software.

    • by gweihir ( 88907 )

      If they can, then they will be the biggest software company in the history of software.

      Not at all. It starts with the "product" being static. You have no dev team at all and cannot even make small changes. Security, performance, and reliability will suck. The product has no copyright. And proving that this was "clean room" is almost impossible, as that would require a thorough and careful examination of all training data that even remotely looks like code.

      This is really just usable as satire to point out a problem.

  • by Dagmar d'Surreal ( 5939 ) on Wednesday April 22, 2026 @02:51PM (#66107228) Journal

    Good luck getting a judge to agree they had a "clean room" implementation performed by an AI that was trained on the very code it's supposed to be "re-inventing".

    ...and any minute now the same ruling about AI-generated art is likely to come down pertaining to programming, because copyright was meant to provide actual human artists with encouragement and protection for their craft by giving them the exclusive right to exploit their work throughout their lifetime and generally the lifetime of their children. Bots don't get afforded that same protection because they can't starve to death and they can never actually die, and programming is still both science and art (which is the only reason code is copyrightable).

    • by gweihir ( 88907 )

      Since this is satire, that is not a problem.

      But for an actual clean room reimplementation, you need an implementation team that has never looked at the original code. Since a lot of FOSS went into the training data for this "demo", that is very likely not the case, and hence this is not "clean room" at all. Oh, and also note that the product generated this way has no copyright at all ...

      • "that is very likely not the case "

        That is the plaintiff's case to prove. The legal standard is "preponderance of evidence", IIRC. Good luck with that.

    • If you're right, then all open source is safe, but all closed source is up for grabs. That would actually be quite amusing.
  • by SoftwareArtist ( 1472499 ) on Wednesday April 22, 2026 @02:51PM (#66107230)

    I already have a program to do this. It came with my computer. It's called zip. Run it on the source tree of any program and it creates a new file with a specification for the program. Run unzip on the specification file and you get a new source tree, free from any copyrights. Problem solved!
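    For anyone who wants to run the joke, the whole "pipeline" fits in a few lines of standard-library Python (a sketch; the file and directory names are arbitrary):

    import shutil

    def write_specification(source_tree: str) -> str:
        # Compress the source tree into a "specification".
        return shutil.make_archive("specification", "zip", source_tree)

    def implement_from_specification(spec_file: str, out_dir: str) -> None:
        # "Independently recreate" the project from the specification.
        # The output is byte-identical, which is exactly the point.
        shutil.unpack_archive(spec_file, out_dir)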

    You say a compressed version of the source code doesn't count as a specification? What do you think this person's program does? I'm not going to pay him money to find out, but I'd bet the "specification" it produces is nothing like what was used in the IBM case.

    Now he needs to read up on inducement [numberanalytics.com], because he's opening himself up to enormous liability. By explicitly advertising it as a tool to get around copyrights, he's pretty much waived the most common defense in these cases: claiming you didn't intend the tool to be used for that purpose. By charging for it, he's probably ruled out any kind of fair use defense. If anyone actually uses his tool to do what he says, he's personally liable.

  • And that is the people developing and maintaining it. Definitely a commendable satiric effort though that shows we cannot continue the lawlessness we currently have. It will destroy too many things.

    Obviously, on the side of security, performance and stability, these clones will also not be worth much, so the "threat" is probably really small.

  • I've seen workflows where people use an image-description model to create a detailed description and an image model to produce an image. Sometimes it reproduces quite similar versions of their photos, even though the photos are newer than the models. For audio, I've at least seen songs with a similar style. The headline may be a bit provocative, but maybe AI generation really is challenging how sound the concept of "intellectual property" really is.

  • by Tony Isaac ( 1301187 ) on Wednesday April 22, 2026 @03:36PM (#66107324) Homepage

    Anybody can fork an open source repository. But not just anybody can keep it going. LibreOffice survives not because it got its code from OpenOffice, but because of the community that keeps it alive.

    I think we programmers often obsess too much about who can see or get copies of our code, as if that were the magic sauce. It's not. It's the people behind the code that are the magic sauce.

  • The same approach can be (and is being) applied to closed-source software. This can result in more open-source software. It's just a matter of choosing what you want to tackle.

  • 1. Find something popular that's free and open-source
    2. Clone it in order to change the license
    3. Sell it as closed source
    4. Profit

    I do wonder who their customer base is expected to be. People who want to pay a scam artist instead of getting it for free from its developers?

    • by PPH ( 736903 )

      Fine, if you (a meat-sack) do it with a clean room process. But the product of the LLM has no legitimate copyright, and so the "Sell it as closed source" step is in error. Sure, you can sell it. But I can copy it, and there's nothing that can be done.

  • The chances that the AI was trained on any of the open source packages are non-zero. If they publish the packages, I'd suggest that they are opening themselves up to litigation.

  • Not sure I see the point of ripping off an open-source project. Said project remains, and will be significantly cheaper, probably free, compared to any commercial rip-off thereof. While you might be able to copyright the rip-off, you can't patent it, and the open-source version remains under whatever licence it has, so the rest of us can just raise a finger to the rip-off merchant.

  • The best way to protect our open source projects is to make them closed source! Finally, no one can misuse our source code, fork it, or clean-room it, because no one can see it at all. Problem solved!

  • How is this any different from any unimaginative but very elaborate software spec? VLC is arguably the best media player, but it does have some really bizarre quirks. I wouldn't rip off VLC if I wanted to rebuild it; I would simply ask the AI to build a media player with the features I wanted. Problem solved.

    My current software project is FOSS but it's built with AI. It's the same thing, just the other way around. I really don't get the hype.

    This thing is just a very fringe use-case of AI-built software, t

"It's what you learn after you know it all that counts." -- John Wooden

Working...