Artists May 'Poison' AI Models Before Copyright Office Can Issue Guidance (arstechnica.com)

An anonymous reader writes: Artists have spent the past year fighting companies that have been training AI image generators—including popular tools like the impressively photorealistic Midjourney or the ultra-sophisticated DALL-E 3—on their original works without consent or compensation. Now, the United States has promised to finally get serious about addressing the copyright concerns raised by AI, President Joe Biden said in his much-anticipated executive order on AI, which was signed this week. The US Copyright Office had already been seeking public input on AI concerns over the past few months through a comment period ending on November 15. Biden's executive order has clarified that following this comment period, the Copyright Office will publish the results of its study. And then, within 180 days of that publication—or within 270 days of Biden's order, "whichever comes later"—the Copyright Office's director will consult with Biden to "issue recommendations to the President on potential executive actions relating to copyright and AI."

"The recommendations shall address any copyright and related issues discussed in the United States Copyright Office's study, including the scope of protection for works produced using AI and the treatment of copyrighted works in AI training," Biden's order said. That means that potentially within the next six to nine months (or longer), artists may have answers to some of their biggest legal questions, including a clearer understanding of how to protect their works from being used to train AI models. Currently, artists do not have many options to stop AI image makers—which generate images based on user text prompts—from referencing their works. Even companies like OpenAI, which recently started allowing artists to opt out of having works included in AI training data, only allow artists to opt out of future training data. [...] According to The Atlantic, this opt-out process—which requires artists to submit requests for each artwork and could be too cumbersome for many artists to complete—leaves artists stuck with only the option of protecting new works that "they create from here on out." It seems like it's too late to protect any work "already claimed by the machines" in 2023, The Atlantic warned. And this issue clearly affects a lot of people. A spokesperson told The Atlantic that Stability AI alone has fielded "over 160 million opt-out requests in upcoming training." Until federal regulators figure out what rights artists ought to retain as AI technologies rapidly advance, at least one artist—cartoonist and illustrator Sarah Andersen—is advancing a direct copyright infringement claim against Stability AI, maker of Stable Diffusion, another remarkable AI image synthesis tool.

Andersen, whose proposed class action could impact all artists, has about a month to amend her complaint to "plausibly plead that defendants' AI products allow users to create new works by expressly referencing Andersen's works by name," if she wants "the inferences" in her complaint "about how and how much of Andersen's protected content remains in Stable Diffusion or is used by the AI end-products" to "be stronger," a judge recommended. In other words, under current copyright laws, Andersen will likely struggle to win her legal battle if she fails to show the court which specific copyrighted images were used to train AI models and demonstrate that those models used those specific images to spit out art that looks exactly like hers. Citing specific examples will matter, one legal expert told TechCrunch, because arguing that AI tools mimic styles likely won't work—since "style has proven nearly impossible to shield with copyright." Andersen's lawyers told Ars that her case is "complex," but they remain confident that she can win, possibly because, as other experts told The Atlantic, she might be able to show that "generative-AI programs can retain a startling amount of information about an image in their training data—sometimes enough to reproduce it almost perfectly." But she could fail if the court decides that using data to train AI models is fair use of artists' works, a legal question that remains unclear.

  • It seems like a dumb argument to claim that a tool allows new works to be produced, when the real issue is whether that tool can reproduce old works to within epsilon accuracy.
    • Can it though? I think not, perhaps excluding certain ubiquitous images like the Mona Lisa.

      Anyway, copyright per se is certainly not the issue here. Even if a model can be coaxed into reproducing a recognizable image, it's about the least effective and most difficult way to do so - as opposed to, say, "save as..."

      • by narcc ( 412956 )

        I think not, perhaps excluding certain ubiquitous images like the Mona Lisa.

        That's generally the case, as there's not room in these models for more than a few bytes of information from each training image. Still, it is possible to get "memorized" images with less redundancy, though it's vanishingly rare. Link [pdf] [arxiv.org]

        • Claiming the information content to be a few bytes per image does not lead to a believable argument. The analogy is claiming that anyone who owns 1 million pesos must be rich. In both cases, the error is thinking of the absolute number as meaningful by itself. It isn't.

          The information content is a function of the simplicity of the dataset and the appropriateness of the representation. How many bits of information does an image model need to represent the Japanese flag? If the features are disks then a few
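          To make the representation point concrete, here is a minimal sketch (the parametric description and the use of Pillow are illustrative choices; the geometry and colour follow the published flag specification) of how the Japanese flag collapses to a handful of numbers under the right representation - a background colour plus one centered disc - rather than width x height x 3 bytes of pixels:

```python
# The Japanese flag as a parametric description: background colour + one disc.
# With this representation the whole image is a handful of numbers.
from PIL import Image, ImageDraw

W, H = 900, 600                    # official 3:2 aspect ratio
flag = Image.new("RGB", (W, H), "white")
draw = ImageDraw.Draw(flag)
r = int(0.3 * H)                   # disc diameter is 3/5 of the flag height
cx, cy = W // 2, H // 2
draw.ellipse((cx - r, cy - r, cx + r, cy + r), fill=(188, 0, 45))  # crimson disc
flag.save("hinomaru.png")
```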

          • by narcc ( 412956 )

            Claiming the information content to be a few bytes per image does not lead to a believable argument.

            It's basic math: size of model / number of images. This isn't complicated.
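            For what it's worth, here's that division as a back-of-envelope sketch. The figures are rough public estimates (about a billion parameters stored in fp16, a LAION-scale training set of a couple of billion images), not exact values for any particular model:

```python
# Back-of-envelope: average model capacity available per training image.
# All figures are rough public estimates, used only for illustration.
model_params = 1.0e9        # ~1 billion parameters (Stable Diffusion v1 scale)
bytes_per_param = 2         # fp16 weights
training_images = 2.3e9     # LAION-2B-en scale dataset

model_bytes = model_params * bytes_per_param
print(f"{model_bytes / training_images:.2f} bytes of model per training image")
# -> roughly 1 byte/image, the sense in which "a few bytes per image" is meant
```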

            How many bits of information does an image model need to represent the Japanese flag?

            You seem to have some very confused ideas about what these kinds of models encode.

            although I don't think it proves what you claim.

            It does. I expect, however, that you'll need at least some background to understand what you're reading.

            • Your basic math is flawed. As I said in my original comment, the information content implied by "size of model / number of images" is meaningless. Images are correlated in various ways, and you're not taking this into account. You're also not taking into account the representational capacity of the model. And that's just for starters.

              I can see that you don't get my point about features and Japanese flags though. Fair enough, maybe I'll find a better way to explain it sometime, sorry.

              As to the paper, there are

              • by narcc ( 412956 )

                So you're just going to double-down on your ignorance, eh? In that case, there's only one way to reply: You're an idiot. Stop wasting my time. Go do some reading. You're not qualified to have this discussion.

          • That's not how the model is designed. And the paper looked at 350k images in Stable Diffusion's training set that were over-fitted, but still only found 94 images that could be recreated closely enough to match up. Ninety-four grossly overfitted images isn't much considering the data set size. "Vanishingly rare" is a good description of that, and pretty much all that needs to be done to fix even that tiny number is to use the same tech they did to determine which images in the training set are duplicates, and remov

            • Sounds like 94 lawsuits. That isn't a lot?

              • Someone would have to generate one of them and then use it in an infringing manner, and then *they* would be breaking the law (not the AI). Remember, these images all came off the public internet in the first place: just HAVING one isn't infringing any laws, it's all in what you do with it. It's the equivalent of me repainting a copyrighted image: I'm actually allowed to do that, as long as it's not then used in some manner that would infringe copyright. Hang it on my wall in my house in a nice frame? Not b

                • Just because it's on the Internet doesn't mean you're allowed to do whatever you want with it.

                  • Correct, but it does mean you have posted it for public consumption. It's like putting up posters on telephone poles around town: you can't then complain that they were looked at, or complain that "someone has one and I didn't authorize it" (as long as they then don't do anything infringing on copyright).
                    Imagine you are a famous photographer, and you have a PUBLIC (no password) website where you post your photos and offer to license them for commercial use. You don't get to sue the Wayback machine for archi

        • Neural networks don't work like that. When assigning weights, all pixels of all images fed to the network will shape the neurons, so to speak. As such, so long as the images are labelled appropriately and training and transformation appropriately assign weights, and there is enough manual training, then in theory, there is considerably more data than you suggest stored. Bytes isn't really a good quantifier for this though. It's more of an issue of how the overall weights are affected.

          As for poisoning, none of
          • Newbie question: Let's use the clover shape mentioned previously. If the NN has only two examples of clovers, one light green and one dark green, then is the model storing that both colors are valid or the average of the two? If we add in another 10,000 images of clovers, some of which are Kelly green, then how does the model change? Not every image is retained, are they? The NN just uses the additional 10,000 images to refine its internal model of a "clover shape" and expecting to output an exact matc

            • by Rei ( 128717 )

              That's exactly it: it's learning the concept of a clover, not any specific work of a clover. And when generating an image, if you invoke "clover" and the other metadata from one specific image of the 10002 images in the training data (say, "By Artist X", "Ireland", "Field", "Rainbow", or whatnot) - it's not going to generate *that* clover, but rather it's going to use its knowledge of all clovers, along with its knowledge associated with all images associated with the other tags in the metadata - everythin

            • Re:dumb argument (Score:4, Interesting)

              by narcc ( 412956 ) on Saturday November 04, 2023 @08:31AM (#63979246) Journal

              The model isn't storing facts like that. That's just not what they do. Remember that feed-forward NNs, the kind used by these whiz-bang image generators, are just a simple function. All they can do is map inputs to outputs. In terms of computational power, NNs are equivalent to the humble lookup table.

              To get a sense of how NNs work, let's look at a very simple one with just a single input and hidden node. We'll have an input (x), the connection between the two will have a weight (w), and our hidden node will have a bias (b). Imagine making a graph of the output of our hidden node over a range of possible inputs. No matter what values you pick for w and b, you'll get a curve, sigma(wx+b), centered over some value, more or less flat or steep, determined by w and b.

              Now, you might have heard that NNs are 'universal function approximators' or that they can 'approximate any function to an arbitrary degree of precision'. This is true, though to get more interesting output, we'll need to add more nodes. Right now, all we can do is push our curve around and make it more or less steep. To get it to change direction, we'll need to add another node. To our example, add a second hidden node and an output node that combines the weighted output of our hidden nodes together. What does our graph look like now? We can move it around a bit more, but what's really interesting is that we can change direction one time. We can make a bump or a divot.

              Add another hidden node and we can get our curve to change direction a second time. Add a fourth and we can change direction again, letting us make curves with two bumps. Once you spot the pattern, you can do something cool. Pick any function with a single input and single output that you'd like and, graphed over some range, count the number of times that the curve changes direction. That's the number of hidden nodes you'd need to make a NN that approximates that function. If you haven't already, write a quick little program to prove this to yourself, adjusting the bias and weights to make all sorts of curves.
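              If you'd rather not start from scratch, here's a minimal sketch of exactly this construction: a one-input, one-output network with a single hidden layer of sigmoids, where hand-picked (purely illustrative) weights give one bump with two hidden nodes and two bumps with four:

```python
# Hand-rolled 1-input / 1-output net with one hidden layer of sigmoids.
# Each extra hidden node lets the output curve change direction once more.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def tiny_net(x, w_in, b, w_out):
    hidden = sigmoid(np.outer(x, w_in) + b)  # shape: (len(x), n_hidden)
    return hidden @ w_out                    # weighted sum of hidden outputs

x = np.linspace(-5, 5, 1000)

# Two hidden nodes: one bump (one change of direction).
bump = tiny_net(x, np.array([4.0, 4.0]), np.array([4.0, -4.0]),
                np.array([1.0, -1.0]))

# Four hidden nodes: two bumps (three changes of direction).
two_bumps = tiny_net(x, np.array([4.0, 4.0, 4.0, 4.0]),
                     np.array([12.0, 4.0, -4.0, -12.0]),
                     np.array([1.0, -1.0, 1.0, -1.0]))

def direction_changes(y):
    slopes = np.sign(np.diff(y))
    return int(np.sum(slopes[1:] != slopes[:-1]))

print(direction_changes(bump), direction_changes(two_bumps))  # expect: 1 3
```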

              Whatever our selected function represents, that's something that we are imposing on it. In your clover model, there is no essential cloverness or anything like that. What would that even mean? It's just a big function that maps inputs to outputs that we hope correlate in some way with whatever it is that we'll need to produce recognizable clovers. As for what is encoded, you'll want to take a look at how images are generated and what it means to train the model on an image. Given an input, if we can compute how much our output differs from ideal, we can nudge our model closer to what we want. For our image generator, we're trying to predict noise so that we can generate an image by removing 'noise' from static until an image emerges. Training on an image, then, means adding noise and trying to predict what was added so that it can be removed.

              How the model changes as you train it isn't something that can meaningfully be put into relatable terms. While we're obviously retaining some information about our training images, it would be ridiculous to say that we're keeping copies, particularly given how little room there generally is in these models for unique information about each image.
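              And for the 'add noise, predict the noise' training described above, a schematic DDPM-style training step looks roughly like the sketch below. The model, the linear noise schedule, and the optimizer are generic stand-ins for illustration, not any particular product's training code:

```python
# Schematic diffusion training step: corrupt an image with known noise at a
# random timestep, then train the network to predict the noise that was added.
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)               # noise schedule
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)  # fraction of signal kept at step t

def training_step(model, images, optimizer):
    b = images.shape[0]
    t = torch.randint(0, T, (b,))                   # random timestep per image
    noise = torch.randn_like(images)                # the noise we try to predict
    a_bar = alphas_cumprod[t].view(b, 1, 1, 1)
    noisy = a_bar.sqrt() * images + (1 - a_bar).sqrt() * noise
    pred = model(noisy, t)                          # model guesses the added noise
    loss = torch.nn.functional.mse_loss(pred, noise)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```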

          • by narcc ( 412956 )

            Neural networks don't work like that.

            Yes they do. Do you think they're magic or something?

            then in theory, there is considerably more data than you suggest stored.

            That is very obviously impossible. Are you just not familiar with information theory at all?

            Bytes isn't really a good quantifier for this though.

            Would you prefer 'libraries of congress'?

          • by Rei ( 128717 )

            The way so much data is "encoded" is by learning concepts, not images. You cannot learn images at a couple bytes per image, but you can learn the underlying unifying concepts behind them and tie those to labels. Which is what diffusion models do. It's the inverse of image recognition; you can visually see how data is encoded into such models here [distill.pub].

            The issue is that when you encode conceptual data rather than direct graphical data, you're no longer creating specific works. If Artist X painted "Warrior rid

    • It's not really a full reproduction that's at hand. It's a reproduction of the style a certain artist has.

      Some artists have been very broad in style, so cloning that style would be hard - Leonardo da Vinci, for example. Others, like H.P. Lovecraft, have been very specific in style, which can be useful for image generation in that style.

      When generating new images the best result comes from mixing styles.

      Based on the ability to cook some new work that looks like the works of an existing artist I do see that it's maybe good that AI

  • Presumably, these artists also wouldn't want people looking at their works to learn how to draw or paint, etc?

    • by taustin ( 171655 )

      And presumably, they have written permission from every artist who influenced their own work.

      • by Rei ( 128717 )

        The funny thing is, the artists I know commonly look at a number of "inspiration images" (usually copyrighted images) directly while they're painting, and their outputs look a lot more like their inspiration images than anything AI models generate relative to their training data.

  • Then you are no longer in control of your 'property' and have no recourse to how it may be used, or even if it is used. Controlled ownership of anything digital is completely against the design of making something digital. You can make like for like copies indefinitely; stop trying to treat it like real physical property.

    When we can as easily duplicate homes, cars, and that phone in your pocket (or maybe glued to your face) then ownership of unique property will also be a thing of the past. We will have to completely rethink what it is to have ownership of something, I don't think the way we are doing it with the internet is going so well.
    • by Shakrai ( 717556 )

      Intellectual property too is an idea that really only works, or perhaps even just has any merit, in a capitalist society.

      Did you watch The Orville? They have a better explanation for the moneyless utopia than Star Trek ever came up with. To paraphrase, "Human ambition didn't go away, the predominant currency just became reputation."

      If replicators became a thing tomorrow and money went the way of the dodo creatives would still want (and deserve IMHO) attribution for their work. How does that work without something resembling copyright?

      • > How does that work without something resembling copyright?

        You mean like the Fashion Industry [ted.com] ?

        Creativity and Reputation become the currency.

        i.e. The evolution of money is always the same:

        * Barter
        * Token for things
        * Token for time, knowledge, and skill
        * Energy
        * Creativity, Honor, Reputation

    • by LainTouko ( 926420 ) on Friday November 03, 2023 @07:42PM (#63978350)

      It only makes sense in a capitalist society because copyright and patents are the transformation of creativity and invention into capital. They are not solutions to the problem "how can we get people to create and invent stuff", they are solutions to the problem "how can the rich own culture and science itself, rather than just the machinery it drives?" "How can creativity and invention function as capital accumulation?" A book is personal property, it is something you possess for its use-value. A printing press, and copyright on the book are both private property, means of production, they're both things you need in order to make books. (The difference being that a printing press is a natural requirement, whilst copyright is an artificially imposed one.) Capitalism is, at heart, the system by which means of production are owned as capital by capitalists, who employ workers for a wage and take the products made with them. Without that, there's nothing for these types of "intellectual property" to slot into.

      The difference between the past and now being that production which used to require specialised equipment and a non-trivial amount of labour can now be done by millions of individuals almost effortlessly. Copyright is now in many cases the only type of capital which meaningfully exists. It's the only thing keeping the production of copies of computer files capitalist (the capital-owner tells you what you can and cannot make and gets the product) rather than post-capitalist (you can make what you want and keep it, because you control the means of production you use.)

      Which is more or less equivalent to observing "we've made this incredible copying machine, which is getting more and more powerful, but we can't unleash its potential because the economic system has not kept up so it isn't allowed."

    • by znrt ( 2424692 )

      Then you are no longer in control of your 'property' and have no recourse to how it may be used, or even if it is used.

      will nobody think of the nft? bunch of insensitive clods ...

      now seriously, this isn't the case. the whole copyright system rests on the idea of controlling how "intellectual property" is used, and "digital" is only different in that the medium is much more accessible (less physical barriers), and the fundamental irrationality becomes much more apparent. it is though the same asinine idea:

      stop trying to treat it like real physical property.

      exactly. well, this fundamental aberration is the cornerstone of the whole intellectual property rigging.

      then again this

    • by tlhIngan ( 30335 ) <slashdot@worf . n et> on Saturday November 04, 2023 @02:52AM (#63978950)

      Then you are no longer in control of your 'property' and have no recourse to how it may be used, or even if it is used. Controlled ownership of anything digital is completely against the design of making something digital. You can make like for like copies indefinitely; stop trying to treat it like real physical property.

      When we can as easily duplicate homes, cars, and that phone in your pocket (or maybe glued to your face) then ownership of unique property will also be a thing of the past. We will have to completely rethink what it is to have ownership of something, I don't think the way we are doing it with the internet is going so well.

      Intellectual property too is an idea that really only works, or perhaps even just has any merit, in a capitalist society.

      OK, that means I can go and use Linux in my product as I wish, violating the GPL as well. Because it was published digitally and thus no one has control over it.

      I mean, why bother with things like the AGPL when anyone can copy and use the code in their products? Why even bother? Why not just admit F/OSS is a failed experiment that can not work because people will steal code?

      What is keeping Microsoft from replacing the Windows kernel with the Linux kernel? They don't need to release the source code to their version; Linus and everyone gave up control of it.

      If you say "the GPL keeps you from doing it" no, it doesn't. If there is no copyright, the GPL is worthless. It needs copyright, as does every open and free software license out there.

      You see, as a user, you don't have to ever agree to the GPL. You don't. No matter what anyone says. What happens? The software falls under standard copyright coverage, aka "all rights reserved". That limits your ability to copy and distribute the software - but hey, as a user, you don't intend to do that.

      But if you develop something using that software, you must agree to the GPL, otherwise you can't distribute it. Modifying that software creates a derivative work, which is not allowed under copyright law without permission. So you could go to the copyright holder and get permission. Or, the software has a license attached like the GPL, which gives you MORE rights, in exchange for obeying the license - as in, create your derivative work software as long as you also agree to these terms and conditions.

      Other copyleft schemes also rely on copyright - Creative Commons often specifies what you can do with the work and goes above and beyond what copyright law would give you if you didn't agree to it.

      But if you take away copyright protection, all that flies out the window - why would I use the added rights of the GPL if I'm no longer bound by copyright law preventing me from making derivative works?

    • Then you are no longer in control of your 'property' and have no recourse to how it may be used, or even if it is used. Controlled ownership of anything digital is completely against the design of making something digital.

      Does that include open-source licenses on downloaded source code? The authors should have no say? Some company could come along and incorporate their code into their own product without even credit? Or must they go back to distributing software on DVDs to have rights?

  • Do those 2 cancel out with net zero poison? How about everything is just a black box with a warning sign so we can't see anything?

    • by Rei ( 128717 )

      The thing is that these poisoning attacks are garbage. They're brittle, easily detected, easily removed, don't affect preexisting datasets, and you can even make generalizable countermeasures to them (TL/DR: you train a model on a preexisting dataset, then test the impact of a new image on preexisting concepts. If the new image starts making radical swings away from preexisting concepts and towards others, it's poisoned).

      All using them does is hurt your quality. All those artists using Glaze did nothing t

  • "if she fails to show the court which specific copyrighted images were used to train AI models and demonstrate that those models used those specific images to spit out art that looks exactly like hers"

    This seems like a difficult thing to establish. How will she be able to know which images were ingested as training data and then be able to get the AI to emit identical art? I guess it could be possible to get a list of the input data set through discovery, but how to sift through zillions of images to identi

    • Unless it's a modified image, comparing hashes would make it pretty easy to find. Even if it has been modified, I wouldn't be surprised if there were already forensic tools that use special algorithms to try to create shortlists of possible matches. The bigger issue is always the company lying about the training set. There's no good way to reverse engineer these types of AI. Even more complex is that any poisoning could be parodied by an actual artist who is selling their work to be used for training data wh
      • by Shakrai ( 717556 )

        Unless it's a modified image, comparing hashes would make it pretty easy to find. Even if it has been modified, I wouldn't be surprised if there were already forensic tools that use special algorithms to try to create shortlists of possible matches.

        Technology that does exactly that already exists for CSAM detection. It's far from perfect [nytimes.com] but the general principles should be the same: you don't need an exact file-level hash match to detect known CSAM, and algorithms can detect novel CSAM (and non-CSAM, as in the linked story).
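        As a rough illustration of why an exact file-level hash isn't needed, here is a minimal average-hash ("aHash") sketch. Real perceptual-hashing systems are far more robust, and the file names below are hypothetical:

```python
# Minimal average hash: shrink, grayscale, threshold on the mean, compare bits.
# A small Hamming distance between hashes suggests near-duplicate images, even
# after re-encoding or resizing. Requires Pillow (pip install Pillow).
from PIL import Image

def ahash(path, size=8):
    img = Image.open(path).convert("L").resize((size, size))
    pixels = list(img.getdata())
    avg = sum(pixels) / len(pixels)
    return [1 if p > avg else 0 for p in pixels]

def hamming(h1, h2):
    return sum(a != b for a, b in zip(h1, h2))

# Hypothetical usage:
# print(hamming(ahash("original.png"), ahash("candidate.jpg")))
```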

      • It may be that the court would be satisfied if she could just show where the AI produced a reasonable facsimile of her work, in which case it wouldn't have to be digitally equivalent. It would be a shame if all art could be ingested regardless of copyright and used to make close derivatives without any attribution or compensation.

        • by Rei ( 128717 )

          But she should also have to show how many attempts she took to get it to produce a reasonable facsimile of her work. I mean, if she tasked a cluster of 10000 GPUs to churn out images as fast as possible nonstop for weeks on end until one hit a reasonable similarity metric, then you're getting into the "Million Monkeys With A Million Typewriters" problem.

    • Some models publish their training sets, like they say "we used LAION-2B-EN" or whatever.
      In this case, the odds that one of her images was grossly overfitted in training are pretty much nil, so even if her paintings are in a training set, it is unlikely she could get it to recreate one.

  • by NimbleSquirrel ( 587564 ) on Friday November 03, 2023 @07:40PM (#63978338)

    I have found a bit more information on some of the methods of 'poisoning': tools called 'Glaze' and 'Nightshade' [technologyreview.com]

    My guess is that, in addition to false image metadata, they use a form of steganography to embed a picture within a picture. This is done by adjusting the mathematical relationship between the bits of each pixel and their adjacent pixels. What may appear to be a solid block of color to the human eye could be an arrangement of pixels with subtly different values. While we see the image, or lack thereof, the AI training sees the mathematical relationship between pixels. So I can see how creating a strong mathematical relationship within pixel information could impact AI training. Steganography is not new, but it is interesting to see this being positioned as a method against AI.

    However, I don't see how this could survive things that would destroy subtle relationships between pixel values: running a filter that adds low level random bit noise; reducing the bit depth of an image; or re-encoding with a significantly lossy codec (like JPEG).
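    To illustrate that fragility (and only the fragility - the reply below notes that Glaze and Nightshade do not actually work via steganography), here is a minimal least-significant-bit embedding sketch showing how even +/-1 pixel noise wipes out a hidden payload; the arrays are synthetic stand-ins for real images:

```python
# LSB steganography on a synthetic "image": hide one bit per pixel in the
# least-significant bit, then show that tiny random noise destroys the payload.
import numpy as np

rng = np.random.default_rng(0)
cover = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)   # stand-in image
payload = rng.integers(0, 2, size=(64, 64), dtype=np.uint8)   # hidden bits

stego = (cover & 0xFE) | payload                              # write payload into the LSBs
print("clean recovery:", np.mean((stego & 1) == payload))     # 1.0

noisy = np.clip(stego.astype(int) + rng.integers(-1, 2, size=stego.shape), 0, 255)
print("after +/-1 noise:", np.mean((noisy & 1) == payload))   # ~1/3, payload gone
```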

    I think some people will see it as a challenge to break this, much like the constant battles seen with other DRM systems. However, a tool that actively seeks to poison an AI dataset is only going to make people far more motivated to break it or work around it. I don't think it would be long before someone creates a tool that can detect these images. Then it could either remove the image from the training dataset (which I can see the artists preferring) or alter either the image or the training process so that the steganography is negated.

    • by Rei ( 128717 ) on Saturday November 04, 2023 @06:31AM (#63979136) Homepage

      First off, these attacks are pure hype [slashdot.org], and secondly, that's not how they work.

        * They don't use false metadata.
        * They don't use steganography.

      What they do is take a diffusion training algorithm and a model of human visual perception, then optimize against both at once: HIGH diffusion shift toward the wrong target, with LOW visual deviation from the source image. This is not simply things like "solid blocks of colour vs. alternating pixels", but a whole slew of things that humans and AI may weigh differently.

      Just to pick a random example: there are two main ways you can define a boundary line. The more obvious one is a difference in colour. But another is a difference in noise frequencies. For example, a bokehed background will have only low-frequency noise, while the foreground will often also contain high-frequency noise, so you can compare with-high-frequency noise vs. without (so a poison attack may actually *make* solid blocks of colour that used to have more pixel variation). If the brain preferred to determine boundary lines more based on colour than noise, or vice versa, and the AI found it easier to do the opposite, then that represents an avenue for attack. And there are many such examples.

      But you can probably already see problems with this approach. First off, you have to pick a specific diffusion algorithm, but they're not all the same, and they're constantly changing. Secondly, diffusion algorithms tend to get closer to human visual similarity over time, so you have to do more damage to your image relative to the amount of poisoning you achieve. Third, this maximization process is slow. Fourth, everything in the link above - the attacks are brittle, easily detected, easily removed, don't affect preexisting datasets, and you can even make generalizable countermeasures to them.
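      Schematically, that optimization looks something like the sketch below: push the image's model-visible features toward a wrong target while keeping the pixel-space change small. The encoder, the loss weighting, and the perturbation bound are illustrative assumptions, not the actual code of any specific attack:

```python
# Schematic poisoning/cloaking optimization: perturb an image so that some
# model-visible representation moves toward a wrong target concept while the
# perturbation itself stays visually small.
import torch

def poison(image, wrong_target_feats, encoder, steps=200, lr=0.01, eps=0.03):
    delta = torch.zeros_like(image, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        perturbed = (image + delta).clamp(0, 1)
        feats = encoder(perturbed)
        # Pull the model-visible representation toward the wrong concept ...
        feature_loss = torch.nn.functional.mse_loss(feats, wrong_target_feats)
        # ... while penalizing anything a human would notice in pixel space.
        visual_loss = torch.nn.functional.mse_loss(perturbed, image)
        loss = feature_loss + 10.0 * visual_loss
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():                 # crude stand-in for a perceptual bound
            delta.clamp_(-eps, eps)
    return (image + delta).detach().clamp(0, 1)
```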

  • If someone copies your drawings, it's copyright infringement. If they learn how to draw from your work, as an individual, that's fair use.
    If you take the copyrighted works of a whole bunch of people, put them into a book, and use that book as material in a class you sell to teach people how to draw, i.e. profit off their works, you're now infringing their copyright.

    One of the differences is that we don't restrict rights to what human eyes can see. Individually people have different capacities and understanding, these are machines t

    • It's not fair use - fair use is a defense to copyright infringement. If you've merely looked at a work to learn how to paint, you've either created your own work or created a derivative work. If it's a derivative work, no court will give the slightest shit whether you were learning how to paint.
    • by f00zbll ( 526151 )
      it's not that simple. The models memorize a significant amount of the data they were trained on. So they are violating copyright, since they have a copy of the original artwork in the neural network weights. To get around this, the technology has to progress to the point where engineers can clearly prove the model didn't memorize the training data. We still can't interpret the weights. Don't take my word for it; go search arxiv.org for papers.
      • by taustin ( 171655 )

        It's not that simple, either. The legal question is whether the output is derivative (which requires permission) or transformative (which does not). Google won their book scanning lawsuit at trial because scanning and OCRing millions of books and keeping full, verbatim copies to produce a searchable full-text index was deemed, by the court, to be transformative. They took the copyrighted works, and turned them into something else.

        I suspect that case will be cited by the defense, and not by the prosecution.

        • That was a terrible ruling and a great example of how money=power in courts today.

          It was copyright violation on a grand scale. They didn't transform anything. By the same logic, if you ran a software or music piracy website with an index you are legally ok because you transformed the original works. It's a nonsense argument.

          • by taustin ( 171655 )

            They weren't serving complete works, only short excerpts (which is specifically allowed under fair use). So your analogy is bullshit.

            Good ruling or bad it's the law of the land right now. Don't like it? Write your congressman, and tell him to change it. Congress can do that, you know. But you won't bother, because you don't give a shit about copyright, only about having something to whine about.

      • by wfj2fd ( 4643467 )

        To get around this, the technology has to progress to the point where engineers can clearly prove the model didn't memorize the training data

        Actually, the proof needs to be made by the artists, who are the ones making the claim/assertion that the models are memorizing the images.

    • One of the differences is that we don't restrict rights to what human eyes can see.

      For stills the standard for copyright is recognizably copied elements... recognizable to a human.

  • Article I, Section 8, Clause 8 of the US Constitution specifically empowers Congress to protect original works and inventions through copyright and patents. The AI industry has perverted Asimov's vision of an omniscient AI caring for humanity into Napster.

    Generative AI has appropriated trillions of dollars of intellectual property without any compensation to rights holders. Don't let the thieves win this one. It will be the end of civilization and the world economy if they remain unchecked.

    Leave your comments a

    • by JMZero ( 449047 )

      This won't work... you have to know that, right? This technology is fully out of the barn, and will just get better over time.

      Like.. yeah OK, you could stop US companies from publicly training off random images. You could sue the current crop of AI companies into oblivion for their sins of trying to teach computers how to draw and talk. Sure. But that isn't going to make a million copies of Stable Diffusion, installed and working on individual computers around the world, disappear. Or stop people aroun

      • I'd agree that it's hard to put this back in the box.

        But it's probably going to hollow out jobs, not make lots of new ones any time soon. Automation put massive amounts of people out of work in industry, and those jobs never came back. And now it looks like that will happen in a lot of white-collar jobs too. If you work in tech or elsewhere, you'd better have specialized skills, because as much as you rely on AI, you're training your replacement.

        That said, for the sake of this article on art. It seems to me

      • People said Napster was here to stay, and the music industry sued it into oblivion. Content IP holders outnumber AI thieves by orders of magnitude. Once George R.R. Martin and John Grisham win their suits, expect trillions of dollars of liability for the industry.

        • Yeah Napster died and then piracy ended. Right? Of course not.

          Like... If you read my comment, you'll notice I agree OpenAI might get shut down. But then you'll notice the rest of my comment, where I argue why that isn't probably going to change much (or at least not for the better).

    • AI didn't appropriate anything. It learned how to make images, that's all.
      Welcome to the future, it's sort of cool.

    • At least you can say you tried.

  • If they do this and it has no noticeable effect, it will weaken the basis for their objections and legal actions by showing that they were over-estimating the AI's reliance on their particular artwork.
  • For the gen-AI to work with. However, if artists start to poison new images, it's possible that gen-AI might not be able to generate in the new styles of art... until someone open-sources their art, that is.
  • by a5y ( 938871 ) on Saturday November 04, 2023 @09:34AM (#63979350)

    All arguments for government non-interference in matters of training LLMs on the unauthorised and uncompensated work of others ignore the fact that those LLMs cannot operate WITHOUT that work and that in operating those LLMs to provide a service they undermine the ability of the creators of original works to earn a living making them.

    They run, they interfere with planting the seedcorn they depend on, and then there's a failed-harvest problem, because without novel input of known high quality, the process cannot hope to create novel output of high quality.

    Maximalist positions on LLMs (which maximalists will always call AI because if you're not hyping - and scamming - you're not trying as hard as you can) are inherently unsustainable business practices. They're a foreseeable rugpull.

    No amount of "but artists copy other artists work too!" whataboutism addresses the fact that artists aren't a scaleable threat to other artists livelyhood.

    • All arguments for government non-interference in matters of training LLMs on the unauthorised and uncompensated work of others ignore the fact that those LLMs cannot operate WITHOUT that work and that in operating those LLMs to provide a service they undermine the ability of the creators of original works to earn a living making them.

      What else is new? People routinely spend their entire professional lives working only to have the rug pulled from under them as their job is rendered obsolete by change. That technology is disruptive and routinely puts people out of work does not automatically justify enlisting the government in a futile and pointless attempt at picking winners and losers.

      Nobody is entitled to control what can be learned from others. Copyrights don't work that way and shouldn't work that way.

      Given the proliferation of hi

  • Human art students learn by examining existing works all the time. It's about the only way to do it. This has been the accepted norm throughout the history of art.

    If you argue that AI shouldn't be allowed to learn from art, you argue that nobody with the goal of being an artist should be allowed to look at art, or conversely, if you have ever looked at art, you should not be permitted to create art. I'm guessing the latter would fully end the career of anyone complaining about AI.
