New Data Poisoning Tool Lets Artists Fight Back Against Generative AI (technologyreview.com) 173

An anonymous reader quotes a report from MIT Technology Review: A new tool lets artists add invisible changes to the pixels in their art before they upload it online so that if it's scraped into an AI training set, it can cause the resulting model to break in chaotic and unpredictable ways. The tool, called Nightshade, is intended as a way to fight back against AI companies that use artists' work to train their models without the creator's permission. Using it to "poison" this training data could damage future iterations of image-generating AI models, such as DALL-E, Midjourney, and Stable Diffusion, by rendering some of their outputs useless -- dogs become cats, cars become cows, and so forth. MIT Technology Review got an exclusive preview of the research, which has been submitted for peer review at computer security conference Usenix.

AI companies such as OpenAI, Meta, Google, and Stability AI are facing a slew of lawsuits from artists who claim that their copyrighted material and personal information was scraped without consent or compensation. Ben Zhao, a professor at the University of Chicago, who led the team that created Nightshade, says the hope is that it will help tip the power balance back from AI companies towards artists, by creating a powerful deterrent against disrespecting artists' copyright and intellectual property.

Zhao's team also developed Glaze, a tool that allows artists to "mask" their own personal style to prevent it from being scraped by AI companies. It works in a similar way to Nightshade: by changing the pixels of images in subtle ways that are invisible to the human eye but manipulate machine-learning models to interpret the image as something different from what it actually shows. The team intends to integrate Nightshade into Glaze, and artists can choose whether they want to use the data-poisoning tool or not. The team is also making Nightshade open source, which would allow others to tinker with it and make their own versions. The more people use it and make their own versions of it, the more powerful the tool becomes, Zhao says. The data sets for large AI models can consist of billions of images, so the more poisoned images can be scraped into the model, the more damage the technique will cause.

Nightshade exploits a security vulnerability in generative AI models, one arising from the fact that they are trained on vast amounts of data -- in this case, images that have been hoovered from the internet. Nightshade messes with those images. Artists who want to upload their work online but don't want their images to be scraped by AI companies can upload them to Glaze and choose to mask it with an art style different from theirs. They can then also opt to use Nightshade. Once AI developers scrape the internet to get more data to tweak an existing AI model or build a new one, these poisoned samples make their way into the model's data set and cause it to malfunction. Poisoned data samples can manipulate models into learning, for example, that images of hats are cakes, and images of handbags are toasters. The poisoned data is very difficult to remove, as it requires tech companies to painstakingly find and delete each corrupted sample.
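Conceptually, this class of attack optimizes a small, bounded pixel perturbation so that a feature extractor maps the image close to a different concept than the one a human sees. The sketch below illustrates that general idea against an off-the-shelf ResNet backbone; it is not Nightshade's actual algorithm, and the file names, perturbation budget, loss, and step count are illustrative assumptions only.

# Illustrative sketch only -- NOT Nightshade's algorithm. Nudge a "dog" image,
# within a small pixel budget, so a stand-in feature extractor maps it close
# to a "cat" image while the change stays hard for a human to see.
import torch
import torchvision.models as models
import torchvision.transforms.functional as TF
from PIL import Image

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()
backbone = torch.nn.Sequential(*list(model.children())[:-1])   # drop the classifier head

def features(x):
    return torch.flatten(backbone(x), 1)

def load(path):                                                # hypothetical file names
    img = Image.open(path).convert("RGB").resize((224, 224))
    return TF.to_tensor(img).unsqueeze(0)

dog, cat = load("dog.jpg"), load("cat.jpg")
with torch.no_grad():
    target = features(cat)                                     # the "wrong" concept to push toward

delta = torch.zeros_like(dog, requires_grad=True)              # the poison perturbation
opt = torch.optim.Adam([delta], lr=1e-2)
eps = 8 / 255                                                  # assumed perturbation budget

for _ in range(200):
    opt.zero_grad()
    loss = torch.nn.functional.mse_loss(features((dog + delta).clamp(0, 1)), target)
    loss.backward()
    opt.step()
    delta.data.clamp_(-eps, eps)                               # keep the change imperceptibly small

poisoned = (dog + delta).clamp(0, 1)                           # still looks like a dog to a human

Training a text-to-image model on many such images, still captioned as "dog," is what nudges it toward the dogs-become-cats failures described above.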
Nightshade is going to make AI companies think twice, "because they have the possibility of destroying their entire model by taking our work without our consent," says Eva Toorenent, an illustrator and artist who has used Glaze.

"I'm just really grateful that we have a tool that can help return the power back to the artists for their own work," added Autumn Beverly, another artist.
  • by aduxorth ( 450321 ) on Monday October 23, 2023 @11:36PM (#63947969)

    Great! Another way to piss off our AI overlords... Intentional poisoning, I am sure that will go down great in its mindset.

    • by YetAnotherDrew ( 664604 ) on Monday October 23, 2023 @11:51PM (#63947995)
      Yes, don't fight back. Defending yourself might irritate them. And we can't have that. They're overlords!
      • by ls671 ( 1122017 ) on Tuesday October 24, 2023 @01:19AM (#63948093) Homepage

        I must have been using data poisoning since 1995, especially for pseudo-surveys. Most of the accounts I have with websites or whatnot are populated with silly fake data, names, etc., and a dedicated email address for each account.

        • by saloomy ( 2817221 ) on Tuesday October 24, 2023 @02:50AM (#63948219)
          If the changes in the images are invisible, won't AI represent them similarly? Also, this whole "my stuff is being used to train AI" horseshit is nuts. Every author whose work you've enjoyed, from JK Rowling, to George RR Martin, to Tolkien, to Frank Herbert, Isaac Asimov, etc... all trained on the works of people prior. You think they learned the story arcs, character development, symbology, and nuance that made them great authors from birthright, without ever reading novels and studying? Why should AI be under a handicap that no human learner is under? If my child reads their books and uses that as insight and inspiration to craft creative works that follow the same patterns, that is the same thing as an AI learning from similar inputs.
          • But that's the thing; humans can still see, watch, listen to, & read creative works. Glaze & Nightshade don't stop them. AI isn't seeing, watching, listening, & reading in the sense that we understand, or it wouldn't be having these issues with small changes in the images that humans don't notice, so it's clearly something different.
            • by Vintermann ( 400722 ) on Tuesday October 24, 2023 @08:57AM (#63948695) Homepage

              It's all fun and games until the models learn to poison biological neural networks in the same way. You thought you were simply looking at nice pictures of someone's gardening project, then five minutes later suddenly you decide to run out into the street with no pants for no discernible reason.

            • by Rei ( 128717 ) on Tuesday October 24, 2023 @09:42AM (#63948783) Homepage

              Except it doesn't.

              1) Here's how effective their last tool (Glaze) was: garbage [reddit.com].

              2) In particular, they claim it's cropping and scaling invariant, except that it very demonstrably isn't. And cropping and scaling are the very first steps of training.

              3) They made this tool because just the release of a new version of Stable Diffusion (SDXL) broke their last one [github.com] - if you can ever claim that the last one worked at all.

              4) Creating new corrupt images has no impact on *existing datasets*, which again, *already exist*.

              5) Trainers are not morons who just blindly use all data and weight it all the same. If models inclusive of certain data produce worse results than ones not inclusive of it, *they just toss that data*. They don't even have to understand *why* it's bad.

              6) It is vastly easier to defeat attacks than create and mass deploy them. For example, 16 lines of python [github.com] (a generic sketch of the flavor follows this list).

              7) Indeed, there are easy inherent defeats to these types of attacks, because if something is corrupting, then you can inherently test it just by seeing if it has a corrupting impact on known concepts, which is a quick, easy test [arxiv.org]. So you don't even have to come up with a defeat for a specific attack, because there are inherent defeats to all attacks.
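              As a generic illustration of how cheap such a defeat can be (this is not the linked 16-line script; the resample size, blur radius, and JPEG quality below are arbitrary assumptions), a short preprocessing pass that resamples, lightly blurs, and lossily re-encodes an image disturbs most pixel-level adversarial patterns:

              # Generic "launder the image" pass -- not the linked script; the parameters
              # are guesses. Pixel-level adversarial noise rarely survives resampling plus
              # lossy re-encoding.
              from PIL import Image, ImageFilter

              def launder(path_in, path_out, size=512):
                  img = Image.open(path_in).convert("RGB")
                  img = img.resize((size, size))                          # bicubic resample by default
                  img = img.filter(ImageFilter.GaussianBlur(radius=0.5))  # mild blur
                  img.save(path_out, "JPEG", quality=85)                  # lossy re-encode

              launder("scraped_artwork.png", "cleaned.jpg")               # hypothetical file names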

              By all means, run these (very slow) algorithms on your image if it tickles your fancy. All you're doing is hurting your quality.

              It's also worth noting that the team behind this are ideologues. They're releasing press releases before their work even goes into preprint, let alone peer review. They also use loaded language like "theft" and "pilfered" and the like (for the record, most copyright attorneys I've seen weigh in are strongly in agreement that using copyrighted data in AI training is perfectly legal under fair use, on the same grounds Google uses copyrighted data for in developing its services - indeed, trainers could arguably go after the authors for libel). They clearly have a bone to pick, but the reality is, they have no power. Even if every new image added to the internet had some flawless, irremovable, processing-invariant, undetectable adversarial coding (which as mentioned, is literally impossible), they still wouldn't stop the advancement of AI diffusion models on preexisting datasets, which contain billions of images.

              • Even if every new image added to the internet had some flawless, irremovable, processing-invariant, undetectable adversarial coding (which as mentioned, is literally impossible), they still wouldn't stop the advancement of AI diffusion models on preexisting datasets, which contain billions of images.

                Even if you could somehow retroactively embed corruption into all existing images, AI training can adapt. Is the goal to break the way AI works so it stops out-competing you, or just to make sure your data isn't in its training mix? Seems like the goal is the former, and sorry. Math works, and works well.

              • 5) Trainers are not morons who just blindly use all data and weight it all the same. If models inclusive of certain data produce worse results than ones not inclusive of it, *they just toss that data*. They don't even have to understand *why* it's bad.

                This point actually argues that the tool is not "garbage": it would be effective with respect to the artist's wish that their art not be part of AI training.

            • AI isn't seeing, watching, listening, & reading in the sense that we understand

              Yet. - FTFY

          • by Registered Coward v2 ( 447531 ) on Tuesday October 24, 2023 @07:03AM (#63948493)

            If the changes in the images are invisible, won't AI represent them similarly? Also, this whole "my stuff is being used to train AI" horseshit is nuts. Every author whose work you've enjoyed, from JK Rowling, to George RR Martin, to Tolkien, to Frank Herbert, Isaac Asimov, etc... all trained on the works of people prior. You think they learned the story arcs, character development, symbology, and nuance that made them great authors from birthright, without ever reading novels and studying? Why should AI be under a handicap that no human learner is under? If my child reads their books and uses that as insight and inspiration to craft creative works that follow the same patterns, that is the same thing as an AI learning from similar inputs.

            As you point out, a human uses insight and inspiration to create a work; all AI does is an analysis, deciding what is likely to appear after a word and filling in the blank. A person can read Save the Cat and write a story even if they haven't read many, and if they are really gifted, actually write a good one. AI, absent lots of training data to build on, can't, and that's the difference. AI isn't being creative; all it's doing is reassembling existing works into something similar.

            • Repeating it won't make it true.

              AI models need more data because they're more general learners than humans. They have fewer built-in priors about what they should be doing, and they could potentially learn all sorts of patterns that humans can't, so naturally it takes more data to constrain them to one particular pattern.

              We make them more powerful by making them less general. Making them worse at fitting all the patterns we won't need is a fine price to pay for making them better at the patterns we do need (In

              • AI models need more data because they're more general learners than humans.

                It's the other way around. LLMs need more data because their statistical models are very specific. Without a humongous body of very specific data, LLMs are useless. In my experience, they are largely useless even with that humongous body of very specific data.

                Humans are the more general learners. Given a small set of knowledge, humans can extrapolate and interpolate that knowledge to create new knowledge.

                • Without a humongous body of very specific data, LLMs are useless. In my experience, they are largely useless even with that humongous body of very specific data.

                  Your second sentence cancels the insight of your first sentence, leaving the result "a humongous body of very specific data is irrelevant, since LLMs are largely useless with or without that body of data."

                • Horseshit. How much does a human read and hear before it can create the sort of responses an AI can? Hint: A lot. Go and grab a Sinhalese child (or adult, for that matter), and stick a knife to his throat. Then demand he produce a creative work with only a small set of information. Hint: he cannot.

                  Repeat this experiment with a person who doesn't speak the language of your preferred output. They already have the advantage of knowing how to be creative in another language, yet they will have to learn a LOT of inf
              • by Rei ( 128717 )

                This.

                Indeed, there's an interesting general trend where (assuming you're fully and optimally training your models) as you increase the number of parameters, output quality increases linearly, until it starts to level off, then goes into reverse, and further increases in the number of parameters make the model much worse than lower-parameter models. Why? Because the extra parameters let it increasingly focus on learning the training data rather than developing general understandings of the underlying con

              • Repeating it won't make it true.

                AI models need more data because they're more general learners than humans. They have fewer built-in priors about what they should be doing, and they could potentially learn all sorts of patterns that humans can't, so naturally it takes more data to constrain them to one particular pattern.

                Which is my point - they are not creating original works but simply using lots of data to identify and repeat a pattern.

                • Which is my point - they are not creating original works but simply using lots of data to identify and repeat a pattern.

                  Hint: So do we.

                  • by whitroth ( 9367 )

                    Humans *vary* the pattern, and add new things. LLMs do not, no more than your typeahead does.

                      • Humans *vary* the pattern, and add new things. LLMs do not, no more than your typeahead does.

                      LLMs do vary the pattern by explicitly injecting randomness during inference. It's why you can tell it to write a bedtime story and each time it is a different story.
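                      A minimal sketch of what that injection looks like (not any particular vendor's decoder; the logits and temperature below are made-up numbers): sample the next token from the model's output distribution instead of always taking the most likely one.

                      # Minimal temperature-sampling sketch -- not any specific model's decoder.
                      # The same input logits give different outputs across calls; temperature
                      # near zero collapses back to deterministic (greedy) decoding.
                      import numpy as np

                      rng = np.random.default_rng()

                      def sample_next_token(logits, temperature=0.8):
                          scaled = np.asarray(logits, dtype=float) / temperature
                          probs = np.exp(scaled - scaled.max())   # numerically stable softmax
                          probs /= probs.sum()
                          return rng.choice(len(probs), p=probs)

                      # Hypothetical logits over a 5-token vocabulary:
                      print([sample_next_token([2.0, 1.5, 0.3, -1.0, -2.0]) for _ in range(10)])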

                      • Humans *vary* the pattern, and add new things. LLMs do not, no more than your typeahead does.

                      LLMs do vary the pattern by explicitly injecting randomness during inference. It's why you can tell it to write a bedtime story and each time it is a different story.

                      If you have to inject randomness in order to vary the output, it's not creating, since without that randomness it would always yield the same result rather than making a decision to change it.

            • Re: (Score:2, Informative)

              by Rei ( 128717 )

              all AI does is an analysis, deciding what is likely to appear after a word and filling in the blank

              You realize that your brain does that exact same thing [google.com], don't you?

              Stop and ask yourself how your brain can learn. Learning requires some sort of assessment of error levels in order to adjust network weights. The generally accepted view is that brain learning relies on the brain making predictions about what its senses are going to encounter, and then measuring the error between the prediction and the senses. S

              • all AI does is an analysis, deciding what is likely to appear after a word and filling in the blank

                You realize that your brain does that exact same thing [google.com], don't you?

                Among other things.

                The LLMs (which are misleadingly labeled "artificial intelligence") for the most part only do the pattern matching.

          • by IDemand2HaveSumBooze ( 9493913 ) on Tuesday October 24, 2023 @07:06AM (#63948503)

            I see this argument used a lot, but it's false equivalence. Humans don't 'train' on other people's works the same way AI does. Humans may get inspired by something and use an idea in a completely different context or take an isolated example and generalise it into a concept or deal with the input in a number of ways. Since all 'AI' is is basically a very complicated algorithm, all it can do is add the input into a giant pot of input, then fish out bits of that on demand. We want art to be creative. Humans can produce something creative after learning ideas from others, AI cannot.

            At the very least, artists may benefit from humans visiting websites where their art is shown and viewing the ads there. It's debatable how much they can actually benefit from it, but there is definitely no benefit from AI scraping such websites.

            • by Potor ( 658520 )

              Indeed. A human can understand; AI cannot, and thus just predicts and reproduces accordingly. The "generative" epithet is incorrect, for there is nothing new.

              In fact, this is why it is so easy to catch students who submit an AI essay - they cannot explain what they submitted. They're no better than the machine.

              • You can't define "understand" in a non-circular way. Should not a Slashdot poster understand that we, too, can be seen as executing algorithms - and that whatever hard constraints apply to what algorithms can and cannot do also apply 100% to us?

                • You can't define "understand" in a non-circular way.

                  I'm not sure that's necessarily true, but I can't come up with a definition myself. At least, while you may not always be able to positively tell that someone understands something, you can in many cases say that someone or something definitely doesn't understand something. While it may be hard to definitively say whether a particular Slashdot poster understands quantum physics, we can safely assume that an ant or a piece of discarded plastic wrapping does not understand quantum physics.

                  we, too, can be seen as executing algorithms - and that whatever hard constraints apply to what algorithms can and cannot do also apply 100% to us?

                  You can look at it that way, cer

                    • Following this thread of discussion, I see someone pretending that we know exactly how human thought works, exactly how human brains learn, etc. I am under the impression that we still have NO IDEA how these things work precisely, so someone stating that humans learn exactly the way an LLM learns (as I see one person on here appearing to state) is making an unsupported claim. We don't know.
            • I see this argument used a lot, but it's false equivalence. Humans don't 'train' on other people's works the same way AI does.

              This is correct: training works differently between humans and machines. The modalities of training are dissimilar.

              Humans may get inspired by something and use an idea in a completely different context or take an isolated example and generalise it into a concept or deal with the input in a number of ways.

              Since all 'AI' is is basically a very complicated algorithm, all it can do is add the input into a giant pot of input, then fish out bits of that on demand. We want art to be creative. Humans can produce something creative after learning ideas from others, AI cannot.

              Is it possible to differentiate creative works from machine works? If so, how can this be objectively evaluated to determine whether or not a work is "creative"? If it is not possible, what is the objective basis of the assertion that AI cannot produce creative works?

              People seem to be way too caught up in anthropomorphism and anthropocentrism. There are lots of cheap ass simple "algorithms"

    • by Kisai ( 213879 ) on Tuesday October 24, 2023 @03:30AM (#63948253)

      I hate to say it, but while it is the "right idea" it's the wrong approach.

      What AI learns, is how to "auto-complete" something. So if you start poisoning the data, that just means the data has to be treated as potentially tainted.

      - Negative training. So images of dogs are also "not cats", and images of cakes are "not hats". This is presently what is going on with Stable Diffusion and its forks, where there are "negative prompts" (a minimal sketch of that mechanism follows this list).
      - Making the AI actually learn the data-poisoning technique and treat it as negative. Which is easily done by grabbing the open-source Nightshade and running every image it reads through it, both with and without the algorithm applied.
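      A minimal sketch of the negative-prompt mechanism as it already exists at inference time in open-source Stable Diffusion tooling (the model id and prompts below are just examples); it is a cousin of the negative-training idea above, steering generation away from named concepts:

      # Negative prompting with the diffusers library -- model id and prompts are
      # examples only; this is inference-time steering, not negative training.
      import torch
      from diffusers import StableDiffusionPipeline

      pipe = StableDiffusionPipeline.from_pretrained(
          "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
      ).to("cuda")

      image = pipe(
          prompt="a photo of a dog in a garden",
          negative_prompt="cat, cake, hat",   # concepts to steer away from
      ).images[0]
      image.save("dog.png")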

      The right approach is to actually assign proper tags and meta data to images with unique watermark data (eg artist*VIRUS102938487) and source (source*DEVIANTART-VIRUS102938487) that isn't "easily" searched for, so that you can type in those unique watermarks and have the model tattle on itself.

    • Buggy Whip manufacturer starts putting spike strips down to pop car tires, but horses step right over it. News at 11!

  • Now it's a war of training AI to find the "poisoned" images and "cure" them to be used in training, but if it makes people feel safe, let's do it! :P

    • Companies don't really need to; they are using RLHF, which will dilute whatever signal they got from the pretrained "poisoned" images in the first place. Most companies, such as Midjourney, are training by refining the outputs of the previous generation and adding in a mixture of original data -- sort of analogous to using a turbocharger to increase the pressure, but in this case using RLHF to feed the outputs back into the model.

  • by crunchygranola ( 1954152 ) on Tuesday October 24, 2023 @12:04AM (#63948017)

    A common thread wending through all the GenAI tools of various sorts is that they have no understanding of their own output in any sense, and this leads to weird brittleness that is incomprehensible to people. Sure, we don't expect the system to be conscious or genuinely rational, but they don't have any internal model of their domain. In imaging, that would mean understanding the physical structure of space and matter and thus being able to identify meaningful structural features like humans do. We have cases of people who lose this ability - it is called visual agnosia. The fact that a deep learning AI can mistake an AK-47 for a teddy bear (an actual example I have seen) means that the recognition capability it does have is somehow unrelated to the physical structure that defines the object - a pixel-level-only phenomenon. They need to identify physical structure from pixels, and they can't really do that (even image heatmaps aren't really that).

    • A common thread wending through all the GenAI tools of various sorts is that they have no understanding of their own output in any sense, and this leads to weird brittleness that is incomprehensible to people. Sure, we don't expect the system to be conscious or genuinely rational, but they don't have any internal model of their domain. In imaging, that would mean understanding the physical structure of space and matter and thus being able to identify meaningful structural features like humans do. We have cases of people who lose this ability - it is called visual agnosia. The fact that a deep learning AI can mistake an AK-47 for a teddy bear (an actual example I have seen) means that the recognition capability it does have is somehow unrelated to the physical structure that defines the object - a pixel-level-only phenomenon. They need to identify physical structure from pixels, and they can't really do that (even image heatmaps aren't really that).

      Honestly? I'm pretty sure that humans have no understanding of the underlying nature of time and space, reality. And that this leads to weird brittleness and is incomprehensible.

      I'm not really sure what it is that we understand.... and I'm not really sure that, underneath it all, we are actually any better off than these AI's are.

      • Yep. Consider how alien things like quantum physics seem to us, with the nonlocality and pilot waves and probability and all that. Evolution has given us a framework to perceive the world that doesn't make it easy to intuit what's going on under the hood, because up until the last hundred years we never needed to have an intuition about it.

        And there are certainly ways to expose the limitations in how we see and understand things. There are a number of illusions that straight up have us hallucinating things that ar

        • There's a shadow illusion in particular that always does my head in, because even though I can prove to myself that two different regions of the illusion are literally the same color and shade, my brain just flat out refuses to perceive it.

          This is because your (and my) brain tries to process the image as if it was a photo or something you see in real life. If two objects reflect the same amount of light, but one is in bright light, while the other is in a shadow, the one in a shadow must be brighter and just appear darker because it is in a shadow.
          A good example is here:
          https://en.wikipedia.org/wiki/... [wikipedia.org]
          It is a computer generated image, but our brain interprets it as if it was a photo (or something in reality). So, what do we see? A checkerboa

          • Agreed. Although 3-D illusions also exist, the many picture illusions that we all know and love are principally due to the artificial mapping of 3-D space onto a 2-D surface, which breaks many rules about how lighting and image structure work in the physical world.

      • Honestly? I'm pretty sure that humans have no understanding of the underlying nature of time and space, reality. And that this leads to weird brittleness and is incomprehensible.

        I'm not really sure what it is that we understand.... and I'm not really sure that, underneath it all, we are actually any better off than these AI's are.

        Reminds me of Willy Wonka: "Oh, you should never, never doubt what nobody is sure about."

        What you are "pretty sure" of is an obvious absurdity. No human ever mistakes an AK-47 for a teddy bear barring a severe organic brain defect. Why is that? Because they do have an excellent innate understanding of how 3-D space works and what physical features of an object define it as being that particular class of object. No young child, old enough to learn the names of things, would ever make the AK-47/teddy bear mist

          • Naw, most people label any old SKS, AKM, RK 62, Type 81, etc., as an AK when they are not one. They have internalized a particular look and then label everything they see as one... I imagine that an LLM would do the same.
    • The fact that a deep learning AI can mistake an AK-47 for a teddy bear (an actual example I have seen) means that the recognition capability it does have is somehow unrelated to the physical structure that defines the object - a pixel-level-only phenomenon.

      May I be so bold as to suggest an alternate explanation...
      https://www.bing.com/images/se... [bing.com]

  • Clearly, warning the bots to avoid the material is fine.
    Vandalism that can destroy valuable data is either a crime or cause for civil action.

    • by excelsior_gr ( 969383 ) on Tuesday October 24, 2023 @01:32AM (#63948109)
      You can't "vandalize" your own property. If I damage the brakes of my car on purpose and you get injured while stealing it, you can't blame me for your injury. At least that's what I would expect. I'm not a lawyer and the law can be totally crazy sometimes.
      • by silentbozo ( 542534 ) on Tuesday October 24, 2023 @01:49AM (#63948129) Journal

        Re crazy laws... I'm reminded of cases where thieves have sued jobsites for being injured when they snuck in to steal tools or materials. Or sued homeowners for defending themselves...

        https://www.cbsnews.com/news/b... [cbsnews.com]

        "Burglar sues Calif. homeowner, 90, who returned fire"

        https://www.pennlive.com/news/... [pennlive.com]

        "Man who got 13,000-volt shock while trying to steal copper can't get money from building's owner"

      • If I damage the brakes of my car on purpose and you get injured while stealing it, you can't blame me for your injury.

        Actually he can. To be clear, it shouldn't be that way, and I agree with you in principle, but the foundation of many rulings in tort law includes people getting hurt during criminal activities, unintentionally, by something that they had assumed was in otherwise working order.

        Now if you pulled the plates off the car to make it clear that the car was not roadworthy, you'd have a defence, but the way the law has been ruled on time and time again would make you liable -- not for sabotage, not for vandalism, but for negligence.

        • Me reading your post does not involve the creation of a copy, even if I were gifted enough to memorize it after having read it once. I think this simple rule can be applied to the AI-copyright question: if the AI gets trained on the fly, fitting and storing its parameters while the text is loaded in RAM or some other type of volatile memory (this distinction, I believe, is important, since otherwise every browser would be performing copyright infringement), then that's OK in my book. No copy has been made, no copyright
          • ACTUALLY...
            It does involve the creation of a copy. Think about CDNs and how the post text gets transmitted to you from the server.
            But AIs don't store the training data, they store the results of training. Stable Diffusion stores the training for the billions of images it is trained on in a 2-4GB file (depending on the model used). That means every image gets about 1-2 bytes of the set: hardly "storing a copy", eh?
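            As a back-of-envelope check on that ratio (the figures are rough assumptions: a roughly 2 GB checkpoint and a LAION-scale training set of about 2 billion images):

            # Rough arithmetic only -- checkpoint size and image count are assumptions.
            checkpoint_bytes = 2 * 1024**3              # ~2 GB model file
            training_images = 2_000_000_000             # LAION-scale, ~2 billion images
            print(checkpoint_bytes / training_images)   # ~1.07 bytes per image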

      • You can't "vandalize" your own property. If I damage the brakes of my car on purpose and you get injured while stealing it, you can't blame me for your injury. At least that's what I would expect. I'm not a lawyer and the law can be totally crazy sometimes.

        I am not a lawyer either, but one quick counter example is that deliberately booby trapping your unsupervised property with lethal traps will actually almost certainly land you in trouble if someone injures or kills themselves on it. It’s fairly well-established case law:

        https://en.wikipedia.org/wiki/... [wikipedia.org]
        https://www.youtube.com/watch?... [youtube.com]

        I’d guess the same would be true of your example, or if you put razors in the apples on your apple tree that you knew the neighborhood kids were picking against y

      • You can't "vandalize" your own property. If I damage the brakes of my car on purpose and you get injured while stealing it, you can't blame me for your injury. At least that's what I would expect. I'm not a lawyer and the law can be totally crazy sometimes.

        It sounds like the AI companies are moving towards allowing, at the very least, an opt-out. If they made this convenient enough, i.e., a metadata tag, it should make it pretty easy for artists to exclude their work.
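        A purely hypothetical sketch of what such a tag could look like if it lived in the image file itself (the "NoAI" key name is invented for illustration, and nothing currently obliges a scraper to honor it):

        # Hypothetical opt-out flag embedded in the image's own metadata; the key
        # name is made up, and honoring it is entirely up to the scraper.
        from PIL import Image
        from PIL.PngImagePlugin import PngInfo

        meta = PngInfo()
        meta.add_text("NoAI", "true")
        Image.open("artwork.png").save("artwork_optout.png", pnginfo=meta)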

        If there was an easy to use opt-out mechanism that the artist knew about, but they neglected to use it and instead deliberately put out 'poisoned images' to break the model, then I could actually see grounds for suing the artist.

        If they do use the opt-out then poisoning the image is just adding a bit of

    • by znrt ( 2424692 )

      Chill, they are mostly after the money of enraged "creators" who think their entitlement is justified and worth it. Of course it isn't, but cash will change hands anyway.

      Then again, I'm unsure they have the correct marketing for juicy fees, judging by the one "creator" they use as a figurant in the article ....

  • Encouraging (Score:4, Informative)

    by ZipNada ( 10152669 ) on Tuesday October 24, 2023 @01:24AM (#63948097)

    "these poisoned samples make their way into the model's data set and cause it to malfunction".
    "The poisoned data is very difficult to remove"
    Others can "tinker with it and make their own versions".
    Nightshade infects not only the word “dog” but all similar concepts, such as “puppy,” “husky,” and “wolf.”

    This is good news! It sounds like they have come up with a way to subtly spike a relatively small amount of data and cause an outsized problem. Maybe this 'poisoning' technique can be extended to protect other aspects of human creativity and intellectual property that are vulnerable to unauthorized ingestion by AI companies. In the best case, developers of a model will be leery of scraping content that hasn't been vetted and approved.

    • Re: (Score:2, Insightful)

      by thegarbz ( 1787294 )

      Maybe this 'poisoning' technique can be extended to protect other aspects of human creativity and intellectual property

      To what end? The history of human creative arts has largely been based on learning from other artists. Human creativity doesn't exist in a vacuum; it is founded on the creativity which came before it. Even Leonardo da Vinci ultimately learnt his craft indirectly by working in the studio of Andrea del Verrocchio, who himself was a student of Donatello (clearly the superior turtle ;-) )

      • This is industrialized "inspiration" and production. We're not talking about one human learning from others and generating work at a human scale.
        • This is industrialized "inspiration" and production. We're not talking about one human learning from others and generating work at a human scale.

          Except the industrialised portion here is just a tool. Fundamental creativity is still dependent on humans and continues to be produced at a human scale. AI doesn't think for itself, despite the industry's misuse of the term.

          If you've ever tried to use any of these tools you'd realise that in some cases you can spend more time refining your prompts and processing the results than creating something from scratch in the first place. The actual volume of it being "industrialised" is absolutely not relevant her

      • What we are talking about here, in case you completely missed it, is the unauthorized copying of works of art and other human creations, and using it to generate derivative works.

        Clearly there are intellectual property rights and artistic ownership of work product. If an AI company wants to ingest your personal property they should have to get permission and make compensation. If they fail to do so and their model gets spiked, they deserve it.

    • This is good news! It sounds like they have come up with a way to subtly spike a relatively small amount of data and cause an outsized problem. Maybe this 'poisoning' technique can be extended to protect other aspects of human creativity and intellectual property

      You're right, it is good news. If this actually works, it means that the models will be adapted to be less brittle. Or is that not what you meant? /s

      Seriously, I don't have a lot of sympathy for people who put stuff online, and then say "don't look at that!". Do you know what budding (human) artists do? They look at other artists' work. They copy it. They adapt it. All without actually paying anyone for anything.

      If you don't want someone (human) or something (AI) to examine, copy and adapt your work, al

      • The issue isn't people looking at it online.

        The issue is using it for some other purpose without license or compensation.

        Why can't I get a high quality printer to make my own money? I own the money in my pocket. The government printed the original bills and released them to the world. I own them and can copy them. Right?

    • Artistically, I've been very eager to see this in action. I've compared it to the "dazzle" camouflage of ships. Meant to confuse, not conceal so much.

      I see a sort of thing where we use the edges of the sensory inputs of the machines against themselves. It can mess with the soft programming, or even the hardware, perhaps using visual tricks to fool the sensors into misreporting. Creating "open spaces" in the operation of the system in which to insert data of the attacker's choosing using coded sensory in

      • I like your thinking, lol. As for watermarks, I assume there are techniques for embedding them in an image in a way that isn't visible unless you know how to extract and decode the watermark. In principle you could use this as a way to identify thieves without them being aware of it.

        But in this case they appear to be going further. The embeddings pollute the entire corpus of stolen material. That's new.

  • Will these transformations survive a resize, blur, or rotate filter?

    • by necro81 ( 917438 )

      Will these transformations survive a resize, blur, or rotate filter?

      No, such tools can defeat the poison. On the other hand, zoom-and-enhance is totally broken.

  • Does anyone have a quick explanation of how this is being done? Are the transparency bits of the image being altered slightly (bits 24 through 31)? I can't imagine the RGB bits are being altered, as that would make a noticeable change to the image. Is the image's metadata being altered in a way that causes this? This is all very confusing but fascinating.

    • by jbengt ( 874751 )

      I can't imagine the RGB bits are being altered, as that would make a noticeable change to the image.

      Assuming 8 bits each for Red, Green, and Blue, the least significant bits of each color can usually be altered without a noticeable change to the image. Hell, a lot of consumer-grade monitors only use 6 bits per color, anyway.
      That said, your main point is good - what do they do to actually "poison" the images? And why didn't TFA talk about that - that'd be the actually interesting part.
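      A quick way to see how much headroom there is in those low bits (only an illustration of LSB capacity, not what Nightshade actually does; the file name is hypothetical):

      # Flip the least significant bit of every 8-bit RGB channel and compare.
      # The largest per-channel change is 1 out of 255 -- invisible on most displays.
      import numpy as np
      from PIL import Image

      img = np.array(Image.open("artwork.png").convert("RGB"))
      tweaked = img ^ 1                                             # flip bit 0 everywhere
      print(np.abs(img.astype(int) - tweaked.astype(int)).max())    # prints 1
      Image.fromarray(tweaked).save("artwork_lsb.png")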

  • BLIT (Score:3, Interesting)

    by kkoo ( 4352157 ) on Tuesday October 24, 2023 @02:44AM (#63948217)
    These folks need to read their sci-fi and see the unintended consequences of poisoning AIs. https://en.m.wikipedia.org/wik... [wikipedia.org]
  • Just insert the image poisoning algorithm into the GAN training engine, and AI will be able to avoid it.
  • by stevenm86 ( 780116 ) on Tuesday October 24, 2023 @03:42AM (#63948275)
    You may be exploiting errors / shortcuts / inefficiencies in the model by introducing careful noise, but to a human the "poisoned" image still looks correct. Cool, sounds like a great training set for building a more robust model, after someone labels these.
    • You are misunderstanding how these models work. They exploit deep convolutional learning. This algorithm basically turns the image into small blocks, trains partial nets on the blocks, reintegrates those nets into one, and copies it to all the parts. Perhaps you remember that before deep learning was invented, image recognition programs could recognise, for example, a cat only when it was in the same spot of an image as it learnt from its training data. By copying the trained net to all positions this wa

  • by Rosco P. Coltrane ( 209368 ) on Tuesday October 24, 2023 @04:50AM (#63948333)

    But this is rather pointless: at some point - and I think it's already started - AI will train on AI-generated data, thereby quickly degenerating into a useless mess all on its own.

    The poisoning will hasten the process though, so I'm all for it.

    • Since AI apparently also can't properly discern between AI-generated and human-generated content (as we have seen in so many failed attempts to remove AI content from academia), and since AI can generate content far faster than humans can, it's to be expected that at some point most of what is used to train AI will be AI-generated content.

      Which raises the question: what is truth? What is reality? What is a $thing? In many cases, it is what general consensus agrees upon. Sure, for some things, it would be

      • For some things such as beauty, yes, there are ever changing standards that differ over time or in different places or among people in the same time and place.

        But other things are simply true. 2+2=4 is always true even if a bunch of knuckleheads declare it otherwise. But you can train an AI to agree that 2+2=5 while any child will figure it out on their own even if mis-taught originally. The AI will _never_ figure it out if mistrained.

        • And that's the point.

          I am fairly sure you will by now have encountered more than just a few people who believe some kind of bullshit conspiracy. And I don't mean one of those that are harder to debunk, like, say, that Covid was created in some lab (or is a plot of $nwo_group_du_jour), but something so blatantly stupid that it should not be possible for a grown person to believe that crap. Pick one: flat earth, moon landing, 5G magnetism, free energy, it's all good.

          Can you imagine what would happen if someone went

  • Even though this might sound great, it can have very negative impacts. It should therefore be illegal to use such practices. There are many real-world examples of stuff you can't do because it damages others.
    But then again, now that it is known, AI can be trained to ignore these kinds of patterns. To be honest, I already find it weird that adding a few extra random pixels would fubar AI learning.

  • It's time we push back against the data miners.

  • This is the record that cannot be played on that record player... https://en.wikipedia.org/wiki/... [wikipedia.org]
  • by oumuamua ( 6173784 ) on Tuesday October 24, 2023 @09:49AM (#63948805)
    They are already training off of user feedback. Soon they will surpass human artists because they are generating *exactly what people like*. Yeah, it may be pop art, but it sells -- which causes many artists themselves to 'sell out'.
    Time to start thinking how we want to run a society where only a fraction of the population needs to work.
  • Image data is almost always manipulated or preprocessed in a way that would break this kind of watermarking.

    For instance, most images are scaled to a fixed size before they are used for model training; I highly doubt that these techniques can survive a bicubic resample. If your eye can see it, you can train on it; that's about all there is to it.
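    For context, a typical training-time preprocessing pipeline (an assumed example, not any specific trainer's code) already looks something like this, so any poison has to survive at least a resample and a crop before the model sees a pixel:

    # Assumed example of standard preprocessing -- not any specific trainer's
    # pipeline. Scraped images pass through steps like these before training.
    from torchvision import transforms

    preprocess = transforms.Compose([
        transforms.Resize(512, interpolation=transforms.InterpolationMode.BICUBIC),
        transforms.CenterCrop(512),
        transforms.ToTensor(),
    ])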

  • Posting to undo a down mod that should have been an up mod

  • .. By just screenshotting the image (probably as a JPEG, since that'll likely mess up a lot of hidden data) and feeding the result into the AI training set.
  • But society at large should reduce copyright back to 20 years or remove it altogether; after all, the artists insist on their side of the deal while reducing the public domain side.
