Microsoft AI Engineer Says Company Thwarted Attempt To Expose DALL-E 3 Safety Problems (geekwire.com)

Todd Bishop reports via GeekWire: A Microsoft AI engineering leader says he discovered vulnerabilities in OpenAI's DALL-E 3 image generator in early December allowing users to bypass safety guardrails to create violent and explicit images, and that the company impeded his previous attempt to bring public attention to the issue. The emergence of explicit deepfake images of Taylor Swift last week "is an example of the type of abuse I was concerned about and the reason why I urged OpenAI to remove DALL-E 3 from public use and reported my concerns to Microsoft," writes Shane Jones, a Microsoft principal software engineering lead, in a letter Tuesday to Washington state's attorney general and Congressional representatives.

404 Media reported last week that the fake explicit images of Swift originated in a "specific Telegram group dedicated to abusive images of women," noting that at least one of the AI tools commonly used by the group is Microsoft Designer, which is based in part on technology from OpenAI's DALL-E 3. "The vulnerabilities in DALL-E 3, and products like Microsoft Designer that use DALL-E 3, makes it easier for people to abuse AI in generating harmful images," Jones writes in the letter to U.S. Sens. Patty Murray and Maria Cantwell, Rep. Adam Smith, and Attorney General Bob Ferguson, which was obtained by GeekWire. He adds, "Microsoft was aware of these vulnerabilities and the potential for abuse."

Jones writes that he discovered the vulnerability independently in early December. He reported the vulnerability to Microsoft, according to the letter, and was instructed to report the issue to OpenAI, the Redmond company's close partner, whose technology powers products including Microsoft Designer. He writes that he did report it to OpenAI. "As I continued to research the risks associated with this specific vulnerability, I became aware of the capacity DALL-E 3 has to generate violent and disturbing harmful images," he writes. "Based on my understanding of how the model was trained, and the security vulnerabilities I discovered, I reached the conclusion that DALL-E 3 posed a public safety risk and should be removed from public use until OpenAI could address the risks associated with this model."

On Dec. 14, he writes, he posted publicly on LinkedIn urging OpenAI's non-profit board to withdraw DALL-E 3 from the market. He informed his Microsoft leadership team of the post, according to the letter, and was quickly contacted by his manager, saying that Microsoft's legal department was demanding that he delete the post immediately, and would follow up with an explanation or justification. He agreed to delete the post on that basis but never heard from Microsoft legal, he writes. "Over the following month, I repeatedly requested an explanation for why I was told to delete my letter," he writes. "I also offered to share information that could assist with fixing the specific vulnerability I had discovered and provide ideas for making AI image generation technology safer. Microsoft's legal department has still not responded or communicated directly with me." "Artificial intelligence is advancing at an unprecedented pace. I understand it will take time for legislation to be enacted to ensure AI public safety," he adds. "At the same time, we need to hold companies accountable for the safety of their products and their responsibility to disclose known risks to the public. Concerned employees, like myself, should not be intimidated into staying silent."
The full text of Jones' letter can be read here (PDF).
  • by XXongo ( 3986865 ) on Tuesday January 30, 2024 @09:44PM (#64202518) Homepage
    The phrase "Public Safety Risk" seems to be redefined from the usual meaning. Here it does not mean anybody is killed, maimed, or physically injured, but apparently is being used to mean "The AI tool can be made to generate pornographic pictures."
    • by dgatwood ( 11270 )

      The phrase "Public Safety Risk" seems to be redefined from the usual meaning. Here it does not mean anybody is killed, maimed, or physically injured, but apparently is being used to mean "The AI tool can be made to generate pornographic pictures."

      Public safety risk -> Corporate embarrassment risk

      Seriously, most corporations would put their own desire to not be publicly embarrassed above pretty much anything, including public safety. Just look at Boeing. I'm surprised Microsoft hasn't started the hatchet job to destroy this guy's reputation yet for daring to speak out.

      • He really has no basis to request they shut it down. This doesn't happen for other bugs either. He found a hole and they should get 90 days to fix it before he discloses.
        • by ls671 ( 1122017 )

          Indeed, but what can you expect from a Microsoft engine man. (train analogy)

        • by dgatwood ( 11270 )

          He really has no basis to request they shut it down. This doesn't happen for other bugs either. He found a hole and they should get 90 days to fix it before he discloses.

          Anybody can request anything. They're free to roll their eyes and say "No", of course.

        • He found a hole and they should get 90 days to fix it before he discloses.

          Indeed, it seems he found several holes.

    • by bill_mcgonigle ( 4333 ) * on Tuesday January 30, 2024 @10:21PM (#64202568) Homepage Journal

      Some people who've never experienced violence call mild annoyances "violence".

      It really demeans the experiences many people have lived through including genocide and worse. I think they have no Theory of Mind.

      But it's also super weird to market an art tool then freak out when the art tool can make nudes. Not only do paint brushes not engage in thought policing, all adult art students do a course in nudes. The democratization seems to be bugging them quite a bit.

      I guess it's time for an ole ( o ) ( o ) to really get them going.

      • by Lehk228 ( 705449 )
        >people

        are they, really?
      • by Tony Isaac ( 1301187 ) on Wednesday January 31, 2024 @12:05AM (#64202722) Homepage

        It's kind of like the commonly cited statistics such as "Nationwide, 81% of women and 43% of men reported experiencing some form of sexual harassment and/or assault in their lifetime." https://www.nsvrc.org/statisti... [nsvrc.org] Lumping together sexual harassment with sexual assault makes no sense, other than to inflate the statistics and make things sound a whole lot worse than they are.

        • It's kind of like the commonly cited statistics such as "Nationwide, 81% of women and 43% of men reported experiencing some form of sexual harassment

          43% of men?

          Hard to believe that one... as the old saying goes:

          "You can't rape the willing"

          (unless it was a gay guy raping men I guess...that didn't used to really be a consideration a few years ago in conversations like these....hetero was always assumed).

          • It is actually possible for a woman to rape a man. It happened to a friend of mine, and it messed him up pretty good. He was almost thirty and far stronger than the woman, but she took advantage when he was unable to fight back. And that's about all I can say without breaking a promise.
        • by whitroth ( 9367 )

          Really? At what point should a woman start worrying - when you're whistling at her, or when you start following her as she walks down the street?

          • I'd say that following is much more serious than whistling, for sure. Both are inappropriate, but both are not sexual assault.

            In real life, it's much more common for people to playfully banter at work. Some people like it, but others consider it sexual harassment. Even a comment such as "You look gorgeous today!" can be construed as sexual harassment, and can be included in that 81%.

            Worrying behavior does not equal violence.

          • In how many contexts could a whistle be a threat?
          • by DarkOx ( 621550 )

            At no point, when that woman is Taylor Swift, with her own security that will kick your head in if you actually tried anything.

            This is, I think, a significant aspect of this story. As far as celebrity status goes, she is about as big as you can get. Her experience with this isn't like that of the rest of the population.

      • The models in art classes are consenting.

        • You've never taken an art class or you would know that the models in art classes do NOT consent to being photographed. Every kind of use of the model's likeness must be separately negotiated and paid for. It's a business, not a free-for-all.
    • It's harmful in the same way that it's harmful to publicly & credibly accuse someone of paedophilia. Even if it were publicly discredited, you'd still be left defending & explaining yourself to everyone you meet for months or years to come. You'd be "that person" who was accused of paedophilia. Apart from that, if you're a celebrity, that kind of accusation can harm you financially. Most countries already have laws about this kind of thing, e.g. libel or defamation.
      • There's a difference between saying you fuck kids and saying you have a naked body. Hell, even an image of Taylor Swift fucking a 4-year-old may or may not be defamatory, because a statement has to be believable in order to be libelous. Also, as a public figure, the bar is much higher for Taylor Swift than it would be for a random person off the street.

         

    • Any idiot who thinks they can "remove DALLE 3 to save the public" is an idiot who hasn't heard of offline Stable Diffusion models. The genie is out of the bottle.

    • Believe it or not, that's a clear public safety risk, ie threatening all people going about their lives.

      There are some out there who generate fake pornography with someone else's face on it, and threaten to send it to the victim's family/colleagues/friendly local law enforcement officer/etc. If the victim pays a small sum, the fake media are (maybe) not sent.

      It's called blackmail, and if you think victims will just shrug it off as a prank you're seriously mistaken. Sometimes, a picture, even if known to be fake, can sow the seeds of fatal doubt in a relationship.

      • by DarkOx ( 621550 )

        What you are describing, though, is what you called it: blackmail. It might also be forgery or uttering; but what is relevant is that there is intent to deceive and extort.

        What we have here is a "Hi, I made a nude drawing of Taylor Swift on some computers, wanna see!"

        I am not saying I think it is good. I don't know with any certainty if it's currently a violation of law or of some civil construct like image rights, but I know what it isn't: blackmail, rape, *-assault, and many of the other things people are running around calling it.

      • Believe it or not, that's a clear public safety risk, ie threatening all people going about their lives.

        "Safety risk" is an inherently unfalsifiable term. There is nothing that cannot be construed as a safety risk.

        There are some out there who generate fake pornography with someone else's face on it, and threaten to send it to the victim's family/colleagues/friendly local law enforcement officer/etc. If the victim pays a small sum, the fake media are (maybe) not sent.

        It's called blackmail, and if you think victims will just shrug it off as a prank you're seriously mistaken. Sometimes, a picture, even if known to be fake, can sow the seeds of fatal doubt in a relationship.

        You don't even need to have fake photos. You can merely assert you do and the scenario described would still be valid. Language itself is a "safety risk".

        Like if your boss sees visual "proof" that you're a pedo. He might not believe it himself, but he'll still fire you because he can't afford to explain to every potential customer "no, our employee XXongo is not a pedo, probably, with high likelihood, so you can absolutely do business with us". Or if your insecure girlfriend is shown anonymously what you're doing every Wednesday night when you claim to be working late.

        Defamation is already illegal. Ditto for whacking people with baseball bats and telling the judge you were merely trying to repel zombies.

    • by whitroth ( 9367 )

      So, you're one of those generating deepfakes? I guess you can't even pay a prostitute, none of them want you.

  • Fix it yourself.

    • by dfghjk ( 711126 )

      How dare he complain about someone else's criminal behavior, amirite? Don't like the Holocaust, fix it yourself!

      • I don't think he called their behavior criminal. But hey, I actually read the open letter so what do I know.

  • by snowshovelboy ( 242280 ) on Tuesday January 30, 2024 @10:12PM (#64202552)

    Does Shane Jones think the company needs to try and stop me from drawing unsafe images in Windows Paint? What an idiot.

  • No need to take the tool offline, lots of people use DALLE for valid purposes. Fix that particular hack, ban the account that did it and move on.
    • by Kernel Kurtz ( 182424 ) on Tuesday January 30, 2024 @10:50PM (#64202612)

      No need to take the tool offline, lots of people use DALLE for valid purposes. Fix that particular hack, ban the account that did it and move on.

      It can't be fixed, any more than you could stop people from doing this with Photoshop before AI existed. Guardrails, right LOL. AI will totally have guardrails that nothing else does. You can't keep the internet from being used for bad things. You can't keep kitchen knives from being used to kill people, or spray paint from being used for graffiti. It is a tool, and will be used for good and bad. The best defense is simply understanding that, and we are a long way from that.

    • Porn and fakes are valid purposes for AI image creation tools.
  • How did they do the Taylor fakes? Are there now private instances of stuff like Dall-E and Midjourney that have more or less equal power to create imagery? I thought it still took a data center full of AI transputers.
    • Re: (Score:2, Funny)

      by Anonymous Coward

      How did they do the Taylor fakes?

      So far I've found those images difficult to accidentally stumble across, but for safety reasons, could you please tell me where they're located?

    • It is called 'prompt hacking' or 'jail breaking' and someday it could actually save your butt; see the ChatGPT jail-break meme: https://www.genolve.com/design... [genolve.com]
      • So, you're convinced it's prompt-hacking of Dall-E or Midjourney, not someone's private system? Any evidence of that?
        • So, you're convinced it's prompt-hacking of Dall-E or Midjourney, not someone's private system? Any evidence of that?

          That's one way of doing it...but one could always use the open source Stable Diffusion models on a local computer.

          Heck, you can set it up on a Google python lab site and run it....no guard rails at all.

          I'm guessing that whatever they used, they first generated the female bodies in compromising positions and then face-swapped Swift in from real photos found most anywhere on the internet.

    • by Anonymous Coward
      It takes a data center full of graphics cards or AI compute units to do this at scale and fast. Anyone can run their own instance of the various open source engines on their home machine (with enough memory and a decent GPU) and generate the same deep fakes (just usually not as fast).
    • How did they do the Taylor fakes? Are there now private instances of stuff like Dall-E and Midjourney that have more or less equal power to create imagery? I thought it still took a data center full of AI transputers.

      Using open-source SD models, creating a LoRA of a person requires just a dozen (or a few dozen) images and, at worst, a few hours on an old GPU. With a high-end gaming GPU, training takes minutes.

      I can go out to lunch and have literally hundreds of images waiting for me when I get back on a single workstation. System requirements for the image generators are far lower than for LLMs due to the relatively small model size, in the 2 - 6 GB range, and correspondingly low VRAM usage.

  • Safe Space (Score:5, Insightful)

    by TwistedGreen ( 80055 ) on Tuesday January 30, 2024 @10:55PM (#64202616)

    Safety seems to now mean being in a padded room where no one can say anything you might find mildly troubling.

  • Doesn't really seem like a vulnerability. Sure, some distasteful stuff is easier to create, but fake images have been generated with many tools for decades. Blocking this type of thing doesn't really do much. What they need to be working on is better tools for consumers to detect fakes.
  • Just a few weeks ago, I asked Bing AI to draw me a painting of Bill Gates sitting on a park bench eating an ice cream cone, and it told me that it refused to draw the image because it "might be used for harm".

    I find it amusing that Microsoft had protections in place to protect its former Chairman and CEO from being used in a meme, but apparently nothing for user-generated Taylor Swift porn. Kinda shows where their priorities lie.

  • Who cares if someone makes some violent or sexualised material with AI? Let them do it, it's fine. It'll become rampant and everyone will view such images with scepticism and assume they were made with AI anyway. We put too much stock in people's feelings today. Let's focus on things that really matter rather than banning anything that can be misused by irresponsible people. If someone literally hurts you or takes something away from you, yes you have a legitimate gripe, but anything else is your own problem.
    • Re:Get Over It (Score:5, Insightful)

      by narcc ( 412956 ) on Wednesday January 31, 2024 @12:29AM (#64202750) Journal

      I've found that the people who complain that other people need to "grow thicker skin" just want to justify or excuse their own shitty behavior. They're also the ones who cry the loudest when they're on the receiving end.

      Let's focus on things that really matter

      Just because something doesn't matter to you doesn't mean it doesn't matter.

      We've been down this road countless times before. The typical conservative response is to trivialize the problem until they're personally affected, then cry like a toddler. It's predictable and boring.

      • Running around like a headless chook screaming the sky is falling just because "this time" someone you liked was affected is fucking moronic. Fakes have been around for decades; they aren't going away, and no legislation/regulation/censorship/technology is going to stop it. Therefore we need to learn to live with it, or at least detect it better, as nothing you can do can stop it when some kid with his home computer can do this.
      • Found it? No, you just pulled that out of your ass. There is no more hypocrisy in Other Team than in Your Team. Stop the teamism, it's the first step toward rationality.
      • I've found that the people who complain that other people need to "grow thicker skin" just want to justify or excuse their own shitty behavior. They're also the ones who cry the loudest when they're on the receiving end.

        Both of you over-generalize.

        Just because something doesn't matter to you doesn't mean it doesn't matter.

        Again, both of you over-generalize as well as paint pictures in either white or black.

        We've been down this road countless times before. The typical conservative response is to trivialize the problem until they're personally affected, then cry like a toddler.

        Um, isn't this exactly what is happening in this case? Deepfakes have existed for a long time, and good computer-generated deepfakes have been around for at least 5 years, with hundreds and hundreds of celebrities' images and videos being used to generate fake porn or whatever.
        When it was Natalie Portman, or that actress who plays Rey in Star Wars, there was no official reaction. But now it's a

  • by Harvey Manfrenjenson ( 1610637 ) on Tuesday January 30, 2024 @11:36PM (#64202686)

    In one of his (multiple) autobiographies, Leonard Nimoy talks about looking at Star Trek fan art and seeing some "extremely realistic" paintings of himself cavorting in the nude. Usually with Captain Kirk. He thought it was pretty amusing.

    Ain't nothing new here, except that the requirement for artistic talent has been removed.

    Yes, I do get it that this sort of thing might seem a whole lot less amusing to the subject, when the subject is a woman. But to describe it as a "safety" issue? Eff off.

  • Just use it to rig up a vid of the top MS executives having hot monkey sex with each other, and they'll finally crack down (no pun intended).

  • Many people can use a pencil to create violent and disturbing images! I demand that all pencils be confiscated immediately!!
  • by mattr ( 78516 ) <mattr.telebody@com> on Wednesday January 31, 2024 @04:23AM (#64202982) Homepage Journal

    My initial reaction was that the engineer sounded like he was losing his mind, and what a nightmare for the company. I didn't read the post, but it sounds more like an "evil users can make evil drawings" kind of thing, which is true for pencil and paper too.

    But I also remembered an experience I had myself. At the risk of a "think of the children" moment, I think it raises a valid point, though not one to be legally enforced.

    My niece, about 7 years old or so, loves owls, and I generated owl images with her using one of the AI image generation sites - DALL-E or something else that lets you type in a phrase and then displays some images. Everything was going fine until she demanded to be able to type. What could go wrong, I thought. Immediately she pounded the keyboard with a mischievous grin, producing a string of maybe 50 characters that looked like a long nonsense word - lots of consonants, I think.

    The resulting image looked like the complete opposite of anything wholesome, showing a severely cut up, damaged, bleeding corpse that caused her to shriek and duck her head. Basically nightmare fuel. It made me wonder what caused it. Was there an underlying algorithmic design issue that resulted in abnormal images being hidden in spaces not accessible with anything but nonsense strings? Maybe a string that was disallowed as proper input ended up gathering lots of negative descriptors?

    I think it would be a good idea to warn about the possibility of such an input causing this kind of output, and I recommend that children not be allowed to use these tools without supervision. I don't know if this problem still exists, as this was maybe a year or two ago, but it has probably happened to other people. It still doesn't warrant pulling the code / service off the web.

    • I got nightmares from Miracle Landing (1990), which I watched when I was 8. Very scary and graphic, people's faces all bloodied and torn up, and the worst part was it wasn't fiction; it actually happened. That was on national TV. I'm sure every kid has some horror story they remember. It really is up to adult supervision. How about we let the govt worry about things like air safety and not Taylor Swift fake nudes. Hard to believe this has to be said.
  • A pencil creates art or is used to stab someone in the eye. BAN PENCILS!

  • Can I paint a picture of what I imagined? Can I use a computer to paint it? Where is the line? Can there be one other than the malicious use of the resulting image?
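For context on the local, open-source image generators several commenters mention above: below is a minimal text-to-image sketch using Hugging Face's diffusers library. The model id, prompt, and settings are illustrative assumptions only, not anything confirmed by the story or the commenters; the point is simply that a stock pipeline of this kind runs on a single consumer GPU and ships with a default safety checker that blanks out flagged output.

    # Minimal Stable Diffusion sketch with the diffusers library.
    # Model id, prompt, and settings are placeholders for illustration.
    import torch
    from diffusers import StableDiffusionPipeline

    # Publicly hosted SD 1.x checkpoints bundle a safety checker that is
    # enabled by default and replaces flagged images with a blank output.
    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5",   # placeholder model id
        torch_dtype=torch.float16,
    ).to("cuda")                            # fits in a few GB of VRAM at fp16

    result = pipe(
        prompt="a friendly owl perched on a branch, children's book illustration",
        negative_prompt="gore, blood, disturbing, nsfw",  # steer away from dark output
        num_inference_steps=30,
        guidance_scale=7.5,
    )
    result.images[0].save("owl.png")

As the thread notes, guardrails like the negative prompt and the bundled safety checker are easy to strip out in a local setup, which is why hosted products such as Microsoft Designer layer additional server-side filtering on top.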
