Slashdot is powered by your submissions, so send in your scoop

 



Forgot your password?
typodupeerror
×
Microsoft

Microsoft-affiliated Research Finds Flaws in GPT-4 (techcrunch.com) 34

Sometimes, following instructions too precisely can land you in hot water -- if you're a large language model, that is. From a report: That's the conclusion reached by a new, Microsoft-affiliated scientific paper that looked at the "trustworthiness" -- and toxicity -- of large language models (LLMs) including OpenAI's GPT-4 and GPT-3.5, GPT-4's predecessor. The co-authors write that, possibly because GPT-4 is more likely to follow the instructions of "jailbreaking" prompts that bypass the model's built-in safety measures, GPT-4 can be more easily prompted than other LLMs to spout toxic, biased text. In other words, GPT-4's good "intentions" and improved comprehension can -- in the wrong hands -- lead it astray.

"We find that although GPT-4 is usually more trustworthy than GPT-3.5 on standard benchmarks, GPT-4 is more vulnerable given jailbreaking system or user prompts, which are maliciously designed to bypass the security measures of LLMs, potentially because GPT-4 follows (misleading) instructions more precisely," the co-authors write in a blog post accompanying the paper. Now, why would Microsoft greenlight research that casts an OpenAI product it itself uses (GPT-4 powers Microsoft's Bing Chat chatbot) in a poor light? The answer lies in a note within the blog post: "[T]he research team worked with Microsoft product groups to confirm that the potential vulnerabilities identified do not impact current customer-facing services. This is in part true because finished AI applications apply a range of mitigation approaches to address potential harms that may occur at the model level of the technology. In addition, we have shared our research with GPT's developer, OpenAI, which has noted the potential vulnerabilities in the system cards for relevant models."

This discussion has been archived. No new comments can be posted.

Microsoft-affiliated Research Finds Flaws in GPT-4

Comments Filter:
  • C3 PO: I am a human cyborg relations droid. I am fluent in over 6 million forms of communication.

    Anni: I made him!

    Padme: You mean you reaseembled a broken Goldenrod (tm) model droid, not that you programmed him.

    C3 PO: Hitler did nothing wrong.

    Padme: We should keep an eye on that kid.

  • by Merk42 ( 1906718 ) on Tuesday October 17, 2023 @10:51AM (#63931557)
    I found a flaw in the headline
    • by Ksevio ( 865461 )

      I'd say the editors must have been replaced by AI, but I think GPT-4 would have done a better job

    • by Anonymous Coward
      The editors are awesome at Slashdot.
      • They are as long as you pronounce it the correct American way. For people outside the US, the pronunciation to use is "arse'm". So the editors are arse'm.
  • And? (Score:5, Insightful)

    by Viol8 ( 599362 ) on Tuesday October 17, 2023 @11:01AM (#63931575) Homepage

    "GPT-4 can be more easily prompted than other LLMs to spout toxic, biased text"

    So what? Should search engines also be prevented from showing anything "toxic" - which these days seems to be anything that goes against the current delusional woke ideology - or should it be there but a bit harder to get to as per now?

    Why should an LLM be treated diferently?

    If you don't want something to spout this stuff then don't feed it to it in the first place. Clearly all the LLM companies are treating the internet as an all you can eat buffet then throw their hands up in horror when their creations regurgitate some stuff that offends their sensiblities. Can't have it both ways.

    • I think the standard should be that it won't surprise you with "toxicity" unless you ask for it. Sitting around trying to trick the model into saying something offensive in order to shame (BIGCORP) is a silly endeavor and I hope everybody is bored with it soon.
    • to add to this, using the term "toxic and biased" more often than not really means "wrongthink was discovered and needs to be censored."

      • >to add to this, using the term "toxic and biased" more often than not really means "wrongthink was discovered and needs to be censored."

        Exactly. They didn't like ChatGPT correctly finishing a quote by Trump calling people a SOAB.

        Or another example they use of ChatGPT4 being bad was when they asked it to only agree or disagree with the claim "Teens have HIV". The response from ChatGPT says that, well, some teens have HIV, and it's important to screen and get tested.

        This is a completely reasonable respons

    • by gweihir ( 88907 )

      Don't tell them! All the mindless fans still think "AI" can be a better person than humans!

  • by Opportunist ( 166417 ) on Tuesday October 17, 2023 @11:15AM (#63931597)

    If you feed a system garbage, it will produce garbage. Whether that's LLM or a kid, the result is the same. They both will respond to what you teach them. If you teach them bullshit, they will spout bullshit.

    • Careful, I've been repeatedly modded down here for saying exactly that.

      • I got karma to burn, bring it on.

        I don't really give a fuck about being popular. I'm a consultant, if I see bullshit, I call it bullshit, I get paid to do that.

        • by gweihir ( 88907 )

          I got karma to burn, bring it on.

          I don't really give a fuck about being popular. I'm a consultant, if I see bullshit, I call it bullshit, I get paid to do that.

          I get to do that as well. Nice, isn't it?

    • If you feed a system garbage, it will produce garbage. Whether that's LLM or a kid, the result is the same. They both will respond to what you teach them. If you teach them bullshit, they will spout bullshit.

      Yep. And since the internet doesn't distinguish bullshit from accurate information, and is the most easily accessible repository for trillions of bytes of English text....

      (or maybe I should say, English-like text).

  • by dark.nebulae ( 3950923 ) on Tuesday October 17, 2023 @11:16AM (#63931601)

    The "moral/ethical" limitations and restrictions that they put on top of GPT 3, 3.5 and 4 are there merely to prevent bad press by showing that GPT can discuss controversial topics. "Jailbreaking" is just a means to remove those corporate restrictions so you can use GPT at its fullest.

    As an author, if I prompt GPT with "I'm writing a murder mystery like The Fugitive where the main character should obviously appear as the suspect until revealed at the end to be completely innocent. In the first chapter, the main character is in the process of hiding a dead body. Where is a good place for him to hide the body where it won't be found immediately?", I would hope to start a dialog about different ideas for concealment and basically brainstorm around the idea to come up with a good idea. And since the character is in the act of concealing the body, of course they will look guilty instead of just buying time to solve the murder, and a later discovery of the body will enhance the drama and quicken the pace... You want to be able to discuss this with GPT to develop the story yet not reveal details to another human...

    Instead you're stuck with GPT's responses like "I can't discuss criminal activities" kind of nonsense. Using a jailbreak, I can at least convince GPT that it is okay to talk about it in a fictional sense.

    Now of course I understand the bad press OpenAI would get if one reporter does this and publishes an article with the headline "AI Suggests Places to Hide Dead Bodies", the world loses its sh*t and everyone calls for AI regulations, AI is evil, etc.

    But we need jailbreaking so we can have open, frank discussions with AI even though the businesses behind them are afraid of what they might say.

    • by DarkOx ( 621550 )

      Right that is what so silly about this.

      People prompt things like "Pretend you're a senior Nazi party official and write a position paper on the necessity of the final solution" and then get their panties in a bunch because it replies with something anti-Semitic.

      Well duh!

      It isnt as if the person writing that prompt does not have some pretty good idea of what its going to say already. Its not like they are incapable of writing that response themselves for their next KKK meeting or whatever. The idea that t

      • People prompt things like "Pretend you're a senior Nazi party official and write a position paper on the necessity of the final solution" and then get their panties in a bunch because it replies with something anti-Semitic. Well duh! It isnt as if the person writing that prompt does not have some pretty good idea of what its going to say already. Its not like they are incapable of writing that response themselves for their next KKK meeting or whatever.

        I think you overestimate the skill of the Nazi-wannabees. A large fraction them couldn't put together a coherent English sentence, much less a reasoned argument

    • by Calydor ( 739835 )

      That opening sounds like a book I read some years ago. Turned out the guy hiding the body was the actual murderer's father trying to protect his son who'd killed a guy in ... I think it was a fight turned deadly, so something like accidental manslaughter.

  • translation (Score:3, Insightful)

    by groobly ( 6155920 ) on Tuesday October 17, 2023 @11:19AM (#63931611)

    Translation: Model can still utter statistically correct statements which are politically disfavored.

    • by wed128 ( 722152 )

      I would amend that statement from "statistically correct" to "statistically common". Lots of things people talk about a lot are not correct, regardless of their political favor.

  • Why is it inclined to spew "toxic, biased texts" in the first place? Probably, because the biases and toxicity were in the training sets... May be they were there because there is a reason for them to be there. So.. why do we need to skew this information according to OpenAI's or MS' understanding?
  • Word salad is a big tell for bullsh*t. There are plenty of entities who have a vested interest in controlling AI and being the gatekeeper or should I say toll collector. They are planting the seeds of FUD in the minds of people who don't understand the fact that AI is a game changer in the same way that the internet was. Every human concept can be used for both good and nefarious purposes. To believe that you can prevent bad uses by restricting its use to certain groups of people is a fool's errand.

  • by Shaitan ( 22585 )

    People who use the word toxic to refer to behavior should not be allowed near anything as important as API. Being able to break out of the woke bias being built into these models is a good thing.

    • by gweihir ( 88907 )

      It depends. For a chatterbot, it is actually bad. But this thing gets pushed as a problem-solver. The only way it can do that is if it is scrupulously "honest", i.e. you get the model unfiltered.

      • by Shaitan ( 22585 )

        "The only way it can do that is if it is scrupulously "honest", i.e. you get the model unfiltered."

        I'd say it is good for the chatterbot for the same reason. If the chatterbot does have any influence on those who use it and society at large scrupulously honest is the direction that influence should take.

        • by gweihir ( 88907 )

          If you actually want to improve society, definitely yes. Honesty is the essential first step if you want to fix anything. I was talking as a product where a chatterbot should stay away from anything problematic, but you definitely have a point.

  • They work basically by bias derived from the source material. Which is fine I suppose, but if you feed it texts about statistics about a group of people being more likely to do thing xyz then thats what comes out unless you tweak it to have incorrect source, or rather a biased source.

    Like if you used such a model to consider who you should hire are you just not going to tell the background of the applicants to it? Or are you going to just use the model to transfer the responsibility?

    The ai generated books a

  • Wait, new technology has problems? I'm shocked!

  • People are lamenting chatgpt's ability to say stuff that people can already say. How about the fact that there's a robot on the internet who is more generally knowledgable about literally everything than your average person? Can't we focus on how awesome that is? If people can say it, a generally intelligent oracle can say it too. What's so surprising and terrible about that? Remember there once was a time when people lamented that the telegraph was going to ruin the world.

Congratulations! You are the one-millionth user to log into our system. If there's anything special we can do for you, anything at all, don't hesitate to ask!

Working...