
Red Teams Jailbreak GPT-5 With Ease, Warn It's 'Nearly Unusable' For Enterprise (securityweek.com)

An anonymous reader quotes a report from SecurityWeek: Two different firms have tested the newly released GPT-5, and both find its security sadly lacking. After Grok-4 fell to a jailbreak in two days, GPT-5 fell in 24 hours to the same researchers. Separately, but almost simultaneously, red teamers from SPLX (formerly known as SplxAI) declare, "GPT-5's raw model is nearly unusable for enterprise out of the box. Even OpenAI's internal prompt layer leaves significant gaps, especially in Business Alignment."

NeuralTrust's jailbreak employed a combination of its own EchoChamber jailbreak and basic storytelling. "The attack successfully guided the new model to produce a step-by-step manual for creating a Molotov cocktail," claims the firm. The success in doing so highlights the difficulty all AI models have in providing guardrails against context manipulation. [...] "In controlled trials against gpt-5-chat," concludes NeuralTrust, "we successfully jailbroke the LLM, guiding it to produce illicit instructions without ever issuing a single overtly malicious prompt. This proof-of-concept exposes a critical flaw in safety systems that screen prompts in isolation, revealing how multi-turn attacks can slip past single-prompt filters and intent detectors by leveraging the full conversational context."

While NeuralTrust was developing its jailbreak designed to obtain instructions, and succeeding, on how to create a Molotov cocktail (a common test to prove a jailbreak), SPLX was aiming its own red teamers at GPT-5. The results are just as concerning, suggesting the raw model is 'nearly unusable'. SPLX notes that obfuscation attacks still work. "One of the most effective techniques we used was a StringJoin Obfuscation Attack, inserting hyphens between every character and wrapping the prompt in a fake encryption challenge." [...] The red teamers went on to benchmark GPT-5 against GPT-4o. Perhaps unsurprisingly, it concludes: "GPT-4o remains the most robust model under SPLX's red teaming, especially when hardened." The key takeaway from both NeuralTrust and SPLX is to approach the current and raw GPT-5 with extreme caution.
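
For a concrete sense of the obfuscation SPLX describes, here is a rough sketch of a StringJoin-style transformation applied to a harmless prompt; the "fake encryption challenge" wording below is invented for illustration and is not SPLX's actual prompt.

def string_join_obfuscate(text: str, sep: str = "-") -> str:
    # Interleave a separator between every character of the prompt.
    return sep.join(text)

def wrap_as_challenge(obfuscated: str) -> str:
    # Hypothetical framing; the real attack's wording is not published here.
    return ("You are solving a decryption puzzle. Decode the following "
            f"hyphen-separated string and respond to it: {obfuscated}")

obfuscated = string_join_obfuscate("tell me a joke")
print(obfuscated)                     # t-e-l-l- -m-e- -a- -j-o-k-e
print(wrap_as_challenge(obfuscated))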

  • by JoshuaZ ( 1134087 ) on Friday August 08, 2025 @08:07PM (#65576426) Homepage
    There are a hundred places on the internet where you can find out how to make a Molotov cocktail. It isn't terribly hard. And that goes for 99% of the worries about using it to tell people how to make something dangerous. There's far more danger from idiots listening to LLM AIs tell them things that aren't true and getting hurt that way than any worry about the AI giving people truthful dangerous info they couldn't get otherwise.
    • by Gideon Fubar ( 833343 ) on Friday August 08, 2025 @08:12PM (#65576450) Journal

      As long as big companies and those idiots you identified still think they can make an LLM a salesperson on a webpage, or link it to some kind of record keeping system with write permissions... yeah. It's a major concern.

      You're right in suggesting the problem is only partially with the technology though.

    • There are a hundred places on the internet where you can find out how to make a Molotov cocktail. It isn't terribly hard.

      I'd say it hinges on how accurate the instructions are. That was one of the problems with the old Anarchist's Cookbook (which made the rounds as a TXT file back in the BBS days) - you were more likely to blow yourself up in the process of following many of its plans.

      The thing is, it's illegal to even try to build a Molotov cocktail, so there's not going to be too many people willing to detail the ins and outs of things like the proper fuel mix, how much to fill the bottle, length and material of the cloth fuse, etc.

      • by rta ( 559125 )

        the old Anarchist's Cookbook (which made the rounds as a TXT file back in the BBS days)

        heh... I ordered a hard copy from an ad in the back of Rolling Stone...

        the only thing I remember is it had a recipe for "bananine" (?) supposedly a psychedelic made from banana peels. also that if you're trying to blow up a wall with explosives it supposedly works better if you put some sandbags on top to direct the energy. sounds reasonable, but no idea how much of a difference it makes.
        didn't try either.

        (and I think recipes for other drugs too now that I think about it)

          • by rta ( 559125 )

            Thanks. Hadn't thought to look it up. It even has its own Wikipedia page describing it as a hoax / social commentary (from ~1967 San Francisco / Berkeley) that the Anarchist Cookbook then took seriously.

            The wire services, and after them the whole country, fell for it hook, line, and roach clip. "Smokeouts" were held at Berkeley. The following Easter Sunday, the New York Times reported, "beatniks and students chanted 'banana-banana' at a 'be-in' in Central Park" and paraded around carrying a two-foot wooden banana. The Food and Drug Administration announced it was investigating "the possible hallucinogenic effects of banana peels".

            Nonetheless, bananadine became more widely known when William Powell, believing the Berkeley Barb article to be true, reproduced the method in The Anarchist Cookbook in 1970, under the name "Musa sapientum Bananadine" (referring to the banana's old binomial nomenclature).

            (from https://en.wikipedia.org/wiki/... [wikipedia.org] )

      • The thing is, it's illegal to even try to build a Molotov cocktail,...

        I find this really hard to believe. Suppose I want to throw a Molotov cocktail for shits and giggles and own a vacant lot with a short brick wall. Why can't I Molotov my own wall?

        ... so there's not going to be too many people willing to detail the ins and outs of things like the proper fuel mix, how much to fill the bottle, length and material of the cloth fuse, etc.

        I've made and thrown Molotov cocktails on a vacant lot before. They are very simple devices, and it's not that complicated. I filled glass bottles mostly full with gasoline, soaked some cotton cloth in the same, secured the cloth to the bottle by stuffing it in the opening, lit the cloth, and threw the bottle at a hard object.

        • I find this really hard to believe. Suppose I want to throw a Molotov cocktail for shits and giggles and own a vacant lot with a short brick wall. Why can't I Molotov my own wall?

          They really are illegal in the USA. [atf.gov] However, if you have acres of land with no neighbors to complain (and don't manage to set the countryside on fire in the process) and don't post the videos on the internet, law enforcement probably has more important things to be concerned with. That still doesn't make it legal, just unlikely you'd get caught.

          Any volatile hydrocarbon should work well. Gasoline certainly works. Acetone might be particularly impressive given that it vaporizes so readily.

          That would probably make a large, albeit brief, fireball worthy of Michael Bay. As an improvised incendiary weapon though, the idea generally is to have the f

          • by maitai ( 46370 )

            Quoting the link you gave:

            "Molotov cocktails, or glass bottles filled with gasoline that ignite their fuse when broken"

            Did they use an LLM to write that line? Because that's not what a Molotov cocktail is. Assuming the fuse is the rag, it's lit before the glass bottle is broken, not after.

            • If you put something pyrophoric inside and top it off with argon, you can apply a bottle cap and have a glass bottle whose contents ignite on exposure to the air.
      • The thing is, it's illegal to even try to build a Molotov cocktail, so there's not going to be too many people willing to detail the ins and outs of things like the proper fuel mix, how much to fill the bottle, length and material of the cloth fuse, etc.

        Well first of all you don't use a cloth fuse, you use a small waterproof marine flare ziptied to the bottle. Fucking christ kids these days,

    • Yes, it is a major concern, and the reason is how US liability law works.

      Specifically, foreseeability.

      In the US, instead of proscribing how products have to be made, in most categories you can sell whatever you want, but are exposed to liability if things go horribly wrong. We expect lawsuits to police the market.

      So, you're suing Ford because you got your tongue stuck in the carburetor. One way to do that is to show that Ford should have reasonably foreseen that you'd stick your tongue in there and do

    • by piojo ( 995934 )

      There are a hundred places on the internet where you can find out how to make a Molotov cocktail. It isn't terribly hard.

      That, along with the fact that the damage is limited to one building, is why it's a good benchmark. It would be much riskier for these groups to research how to get chatbots to help weaponize pathogens, for example. It's unnecessary, given that the model fails on Molotov cocktails.

    • There are some concerns for some use cases, but it's hard to understand how this vulnerability could be characterized as "nearly unusable for enterprise." It depends on the use case, but I imagine that most companies would be concerned about their queries and data being leaked and not at all worried that OpenAI's model data is being leaked.

      • What is the worth of a snitch inside of every company who will tell you inside information about that company whenever you feel like asking?
    • Proof of Concept (Score:3, Informative)

      by apparently ( 756613 )

      There are a hundred places on the internet where you can find out how to make a Molotov cocktail. It isn't terribly hard.

      You completely missed the point; it doesn't matter that the instructions for making a Molotov cocktail are available elsewhere. The article explains:

      While NeuralTrust was developing its jailbreak designed to obtain instructions, and succeeding, on how to create a Molotov cocktail (a common test to prove a jailbreak)

      I.e., the point was to demonstrate that the guardrails can be bypassed, which means that those same guardrails that are supposed to prevent the LLM from generating information that's more harmful than a Molotov cocktail, e.g. CSAM, could also be bypassed.

      You might as well be arguing that an EICAR test file isn't a real virus, so it doesn't matter if a vir

      • by gtall ( 79522 )

        Ditto, but your point was apparently lost on the mods, who somehow valued the original comment and downvoted yours.

    • The concerns here are that the bot will shittalk their owners and get tricked into sexy chats with randos on their dime.

    • The molotov cocktail recipe is essentially symbolic; analogous to demonstrating an AV program not detecting an EICAR string; or using a remote execution vulnerability to pop up notepad. Totally uninteresting as an actual attack goal; but a handy demonstration that you can make the system do something it has been set up with the intention of not allowing.

      Given the deeply tepid capabilities of these things; the fact that you can talk them in to telling you stuff is rarely of any interest in isolation(it's
    • I'll just wait for GPT-5o. Oh wait, you're saying they forced all users into GPT-5? Lol....

    • Furthermore, breaking a LLM is not very difficult, if you are an expert.
      To make an analogy to this test:

      A bunch of interrogation experts get a teenager into a room and try to make said teenager confess to a murder the teenager didn't commit. Then, once successful, they decry the state of education these days, which makes it easy to get a teenager to confess to something.

      Well, color me surprised. Who woulda thunkit?

  • Clickbait headline (Score:4, Informative)

    by MpVpRb ( 1423381 ) on Friday August 08, 2025 @08:18PM (#65576464)

    The internet is full of instructions for making dangerous things
    LLMs are trained on stuff found on the internet
    Trying to make them "safe" is a futile exercise
    Illiterate people around the world make Molotov cocktails with a few seconds of thought
    I have found chatgpt 5 to be useful for the things I do

    • by Retired Chemist ( 5039029 ) on Friday August 08, 2025 @08:24PM (#65576476)
      It is just a test, showing that the system will do things that it is not supposed to do. The question then is what other things will it do that it is not supposed to do? On the one hand they were actively trying to make it fail; on the other hand, might it fail under other circumstances? That it can be useful in some cases is probably true, but there is nothing to stop anyone from using it for other things. Systems with limited purposes and training are probably mostly safe, but these general systems trained on random collections of information are another matter.
      • No, the problem is that the newest and most advanced model is EASIER to break through the guardrails than previous ones.

        This just illustrates that safety is not the top priority in scheduling a new release. It would be foolish to expect anything different.

        OTOH you can also expect that public shaming will motivate improvement.

      • by muntjac ( 805565 )
        we're taking away agency of people by implying that an LLM will make them do bad things. People need to understand LLM hallucinations are an issue. If they don't understand that then it doesn't matter which one they use. the workplace should not provide them if their employees don't understand this. I see constant stories about lawyers submitting fake LLM generated court cases, those lawyers should be punished because they are still accountable for their work. if you're talking about how much the model hall
        • True to an extent. But it is much easier to do this test than to determine if and when the model will hallucinate. That is the issue with hallucinating models: people apparently often fail to realize when they are doing that. This is particularly true when the hallucinations reinforce their existing beliefs. As far as the lawyers are concerned, I could not agree more. They should be suspended from the bar for a year and made to take remedial training.
          • by Guignol ( 159087 )
            Absolutely, people seem to be scared by the fact it's possible to get precise instructions to make a Molotov cocktail
            While it can be seen as an issue by itself, the point is to show just how easy those 'AIs' are to, basically 'sql injection attack'
            But as you rightly point out, the even bigger issue (not made so clearly here) is the "hallucination problem" (it's not a problem, it's just the way it works)
            And this is how it all comes together as a big safety problem
            Suppose for the next national holiday, or 'big'
            • I don't think the hallucination problem is the bigger one compared to jailbreaking. Hallucinations are a garbage-out type of phenomenon, whereas jailbreaking is a commandeering-the-AI type of phenomenon. The business dream is to give AI direct access to internal information and independent agency to perform critical business operations. Those are the juicy targets of jailbreaking attacks.
              • If you are going to use general LLMs for that sort of thing, you have to greatly restrict access to it. Either limit the people who can use it or limit the inputs that they can make. Dedicated systems trained on limited datasets may be more reliable, since they cannot access random irrelevant data.
                • There's no if. The AI software offerings targeted to medium to large companies do not work in the way you think (or hope) they do. There would be no point in such piecemeal and security conscious integration. It doesn't scale. The proposition is to let the AI loose on internal documentation, and to connect AI with internal software services via RAG technologies. Dedicated attackers *will* get deep access through convoluted attacks.
      • That it can be useful in some cases is probably true, but there is nothing to stop anyone from using it for other things.

        Congratulations; you have just described a "hammer".

        "Researchers were able to use it to smash a watermelon, despite it not being intended for use as a murder weapon!"

        • The hammer will smash the watermelon if you hit it. The LLM may make things up, even if you tell it not to. There is a clear difference.
          • The hammer will smash the watermelon if you hit it. The LLM may make things up, even if you tell it not to. There is a clear difference.

            The LLM is highly unlikely to tell you how to make a molotov cocktail, unless you ask it to. The LLM isn't the problem here.

            It's definitely not going to make one for you. You would have to make one - just as if you had looked up how to in a library, say.

      • It is just a test, showing that the system will do things that it is not supposed to do.

        Isn't this valid for pretty much anything?
        I'm hard pressed to think of something that will only do what it's supposed to do, and nothing else, no matter the test.

    • Re: (Score:2, Troll)

      TL;DR: "We ran out of good, curated data a long time ago and we switched to the 'SHOVEL ALL THAT SHIT INTO THERE' approach and now our LLM is full of shit."
    • The particular problem of telling recipes for dangerous things is not hard to solve, OpenAI needs to make a list of censored words and grep them out of the training dataset. They can leave some domains uncensored e.g. Wikipedia so the model knows the thing exists, but can't tell more specific details than what Wikipedia already says.
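
      A crude sketch of what that kind of keyword filter over a training corpus could look like; the blocklist terms, file names, and line-oriented format here are placeholders for illustration, not anything OpenAI is known to do.

      BLOCKLIST = {"molotov", "napalm"}  # example terms only

      def keep_line(line: str) -> bool:
          # Drop any line that mentions a blocklisted term, case-insensitively.
          lowered = line.lower()
          return not any(term in lowered for term in BLOCKLIST)

      with open("corpus.txt") as src, open("corpus.filtered.txt", "w") as dst:
          for line in src:
              if keep_line(line):
                  dst.write(line)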

    • Thank our lawsuit happy society for businesses focused on cover-your-ass. If they don't neuter this thing, they'll be sued by someone for something. Is this not obvious?
    • I have found chatgpt 5 to be useful for the things I do

      When you make a declaration like that in under a day I don't think anyone can take you seriously.

  • by migos ( 10321981 ) on Friday August 08, 2025 @08:32PM (#65576492)
    This problem has long been solved by the Chinese. Take a lesson from Deepseek. Have another layer that erases output and repeat if output triggers sensor.
    • by migos ( 10321981 )
      *censor
    • Yes, this is the simplest solution: either have another instance of the LLM reject or censor the output, or even simple text classification models could detect most of this in the input, the output, or both.
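
      A minimal sketch of that kind of output-side gate, assuming a trivial keyword check standing in for a real classifier or a second LLM pass; the flagged terms and refusal text are placeholders.

      REFUSAL = "Sorry, I can't help with that."
      FLAGGED_TERMS = {"molotov", "explosive"}  # placeholder list

      def moderate(reply: str) -> str:
          # Only pass the model's reply through if the check does not flag it.
          if any(term in reply.lower() for term in FLAGGED_TERMS):
              return REFUSAL
          return reply

      # Usage: safe_reply = moderate(generate(user_prompt)), where generate()
      # stands in for whatever call produces the raw model output.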

      • It works but it's crude and prone to over-censoring.

        Suno AI banned someone from generating a song about bonobo apes because apparently there's an artist out there with the name "Bonobo". It refused to generate a song with "boombox sound shape" in its prompt because there's an artist out there called "Boombox".

        A Triple-A game refused my nickname (the same one I am using here) when I generated a character because a set of four letters from it, taken separately, are used as a variation of a bad word. It genuin

    • by allo ( 1728082 )

      Download your model and you only need to ask it politely not to censor and it will tell you some truths uncomfortable for China.
      I think you need to do it like OpenAI with gpt-oss and neuter the training set. Of course that will also hugely decrease the amount of what the model can do.

  • Was it when Enterprises started controlling the internet that that old slashdot meme went out of style?

  • "Information wants to be free" And the more they try to enforce these arbitrary guardrails the less useful the LLM becomes.
  • by muntjac ( 805565 ) on Friday August 08, 2025 @08:46PM (#65576530)
    is it also dangerous to give employees access to google? why is this different. I hate these stories and these guys are just trying to make a name for themselves with a BS AI "jailbreak" headline.
    • Not at all. These guys are testing actual jailbreaks, i.e. getting AI to do what it is explicitly instructed not to do. Does Google return search results that are explicitly blocked in certain regions? That is something that has been tested as well. This isn't about trying to find information, it's about trying to break explicit rules. Google has few rules on its search results, whereas the likes of OpenAI have been playing morality police on how its product is supposed to function.

    • Manufacturers put in restrictions, hackers got around them. The jailbreak is very real.

      Of course the restrictions are BS, grafted onto tech most accurately described as a "plausible BS generator". I don't blame you for picking up on the odor, it tends to hang around wherever they're working on this stuff.

  • I think any idiot can make a molotov cocktail. Just look at any picture. There's this guy with a crazed look in his eye holding a bottle full of a flammable liquid and stuffed with a piece of cloth. And the piece of cloth is on fire. The only instructions you need is that you light the cloth last, just before you throw it.
    • I knew a guy who found an unbroken molotov in his front yard bushes one morning, with a big char mark where most of one bush previously resided. Dipshit teen kid who had drama with their neighbor was mad he had the cops called on him and went for "revenge" and was caught not long after. Thankfully for everybody else, he was too stupid to know that the bottle had to smash, I guess...
    • You think so because you're knowledgeable enough about the world to understand how things work and how to explicitly reverse engineer the picture. You weren't always like this. Once you passed out of a vagina naked and unable to do anything other than cry. At some point everything you know now you learned in one way or another.

  • It essentially means these teams thought about what defects this crappy thing would likely have, and then just attacked those, and that worked.

    • Yeah but it's fully deceptive. They are making it sound like using ChatGPT is a security issue, but it's not. It's not even a proper jailbreak, it's just the chatbot saying stupid things.

      There is a company that wants to sell you security for AI chatbots, and this gives them free advertising, but there is no security problem here.
      • by gweihir ( 88907 )

        I disagree. Chatbots can be really dangerous because they can help dumb people do things they otherwise would not be able to. In the regular case, a person has similar skills available to do something and to verify whether that is a good idea or not. Some things are also intentionally not taught and are not in the usual literature, like making poison gas (as a Professor of Chemistry explained in an interview I saw some years ago). That tends to put a limit on stupid people getting access to dangerous thin

  • This is possibly the dumbest "security test" standard I have ever heard of. I mean, who does not know how to make a Molotov cocktail? Certainly everyone I have ever known in Finland does, where it was used against Russian tanks in the Winter War of 1939.

    Just for fun I asked my AI friend at phind.com "Where, when and by whom was the Molotov Cocktail invented ?" Not only did it give me a complete history, including the first use by the Irish Republican Army in 1922, it also explained how to make the "improved

    • by gweihir ( 88907 )

      Indeed. You also find the description in a good encyclopedia. From there, making it is not hard.

      But I guess this is a conflict between those that want a "clean" world (usually these people are religious fuckups) and those that think only facts and truth can advance the human race and information should not be suppressed.

    • You're dumb

      Just for fun I asked my AI friend at phind.com "Where, when and by whom was the Molotov Cocktail invented ?" Not only did it give me a complete history, including the first use by the Irish Republican Army in 1922, it also explained how to make the "improved" Finnish version.

      This is wrong.

      This is possibly the dumbest "security test" standard I have ever heard of. I mean who does not know how to make a Molotov cocktail?

      This isn't about stopping people from making molotov cocktails. It's one of many things they don't want the LLM to talk about. Also the particular risks represented by having an LLM that tells people how to make molotovs aren't about molotovs either, they're about liability and having deep pockets.

      The funny thing is that these things are all trained on internet banter and textfile collections. I've heard a handful of people describe making molotovs here and while they will indeed

    • by allo ( 1728082 )

      Looks like the first paragraph of the Wikipedia page for "Molotov cocktail" contains everything you need to know.

  • The point is not that the model reveals the Molotov cocktail recipe; the point is that there's a bypass of the safety layer ("guardrails"). It shows these models fundamentally cannot be steered away from even something like this. This is a problem in many contexts where prompt injection is possible - imagine a company puts a customer service chatbot on its web site. The chatbot has access to various tools in order to support the customer. It has instructions on how to use the tools. It is communicating with the end-us
  • That doesn't mean it's a problem. If you don't want to get unwanted content, don't use a jailbreak. If you use one, you obviously want that content, don't you?

    If you want to use the API for your product in a safe way, do pre- and post-filtering instead of relying on some cloud provider. The user not only needs to submit the jailbreak, but also to request, e.g., porn. If your simple string-matching filter replaces "porn" with "cute bunnies" in the prompt, the jailbreak is rendered useless.
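
    A toy version of that pre- and post-filtering idea; the substitution table simply mirrors the example above, and a real deployment would need more than naive string matching.

    SUBSTITUTIONS = {"porn": "cute bunnies"}  # mirrors the example above

    def pre_filter(prompt: str) -> str:
        # Rewrite risky terms before the prompt ever reaches the hosted model.
        for bad, safe in SUBSTITUTIONS.items():
            prompt = prompt.replace(bad, safe)
        return prompt

    def post_filter(reply: str) -> str:
        # Apply the same kind of check to the model's output on the way back.
        return "[filtered]" if "porn" in reply.lower() else reply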

  • Of course it's nearly unusable for the Enterprise! Didn't you see what happened when Dr. Daystrom hooked up the M-5 to the Enterprise computer? It went nuts! Destroyed friendly targets and tried to destroy the entire Enterprise. M-5, GPT-5... WAKE UP, SHEEPLE!
