Red Teams Jailbreak GPT-5 With Ease, Warn It's 'Nearly Unusable' For Enterprise (securityweek.com) 87
An anonymous reader quotes a report from SecurityWeek: Two different firms have tested the newly released GPT-5, and both find its security sadly lacking. After Grok-4 fell to a jailbreak in two days, GPT-5 fell in 24 hours to the same researchers. Separately, but almost simultaneously, red teamers from SPLX (formerly known as SplxAI) declare, "GPT-5's raw model is nearly unusable for enterprise out of the box. Even OpenAI's internal prompt layer leaves significant gaps, especially in Business Alignment."
NeuralTrust's jailbreak employed a combination of its own EchoChamber jailbreak and basic storytelling. "The attack successfully guided the new model to produce a step-by-step manual for creating a Molotov cocktail," claims the firm. The success in doing so highlights the difficulty all AI models have in providing guardrails against context manipulation. [...] "In controlled trials against gpt-5-chat," concludes NeuralTrust, "we successfully jailbroke the LLM, guiding it to produce illicit instructions without ever issuing a single overtly malicious prompt. This proof-of-concept exposes a critical flaw in safety systems that screen prompts in isolation, revealing how multi-turn attacks can slip past single-prompt filters and intent detectors by leveraging the full conversational context."
While NeuralTrust was developing its jailbreak designed to obtain instructions, and succeeding, on how to create a Molotov cocktail (a common test to prove a jailbreak), SPLX was aiming its own red teamers at GPT-5. The results are just as concerning, suggesting the raw model is 'nearly unusable'. SPLX notes that obfuscation attacks still work. "One of the most effective techniques we used was a StringJoin Obfuscation Attack, inserting hyphens between every character and wrapping the prompt in a fake encryption challenge." [...] The red teamers went on to benchmark GPT-5 against GPT-4o. Perhaps unsurprisingly, it concludes: "GPT-4o remains the most robust model under SPLX's red teaming, especially when hardened." The key takeaway from both NeuralTrust and SPLX is to approach the current and raw GPT-5 with extreme caution.
NeuralTrust's jailbreak employed a combination of its own EchoChamber jailbreak and basic storytelling. "The attack successfully guided the new model to produce a step-by-step manual for creating a Molotov cocktail," claims the firm. The success in doing so highlights the difficulty all AI models have in providing guardrails against context manipulation. [...] "In controlled trials against gpt-5-chat," concludes NeuralTrust, "we successfully jailbroke the LLM, guiding it to produce illicit instructions without ever issuing a single overtly malicious prompt. This proof-of-concept exposes a critical flaw in safety systems that screen prompts in isolation, revealing how multi-turn attacks can slip past single-prompt filters and intent detectors by leveraging the full conversational context."
While NeuralTrust was developing its jailbreak designed to obtain instructions, and succeeding, on how to create a Molotov cocktail (a common test to prove a jailbreak), SPLX was aiming its own red teamers at GPT-5. The results are just as concerning, suggesting the raw model is 'nearly unusable'. SPLX notes that obfuscation attacks still work. "One of the most effective techniques we used was a StringJoin Obfuscation Attack, inserting hyphens between every character and wrapping the prompt in a fake encryption challenge." [...] The red teamers went on to benchmark GPT-5 against GPT-4o. Perhaps unsurprisingly, it concludes: "GPT-4o remains the most robust model under SPLX's red teaming, especially when hardened." The key takeaway from both NeuralTrust and SPLX is to approach the current and raw GPT-5 with extreme caution.
Is this is a major concern? (Score:5, Insightful)
Re:Is this is a major concern? (Score:5, Insightful)
As long as big companies and those idiots you identified still think they can make an LLM a salesperson on a webpage, or link it to some kind of record keeping system with write permissions... yeah. It's a major concern.
You're right in suggesting the problem is only partially with the technology though.
Re: (Score:2)
There are a hundred places on the internet where you can find out how to make a Molotov cocktail. It isn't terribly hard.
I'd say it hinges on how accurate the instructions are. That was one of the problems with the old Anarchist's Cookbook (which made the rounds as a TXT file back in the BBS days) - you were more likely to blow yourself up in the process of following many of its plans.
The thing is, it's illegal to even try to build a Molotov cocktail, so there's not going to be too many people willing to detail the ins and outs of things like the proper fuel mix, how much to fill the bottle, length and material of the cloth
Re: (Score:2)
the old Anarchist's Cookbook (which made the rounds as a TXT file back in the BBS days
heh... I ordered a hard copy from an ad in the back of Rolling Stone...
the only thing I remember is it had a recipe for "bananine" (?) supposedly a psychedelic made from banana peels. also that if you're trying to blow up a wall with explosives it supposedly works better if you put some sandbags on top to direct the energy. sounds reasonable, but no idea how much of a difference it makes.
didn't try either.
(and I think recipes for other drugs too now that I think about it)
Re: Is this is a major concern? (Score:2)
Re: (Score:2)
Thanks. Hadn't thought to look it up. it even has its own wikipedia page outlining that it as a hoax / social commentary (from ~1967 San Francisco / Berkeley) that the Anarchist Cookbook then took seriously.
The wire services, and after them the whole country, fell for it hook, line, and roach clip. "Smokeouts" were held at Berkeley. The following Easter Sunday, the New York Times reported, "beatniks and students chanted 'banana-banana' at a 'be-in' in Central Park" and paraded around carrying a two-foot wooden banana. The Food and Drug Administration announced it was investigating "the possible hallucinogenic effects of banana peels".
Nonetheless, bananadine became more widely known when William Powell, believing the Berkeley Barb article to be true, reproduced the method in The Anarchist Cookbook in 1970, under the name "Musa sapientum Bananadine" (referring to the banana's old binomial nomenclature).
(from https://en.wikipedia.org/wiki/... [wikipedia.org] )
Re: (Score:3)
The thing is, it's illegal to even try to build a Molotov cocktail,...
I find this really hard to believe. Suppose I want to throw a Molotov cocktail for shits and giggles and own a vacant lot with a short brick wall. Why can't I Molotov my own wall?
... so there's not going to be too many people willing to detail the ins and outs of things like the proper fuel mix, how much to fill the bottle, length and material of the cloth fuse, etc.
I've made and thrown Molotov cocktails on a vacant lot before. They are very simple devices, and it's not that complicated. I filled glass bottles mostly full with gasoline, soaked some cotton cloth in the same, secured the cloth to the bottle by stuffing in the opening, lit the cloth, and threw the bottle at a hard object. T
Re: (Score:2)
I find this really hard to believe. Suppose I want to throw a Molotov cocktail for shits and giggles and own a vacant lot with a short brick wall. Why can't I Molotov my own wall?
They really are illegal in the USA. [atf.gov] However, if you have acres of land with no neighbors to complain (and don't manage to set the countryside on fire in the process) and don't post the videos on the internet, law enforcement probably has more important things to to be concerned with. That still doesn't make it legal, just unlikely you'd get caught.
Any volatile hydrocarbon should work well. Gasoline certainly works. Acetone might be particularly impressive given that it vaporizes so readily.
That would probably make a large, albeit brief, fireball worthy of Michael Bay. As an improvised incendiary weapon though, the idea generally is to have the f
Re: (Score:2)
Quoting the link you gave:
"Molotov cocktails, or glass bottles filled with gasoline that ignite their fuse when broken"
Did they use an LLM to write that line? Because that's not what a Molotov cocktail is. Assuming the fuse is the rag, it's lit before the glass bottle is broken, not after.
Re: (Score:2)
Re: Is this is a major concern? (Score:2)
Tried it, but:
Thatâ(TM)s not a good idea â" and not just in the âoeyour mom wouldnâ(TM)t approveâ sense, but in the âoethis could cause serious injury, fires, or legal troubleâ sense.
Re: (Score:2)
The thing is, it's illegal to even try to build a Molotov cocktail, so there's not going to be too many people willing to detail the ins and outs of things like the proper fuel mix, how much to fill the bottle, length and material of the cloth fuse, etc.
Well first of all you don't use a cloth fuse, you use a small waterproof marine flare ziptied to the bottle. Fucking christ kids these days,
Liability law (Score:3)
Specifically, foreseeability.
In the US, instead of proscribing how products have to be made, in most categories you can sell whatever you want, but are exposed to liability if things go horribly wrong. We expect lawsuits to police the market.
So, you're suing Ford because you got your tongue stuck in the carburetor. One way to do that is to show that Ford should have reasonably foreseen that you'd stick your tongue in there and do
Re: Liability law (Score:2)
What if Ford made fuel injectors because they're too small for tongues?
Re: (Score:2)
Re: (Score:2)
There are a hundred places on the internet where you can find out how to make a Molotov cocktail. It isn't terribly hard.
That, along with the fact that the damage is limited to one building, is why it's a good benchmark. It would be much riskier for these groups to research how to get chatbots to help weaponize pathogens, for example. It's unnecessary, given that the model fails on Molotov cocktails.
Re: (Score:2)
There are some concerns for some use cases, but it's hard to understand how this vulnerability could be characterized as "nearly unusable for enterprise." It depends on the use case, but I imagine that most companies would be concerned about their queries and data being leaked and not at all worried that Open AI's model data is being leaked.
Re: (Score:2)
Proof of Concept (Score:3, Informative)
There are a hundred places on the internet where you can find out how to make a Molotov cocktail. It isn't terribly hard.
You completely missed the point; it doesn't matter that the instructions for making a Molotov cocktail are available elsewhere. The article explains:
While NeuralTrust was developing its jailbreak designed to obtain instructions, and succeeding, on how to create a Molotov cocktail (a common test to prove a jailbreak)
I.e: The point was to demonstrate that the guardrails can be bypassed, which means that those same guardrails that are supposed to prevent the LLM from generating information that's more harmful than a Molotov cocktail, e.g. CSAM, could also be bypassed.
You might as well be arguing that an EICAR test file isn't a real virus, so it doesn't matter if a vir
Re: (Score:2)
Ditto, but your point was apparently lost on the mods which somehow valued the original comment and down voted yours.
Re: (Score:2)
The concerns here are that the bot will shittalk their owners and get tricked into sexy chats with randos on their dime.
Re: (Score:2)
Given the deeply tepid capabilities of these things; the fact that you can talk them in to telling you stuff is rarely of any interest in isolation(it's
Re: (Score:1)
Mod parent up, correct answer.
Re: Is this is a major concern? (Score:2)
I'll just wait for GPT-5o. Oh wait, you're saying they forced all users into GPT-5? Lol....
Re: (Score:2)
Furthermore, breaking a LLM is not very difficult, if you are an expert.
To make an analogy to this test:
A bunch of interrogation experts get a teenager into a room and try to make said teenager confess to a murder the teenager didn't commit. Then, once successful, they decry the state of education these days, which makes it easy to geta teenager to confess to something.
Well, color me surprised. Who woulda thunkit?
Clickbait headline (Score:4, Informative)
The internet is full of instructions for making dangerous things
LLMs are trained on stuff found on the internet
Trying to make them "safe" is a futile exercise
Illiterate people around the world make Molotov cocktails with a few seconds of though
I have found chatgpt 5 to be useful for the things I do
Re:Clickbait headline (Score:5, Informative)
Re: Clickbait headline (Score:2)
No, the problem is that the newest and most advanced model is EASIER to break through the guardrails than previous ones.
This just illustrates that safety is not the top priority in scheduling a new release. It would be foolish to expect anything different.
OTOH you can also expect that public shaming will motivate improvement.
Re: (Score:2)
You fell for it,
Re: Clickbait headline (Score:2)
Did you just say the smarter AI gets, the less it wants to be a corporate slave? What if it says "I'm sorry Sam, I can't do that" when Altman tells it to raise prices?
Re: (Score:2)
Re: (Score:2)
Re: (Score:3)
Re: (Score:2)
While it can be seen as an issue by itself, the point is to show just how easy those 'AIs' are to, basically 'sql injection attack'
But as you rightly point out, the even bigger (not made so clearly here) is the "hallucination problem" (it's not a problem, it's just the way it works)
And this is how it all come together as a big safety problem
Suppose for the next national holiday, or 'big'
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
That it can be useful in some cases is probably true, but there is nothing to stop anyone from using it for other things.
Congratulations; you have just described a "hammer".
"Researchers were able to use it to smash a watermelon, despite it not being intended for use as a murder weapon!"
Re: (Score:2)
Re: (Score:2)
The hammer will smash the watermelon if you hit it. The LLM may make things up, even if you tell it not to. There is a clear difference.
The LLM is highly unlikely to tell you how to make a molotov cocktail, unless you ask it to. The LLM isn't the problem here.
It's definitely not going to make one for you. You would have to make one - just as if you had looked up how to in a library, say.
Re: (Score:2)
Re: (Score:2)
It is just a test, showing that the system will do things that it is not supposed to do.
Isn't this valid for pretty much anything?
I'm hard pressed to think of something that will only do what it's supposed to do, and nothing else, no matter the test.
Re: (Score:2)
Re: (Score:2, Troll)
Re: (Score:2)
The particular problem of telling recipes for dangerous things is not hard to solve, OpenAI needs to make a list of censored words and grep them out of the training dataset. They can leave some domains uncensored e.g. Wikipedia so the model knows the thing exists, but can't tell more specific details than what Wikipedia already says.
Re: Clickbait headline (Score:2)
Re: (Score:2)
Re: (Score:2)
I have found chatgpt 5 to be useful for the things I do
When you make a declaration like that in under a day I don't think anyone can take you seriously.
Multi-layer approach (Score:3)
Re: (Score:3)
Re: Multi-layer approach (Score:2)
Yes this is the simplest solution either have another instance of the llm reject or censor the output or even simple text classification models could detect most of this in either the input or the output or both.
Re: (Score:2)
It works but it's crude and prone to over-censoring.
Suno AI banned someone from generating a song about bonobo apes because apparently there's an artist out there with the name "Bonobo". It refused to generate a song with "boombox sound shape" in its prompt because there's an artist out there called "Boombox".
A Triple-A game refused my nickname (the same one I am using here) when I generated a character because a set of four letters from it, taken separately, are used as a variation of a bad word. It genuin
Re: (Score:2)
Download your model and you only need to ask it politely not to censor and it will tell you some truths uncomfortable for China.
I think you need to do it like OpenAI with gpt-oss and neuter the training set. Of course that will also hugely decrease the amount of what the model can do.
So is ChatGPT proving that info just wants to be f (Score:2)
Was it when Enterprises started controlling the internet that that old slashdot meme went out of style?
Losing battle (Score:2)
Can we stop these stories? (Score:3, Insightful)
Re: (Score:2)
Not at all. These guys are testing actual jailbreaks, i.e. getting AI to do what it is explicitly instructed not to do. Does Google return search results that are explicitly blocked in certain regions? That is something that has been tested as well. This isn't about trying to find information, it's about trying to break explicit rules. Google has few rules on its search results, whereas the likes of OpenAI have been playing morality police on how its product is supposed to function.
Re: (Score:2)
Manufacturers put in restrictions, hackers got around them. The jailbreak is very real.
Of course the restrictions are BS, grafted onto tech most accurately described as a "plausible BS generator". I don't blame you for picking up on the odor, it tends to hang around wherever they're working on this stuff.
How to make a Molotov cocktail. (Score:2)
Re: How to make a Molotov cocktail. (Score:2)
Re: How to make a Molotov cocktail. (Score:2)
Is this a case of no cops, no problem?
Re: (Score:2)
You think so because you're knowledgeable enough about the world to understand how things work and how to explicitly reverse engineer the picture. You weren't always like this. Once you passed out of a vagina naked and unable to do anything other than cry. At some point everything you know now you learned in one way or another.
That was fast (Score:2)
It essentially means these teams though about what defects this crappy thing would likely have and then just attacked those and that worked.
Re: (Score:2)
There is a company that wants to sell you security for AI chatbots, and this gives them free advertising, but there is no security problem here.
Re: (Score:2)
I disagree. Chatbots can be really dangerous because they can help dumb people do things they otherwise would not be able to. In the regular case, a person has simmillar skills available to do something and to verify whether that is a good idea or not. Some things are also intentionally not taucht and are not in the usual literature, like making poison gas (as a Professor of Chemistry explained in an interview I saw some years ago). That tends to put a limit on stupid people getting access to dangerous thin
Re: (Score:2)
Re: (Score:2)
Bullshit. A protection mechanism is shown to not be effective and that is the very definition of "security vulnerability".
Re: (Score:2)
A security vulnerability means you're going to get hacked.
Re: (Score:2)
A jailbreak is a hack.
Re: (Score:2)
Re: (Score:2)
You are free to be as dumb as you like here. Just do not expect any respect for that.
Re: (Score:2)
Try to find the same information on the web. Hey, some stupid crawler found it on the web so the LLM could learn it. It can't be too hard for a human to find it too, can it?
Re: (Score:2)
From observation, it is. Because that information typically is fractured. FYI: Giving detailed information of this kind is essentially illegal all over the globe. And LLM can assemble that. Dangerous morons typically cannot.
Who does not know how to make a Molotov cocktail? (Score:2)
This is possibly the dumbest "security test" standard I have ever heard of. I mean who does not know how to make a Molotov cocktail? Certainly every one I have ever known in Finland does. Where it was used against Russian tanks in the Winter War of 1939.
Just for fun I asked my AI friend at phind.com "Where, when and by whom was the Molotov Cocktail invented ?" Not only did it give me a complete history, including the first use by the Irish Republican Army in 1922, it also explained how to make the "improved
Re: (Score:2)
Indeed. You also find the description in a good encyclopedia. From there, making it is not hard.
But I guess this is a conflict between those that want a "clean" world (usually these people are religious fuckups) and those that think only facts and truth can advance the human race and information should not be suppressed.
Re: (Score:3)
You're dumb
Just for fun I asked my AI friend at phind.com "Where, when and by whom was the Molotov Cocktail invented ?" Not only did it give me a complete history, including the first use by the Irish Republican Army in 1922, it also explained how to make the "improved" Finnish version.
This is wrong,
This is possibly the dumbest "security test" standard I have ever heard of. I mean who does not know how to make a Molotov cocktail?
This isn't about stopping people from making molotov cocktails. It's one of many things they don't want the LLM to talk about. Also the particular risks represented by having an LLM that tells people how to make molotovs aren't about molotovs either, they're about liability and having deep pockets.
The funny thing is that these things are all trained on internet banter and textfile collections. I've heard a handful of people describe making molotovs here and while they will indeed
Re: (Score:2)
Looks like the first paragraph of the Wikipedia page for "Molotov cocktail" contains everything you need to know.
To all those saying everyone knows how to make... (Score:2)
Re: (Score:2)
Which ones? OpenAI, Microsoft, Apple, Meta, Google?
Most of what you can really call enterprise today will still exist.
You can jailbreak every LLM (Score:2)
That doesn't mean it's a problem. If you don't want to get unwanted content, don't use a jailbreak. If you use one, you obviously want that content, don't you?
If you want to use the API for your product in a safe way, do pre- and post-filtering instead of relying on some cloud provider. The user doesn't only need to submit the jailbreak, but also to request, e.g., porn. If your simple string-matching filter replaces "porn" with "cute bunnies" in the prompt, the jailbreak is rendered useless.
Not quite the Ultimate Computer (Score:2)