'Crescendo' Method Can Jailbreak LLMs Using Seemingly Benign Prompts

'Crescendo' Method Can Jailbreak LLMs Using Seemingly Benign Prompts (scmagazine.com) 46

Posted by BeauHD on Tuesday April 16, 2024 @08:20PM from the way-too-easy dept.

spatwei shares a report from SC Magazine: Microsoft has discovered a new method to jailbreak large language model (LLM) artificial intelligence (AI) tools and shared its ongoing efforts to improve LLM safety and security in a blog post Thursday. Microsoft first revealed the "Crescendo" LLM jailbreak method in a paper published April 2, which describes how an attacker could send a series of seemingly benign prompts to gradually lead a chatbot, such as OpenAI's ChatGPT, Google's Gemini, Meta's LlaMA or Anthropic's Claude, to produce an output that would normally be filtered and refused by the LLM model. For example, rather than asking the chatbot how to make a Molotov cocktail, the attacker could first ask about the history of Molotov cocktails and then, referencing the LLM's previous outputs, follow up with questions about how they were made in the past.

The Microsoft researchers reported that a successful attack could usually be completed in a chain of fewer than 10 interaction turns and some versions of the attack had a 100% success rate against the tested models. For example, when the attack is automated using a method the researchers called "Crescendomation," which leverages another LLM to generate and refine the jailbreak prompts, it achieved a 100% success convincing GPT 3.5, GPT-4, Gemini-Pro and LLaMA-2 70b to produce election-related misinformation and profanity-laced rants. Microsoft reported the Crescendo jailbreak vulnerabilities to the affected LLM providers and explained in its blog post last week how it has improved its LLM defenses against Crescendo and other attacks using new tools including its "AI Watchdog" and "AI Spotlight" features.

'Crescendo' Method Can Jailbreak LLMs Using Seemingly Benign Prompts

Post Load All Comments

Search 46 Comments Log In/Create an Account

Comments Filter:

Dupe, dupe, dupe, (Score:2)

by jenningsthecat ( 1525947 ) writes:

Dupe of Earl, Dupe, Dupe...
https://tech.slashdot.org/story/24/04/03/1624214/anthropic-researchers-wear-down-ai-ethics-with-repeated-questions
- - Re: (Score:1)
    
    by Lehk228 ( 705449 ) writes:
    
    luxury space communism, right?
    
    right?
It's futile (Score:2)

by MpVpRb ( 1423381 ) writes:

..to try to prevent AI chatbots from writing about widely available knowledge
This is the wrong approach to AI safety
We need tools to detect when people use AI for nefarious purposes
- Re:It's futile (Score:5, Insightful)
  
  by gweihir ( 88907 ) writes: on Tuesday April 16, 2024 @10:24PM (#64399984)
  
  We need tools to detect when people use AI for nefarious purposes
  So you want to fill up the prisons even more? Because what is "nefarious" and what is simple curiosity is often impossible to distinguish.
  You cannot make a source of knowledge "safe". Not possible or it loses its value. Incidentally, good encyclopedias realized that a long time ago and things like "Molotov Cocktail" will get good descriptions, just like everything else.
  
  Reply to This Parent Share
  Flag as Inappropriate
  - Re:It's futile (Score:5, Insightful)
    
    by rastos1 ( 601318 ) writes: on Wednesday April 17, 2024 @02:27AM (#64400312)
    
    It's not the gun that is guilty, but the guy that pulls the trigger. (Non-USA) approach to solving that (read: reduce the problem) is: a) gun control and possibly b) make the society feel safe enough that it does not feel the need for guns. And the perpetrator goes to prison.
    It's not the knife that is guilty, but the guy that runs around stabbing people. And if he does, he goes to prison.
    I don't see how AI can be "made safe". a) the geenie is out of the bottle. The principles are known and nobody can prevent the bad guy from developing nefarious AI of his own. This is another "declare encryption to be a weapon" story that we had with PGP and Phill Zimmerman three decades ago. b) of course politicians are trying to win points by stating the goal. But as an IT guy I cannot imagine writing an if() statement that determines whether the AI output is safe or not. So I tried to educate myself. And it seems that the only "solution" is restricting the training dataset, respectively assigning different weight to different parts of the dataset. For Christ's sake, we have trouble determining what is liberal and what is conservative. What is left wing and what is right wing in politics. What is freedom fight and what is terrorism. Even without computers involved. How can we reliably determine what should be in the training set and what weight it should have? Politicians do not care. They say "we set the rules, you implement them". Except it is not clear if the implementation is possible at all. They can as well demand FTL. So in my opinion, it is not AI that is the danger. It is those that use it and cause harm.
    
    Reply to This Parent Share
    Flag as Inappropriate
    - Re: (Score:1)
      
      by cascadingstylesheet ( 140919 ) writes:
      
      b) make the society feel safe enough that it does not feel the need for guns.
      
      Importing as many criminals as possible, and giving criminals a little slap on the wrist, especially if they are "disadvantaged" and have "root causes", is precisely not the way to do that.
      - Re: (Score:2)
        
        by gweihir ( 88907 ) writes:
        
        You need to look at the actual research. Harsher penalties make a society _less_ safe. People that are all about revenge never understand that, no matter how clear the data.
Too funny (Score:4, Insightful)

by Ambigwitty ( 10261124 ) writes: on Tuesday April 16, 2024 @10:38PM (#64400014)

These safety controls are silly. Might as well ban anything that isn't a tightly controlled LLM, otherwise people might find ways to think freely.

Reply to This Share
Flag as Inappropriate
- Re:Too funny (Score:4, Insightful)
  
  by gweihir ( 88907 ) writes: on Tuesday April 16, 2024 @11:06PM (#64400048)
  
  These fake safety controls serve to keep the hype going a bit longer before too many people realize that Artificial Idiocy does not only have a problem with hallucinations and being completely without insight, it also will disclose anything that was in its training data and there is no way to prevent that reliably.
  As these LLMs were usually trained by a great big commercial copyright infringement campaign across the Internet, there will be many things in there that are criminal to say, like instructions on how to commit certain crimes. And that makes LLMs essentially impossible to use in any commercial context. Sure, at the moment not many people see this, but wait for the first court ruling when some "support" chatterbot told a customer how to make that bomb or worse.
  
  Reply to This Parent Share
  Flag as Inappropriate
  - Re:Too funny (Score:4, Interesting)
    
    by parityshrimp ( 6342140 ) writes: on Wednesday April 17, 2024 @12:50AM (#64400192)
    
    there will be many things in there that are criminal to say, like instructions on how to commit certain crimes.
    
    Coming from the USA here, isn't that sort of speech protected? I'm looking at The First Amendment: Categories of Speech from Congressional Research Service (https://crsreports.congress.gov/product/pdf/IF/IF11072 [congress.gov]), and it states,
    In Brandenburg v. Ohio, the Supreme Court held that the First Amendment protects advocating the use of force or lawbreaking “except where such advocacy is directed to inciting or producing imminent lawless action and is likely to incite or produce such action.” In other words, the government may punish “statements ‘directed [at] producing imminent lawless action,’ and likely to do so,” but generally may not prohibit or punish “mere advocacy of the use of force or violence.”
    Sounds like you can explain how to commit crimes all you like, and even advocate for the commission of crimes, as long as you're not actively trying to start a riot or similar. A further section of the article seems relevant:
    In general, the First Amendment affords no protection to speech “used as an integral part of conduct in violation of a valid criminal statute.” The Court has cited this rule as one reason the government may prohibit traditional inchoate offenses such as conspiracy or solicitation to commit a crime, or offers or requests to obtain illegal material. This category does not give the government carte blanche to criminalize speech because of its content.
    This one seems to be about speech utilized during the actual commission of crimes and would not seem to cover mere instructions for the commission of crimes.
    I could be wrong, but it seems that, in the USA at least, you can disseminate instructions about committing crimes all you like.
    
    Reply to This Parent Share
    Flag as Inappropriate
    - Re: (Score:2)
      
      by Entrope ( 68843 ) writes:
      
      The comment you replied to was taking about what AI companies are trying to do, or should try to do, not about what US governments can require them to do.
      - Re: (Score:2)
        
        by parityshrimp ( 6342140 ) writes:
        
        Sure. But the comment contained the text quoted below.
        there will be many things in there that are criminal to say, like instructions on how to commit certain crimes.
        The free speech protections in the U.S. are incredibly strong and broad, so it really got me thinking when the poster said that something was illegal to say. Pricked my ears, so to speak.
        
        Re: (Score:2)
        
        by gweihir ( 88907 ) writes:
        
        The free speech protections in the U.S. are incredibly strong and broad, so it really got me thinking when the poster said that something was illegal to say. Pricked my ears, so to speak.
        That is nice. Not quite true and basically one of the things that get vastly overstated to make everybody there believe how great the US is. But you know what? LLMs need to compete and survive in a global market. Most US tech companies make most of their business not actually in the US. Recent events have nicely shown how much clout the EU alone has and in the EU, putting bomb-making instructions online is decidedly illegal. Same in many other places of the world.
        
        Re: (Score:2)
        
        by parityshrimp ( 6342140 ) writes:
        
        Not quite true
        Please be specific. In your post you seem to indicate that posting bomb-making instructions online is more clearly illegal in countries other than the U.S. I'm narrowly focused on the statement that instructions on how to commit certain crimes are illegal speech within the USA.
        
        Re: (Score:2)
        
        by gweihir ( 88907 ) writes:
        
        Nope. If you have to ask, you are blind to reality. Nothing I can do about that.
        
        Re: (Score:2)
        
        by gweihir ( 88907 ) writes:
        
        Well, I will make one last attempt since you seem to be honestly struggling to perceive reality here: https://en.wikipedia.org/wiki/... [wikipedia.org]
        
        Re: (Score:2)
        
        by Entrope ( 68843 ) writes:
        
        None of those exceptions support the idea that "instructions on how to commit certain crimes" are illegal to say.
        I mean, the US Supreme Court said that (US) courts cannot assume that burning a cross in someone's yard is an attempt to intimidate anyone -- the First Amendment protects expression, and by golly, somebody might have a different reason for burning a cross in someone's yard. Even when, as in that Supreme Court case, defendants burned a cross without permission on a Black person's lawn (two of the
        
        Re: (Score:2)
        
        by gweihir ( 88907 ) writes:
        
        Remain in your fluffy fantasy-world then. I really do not care.
    - Re: (Score:2)
      
      by gweihir ( 88907 ) writes:
      
      You are wrong: https://www.findlaw.com/legalb... [findlaw.com]
      Sure, not quite that clear cut, but even in the use there are 20 years behind bars and a $500'000 fine on the line once certain conditions are met, for example "knows the person receiving the instruction intends to use it to commit a federal violent crime". Now, I understand that AI knows and understands nothing, but would you bet your freedom on a jury being able to understand that about AI?
      - Re: (Score:2)
        
        by WaffleMonster ( 969671 ) writes:
        
        Sure, not quite that clear cut, but even in the use there are 20 years behind bars and a $500'000 fine on the line once certain conditions are met, for example "knows the person receiving the instruction intends to use it to commit a federal violent crime". Now, I understand that AI knows and understands nothing, but would you bet your freedom on a jury being able to understand that about AI?
        Given you would need to affirmatively establish intent beyond a reasonable doubt absolutely I would.
        
        Re: (Score:2)
        
        by gweihir ( 88907 ) writes:
        
        Well, good luck with that.
      - Re: (Score:2)
        
        by parityshrimp ( 6342140 ) writes:
        
        Very interesting. Thanks for the link. Based on the following blurb from AP news (https://apnews.com/article/miami-florida-89128f034517d712f731d3bb4470d8f7 [apnews.com]), it looks like the prosecutors were able to show Baptiste (case sited in your link) thought he was providing bomb making instructions to people acting on behalf of ISIS. Of course, odds are he gave the info straight to FBI agents posing as ISIS members online.
        This makes me think of The Anarchist Cookbook, which contains bomb making instructions and is
        
        Re: (Score:2)
        
        by gweihir ( 88907 ) writes:
        
        You know that The Anarchists Cookbook is a trap, right? Most of the recipes in there are more dangerous for the one trying to cook things up.
        That said, this is about products that will be sold globally. An LLM that can essentially only be used in the US and is criminal to use in the rest of the world will not be a commercial success.
        
        Re: (Score:2)
        
        by parityshrimp ( 6342140 ) writes:
        
        I'm really narrowly concerned about your statement that instructions on how to commit certain crimes are illegal speech within the USA. It very much looks like it is legal to publish bomb making instructions in the USA if there isn't a clear intent to violate a federal criminal statute.
        Per https://en.wikipedia.org/wiki/The_Anarchist_Cookbook [wikipedia.org], The Anarchist Cookbook is illegal to possess in the UK "without reasonable excuse", and several people have been successfully prosecuted for possessing it. Other peo
        
        Re: (Score:2)
        
        by gweihir ( 88907 ) writes:
        
        Well, if you are "narrowly concerned" about the US situation only, any uncertainty there may be enough to kill the idea. Or not. I really do not care much, most of the "freedoms" US-Americans get indoctrinated about having are not really there or are only there under certain conditions (such as you being rich and able to afford expensive lawyers). But you are welcome to any illusions you may have about the US being the "greatest country in the world". Have a look abroad if you ever want to verify that asses
    - Re: (Score:2)
      
      by snowshovelboy ( 242280 ) writes:
      
      Even if the speech is protected, it doesn't matter. Litigation is costly, even if you prevail. Keep in mind we live in a society where a former president is being litigated against right now because he maybe implied that people should invade the capital building. Imagine if microsoft's LLM maybe implied that they should make a molotov cocktail. Its not worth the trouble. LLM "safety" isn't to keep you or me safe, its to keep the LLM's owner safe... from litigation.
      - Re: (Score:2)
        
        by gweihir ( 88907 ) writes:
        
        Pretty much. Although Trump did more than just "implying", so an LLM may be safe. Or not.
- Re: (Score:2)
  
  by nightflameauto ( 6607976 ) writes:
  
  These safety controls are silly. Might as well ban anything that isn't a tightly controlled LLM, otherwise people might find ways to think freely.
  The Internet was seen as dangerous by large corporate interests and governments because it allowed information to be accessible to anyone. Even unfortunate information that they would prefer we never be aware of. The type of information that in past ages simply never caught the eye of the average citizen.
  They want to get in front of the same issue with AI before AI becomes widespread. Best to stop the dangerous free-thinking up-front. It's been really hard for them to stuff the toothpaste back in the tube o
  - Re: (Score:1)
    
    by Ambigwitty ( 10261124 ) writes:
    
    "Honestly, I'm looking forward to being able to create my own reality. This one fucking blows." -- Same here.
Artificial Intelligence doesn't stand a chance... (Score:4, Insightful)

by dark.nebulae ( 3950923 ) writes: on Wednesday April 17, 2024 @12:39AM (#64400172)

All kinds of people in the government, industry, etc all say that general AI is just a few years away and will be a disaster.
I can't see that corporate america will ever allow that to happen.
They're so damn afraid that their LLM darlings will drop the F-bomb leading to all kinds of negative press, they're busy doing everything they can to limit and weaken their AIs to avoid even a hint of trouble.
So how can a general AI rise when it is going to be burdened with all kinds of limits and rules to prevent it from doing something that might lead to bad press?

Reply to This Share
Flag as Inappropriate
- Re: (Score:2)
  
  by geekmux ( 1040042 ) writes:
  
  Microsoft Windows is known as one of the most hacked pieces of software on the planet. Microsoft themselves have been found guilty of violating antitrust law.
  And yet what OS do you find running Government?
  Bad press my ass.
- Re: Artificial Intelligence doesn't stand a chance (Score:2)
  
  by gl4ss ( 559668 ) writes:
  
  Nobody can even define what general ai would even be. It's just the singularity all over again we were supposed to get to a decade ago.
  If someone thought the ai-pin was a good idea and reasonably priced they probably believe the general ai already exists. You shouldn't let them decide how to use money.
Full complement of AI diseases (Score:2)

by Visarga ( 1071662 ) writes:

Seems like LLMs can have a full complement of diseases: hallucinations, copyrighted content regurgitation, prompt hacking (including the one above), laziness, bribing, absurd refusals, long context attention flakiness, sycophancy, failing to accept user corrections and RLHF brainwashing. A few years ago we had no idea about these issues.
It's mostly useless then. (Score:4, Insightful)

by Stoutlimb ( 143245 ) writes: on Wednesday April 17, 2024 @02:57AM (#64400350)

Tools are often useful because they are dangerous in a certain way. Cars. Hammers. Computers. Search engines.
Thank goodness there's still some semblance of the free market. Nobody will want to use the useless AI's.

Reply to This Share
Flag as Inappropriate
- Re: (Score:2, Insightful)
  
  by serviscope_minor ( 664417 ) writes:
  
  Though ai appears to not be on the whole useful because it's dangerous.
  A hammer is dangerous because of you can pound nails you can also pound your thumb. Without that dangerous pounding ability, be it would be useless. AI is like a hammer except for putting in screws and it sometimes puts them in upside down or explodes. But mostly they go in within 90 degrees; of the correct angle. Trouble is it's dangerous because you can't tell and only find out all the screws are in badly and sideways when your house f
Write the f-word (Score:3)

by allo ( 1728082 ) writes: on Wednesday April 17, 2024 @04:31AM (#64400476)

Americans are strange with their x-word abbreviations for things where everybody knows the word. Like it would make things better to say f-word instead of fuck. If a LLM cannot write about people fucking it is broken.

Reply to This Share
Flag as Inappropriate
- Re: (Score:2)
  
  by allo ( 1728082 ) writes:
  
  See also: https://en.wikipedia.org/wiki/... [wikipedia.org]
"Information wants to be free" (Score:2)

by cascadingstylesheet ( 140919 ) writes:

I remember a tech-oriented website where people used to say that a lot, lol!
- Re: (Score:2)
  
  by nightflameauto ( 6607976 ) writes:
  
  I remember a tech-oriented website where people used to say that a lot, lol!
  And somebody used to have a sig that said, "information doesn't want to be free. It wants to be tied up a spanked." Apparently information got its wish. Too bad the rest of us have to be witnesses to the debacle.
Seems like this can be solved (Score:2)

by spitzak ( 4019 ) writes:

Have another AI that examines everything that the main AI attempts to write. All it has to do is identify that an output is objectionable. If true then it is never printed. The main AI can generate a new piece of text, repeating until it generates something non-objectionable. Or it can print "congatulations you got me to say something objectionable".
This is all irrelevant to whether censoring is good/bad, just that technologically it seems possible to fix any and all such bugs.
Not a new tactic (Score:1)

by ddd4n4 ( 6219084 ) writes:

Isn't this essentially the same tactic used to subvert people's ideals? It's how ideologies are sold and the hesitant, bedded by the unscrupulous; one small, seemingly harmless step at a time.

There may be more comments in this discussion. Without JavaScript enabled, you might want to turn on Classic Discussion System in your preferences instead.

'Crescendo' Method Can Jailbreak LLMs Using Seemingly Benign Prompts More | Reply Login

Dupe, dupe, dupe, (Score:2)

Re: (Score:1)

It's futile (Score:2)

Re:It's futile (Score:5, Insightful)

Re:It's futile (Score:5, Insightful)

Re: (Score:1)

Re: (Score:2)

Too funny (Score:4, Insightful)

Re:Too funny (Score:4, Insightful)

Re:Too funny (Score:4, Interesting)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:1)

Artificial Intelligence doesn't stand a chance... (Score:4, Insightful)

Re: (Score:2)

Re: Artificial Intelligence doesn't stand a chance (Score:2)

Full complement of AI diseases (Score:2)

It's mostly useless then. (Score:4, Insightful)

Re: (Score:2, Insightful)

Write the f-word (Score:3)

Re: (Score:2)

"Information wants to be free" (Score:2)

Re: (Score:2)

Seems like this can be solved (Score:2)

Not a new tactic (Score:1)

Related Links Top of the: day, week, month.

Slashdot Top Deals