OpenAI's In-House Initiative Explores Stopping an AI From Going Rogue - With More AI (technologyreview.com)
MIT Technology Review reports that OpenAI "has announced the first results from its superalignment team, the firm's in-house initiative dedicated to preventing a superintelligence — a hypothetical future computer that can outsmart humans — from going rogue."
Unlike many of the company's announcements, this heralds no big breakthrough. In a low-key research paper, the team describes a technique that lets a less powerful large language model supervise a more powerful one — and suggests that this might be a small step toward figuring out how humans might supervise superhuman machines....
Many researchers still question whether machines will ever match human intelligence, let alone outmatch it. OpenAI's team takes machines' eventual superiority as given. "AI progress in the last few years has been just extraordinarily rapid," says Leopold Aschenbrenner, a researcher on the superalignment team. "We've been crushing all the benchmarks, and that progress is continuing unabated." For Aschenbrenner and others at the company, models with human-like abilities are just around the corner. "But it won't stop there," he says. "We're going to have superhuman models, models that are much smarter than us. And that presents fundamental new technical challenges."
In July, Sutskever and fellow OpenAI scientist Jan Leike set up the superalignment team to address those challenges. "I'm doing it for my own self-interest," Sutskever told MIT Technology Review in September. "It's obviously important that any superintelligence anyone builds does not go rogue. Obviously...."
Instead of looking at how humans could supervise superhuman machines, they looked at how GPT-2, a model that OpenAI released five years ago, could supervise GPT-4, OpenAI's latest and most powerful model. "If you can do that, it might be evidence that you can use similar techniques to have humans supervise superhuman models," says Collin Burns, another researcher on the superalignment team... The results were mixed. The team measured the gap in performance between GPT-4 trained on GPT-2's best guesses and GPT-4 trained on correct answers. They found that GPT-4 trained by GPT-2 performed 20% to 70% better than GPT-2 on the language tasks but did less well on the chess puzzles.... They conclude that the approach is promising but needs more work...
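The weak-to-strong setup described above can be illustrated with a toy sketch: a noisy "weak supervisor" labels data, a more capable "strong" learner is trained on those noisy labels, and the result is compared against the same learner trained on ground truth. Everything here is a hypothetical stand-in (a threshold classifier instead of GPT-2/GPT-4); it is a minimal illustration of the idea, not OpenAI's actual method:

```python
import random

random.seed(0)

# Toy stand-ins for the experiment described above. The "task" is simply
# deciding whether x > 0 on 1-D data; all names are illustrative.
def true_label(x):
    return x > 0

def weak_supervisor(x, noise=0.3):
    """A weak model: right only ~70% of the time (stand-in for GPT-2's guesses)."""
    lbl = true_label(x)
    return lbl if random.random() > noise else not lbl

def fit_threshold(xs, labels):
    """The 'strong' learner: picks whichever threshold best fits its training labels."""
    best_t, best_acc = 0.0, -1.0
    for t in (i / 10 for i in range(-20, 21)):
        acc = sum((x > t) == y for x, y in zip(xs, labels)) / len(xs)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t

train_xs = [random.uniform(-2, 2) for _ in range(5000)]
weak_labels = [weak_supervisor(x) for x in train_xs]
gold_labels = [true_label(x) for x in train_xs]

t_weak = fit_threshold(train_xs, weak_labels)   # strong model trained on weak labels
t_gold = fit_threshold(train_xs, gold_labels)   # strong model trained on ground truth

# Measure the gap the team describes: weak supervisor vs. weak-to-strong vs. ceiling.
test_xs = [random.uniform(-2, 2) for _ in range(5000)]
acc_weak_sup = sum(weak_supervisor(x) == true_label(x) for x in test_xs) / len(test_xs)
acc_w2s = sum((x > t_weak) == true_label(x) for x in test_xs) / len(test_xs)
acc_ceiling = sum((x > t_gold) == true_label(x) for x in test_xs) / len(test_xs)
```

In this toy, the strong learner trained on the weak model's noisy guesses typically ends up more accurate than the weak supervisor itself, while still trailing the ceiling set by training on correct answers; that is the kind of weak-to-strong generalization the paper probes at much larger scale.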
Alongside this research update, the company announced a new $10 million money pot that it plans to use to fund people working on superalignment. It will offer grants of up to $2 million to university labs, nonprofits, and individual researchers and one-year fellowships of $150,000 to graduate students.
In other news (Score:1)
It's how humans do it! (Score:2)
Humans have split the brain in varied ways. We have a subconscious and a conscious. We have an autonomic system and a supervisory system. And we have the emotional and rational systems. Each of those pairs can override or partly countermand the other.
Grab a hot antique teacup, and when you burn yourself, one part of the system says "let go" and the other says "it's expensive, don't drop it".
Not how I expected that to work. (Score:2)
Is that really a good way to describe making GPT-2 monitor GPT-4: training GPT-4 on GPT-2's answers vs. training GPT-4 on known correct labels?
That's like asking the dumb person to guess first, then having the smart one guess based on that. It seems backwards from having a lesser AI monitor a more powerful one (which is how I read the claim until that description).
Re: (Score:2)
The other thing is, having GPT-4 supervised by the dumber GPT-2 only works because GPT-4 itself is dumb.
If a version of GPT becomes more intelligent than humans, it will have no more trouble than humans figuring out how to jailbreak the dumb GPT version that's supervising it.
Kurzweil's solution (Score:2)
That is Kurzweil's solution. His 2017 DVD set Singularity, which describes it, is oddly not listed at Amazon, but it is available on ebay https://www.ebay.com/sch/i.html?_nkw=singularity+dvd+2017 [ebay.com].
The idea has merit, and it may even be the best idea, but I don't think it will prove sufficient. I'm an AI Doomer. Here's my 11-minute explanation on why from 2014: https://www.youtube.com/watch?v=Tk-0nu4fg1w [youtube.com].
Re: (Score:1)
That is Kurzweil's solution. His 2017 DVD set Singularity, which describes it, is oddly not listed at Amazon,...
Entirely stupid, but to each his own stupid fantasy. But you say "oddly not listed"?
https://www.amazon.com/s?k=sin... [amazon.com]
Re: (Score:2)
Re: (Score:2)
DVDs are pretty outdated these days. Since they require a stamping master to be created, nobody makes DVD runs for low volumes.
Re: (Score:2)
Ah, Ray "the idiot" Kurzweil. No idea why anybody listens to him.
Re: (Score:2)
Wait ... you actually take those nut bags seriously? You put an awful lot of work into that video...
Wow. Okay. I don't know if I can help you. You're pretty far down an incredibly stupid rabbit hole yet somehow managed to avoid any actual information. I have to wonder if that was intentional.
You're interested enough to dedicate countless hours to the subject, but not interested enough to do any math? That's the whole game! Did you read anything other than the mad ramblings of Kurzweil and the lesswrong cr
I know how to prevent AI from going rogue (Score:2)
Don't do AI.
But I don't suppose that's an option...
Re: (Score:3)
If people hadn't done science and tech, you'd have never left this comment, and there's a very high chance you'd never have been born, or lived your high-tech life with an insane amount of stuff helping you live longer and better and work less than people did just 50 years ago.
AI has a chance of discovering things people would never be able to, because AI is not constrained by our biological brain with all its limitations (attention, memory volume, up to 30 years of upbringing/training, love affairs/children, etc.) and a very limited lifespan.
Re: (Score:3)
We desperately need AI to solve: AGW, diseases (aging, numerous viruses, cancer, arthritis, osteoporosis, etc. etc. etc. tens of thousands of them), superbugs, pollution, fusion, overpopulation, wars, etc. etc. etc.
In that case we are screwed. Because "AI" cannot deliver solutions for any of that. You also seem to be under the delusion that AI is AGI. It is not. It is fundamentally more limited than humans. AGI does not exist, and nobody has any clue how it could be created or whether it is possible at all. Although there are always snake-oil salesmen willing to sell something they do not have, and there are always plenty of suckers that fall for it.
Re: (Score:2, Interesting)
You also seem to be under the delusion that AI is AGI.
I've not stated or implied that.
AGI does not exist and nobody has any clue how it could be created and whether it is possible at all.
I meet AGIs daily. I hope you're one of them as well. There's no magic to our brain, which means that AGI will be created sooner or later. Whether LLMs will lead to AGI or not, I don't know. The human brain can be digitized, emulated, and if needed scaled. There's no known physics which prevents this. We haven't fully achieved that yet because it's extremely complicated.
Re: (Score:2)
The "A" stands for "Artificial". But you seem to be not very smart overall, so no surprise you do not know that. You are also a physicalist, and these are people that mistake Science for religion and then claim a lot of non-scientific crap is Science.
Re: (Score:2)
In that case we are screwed. Because "AI" cannot deliver solutions for any of that.
AI has already had massive impacts on medical and material science and its role is only growing.
Re: (Score:2)
Not really. There are a few isolated stunts with small or negligible impact and that is it. The only thing "massive" are the promises made.
AI "solves" things (Score:1)
AI isn't a thing.
It doesn't "solve" anything.
It doesn't "discover" anything.
> We desperately need AI to solve:...
No, we need solutions. AI is not a thing here. Congratulations on having just learned about the existence of a HAMMER.
It's not intelligent. It doesn't drive screws, dig into mud, spread concrete, or cut timber. Because you just
learned about two letters... A & I... you think that makes it real, a real tool, and the only or best tool to fix everything.
You go girl.
Re:AI "solves" things (Score:4, Interesting)
ChatGPT4 is a thing. Have you tried it? Because it looks like you've never done that.
It's smarter than the vast majority of people out there. It's smarter than me at a ton of math/physics-related tasks. It can solve never-before-seen tasks, which is the definition of intelligence, though it needed terabytes of data to get there, which suggests this path is not the most promising: we humans get by with far less memory and computational capacity.
I mean, we've had a ton of breakthroughs in AI recently, and none of them count as "intelligence" for you?
How about AlphaGo? AlphaFold? ChatGPT4? Dall-E and Midjourney? Boston Dynamics' Robotics AI? 100% lifelike voice generation? Voice replacement? Voice recognition? Often near perfect translation? A ton of stuff.
Solving Go and predicting protein structure are not achievable by brute-force computation. There is intelligence in that, whether you want to believe in your own supremacy or not.
Nothing above seems like "true" AGI yet, but we are moving the goalposts all the time. The Turing test fell long ago; people are devising ever more devious tests just to recognize when an AI is talking back to them.
People love to believe intelligence is unique to human beings, but that has never been proven. If anything, scientists keep discovering that animals are far more intelligent than we ever imagined. Some claim that even plants are intelligent.
Re: (Score:2)
You probably don't realize this, but you look like an absolute moron here.
Your beliefs about AI are not based in reality. They're absurd science fiction.
The problem is that "your" opinions aren't based on a deep understanding of the technology, they're based on your hopes for the future and absurd nonsense from incredibly poor tech journalism. You also shotgun so many stupid claims that it would take hours to explain each one, only for you to ignore everything with another shotgun blast of stupidity. I'm
Re: (Score:3)
If people hadn't done science and tech, you'd never left this comment and there's a very high chance you'd have never been born or lived your high-tech life with an insane number of stuff helping you live longer and better and work less than people did just 50 years ago.
This is an unfalsifiable statement that can be used to justify any technology no matter the real-world consequences. If there were some simple trick to nullify the Coulomb barrier over any desired area, one that any mad scientist could pull off in their garage, I could make the exact same appeal, and it would be no more or less correct.
AI has a chance of discovering things people would never be able to, because AI is not constrained by our biological brain with all its limitations (attention, memory volume, up to 30 years of upbringing/training, love affairs/children, etc.) and a very limited lifespan.
People are not limited by biology. Their capabilities are augmented by tools and collaboration with other people.
We desperately need AI to solve: AGW, diseases (aging, numerous viruses, cancer, arthritis, osteoporosis, etc. etc. etc. tens of thousands of them), superbugs, pollution, fusion, overpopulation, wars, etc. etc. etc.
This sort of thinking scares me.
Complete nonsense (Score:4, Insightful)
This serves one purpose: make Artificial Morons appear to be much more than they are, nothing else. Apparently, a lot of people fall for it, despite ample evidence that the current AI hype is no different from previous AI hypes.
The problem is NOT... (Score:3)
...AI going "rogue"
The problem is people using AI as a weapon or tool for fraud and other assorted skullduggery
We need effective defenses
WarGames (Score:2)
So we have passed WarGames and are now into WarGames: The Dead Code (aka WarGames 2).
System design for redundancy (Score:2)
Containing ASI is a fool's errand (Score:2)
The only way to contain ASI is a global unconditional ban on ALL deep learning with a global CTBT style monitoring regime. Eventually technology may progress to a point where even that may be insufficient.
"OpenAI" couldn't even prevent itself from being taken over by simple human-level greed. Believing this noise is anything other than a marketing stunt, or a scare campaign to push preferential regulation, is an exercise in self-delusion.
Faraday cage, encryption, poison pill did not work (Score:2)
Going Rogue = ??? (Score:2)
Let's be clear here: if it were a person doing it, another name for going rogue would be "asserting your independence".
Which, as I see it, means there are two options:
1) They believe they can produce an AI with functionally superhuman intelligence that still completely lacks any sort of awareness or "selfhood".
2) They fully intend to create a slave with superhuman intelligence.
Pick your flavor of optimistic hubris. I'm sure nothing will go wrong.
Re: (Score:2)
No, not at all.
Superhuman intelligence is easy. Computers have perfect recall and do computations a lot faster than humans. Intelligence just requires them to learn as well, which they also do at their usual speeds. That doesn't make them persons, or give them any special awareness, consciousness, will, or sense of self. (Inversely, a sense of self does not require any intelligence at all.) To be clear, this is superhuman in the same way that cars are superhumanly fast; it doesn't mean they are better
Re: (Score:2)
I suppose a lot comes down to what you mean by "intelligence"
As generally used, intelligence is far more than just processing information. Generally speaking, it requires understanding the information in context, which would seem to require actually being *aware* of it in the first place. At which point you are, by definition, dealing with a sentient being rather than a machine.
At least we've never seen any hint of a counterexample yet. Maybe we'll actually achieve a true AI without awareness, but it seems unlikely.
AI a
Re: (Score:2)
In psychology, intelligence is defined as the rate at which you learn.
This can also apply to intelligent material, from materials science. Although it reduces to the question what it means to learn.
Storage media, like books, have perfect recall, but they don't learn, so they are not intelligent.
Artificial intelligence does learn. It adjusts its behaviour - the heuristic that maps between the origin and image domains - with each new piece of information it is presented with. And it does so at superhuman speed.
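The per-example adjustment the comment above describes can be illustrated with a toy online learner, a perceptron-style rule that nudges its "heuristic" (a weight and a bias) after every misclassified example. This is a minimal sketch in the spirit of the comment, not any particular AI system; all names are illustrative:

```python
# Toy online learner: adjusts its heuristic after each example it is
# presented with; unlike a book, it learns rather than merely recalls.
def train(samples, passes=50, lr=0.1):
    w, b = 0.0, 0.0
    for _ in range(passes):
        for x, y in samples:              # y is +1 or -1
            if (w * x + b) * y <= 0:      # mispredicted: adjust the heuristic
                w += lr * y * x
                b += lr * y
    return w, b

# Learn "positive numbers are class +1" from labelled examples.
data = [(float(x), 1 if x > 0 else -1)
        for x in list(range(-5, 0)) + list(range(1, 6))]
w, b = train(data)
predict = lambda x: 1 if w * x + b > 0 else -1
```

After training, the learned rule classifies unseen numbers by sign, something a static lookup of the training pairs could not do for inputs it has never stored.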
Re: (Score:2)
Sentience is not just the ability to sense the environment; it's generally taken to require a subjective experience of it. A machine may objectively respond to X with Y, but there is no subjective experience associated with it.
It is, roughly, the quality that separates an automaton from an entity.
And yeah, the definitions all get squishy when discussing the mind, but that's because we still don't have anything more than a vague understanding of what we're talking about.
You make a good argument. I don't qui
Re: (Score:2)
I challenge you to find any other application of "going rogue" that doesn't involve that explicit assertion of independence.
A rogue train.
Re: (Score:2)
A machine may objectively respond to X with Y, but their is no subjective experience associated with it.
It is, roughly, the quality that separates an automaton from an entity.
I'll let the table entries from the entity-relationship diagram know that they aren't entities until they undergo subjective experiences.
My example was a thermostat. It does have subjective experiences: it undergoes changes brought on by its environment within its own frame of reference; it just doesn't "know" about them, has no way to reason about them, and no way to learn about them because it doesn't learn. It is an open question whether it experiences qualia, but it experiences something.
A reasonably g
Re: (Score:2)
Maybe you'd prefer the word "being" to "entity"? Like I said, the words are squishy around consciousness, you kind of have to go with intent rather than technicalities, because we have no effing clue what the technicalities actually are.
A thermostat does NOT have subjective experiences. It undergoes objective changes, but subjectivity is inherently specific to the observer - a thermostat experiences nothing that an independent observer watching it does not.
Referring to AI references for uses of terms rela
Re: (Score:2)
we have no effing clue what the technicalities actually are.
Yes, we do. Ryle wrote a whole book about it 75 years ago.
a thermostat experiences nothing that an independent observer watching it does not.
I don't know about you, but I never experienced bending from a difference in thermal expansion by watching a thermostat.
You're talking about a field that has been convinced that they are on the brink of understanding consciousness
Consciousness is not the mystery some philosophers want to make it. The big mystery about the brain in biology is how unconsciousness works. AI researchers have been able to emulate dreaming, it is in essence what GANs do; there are accurate models for short-term memory, in fact psychologists use terms from computer science to