AI

DeepMind Tests the Limits of Large AI Language Systems With 280-Billion-Parameter Model (theverge.com)

An anonymous reader quotes a report from The Verge: Language generation is the hottest thing in AI right now, with a class of systems known as "large language models" (or LLMs) being used for everything from improving Google's search engine to creating text-based fantasy games. But these programs also have serious problems, including regurgitating sexist and racist language and failing tests of logical reasoning. One big question is: can these weaknesses be improved by simply adding more data and computing power, or are we reaching the limits of this technological paradigm? This is one of the topics that Alphabet's AI lab DeepMind is tackling in a trio of research papers published today. The company's conclusion is that scaling up these systems further should deliver plenty of improvements. "One key finding of the paper is that the progress and capabilities of large language models is still increasing. This is not an area that has plateaued," DeepMind research scientist Jack Rae told reporters in a briefing call.

DeepMind, which regularly feeds its work into Google products, has probed the capabilities of these LLMs by building a language model with 280 billion parameters named Gopher. Parameters are a quick measure of a language model's size and complexity, meaning that Gopher is larger than OpenAI's GPT-3 (175 billion parameters) but not as big as some more experimental systems, like Microsoft and Nvidia's Megatron model (530 billion parameters). It's generally true in the AI world that bigger is better, with larger models usually offering higher performance. DeepMind's research confirms this trend and suggests that scaling up LLMs does offer improved performance on the most common benchmarks testing things like sentiment analysis and summarization. However, researchers also cautioned that some issues inherent to language models will need more than just data and compute to fix.
"I think right now it really looks like the model can fail in variety of ways," said Rae. "Some subset of those ways are because the model just doesn't have sufficiently good comprehension of what it's reading, and I feel like, for those class of problems, we are just going to see improved performance with more data and scale."

But, he added, there are "other categories of problems, like the model perpetuating stereotypical biases or the model being coaxed into giving mistruths, that [...] no one at DeepMind thinks scale will be the solution [to]." In these cases, language models will need "additional training routines" like feedback from human users, he noted.

  • by reanjr ( 588767 ) on Wednesday December 08, 2021 @09:19PM (#62061311) Homepage

    I know there's a school of thought that AI doesn't need to be constrained by biological intelligence mechanisms. This seems excessive though. I don't think language is actually that complicated. There are only 100 billion neurons in the brain, and only some are dedicated to language. I know it's not that simple, but I feel like we missed the boat somewhere on the technology.

    • by timeOday ( 582209 ) on Wednesday December 08, 2021 @09:27PM (#62061329)
      For AI, what's best is really just an empirical question, so they are pushing the limits to find out. They are intentionally shooting for overkill as soon as they can achieve it, to find out where it is.

      But does it matter if this has more neurons than a human brain? For the most part no.

      First, this thing certainly "knows" far, far more than any human - along some dimensions, which might be hard to characterize and/or useless, depending on need.

      Second, the correspondence between an artificial neuron and biological one shouldn't be taken too literally. For AI, a larger number of simpler neurons are better if they are faster or more compact than a smaller number of more powerful neurons. It would be bizarre if the best tradeoffs in one medium (silicon) happened to work out the same as those in another (wetware).

    • by Drago3711 ( 1415041 ) on Wednesday December 08, 2021 @09:44PM (#62061351)
      The parameters in this context are much closer to neural connections than they are to neurons. In the human brain each neuron has ~7k connections. So while we only have 100 billion neurons, we have something like 100-500 trillion connections. In that context, these models seem quite capable by comparison. Source: https://en.wikipedia.org/wiki/... [wikipedia.org] Back-of-the-envelope math puts this model somewhere around 0.056%-0.28% of the connections that our brains have. Obviously this is a bit of an apples-to-oranges comparison, but it's the best we've got. There is some recent work trying to quantify how many artificial neurons it takes to simulate a real one, and it's somewhere around 1k: https://singularityhub.com/202... [singularityhub.com]
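
      A quick sanity check of that back-of-the-envelope figure (a toy sketch, assuming only the numbers quoted above: 280 billion parameters versus roughly 100-500 trillion synaptic connections):

        # Rough check of the parent comment's arithmetic, using the figures quoted above.
        gopher_params = 280e9
        brain_connections_low, brain_connections_high = 100e12, 500e12

        print(f"{gopher_params / brain_connections_high:.3%}")  # 0.056%
        print(f"{gopher_params / brain_connections_low:.3%}")   # 0.280%
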
    • Re: (Score:2, Informative)

      by phantomfive ( 622387 )

      When a person speaks, typically we have a concept in mind, and we search for words to express that idea.

      When a neural network speaks (writes), it looks at the previous several words and probabilistically guesses what the next word should be.

      There is a clear difference between the two approaches, as should be obvious. Humans sometimes use the second approach: for example, if I say "twinkle twinkle little ____", everyone will automatically guess what the next word will be.
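
      A tiny, self-contained sketch of that "guess the next word from the previous words" idea. This toy is just a bigram frequency table (closer to a Markov chain or n-gram model than to a real neural LM, but the predict-the-next-word loop is the same shape); the corpus is made up for illustration:

        from collections import Counter, defaultdict

        # Count which word follows which in a tiny corpus, then "generate"
        # by picking the most frequent follower of the current word.
        corpus = "twinkle twinkle little star how I wonder what you are".split()

        next_word = defaultdict(Counter)
        for prev, cur in zip(corpus, corpus[1:]):
            next_word[prev][cur] += 1

        print(next_word["little"].most_common(1))  # [('star', 1)]
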

      • Re: (Score:2, Insightful)

        by Anonymous Coward

        When a neural network speaks (writes), it looks at the previous several words and probabilistically guesses what the next word should be.

        Dude.
        That's a Markov chain, not a neural net.

        • So what are you saying? That the neural network has an idea, and it's trying to express it in words?

          • You did describe a Markov chain.

            A neural network doesn't have to be context-driven or work anything at all like that. You are imposing implementation specifics because you read about one such implementation in a book like Artificial Life or whatever.

            What would you say if the implementation produced an audio file?

            Would you still be imagining a Markov chain?

            Neural networks are used as approximators of a multivariate function. All multivariate functions can be approximated by them, not just specific kinds.
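
            A minimal sketch of that function-approximation view (a toy only, with arbitrary sizes, seed, and hyperparameters - not any particular production system): a one-hidden-layer network fit to y = sin(x) by plain gradient descent.

              import numpy as np

              # One hidden tanh layer plus a linear output, trained with
              # full-batch gradient descent to approximate y = sin(x).
              rng = np.random.default_rng(0)
              x = np.linspace(-3, 3, 200).reshape(-1, 1)
              y = np.sin(x)

              hidden = 32
              W1 = rng.normal(0.0, 1.0, (1, hidden))
              b1 = np.zeros(hidden)
              W2 = rng.normal(0.0, 0.1, (hidden, 1))
              b2 = np.zeros(1)
              lr = 0.1

              for _ in range(10000):
                  h = np.tanh(x @ W1 + b1)            # hidden layer
                  pred = h @ W2 + b2                  # linear output
                  err = pred - y
                  # mean-squared-error gradients, backpropagated through both layers
                  dW2 = h.T @ err / len(x)
                  db2 = err.mean(axis=0)
                  dh = (err @ W2.T) * (1 - h ** 2)    # tanh derivative
                  dW1 = x.T @ dh / len(x)
                  db1 = dh.mean(axis=0)
                  W2 -= lr * dW2
                  b2 -= lr * db2
                  W1 -= lr * dW1
                  b1 -= lr * db1

              # training error should end up well below the ~0.5 of always predicting zero
              print(float(np.mean((pred - y) ** 2)))
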
            • What would you say if the implementation produced an audio file?

              How does that relate to anything at all? Are you on crack? I cannot conceive of what is wrong with your mind to produce the post you produced.

    • by AmiMoJo ( 196126 )

      The issue is not the complexity, it's the knowledge that backs up what is said.

      Human beings have extensive knowledge of the world so they can fill in the unsaid parts and reject logically correct but practically unlikely meanings. It also enables them to appreciate humour and sarcasm.

      In the 80s, efforts were made to teach computers about the world. It didn't go very well: the computer had trouble with a lot of basic concepts, and they couldn't really figure out how to give it "common sense".

      More recently deve

  • by maiden_taiwan ( 516943 ) on Wednesday December 08, 2021 @11:07PM (#62061555)

    >...a language model with 280 billion parameters named Gopher.

    How do they tell all those parameters apart if they're all named "Gopher"?

    • How do they tell all those parameters apart if they're all named "Gopher"?

      gopher
      Gopher
      g0pher
      goph3r
      G0ph3r
      gopher2
      gopher_new
      gopher_newer
      gopher_new2a
      gopher1

      you know just like any other program.

  • ... or to paraphrase Smith: "Still using all the muscles because you haven't found the one that matters?"
  • But these programs also have serious problems, including regurgitating sexist and racist language and failing tests of logical reasoning.

    That sounds like an accurate replica of many humans to me. I'm convinced!

  • While the current neural net technology is certainly a quantum leap in machine ability, it seems to me they're still little more than very impressive statistical analysers and just don't work in the same way as a biological brain. Until we really understand how biological brains work at the data and processing level (or someone has an A-Ha! moment of genius) then I suspect from now on it'll be the law of diminishing returns with ANNs just as it was with traditional AI back in the 80s and 90s.

  • Re: "But these programs also have serious problems, including regurgitating sexist and racist language and failing tests of logical reasoning." -- They've trained an algorithm to string together words in ways that get a positive response from humans. That's all there is to it. The algorithm has no idea what it means or why it's producing those strings. It's essentially a giant bullshitter-bot and journalists are simply suffering from the ELIZA effect. I think AI is a great way to show us who we really are,
  • and machine learning killed it.

    Not real science.

  • with that many parameters, overfitting is nearly guaranteed unless they have trillions of training samples. So, it will work great on the training data, then fail miserably when it encounters something new.
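
    A small sketch of that general worry, as a toy illustration only (nothing here is about how Gopher was actually trained): fit a degree-9 polynomial to 10 noisy samples, so the parameter count matches the sample count, and compare training error with error on fresh points. numpy may even warn that the fit is poorly conditioned, which is rather the point.

      import numpy as np

      # Overfitting in miniature: a model with as many parameters as data
      # points nails the training set but wobbles badly between the points.
      rng = np.random.default_rng(1)
      true_fn = np.sin

      x_train = np.linspace(0, 3, 10)
      y_train = true_fn(x_train) + rng.normal(0, 0.1, x_train.shape)

      coeffs = np.polyfit(x_train, y_train, deg=9)   # one coefficient per data point

      x_new = np.linspace(0, 3, 100)                 # unseen points in the same range
      train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
      new_err = np.mean((np.polyval(coeffs, x_new) - true_fn(x_new)) ** 2)

      print(train_err)  # essentially zero: the fit passes through every training point
      print(new_err)    # typically much larger: the fit chased the noise
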

    • by gweihir ( 88907 )

      with that many parameters, overfitting is nearly guaranteed unless they have trillions of training samples. So, it will work great on the training data, then fail miserably when it encounters something new.

      To be fair, statistical models only rarely deliver good results on "something new", and if they do, they do so by accident. What they excel at is dealing with cases that are somehow a mix of what was found in the input data, i.e. nothing new, but one more variant of the same thing. To expect anything else is foolish and only indicates that the "researchers" doing so are incompetent.

      Sure, this is valuable. But, for example, logical reasoning is not accessible to statistical models. Statistical models can be

  • by gweihir ( 88907 )

    The problem with these things being unable to do logical reasoning is not the size of the model used. Logical reasoning is not a language feature. Logical reasoning comes from insight and understanding, and no statistical model can do that. In fact, we see logical reasoning only in beings equipped with a consciousness, and that is a rather large hint.

    The only thing these devices will ever be able to do is mimic conventions. (That, incidentally, is where their "racism" and "sexism" comes from: They mimic what they got
