

Microsoft CTO Kevin Scott Thinks LLM 'Scaling Laws' Will Hold Despite Criticism
An anonymous reader quotes a report from Ars Technica: During an interview with Sequoia Capital's Training Data podcast published last Tuesday, Microsoft CTO Kevin Scott doubled down on his belief that so-called large language model (LLM) "scaling laws" will continue to drive AI progress, despite some skepticism in the field that progress has leveled out. Scott played a key role in forging a $13 billion technology-sharing deal between Microsoft and OpenAI. "Despite what other people think, we're not at diminishing marginal returns on scale-up," Scott said. "And I try to help people understand there is an exponential here, and the unfortunate thing is you only get to sample it every couple of years because it just takes a while to build supercomputers and then train models on top of them."
LLM scaling laws refer to patterns explored by OpenAI researchers in 2020 showing that the performance of language models tends to improve predictably as the models get larger (more parameters), are trained on more data, and have access to more computational power (compute). The laws suggest that simply scaling up model size and training data can lead to significant improvements in AI capabilities without necessarily requiring fundamental algorithmic breakthroughs. Since then, other researchers have challenged the idea of persisting scaling laws over time, but the concept is still a cornerstone of OpenAI's AI development philosophy. Scott's comments can be found around the 46-minute mark.
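As a rough illustration of the shape of these laws: the 2020 paper models loss as a power law in model size. The Python sketch below uses constants that only approximate the reported figures, so treat it as illustrative rather than authoritative.

# Rough sketch of the power-law form of the 2020 LLM scaling laws:
# loss falls predictably as parameter count N grows. The constants are
# approximations of the published values, used here only for illustration.

def predicted_loss(n_params, n_c=8.8e13, alpha_n=0.076):
    """Approximate cross-entropy loss as a function of parameter count."""
    return (n_c / n_params) ** alpha_n

for n in (1e8, 1e9, 1e10, 1e11, 1e12):
    print(f"{n:.0e} params -> predicted loss ~ {predicted_loss(n):.2f}")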
Inductive reasoning (Score:2)
It's a true point, but things scale up until they don't. He has no real metric or reasoning to determine when it will stop scaling up, just that it's worked until now.
diminishing returns tho (Score:2)
The thing is, you're going to get diminishing returns, and this is surely already happening.
We've gone from AI that can make a crude, lumpy nightmare portrait to AI that can make an almost indistinguishable photo of a celebrity. Cool!
I guess the next step is to make it a high-resolution photo, with details like bystander faces and trees that look perfect. But the thing is that taking a 1024-pixel image to 2048 makes it actually 4 times bigger. You're going to need 4x more GPUs, except probably a lot more because
Re: (Score:2)
Yeah, heard an analysis of how the "scaling up" immediately started to weaken with the release of ChatGPT 4. Now, it's more about applications and combining things like audio and visual/video with LLMs.
Still, such combined approaches could lead into new areas and capabilities we might not expect to "get that good, that fast" in a few years, and kick off new waves of job destruction as they grow.
Who truly knows the shape of this tsunami at this early stage? I'm still extremely worried.
Re: (Score:2)
LLMs are pretty much at peak size. The people behind them pirated all content they could lay hands on. There is no more content that could cause any real scaling. Hence, unless we find a few more Internets, scaling for LLMs has already ended.
Re: (Score:2)
The people behind them pirated all content they could lay hands on.
I did wonder about that. Are they really going to find more data?
Re: (Score:2)
I honestly do not think so. Maybe they can get a factor of 2 or so, but that will have almost no effect. And the more AI-generated data is in there, the worse the quality gets due to model collapse. Because nobody can reliably identify AI-generated data, they may even get less usable data pretty soon, or will have to rely on old, outdated data. Hence we may well be seeing the peak of what LLMs can do right now.
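For a feel of why training on AI-generated output degrades things, here is a minimal toy sketch in Python, not a simulation of any real pipeline: each "generation" is trained only by resampling the previous generation's output, so diversity can only stay flat or shrink.

import random

# Toy illustration of model collapse: each generation "trains" only on
# samples of the previous generation's output (plain resampling with
# replacement from a finite pool). Diversity never increases, so the pool
# degenerates over time. A sketch of the general effect, not a model of
# any real LLM training pipeline.

random.seed(0)
data = list(range(100))  # generation 0: 100 distinct "ideas"

for gen in range(1, 31):
    data = [random.choice(data) for _ in range(len(data))]  # train on own output
    if gen % 5 == 0:
        print(f"generation {gen:2d}: {len(set(data)):3d} distinct values remain")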
Don't think ... (Score:2)
... just throw more hardware at it.
Truly the simple man's approach to AI.
Re: (Score:2)
Since what is actually needed is more data, it is even worse than that. This person does not understand how the technology works. Well, it is MS; no surprise they have a CTO who does not understand technology.
AI has now become the new dot com bubble. (Score:3)
It's something that people don't really understand and project magical understanding into. AI becomes a panacea for everything.
AI is useful. But it's a black box that hasn't been quantified yet. The more data present, the better the results, but also the greater the potential for colossal mistakes.
The tech is about guessing, given certain data, the results you want. It is in no way real thinking. We have similar mechanisms in our brain... for instance, how we learn to ride a bike, walk, and otherwise move.
But those systems are trained by our consciousness, which is composed of logic, feelings, and innate instinctual motivations. It is able, to some extent, to self-reflect and analyze in order to retrain parts of the brain.
For instance, when meeting a person with an unfamiliar accent, we listen and guess what they are trying to say... we consciously analyze what we think they said, ask questions, and review everything to retrain our brains to understand the pronunciation and broken grammar.
The models will never be perfect and will always make mistakes. The trick is detecting the mistakes. And yet still some will get through... Are we ready for machines that make mistakes?
P.S. I remember talking to people about the perceptron from the 1970s... no one knew about it. I was always asking why people weren't using the tech all during the '90s...
Yes, computers were slow. But it was obviously useful, and we could have been quantifying it and learning best practices all that time. Now we have too much computing potential and not enough understanding.
Re: (Score:2)
One of the most important differences between how the human brain works and how an LLM works is that the human brain can tell when it's made an error. You get better at riding a bike by falling off, thinking about why you fell off, and then trying again with this new understanding. An LLM doesn't even realise it's fallen off and so can't learn from it. Maybe if a human is in the loop and can tell the LLM it fell off, then it could work out why and improve but that's not very efficient.
This is to an extent w
Re: (Score:2)
We don't need to know exactly how the brain works to know that this is a difference between humans and LLMs. Humans can learn from their mistakes while LLMs can't. This was my point. I don't think you are denying that humans can learn from their mistakes, and we don't need to know precisely how they do so to know that they do.
A lot of my post was using metaphor to describe things we don't completely understand. But it was sufficient to serve my purpose, which was to show one of the reasons why LLMs are a de
Re: (Score:2)
You are making two fundamental mistakes here: (1) you assume we know how the human brain works, and (2) you assume the physicalist world-view is accurate. Neither assumption is part of the current scientific state of the art, and hence conclusions drawn from them are not scientifically valid.
Re: (Score:2)
It's hard to predict how much better they'll get - or systems that use LLMs as a component.
One simple possibility could be along the lines of a system that monitors its main output, has another LLM critique it, and does RL training based on that, with some occasional trusted human feedback training both networks (I assume this already exists).
There are a good deal of smart people working in th
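A rough sketch of the loop described above might look like the following; every class and method here is a hypothetical placeholder rather than a real library API, and the actual RL update is left abstract.

# Hypothetical sketch of the generator/critic feedback loop described above.
# The Model class and its methods are placeholders, not a real API; the
# actual RL update (e.g. a PPO-style step) is left abstract.

class Model:
    def generate(self, prompt: str) -> str:
        raise NotImplementedError  # stand-in for an actual LLM call

    def score(self, prompt: str, output: str) -> float:
        raise NotImplementedError  # critic returns a scalar reward

    def reinforce(self, prompt: str, output: str, reward: float) -> None:
        raise NotImplementedError  # stand-in for an RL weight update


def training_step(generator, critic, prompt, human_reward=None):
    """One iteration: generate, let the critic judge, update on the reward.
    Occasional trusted human feedback overrides the critic's score and is
    also used to keep the critic itself calibrated."""
    output = generator.generate(prompt)
    reward = critic.score(prompt, output)
    if human_reward is not None:
        critic.reinforce(prompt, output, human_reward)  # recalibrate the critic
        reward = human_reward
    generator.reinforce(prompt, output, reward)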
Re: (Score:2)
That is a very good description.
It's something that people don't really understand and project magical understanding into. AI becomes a panacea for everything.
Humans have that tendency. Actual engineers get that beaten out of them, because it never works and it gets people killed or worse.
The tech is about guessing, given certain data, the results you want. It is in no way real thinking. We have similar mechanisms in our brain... for instance, how we learn to ride a bike, walk, and otherwise move. But those systems are trained by our consciousness, which is composed of logic, feelings, and innate instinctual motivations. It is able, to some extent, to self-reflect and analyze in order to retrain parts of the brain.
Exactly. For simple things, like riding a bike, everybody adds plausibility checking. Rather easy, because when you fall on your face, it is obviously not working. LLMs cannot do that step. Plausibility checking is hard and not accessible to machines at this time except for very simple things. In fact, for even somewhat more complex situations, mo
Duplicates not slowing down, just warming up (Score:1)
He Has to Believe (Score:2)
You can't disabuse someone of an idea that their salary depends on them believing. If AI doesn't continue scaling, or even simply scales less, then Microsoft just flushed billions of dollars down the drain.