Comment Re:Stupid is as stupid does (Score 1) 189
A bit idealistic, but nevertheless correct.
I wouldn't call it odd or surprising. This just reflects one of the limitations of the Transformer: it consists of a smallish, fixed number of layers of transformations (~100; cf. steps of thought). If you need it to do something that requires more than ~100 steps of "thought", the old way was "think step by step" prompting, which these "thinking" models now do automatically.
Every word/token the LLM generates gets fed back in as an input, giving it another ~100 layers of computation for the next token, and so on.
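A minimal Python sketch of that feedback loop, assuming a toy stand-in for the model: `toy_forward`, `generate` and `NUM_LAYERS` are hypothetical names, and the "layers" are arbitrary arithmetic rather than real attention/MLP blocks. What it shows is that each call does a fixed amount of work, so the only way to get more total computation is to generate more tokens.

```python
from typing import List, Tuple

NUM_LAYERS = 4  # hypothetical stand-in for the ~100 layers of a real model


def toy_forward(tokens: List[int]) -> int:
    """Hypothetical fixed-depth 'model': always runs exactly NUM_LAYERS steps."""
    state = sum(tokens)
    for _ in range(NUM_LAYERS):
        # One "layer" of transformation; arbitrary arithmetic, not a real Transformer block.
        state = (state * 31 + 7) % 1000
    return state % 50  # pretend vocabulary of 50 token ids


def generate(prompt: List[int], n_new: int) -> Tuple[List[int], int]:
    """Autoregressive loop: each new token is appended and fed back in as input."""
    tokens = list(prompt)
    layer_steps = 0
    for _ in range(n_new):
        next_token = toy_forward(tokens)  # same fixed depth for every token
        layer_steps += NUM_LAYERS
        tokens.append(next_token)  # the output becomes part of the next input
    return tokens, layer_steps


if __name__ == "__main__":
    out, steps = generate([3, 1, 4], n_new=5)
    print(out, steps)
```

Generating 5 tokens here costs 5 * NUM_LAYERS layer steps; in this picture, a "thinking" model spends extra tokens precisely to buy more of those steps.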
Yes, but I'm sure Anthropic is well aware that China will never slow down, and even if they could convince Trump to ban Chinese models that would just put the US at a disadvantage.
Therefore, rather than this being a plea not to ruin a good thing by racing towards a plateau, I think the main goal here is Anthropic's now-typical MO of fear-mongering as marketing: trying to juice demand for their upcoming IPO by "warning" about how powerful AI is becoming, and will continue to become. Unlimited investor upside!
FWIW I think that smaller models are going to eat Anthropic's lunch, as well as OpenAI's. The market for increasingly capable, and much cheaper, smaller models will be much larger than that for these huge SOTA models, whose differentiated capabilities are going to become less and less relevant for most use cases (especially the high-volume ones of coding and business automation).
Which is why it gets ridiculous. You can build an oven with a few conveniently shaped rocks. You can build an oven with raw river clay, then use that oven to make better bricks, then use those bricks to make a better oven, which you can use to make even better bricks.
So, all it takes to turn somebody into a greedy capitalist exploiter is...the ability to stack rocks into a box with an opening.
The word "capitalism" wasn't coined until much later. That means two things: one, it doesn't uphold capitalism, and two, it doesn't disparage it. What is in the Constitution is fundamental rights. Capitalism is a consequence of individuals exercising those rights, up to the point where it infringes on the rights of others. Recognizing that is one of the things that made Theodore Roosevelt a great president. There is nothing un-American about wanting to rein in capitalism, but there is something decidedly un-American about wanting to destroy it wholesale, since, as mentioned previously, it arises from the exercise of natural rights. This is the much-hated nuance, particularly despised by the left, who seek to abolish capitalism, but also by some on the right who have an agenda to give free rein to robber barons and undo the works of T.R. and others.
It's not about physics - it's about architecture.
LLMs use the Transformer architecture, which was never designed as a cognitive architecture (a brain) and is far too simple to be one.
For those unaware of the history, the Transformer was just designed to be a "better LSTM" - a sequence-to-sequence mapping architecture meant for tasks like machine translation etc. The big innovation was to make it parallel rather than sequential to make it more efficient on today's highly parallel hardware.
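To make the sequential-vs-parallel point concrete, here is a toy Python comparison. It is a sketch under simplifying assumptions: one-dimensional (scalar) token features, a single tanh recurrence standing in for an LSTM, and unmasked dot-product self-attention with no learned query/key/value projections. The function names are illustrative, not from any library.

```python
import math
from typing import List


def rnn_encode(xs: List[float], w: float = 0.5, u: float = 0.9) -> List[float]:
    """Sequential recurrence (LSTM stand-in): h_t depends on h_{t-1}, so steps run in order."""
    h = 0.0
    out = []
    for x in xs:
        h = math.tanh(w * x + u * h)
        out.append(h)
    return out


def attend(i: int, xs: List[float]) -> float:
    """Self-attention output for position i; reads only the inputs, never other outputs."""
    q = xs[i]
    scores = [q * k for k in xs]  # scalar "dot products" for brevity
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]  # numerically stable softmax
    z = sum(weights)
    return sum(wt / z * v for wt, v in zip(weights, xs))


def attention_encode(xs: List[float]) -> List[float]:
    # Unmasked (encoder-style) attention; a decoder would also mask future positions.
    # Each position is independent, so this loop could run in any order or in parallel.
    return [attend(i, xs) for i in range(len(xs))]
```

In `rnn_encode`, step t cannot start until step t-1 has produced `h`; in `attention_encode`, every position reads only the inputs, so all positions can be computed at once on parallel hardware.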
Now, if you apply a Transformer to the task of language prediction it is, at scale, highly capable, but at the end of the day it is just a mashup-generator recombining language patterns it was trained on into "novel" outputs.
If you want more than this, more than an auto-regressive language predictor, then you need to design something more/different than a Transformer. The obvious thing to shoot for is a brain-like cognitive architecture, not just a more efficient LSTM.
Well, they are already finding out that they've used most of the USEFUL human training data, which is why they are increasingly using self-generated data.
But sure, they could feed every human generated intelligence artifact, past and future, into these predictors, and all that will do is expand what they can predict.
What they really want to do (what some are calling "AGI") is to make something as smart as a human that can generate novel intelligent behavior by itself. They want to build the machine (a brain), not just copy what the machine generates; otherwise the intelligence of what they generate is limited by the intelligence of what they are copying, and super-intelligence is just a pipe dream.
In the 1980s the big fear was "Japanese 5th generation systems", some fantasy version of the rule-based systems that were the "AI" of the day, taking over the world and leaving everyone else behind.
Today it's LLMs. It advises you to leave the car at home and drive to the car wash (despite the superhuman reasoning skills they want to convince you it has), but OMG it's going to take over the world.
Now, we will get to human-level intelligence one day, maybe within a few decades, but this isn't it. Humans being humans, they will probably still want to be social and boss each other about. The boss probably doesn't really want to manage a console full of AI agents any more than he wants you to work from home rather than come into the office where he can lord over you.
"AI" is being used today, the exact same way it has been used historically, to refer to a machine doing something new that previously only humans were capable of. Expert Systems were the old AI.
LLMs can also do something new - generate language mash-ups (generate the closure of the patterns present in the training set), and so have been labelled as "AI", despite the fact that their most impressive narrow vertical achievements in RL-friendly areas like math are perhaps more like Deep Blue or AlphaGo - not the result of general intelligence but more reflective of a particular algorithm (in this case RL) being used in a narrow domain.
Of course, just because a machine can do something new that was previously the exclusive domain of humans doesn't mean that it can do everything that a human can do, and some of the more glaring things missing from LLMs are the ability to self-train (whether in batch mode or incrementally/continually), to be creative and go beyond the closure of their training data, etc. At the end of the day an LLM is just what the name says: a language model, not a cognitive architecture. Even the most trivial comparison of the underlying Transformer architecture with a brain points out many things that are lacking.
Tomorrow the novelty of language models will have worn off, just as the novelty of expert systems did. LLMs will go back to being referred to as LLMs, not AI, and the label "AI" will be slapped on whatever the next generation of machine intelligence is, perhaps LLMs with continual learning.
> They are finding a plateau with where the LLMs can go
Yes, partly this, and partly because this "OMG it's so powerful" narrative fits their upcoming IPO agenda.
The whole "recursive" (iterative) self-improvement story is a joke, with the "recursive" wording designed to feed fantasies/fears of an upcoming singularity. In Anthropic's press release, the best they can do to support this notion of "self-improvement" is to point to the accelerating number of lines of code their developers are generating using Claude. But of course Claude is a data-driven LLM, not a GOFAI program where more lines of code could contribute to its capabilities. The voluminous lines of code they are referring to are things like Claude Code: half a million lines of LLM-generated slop when a few thousand lines would have done the job.
The other fantasy they want you (and investors) to buy into is that the intelligence ceiling for an LLM is infinite; otherwise any "recursive self-improvement" (which, as noted, is itself a lie) would just be a race towards an asymptote/plateau.
Of course, this dream of infinite intelligence (or at least a massively higher ceiling) is utterly dependent on a supporting definition of intelligence that is nowhere to be found, and certainly not from Anthropic.
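The asymptote argument can be illustrated with a toy model. This is an assumption of the sketch, not anything Anthropic has published: suppose each round of "self-improvement" closes a fixed fraction `rate` of the remaining gap to a finite ceiling C. Then the gap after n rounds is (C - x0) * (1 - rate)^n, so every round yields a smaller gain and the score approaches C without ever passing it. `iterate_improvement` is a hypothetical name.

```python
from typing import List


def iterate_improvement(start: float, ceiling: float, rate: float, rounds: int) -> List[float]:
    """Toy model (assumption): each round closes `rate` of the remaining gap to `ceiling`."""
    scores = [start]
    x = start
    for _ in range(rounds):
        x = x + rate * (ceiling - x)  # the gain shrinks as x nears the ceiling
        scores.append(x)
    return scores


if __name__ == "__main__":
    scores = iterate_improvement(start=0.0, ceiling=100.0, rate=0.5, rounds=10)
    gains = [b - a for a, b in zip(scores, scores[1:])]
    print(scores[-1], gains[:3])  # prints 99.90234375 [50.0, 25.0, 12.5]
```

Whether real systems behave like this is exactly what is in dispute; the sketch only shows what "a race towards an asymptote" means when the ceiling is finite.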
While Anthropic want to scare-monger and IPO-juice about AI self-improvement, what they are actually working on and delivering speaks for itself. It's been 10 years since "Attention Is All You Need", and yet Anthropic are still stuck on LLMs, still trying to bedazzle you with narrow achievements in math and hoping that will distract you from how utterly dumb their "AI" (language model) is at everyday tasks such as advising you on how to get to the car wash.
You buy some flour for a dollar. You use that flour to bake bread. You sell that bread for two dollars.
Why the price increase? Because you've improved the capital.