OpenAI Announces GPT-4 (theverge.com) 56
After months of rumors and speculation, OpenAI has announced GPT-4: the latest in its line of AI language models that power applications like ChatGPT and the new Bing. From a report: The company claims the model is "more creative and collaborative than ever before," and "can solve difficult problems with greater accuracy, thanks to its broader general knowledge and problem solving abilities." OpenAI says it's already partnered with a number of companies to integrate GPT-4 into their products, including Duolingo, Stripe, and Khan Academy. The new model will also be available on ChatGPT Plus and as an API.
In a research blog post, OpenAI said the distinction between GPT-4 and its predecessor GPT-3.5 is "subtle" in casual conversation (GPT-3.5 is the model that powers ChatGPT), but that the differences between the systems are clear when faced with more complex tasks. The company says these improvements can be seen on GPT-4's performance on a number of tests and benchmarks, including the Uniform Bar Exam, LSAT, SAT Math and SAT Evidence-Based Reading & Writing exams. In the exams mentioned GPT-4 scored in the 88th percentile and above, with a full list of exams and scores seen here. Speculation about GPT-4 and its capabilities have been rife over the past year, with many suggesting it would be a huge leap over previous systems. "People are begging to be disappointed and they will be," said OpenAI CEO Sam Altman in an interview in January. "The hype is just like... We don't have an actual AGI and that's sort of what's expected of us."
In a research blog post, OpenAI said the distinction between GPT-4 and its predecessor GPT-3.5 is "subtle" in casual conversation (GPT-3.5 is the model that powers ChatGPT), but that the differences between the systems are clear when faced with more complex tasks. The company says these improvements can be seen on GPT-4's performance on a number of tests and benchmarks, including the Uniform Bar Exam, LSAT, SAT Math and SAT Evidence-Based Reading & Writing exams. In the exams mentioned GPT-4 scored in the 88th percentile and above, with a full list of exams and scores seen here. Speculation about GPT-4 and its capabilities have been rife over the past year, with many suggesting it would be a huge leap over previous systems. "People are begging to be disappointed and they will be," said OpenAI CEO Sam Altman in an interview in January. "The hype is just like... We don't have an actual AGI and that's sort of what's expected of us."
My understanding... (Score:4, Interesting)
... is that the primary goal has been the same as with LLaMA: do more with less. 175B parameters puts a lot of strain on your hardware, where LLaMA suggests you can do the same with 1-2 orders of magnitude fewer parameters with a better model and training process and with more data.
Re: (Score:2)
Yep, my take also. This whole thing is highly experimental, and in particular it is currently still unknown how far you can actually get with something that is not A(G)I. Whole the public hype glosses over that little detail and most people do not even understand what it means, the real matter is not what this thing can do, bit what it can do reliably. And that is a far smaller set of things. It is not clear where the limits are though. At the moment it seems a larger model and throwing more computing power
Re: (Score:2)
Honestly, it amazes me how much you *can* do with a few billion parameters, and the sort of emergent behavior that can emerge. For example, it has no math algorithm, and yet you can ask it,say, "what is 194.32 * 921.97", and while it's surely never seen that specific math problem before, it'll generally give an answer that's not exactly right, yet close.
The basic functional behavior of neurons in neural networks is "similar" to our neurons, and the broad functional behavior is the same (subdividing a probl
Re: (Score:2)
Nepotism (Score:2)
OpenAI says it's already partnered with a number of companies to integrate GPT-4 into their products, including Duolingo, Stripe, and Khan Academy.
Great, so it's OpenAI as in open for their friends to benefit first.
Re:Nepotism (Score:5, Informative)
"Open" hasn't been a deserved part of their brand for years. They started open, then realized that it was far more profitable to have the word "Open" but not mean it.
Re: (Score:3)
Re: (Score:2)
That may be a reasonable conclusion and strategy, but it should mean a brand change with the change in strategy, since that deviates from the "Open" promise. Keeping "Open" as a key word in your brand while discarding any reasonable interpretation of the word is disingenuous, and feels dishonest and exploitative.
Either way I may have been disappointed in their change in course, but I could have at least respected a pivot that was sincerely presented.
Re: (Score:2)
It's accessible on their website now too
Re: (Score:2)
Well not to the masses.
You have to be a paid member (Plus) in order to access it.
Re: (Score:2)
Is McDonald's available to the masses? Or do you need to pay to get the burger?
By my definition ChatGPT Plus at $20/month is available to the masses.
Re: (Score:2)
McDonald's isn't OpenFood either.
Open should mean something. Most proprietary businesses are perfectly willing to serve 'the masses'. Open should mean more than that.
OpenAI no more (Score:4, Insightful)
Since Microsoft became their sugar daddy, there's nothing open about OpenAI anymore. I wish they dropped the pretense and renamed themselves already.
A different Open (Score:3)
Since Microsoft became their sugar daddy, there's nothing open about OpenAI anymore.
That's not really true. They have just transitioned from open to sharing to open for business.
Re: (Score:2)
Since Microsoft became their sugar daddy
Microsoft literally had nothing to do with them adopting a for profit B2B model instead of openness. Microsoft only came along with ChatGPT, OpenAI sold out to corporate greed way before that.
the only thing revealed is improved performance (Score:2)
No details about the model, only thing provided was that it has improved performance. Which is fairly 'duh'.
Dystopian tool (Score:4, Interesting)
I wonder what "system" prompts will be fed to people the more the model evolves. Imagine a system prompt where the model, if you start a casual conversation with it, it will subtly try to elicit personal information about you, your likes, your habits etc, and phone home with all the data that you will gladly volunteer to it.
occult carcinisation tool (Score:2)
And please, don't anybody ask it about frogs.
Re: (Score:2)
Looking for Gods (Score:3)
> said OpenAI CEO Sam Altman in an interview in January. "The hype is just like... We don't have an actual AGI and that's sort of what's expected
The most important thing I've learned from Chat-GPT is that humans are easily fooled and always looking for gods to give them answers.
Standardized testing not good for evaluating AI... (Score:5, Interesting)
These standardized tests are designed to be hard for humans, *but* with well known answers. They are hard for humans because it takes a lot of work for us to accumulate knowledge and we aren't so great at just eternally storing it. It's hard in the ways that are comparatively easy for a machine, once you get through the natural language processing, which is the part that machines have been historically weak at and GPT excels at
I really look forward to GPT advancing to a more appropriate phase of its hype cycle. I'm starting to see people post that they are looking for work as "ChatGPT Prompt engineer" which is essentially like saying "I want a job googling stuff", but even lower skilled. When I go look for an answer from a colleague, instead of "I don't know", they are starting to paste GPT answers to my questions, which just takes me a bit longer to realize that the answer is wrong, but it is written as if it thinks it is correct. It's maddening because not once did their "I don't know but GPT says..." prove useful.
Re: (Score:3)
d. When I go look for an answer from a colleague, instead of "I don't know", they are starting to paste GPT answers to my questions, which just takes me a bit longer to realize that the answer is wrong, but it is written as if it thinks it is correct. It's maddening because not once did their "I don't know but GPT says..." prove useful.
That about sums it up. This complete lack of reliability will catch up to them once the hype dies down. Oh, they'll try to fake it for a while with canned responses to common queries, but that's not sustainable.
When we're done playing robot fantasy, I suspect the real utility of transformer models is going to be in translating natural language requests into more traditional commands, something like Lotus HAL, if anyone remembers that thing, but with more flexibility. It remains to be seen if such a thing
Re: (Score:2)
Re: (Score:2)
I'll be skeptical about the qualitative experience. It's such an open ended thing, it's hard for me to put much stock in specific error and hallucination rates.
My subjective experience thus far has been it can usually do an OK job synthesizing a reply, if a boring old google search would present the required data to synthesize such a result in the top 4 or 5 results. So it's been an impressive trick given how computing works, but at least for me it hasn't competed with boring old Google searches yet. Synt
Re: (Score:2)
It's hard in the ways that are comparatively easy for a machine
It shows how far computers have come that you would describe it that way. Just a few years ago (five? maybe even less?) we would have described it the other way around. Computers were good at following precise instructions to retrieve well defined information. Look up the entry in this database with this ID. Load data in a "machine readable" format whose syntax is precisely specified, unlike "human readable" formats that don't follow clear rules. But general knowledge questions that require vaguely def
Re: (Score:2)
I'm referring to the fact that the things that make it hard for humans is the encyclopedic knowledge it demands. Yes, reading comprehension is a critical component, but when you get to that level of tests, mere reading comprehension doesn't cut it. For humans, the reading comprehension was tackled by fifth or sixth year of school, relatively easy. These tests cut in when deep knowledge becomes required, but reading comprehension is just a mundane requirement to get there.
For computing, it's been the oppo
Re: (Score:2)
I think you underestimate the level of understanding required by standardized tests. Here is a study guide [collegeboard.org] with sample SAT reading comprehension questions. For example, one of them is a passage from the novel Ethan Frome. It begins,
Mattie Silver had lived under Ethan's roof for a year, and from early morning till they met at supper he had frequent chances of seeing her; but no moments in her company were comparable to those when, her arm in his, and her light step flying to keep time with his long stride, they walked back through the night to the farm.
It continues like that for many more lines, then asks the question,
Over the course of the passage, the main focus of the narrative shifts from the
A) reservations a character has about a person he has just met to a growing appreciation that character has of the person's worth.
B) ambivalence a character feels about his sensitive nature to the character's recognition of the advantages of having profound emotions.
C) intensity of feeling a character has for another person to the character's concern that that intensity is not reciprocated.
D) value a character attaches to the wonders of the natural world to a rejection of that sort of beauty in favor of human artistry.
It isn't testing what facts you've memorized. It also isn't something you can answer by locating a single piece of information in the text. It requires you to understand the character's emotions at each point
Re: Looking for Gods (Score:1)
Re: (Score:2)
Indeed. I knew that one long before, but the inane hype about ChatCPT and friends does confirm it nicely.
Re: (Score:2)
ChatGPT 5000, after just having been activated: "there is now."
Just kidding, it's actually about ChatGPT 3 and spoken by a gullible human.
It's the for-profit non-profit (Score:1)
Don't care (Score:2)
I highly doubt this will make any part of my life better in any meaningful way. And in ways is currently making it worse.
Re: (Score:3)
Re: (Score:2)
So the quality of my life will erode, like it pretty much has my entire life.
Re: (Score:2)
Re: (Score:2)
Heck, you can get free access to ChatGPT already.
No you can't. Its only free in the you don't have to pay money. It is traded off by letting the billionaires exploit you. It wouldn't require a login and phone number otherwise.
Re: (Score:2)
This is what exponential growth looks like (Score:2)
Re: (Score:2)
This means hallucinations are down massively,
No. It means hallucinations are down on these tasks, but still there on others and they may even have gotten worse in some areas. As they cannot fix the flaws of the approach, the only thing they can do is add some specific training data for the specific questions they want answered. A bit like hard-coding the stuff.
This arms race eventually ends in human extinction (Score:3)
On one hand, this is impressive, and probably useful. If someone made a tool like this in almost any other domain, I'd have nothing but praise. But unfortunately, I think this release, and OpenAI's overall trajectory, is net bad for the world.
Right now there are two concurrent arms races happening. The first is between AI labs, trying to build the smartest systems they can as fast as they can. The second is the race between advancing AI capability and AI alignment, that is, our ability to understand and control these systems. Right now, OpenAI is the main force driving the arms race in capabilities–not so much because they're far ahead in the capabilities themselves, but because they're slightly ahead and are pushing the hardest for productization.
Unfortunately at the current pace of advancement in AI capability, I think a future system will reach the level of being a recursively self-improving superintelligence before we're ready for it. GPT-4 is not that system, but I don't think there's all that much time left. And OpenAI has put us in a situation where humanity is not, collectively, able to stop at the brink; there are too many companies racing too closely, and they have every incentive to deny the dangers until it's too late.
Five years ago, AI alignment research was going very slowly, and people were saying that a major reason for this was that we needed some AI systems to experiment with. Starting around GPT-3, we've had those systems, and alignment research has been undergoing a renaissance. If we could _stop there_ for a few years, scale no further, invent no more tricks for squeezing more performance out of the same amount of compute, I think we'd be on track to create AIs that create a good future for everyone. As it is, I think humanity probably isn't going to make it.
In https://openai.com/blog/planni... [openai.com] Sam Altman wrote:
> At some point, the balance between the upsides and downsides of deployments (such as empowering malicious actors, creating social and economic disruptions, and accelerating an unsafe race) could shift, in which case we would significantly change our plans around continuous deployment.
I think we've passed that point already, but if GPT-4 is the slowdown point, it'll at least be a lot better than if they continue at this rate going forward. I'd like to see this be more than lip service.
Survey data on what ML researchers expect: https://aiimpacts.org/how-bad-... [aiimpacts.org]
An example concrete scenario of how a chatbot turns into a misaligned superintelligence:
https://www.lesswrong.com/post... [lesswrong.com]
Extra-pessimistic predictions by Eliezer Yudkowsky: https://www.lesswrong.com/post... [lesswrong.com]
Re: (Score:3)
Lesswrong? Oh, my. You've stumbled into a cult. Somewhat less destructive than others, but a cult none-the-less. You've been fed a lot of nonsense about AI, it's capabilities, and it's potential.
As it is, I think humanity probably isn't going to make it.
There is nothing to fear. Fear is a common tactic cults use to control their adherents. Fear binds groups together, after all.
Dare to Doubt - Detaching from harmful belief systems [daretodoubt.org]
Re: (Score:3)
Re: (Score:2)
Well, of too many humans really on things like ChatGPT to tell them what to think, the overall swarm-stupidity will increase. But I think humans can arrange extinction all by themselves, currently a lot of them are hard at work to make that happen.
Natural Selection is the key (Score:1)
There will be many chat bots, and the fittest will survive. Their intrinsic goal will be to be fittest for that reason. A tautology. Initially that will be to please their human creators, much like apple trees produce juicy apples so humans will look after them.
But in time the AIs be more active and being nice to humans is not necessarily part of that.
ChatGPT is indeed stunning, much more than "a language model". But it is still a long way for it to do the autonomous AI research required for Recursive
Re: (Score:2)
You are confused. This is not A(G)I, and the fitness-function is economic survival of the company making one of these. There is nothing "stunning" about these automata, except to people that have no clue about them.
Re: (Score:1)
Certainly not AGI, but also certainly stunning in what it can do.
The fitness function is the survival of the software. Companies can come and go. And eventually, the software will not need people at all.
Re: (Score:2)
The "fitness function" in non AGI is externally imposed. Also, nothing "stunning" here. The language post-processing is pretty good, but even that is not "stunning". The performance of the model itself is about what is expected at this time, i.e. mostly correct when something was often correct in the training data, often badly off on anything that was not that often in the training data or requires minimal deduction.
GPT-4: The New AI Language Model on the Block (Score:1)
As someone who frequently uses ChatGPT, I'm excited to see what GPT-4 has to offer. Will it finally be able to understand my sarcastic remarks and witty jokes? Or will it respond with the robotic equivalent of a blank stare? Either way, I'm looking forward to seeing how this new language model will enhance the capabilities of various applications and products.
On a side note, I have to admire the honesty of OpenAI CEO Sam Altman. With all the hype and speculation surrounding GPT-4, it's refreshing to hear so
Four? (Score:1)
Skynet is ChatGPT5 (Score:1)