
OpenAI Releases First Open-Weight Models Since GPT-2 (openai.com)
OpenAI has released two open-weight language models, marking the startup's first such release since GPT-2 in 2019. The models, gpt-oss-120b and gpt-oss-20b, can run locally on consumer devices and be fine-tuned for specific purposes. Both models use chain-of-thought reasoning approaches first deployed in OpenAI's o1 model and can browse the web, execute code, and function as AI agents.
The smaller 20-billion-parameter model runs on consumer devices with 16 GB of memory, while the larger gpt-oss-120b requires about 80 GB. OpenAI said the 120-billion-parameter model performs similarly to the company's proprietary o3 and o4-mini models. The models are available for free on Hugging Face under the Apache 2.0 license, following safety testing that delayed the release first announced in March.
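For readers who want to try the Hugging Face route mentioned above, here is a minimal sketch using the transformers pipeline API. It assumes the repository id is openai/gpt-oss-20b, a recent transformers release that knows the gpt-oss architecture, and enough memory to hold the 20B weights; treat it as an illustration rather than an official quickstart.

# Minimal sketch: load the 20B open-weight model via Hugging Face transformers.
# Assumes the repo id "openai/gpt-oss-20b", a recent transformers + accelerate
# install, and enough GPU/CPU memory for the weights; not an official example.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="openai/gpt-oss-20b",
    torch_dtype="auto",   # pick a suitable dtype automatically
    device_map="auto",    # spread weights across available GPU/CPU memory
)

out = generator(
    "Explain the difference between open-weight and open-source models.",
    max_new_tokens=128,
)
print(out[0]["generated_text"])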
How Do I Run It? (Score:2)
How do I run it, some Ollama setup?
16GB RAM or GPU RAM?
What's the performance like on consumer hardware? Having to wait 5 seconds for every response gets old after the third prompt.
Re: (Score:2)
Performance depends on your setup but if you can fit the model entirely into VRAM then it should run pretty well.
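For anyone wondering what "some Ollama setup" looks like in practice, here is a minimal sketch using the ollama Python client. It assumes the client is installed (pip install ollama), an Ollama server is running locally, and that the 20B model is published under the tag gpt-oss:20b; the prompt and names are illustrative.

# Minimal sketch, assuming the ollama Python client, a running Ollama server,
# and the model tag "gpt-oss:20b".
import ollama

# Fetch the weights if they are not already cached locally (a multi-GB pull).
ollama.pull("gpt-oss:20b")

# Single chat turn; Ollama offloads as many layers as fit into VRAM and runs
# the rest on the CPU, which is why speed varies so much between machines.
response = ollama.chat(
    model="gpt-oss:20b",
    messages=[{"role": "user", "content": "In one paragraph, what is a mixture-of-experts model?"}],
)
print(response["message"]["content"])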
Re: (Score:2)
Why don't you ask Chatgpt?
Re: (Score:2)
Why don't you ask Chatgpt?
More seriously, I did at first. It told me that it couldn't be run locally. I chose not to believe it.
So either an nVIDIA A100 or a maxed M2 Mac Pro (Score:3, Interesting)
"Gpt-oss-120B model will require about 80 GB of memory."
So either an nVIDIA A100 or a maxed M2 Mac Pro.
The current nVIDIA A100 costs roughly $10,000-$15,000 USD, has 80GB of VRAM, and needs a server built around it.
The decked-out M2 Mac Pro costs a smidge less than $10,000 USD*, but it has so much RAM (192GB) that you can run inference on two different models of similar size (one on the GPU and one on the NPU), with leftover memory for the OS.
For training, nVIDIA is the right choice, but for inference? Apple hands down.
* And you get a Keyboard and a mouse included ;-)
Re:So either an nVIDIA A100 or a maxed M2 Mac Pro (Score:4, Informative)
Actually a mouse is an extra $150 with the apple rig.
Re: (Score:2)
So to Apple fans reality is trolling? See for yourself, a mouse is a $150 upcharge: https://www.apple.com/shop/buy... [apple.com]
Re: (Score:2)
Your inability to read aside, you can use a non-apple mouse. I use a G502 with no issues.
You guys need some coffee (Score:2)
Yikes... The magic mouse is included with "the apple rig" in question
If you want the magic trackpad instead, that's $50.
If you want both, that's $150.
Re: (Score:2)
I appreciate a good nitpick but does the distinction between $99, $50, $150 really matter when the combo can't possibly cost Apple $10? For $150 you can probably find a chromebook that throws in a mouse and touchpad.
Re:So either an nVIDIA A100 or a maxed M2 Mac Pro (Score:4, Informative)
Or you can buy a Framework Desktop with 128GB of unified memory for $2,895.
Re: (Score:2)
Oooh, I like that.
Re: (Score:2)
If you want to run an 80GB model, a 48GB laptop will not run it, a 128GB APU will, and the rest of the comparison is irrelevant.
Re: (Score:2)
Things will get really interesting if the upcoming M5 Pros have a 96GB RAM option. My M4 Pro is fast enough for these midrange MoE models already and only lacks RAM to run them. The 120b OpenAI model is only ~63GB and the experts are ~5GB, which ought to give somewhere around 25-30 tokens/second inference speeds. If the M5 gets a RAM
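The ~25-30 tokens/second figure above can be sanity-checked with a back-of-envelope, bandwidth-bound estimate; the bandwidth and efficiency numbers below are assumptions, not measurements.

# Rough sketch of the estimate above: at decode time a MoE model is roughly
# memory-bandwidth bound, so tokens/s is capped by how fast the active weights
# can be streamed per token. Both constants below are assumed values.
peak_bandwidth_gb_s = 273.0    # assumed M4 Pro unified-memory bandwidth
active_gb_per_token = 5.0      # ~5 GB of active expert weights (from the post above)
effective_fraction = 0.5       # assumed share of peak bandwidth actually achieved

tokens_per_second = peak_bandwidth_gb_s * effective_fraction / active_gb_per_token
print(f"rough estimate: {tokens_per_second:.0f} tokens/s")   # ~27 tokens/s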
Re: (Score:2)
either an nVIDIA A100 or a maxed M2 Mac Pro
Or basically any Linux box built within the last 5 years. Save yourself a pile of dough and power.
Re: (Score:2)
What CPU? How well does the CPU + 128gb ram run the models we're talking about?
Re: (Score:2)
What CPU? How well does the CPU + 128gb ram run the models we're talking about?
I can answer that: Like dog shit.
Re: (Score:3)
For ollama run gpt-oss:$SIZE --verbose --think true --hidethinking `cat prompt.txt` on an M2 Max MacBook Pro (96 GB RAM) versus a Threadripper 3960X (128 GB RAM, GeForce RTX 2080 Super):
120b model, M2 Max: 67.43 token/s prompt eval (221 tokens), 21.29 token/s output eval (8694 tokens)
20b model, M2 Max: 164.65 token/s prompt eval, 35.50 token/s output eval (4180 tokens)
120b model, 3960X: 18.08 token/s prompt eval, 6.00 token/s output eval (8916 tokens)
20b model, 3960X: 30.30 token/s prompt eval, 10.67 token/s
Re: (Score:2)
Thanks for sharing, that's fascinating. I've only started using Ollama within the last month, and I'm really, really regretting that I didn't load up my MBP with more RAM. When I bought it, 36 GB seemed like more than I would probably need...
What models are you using for coding?
Re: (Score:2)
I'm not using any models for real coding, thanks to job responsibilities that keep me too busy doing other things.
But as a consolation prize, I will note that the output "eval rate" usually drops as the number of generated tokens rises, which makes intuitive sense. For the 120b size and 3960X, between 5.5 and 6.75 token/s; for 120b and M2 Max, between 20.16 and 23.83 token/s; for 20b and 3960X, between 9.42 and 10.74 token/s; for 20b and M2 Max, between 32.86 and 36.44 token/sec. Additionally, the 120b mo
Re: (Score:2)
We tried running a 70B DeepSeek R1 on an Epyc server at work with dual CPUs totaling around 128 cores and 512 GB of RAM.
It ran like dog shit, pondering its own navel for about 10 minutes on how to respond to "hello".
Older model CPU, but still packs a punch on pure specs alone.
Re: (Score:2)
Are people just dropping extremely high-VRAM GPUs into Linux boxes willy-nilly now?
Because that seems like a weird move when you consider that, outside the Apple silicon chips, there just aren't a lot of options for capable GPUs under the $20K price tag. I mean, maybe a 4090 if you can live with the very small models?
Re: (Score:2)
Why would you buy a M2 Ultra Mac Pro with 192GB Ram, 76Core GPU ($9599) rather than an M3 Ultra Mac Studio with 256GB of Ram and 80Core GPU ($7099)?
The Pro Mac seems pointless and over priced compared to the recently updated Studio.
And "What’s in the Box" [apple.com] at the bottom of the Pro page lists Mouse (or keyboard) and keyboard.
So I'm wondering if I'm even looking at the same rig as @Shaitan or @williamyf
Re: (Score:2)
Why would you buy a M2 Ultra Mac Pro with 192GB Ram, 76Core GPU ($9599) rather than an M3 Ultra Mac Studio with 256GB of Ram and 80Core GPU ($7099)?
The Pro Mac seems pointless and over priced compared to the recently updated Studio.
And "What’s in the Box" [apple.com] at the bottom of the Pro page lists Mouse (or keyboard) and keyboard.
So I'm wondering if I'm even looking at the same rig as @Shaitan or @williamyf
No, we were not looking at the same machine. I stopped following Apple closely after the switch to ARM, but I should have known that they would neglect the Pro Tower; they have done it before...
Yes, the decked-out Studio IS the best value, but that gives even more weight to my point. For training, nVIDIA; for inference, Apple.
Re: (Score:2)
I've finally* got it running on my M4 Mac Mini (Pro C14/G20/N16, 64 GB RAM) and it's significantly better than R1 using the same questions.
Much more detailed answers and very nicely formatted output. That's for 20b and comparing it to deepseek-r1:70b.
Speed was ok, not blazing.
response tokens/sec was ~30, prompt tokens/sec ~175
I would guess it runs well too on an entry-level M4 Mac Mini (C10/G10/N16) with its now-minimum 16 GB of RAM.
* (Ollama got to 95% and then took 2 hours to get the last 5% downloaded. Everyo
Re: (Score:2)
Also got it to tell me its training cutoff: June 2024
Re: (Score:2)
What speeds did you see with your setup? response/prompt tokens/sec?
I found that the 70b R1 wasn't much different from the 32b or 14b in answers, just slower in response.
It would be interesting to see comparisons between 120b and 20b gpts.
Ran it. It's neutered by "policy" that's harmful (Score:2, Interesting)
So I just loaded them on my local dev machine. It looks like a very capable model, but it's been absolutely neutered by some kind of policy architecture: it not only does a pre-pass on your input to decide whether it's even allowed to be inferred on, it then actively modifies what you asked and feeds the model a different version of your input, and even then it may still censor its own output.
I expected this kind of censorship from Ch
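For readers unfamiliar with the pattern being described, here is a generic sketch of a pre-pass / prompt-rewrite / output-filter pipeline. It is purely illustrative of that general shape and does not reflect how OpenAI's policy layer is actually implemented; classify, rewrite, and generate are hypothetical callables.

# Generic illustration of the pre-pass / rewrite / output-filter pattern the
# parent comment describes; NOT OpenAI's implementation, just the general shape.
from typing import Callable

def guarded_generate(
    prompt: str,
    generate: Callable[[str], str],    # the underlying model (hypothetical)
    classify: Callable[[str], bool],   # returns True if the text is disallowed
    rewrite: Callable[[str], str],     # policy-driven rewrite of the prompt
) -> str:
    if classify(prompt):               # pre-pass on the raw user input
        return "Request refused by policy."
    answer = generate(rewrite(prompt)) # the model sees a modified prompt
    if classify(answer):               # post-pass on the model's own output
        return "Response withheld by policy."
    return answer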
Re: (Score:1)
Yeah, their model card pretty proudly boasts of how thoroughly they have censored the output: https://openai.com/index/gpt-o... [openai.com]