OpenAI Releases First Open-Weight Models Since GPT-2 (openai.com)

OpenAI has released two open-weight language models, marking the startup's first such release since GPT-2 in 2019. The models, gpt-oss-120b and gpt-oss-20b, can run locally on consumer devices and be fine-tuned for specific purposes. Both models use chain-of-thought reasoning approaches first deployed in OpenAI's o1 model and can browse the web, execute code, and function as AI agents.

The smaller 20-billion-parameter model runs on consumer devices with 16 GB of memory. The gpt-oss-120b model will require about 80 GB of memory. OpenAI said the 120-billion-parameter model performs similarly to the company's proprietary o3 and o4-mini models. The models are available free on Hugging Face under the Apache 2.0 license, following safety testing that delayed the release first announced in March.
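
For readers who want to kick the tires right away, a minimal sketch of pulling and running the models locally through Ollama; the gpt-oss:20b and gpt-oss:120b tag names are an assumption based on the ollama command quoted in the comments below.

    ollama pull gpt-oss:20b                       # fits in roughly 16 GB of memory
    ollama run gpt-oss:20b "Summarize the Apache 2.0 license in two sentences."

    ollama pull gpt-oss:120b                      # needs roughly 80 GB of VRAM or unified memory
    ollama run gpt-oss:120b --verbose "hello"     # --verbose prints prompt/output token-per-second stats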

OpenAI Releases First Open-Weight Models Since GPT-2

  • How do I run it, some Ollama setup?

    16GB RAM or GPU RAM?

    What's the performance like on consumer hardware? Having to wait 5 seconds for every response gets old after the third prompt.

    • by EvilSS ( 557649 )
      VRAM (GPU) or unified (Apple or the new AMD "Strix Halo" PCs) is what you want. You can load models into system RAM but it's going to be painful. For running them Ollama is the standard. If you want something with a GUI (Ollama is getting one soon) LM Studio is probably the easiest.

      Performance depends on your setup but if you can fit the model entirely into VRAM then it should run pretty well.
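
      To sanity-check whether a model actually fit in video memory (assuming an NVIDIA card and a recent Ollama build; a rough sketch, not a recipe):

          nvidia-smi --query-gpu=name,memory.total --format=csv   # how much VRAM you actually have
          ollama run gpt-oss:20b "hello"                          # loads the model; it stays resident for a few minutes
          ollama ps                                               # the PROCESSOR column shows the CPU/GPU split

      If ollama ps reports anything other than 100% GPU, part of the model spilled into system RAM and generation will crawl.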
    • Why don't you ask Chatgpt?

      • Why don't you ask Chatgpt?

        More seriously, I did at first. It told me that it couldn't be run locally. I chose not to believe it.

  • by williamyf ( 227051 ) on Tuesday August 05, 2025 @03:58PM (#65568390)

    "Gpt-oss-120B model will require about 80 GB of memory."

    So either an nVIDIA A100 or a maxed M2 Mac Pro.

    The current nVIDIA A100 costs $10,000 to $15,000 USD, has 80GB of VRAM, and needs a server around it.
    The decked-out M2 Mac Pro costs a smidge less than $10,000 USD*, but it has so much RAM (192GB) that you can run inference on 2 different models of similar size (one on the GPU and one on the NPU), with leftover memory for the OS.

    For training, nVIDIA is the right choice, but for inference? Apple hands down.

    * And you get a Keyboard and a mouse included ;-)

    • by Shaitan ( 22585 ) on Tuesday August 05, 2025 @04:22PM (#65568428)

      Actually a mouse is an extra $150 with the apple rig.

      • by Shaitan ( 22585 )

        So to Apple fans reality is trolling? See for yourself, a mouse is a $150 upcharge: https://www.apple.com/shop/buy... [apple.com]

        • by EvilSS ( 557649 )
          First of all, it's $99: https://www.apple.com/shop/pro... [apple.com] Still a stupid price but not the $150 you quoted since that was for a track pad and the mouse. The Mac Pro you linked to includes the mouse in the price.

          Your inability to read aside, you can use a non-apple mouse. I use a G502 with no issues.
          • Yikes... The magic mouse is included with "the apple rig" in question

            If you want the magic trackpad instead, that's $50.

            If you want both, that's $150.

            • by Shaitan ( 22585 )

              I appreciate a good nitpick but does the distinction between $99, $50, $150 really matter when the combo can't possibly cost Apple $10? For $150 you can probably find a chromebook that throws in a mouse and touchpad.

    • by Guspaz ( 556486 ) on Tuesday August 05, 2025 @04:29PM (#65568444)

      Or you can buy a Framework Desktop with 128GB of unified memory for $2,895.

      • Oooh, I like that.

      • Costs more than my binned M4 Pro/48GB MacBook Pro, is slower, uses 2-3X the power during inference, and isn't a (really nice) laptop ... but more RAM!
        • by Guspaz ( 556486 )

          If you want to run an 80GB model, a 48GB laptop will not run it, a 128GB APU will, and the rest of the comparison is irrelevant.

          • True. But given how much these OpenAI models are getting slagged (supported by my own testing with the 20b model), I'll stick with Qwen3 30b a3b 8-bit MLX @ ~32GB file size.

            Things will get really interesting if the upcoming M5 Pros have a 96GB RAM option. My M4 Pro is fast enough for these midrange MoE models already and only lacks RAM to run them. The 120b OpenAI model is only ~63GB and the experts are ~5GB, which ought to give somewhere around 25-30 tokens/second inference speeds. If the M5 gets a RAM
    • either an nVIDIA A100 or a maxed M2 Mac Pro

      Or basically any Linux box built within the last 5 years. Save yourself a pile of dough and power.

      • What CPU? How well does the CPU + 128gb ram run the models we're talking about?

        • by EvilSS ( 557649 )

          What CPU? How well does the CPU + 128gb ram run the models we're talking about?

          I can answer that: Like dog shit.

        • by Entrope ( 68843 )

          For ollama run gpt-oss:$SIZE --verbose --think true --hidethinking `cat prompt.txt` on a M2 Max Macbook Pro (96 GB RAM) versus a Threadripper 3960X (128 GB RAM, Geforce RTX 2080 Super):

          120b model, M2 Max: 67.43 token/s prompt eval (221 tokens), 21.29 token/s output eval (8694 tokens)
          20b model, M2 Max: 164.65 token/s prompt eval, 35.50 token/s output eval (4180 tokens)
          120b model, 3960X: 18.08 token/s prompt eval, 6.00 token/s output eval (8916 tokens)
          20b model, 3960X: 30.30 token/s prompt eval, 10.67 token/s
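
          To reproduce numbers like these on other hardware, the same command can be wrapped in a small loop (a rough sketch; prompt.txt is whatever prompt you want to test with):

              for SIZE in 20b 120b; do
                ollama run gpt-oss:$SIZE --verbose --think true --hidethinking "$(cat prompt.txt)"
              done
              # --verbose prints "prompt eval rate" and "eval rate" in token/s after each response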

          • Thanks for sharing, that's fascinating. I've only started using ollama within the last month, and I'm really, really regretting I didn't load up my MBP with more ram. When I bought it, 36gb seemed like more than I would probably need..

            What models are you using for coding?

            • by Entrope ( 68843 )

              I'm not using any models for real coding, thanks to job responsibilities that keep me too busy doing other things.

              But as a consolation prize, I will note that the output "eval rate" usually drops as the number of generated tokens rises, which makes intuitive sense. For the 120b size and 3960X, between 5.5 and 6.75 token/s; for 120b and M2 Max, between 20.16 and 23.83 token/s; for 20b and 3960X, between 9.42 and 10.74 token/s; for 20b and M2 Max, between 32.86 and 36.44 token/sec. Additionally, the 120b mo

          • We tried running a 70b DeepSeek R1 on an Epyc server at work w/ dual CPUs totalling around 128 cores and 512GB of RAM.

            It ran like dog shit, pondering its own navel for about 10 minutes on how to respond to "hello".

            Older model CPU, but still packs a punch on pure specs alone.

      • Or basically any Linux box built within the last 5 years. Save yourself a pile of dough and power.

        Are people just dropping extremely high-VRAM GPUs into Linux boxes willy-nilly now?

        Because that seems like a weird move when you consider that, outside the Apple silicon chips, there just aren't a lot of options for capable GPUs under the $20K price tag. I mean, maybe a 4090 if you can live with the very small models?

    • by seoras ( 147590 )

      Why would you buy a M2 Ultra Mac Pro with 192GB Ram, 76Core GPU ($9599) rather than an M3 Ultra Mac Studio with 256GB of Ram and 80Core GPU ($7099)?
      The Mac Pro seems pointless and overpriced compared to the recently updated Studio.

      And "What’s in the Box" [apple.com] at the bottom of the Pro page lists Mouse (or keyboard) and keyboard.
      So I'm wondering if I'm even looking at the same rig as @Shaitan or @williamyf

      • Why would you buy a M2 Ultra Mac Pro with 192GB Ram, 76Core GPU ($9599) rather than an M3 Ultra Mac Studio with 256GB of Ram and 80Core GPU ($7099)?
        The Mac Pro seems pointless and overpriced compared to the recently updated Studio.

        And "What’s in the Box" [apple.com] at the bottom of the Pro page lists Mouse (or keyboard) and keyboard.
        So I'm wondering if I'm even looking at the same rig as @Shaitan or @williamyf

        No, we were not looking at the same machine. I stopped following Apple closely after the switch to ARM, but I should have known that they would neglect the Pro Tower; they have done it before...

        Yes, the decked-out Studio IS the best value, but that gives even more gravitas to my point. For training, nVIDIA; for inference, Apple.

        • by seoras ( 147590 )

          I've finally* got it running on my M4 Mac Mini (Pro C14/G20/N16, 64G Ram) and it's significantly better than R1 using the same questions.
          Much more detailed answers and very nicely formatted output. That's for 20b and comparing it to deepseek-r1:70b.
          Speed was OK, not blazing: response tokens/sec was ~30, prompt tokens/sec ~175.
          I would guess it runs well too on an entry-level M4 Mac Mini (C10/G10/N16) with its, now minimum, 16GB of RAM.

          * (Ollama got to 95% and then took 2 hours to get the last 5% downloaded. Everyo

          • by EvilSS ( 557649 )
            I'm running the 120B and it's definitely better than the latest R1. It's really chatty though. By default it acts like it's trying to write a paper. I had to up the token cap twice (once to 4K, then after it blew past that in a single answer I just maxed it to 131K). Getting 40-50 tokens/sec on an M4 Max MacBook Pro with 128GB RAM.

            Also got it to tell me its training cutoff: June 2024
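
            If anyone else hits the same cap and is driving the model through Ollama, the usual knobs are num_ctx (the context window) and num_predict (the generation cap); a rough sketch, with the values just examples:

                # inside an interactive "ollama run gpt-oss:120b" session:
                /set parameter num_ctx 131072
                /set parameter num_predict -1      # -1 removes the cap on generated tokens

                # or bake it into a variant via a Modelfile containing:
                #   FROM gpt-oss:120b
                #   PARAMETER num_ctx 131072
                ollama create gpt-oss-long -f Modelfile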
            • by seoras ( 147590 )

              What speeds did you see with your setup? response/prompt tokens/sec?
              I found that the 70b R1 wasn't much different from the 32b or 14b in answers, just slower in response.
              It would be interesting to see comparisons between 120b and 20b gpts.

  • by Anonymous Coward

    So I just loaded them on my local dev machine, and it looks like a very capable model, but it's also limited by the fact that it's been absolutely neutered by some kind of policy architecture: it not only does a pre-pass on your input to see if it's even allowed to be inferred on, but will actively modify what you asked and feed the model a different version of your input, and even then it will still possibly censor its own output.

    I expected this kind of censorship from Ch
