Researchers Created an Open Rival To OpenAI's o1 'Reasoning' Model for Under $50

AI researchers at Stanford and the University of Washington were able to train an AI "reasoning" model for under $50 in cloud compute credits, according to a research paper. From a report: The model, known as s1, performs similarly to cutting-edge reasoning models, such as OpenAI's o1 and DeepSeek's R1, on tests measuring math and coding abilities. The s1 model is available on GitHub, along with the data and code used to train it.

The team behind s1 said they started with an off-the-shelf base model, then fine-tuned it through distillation, a process to extract the "reasoning" capabilities from another AI model by training on its answers. The researchers said s1 is distilled from one of Google's reasoning models, Gemini 2.0 Flash Thinking Experimental. Distillation is the same approach Berkeley researchers used to create an AI reasoning model for around $450 last month.
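In code terms, distillation of this kind is just supervised fine-tuning on the teacher's answers. A minimal sketch, assuming a Hugging Face causal LM; the base-model name and the single training pair are illustrative placeholders, not the actual s1 recipe:

```python
# Minimal distillation-by-fine-tuning sketch (illustrative, not the s1 code).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

base = "some-org/small-base-model"  # placeholder: any off-the-shelf base model
tok = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# The training data is simply (question, teacher's reasoning trace) pairs,
# i.e. answers collected from a stronger "teacher" model.
pairs = [("What is 12 * 13?",
          "12 * 13 = 12 * 10 + 12 * 3 = 120 + 36 = 156. The answer is 156.")]

opt = torch.optim.AdamW(model.parameters(), lr=1e-5)
model.train()
for question, teacher_trace in pairs:
    batch = tok(question + "\n" + teacher_trace, return_tensors="pt")
    # Ordinary next-token loss on the teacher's text: the student learns to
    # reproduce the teacher's reasoning, token by token.
    loss = model(**batch, labels=batch["input_ids"]).loss
    loss.backward()
    opt.step()
    opt.zero_grad()
```

With only a small curated set of teacher answers, a loop like this needs very little GPU time, which is where a sub-$50 cloud bill becomes plausible.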
  • deepseek? s1?

    Sooner or later someone will improve the efficiency of the existing models, and the cost to train will come down.
    Then the current status quo will crumble.

    Did it happen already?
  • by DrMrLordX ( 559371 ) on Thursday February 06, 2025 @10:59AM (#65146817)

    What could possibly go wrong?

    • I think the raw training from scratch is like deciphering an alien language, while distillation is more like being in English class.
    • I can see some AI researcher doing a Simpsons "Dr. Frink"...

      "I forgot to carry the one..."

      https://www.youtube.com/watch?... [youtube.com]

      JoshK.

    • by dvice ( 6309704 )

      It is an old and well-proven method for training AI. To train an AI you need a scoring system, so if you want to train an AI to draw pictures you:
      1. Train an AI that will score a picture based purely on how similar it is to the target image. This is relatively simple: just give it a bunch of random images, rewarding it when it gives them low scores, and a bunch of target images, rewarding it when it gives those high scores.
      2. Now you have a scoring AI, so you start training the actual AI. You simply give it the score from the first AI as its reward (see the sketch below).
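A rough code sketch of the two-stage scheme the comment above describes (it is essentially a GAN-style setup); all shapes and data here are made up for illustration:

```python
# Step 1: train a scoring AI; step 2: train the drawing AI against its score.
import torch
import torch.nn as nn

scorer = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 1))    # the scoring AI
generator = nn.Sequential(nn.Linear(64, 28 * 28), nn.Tanh())   # the drawing AI
bce = nn.BCEWithLogitsLoss()

# Step 1: reward high scores on target images, low scores on random images.
target_images = torch.rand(32, 1, 28, 28)  # stand-ins for the real targets
random_images = torch.rand(32, 1, 28, 28)
s_opt = torch.optim.Adam(scorer.parameters(), lr=1e-3)
s_loss = (bce(scorer(target_images), torch.ones(32, 1)) +
          bce(scorer(random_images), torch.zeros(32, 1)))
s_loss.backward()
s_opt.step()
s_opt.zero_grad()

# Step 2: freeze the scorer and train the drawing AI to maximize the score
# the first AI hands out (gradients still flow *through* the frozen scorer).
for p in scorer.parameters():
    p.requires_grad_(False)
g_opt = torch.optim.Adam(generator.parameters(), lr=1e-3)
drawings = generator(torch.randn(32, 64)).view(32, 1, 28, 28)
g_loss = bce(scorer(drawings), torch.ones(32, 1))  # "please the scorer"
g_loss.backward()
g_opt.step()
g_opt.zero_grad()
```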

    • Wait another 5 minutes, gweihir will be on here explaining everything that's wrong with AI.

  • Distillation is great! DeepSeek used it. Stanford used it. It saves lots of time and money. Why spend the time and money to train your own unique model when you can mostly copy someone else's work?

    Of course, this distillation trend misses the truly big thing. Direct copying of someone else's model requires even less time and money! This will be the next great innovation.

  • by TJHook3r ( 4699685 ) on Thursday February 06, 2025 @11:18AM (#65146859)
    So 2025 is going to be the year of ridiculous cost claims?
  • This is reminiscent of the processor wars of the 1990s and 2000s. All the then-big names were vying for the best processor in terms of MIPS.

    Now the shift is toward creating models that require more powerful processors, or GPUs. Progress!

    JoshK.

  • by Pinky's Brain ( 1158667 ) on Thursday February 06, 2025 @12:22PM (#65147089)

    Though some people might have said it would be stupid to try to distill reasoning ability from the "thought" output, it's clearly extremely effective. The researchers even distilled on the pure text; this is not even logit distillation (you can get the top-5 logits from Google, but at only one request per day, though for 1,000 questions scraping with multiple accounts would have been an option).

    OpenAI likely saw it coming, hence their refusal to expose the thoughts for o1.
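For readers unfamiliar with the distinction drawn above, here is a toy side-by-side of the two losses, with made-up tensors and no real model behind them:

```python
# Text distillation vs. logit distillation, in loss terms (toy example).
import torch
import torch.nn.functional as F

vocab = 32000
student_logits = torch.randn(1, vocab)  # student's prediction for one position

# Text distillation (what s1 did): the teacher's sampled token is the label.
teacher_token = torch.tensor([421])  # arbitrary token id
text_loss = F.cross_entropy(student_logits, teacher_token)

# Logit distillation: match the teacher's top-5 distribution instead.
teacher_top5_logits = torch.tensor([[5.1, 3.2, 2.9, 1.0, 0.4]])
teacher_top5_ids = torch.tensor([[421, 17, 998, 5, 42]])
student_top5 = student_logits.gather(1, teacher_top5_ids)
logit_loss = F.kl_div(F.log_softmax(student_top5, dim=-1),
                      F.softmax(teacher_top5_logits, dim=-1),
                      reduction="batchmean")
```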

    • ClosedAI's business model is basically "This is too expensive for you to run on your own, buy a subscription from us instead."
      As you said, they probably saw it coming and even explored it in-house. It's just something that would cut against their $500B grift, and they sure as hell didn't want it out.

    • Yeah - I wouldn't really call this distillation; it's just using one model to generate training data for another - synthetic data.

      It seems the use of RL for training reasoning models is mostly (what else?) acting as a data multiplier: taking a small number of reasoning samples and training a model capable of generating more. A bit like RLHF using human data to train a reward model.
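Under that reading, the pipeline reduces to: generate traces with one model, keep the ones that check out, and fine-tune another model on them. A sketch under that assumption; generate_trace and check_answer are hypothetical helpers, not anything from the paper:

```python
# Hypothetical synthetic-data pipeline: a teacher model multiplies a small
# set of problems into a supervised training corpus for the student.
def build_synthetic_dataset(questions, generate_trace, check_answer):
    dataset = []
    for q in questions:
        trace = generate_trace(q)        # teacher "thinks out loud"
        if check_answer(q, trace):       # keep only verified traces
            dataset.append({"prompt": q, "completion": trace})
    return dataset  # feed this to ordinary supervised fine-tuning
```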

    • by ceoyoyo ( 59147 )

      Some people say lots of dumb things. The idea behind distillation is that unsupervised learning is very hard while supervised is much easier. It would be surprising if learning a chain of reasoning process, which is not only unsupervised but usually doesn't have a good proximal cost measure, wouldn't benefit.

      It's like learning to solve math problems by blindly manipulating symbols you don't understand versus somebody showing you step by step what to do.

      • If it was just blindly manipulating symbols it didn't understand before, how come it said everything in such good grammar?

        • by ceoyoyo ( 59147 )

          I didn't say it was blindly manipulating symbols it didn't understand. I said learning by blindly manipulating symbols you don't understand.

          You can absolutely learn to understand what some random mathematician means by the | symbol given enough context, but it's much easier if they define it. You can learn a whole language that way too (all of us did).

  • Darn bubbles anyway (Score:4, Interesting)

    by Ol Olsoc ( 1175323 ) on Thursday February 06, 2025 @12:24PM (#65147093)
    There is really nothing all that special about AI; it's just the latest bubble. So trillions of dollars will evaporate overnight as the costs drop, and the funny-money people will lose their asses.
  • I saw in this tweet [x.com] that it can be done for $3 already.

  • It sounds like the "distillation" process is asking another model for answers to benchmark questions, to look good on benchmarks.

    How would that be any actual reasoning in the new model?

    • by ceoyoyo ( 59147 )

      1. You could learn how to differentiate equations by randomly guessing answers, checking to see if they're right, and trying to recognize patterns in your correct answers.
      2. You could also learn by looking at a shitload of problems and answers and trying to figure out how to do it yourself.
      3. Or a teacher could show you step by step the techniques involved in solving them.

      The difficulty of those methods decreases a lot between 1 and 3. And if the problem requires "show your work", i.e. reasoning, even more so.

  • Better ban it ASAP

"In the face of entropy and nothingness, you kind of have to pretend it's not there if you want to keep writing good code." -- Karl Lehenbauer

Working...