Hugging Face Researchers Are Trying To Build a More Open Version of DeepSeek's AI 'Reasoning' Model

Posted by msmash on Tuesday January 28, 2025 @05:45PM from the new-race-begins dept.

Hugging Face researchers are attempting to recreate DeepSeek's R1 artificial intelligence model in an open-source format, just days after the Chinese AI lab's release sent markets soaring. The project, called Open-R1, aims to replicate R1's reasoning capabilities while making its training data and code publicly available. DeepSeek's R1 model, which matches or surpasses OpenAI's o1 on several benchmarks, was released with a permissive license but keeps its underlying architecture private. Hugging Face will use its research server with 768 Nvidia H100 GPUs for the effort.

This discussion has been archived. No new comments can be posted.

Soaring (Score:4, Interesting)

by GrahamJ ( 241784 ) writes: on Tuesday January 28, 2025 @06:08PM (#65126219)

I don't think this word means what you think it means

- Re: (Score:2, Insightful)
  
  by Rinnon ( 1474161 ) writes:
  
  Ditto "reasoning".
  - Re: (Score:2)
    
    by DamnOregonian ( 963763 ) writes:
    
    noun: reasoning; plural noun: reasonings
    
    the action of thinking about something in a logical, sensible way.
    
    Sure does.
- Re: (Score:3)
  
  by Drethon ( 1445051 ) writes:
  
  I don't think this word means what you think it means
  The AI probably doesn't either.
  - China regulations for AI to be approved in China (Score:2)
    
    by will4 ( 7250692 ) writes:
    
    https://carnegieendowment.org/... [carnegieendowment.org]
    - The regulation includes a number of vague censorship requirements, such as that deep synthesis content “adhere to the correct political direction,” not “disturb economic and social order,” and not be used to generate fake news. When such content “might cause confusion or mislead the public,” it must include a “conspicuous label in a reasonable position” to alert the public that it was synthetically generated.
    https://www.techn [technologyreview.com]
    - Human verified question and answers (Score:2)
      
      by will4 ( 7250692 ) writes:
      
      What's the value in a vast list of questions and correct answers which have been human verified?
      How much is that worth to companies or organizations building AI training data sets?
You'll kill us all! (Score:5, Funny)

by shino6 ( 2368984 ) writes: on Tuesday January 28, 2025 @06:18PM (#65126243)

I keep reading that as Face Hugger Researchers.

- Re: (Score:3)
  
  by sound+vision ( 884283 ) writes:
  
  Yeah, I'm not sure what they were going for with that one. Terrible name. I don't even know what movie that's from (Alien I think?) but upon reading the name I reflexively reached for a crowbar to bat away headcrabs.
  - Re: (Score:3)
    
    by mattr ( 78516 ) writes:
    
    Apparently "hugging face" is the name of an emoji: a face with two hands. Not sure if that is why they chose the name, though.
It's just a transformer (Score:2)

by SpinyNorman ( 33776 ) writes:

There's nothing secret about the architecture - you can download the weights and run it for yourself (which means you're creating an instance of the architecture, and loading the weights into it).
The biggest barrier to replication, which may or may not turn out to be an issue, is the training data needed for the various training phases. DeepSeek use quite an elaborate training process/pipeline, starting with their DeepSeek-V3 base model, but creating two intermediate models before arriving at the final DeepS
- Re: (Score:3)
  
  by larryjoe ( 135075 ) writes:
  
  From the Hugging Face blog [huggingface.co]:
  However, the DeepSeek-R1 release leaves open several questions about:
  Data collection: How were the reasoning-specific datasets curated?
  Model training: No training code was released by DeepSeek, so it is unknown which hyperparameters work best and how they differ across different model families and scales.
  Scaling laws: What are the compute and data trade-offs in training reasoning models?
- Re: (Score:2)
  
  by Shaitan ( 22585 ) writes:
  
  "The biggest barrier to replication, which may or may not turn out to be an issue, is the training data needed for the various training phases."
  One would hope the barrier to replication turns out to be nothing. But it is important to replicate, not just to validate the technology but so that a genuine open, uncensored, and untainted model exists. That isn't necessarily on the folks at DeepSeek, they can't help where they are, but nobody sane will use this tainted model for much of anything. The objective in
- Re: (Score:2)
  
  by tomkost ( 944194 ) writes:
  
  Obligatory link: https://youtu.be/LoheCz4t2xc?s... [youtu.be]
- Re: (Score:2)
  
  by haruchai ( 17472 ) writes:
  
  BLM? Whut?
data collection (Score:2, Troll)

by ZipNada ( 10152669 ) writes:

DeepSeek R1 collects a significant amount of personal data and ships it off to servers in China. A truly open source implementation would presumably eliminate that.
https://mashable.com/article/d... [mashable.com]
Not only does DeepSeek collect "text or audio input, prompt, uploaded files, feedback, chat history, or other content that [the user] provide[s] to our model and Services," but it also collects information from your device, including "device model, operating system, keystroke patterns or rhythms, IP address, and s
- But when you run it on your own machine (Score:2)
  
  by daveapriltwenty ( 1919206 ) writes:
  
  I have the 32B Q8 version running on my M2 Max Mac Studio with 64 GB of RAM. I can already get north of 60,000 tokens, without optimization, and generate at about nine tokens a second, here at my house. No phoning home.
    - Re: (Score:3)
      
      by ZipNada ( 10152669 ) writes:
      
      Maybe the AI knows what it's talking about and all you have is some weird-ass conspiracy theories.
      - Re: (Score:1)
        
        by Shaitan ( 22585 ) writes:
        
        Conspiracy theory? Dude, Biden literally withdrew and they swapped in Kamala Harris who subsequently lost the election because of the mental health issues on the debate stage.
        In any case the AI doesn't just spin this with pro-Biden bias like many of these legacy news sources, it claims their reports [below] and the special counsel report [https://www.documentcloud.org/documents/24414255-report-from-special-counsel-robert-k-hur-february-2024/ this is a mirror but feel free to pull direct from the DOJ] NEVER
        
        Re: (Score:2)
        
        by ZipNada ( 10152669 ) writes:
        
        Next you'll be telling us the Jan. 6 rioters were just tourists, and that trump has already reduced the price of eggs.
        
        Re: (Score:2)
        
        by Shaitan ( 22585 ) writes:
        
        Did you take a head injury or are you a DeepSeek instance? You can try to spin the special counsel report despite everything that happened afterward but that the report happened isn't disputed. Even by the legacy media propaganda networks... which is why I linked them all reporting on it. ^
        But yeah, sure, it's all a conspiracy theory... we seriously need to find a way to detox you folks from all the kool-aid you all drank. Also it's unlikely that Trump would have been able to wave some sort of magic wand to
        
        Re: (Score:2)
        
        by ZipNada ( 10152669 ) writes:
        
        Why should anyone give a damn what Robert Hur thinks about Biden's memory? Hur said that Biden "cooperated with investigators and agreed to searches of his homes". Trump, on the other hand, held onto sensitive documents and obstructed justice “by enlisting others to destroy evidence and then to lie about it” according to Hur.
        Meanwhile trump is a convicted felon who claimed “When I win, I will immediately bring prices down, starting on Day One”. Lied about that too, lol.
        https://tradin [tradingeconomics.com]
    - Re: (Score:1)
      
      by Iem Eel ( 4268177 ) writes:
      
      Sure... But isn't that "bias" and "misinformation" inherent to literally all models, wherever they come from: East, West, China, US, EU?
      This is the nature of LLMs in contrast to algorithms: "unpredictability", "inability to verify the answer" through consistent reasoning, and obfuscated "guardrails and guardrail mechanisms". This is why LLMs should never be used to make "choices" that need to be rationally justified within specific contexts (legal, healthcare, employment, etc.).
      In short, solutions
      - Re: (Score:2)
        
        by Shaitan ( 22585 ) writes:
        
        "Sure... But isn't that "bias" and "misinformation" inherent to literally all models, wherever they come from: East, West, China, US, EU?"
        Yes and no, some like Llama 3 are genuinely open and have the uncensored model and public training data. But you are still drawing a false equivalence here. Sure openai, fb, and the wokesters trying to literally train their ideology as AI guardrails are harmful but they aren't overtly malicious, the CCP IS. Their model might push me toward a Biden sympathetic take and a
    - Re: (Score:2, Insightful)
      
      by Bert64 ( 520050 ) writes:
      
      Of course that is harmless, Biden's condition is still denied
      That's the point right there...
      There were claims about his condition but no concrete evidence, followed by denials from many of the mainstream media outlets. The training process is going to skew towards that being a false claim based on the weight of the data.
      You don't need to intentionally bias the model, the model will gain its own bias based on the available training data.
- Re: (Score:2)
  
  by Tom ( 822 ) writes:
  
  Run it locally and you have none of these worries.
  This, to me, is the main advantage of DeepSeek R1. You can download the full model or, for consumer hardware, one of the quantized models.
  Or use the API (through LM Studio or other tools) if the main thing you worry about is shady stuff like keystroke patterns (which really, an online service has no business collecting).
They'll never duplicate it! (Score:2)

by sabt-pestnu ( 967671 ) writes:

They'd have to replace the dwarf under the table!
