AI

Hugging Face Researchers Are Trying To Build a More Open Version of DeepSeek's AI 'Reasoning' Model

Hugging Face researchers are attempting to recreate DeepSeek's R1 artificial intelligence model in an open-source format, just days after the Chinese AI lab's release sent markets soaring. The project, called Open-R1, aims to replicate R1's reasoning capabilities while making its training data and code publicly available. DeepSeek's R1 model, which matches or surpasses OpenAI's o1 on several benchmarks, was released with a permissive license but keeps its underlying architecture private. Hugging Face will use its research server with 768 Nvidia H100 GPUs for the effort.


  • Soaring (Score:4, Interesting)

    by GrahamJ ( 241784 ) on Tuesday January 28, 2025 @05:08PM (#65126219)

    I don't think this word means what you think it means

  • by shino6 ( 2368984 ) on Tuesday January 28, 2025 @05:18PM (#65126243)
    I keep reading that as Face Hugger Researchers.
    • Yeah, I'm not sure what they were going for with that one. Terrible name. I don't even know what movie that's from (Alien I think?) but upon reading the name I reflexively reached for a crowbar to bat away headcrabs.

      • by mattr ( 78516 )

        Apparently Hugging Face is the name of an emoji: a face with two hands. Not sure if that is why they called it that, though.

  • There's nothing secret about the architecture - you can download the weights and run it for yourself (which means you're creating an instance of the architecture, and loading the weights into it).

    The biggest barrier to replication, which may or may not turn out to be an issue, is the training data needed for the various training phases. DeepSeek use quite an elaborate training process/pipeline, starting with their DeepSeek-V3 base model, but creating two intermediate models before arriving at the final DeepS

    • From the hugging face blog [huggingface.co]:

      However, the DeepSeek-R1 release leaves open several questions about:

      Data collection: How were the reasoning-specific datasets curated?

      Model training: No training code was released by DeepSeek, so it is unknown which hyperparameters work best and how they differ across different model families and scales.

      Scaling laws: What are the compute and data trade-offs in training reasoning models?

    • by Shaitan ( 22585 )

      "The biggest barrier to replication, which may or may not turn out to be an issue, is the training data needed for the various training phases."

      One would hope the barrier to replication turns out to be nothing. But it is important to replicate, not just to validate the technology but so that a genuinely open, uncensored, and untainted model exists. That isn't necessarily on the folks at DeepSeek, they can't help where they are, but nobody sane will use this tainted model for much of anything. The objective in

  • DeepSeek R1 collects a significant amount of personal data and ships it off to servers in China. A truly open source implementation would presumably eliminate that.

    https://mashable.com/article/d... [mashable.com]
    Not only does DeepSeek collect "text or audio input, prompt, uploaded files, feedback, chat history, or other content that [the user] provide[s] to our model and Services," but it also collects information from your device, including "device model, operating system, keystroke patterns or rhythms, IP address, and s

    • I have the 32B Q8 version running on my M2 Max Mac Studio with 64 GB of RAM. I can already get north of 60,000 tokens, without optimization, and generate at about nine tokens a second, here at my house. No phoning home.
    • by Tom ( 822 )

      Run it locally and you have none of these worries.

      This, to me, is the main advantage of DeepSeek R1. You can download the full model or, for consumer hardware, one of the quantized models.

      Or use the API (through LM Studio or other tools) if the main thing you worry about is shady stuff like keystroke patterns (which really, an online service has no business collecting).

  • They'd have to replace the dwarf under the table!
