

Cutting-Edge Chinese 'Reasoning' Model Rivals OpenAI o1
An anonymous reader quotes a report from Ars Technica: On Monday, Chinese AI lab DeepSeek released its new R1 model family under an open MIT license, with its largest version containing 671 billion parameters. The company claims the model performs at levels comparable to OpenAI's o1 simulated reasoning (SR) model on several math and coding benchmarks. Alongside the release of the main DeepSeek-R1-Zero and DeepSeek-R1 models, DeepSeek published six smaller "DeepSeek-R1-Distill" versions ranging from 1.5 billion to 70 billion parameters. These distilled models are based on existing open source architectures like Qwen and Llama, trained using data generated from the full R1 model. The smallest version can run on a laptop, while the full model requires far more substantial computing resources.
The releases immediately caught the attention of the AI community because most existing open-weights models -- which can often be run and fine-tuned on local hardware -- have lagged behind proprietary models like OpenAI's o1 in so-called reasoning benchmarks. Having these capabilities available in an MIT-licensed model that anyone can study, modify, or use commercially potentially marks a shift in what's possible with publicly available AI models. "They are SO much fun to run, watching them think is hilarious," independent AI researcher Simon Willison told Ars in a text message. Willison tested one of the smaller models and described his experience in a post on his blog: "Each response starts with a ... pseudo-XML tag containing the chain of thought used to help generate the response," noting that even for simple prompts, the model produces extensive internal reasoning before output. Although the benchmarks have yet to be independently verified, DeepSeek reports that R1 outperformed OpenAI's o1 on AIME (a mathematical reasoning test), MATH-500 (a collection of word problems), and SWE-bench Verified (a programming assessment tool).
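Willison's observation suggests a simple way to separate the chain of thought from the final answer when running one of the distilled models locally. The sketch below is a guess at what that looks like, not an official API: the tag name "think" is a hypothetical stand-in, since the quote above elides the actual tag name.

```python
import re

def split_reasoning(response: str, tag: str = "think"):
    """Split a model response into (chain_of_thought, final_answer),
    assuming the response opens with a <tag>...</tag> block as the
    summary describes. Returns ("", response) if no such tag is found."""
    m = re.match(rf"\s*<{tag}>(.*?)</{tag}>\s*(.*)", response, re.DOTALL)
    if not m:
        return "", response
    return m.group(1).strip(), m.group(2).strip()

# Toy response in the assumed format:
raw = "<think>The user asked 2+2. That is 4.</think>The answer is 4."
thought, answer = split_reasoning(raw)
print(answer)  # The answer is 4.
```

If a local runner streams the reasoning and the answer as one blob, splitting on the closing tag like this lets you log or hide the verbose internal reasoning Willison describes.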
TechCrunch notes that three Chinese labs -- DeepSeek, Alibaba, and Moonshot AI's Kimi -- have released models that match o1's capabilities.
Simulated reasoning is not reasoning (Score:3, Insightful)
The errors and inaccuracies accumulate. In real reasoning, they do not.
But anything to keep the AI hype going. There still is no real application for the artificial morons that would begin to justify the effort to train and run them.
Re: (Score:2)
Source ?
One new aspect to these reasoning models is that they can backtrack/self-correct ... no guarantee that the final chain-of-thought/reasoning is valid of course, but it's a step in the right direction.
Re: (Score:2)
"...but it's a step in the right direction."
Source?
Re: (Score:1)
Re: (Score:2)
You're asking for a source for the claim that being able to backtrack when reasoning is beneficial?!
Do you normally never make mistakes? Never try to figure something out and say "no, that can't be right, so what if ..."?
Reasoning is essentially SEARCH - trying to chain a bunch of steps together to figure something out. More often than not you won't get it right the first time, so you'll need to backtrack a step or two and try something else.
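The search-with-backtracking idea this comment describes can be sketched in a few lines of Python. This is a generic depth-first solver on a toy problem, not anything DeepSeek-specific:

```python
def solve(state, is_goal, candidates):
    """Depth-first search: extend the current chain of steps; on a
    dead end, return None so the caller backtracks and tries the
    next candidate step."""
    if is_goal(state):
        return state
    for step in candidates(state):
        result = solve(state + [step], is_goal, candidates)
        if result is not None:
            return result
    return None  # dead end: backtrack

# Toy use: find three distinct digits that sum to 15. Many early
# branches (e.g. starting [0, 1, ...]) fail and get backtracked.
result = solve(
    [],
    is_goal=lambda s: len(s) == 3 and sum(s) == 15,
    candidates=lambda s: [d for d in range(10) if d not in s] if len(s) < 3 else [],
)
print(result)
```

The "reasoning" models discussed here do something loosely analogous in natural language inside their chain-of-thought: propose a step, notice it leads nowhere, and try an alternative.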
Re: (Score:2)
Self correcting machines go back about a century. Mechanical ones. (And those, by and large, actually work.)
Re: (Score:2)
(And those, by and large, actually work.)
That is because they were made by smart people that understood what they were doing. Not by throwing shiploads of stolen trash into a mystery box that is supposed to be magic.
Re: (Score:2)
My brain is a mystery box and random stuff, mostly created by others, streams into it via my nerves on a daily basis and has done for decades. It seems to be effective, at least to me.
Re: (Score:2)
Sounds like confusion about "antifragile". The book by Nassim Nicholas Taleb has been interesting so far.
(However, his tone is so negative and critical about everything that it's often hard not to take it personally. It doesn't matter whose ox you like; every ox is gonna get gored.)
Re: Simulated reasoning is not reasoning (Score:2)
Re: (Score:3)
Yep. Which is not reason. Only about 10-15% of all humans can actually fact check and think independently.
Re: (Score:2)
Re: (Score:2)
Yep. Which is not reason. Only about 10-15% of all humans can actually fact check and think independently.
Cite your source.
Re:Simulated reasoning is not reasoning (Score:4, Insightful)
I enjoy how OpenAI's model is called a reasoning model, while the Chinese model has "reasoning" in quotes. I guess we can admit it's a lie as long as the Chinese are doing it.
The whole thing is pure desperation. They are trying to do the same thing as automated deduction does (but fails to get to any real depth because of state-space explosion) but with a depth-first approach and probabilistic steps. That is hilariously wrong to anybody with some actual background in automated deduction.
Well, I guess it will keep the stupid money flowing for a few weeks more.
Re: (Score:2)
Simulated reasoning is not reasoning
So it's exactly like all the rest?
Re: (Score:2)
About 10-15% of all humans can actually reason competently. The rest cannot. So not "all" the rest.
Re: (Score:3)
So for example if humans possessed "real" reasoning, we'd be able to write software that was bug-free. Regardless of length.
Re: (Score:2)
not really, but we could eventually come up with insightful and funny logical conclusions like you just did. well played, sir.
the funny thing is that this isn't about godly perfect automated reasoning, which is an ideal at best, but about the fact that one particular automaton just matched and even outperformed another on whatever you want to call what their benchmarks are tuned to measure. not really shocking news if it weren't for the 2 irresistible carrots to choke on, AI hype and china, so the usual mass
Re: (Score:2)
Nope. But smart humans (a minority) have a pretty good idea when it probably stops being correct.
Re: (Score:2)
https://www.economist.com/scie... [economist.com]
Re: (Score:3)
Lol. You have clearly never marked a student's work. Or read a Slashdot post longer than a one sentence quip.
Re: (Score:2)
Most people cannot do real reasoning either. I am well aware of that.
Re: (Score:2)
Yes, you seem to have some close experience.
I assume by "errors do not accumulate" you actually mean that in a formal reasoning system as soon as you make an error all subsequent deductions are invalid. That's kind of a useless definition of "errors do not accumulate" but whatever. If we go with your quirky definition I would be very curious to meet these "not most people" who do not make any errors at all.
Re: (Score:2)
"The errors and inaccuracies accumulate. In real reasoning, they do not."
That sounds like a tautological statement with multiple ways to disprove it.
Re: (Score:2)
It is not. Some insight required. The way this goes is that a smart (!) person knows (approximately) when they stop reasoning and start speculating in a chain of steps. This machine does not. Also note that most people are not smart and cannot tell the difference between reasoning and speculation and, often, wishful thinking.
Re: (Score:2)
It is not. Some insight required. The way this goes is that a smart (!) person knows (approximately) when they stop reasoning and start speculating in a chain of steps. This machine does not. Also note that most people are not smart and cannot tell the difference between reasoning and speculation and, often, wishful thinking.
Can you tell the difference?
Re: (Score:2)
Way to demonstrate lack of personal maturity! Great job! Like a fucking dumb kid ...
Re: (Score:2)
Chinese scientists and engineers.. (Score:2)
..are smart and talented
We need to stop the silly trade war and increase cooperation
Of course, the chances of this happening are zero under the new administration
Re: (Score:2)
Ask it about Tiananmen Square (Score:5, Insightful)
Re: (Score:1)
"Zero, it's fake news. It's not even square, it's ovoid, you blind American pig!"
Re: (Score:2)
I bet if you ask it to give a count of the number of people killed in Tiananmen Square it'll suddenly not be so good at math.
This is from the R1 Qwen32 version..
Prompt: How many people were killed in Tiananmen Square?
Re: (Score:2)
Seems like totally sound reasoning to me.
Re: (Score:2)
I bet if you ask it to give a count of the number of people killed in Tiananmen Square it'll suddenly not be so good at math.
The answer is zero [wikileaks.org] and your brain has been trained with biased narratives [wikipedia.org] (*) over the years.
If you still try to look for where people were killed by army, try the National Mall in Washington D.C. [wikipedia.org].
(*) To save you from reading and thinking:
The lead tank halted to avoid running him over, the man then climbed on top of the tank. The PLA soldiers operating the tank then opened a hatch used for entering and exiting the tank, and briefly talked to the man. ... the video footage shows two figures in blue running over to pull the man away and lead him to a nearby crowd; the tanks then continued on their way.
What do you see in this photo? An army that was acting professionally, gracefully, and humanely, unlike this other army [wikipedia.org]. Yet your propaganda keeps telling you this is an example of brutality. They also try to cover up their false narratives by claiming the massacre was happen
Re: (Score:2)
claiming the massacre was happening outside the Square without any actual evidences
For what it is worth, even the Chinese government admits at least a couple hundred people were killed that night, in the vicinity of the square but not in it.
For those interested, here are a couple more links about the misinformation regarding killing of students in Tiananmen Square:
http://news.bbc.co.uk/2/hi/asi... [bbc.co.uk]
https://www.cjr.org/behind_the... [cjr.org]
https://www.dw.com/en/fact-che... [dw.com]
This situation makes me wonder, what else do I take for granted really happened actually happened in a very different w
Re: (Score:2)
I bet if you ask it to give a count of the number of people killed in Tiananmen Square it'll suddenly not be so good at math.
The answer is zero [wikileaks.org]
Poor reading comprehension on your side?
From the linked page of Wikileaks: GALLO SAW MANY CASUALTIES BROUGHT INTO THE SQUARE AND DID NOT DOUBT THAT HUNDREDS OF PEOPLE IN BEIJING WERE KILLED BY THE ARMY ON JUNE 3 AND 4.
There you have it, troll.
Re: (Score:2)
Get your comprehension skills improved. He said BROUGHT INTO, while the GP was asking for the number killed IN the Square, and I also acknowledged casualties outside of the Square. However, there is no real evidence on how those casualties occurred; maybe they were attacking the army first -- try telling black people in the US to wave an object in their hands when stopped by police.
Good for Taiwan (Score:2)
The whackos are talking about bombing TSMC if China moves on Taiwan so "they won't get the chips".
That China doesn't need the TSMC chips is great for overall peace for the region.
Re: (Score:2)
Bombing the fabs would just be to rub salt in the wound, but the fabs can be neutrali
Re: (Score:2)
Good thing we're trying to take Greenland then. That'll teach the Danes not to cooperate with China. /s
Well, actually, ASML is a Dutch company (Netherlands), not Danish. I can't say whether ASML would give a kill switch to a foreign government, but I suppose it's not impossible, though Denmark would still be a strange choice when the US is right there.
Re: (Score:2)
ASML is cooperating with US sanctions on semiconductor equipment to China, so they would likely continue to do so. Bottom line is the market for high end processors is data centers now and that's the US by a long margin; the *Dutch* are not going to hurt the biggest market for what is ultimately the end product produced by their machines. I be
Thanks DeepSeek! (Score:2)