
DeepSeek Accelerates AI Model Timeline as Market Reacts To Low-Cost Breakthrough (reuters.com)
Chinese AI startup DeepSeek is speeding up the release of its R2 model following the success of January's R1, which outperformed many US competitors at a fraction of the cost and triggered a $1 trillion-plus market selloff. The Hangzhou-based firm had planned a May release but now wants R2 out "as early as possible," Reuters reported Tuesday.
The upcoming model promises improved coding capabilities and reasoning in multiple languages beyond English. DeepSeek's competitive advantage stems from its parent company High-Flyer's early investment in computing power, including two supercomputing clusters acquired before U.S. export bans on advanced Nvidia chips. The second cluster, Fire-Flyer II, comprised approximately 10,000 Nvidia A100 chips. DeepSeek's cost-efficiency comes from innovative architecture choices like Mixture-of-Experts (MoE) and multihead latent attention (MLA).
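To make the Mixture-of-Experts idea concrete: instead of running one huge network on every token, an MoE layer routes each input to a small subset of specialist sub-networks, so compute scales with the number of experts actually selected rather than the total. The sketch below is a toy illustration in NumPy; the shapes, gating scheme, and expert count are made up for clarity and are not DeepSeek's actual architecture.

```python
import numpy as np

def moe_forward(x, experts, gate_w, top_k=2):
    """Toy Mixture-of-Experts layer: route input x to its top-k experts.

    Only the selected experts run, so compute grows with top_k,
    not with the total number of experts.
    """
    logits = x @ gate_w                    # one gating score per expert
    top = np.argsort(logits)[-top_k:]      # indices of the top-k experts
    weights = np.exp(logits[top])
    weights /= weights.sum()               # softmax over the selected experts
    # Weighted sum of the chosen experts' outputs
    return sum(w * experts[i](x) for w, i in zip(weights, top))

rng = np.random.default_rng(0)
d, num_experts = 8, 4
# Each "expert" is just a linear map in this sketch
mats = [rng.standard_normal((d, d)) for _ in range(num_experts)]
experts = [lambda x, M=M: x @ M for M in mats]
gate_w = rng.standard_normal((d, num_experts))

y = moe_forward(rng.standard_normal(d), experts, gate_w, top_k=2)
print(y.shape)  # (8,)
```

Real MoE implementations add load-balancing losses and batched routing, but the core trick is the same: a cheap gate picks which parameters to spend compute on.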
According to Bernstein analysts, DeepSeek's pricing was 20-40 times cheaper than OpenAI's equivalent models. The competitive pressure has already forced OpenAI to cut prices and release a scaled-down model, while Google's Gemini has introduced discounted access tiers.
They have a small window (Score:2)
They've established themselves and have a small window to gain space before the others catch up.
cheaper or subsidized? (Score:5, Interesting)
"DeepSeek's pricing was 20-40 times cheaper than OpenAI's equivalent models"
I'm wondering if this is because DeepSeek is cheaper to run or because they are offering it for less than it costs them. China has aspects of a command economy, and if a market segment is targeted in the State plan they may pump a lot of money in to make sure it succeeds.
Re: (Score:3, Interesting)
I'm pretty sure their costs are lower.
If I spend 1 million to download an AI competitor's data once a week, where it cost them 500 million to make it, I'm doing way better on cost than they are.
Re: (Score:3)
Did those AI engines "steal" their data?
Re: (Score:3)
It is admittedly very funny that OpenAI has complained that these folks are using their model to train DeepSeek's version, when literally all the AI companies are doing just that with content provided by everyone else for free.
Re: (Score:2)
Yeah, heavily subsidized. That could be fixed by applying an equivalent tax.
Re: (Score:2)
What if you subsidize too?
Re: (Score:2)
And if you are pretending you didn't have to buy thousands of chips, you can make it seem even cheaper!
Question: does the Chinese AI need nuclear reactors specifically for its AI?
As well, is American AI copyrighted, so it cannot be used anywhere else? Third question: does the Chinese AI cost the multiple billions the US AI does?
The bubble is getting close to bursting.
Re: (Score:2)
I'll begin and end with your third question, as best I can parse your broken English. Yes, it does. Lying about how much it cost or what hardware was used doesn't change the truth.
Oh, good. Perhaps you would like a dissertation.
The first question is based on the fact that Microsoft is planning to restart one of the Three Mile Island reactors in order to provide enough electricity for its AI needs. So whatever you believe my broken English is, here are some people who might write in a manner you will comprehend and parse better than I can write. https://www.cnn.com/2024/09/20... [cnn.com]
So if American AI requires an entire Nuclear power plant in order to perform AI, does t
Re: (Score:2)
We don't have accurate reliable data on how much power DeepSeek uses, as they have lied about everything else, but we do know that China is building coal fired power plants like crazy. They're also building nuclear plants. Why should we dismiss the notion that they are building power plants dedicated to powering AI datacenters? Right, we shouldn't.
Which I'm guessing is part of your argument, which again, is not easy to process. Are you saying that it is an economic bubble because it uses too much power? Are you saying that DeepSeek isn't lying about their costs? Are you saying that AI is a bubble for other reasons? I don't wish to offend or condemn you for not being fluent in a non-native language (you should hear my Spanish), but I'm really not sure what you're saying half the time.
I am a native English speaker. Sorry if my communication skills are inadequate. Perhaps I should just give you some links and a brief description, and you can look it up for yourself.
You can look up what a bubble is. Why would I claim it is a bubble? Were you alive during the 2000 dotcom bubble burst? We can start there. I will also provide links.
I concur with this: https://en.wikipedia.org/wiki/... [wikipedia.org]
Now let us move on to another bubble, the subprime loan crisis https://en.wikipedia.org/wiki/... [wikipedia.org]
Re: (Score:2)
But I have to say it was much easier to read and understand this time. My guess would be that you took a breath first and didn't rush through, but whatever you did keep it up!
Re: (Score:3)
There's a cost associated with training the AI, and then a separate cost involved in generating the inferences from the models.
Generating the inferences can be a significant ongoing compute expense that is usually offset by user fees. What I'm seeing from this article is that DeepSeek's fees are "20-40 times cheaper" than the competition, which seems a little unrealistic.
Re: (Score:2)
"Generating the inferences can be a significant ongoing compute expense"
What if that cost is easily covered by their investment gains (on their own stock even)?
Re: (Score:2)
It seems like other companies would have that same advantage in the current investment climate.
"Adnan Masood of U.S. tech services provider UST told Reuters that his laboratory had run benchmarks that found R1 often used three times as many tokens, or units of data processed by the AI model, for reasoning as OpenAI's scaled-down model."
They are running the inferences on "a large A100 cluster", so similar hardware to the Western competition. But they potentially take 3 times the amount of compute resources.
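The two figures in this thread can be combined with back-of-the-envelope arithmetic: if R1 burns roughly 3x the tokens per query but charges 20-40x less per token, the effective per-query cost is still several times lower. The numbers below are just the thread's own figures, not measured data.

```python
# Effective per-query cost advantage, using the thread's figures:
# ~3x the tokens per query, but 20-40x lower price per token.
token_ratio = 3.0                     # R1 tokens per query / competitor tokens
for price_ratio in (20.0, 40.0):      # competitor price per token / R1 price
    effective = price_ratio / token_ratio
    print(f"{price_ratio:.0f}x cheaper per token -> "
          f"~{effective:.1f}x cheaper per query")
```

So even granting the 3x token overhead, the headline price gap doesn't disappear; it shrinks to roughly 7-13x per query.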
Re: (Score:2)
C'mon. How much did OpenAI pay for your data online?
Even though there is no definite proof, I am pretty convinced that they indeed used ChatGPT data. Like literally a thousand other models you can download, as many of the open datasets (e.g. on Huggingface) are created using ChatGPT. When the first open source competitors were created, people literally asked ChatGPT to create example content for their datasets. Try out a few models and you'll soon learn to recognize "GPTism". Ever read from a model "Let's d
The market selloff wasn't as much (Score:3)
about DeepSeek's model being low-energy (it wasn't; they still did multiple training runs and didn't include the costs associated with massaging the data). It was the market's realization that AI isn't going to 'go to the moon' forever. Someone is going to figure out how to train models with lower energy costs. The assumption was that it will always take more hardware and more energy, which isn't true in the long run, and that isn't how capitalism works. Capitalism favors people who can reduce costs, and I think you'd be foolish to believe it's going to be Nvidia forever.
Re: (Score:2)
NVIDIA is still going to make lots of money selling hardware either way. This AI stuff is in its toddler phase.
Llama delayed again? (Score:2)
After R1, Meta delayed the planned Llama 4 release to catch up. Let's see if they delay it further after R2.
Partially correct (Score:2)
Mixture of experts is not a DeepSeek innovation, as the core idea has been around for a very long time, and other models used mixture of experts before V2/V3. Multihead latent attention is a true DeepSeek innovation, as is writing direct PTX to repurpose SMs to bypass the Nvidia crippling of inter-chip bandwidth for H800s.
Pricing is perhaps another DeepSeek innovation. However, it remains to be seen whether DeepSeek's aggressive pricing is due to cost efficiencies, predatory pricing, or government subsidies.
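For readers wondering why multi-head latent attention matters for cost: the idea is to cache one small shared latent vector per token instead of full per-head keys and values, and reconstruct K/V from it with learned up-projections at inference time. The sketch below only illustrates the cache-size arithmetic; all dimensions are invented for illustration and are not DeepSeek's actual configuration.

```python
# Toy illustration of the memory idea behind multi-head latent
# attention (MLA): cache a low-rank latent per token instead of
# full keys/values. All dimensions here are made up.
seq_len, d_model, d_latent = 1024, 4096, 512

# Standard attention caches K and V per token: 2 * seq * d_model entries
kv_cache = 2 * seq_len * d_model
# MLA caches one shared latent per token: seq * d_latent entries,
# from which per-head K/V are reconstructed by learned up-projections
latent_cache = seq_len * d_latent

print(f"KV cache entries:     {kv_cache}")
print(f"Latent cache entries: {latent_cache}")
print(f"Reduction: {kv_cache // latent_cache}x")
```

A smaller KV cache means more concurrent requests per GPU, which is one plausible (if unverified) route from architecture choices to the aggressive pricing discussed above.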