Comment Re:Are you telling me.. (Score 3, Insightful) 148
Yeah, their (DeepSeek's) rock-bottom pricing is all you need to know about this sitch. If the numbers don't work w.r.t. profitability (i.e. they are lying about the upfront investment), then their investors will roast them alive, and/or they will fall behind over time as their fabricated budget can't keep pace with the actual (i.e. hidden) training costs.
More germanely, it is entirely plausible that their model was orders of magnitude cheaper to train than even their own preceding models (e.g. DeepSeek V3), given that it's an MoE with reasoning training layered on top. It is DEFINITELY true that their models are cheaper to run, especially given the distillation process and results (i.e. you can imbue the reasoning behavior onto small dense models like 32B Qwen, which run well on current 24/32 GB cards, and even the full-banana MoE R1 DeepSeek model runs at 8-9 t/s on 12-channel DDR5 systems).

That's the really disruptive aspect here, IMHO. Any US business model that relies on gatekeeping the model weights behind huge training costs and then rent-seeking on the inference API due to energy constraints (i.e. data centers with nuclear power plants attached) just got a kick in the pants. You can go buy a $5k computer today and have something that benchmarks comparably to the state-of-the-art closed dense models from the big boys. In a few years that will be a $1k computer. In 5-10 years, if not sooner, it will be in your pocket.
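FWIW, that 8-9 t/s figure passes a back-of-envelope sanity check: MoE decoding on CPU is memory-bandwidth bound, so tokens/sec is roughly effective bandwidth divided by the weight bytes you have to stream per token. All the numbers below are my own assumptions (≈37B active params per token for R1, ~4-bit quantization, DDR5-4800), not measurements:

```python
# Back-of-envelope estimate of CPU decode speed for a big MoE model.
# Assumptions (not measurements): ~37B active params/token, 4-bit quant,
# 12 channels of DDR5-4800. Decoding is memory-bandwidth bound, so
# t/s ~= effective bandwidth / weight bytes streamed per token.

ACTIVE_PARAMS = 37e9          # assumed active params per token (MoE routing)
BYTES_PER_PARAM = 0.5         # assumed ~4-bit quantization
CHANNELS = 12
PER_CHANNEL_GBPS = 38.4       # DDR5-4800: 4800 MT/s * 8 bytes = 38.4 GB/s

def tokens_per_sec(efficiency: float) -> float:
    """Decode speed given the fraction of peak bandwidth actually achieved."""
    peak_bw = CHANNELS * PER_CHANNEL_GBPS * 1e9          # bytes/s
    bytes_per_token = ACTIVE_PARAMS * BYTES_PER_PARAM    # weight traffic/token
    return (peak_bw * efficiency) / bytes_per_token

print(f"theoretical ceiling: {tokens_per_sec(1.0):.1f} t/s")
print(f"at ~35% of peak:     {tokens_per_sec(0.35):.1f} t/s")
```

The theoretical ceiling comes out around 25 t/s, and real systems typically only sustain a fraction of peak bandwidth, which lands you right in that single-digit t/s range people are reporting.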