I hate it when people say Llama and DeepSeek are "open source." No, you can download a binary off Hugging Face that holds billions of trained weights. But you don't get the "source." Having the source would let you rebuild the model from scratch. For Llama, it turns out Facebook pirated a Books-A-Billion worth of text (the Books3 dataset, per the original LLaMA paper). DeepSeek likely did the same. If you read the DeepSeek-V3 paper, they trained on a cluster of 2048 NVIDIA H800 GPUs. Those cards are $5k to $8k a pop! Sure, data centers probably get a discount when they put in orders for a few thousand, but the run still burned about 2.788 million GPU-hours, roughly two months of wall-clock time on that cluster. Even if you had the "source" material, no one has a spare $5.6 million (DeepSeek's own estimate at $2/GPU-hour rental rates, and buying the hardware outright would cost two to three times that) to spend on the compute power to replicate it.
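
To sanity-check that, here's the back-of-envelope math as a quick Python sketch. The GPU count, the GPU-hour total, and the $2/GPU-hour rate come from the DeepSeek-V3 paper; the per-card prices are the rough street prices above, not a quote:

```python
# Back-of-envelope replication cost for DeepSeek-V3's training run.
# Figures from the DeepSeek-V3 paper: 2048 H800 GPUs, ~2.788M GPU-hours,
# costed at the paper's assumed $2/GPU-hour rental rate. Card prices
# are the rough $5k-$8k street prices mentioned above.

NUM_GPUS = 2048
GPU_HOURS = 2.788e6   # total H800 GPU-hours reported for training
RENTAL_RATE = 2.00    # $/GPU-hour, the paper's own assumption

rental_cost = GPU_HOURS * RENTAL_RATE
wall_clock_days = GPU_HOURS / NUM_GPUS / 24

for card_price in (5_000, 8_000):
    print(f"buy {NUM_GPUS} cards @ ${card_price:,}: ${NUM_GPUS * card_price:,}")

print(f"rental compute: ${rental_cost:,.0f}")
print(f"wall-clock time on {NUM_GPUS} GPUs: ~{wall_clock_days:.0f} days")
```

That prints about $5.6 million in rental compute, $10 million to $16 million to buy the cards outright, and roughly 57 days of wall-clock time. However you slice it, replication is out of hobbyist reach.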