Comment Re:This is how it was always going to be (Score 2) 28
LLaMA is an LLM. Models run on hardware. The article is about the difficulty of challenging existing hardware vendors.
You can run AI models on almost any hardware, from standard CPUs to GPUs. But this article is focused on specialized AI hardware such as the TPU (Google), TSP (Groq), Trainium (Amazon), or AIU (IBM).
TL;DR - hardware is hard.
Designing a new chip is hard. Tape-out of a new design can be incredibly expensive ($50-$100m). Integrating new chips into boards and systems is also incredibly expensive. Building an assembly line and quality control for volume production is incredibly expensive. Next you need to spend a ton of time and money developing the software needed to let people actually use your new hardware, ideally in a way that matches industry standards (like PyTorch) to avoid the negatives of a steep learning curve. Once all this is done, you need to get noticed in the market, develop customers, and generate enough ongoing demand and revenue to not only recoup these start-up costs but also produce enough profit to make the entire risky venture worthwhile...
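To see why PyTorch compatibility matters so much for a new chip vendor, here's a minimal sketch (names and shapes are just illustrative): PyTorch code is written against an abstract "device", so the same model code runs on a CPU, an NVIDIA GPU, or a vendor's accelerator, but only if the vendor has done the work of shipping a PyTorch backend for it.

```python
import torch

# The same user code targets whatever device is available -- a vendor whose
# chip plugs into this abstraction inherits the entire PyTorch ecosystem;
# one that doesn't forces customers to rewrite their code.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = torch.nn.Linear(4, 2).to(device)  # tiny stand-in for a real model
x = torch.randn(8, 4, device=device)      # batch of 8 inputs
y = model(x)                              # runs on whichever device was picked

print(y.shape)  # torch.Size([8, 2])
```

A hypothetical third-party accelerator would slot in the same way (e.g. a custom device string instead of "cuda"), which is exactly the software investment the startup has to fund before anyone will buy the silicon.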
Meanwhile, Nvidia has a very strong interest in cutting off your air supply as quickly as possible to ensure their profit margins are never challenged.