OpenAI is Exploring Making Its Own AI Chips (reuters.com)
OpenAI, the company behind ChatGPT, is exploring making its own AI chips and has gone as far as evaluating a potential acquisition target, Reuters reported Friday, citing people familiar with the company's plans. From the report: The company has not yet decided to move ahead, according to recent internal discussions described to Reuters. However, since at least last year it discussed various options to solve the shortage of expensive AI chips that OpenAI relies on, according to people familiar with the matter. These options have included building its own AI chip, working more closely with other chipmakers including Nvidia and also diversifying its suppliers beyond Nvidia.
CEO Sam Altman has made the acquisition of more AI chips a top priority for the company. He has publicly complained about the scarcity of graphics processing units, a market dominated by Nvidia, which controls more than 80% of the global market for the chips best suited to run AI applications. The effort to get more chips is tied to two major concerns Altman has identified: a shortage of the advanced processors that power OpenAI's software and the "eye-watering" costs associated with running the hardware necessary to power its efforts and products.
Sam Altman is a master of burning bridges (Score:1)
I suppose 15 minutes and all...
Are there generic chip design firms? (Score:2)
Are there similar firms that design silicon? This month you're making a TV tuner/decoder for Fred, next month a video chip for Sally, and early next year an AI processor for OpenAI?
Re: Are there generic chip design firms? (Score:2)
Yes. I used to work for one in Scotts Valley, CA, called Silicon Engineering, Inc. The company designed silicon for Ford, Adaptec, and many others. When I left, hilariously, they were contracted to do Cirrus Logic's part of Microsoft's "Project Talisman" GPU which never happened. Microsoft had contracted three different companies to design a four-chip set, I forget who the other two were but CL wasn't actually competent to do their part.
AI making its own chips (Score:2)
Journalists need to learn... (Score:2)
...the meaning of the word "make"
OpenAI will not make anything, they will contract with a fab
They will not design anything, they will hire or acquire designers
Nvidia has been doing this for a long time and they are really good at it
It's probably fairly easy to assemble the talent to design mediocre chips, but equaling or surpassing the best of the best is really, really, REALLY hard
Re: (Score:3)
They wouldn't be trying to make the best GPU, they'd probably just want to make a chip that was completely dedicated to tensor cores. There's potentially a lot of performance to be gained by focusing the chip on exactly what they need and stripping out all the other stuff that a GPU has that isn't necessary. Do they even need general purpose compute on the chip, or do they just need to do FMA at really high throughput?
Google went down this path too with their TPUs, and their chips were roughly competitive w
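To put a rough number on that intuition, here's a tiny sketch (the sizes are made up and have nothing to do with OpenAI's actual models) showing how thoroughly an MLP block's arithmetic is dominated by the matmuls, i.e. by fused multiply-adds:

# Rough sketch, not any real model: made-up sizes, just to show that an
# MLP block's FLOPs are almost entirely fused multiply-adds (matmuls).
import torch

tokens, d_model, d_ff = 256, 1024, 4096
x  = torch.randn(tokens, d_model)
w1 = torch.randn(d_model, d_ff)
w2 = torch.randn(d_ff, d_model)

y = torch.relu(x @ w1) @ w2                     # two big matmuls plus a cheap nonlinearity

matmul_flops = 2 * 2 * tokens * d_model * d_ff  # ~2*M*K*N per matmul, two matmuls
other_flops  = tokens * d_ff                    # the ReLU, roughly one op per element
print(f"FMA/matmul share of FLOPs: {matmul_flops / (matmul_flops + other_flops):.2%}")

Run it and the matmul share comes out around 99.98%, which is why "just tensor cores plus enough memory bandwidth" is a plausible design target.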
Re: (Score:2)
They wouldn't be trying to make the best GPU, they'd probably just want to make a chip that was completely dedicated to tensor cores. There's potentially a lot of performance to be gained by focusing the chip on exactly what they need and stripping out all the other stuff that a GPU has that isn't necessary.
The purely compute part of an AI chip that does the multiplies and adds is fairly simple to design. The big challenge of AI (and a lot of other) chips is data movement, i.e., how data can be moved from DRAM to the ALUs with the best performance and power. Much of the chip area on a GPU is dedicated to this challenging question, and it's why GPUs were well positioned to take the leadership in AI. Of course, the other part of the challenge is crafting the software to unlock the hardware. That's why Nvidia
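For anyone who wants to see why the data-movement point dominates, here's a back-of-the-envelope roofline sketch; the throughput and bandwidth numbers below are invented placeholders, not any real chip:

# Roofline sketch: whether a chip is compute-bound or memory-bound depends on
# arithmetic intensity (FLOPs per byte moved from DRAM), not on how many
# multipliers you can cram in.  Hardware numbers are illustrative placeholders.
peak_flops     = 400e12      # hypothetical 400 TFLOP/s of FMA throughput
dram_bandwidth = 2e12        # hypothetical 2 TB/s of HBM bandwidth

def attainable(flops_per_byte: float) -> float:
    """Classic roofline: min(peak compute, bandwidth * arithmetic intensity)."""
    return min(peak_flops, dram_bandwidth * flops_per_byte)

# A large matmul reuses each loaded value many times -> high intensity -> compute-bound.
# A big elementwise op barely reuses anything -> memory-bound no matter how many ALUs exist.
for name, intensity in [("large matmul", 300.0), ("elementwise add", 0.25)]:
    print(f"{name:>16}: ~{attainable(intensity)/1e12:.1f} TFLOP/s attainable")

With these placeholder numbers the elementwise op tops out at 0.5 TFLOP/s regardless of how much raw FMA hardware is on the die, which is the commenter's point about why the hard part is the memory system, not the multipliers.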
Re: (Score:2)
What we need isn't "a new, better GPU". We need more radical solutions for chips designed for AI.
IMHO, I'm increasingly leaning toward the notion that AI chips would do best to include analog elements. For example, instead of summing the product of a floating-point matrix of weights times a matrix of activations, if weights were represented as resistances and activations as currents, you could literally just connect them and measure the current. It's sort of like the difference between measuring a glass of water
Re: (Score:2)
(The above comment on precision is in reference to inference, not training. But inference is like 90% of the compute load companies like OpenAI have)
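Here's a toy numeric sketch of that idea (usually framed as weights-as-conductances and activations-as-voltages in crossbar arrays): the dot product is just Ohm's law plus currents summing on a wire. The noise figures below are arbitrary, only there to show where the precision worry for inference comes from:

# Toy sketch of an analog MAC: each weight is a conductance, each activation a
# voltage, each cell contributes I = G*V, and the column current is the sum
# (Ohm's law + Kirchhoff's current law).  Noise levels are arbitrary.
import numpy as np

rng = np.random.default_rng(0)
w = rng.uniform(0.1, 1.0, size=256)          # weights encoded as conductances
x = rng.uniform(0.0, 1.0, size=256)          # activations encoded as voltages

exact = w @ x                                 # what a digital MAC array would compute

noisy_w = w * (1 + rng.normal(0, 0.02, size=w.shape))   # ~2% device-to-device variation
analog  = np.sum(noisy_w * x) + rng.normal(0, 0.01)     # summed current + read noise

print(f"digital: {exact:.3f}  analog-ish: {analog:.3f}  rel. err: {abs(analog - exact) / exact:.2%}")

The "free" summation is the appeal; the percent-level error from device variation and read noise is why this is pitched at inference rather than training.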
Making it yourself is complex (Score:3)
These options have included building its own AI chip, working more closely with other chipmakers including Nvidia and also diversifying its suppliers beyond Nvidia
If they go the route of making their own, that's a whole can of worms I don't think they're ready for. Much like that time Canonical wanted to make its own display server, Mir, for Ubuntu, there are a lot of spinning gears to it. They would be wise to look at diversifying (though I don't know where they'd go; they've admitted their code base is CUDA-specific), maybe by coming back to their efforts to overthrow the CUDA monopoly [zdnet.com] (as an aside, good luck with that: nVidia got in early and went straight for the colleges, so we now have a whole generation of ML programmers who, for the most part, know only CUDA). Or maybe they just need to fully get in the bed they've made and gently cuddle the abusive boyfriend they've picked.
It's up to them, really. But making their own chip is... a bit extreme, to put it nicely, and a bit more complicated than it looks at first blush. I say this having done a few RISC-V things on FPGA and a few FPGA-to-ASIC conversions (not full custom) for an employer, though none of that is directly in the domain of what sits inside a "CUDA core", so anyone reading this should know I don't have direct experience here. Still: RISC-V on FPGA isn't what I'd call super difficult, but it's tricky, for lack of a better adjective, and the FPGA-to-ASIC conversions (where the fab helped us get the design onto their platform) were a different kind of tricky. Based on that, I can't imagine that ground-up full custom is suddenly easier, especially in things like clock distribution to prevent skew, but I could be wrong; a blank canvas might just be an awesome amount of freedom. Anyone who has done full custom, feel free to chime in.
Anyway, the whole article sounds like a nothing burger. There's really no fire and hardly any smoke; OpenAI is just "looking", which could amount to literally nothing. BUT it does show the wonderful downside of a closed-source vendor that has pretty much completely dominated the market. Caveat: I only know the folks I've talked to who do ML, and my employer isn't wading into any of that stuff, so take the following with a grain of salt. Most of the people I know doing ML are using CUDA, and it seems to be the main go-to for lots of AI tools. So CUDA has pretty much cornered the market, and yeah, vendors are eventually going to tire of the whole nature of that arrangement.
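For what it's worth, "not being welded to CUDA" at the framework level can be as simple as the pattern below; on ROCm builds of PyTorch the "cuda" device string is actually backed by HIP, so code written this way runs on either vendor. This is just a generic sketch, not anything from OpenAI's code base:

# Minimal sketch of vendor-agnostic PyTorch: the pattern that avoids
# hard-coding one GPU vendor.  On ROCm builds the "cuda" device string is
# backed by HIP, so the same script runs on NVIDIA, AMD, or plain CPU.
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.nn.Linear(1024, 1024).to(device)
x = torch.randn(64, 1024, device=device)
y = model(x)                         # identical call path on CUDA, ROCm, or CPU
print(device, y.shape)

Of course the lock-in people actually complain about lives below this layer (custom CUDA kernels, Triton/CUTLASS-level code), which is exactly the part that doesn't port for free.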
Re: (Score:3)
maybe coming back and working on their efforts to overthrow the CUDA monopoly
Ugh. Fukin' AMD. They know how to snatch defeat from the jaws of victory.
They finalllllly got PyTorch working with ROCm after years of not giving a shit. Except it requires docker fuckery. Fine for me. I can deal with docker, but they instantly lose about 95% of the budding ML students wanting to give it a go.
So everyone ends up going into industry with "well I know NVidia works". So who'd take the risk?
They haven't seemed to notic
Re: (Score:2)
Can't one install it without docker just by running:
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/w... [pytorch.org]
Re: (Score:2)
Yep, but as I understand it the ROCm stuff only works inside a docker container.
At least the instructions say you need docker. I expect it's possible without, but probably on hard mode.
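If you do get it installed, whether via that wheel or inside AMD's container, a quick sanity check like this (standard PyTorch calls only) tells you whether the ROCm build actually sees the GPU:

# Quick sanity check for a ROCm/HIP PyTorch install; standard PyTorch API only.
import torch

print("HIP/ROCm build:", getattr(torch.version, "hip", None))   # None on CUDA-only builds
print("GPU visible:   ", torch.cuda.is_available())             # ROCm GPUs show up under torch.cuda
if torch.cuda.is_available():
    print("device name:   ", torch.cuda.get_device_name(0))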
Wait, what does that mean for GPUs? (Score:2)
Will GPU prices finally return to sane levels when they lose that market too?
Re: (Score:2)
The loss of the lowest class of separate GPU was a needed step now that everything is a 4K native display, with that lowest class becoming the ~4% of the CPU die used for integrated graphics. A GPU/DPU chiplet on the carrier did increase costs a few dollars per system, but does anyone on Slashdot really care about system-on-chip vs. system-on-motherboard unless the cost savings are epic?
DPUs and AI Engines are the same, one is sexy stock marke
Re: Wait, what does that mean for GPUs? (Score:2)
I don't care about cost per transistor or performance per watt. I care about the price of the finished GPU product, given market position.
This is the first step... (Score:2)
Analog Circuits (Score:2)
Re: (Score:2)
It does work, but doesn't scale. Every instance of transistors/memristors has slightly different thresholds and must be individually trained on a chip by chip basis just like human children. Also they are really sensitive to EMI and power supply fluctuations. Digital is much more robust to device nonidealities.
https://www.eetimes.com/resear... [eetimes.com]
“If you try to do back propagation in analog, you get this effect where the forward activations and backwards activations from the two separate [data] paths
Re: (Score:2)
Backpropagation isn't needed in inference.
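Concretely (a trivial PyTorch sketch, not tied to any analog hardware): inference is a forward pass only, so nothing ever has to flow backwards through the noisy analog path:

# Tiny illustration: under inference_mode the forward pass runs but autograd
# records nothing, so there is no backward pass to worry about at all.
import torch

model = torch.nn.Linear(8, 2)
x = torch.randn(4, 8)

with torch.inference_mode():
    y = model(x)

print(y.requires_grad)   # False: no graph was built, nothing to backpropagate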
Sounds like a job for the Silicon Ronin. (Score:2)
'Make' presumably means hire someone like Jim Keller to design it and then pay TSMC to fabricate it.