Could AMD's AI Chips Match the Performance of Nvidia's Chips? (reuters.com) 37
An anonymous reader shared this report from Reuters:
Artificial intelligence chips from Advanced Micro Devices are about 80% as fast as those from Nvidia Corp, with a future path to matching their performance, according to a Friday report by an AI software firm.
Nvidia dominates the market for the powerful chips that are used to create ChatGPT and other AI services that have swept through the technology industry in recent months. The popularity of those services has pushed Nvidia's value past $1 trillion and led to a shortage of its chips that Nvidia says it is working to resolve. But in the meantime, tech companies are looking for alternatives, with hopes that AMD will be a strong challenger. That prompted MosaicML, an AI startup acquired for $1.3 billion earlier this week, to conduct a test comparing AI chips from AMD and Nvidia.
MosaicML evaluated the AMD MI250 and the Nvidia A100, both of which are one generation behind each company's flagship chips but are still in high demand. MosaicML found AMD's chip could get 80% of the performance of Nvidia's chip, thanks largely to a new version of AMD software released late last year and a new version of PyTorch, the Meta Platforms-backed open-source software, released in March.
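The PyTorch release in question is the 2.0 line from March, whose headline feature is torch.compile. As a rough sketch of what such a throughput comparison looks like (illustrative only, not MosaicML's actual harness; the model and batch size here are made up), the same script runs unchanged on an A100 or an MI250, because ROCm builds of PyTorch expose AMD GPUs through the torch.cuda API:

    import time
    import torch

    # Arbitrary stand-in workload; MosaicML's actual tests trained full LLMs.
    model = torch.nn.Sequential(
        torch.nn.Linear(4096, 4096),
        torch.nn.GELU(),
        torch.nn.Linear(4096, 4096),
    ).to("cuda")
    model = torch.compile(model)  # the headline PyTorch 2.0 feature
    opt = torch.optim.SGD(model.parameters(), lr=1e-3)
    x = torch.randn(64, 4096, device="cuda")

    def step():
        opt.zero_grad()
        model(x).sum().backward()
        opt.step()

    for _ in range(3):  # warm-up runs trigger compilation
        step()
    torch.cuda.synchronize()
    t0 = time.time()
    for _ in range(50):
        step()
    torch.cuda.synchronize()
    print(f"{50 / (time.time() - t0):.1f} steps/sec")

The real numbers came from full LLM training runs, but the portability story is the same: there is no vendor-specific code in sight.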
Not without CUDA (Score:4, Informative)
Re: Not without CUDA (Score:2)
Re:Not without CUDA (Score:5, Interesting)
Oh no, it's actually much worse than that. You see, AMD don't really need CUDA; what they need to do is stop getting in their own way.
AMD already have a platform/API that's largely compatible with CUDA, called HIP. It's even supported by PyTorch. The API is supposed to help you port CUDA-native code to AMD. It's missing some features but would still be relatively useful EXCEPT - AMD deliberately nerf it so it's unusable on cheaper hardware.
It doesn't matter whether the hardware COULD support HIP / ROCm because AMD hardcode device IDs into the library to make it run only on expensive workstation GPUs. If you want to run it on cheaper and/or older hardware - too bad. I'm convinced this isn't purely a technical limitation, AMD just want people to believe it is. What it's really about is stopping corporate / datacenter users from using cheaper gaming GPUs in place of expensive compute / workstation devices.
So yeah, instead of competing with Nvidia, AMD are more worried about competing with themselves. The consequence of this is ROCm / HIP is constantly being undermined by AMD themselves.
The reason I know this? I have a bunch of GPUs using Vega chips with HBM2 memory. These are capable compute devices, and if I hack the device IDs into the library code I can use ROCm flawlessly. However, AMD have made it clear they don't want my business, so I went with Nvidia hardware instead. AMD can go fuck themselves.
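For context on the device-ID hack: the workaround usually cited for consumer cards is an environment override rather than patching the library. A minimal sketch, assuming a ROCm build of PyTorch; the override value shown is the one commonly quoted for RDNA2 consumer cards (Vega parts would need a different value), and none of this is officially supported by AMD:

    import os

    # HSA_OVERRIDE_GFX_VERSION is the commonly cited knob for running ROCm on
    # officially unsupported consumer GPUs (e.g. "10.3.0" for RDNA2 cards).
    # It must be in place before the ROCm runtime starts, so set it before
    # importing torch.
    os.environ.setdefault("HSA_OVERRIDE_GFX_VERSION", "10.3.0")

    import torch

    if torch.cuda.is_available():
        # ROCm builds of PyTorch reuse the torch.cuda namespace for AMD GPUs;
        # torch.version.hip is a version string on ROCm builds, None on CUDA builds.
        backend = "ROCm/HIP" if torch.version.hip else "CUDA"
        print(backend, torch.cuda.get_device_name(0))
    else:
        print("No supported GPU visible to this PyTorch build.")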
Re: (Score:3)
AMD already have a platform/API that's largely compatible with CUDA, called HIP. It's even supported by PyTorch. The API is supposed to help you port CUDA-native code to AMD. It's missing some features but would still be relatively useful EXCEPT - AMD deliberately nerf it so it's unusable on cheaper hardware.
Ugh.
Just ugh.
Fucking AMD.
OK, so it took them fucking forever to support PyTorch. To a large extent no one gives a shit about CUDA, because for a long time now people running GPU code aren't writing CUDA directly; they're writing framework code.
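To illustrate the point: a typical training step today is plain framework code with no CUDA (or HIP) anywhere in it, and PyTorch dispatches to whichever backend it was built against. A minimal sketch:

    import torch

    # Nothing vendor-specific here: "cuda" maps to NVIDIA's runtime on CUDA
    # builds of PyTorch and to AMD's HIP/ROCm runtime on ROCm builds.
    device = "cuda" if torch.cuda.is_available() else "cpu"

    model = torch.nn.Linear(512, 10).to(device)
    opt = torch.optim.AdamW(model.parameters(), lr=3e-4)
    loss_fn = torch.nn.CrossEntropyLoss()

    x = torch.randn(32, 512, device=device)
    y = torch.randint(0, 10, (32,), device=device)

    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()
    opt.zero_grad()
    print(f"loss on {device}: {loss.item():.3f}")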
Re: (Score:2)
They want to keep artificial market separation.
But Nvidia does this with actual hardware. They push the market into differentiation up until there is a backlash, then pull back a little from there.
They nerf the 4060 cards (or rather all x060 ones), but those are usually for laptops anyway. They tried to put a low-RAM 4080 on the market, received backlash, and it was later rebranded as the 4070 Ti. They keep the entire enterprise market separate, with ECC RAM, vGPU and other features that are usually not expected by consumers
Re: (Score:2)
Will it match the ease of development that Nvidia provides for its GPUs? No.
Not sure about that. Running code with a difficult development tool is significantly easier than running code on non-existent and unobtainable hardware. Try actually going and buying an NVIDIA A100 right now. You'll be waiting months ... which is still better than trying to buy an H100, where no one will give you a delivery date.
If AMD can actually produce the things and bring them to market, they will sell right now. No start-up in this space is going to sit around with their thumb up their arse waiting for an "
Yet another clueless GPU benchmark story (Score:5, Informative)
AMD GPUs have always been competitive with Nvidia in hardware benchmarks. Yet, at the same time, they have been crushed in market share. That should be a glaring indication that hardware benchmarks aren't the problem for AMD. The biggest problem for AMD is software, including the success of CUDA, the relative stability of Nvidia gaming drivers, and Nvidia's huge lead in AI support software.
How big is Nvidia's software moat? Look at the MLPerf results to see how far behind not just AMD but most other competitors are. The benchmark was launched by Google and friends (not including Nvidia), and yet even Google fails to submit results in most categories.
Unfortunately, the software story is also AMD's weakness, because it doesn't have the headcount to compete with Nvidia. So AMD tries to enlist third-party help to blunt Nvidia's software advantage. However, that strategy fails because AI is such a fast-moving area that most potential buyers go with Nvidia, since they need something that works now. Many such buyers would prefer to support AMD and increased competition in the market, but not at the expense of personally missing out on the current advances in AI.
Re: (Score:2)
What happened to AMD's open source drivers? Why aren't they better than the old closed drivers from the ATI days?
Re: (Score:2)
AMD GPUs have always been competitive with Nvidia in hardware benchmarks.
Which benchmarks? At which price point? Completely ignoring AI / raytracing stuff (i.e. ignoring features of a card which cost money, thus stacking the argument very favourably for AMD), they have only really been competitive in the low-to-mid range. In the mid-to-high range they've largely taken lower-tier cards and pushed them to the point of borderline breaking, and at the really high end they aren't present at all.
Re: (Score:2)
AMD will probably take the lead again in the future, but if that's af
Nvidia wins the Least-Creep benchmark (Score:3, Funny)
Nvidia's chips draw people with an average of 5.7 fingers while AMD's draw them with 6.2.
Re: (Score:2)
Nvidia's chips draw people with an average of 5.7 fingers while AMD's draw them with 6.2.
On each hand, or on several hands, or coming out elsewhere? It matters.
Re: (Score:1)
I'd love to have fingers on my
Future path (Score:4, Funny)
So AMD's chips in the future will match nVidia's chips today?
Re: (Score:3)
Current gen Nvidia AI chips are very much unobtainium and extremely expensive right now due to the LLM rush. This is why the comparison is to previous gen, which is at least somewhat available.
Remember, you don't get rich digging gold in a gold rush. You get rich selling shovels to miners.
Re: (Score:2)
Re: (Score:2)
Wow, looks like someone just hates AMD for no reason. Meanwhile AMD is pushing Intel out of the datacentre and encroaching on nVidia's lead in machine learning.
Re: (Score:2)
and encroaching on nVidia's lead in machine learning
Eeeh, not really. The only reason AMD has any chance of being competitive right now is the complete inability to actually get NVIDIA hardware for ML. They are well and truly over a generation behind currently, but even NVIDIA's previous generation is hard to get (4-5 month waiting times on an A100). AMD is smashing it in the CPU / datacentre market, but they are barely competitive in the GPU space and they are currently hardly a player in the ML space.
Re: (Score:2)
Meanwhile AMD is pushing Intel out of the datacentre
Eh. A better description is slowly and surely chipping away at Intel's overwhelmingly commanding lead in the datacenter.
and encroaching on nVidia's lead in machine learning.
lol, no. AMD isn't even a serious contender in the ML space right now due to the abysmal state of their software.
There's a reason AMD only has 9 showings in the Top500.
In the datacenter market at large, NV has around 90% of the market.
Looks like someone likes to greatly embellish AMD's standing.
Why is that? Do you feel an emotional connection to the corporation?
Re: (Score:2)
Intel's DCG revenue is tanking. I swear people do not pay attention around here.
As for whom I might be rooting for in the ML space? Most likely Tenstorrent. Interesting bunch. And I don't really think AMD's software stack is that bad anymore but whatever.
Re: (Score:2)
Intel's DCG revenue is tanking. I swear people do not pay attention around here.
Of course it's "tanking".
If AMD increases their market share from 10% to 20%, that means Intel had to lose 10 points of share - roughly an 11% relative decline in units sold, even in a flat market.
But the fact is: the split is still 80% of new DC CPUs sold are Intel, and 20% are AMD.
You can call that "pushing Intel out of the DC", but that's a grossly misleading characterization. "Chipping away at Intel's overwhelming lead" is accurate.
As for whom I might be rooting for in the ML space? Most likely Tenstorrent. Interesting bunch. And I don't really think AMD's software stack is that bad anymore but whatever.
It is.
TensorRT and cuDNN are the world leaders in large model software stacks for inference and training. ROCm is horrendous to work with.
That's why AMD
Re: (Score:2)
looks like someone just hates AMD for no reason
"Intel only shines because its only competition is so inept" hardly makes this come across as hate for no reason. It comes across as exasperation.
And in the context of the AI (and GPGPU in particular) side of things, at least, I heartily agree with that sentiment. It's just not an area they've excelled in compared to CPUs and, more recently, FPGAs and CPLDs (with AMD buying Xilinx and Intel buying Altera). For the average consumer looking to buy a graphics card and dabble in AI, Nvidia is the obvious choice
Re: (Score:2)
The average consumer isn't relevant to the ML space or this discussion; this is about high-end hardware that only larger orgs can afford.
Maybe some researchers are stuck using a 4090 at home because a 7900 XTX isn't as good for ML, but that really is irrelevant in the grand scheme of things. AMD definitely isn't prioritizing ML with their consumer hardware. The RDNA/CDNA split happened for a reason.
WTF is "artificial intelligence chips"? (Score:2)
Craptastic headlines on par with the craptastic content.
I have another question (Score:3)
Can either of them produce an affordable GPU that doesn't melt cables because it draws more power than the amps at a rock concert?
Re: (Score:2)
Re: (Score:2)
Please look up the meaning of exaggeration. Sometimes people do it to stress a point.
Re: (Score:2)
Then yes, yes they can. And they have, and they do. Just don't buy an NVIDIA 4080 or 4090.
Everything else qualifies as an "affordable GPU that doesn't melt cables because it draws more power than the amps at a rock concert."
Re: (Score:2)
People who don't understand hyperbole should be publicly eviscerated in the town square.
They are _NOT_ "AI chips" (Score:3)
Waste of Natural Resources (Score:2)
Re: Waste of Natural Resources (Score:3)
Betteridge that (Score:2)