AMD Introduces Radeon Instinct Machine Intelligence Accelerators (hothardware.com) 55
Reader MojoKid writes: AMD is announcing a new series of Radeon-branded products today, targeted at machine intelligence and deep learning enterprise applications, called Radeon Instinct. As its name suggests, the new Radeon Instinct line of products comprises GPU-based solutions for deep learning, inference and training. The new GPUs are also complemented by a free, open-source library and framework for GPU accelerators, dubbed MIOpen. MIOpen is architected for high-performance machine intelligence applications and is optimized for the deep learning frameworks in AMD's ROCm software suite. The first products in the lineup consist of the Radeon Instinct MI6, the MI8, and the MI25. The 150W Radeon Instinct MI6 accelerator is powered by a Polaris-based GPU, packs 16GB of memory (224GB/s peak bandwidth), and will offer up to 5.7 TFLOPS of peak FP16 performance. Next up in the stack is the Fiji-based Radeon Instinct MI8. Like the Radeon R9 Nano, the Radeon Instinct MI8 features 4GB of High-Bandwidth Memory (HBM) with peak bandwidth of 512GB/s. The MI8 will offer up to 8.2 TFLOPS of peak FP16 compute performance, with a board power that typically falls below 175W. The Radeon Instinct MI25 accelerator will leverage AMD's next-generation Vega GPU architecture and has a board power of approximately 300W. All of the Radeon Instinct accelerators are passively cooled, but when installed into a server chassis you can bet there will be plenty of air flow. Like the recently released Radeon Pro WX series of professional graphics cards for workstations, Radeon Instinct accelerators will be built by AMD. All of the Radeon Instinct cards will also support AMD MultiGPU (MxGPU) hardware virtualization technology.
Re: (Score:2)
The specific AI use is deep learning [wikipedia.org], which you'll no doubt write off as a buzz word, but it's important to a large number of fields such as image recognition, voice recognition, drug research, product recommendations and so on.
Part of deep learning is the analysis of large quantities of data. A GPU should be able to analyze thousands of sets of data in parallel, which would make deep learning cheaper and faster. ATI is attempting to produce the tools needed to make that happen.
Re: (Score:2)
Woosh.
Re: (Score:2, Informative)
"Besides being built for massive scaling, it includes compilers, language run times and interesting (and importantly) CUDA-application support. (CUDA being the NVIDIA developed GPGPU programming language.)"
Holy balls! Time to eat crow buddy: CUDA is fucking supported...
Source: https://www.pcper.com/reviews/Graphics-Cards/Radeon-Instinct-Machine-Learning-GPUs-include-Vega-Preview-Performance
Re: (Score:1)
I was thinking Slashdot would be the crowd that I wouldn't have to add the sarcasm tag (/s) but it appears a few people took it literally.
Re: (Score:2)
I was thinking Slashdot would be the crowd that I wouldn't have to add the sarcasm tag (/s) but it appears a few people took it literally.
Many Aspies have difficulty understanding sarcasm. We take everything literally. Slashdot tends to have more "whooshes" than other online forums.
Re: (Score:3)
Re: CUDA benchmarks? (Score:2)
Which open, reasonably available standards make it as easy to write compute kernels and interface host code with them as CUDA? CUDA lock-in is not pleasant, but writing code to launch OpenCL or Vulkan kernels is at least an order of magnitude harder than the code to launch a CUDA kernel, and often two orders of magnitude harder.
AMD driver developer says: (Score:2)
In the AMD driver developer's own words:
"We don't happen to have the resources to pay someone else to do that for us."
https://lists.freedesktop.org/... [freedesktop.org]
AMD does hardware, but they don't support it with software.
Re: (Score:1)
Boss: I need you to get some hardware to try out neural net training on $DATASET
0100010001010011: Well, I can buy the Nvidia card and get started with TensorFlow, or AMD open-sourced everything and I have to write the tools myself.
Boss: I need the results by the end of the month, I don't care how you do it.
Re: (Score:2)
Addressable memory (Score:2)
Re: (Score:3)
As opposed to RAM that's put on a video card but isn't addressable, so that all it does is waste space and power?
FP16 isn't even meant for computation (Score:2)
So they're all excited about the lowest-precision, smallest-size floating point math in IEEE 754?
Not only that, but FP16 is intended for storage (of many floating-point values where higher precision need not be stored), not for performing arithmetic computations. [wikipedia.org]
Kudos to AMD's marketing department for boasting about their compute performance with a number format that was never meant for computation.
Tell them to get back to me with their 64-, 128-, and 256-bit IEEE floating point performance.
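For what it's worth, the precision limits the parent is complaining about are easy to demonstrate. A quick numpy sketch (illustrative only, not from the announcement) of IEEE 754 binary16's characteristics:

```python
import numpy as np

# float16 has a 10-bit mantissa: integers above 2048 can no longer
# be represented exactly, and the machine epsilon is large.
print(np.float16(2049))            # rounds to 2048.0
print(np.finfo(np.float16).eps)    # 2**-10 ~= 0.000977 (float32: ~1.19e-07)
print(np.finfo(np.float16).max)    # 65504.0, the largest finite half-precision value
```

So any quantity above a few thousand, or any sum needing more than ~3 decimal digits, visibly loses information in FP16.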
Re: (Score:2)
Re:FP16 isn't even meant for computation (Score:4, Insightful)
So they're all excited about the lowest-precision, smallest-size floating point math in IEEE 754?
FP16 is good enough for neural nets. Do you really think the output voltage of a biological neuron has 32 bits of precision and range? For any given speed, FP16 allows you to run NNs that are wider and deeper, and/or to use bigger datasets. That is way more important than the precision of individual operations.
Re: (Score:2)
Re: (Score:1)
Actually, since perceptrons can't do non-linearly separable problems, it might be more accurate to call them non-linear multiple regression optimizers. Although, since gradient descent backpropagation is based on a difference of squares, it might be even more accurate to call them non-linear least squares multiple regression optimizers. But then, since they do non-linear regression with zillions of terms, they are really arbitrary function approximators, so it might be even more accurate to call them non-linear
Re: (Score:2)
More pragmatically, they are a network of perceptrons with sigmoidal output functions.
Today, most bleeding edge NNs use rectified linear activation functions [wikipedia.org]. Sigmoids are soooo 2014.
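To illustrate the difference (a sketch, not anything from the thread): a sigmoid saturates toward 0 and 1 so its gradient vanishes at the tails, while a rectified linear unit is just a max and passes gradient 1 for any positive input.

```python
import numpy as np

def sigmoid(x):
    # classic logistic activation: squashes into (0, 1), saturates at the tails
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    # rectified linear unit: cheap, non-saturating for x > 0
    return np.maximum(0.0, x)

x = np.array([-2.0, 0.0, 3.0])
print(sigmoid(x))   # values squashed into (0, 1); sigmoid(0) = 0.5
print(relu(x))      # [0. 0. 3.]
```

The lack of saturation is a big part of why deep stacks of ReLU layers train faster than sigmoid ones.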
Once you start talking about a deep learning network the updates to individual perceptrons can be very small and 32 bits are needed.
You can get the same flexibility and more just by going wider and deeper. The bottleneck for NNs is not the math ops, but getting data in and out of the GPU. By using FP16, you cut the per-neuron data in half.
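The bandwidth argument is concrete: halving the element size halves the bytes moved for the same number of weights. A small numpy sketch (a hypothetical 4096x4096 weight matrix, purely for illustration):

```python
import numpy as np

# hypothetical fully-connected layer of 4096 x 4096 weights
w32 = np.zeros((4096, 4096), dtype=np.float32)
w16 = w32.astype(np.float16)

print(w32.nbytes // 2**20)   # 64 (MiB)
print(w16.nbytes // 2**20)   # 32 (MiB) -- half the memory and bus traffic
```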
Re: (Score:2)
So they're all excited about the lowest-precision, smallest-size floating point math in IEEE 754?
FP16 is good enough for neural nets. Do you really think the output voltage of a biological neuron has 32 bits of precision and range? For any given speed, FP16 allows you to run NNs that are wider and deeper, and/or to use bigger datasets. That is way more important than the precision of individual operations.
There's a lot of rounding error with FP16. The neural networks I use are 16-bit integers, which work much, much better, at least for the work I'm doing. Also, do you have a good citation that FP16 neural networks are, overall, more effective than FP32 networks, as you've described?
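For readers unfamiliar with integer networks: the usual trick is fixed-point, where a float weight is multiplied by a scale factor and rounded into an int16. A minimal sketch, assuming a hypothetical Q8.8 format (scale of 256; not necessarily what the parent poster uses):

```python
import numpy as np

SCALE = 256  # Q8.8 fixed point: 8 integer bits, 8 fractional bits

w = np.array([0.5, -1.25, 0.0039], dtype=np.float32)

q = np.round(w * SCALE).astype(np.int16)   # quantize to int16
print(q)            # [ 128 -320    1]
print(q / SCALE)    # dequantize: [ 0.5      -1.25      0.00390625]
```

Unlike FP16, the spacing between representable values is uniform across the whole range, which is why fixed-point can behave better when values cluster in a known interval.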
Re: (Score:3)
There's a lot of rounding error with FP16.
Sure, but it doesn't matter. Backprop, learning rate, denoising, etc. all just heuristics anyway. So what if your mantissa is off by one bit? You get better accuracy by going wider, adding layers, and (most importantly) using more data. But you can't afford to do that if half your bandwidth is sucked up transmitting meaningless precision.
Also, do you have a good citation that FP16 neural networks are, overall, more effective than FP32 networks, as you've described?
They are not necessarily more effective, just more efficient. If you have infinite resources, you might even get better results using FP32. But resources are never infinite.
Re: (Score:2)
So, one problem is that there is not always more data. In my field, we have a surplus of some sorts of data, but other data requires hundreds of thousands of hours of human input, and we only have so much of that to go around. Processing all of that is easy enough, getting more is not.
Also, by "effective", I should have made it clear that I meant "an effective overall solution to the problem", which includes all costs of training a wider, lower-precision network. This includes input data collection, storage
Re: (Score:2)
Do you really think the output voltage of a biological neurons has 32 bits of precision and range?
...What? It's analog. It's got precision going down into the quantum scale... You know, depending on noise. Range is also a big issue. But it's leveraging real-world physics to compute things. Think about how many discrete binary operations you'd have to perform to calculate the weighted middle point between populated cities. With an analog "computer" (it's a board with holes, a bit of string with some rocks on the end) its "computation" is done practically instantly when you lift the thing up and gravity p
Re: (Score:2)
My apologies. I assumed biological neurons ran in meatspace.
Re: (Score:2)
Did you even read the thing I quoted?
How about my second to last sentence?
Tell you what, give the entire post another once over and then try again.
Re: (Score:2)
...What? It's analog. It's got precision going down into the quantum scale...
That is not true in any meaningful sense. If you give the same inputs to the same biological neuron, there is no way that you are going to get the same output down to the Planck scale. In fact, it is unlikely that you are even going to get 8-bit precision (an output difference of 1/256th).
Re: (Score:2)
Wow, it's like the meaningful usage of a biological neuron depends on how much noise there is in the system.
But I think you forget the subject matter. It doesn't matter if the neuron fires 10% early 20% of the time. It's a real-world genetic algorithm system. That's just a feature the GA gets to play with. Because it truly doesn't care about getting exact answers, only good enough to balance a shmuck on two legs... most of the time.
Jesus, meat-space is just different. Comparing the two is going to run into
Re: (Score:2)
What I wanted to reply to the parent : it's not like a one dimensional analog signal either. This leaves out chemicals and finer details of what's happening in dendrites and axon and whatever stuff I can't name.
The idea that you can map out the high-level electrical brain, and only that, and get a brain is a fallacy. It's like we're stuck in the late 90s and Ray Kurzweil's ideas of the brain; I reckon it's the main limit to transhumanism or singularity philosophies. Computer neural networks do have their own uses
Re: (Score:1)
This is aimed at "deep learning" and 16 bit is just what they need.
Re: (Score:2)
There already exist some Fire Pro branded cards with the virtualization features. One is based around the Radeon R9 380's GPU, and is quite expensive, but you mostly pay a "Fire Pro" premium akin to the "Quadro" premium. (Substitute FireGL / Fire Pro / Pro)
It's the counterpart to Nvidia's GeForce Grid (formerly VGX). One redeeming quality is that Nvidia has sold complete GeForce Grid systems, as in pre-built rackable servers, while AMD will sell you the card only.
I'll say it's a licensing issue: on a similar