Start-up Could Kick Opteron into Overdrive 127
An anonymous reader writes "The Register is reporting that a new start-up, DRC Computer, has created a reprogrammable co-processor that can slot directly into Opteron sockets. This new product has the potential to boost the Opteron chips well ahead of their Xeon-based competition. From the article: 'Customers can then offload a wide variety of software jobs to the co-processor running in a standard server, instead of buying unique, more expensive types of accelerators from third parties as they have in the past.'"
Why read a re-written press release (Score:4, Informative)
http://www.drccomputer.com/pages/products.html [drccomputer.com]
A bit more accurate summary (Score:5, Informative)
That's a pretty cool niche.
Re:Kick ass synth? (Score:2, Informative)
Re:neural networks or java? (Score:4, Informative)
In the late 90's, I got burned in precisely such a start-up. We built an ASIC piggy-back Java byte-code CPU. It worked... as a proof of concept. It didn't give much of a performance boost, 20-30% at best. No one wanted it.
Re:Er.... question (Score:4, Informative)
Clockspeed is not a measurement of performance unless you are comparing similar architectures. With FPGAs you can do everything in parallel, whereas microprocessors are inherently sequential. In effect, you can potentially complete hundreds of instructions per clock cycle, whereas a microprocessor will complete 2 or 3.
In practical terms, this product lends itself to compute intensive tasks such as signal processing, not data serving.
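The parallelism argument above can be put in rough numbers. A quick Python sketch, with illustrative clock rates and op counts I'm assuming for the comparison (the article gives no clock speed for this device):

```python
# Back-of-envelope throughput comparison: sequential CPU vs. parallel FPGA.
# All figures here are illustrative assumptions, not specs from the article.

cpu_clock_hz = 2.0e9        # a typical Opteron-era CPU clock
cpu_ops_per_cycle = 3       # a superscalar CPU retires roughly 2-3 instructions/cycle

fpga_clock_hz = 200e6       # FPGA fabric usually clocks far lower than a CPU
fpga_parallel_ops = 500     # but hundreds of operations can complete each cycle

cpu_throughput = cpu_clock_hz * cpu_ops_per_cycle     # operations per second
fpga_throughput = fpga_clock_hz * fpga_parallel_ops   # operations per second

print(f"CPU:  {cpu_throughput / 1e9:.0f} Gops/s")     # CPU:  6 Gops/s
print(f"FPGA: {fpga_throughput / 1e9:.0f} Gops/s")    # FPGA: 100 Gops/s
```

Under these assumptions the slower-clocked FPGA comes out well ahead, which is the whole point: the raw clock number tells you almost nothing across different architectures.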
Re:Er.... question (Score:3, Informative)
With FPGAs you can do everything in parallel, whereas microprocessors are inherently sequential. In effect, you can potentially complete hundreds of instructions per clock cycle, whereas a microprocessor will complete 2 or 3.
True, but if the microprocessor's clock speed is many times faster than the FPGA's, then you are even again. There's no clock speed for this device in the article, so we can't really compare.
Re:Er.... question (Score:4, Informative)
FPGA vs. General Purpose CPU (Score:4, Informative)
Hardware, on the other hand, is massively parallel. All the "gates" (*) are running all the time. It's like multi-threading a program, taken to the limit of infinity. If designed correctly, this thing can scale beyond belief, since it's all parallel.
It's also important to note that it's a Virtex4 [xilinx.com] on that card. That's one hell of an FPGA, they sure aren't cutting any corners. I'm not sure which one they're using, but some Virtex4 chips have PowerPC processors at 450 MHz.
This is definitely a niche product for now, due mainly to the lack of people who can write code in Hardware Description Languages (HDLs). But if you can figure it out, and you have an application that works on a massive scale, this may be for you.
Oh, and for all you detractors who are saying "that thing only runs at 500 MHz! How is it supposed to be faster than my 2 GHz AMD chip?" You're forgetting one very important factor. Your AMD chip executes only a few instructions at a time, and the important instructions are surrounded by instructions whose sole purpose is to control program flow or move data back and forth. The XtremeDSP slices of a Virtex4, however, can each execute a multiply and an add in a single cycle, there are up to 512 of them in the most hardcore Virtex4 chip, and other logic running in parallel can handle the "program flow" and ferry data back and forth across the bus.
*: Modern FPGAs are actually built out of SRAMs that can implement arbitrary logic functions. They're no longer arrays of gates, so to speak.
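The multiply-add claim above works out to an impressive peak figure. A quick sketch: the 512-slice count is the figure given for the largest Virtex4, while the 500 MHz DSP clock is my assumption based on the "only runs at 500 MHz" jab:

```python
# Peak multiply-accumulate throughput for a fully-used Virtex4.
# 512 slices comes from the comment above; the 500 MHz DSP clock
# is an assumption for this sketch.

dsp_slices = 512
dsp_clock_hz = 500e6
ops_per_slice_per_cycle = 2          # one multiply + one add per cycle

fpga_macs = dsp_slices * dsp_clock_hz              # multiply-accumulates per second
fpga_ops = fpga_macs * ops_per_slice_per_cycle     # individual operations per second

print(f"{fpga_ops / 1e9:.0f} Gops/s peak multiply-add throughput")  # 512 Gops/s
```

Of course that's a peak number; sustaining it means keeping all 512 slices fed with data every cycle, which is exactly the kind of problem HDL designers get paid to solve.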
Re:About Time! (Score:3, Informative)
10x - 20x performance? You betcha. (Score:3, Informative)
There's a market for GPUs on video cards running $1,200+... People who buy them won't be satisfied with standard GPUs no matter how fast their main processors run... The custom acceleration of graphics calculations makes it worthwhile.
Now, imagine doing massive calculations (think three blackboards filled with quantum physics equations) -- and you can see how some scientific/industrial applications would go ga-ga over this stuff...
Re:Berkeley (Score:3, Informative)
There have been tons of add-on cards that do FFTs, TCP offloading NICs, physics engines, or whatever you want. The problem is twofold. 1) These cards are expensive, or at the very least not as cheap and standard as the rest of the computer, and they need software support to drive them. 2) They often do not deliver the performance advertised.
Take, for example, an FFT card for Photoshop filters. The image sits in system RAM (and in GPU memory) and must be sent from system RAM over to the PCI card. Even if the card were sitting on a HyperTransport link, which at about 3.2 GB/s is the fastest external bus available on a PC today, that's still a bottleneck; PCI ranges from 133 MB/s up to 2133 MB/s for second-generation PCI-X. Meanwhile, it's common for memory busses to exceed 3.2 GB/s, and some of the new Itaniums have something like 10.5 GB/s memory busses now.
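Those bus numbers translate directly into transfer time. A quick sketch of how long it takes just to ship the data to an offload card, assuming a 100 MB image (my figure for illustration) and the peak rates quoted above:

```python
# One-way transfer time for a 100 MB image over the busses mentioned above.
# Rates are the quoted peaks; real sustained bandwidth is lower.

image_bytes = 100 * 1024**2          # 100 MB image, an assumed size

busses = {
    "PCI (133 MB/s)":              133e6,
    "PCI-X 2.0 (2133 MB/s)":       2133e6,
    "HyperTransport (3.2 GB/s)":   3.2e9,
    "Itanium memory (10.5 GB/s)":  10.5e9,
}

for name, rate_bytes_per_s in busses.items():
    ms = image_bytes / rate_bytes_per_s * 1000
    print(f"{name}: {ms:.1f} ms one way")
```

On plain PCI that's the better part of a second each way before the card computes anything, which is how an offload engine ends up slower than just doing the FFT in the CPU's cache-friendly main memory.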
Back to price. These cards are a niche product, so the price has to be high because the demand is low. The price can skyrocket very quickly, because it's common for these things to carry on-board RAM for cache and buffering, often in the 1-4+ gigabyte range, which is not cheap in itself.
I'm not saying that I would not welcome something like an effective FFT offloading engine, but there is so much pre- and post-processing of the data, all of which has to pass through main system memory and the CPU, that the offloaders don't gain you much.
For high performance computing, memory bandwidth is frequently the bottleneck, and has been for years. High-end GPUs are a little different because they have had specialized busses for years (AGP and the like), and they also have the advantage of being told what to do by the CPU, given some data, processing it internally, and dumping the result straight to the monitor. The CPU does not need that data back. It's more or less a one-way operation, while the other offloader cards are usually two-way. Even in the case of TCP offloader cards, performance often does not keep up with software and general CPU improvements. TCP offloaders also don't work very well with things like software firewalls that want to get their hands on the TCP data as well.
So, I believe at this point in time, offloader cards are not too valuable. Maybe for a specific problem or set of problems, but I haven't found one that could significantly improve performance yet.