Start-up Could Kick Opteron into Overdrive 127
An anonymous reader writes "The Register is reporting that a new start-up, DRC Computer, has created a reprogrammable co-processor that can slot directly into Opteron sockets. This new product has the potential to boost the Opteron chips well ahead of their Xeon-based competition. From the article: 'Customers can then offload a wide variety of software jobs to the co-processor running in a standard server, instead of buying unique, more expensive types of accelerators from third parties as they have in the past.'"
Why read a re-written press release (Score:4, Informative)
http://www.drccomputer.com/pages/products.html [drccomputer.com]
A bit more accurate summary (Score:5, Informative)
That's a pretty cool niche.
Re:Kick ass synth? (Score:2, Informative)
Re:neural networks or java? (Score:4, Informative)
In the late 90's, I got burned in precisely such a start-up. We built an ASIC piggy-back Java byte-code CPU. It worked... as a proof of concept. It didn't give much of a performance boost, 20-30% at best. No one wanted it.
Re:Er.... question (Score:4, Informative)
Clockspeed is not a measurement of performance unless you are comparing similar architectures. With FPGAs you can do everything in parallel, whereas microprocessors are inherently sequential. In effect, you can potentially complete hundreds of instructions per clock cycle, whereas a microprocessor will complete 2 or 3.
In practical terms, this product lends itself to compute intensive tasks such as signal processing, not data serving.
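The parallelism argument above can be put in rough numbers. A quick Python sketch, with illustrative clock rates and op counts I'm assuming for the comparison (the article gives no clock speed for this device):

```python
# Back-of-envelope throughput comparison: sequential CPU vs. parallel FPGA.
# All figures here are illustrative assumptions, not specs from the article.

cpu_clock_hz = 2.0e9        # a typical Opteron-era CPU clock
cpu_ops_per_cycle = 3       # a superscalar CPU retires roughly 2-3 instructions/cycle

fpga_clock_hz = 200e6       # FPGA fabric usually clocks far lower than a CPU
fpga_parallel_ops = 500     # but hundreds of operations can complete each cycle

cpu_throughput = cpu_clock_hz * cpu_ops_per_cycle     # operations per second
fpga_throughput = fpga_clock_hz * fpga_parallel_ops   # operations per second

print(f"CPU:  {cpu_throughput / 1e9:.0f} Gops/s")     # CPU:  6 Gops/s
print(f"FPGA: {fpga_throughput / 1e9:.0f} Gops/s")    # FPGA: 100 Gops/s
```

Under these assumptions the slower-clocked FPGA comes out well ahead, which is the whole point: the raw clock number tells you almost nothing across different architectures.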
Re:Er.... question (Score:3, Informative)
With FPGAs you can do everything in parallel, whereas microprocessors are inherently sequential. In effect, you can potentially complete hundreds of instructions per clock cycle, whereas a microprocessor will complete 2 or 3.
True, but if the microprocessor's clock speed is many times faster than the FPGA's, then you are even again. There's no clock speed for this device in the article, so we can't really compare.
Re:Er.... question (Score:4, Informative)
FPGA vs. General Purpose CPU (Score:4, Informative)
Hardware, on the other hand, is massively parallel. All the "gates" (*) are running all the time. It's like multi-threading a program, taken to the limit of infinity. If designed correctly, this thing can scale beyond belief, since it's all parallel.
It's also important to note that it's a Virtex4 [xilinx.com] on that card. That's one hell of an FPGA, they sure aren't cutting any corners. I'm not sure which one they're using, but some Virtex4 chips have PowerPC processors at 450 MHz.
This is definitely a niche product for now, due mainly to the lack of people who can write code in Hardware Description Languages (HDLs). But if you can figure it out, and you have an application that works on a massive scale, this may be for you.
Oh, and for all you detractors who are saying "that thing only runs at 500 MHz! How is it supposed to be faster than my 2 GHz AMD chip?" You're forgetting one very important factor. Your AMD chip executes only a few instructions at a time, and the important instructions are surrounded by instructions whose sole purpose is to control program flow or move data back and forth. The XtremeDSP slices of a Virtex4, however, can each execute a multiply and an add in a single cycle, there are up to 512 of them in the most hardcore Virtex4 chip, and other logic running in parallel can handle the "program flow" and ferry data back and forth across the bus.
*: Modern FPGAs are actually built out of SRAMs that can implement arbitrary logic functions. They're no longer arrays of gates, so to speak.
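The multiply-add claim above works out to an impressive peak figure. A quick sketch: the 512-slice count is the figure given for the largest Virtex4, while the 500 MHz DSP clock is my assumption based on the "only runs at 500 MHz" jab:

```python
# Peak multiply-accumulate throughput for a fully-used Virtex4.
# 512 slices comes from the comment above; the 500 MHz DSP clock
# is an assumption for this sketch.

dsp_slices = 512
dsp_clock_hz = 500e6
ops_per_slice_per_cycle = 2          # one multiply + one add per cycle

fpga_macs = dsp_slices * dsp_clock_hz              # multiply-accumulates per second
fpga_ops = fpga_macs * ops_per_slice_per_cycle     # individual operations per second

print(f"{fpga_ops / 1e9:.0f} Gops/s peak multiply-add throughput")  # 512 Gops/s
```

Of course that's a peak number; sustaining it means keeping all 512 slices fed with data every cycle, which is exactly the kind of problem HDL designers get paid to solve.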
Re:About Time! (Score:3, Informative)
10x - 20x performance? You betcha. (Score:3, Informative)
There's a market for GPUs on video cards running $1,200+... People who buy them won't be satisfied with standard GPUs no matter how fast their main processors run... The custom acceleration of graphics calculations makes it worthwhile.
Now, imagine doing massive calculations (think three blackboards filled with quantum physics equations) -- and you can see how some scientific/industrial applications would go ga-ga over this stuff...
Re:Berkeley (Score:3, Informative)
There have been tons of add-on cards that do FFTs, TCP offloading NICs, physics engines, or whatever you want. The problem is twofold. 1) These cards are expensive, or at the very least not as cheap and standard as the rest of the computer, and they need software support to drive them. 2) They often do not deliver the performance advertised.
Take, for example, an FFT card for Photoshop filters. The image sits in system RAM (and in GPU memory) and must be sent from system RAM over to the PCI card. Even if the card were sitting on a HyperTransport link, which at about 3.2 GB/s is the fastest external bus available on a PC today, that's still a bottleneck; PCI ranges from 133 MB/s up to 2133 MB/s for second-generation PCI-X. Meanwhile, it's common for memory busses to exceed 3.2 GB/s, and some of the new Itaniums have something like 10.5 GB/s memory busses now.
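Those bus numbers translate directly into transfer time. A quick sketch of how long it takes just to ship the data to an offload card, assuming a 100 MB image (my figure for illustration) and the peak rates quoted above:

```python
# One-way transfer time for a 100 MB image over the busses mentioned above.
# Rates are the quoted peaks; real sustained bandwidth is lower.

image_bytes = 100 * 1024**2          # 100 MB image, an assumed size

busses = {
    "PCI (133 MB/s)":              133e6,
    "PCI-X 2.0 (2133 MB/s)":       2133e6,
    "HyperTransport (3.2 GB/s)":   3.2e9,
    "Itanium memory (10.5 GB/s)":  10.5e9,
}

for name, rate_bytes_per_s in busses.items():
    ms = image_bytes / rate_bytes_per_s * 1000
    print(f"{name}: {ms:.1f} ms one way")
```

On plain PCI that's the better part of a second each way before the card computes anything, which is how an offload engine ends up slower than just doing the FFT in the CPU's cache-friendly main memory.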
Back to price. These cards are a niche product, so the price has to be high because the demand is low. The price can skyrocket very quickly, because it's common for these things to carry on-board RAM for cache and buffering, often in the 1-4+ gigabyte range, which is not cheap in itself.
I'm not saying that I would not welcome something like an effective FFT offloading engine, but there is so much pre- and post-processing of the data, all of which has to pass through main system memory and the CPU, that the offloaders don't gain you much.
For high performance computing, memory bandwidth is frequently the bottleneck, and has been for years. High-end GPUs are a little different because they have had specialized busses for years (AGP and the like), and they also have the advantage of being told what to do by the CPU, given some data, processing it internally, and dumping the result straight to the monitor. The CPU does not need that data back. It's more or less a one-way operation, while the other offloader cards are usually two-way. Even in the case of TCP offloader cards, performance often does not keep up with software and general CPU improvements. TCP offloaders also don't work very well with things like software firewalls that want to get their hands on the TCP data as well.
So, I believe at this point in time, offloader cards are not too valuable. Maybe for a specific problem or set of problems, but I haven't found one that could significantly improve performance yet.