Boost UltraSPARC T1 Floating Point w/ a Graphics Card? 71

alxtoth asks: "All over the web, Sun's UltraSPARC T1 is described as 'not fit for floating point calculations'. Somebody has benchmarked it for HPC applications and got results that weren't that bad. What if one of the threads could do the floating point in the GPU, as suggested here? Even if the factory setup does not expect a video card, could you insert a low-profile PCI-E video card, boot Ubuntu, and expect decent performance?"
  • No, you cannot (Score:5, Insightful)

    by keesh ( 202812 ) on Saturday April 22, 2006 @05:09PM (#15181949) Homepage
    Sun SPARC kit doesn't use a BIOS. Unfortunately, nearly all modern graphics cards that haven't been specifically designed to work on non-x86 kit rely upon the BIOS to initialise the card. This massively limits the hardware availability. PCI, sadly, is only a hardware standard.

    There's been some work by David S Miller on getting BIOS emulation into the Linux kernel so that regular cards can be fooled into working, but it's not there yet and will probably fall foul of Debian's firmware loading policy (does that apply to Ubuntu too?).
  • by pedantic bore ( 740196 ) on Saturday April 22, 2006 @05:29PM (#15181998)
    I remember when it was common practice to buy extra hardware to add to your system to implement fast floating point ops. First it was a box (FPS), then a few cards (Sky), then a card (Mercury), then a daughterboard (everyone), then a chip (Weitek)... and then it was on the CPU and everyone expected it to be there.

    But Sun realized that the more things change, the more they stay the same: the reason vendors got away with making floating point an expensive option was that there are lots of workloads where floating point performance is unimportant. So they applied the RISC principle and chose not to waste silicon on the T1 implementing instructions that aren't needed in its target workload, spending it instead on lots of concurrent threads.

    Trying to improve floating point perf on a T1 by adding another card is like trying to figure out how to put wheels on a fish. It might be a cool hack and it might solve some particular problem but it doesn't generalize.

    If you want floating point perf and tons of threads, wait for the Rock chip from Sun (and hope that Sun stays afloat long enough to ship it). It's like a T1, only more so, with floating point for each thread.

  • Feh (Score:3, Insightful)

    by NitsujTPU ( 19263 ) on Saturday April 22, 2006 @05:45PM (#15182043)
    At that point, you're bound by the bandwidth between the graphics card and the CPU. Why not just purchase hardware that works for what you want to use it for in the first place?
  • by PaulBu ( 473180 ) on Saturday April 22, 2006 @08:43PM (#15182599) Homepage
    Most real-life CAD software (as in, what is used to build the chips inside your little computer box or your cellphone) used to run (~8 years ago) on Solaris, with occasional HP-UX, AIX, and Linux. Now it's Linux and Solaris; the rest are somewhat supported, but not exactly healthy... You can get some FPGA/PCB/solid 3D CAD on Windows, but it is nowhere near true industrial-strength quality. Think about it this way: if you pay $100,000 for a seat, it does not really matter how much the hardware costs, and Sun was winning due to general stability/availability. IBM (the big Cadence shop) pushed Cadence to release the Linux version of their software simultaneously with the Solaris version about 5 years ago; since then, Linux has been gaining popularity...

    There are no good technical reasons not to recompile something like this for OS X, but if you can imagine porting a package which comes as a bookshelf of CDs from UN*X to the Win API, I'd like some of the stuff you are smoking! ;-)

    Paul
  • by mosel-saar-ruwer ( 732341 ) on Saturday April 22, 2006 @10:18PM (#15182875)

    nVidia & IBM/Sony/Cell/Playstation can perform only 32-bit single-precision floating point calculations in hardware. [IBM/Sony can, at least in theory, perform 64-bit double-precision floating point calculations, but the implementation involves some weird software emulation thingamabob which invokes a massive performance penalty.]

    ATi is even worse - last I checked, they could perform only 24-bit "three-quarters"-precision floating point calculations in hardware.

    And just in case you aren't aware, 32-bit single-precision floats are essentially worthless for anyone doing even the simplest mathematical calculations; for instance, with 32-bit single-precision floats, integer granularity is lost at 2^24 = 16M (a short C demonstration follows the table), i.e.

    16777216 + 0 = 16777216
    16777216 + 1 = 16777216
    16777216 + 2 = 16777218
    16777216 + 3 = 16777220
    16777216 + 4 = 16777220
    16777216 + 5 = 16777220
    16777216 + 6 = 16777222
    16777216 + 7 = 16777224
    16777216 + 8 = 16777224
    16777216 + 9 = 16777224
    16777216 + 10 = 16777226
    16777216 + 11 = 16777228
    16777216 + 12 = 16777228
    16777216 + 13 = 16777228
    16777216 + 14 = 16777230
    16777216 + 15 = 16777232
    16777216 + 16 = 16777232
    etc
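
    A minimal C sketch of the granularity loss tabulated above (my illustration, not from the original post; it assumes IEEE-754 single precision with round-to-nearest-even, which is what essentially all commodity hardware implements):

        #include <stdio.h>

        int main(void)
        {
            /* 2^24 is the threshold past which not every integer is
               exactly representable as a 32-bit float. */
            float base = 16777216.0f;

            for (int i = 0; i <= 16; i++) {
                /* Above 2^24 consecutive floats are 2 apart, so each
                   exact sum is rounded to the nearest even integer,
                   reproducing the table above. */
                float sum = base + (float)i;
                printf("16777216 + %2d = %.0f\n", i, sum);
            }
            return 0;
        }
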
    Now while 64-bit double-precision floats [or "doubles"] are probably accurate enough for most financial calculations, where, generally speaking, accuracy is only needed to the nearest 1/100th [i.e. to the nearest cent], 64-bit doubles are still more or less worthless to the mathematician, physicist, and engineer.

    For instance, consider the work of Professor Kahan at UC-Berkeley:

    William Kahan [berkeley.edu]
    In particular, read a few of his papers from the late nineties. At the time, Kahan was arguing in favor of using the full power of the Intel/AMD 80-bit extended-precision format [i.e. embedding 64-bit doubles in an 80-bit space, performing the intermediate calculations with the greater accuracy afforded therein, and then rounding the result back down to 64 bits and returning that as your answer]. But, truth be told, the sine qua non of hardware-based calculation is true 128-bit "quad-precision" floating point performed in hardware.
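
    To make that concrete, here is a minimal sketch of the idiom Kahan advocated (my illustration, not Kahan's code; it assumes an x86 compiler where long double is the 80-bit x87 extended format): carry the intermediates in extended precision, then round the final result back down to a 64-bit double:

        #include <stdio.h>

        int main(void)
        {
            int i;

            /* Accumulate in 80-bit extended precision (~64 mantissa
               bits on x86)... */
            long double acc = 0.0L;
            for (i = 0; i < 10000000; i++)
                acc += 0.1L;

            /* ...then round back down to a 64-bit double at the end. */
            double refined = (double)acc;

            /* For comparison: the same sum carried entirely in doubles
               accumulates rounding error at every one of the 10^7 steps. */
            double plain = 0.0;
            for (i = 0; i < 10000000; i++)
                plain += 0.1;

            printf("extended, then rounded: %.17g\n", refined); /* much closer to 1000000 */
            printf("double throughout:      %.17g\n", plain);
            return 0;
        }
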

    Sun has a "quad-precision" floating point type for Solaris/SPARC, but, sadly, it's implemented in software, and, like IBM/Sony/Cell/Playstation, far too slow to be used in practice.

    I believe that IBM makes a chip for the z-Series mainframe which can perform 128-bit floating point in hardware, but I imagine that it's prohibitively expensive [if you could even convince IBM to sell it to you in the first place].

    The best configuration here would probably look like a fancy-schmancy digital signal processor [DSP] chipset, from someone like Texas Instruments, capable of 128-bit hardware calculations, mounted onto a card that would plug into something very fast, like a 16x PCIe bus, which in turn would be connected to a HyperTransport bus [but boy, wouldn't it be really cool if the DSP sat directly on the HyperTransport bus itself?].

    By the way, if anyone knows of a company that's making such a card, with stable drivers [or, God forbid, a motherboard with a socket for a 128-bit DSP on the HyperTransport bus], then please tell me about it, 'cause I'd be very interested in purchasing such a thing.

  • by Anonymous Coward on Sunday April 23, 2006 @12:29PM (#15185113)
    IBM/Sony can, at least in theory, perform 64-bit double-precision floating point calculations, but the implementation involves some weird software emulation thingamabob which invokes a massive performance penalty.

    Just for the record: Cell uses no "software emulation" for its double-precision calculations. It's a 7-cycle latency to do two DP multiply-adds, which is certainly not slow. The "slow" part is that the throughput is also 7 cycles, meaning that back-to-back DP MADDs don't pipeline. So, while this cuts the theoretical maximum GFLOPS down significantly (SP MADDs can issue one per cycle, in addition to a non-FP instruction), the in-practice performance is much closer...

    and we're still talking (4 flops / 7 cycles) * (8 SPEs) * x GHz => ~18.3 DP GFLOPS @ 4.0 GHz (pretty freaking fast!)
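
    (A trivial back-of-the-envelope check of that figure in C, using only the numbers quoted above:)

        #include <stdio.h>

        int main(void)
        {
            double flops_per_issue  = 4.0; /* two DP multiply-adds = 4 flops */
            double cycles_per_issue = 7.0; /* non-pipelined: one issue per 7 cycles */
            double num_spes         = 8.0;
            double clock_ghz        = 4.0;

            /* (4 / 7) flops/cycle * 8 SPEs * 4e9 cycles/s */
            double gflops = flops_per_issue / cycles_per_issue * num_spes * clock_ghz;
            printf("peak DP: %.1f GFLOPS\n", gflops); /* prints 18.3 */
            return 0;
        }
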

    Oh, and GPUs aren't viable as FPUs because the latency sucks so hard.
