
Boost UltraSPARC T1 Floating Point w/ a Graphics Card?

Posted by Cliff
from the computed-outside-of-the-box dept.
alxtoth asks: "All over the web, Sun's UltraSPARC T1 is described as 'not fit for floating point calculations'. Somebody has benchmarked it for HPC applications, and got results that weren't that bad. What if one of the threads could do the floating point in the GPU, as suggested here? Even if the factory setup does not expect an video card, could you insert a low profile PCI-E video card, boot Ubuntu and expect decent performance?"

  • No, you cannot (Score:5, Insightful)

    by keesh (202812) on Saturday April 22, 2006 @05:09PM (#15181949) Homepage
    Sun SPARC kit doesn't use a BIOS. Unfortunately, nearly all modern graphics cards that haven't been specifically designed to work on non-x86 kit rely upon the BIOS to initialise the card. This massively limits hardware availability. PCI, sadly, is only a hardware standard.

    There's been some work by David S Miller on getting BIOS emulation into the Linux kernel so that regular cards can be fooled into working, but it's not there yet and will probably fall foul of Debian's firmware loading policy (does that apply to Ubuntu too?).
    • by NekoXP (67564) on Saturday April 22, 2006 @05:17PM (#15181967) Homepage
      We produce an Open Firmware solution which includes an x86 emulator to bootstrap x86 hardware, specifically graphics cards and the like.

      PowerPC boards, PC graphics chips with x86 BIOS, no driver edits required on the OS side... it's there just like it would be on a PC.

      http://metadistribution.org/blog/Blog/78A3C88E-1CE7-45B8-9C79-420134DD9B8E.html [metadistribution.org]
      http://www.genesippc.com/ [genesippc.com]
    • Re:No, you cannot (Score:3, Informative)

      by Jeff DeMaagd (2015)
      That problem had been solved for Alpha computers around 1992. I was able to choose from any standard PCI video card, though driver support in the OS was a different issue. There may be some patent issues though, so the approach might need to be different.
    • Re:No, you cannot (Score:1, Interesting)

      by Anonymous Coward
      [...]will probably fall foul of Debian's firmware loading policy

      No, it won't. The firmware won't be shipped with Debian; it would be run directly from the ROM on the very card that is to be initialized. Debian has shipped XFree86 for a long time, and it supports a similar method to initialize secondary graphics cards that require their BIOS to set them up to function properly (though that probably only works on x86 CPUs).
    • Lack of a BIOS can be worked around (e.g. the Pegasos boards [pegasosppc.com] have some sort of emulation built into their firmware that allows you to use normal PC graphics cards despite being PPC- and OpenFirmware-based), but without drivers you ain't doing jack shit. And that's a very big problem if you're not using an x86 CPU. The open-source r300 driver is making progress but is not near production quality, and AFAIK nothing similar exists for nVidia chips yet, so unless you can convince ATI and nVidia to port their drivers
    • I have yet to see a low-profile version; however, I have seen V210s and V240s with this card in them. It can only be a matter of time.
  • by the_humeister (922869) on Saturday April 22, 2006 @05:12PM (#15181955)
    Especially since current GPUs don't implement double-precision floating point math. Heh, in that vein you could add a dual Opteron single-board computer into one of the expansion slots...
  • by pedantic bore (740196) on Saturday April 22, 2006 @05:29PM (#15181998)
    I remember when it was common practice to buy extra hardware to add to your system to implement fast floating point ops. First it was a box (FPS), then a few cards (Sky), then a card (Mercury), then a daughterboard (everyone), then a chip (Weitek)... and then it was on the CPU and everyone expected it to be there.

    But Sun realized that the more things change, the more they stay the same: the reason vendors got away with making floating point an expensive option was that there are lots of workloads where floating point performance is unimportant. So they applied the RISC principle and chose not to waste a lot of silicon on the T1 implementing instructions that are not needed in their target workload, instead figuring out how to run lots of concurrent threads.

    Trying to improve floating point perf on a T1 by adding another card is like trying to figure out how to put wheels on a fish. It might be a cool hack and it might solve some particular problem but it doesn't generalize.

    If you want floating point perf and tons of threads, wait for the Rock chip from Sun (and hope that Sun stays afloat long enough to ship it). It's like a T1, only more so, with floating point for each thread.

    • It might be a cool hack and it might solve some particular problem but ...
      There's no "but" here. Cool hacks don't happen because they're useful, they happen because they're cool.
    • Meanwhile, GPU developers have created a component that processes floating point math very quickly, sold for much less $:FLOPS than Sparcs (or any other CPU). Combining a T1 and GPGPU offers "best of breed" economies of scale appropriate to each component, like installing 3rd party memory and HD rather than the expensive Sun brands.

      That's why GPGPU is an interesting strategy. GPU APIs offer parallelism, too. When those APIs can be harnessed with bus signalling that's high-enough level symbolically to exploi
      • by Anonymous Coward
        >Combining a T1 and GPGPU offers "best of breed" economies of scale appropriate to each component, like installing 3rd party memory and HD rather than the expensive Sun brands.

        Combining a T1 and a GPU offers you jack, since GPUs use single-precision arithmetic.
      • Well, no, if you want flops/$ then the signal processing chips used in cell phones and MP3 players are the clear winners. There are some real screamers here. But they're a bit complicated to program and don't function well as general purpose processors, which is why they're primarily used in systems where they can be programmed once and then shipped by the million.

        As I wrote before, I'm sure there's some workload where it makes sense to mate a T1 and a GPU (besides the obvious one, i.e., rendering grap

        • by Doc Ruby (173196) on Sunday April 23, 2006 @02:05AM (#15183468) Homepage Journal
          Those DSPs you mention aren't CPUs, and they're not available on PCI cards - and there are the programmability issues you mention.

          The way to think about the use of GPGPU in a host with its own (GP) CPU is client/server computing. I put together such a system in 1990, a 12MHz 80286, with 4 12.5MFLOPS DSPs (AT&T DSP32c) and an FPGA "scheduler" on the ISA card. The 286 ran a loop sending data and commands to a memory mapped page on the card's SRAM, and copying the page when a status register was set. I had realtime 24bit VGA renderings of megapolygons at 30FPS, all processed on the DSPs. The systems have all scaled up, but the price improvement per FLOPS of the GPUs over the CPU is even better now than then.

          As you say, the key is keeping the compute servers full, which amortizes the signalling overhead best, and keeping the signaling across the bus high-level enough that the bandwidth doesn't bottleneck. There are lots of demanding apps now which could use that architecture. Audio compression is my favorite - I'm waiting to stuff a $1000 P4 with 6 $400 dual GPUs, and beat the performance of any <$10K server, scalable down to $1500. That's the kind of host that could really transform telephony.
    • Sun's original Motorola 68K-based workstations had optional FPUs, as did the first "desktop" SPARC workstation, the 4/110. Sun workstations or servers equipped with a VME bus also had access to an optional Weitek FPU.

      Even more exotic was the TAAC-1, a wide-instruction-word processor that could be used for FFTs, imaging, etc.

      One correction: the T2 (Niagara II) will be the first heavily multi-threaded SPARC CPU with one FPU per core. It is due out next year, with Rock due out in 2008.
      • There was another system that had an optional FPU. I think it was called the IBM PC. You could get an FPU called the 8087. It was expensive, and your software had to be compiled to support it, which very few programs were.
        Was the Weitek an FPU or a vector processor?
  • Wait for the T2 (Score:4, Interesting)

    by IvyKing (732111) on Saturday April 22, 2006 @05:32PM (#15182010)
    The T2 is supposed to have an FPU for each core, so it would be a simpler solution than trying to use a graphics card. The T2 is also supposed to have double the number of threads per core and even more memory bandwidth.
    • Are you sure of that? I thought the whole point of the one FPU per chip was to dramatically cut down on power consumption, which is one of Niagara's main selling points.
        • There is an FPU per core, and the power for Niagara 2 is still supposed to be remarkably low.
          • Correct. T2 is expected to be lower power than, or equivalent to, the T1; part of this is because T2 will be built on a 65nm process, as opposed to the 90nm process used to fabricate T1s.

            The changes in T2 are: 2 pipelines per core, up from 1; 8 threads per core, up from 4; an FPU per core, up from 1 per module; a faster memory subsystem; and additional hardware support for encryption and network offload. On-chip cache is expected to remain the same.
  • Feh (Score:3, Insightful)

    by NitsujTPU (19263) on Saturday April 22, 2006 @05:45PM (#15182043)
    At that point, you're bound by the bandwidth between the graphics card and the CPU. Why not just purchase hardware that works for what you want to use it for in the first place?
    • by Mr Z (6791)

      Why not just purchase hardware that works for what you want to use it for in the first place?

      What if you want a better solution than the ones that are normally available?

      • This is a workaround, and usually not a very good one. I've seen people do very specialized things by moving the floating point stuff off to video cards, but for general computation, I think it's a rather poor solution.

        I.e., this is not a better solution than the ones that are normally available.
  • by Fallen Kell (165468) on Saturday April 22, 2006 @06:38PM (#15182192)
    All kinds of problems will arise with a setup like this. Performance could possibly improve for certain things, but applications would need to be coded for it, and code is not written for a unique setup like this. Multi-threaded code is written under the assumption that all CPUs have approximately the same abilities (in other words, it does not split floating point ops into one thread and I/O and int operations into other threads). Any thread in the application will potentially have floating point operations mixed with other operations.

    Now even if you custom code an application to do all floating point work in a specific thread, you would need to completely modify the kernel thread management sub-systems. The threads themselves would need meta flag data to signify what "kind" of thread they are so that the "floating point thread(s)" are queued for running on the GPU and not on the T1 (unless there are idle T1 cores and the GPU is already busy).
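    A toy sketch of the thread-metadata idea above (every name here is hypothetical; no real kernel exposes an interface like this): each task carries a flag, and a dispatcher routes flagged floating point work to a GPU queue and everything else to the CPU queue.

```python
from queue import Queue

# Hypothetical sketch only. Each task carries the "meta flag data"
# described above, and a dispatcher routes floating-point-heavy work to
# the GPU queue and everything else to the T1's integer cores.

class Task:
    def __init__(self, name, fp_heavy=False):
        self.name = name
        self.fp_heavy = fp_heavy  # the metadata flag

gpu_queue = Queue()  # work destined for the GPU's floating point units
cpu_queue = Queue()  # work destined for the T1 cores

def schedule(task):
    """Route a task according to its flag."""
    (gpu_queue if task.fp_heavy else cpu_queue).put(task)

for t in (Task("parse-request"), Task("matrix-multiply", fp_heavy=True),
          Task("serve-page"), Task("fft", fp_heavy=True)):
    schedule(t)

print(gpu_queue.qsize(), cpu_queue.qsize())  # 2 2
```

    The routing itself is trivial; the point of the comment stands, because real threads mix floating point with everything else, so no such clean split exists in existing code.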

    Now even if you have the above changes, the only thing this will work on is custom-made applications; in other words, you will need to completely rewrite anything and everything to take advantage of this setup. This really isn't viable when you may be dealing with non-open-source products like Matlab or Oracle. Even with open-source products, it will take MAJOR rework to implement a change like this.

    The T1 is designed for what it is: a multi-core processor that would make a very good NFS data server, FTP server, or web host with highly efficient power usage. It is NOT a database, application, or HPC server core. Too many of the latter workloads require too many floating point operations to run efficiently on the T1. In a pinch you can use it for them, but it will not shine in that role.

    • Why would a database server need floating point?
      I have never written one, but I have written btrees and hash algorithms, and they never used floating point.
      For a database server, I would guess you would tend to be I/O bound.
      You do have a point in that the T1 is a good platform for a web server or file server but not ideal for many other tasks. I wonder what its SSL performance is like?
    • Not quite sure how that got modded Informative.

      DBMSes don't require FPU performance, since they don't issue floating point instructions. The app server market is also dominated by integer workloads; think Java and J2EE app servers as an example.

      The T1 looks like an exceptionally effective Java/J2EE platform from the slew of great benchmark results Sun has published for the platform. It is also no slouch as a DBMS platform, as its SAP results show. It does lack single-threaded performance, so it's going to be
  • by mosel-saar-ruwer (732341) on Saturday April 22, 2006 @10:18PM (#15182875)

    nVidia & IBM/Sony/Cell/Playstation can perform only 32-bit single-precision floating point calculations in hardware. [IBM/Sony can, at least in theory, perform 64-bit double-precision floating point calculations, but the implementation involves some weird software emulation thingamabob which invokes a massive performance penalty.]

    ATi is even worse - last I checked, they could perform only 24-bit "three-quarters"-precision floating point calculations in hardware.

    And just in case you aren't aware, 32-bit single-precision floats are essentially worthless for anyone doing even the simplest mathematical calculations; for instance, with 32-bit single-precision floats, integer granularity is lost at 2 ^ 24 = 16M, i.e.

    16777216 + 0 = 16777216
    16777216 + 1 = 16777216
    16777216 + 2 = 16777218
    16777216 + 3 = 16777220
    16777216 + 4 = 16777220
    16777216 + 5 = 16777220
    16777216 + 6 = 16777222
    16777216 + 7 = 16777224
    16777216 + 8 = 16777224
    16777216 + 9 = 16777224
    16777216 + 10 = 16777226
    16777216 + 11 = 16777228
    16777216 + 12 = 16777228
    16777216 + 13 = 16777228
    16777216 + 14 = 16777230
    16777216 + 15 = 16777232
    16777216 + 16 = 16777232
    etc
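    The table above can be reproduced in ordinary Python by round-tripping values through IEEE-754 single precision with the stdlib struct module (an illustrative sketch; nothing SPARC- or GPU-specific here):

```python
import struct

def to_f32(x):
    """Round a Python float (a 64-bit double) to IEEE-754 single precision."""
    return struct.unpack('f', struct.pack('f', x))[0]

for n in range(5):
    print(f"16777216 + {n} = {to_f32(16777216.0 + n):.0f}")
# 16777216 + 1 = 16777216  (the +1 is lost: float spacing is 2 at 2^24)
# 16777216 + 3 = 16777220  (16777219 is a tie; it rounds to the even mantissa)
```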
    Now while 64-bit double-precision floats [or "doubles"] are probably accurate enough for most financial calculations, where, generally speaking, accuracy is only needed to the nearest 1/100th [i.e. to the nearest cent], 64-bit doubles are still more or less worthless to the mathematician, physicist, and engineer.

    For instance, consider the work of Professor Kahan at UC-Berkeley:

    William Kahan [berkeley.edu]
    In particular, read a few of these papers from the late nineties:
    At the time, Kahan was arguing in favor of using the full power of the Intel/AMD 80-bit extended precision doubles [i.e. embedding 64-bit doubles in an 80-bit space, performing calculations with the greater accuracy afforded therein, and then rounding the result back down to 64-bits and returning that as your answer], but, truth be told, the Sine Qua Non of hardware-based calculations is true 128-bit "quad-precision" floating point calculations as performed in hardware.
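    Python has no 80-bit extended type, but Kahan's idea (carry intermediate results in extra precision, round once at the end) can be illustrated with the stdlib math.fsum, which keeps exact partial sums internally and performs a single final rounding. The function choice is my analogy, not something from the thread:

```python
import math

xs = [0.1] * 10

naive = sum(xs)        # rounds after every addition
better = math.fsum(xs) # exact internal sums, one final rounding

print(naive == 1.0)   # False: accumulated per-step rounding error
print(better == 1.0)  # True
```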

    Sun has a "quad-precision" floating point number for Solaris/SPARC, but, sadly, it's a software hack, and, like IBM/Sony/Cell/Playstation, far too slow to be used in practice.

    I believe that IBM makes a chip for the z-Series mainframe which can perform 128-bit floating point in hardware, but I imagine that it's prohibitively expensive [if you could even convince IBM to sell it to you in the first place].

    The best configuration here would probably look like a fancy-schmancy Digital Signal Processor [DSP] chipset, from someone like Texas Instruments, capable of 128-bit hardware calculations, mounted on a card that would plug into something very fast, like a 16x PCIe bus, which in turn would be connected to a HyperTransport bus [but boy, wouldn't it be really cool if the DSP sat directly on the HyperTransport bus itself?].

    By the way, if anyone knows of a company that's making such a card, with stable drivers [or, God forbid, a motherboard with a socket for a 128-bit DSP on the HyperTransport bus], then please tell me about it, 'cause I'd be very interested in purchasing such a thing.

    • truth be told, the Sine Qua Non of hardware-based calculations is true 128-bit "quad-precision" floating point calculations as performed in hardware.

      For in-hardware calculation, yes. For a quick approximation or when the result has no serious consequences, yes. For anyone serious about getting the correct answer, no no no

      We (by which I mean CS, math, and hard-science folks) have known since the earliest days of floating point that it has inherent, unavoidable flaws that no arbitrary fixed number of
      • by Anonymous Coward
        Unfortunately, and as one of your links mentions, I seriously wonder if many of the current generation of programmers even know about this issue, never mind care. (Huh, I sound like a cranky old man now.)

        Not cranky and old enough.

        If you care about your answer, no matter how many bits the FPU supports, you do it in software. Period. You use GMP, and don't round until the final result... and while that might not always prove possible due to having finite memory, I highly doubt we'll ever see even a 1024-bit F
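        The "do it in software and don't round until the final result" advice can be sketched with the stdlib fractions module, used here as an exact-rational stand-in for GMP:

```python
from fractions import Fraction

# Exact rational arithmetic (a stdlib stand-in for GMP's rationals):
# nothing is rounded at any step, so comparisons are exact.
print(Fraction(1, 10) + Fraction(2, 10) == Fraction(3, 10))  # True

# The same sum in hardware doubles rounds each operand and the result:
print(0.1 + 0.2 == 0.3)  # False: 0.1 + 0.2 == 0.30000000000000004
```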
    • by Anonymous Coward
      IBM/Sony can, at least in theory, perform 64-bit double-precision floating point calculations, but the implementation involves some weird software emulation thingamabob which invokes a massive performance penalty.

      Just, for the record. Cell uses no "software emulation" for their double calculations. It's 7 cycle latency to do two DP multiply-add, which is certainly not slow. The "slow" part is that the throughput is also 7 cycles, meaning that multiple DP MADDs don't pipeline. So, while this cuts the t

    • At the time, Kahan was arguing in favor of using the full power of the Intel/AMD 80-bit extended precision doubles [i.e. embedding 64-bit doubles in an 80-bit space, performing calculations with the greater accuracy afforded therein, and then rounding the result back down to 64-bits and returning that as your answer], but, truth be told, the Sine Qua Non of hardware-based calculations is true 128-bit "quad-precision" floating point calculations as performed in hardware.

      The CDC 6600's single precision arit

    • And just in case you aren't aware, 32-bit single-precision floats are essentially worthless for anyone doing even the simplest mathematical calculations; for instance, with 32-bit single-precision floats, integer granularity is lost at 2 ^ 24 = 16M, i.e.

      The error in floating point calculations is supposed to be roughly 2^-N, where N is the number of bits. Although some ALGORITHMS can be unstable, because they use series of operations that greatly increase error, many useful algorithms can be accurately
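      A two-line example of the kind of instability this comment points at (catastrophic cancellation), in plain Python doubles:

```python
x = 1e-16
computed = (1.0 + x) - 1.0  # mathematically equal to x

print(computed)       # 0.0: x was absorbed into 1.0, then cancelled away
print(computed == x)  # False - total loss of the answer in one subtraction
```

      No wider fixed-size format removes this failure mode; it only moves the threshold at which it appears.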

    • wouldn't it be really cool if the DSP lay directly on the HyperTransport bus

      You may not be aware, but AMD just released the new version of the HyperTransport spec - and along with the usual speed and signalling improvements, it includes externally connected devices.
    • Why not the Virtex FPGA setup: http://www.theregister.co.uk/2006/04/21/drc_fpga_module/ [theregister.co.uk]
      I'm sure quad (or possibly even octuple) precision floats could be implemented in that bad boy.
      As I said in an earlier thread, this has my Intel fanboi status at risk...
      -nB
  • In theory, if you ran the mobo outside its normal case, you could throw a supported-on-SPARC Sun framebuffer in it and have things work... not that I've got one handy, nor would I be willing to try to splice it into an ATX chassis or whatnot...
