So there seem to be several questions as to why people would want to use CUDA when an open standard (OpenCL) exists for the same thing.
Well, honestly, I wrote this because OpenCL did not exist when I started.
I have heard the following reasons why some people prefer CUDA over OpenCL:
Additionally, I would like to see a programming model like CUDA or OpenCL replace the most widespread models in industry (threads, OpenMP, MPI, etc.). CUDA and OpenCL are both examples of Bulk Synchronous Parallel models, which are explicitly designed around the assumption that communication latency and core count will increase over time. Although I think it is a long shot, I would like to see more applications written in these languages so that developers have a migration path: rather than writing specialized applications for GPUs, they could write a single application for a CPU that can also take advantage of future CPUs with many cores, or of GPUs with a large degree of fine-grained parallelism.
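To make that concrete, here is a minimal sketch of the kind of kernel I mean, written in the CUDA/BSP style: the kernel describes one element's worth of work, and the launch configuration decides how much of it runs in parallel. The names and sizes are purely illustrative, not taken from any real application.

    // Illustrative SAXPY kernel in the data-parallel (BSP-like) style.
    #include <cuda_runtime.h>

    __global__ void saxpy(float a, const float* x, float* y, int n)
    {
        // Each thread handles one element; the runtime decides how many
        // threads actually execute concurrently.
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            y[i] = a * x[i] + y[i];
    }

    int main()
    {
        const int n = 1 << 20;
        float *x, *y;
        cudaMalloc(&x, n * sizeof(float));
        cudaMalloc(&y, n * sizeof(float));
        // Launch enough 256-thread blocks to cover all n elements.
        saxpy<<<(n + 255) / 256, 256>>>(2.0f, x, y, n);
        cudaDeviceSynchronize();
        cudaFree(x);
        cudaFree(y);
        return 0;
    }

The same kernel can run on a GPU today or, through a translator like Ocelot, be mapped onto a multicore CPU without being rewritten.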
Most of the codebase for Ocelot could be reused for OpenCL. The intermediate representations of the two languages are very similar, with the main differences being in the runtime.
Please try to tear down these arguments; it really does help.
The greatest challenges lie in accommodating arbitrary control flow among threads within a cooperative thread array. NVIDIA GPUs are SIMD multiprocessors, but they include a thread activity stack that enables serialization of threads when they reach divergent branches. Without that hardware support, this kind of serialization becomes difficult on SIMD processors, which is why Ocelot doesn't include support for SSE yet. It is also one of the obstacles to supporting AMD/ATI IL at the moment, though solutions are in the works.
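To illustrate the problem, here is a toy kernel (not taken from Ocelot) with a data-dependent branch. Threads in the same warp can take different sides of the branch, so a SIMD machine has to mask and serialize the two paths; NVIDIA's activity stack does this in hardware, whereas an SSE or AMD IL backend would have to emulate it in software.

    // Toy example of branch divergence within a warp.
    __global__ void divergent(int* data, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n)
            return;

        if (data[i] % 2 == 0)
            data[i] = data[i] * 2;   // path A: run with the other threads masked off
        else
            data[i] = data[i] + 1;   // path B: run with the path-A threads masked off
    }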
Translation from PTX to LLVM to multicore x86 does not necessarily throw away information about the PTX thread hierarchy at the outset. The first step is to express a PTX kernel using LLVM instructions and intrinsic function calls. This phase is [theoretically] invertible, and no information concerning correctness or parallelism is lost.
To get to multicore from here, a second phase of transformations inserts loops around blocks of code within the kernel to implement fine-grained multithreading. This part isn't necessarily invertible or easy to translate back to GPU architectures, and it is what the note you are citing refers to.
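As a rough illustration of what that second phase effectively does to the SAXPY kernel above, consider the hand-written equivalent below. This is a simplification, not Ocelot's actual generated code: the real transformation operates on LLVM IR, the special registers become intrinsic calls in the first phase, and the kernel also has to be split at barriers, which is part of what makes the transformation hard to invert.

    // Simplified illustration of the "thread loop" transformation: the
    // per-thread kernel body is wrapped in loops over the block and thread
    // indices so that one CPU thread executes an entire CTA serially.
    struct Dim3 { int x, y, z; };

    void saxpy_multicore(float a, const float* x, float* y, int n,
                         Dim3 grid, Dim3 block)
    {
        for (int block_x = 0; block_x < grid.x; ++block_x)
        {
            // Each iteration of the inner loop plays the role of one PTX thread.
            for (int thread_x = 0; thread_x < block.x; ++thread_x)
            {
                int i = block_x * block.x + thread_x;
                if (i < n)
                    y[i] = a * x[i] + y[i];
            }
        }
    }

Once the kernel body looks like this, it is ordinary scalar code, and the outer block loop can be divided among CPU worker threads.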
Disclosure: I'm one of the core contributors to the Ocelot project.
Nothing big today - just an RFID Terminator Gun. It basically fries any RFID chip in range. Not sure what good it is, unless you want to play a trick on your friends and family by frying their passports. Big fun.
"my terminal is a lethal teaspoon." -- Patricia O Tuama