Please create an account to participate in the Slashdot moderation system


Forgot your password?
Check out the new SourceForge HTML5 internet speed test! No Flash necessary and runs on all devices. Also, Slashdot's Facebook page has a chat bot now. Message it for stories and more. ×

Comment Re:Alternative? (Score 4, Informative) 71

The greatest challenges lie in accommodating arbitrary control flow among threads within a cooperative thread array. NVIDIA GPUs are SIMD multiprocessors, but they include a thread activity stack that enables serialization of threads when they reach diverging branches. Without hardware support, this kind of thing becomes difficult on SIMD processors which is why Ocelot doesn't include support for SSE yet. It is also one of the obstacles for supporting AMD/ATI IL at the moment, though solutions are in order.

Translation from PTX to LLVM to multicore x86 does not necessarily throw away information concerning the PTX thread hierarchy initially. The first step is to express a PTX kernel using LLVM instructions and intrinsic function calls. This phase is [theoretically] invertible and no information concerning correctness or parallelism is lost.

To get to multicore from here, a second phase of transformations insert loops around blocks of code within the kernel to implement fine-grain multithreading. This is the part that isn't necessarily invertible or easy to translate back to GPU architectures and is what is referenced in the note you are citing.

Disclosure: I'm one of the core contributors to the Ocelot project.

Slashdot Top Deals

The trouble with money is it costs too much!