
Comment Relatively Painless = GPU Libraries (Score 1) 198

Always awesome to see GPU computing getting Slashdot love!

Do I have to learn CUDA/OpenCL — which seems a daunting task to me — or is there a simpler way?

You do NOT have to learn CUDA or OpenCL. You can use libraries or compilers. GPU libraries tend to give better performance than GPU compilers (e.g. OpenACC) and tend to be able to handle more algorithms. That is because compilers are simply not smart enough to do things as well as expert programmers who meticulously hand-tune kernels and put them in libraries. Any number of libraries are available. There are many poorly supported libraries out there, so you may have to search around to find good ones. I suggest one below.
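To make the library approach concrete, here is a minimal CUFFT sketch (error checks omitted; fill in your own signal data). All of NVIDIA's hand-tuned FFT kernels sit behind the single cufftExecC2C call:

    // build with: nvcc fft_demo.cu -lcufft
    #include <cufft.h>
    #include <cuda_runtime.h>

    int main(void) {
        const int N = 1024;
        cufftComplex *data;
        cudaMalloc((void**)&data, N * sizeof(cufftComplex));
        // ... copy your signal into `data` with cudaMemcpy ...

        cufftHandle plan;
        cufftPlan1d(&plan, N, CUFFT_C2C, 1);            // plan a 1-D complex FFT
        cufftExecC2C(plan, data, data, CUFFT_FORWARD);  // run it in place on the GPU

        cufftDestroy(plan);
        cudaFree(data);
        return 0;
    }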

What, currently, is the most painless way to start playing with GPU programming? Surely there must be a 'relatively painless' way out there, with which one can begin to learn how to harness the GPU?

My colleagues and I at AccelerEyes have dedicated the last 6 years of our lives to helping people find exactly what you're looking for: a "relatively painless" way to harness the GPU. The result is our ArrayFire library for CUDA or OpenCL. I know it's uncool to toot one's own horn, but the GPU computing community is small enough that people know each other, and we're all working together to build out the ecosystem. There are many different contributions to GPU computing by many different groups. Our group's specialty in the ecosystem has always been the "relatively painless" contribution coupled with great performance. The reason people like our stuff is that we do nothing but work on squeezing out the most performance possible. Then we wrap those kernels up into convenient library calls that can be plugged in like math functions to your code, with much less burden than writing the CUDA or OpenCL from scratch.
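For flavor, here is a minimal ArrayFire sketch in C++ (function names as in the ArrayFire documentation; exact details may vary by version). Every line below runs on the GPU without a single hand-written kernel:

    #include <arrayfire.h>
    #include <cstdio>

    int main() {
        af::array A = af::randu(2048, 2048);    // random matrix, born on the GPU
        af::array B = af::matmul(A, A);         // GPU matrix multiply
        af::array C = af::fft2(B);              // GPU 2-D FFT
        float s = af::sum<float>(af::abs(C));   // reduce to a scalar on the host
        printf("checksum: %g\n", s);
        return 0;
    }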

Happy to answer any further questions you may have about specific libraries, compilers, or GPU programming approaches. We eat, drink, and breathe everything CUDA/OpenCL.

BTW, we also encourage learning expert CUDA/OpenCL development. It is tough, no doubt about that. It is time-consuming, and for many developers the added complexity and longer development time are not worth it. It sounds like you are probably in the boat of not caring about becoming an expert in low-level details, rather just wanting to get better performance to achieve a goal and be done with it. Is that correct?

Perhaps a Visual Programming Language or 'VPL' that lets you connect boxes/nodes and access the GPU very simply?

LabVIEW does not have good support for GPUs. Many ArrayFire users are building custom LabVIEW blocks so that they can program GPUs more simply. I can connect you to some of those users if you wish (just shoot me a note at john@accelereyes.com).

I'm unaware of another graphical box/nodes package that supports GPUs.

---

While I'm at it, I know this post is going to be read by many expert CUDA/OpenCL developers out there. If you're interested in writing CUDA/OpenCL code daily, we're hiring (see my email above) :)

Comment Re:Open Source the Libraries (Score 2) 89

Also, OpenCL is not going anywhere, even if someone figured out how to get CUDA code to run well on ATI GPUs. In addition to the many other reasons being discussed in these comments, OpenCL is gaining a lot of traction with mobile GPU vendors too (e.g. ARM Mali, Imagination PowerVR, Qualcomm Adreno, etc.).

Comment Open Source the Libraries (Score 4, Interesting) 89

IMO, open sourcing their GPU libraries would be a much bigger deal than only open sourcing the compiler. I would like to see CUBLAS, CUFFT, CUSPARSE, CURAND, etc., all get opened up to the community.

The pain is not in compiling GPU code; rather, the pain is in writing good GPU code. The major difference between NVIDIA and AMD (and the major edge NVIDIA has over AMD) is not as much the compiler as it is the libraries.
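As one example of where that value sits: a CUBLAS matrix multiply is one call, and the tiling, shared-memory staging, and register blocking that make it fast are all hidden inside it. Rough sketch, error checks omitted:

    #include <cublas_v2.h>
    #include <cuda_runtime.h>

    int main(void) {
        const int n = 1024;
        float *A, *B, *C;
        cudaMalloc((void**)&A, n * n * sizeof(float));
        cudaMalloc((void**)&B, n * n * sizeof(float));
        cudaMalloc((void**)&C, n * n * sizeof(float));
        // ... fill A and B with your data ...

        cublasHandle_t handle;
        cublasCreate(&handle);

        const float alpha = 1.0f, beta = 0.0f;
        // the entire hand-tuned GEMM lives behind this one call
        cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                    n, n, n, &alpha, A, n, B, n, &beta, C, n);

        cublasDestroy(handle);
        cudaFree(A); cudaFree(B); cudaFree(C);
        return 0;
    }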

Of course, I'm biased, because I work at AccelerEyes and we do GPU consulting with our freely available, but not open source, ArrayFire GPU library, which has both CUDA and OpenCL versions.

Submission + - Music Beat Analysis: MATLAB GPU code with Jacket (accelereyes.com)

melonakos writes: The main reason for the speedup is that previously the bands were put through the functions serially, but with the capabilities of Jacket/CUDA it was possible to put all six bands through most calculations at once, thereby greatly reducing the time for computation. The computationally-intensive core of the analyzer was observed to run 15 times faster!
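In CUDA terms, the idea looks roughly like this toy sketch (names hypothetical, not the analyzer's actual code): lay the six bands out as rows of one matrix and launch a single kernel over all of them, instead of looping over bands on the host:

    // one thread per sample, one grid row per band: all bands at once
    __global__ void smooth_all_bands(const float *bands, float *out,
                                     int nBands, int nSamples)
    {
        int s = blockIdx.x * blockDim.x + threadIdx.x;  // sample index
        int b = blockIdx.y;                             // band index
        if (s >= nSamples || b >= nBands) return;

        // toy per-sample computation: 3-tap moving average along the band
        int i = b * nSamples + s;
        float left  = (s > 0)            ? bands[i - 1] : bands[i];
        float right = (s < nSamples - 1) ? bands[i + 1] : bands[i];
        out[i] = (left + bands[i] + right) / 3.0f;
    }

    // launch: dim3 grid((nSamples + 255) / 256, nBands);
    //         smooth_all_bands<<<grid, 256>>>(bands, out, nBands, nSamples);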

Comment Re:great! (Score 1) 89

I don't agree with either statement:

1) Expert convolutions on the GPU (that work well for separable and non-separable cases, arbitrary input matrix sizes, and arbitrary kernel sizes) are extremely difficult. I don't think you can beat our implementation. If you can, I will try to entice you away from other pursuits in life.

2) CONV2 (i.e., convolutions) are very useful in many applications and often make more sense than pursuing some other arithmetic expression. I do agree, though, that algorithm/implementation choice is critical and is a decision that should come before optimization efforts. I just think convolutions are an essential tool to which many problems are best boiled down.
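To make the difficulty concrete, here is the textbook naive CUDA conv2: one thread per output pixel, every read from global memory. A tuned implementation beats this with shared-memory tiling, a constant-memory filter, and a separable fast path, and doing all of that correctly for arbitrary sizes is where the months go. Sketch only, nowhere near our implementation:

    // naive 2-D "valid" convolution: one thread per output pixel
    __global__ void conv2_naive(const float *img, const float *filt, float *out,
                                int H, int W, int kH, int kW)
    {
        int x = blockIdx.x * blockDim.x + threadIdx.x;  // output column
        int y = blockIdx.y * blockDim.y + threadIdx.y;  // output row
        int oH = H - kH + 1, oW = W - kW + 1;
        if (x >= oW || y >= oH) return;

        float acc = 0.0f;
        for (int i = 0; i < kH; ++i)
            for (int j = 0; j < kW; ++j)   // flipped filter = true convolution
                acc += img[(y + i) * W + (x + j)]
                     * filt[(kH - 1 - i) * kW + (kW - 1 - j)];
        out[y * oW + x] = acc;
    }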

Comment Re:great! (Score 1) 89

Hand optimization is really tough. Try, for instance, to beat Jacket's CONV2 (yes, I'm talking about straight-up convolutions) by hand. If you can do that, I will expend all my energies to convince you to drop whatever else you are doing and join us at AccelerEyes :)

Jacket is meant to be a luxury, as was mentioned elsewhere... providing a faster, better approach to what you could try to reinvent by hand if you had infinite energy.

The Canny Edge benchmark is a full-blown application (of which Canny edge detection is the major component). The image sizes that are processed are listed in the graph, but there are tons of images being processed in the course of running the full application. We should make that clearer on our website... thanks for pointing that out.

Comment Re:great! (Score 1) 89

To address the topic of open source:

People have been saying that open source would swamp Jacket since we launched in 2007. The reality is that it is too stinking hard to build good open source software (i.e., where the developers aren't paid) when there isn't an enormous user community feeding intangible benefits back to the contributors. Otherwise, we'd open source Jacket and try to live off the service contracts like every other open source project.

So we end up pricing the software in line with what people are used to paying for addons to MATLAB. And Jacket is great, so we end up doing really well with this model.

While GPU computing in MATLAB is too small a niche, M-programming in general is ripe for open sourcing. Octave has never gained any steam and has been around so long that it is stale. Scilab seems good but is stuck in Europe. We would be thrilled to participate with the community in building something that delivers more promise overall. What is certain is that MathWorks has a greater stranglehold on science/engineering than Microsoft does on operating systems.

Comment 3 Years in the Making (Score 4, Informative) 89

I'm CEO of AccelerEyes and have been submitting Slashdot articles referencing updates about using GPUs with MATLAB for several years now. It's great to see it finally getting through, albeit via a reference to the "fake" GPU support which the MathWorks threw into PCT in an attempt to curtail the great success we continue to have with Jacket.

For a full explanation of why I say "fake", read: http://www.accelereyes.com/products/compare

For a brief explanation of why I say "fake" GPU support, consider the question: what does supporting GPUs mean? If you can run an FFT, are you content? Or do you want to use INV, SVD, EIG, RAND, and so on down the list? Jacket has 10X the functionality of PCT-GPU.

Why else is the PCT-GPU implementation weak? Well, it is so poorly constructed (shoehorned into their legacy Java system) that with PCT-GPU it is rarely more beneficial to use the GPU than the CPU. It takes 600 cycles to load-then-store global memory on the GPU (required in each kernel call). The main innovation that led us to build Jacket is the ability to generate as few kernels as possible, eliminating as many of those 600-cycle round trips as possible. For example, Jacket's runtime system may launch only one kernel for every 20 lines of code. PCT-GPU, on the other hand, is limited to launching a GPU kernel for every basic function call.
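A toy illustration of the fusion idea (not Jacket's actual generated code). Unfused, y = a*x + b costs two kernel launches plus an extra global-memory round trip for the temporary; fused, it is one launch with one load and one store per element:

    // unfused: two launches, and the temporary t makes a full
    // global-memory round trip between them
    __global__ void mul_k(const float *x, float a, float *t, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) t[i] = a * x[i];          // kernel 1: store temporary
    }
    __global__ void add_k(const float *t, float b, float *y, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) y[i] = t[i] + b;          // kernel 2: reload temporary
    }

    // fused: one launch, no temporary, one load and one store per element
    __global__ void axpb_k(const float *x, float a, float b, float *y, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) y[i] = a * x[i] + b;
    }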

Jacket also has a GFOR loop, which is the only parallel FOR-loop for GPUs: http://wiki.accelereyes.com/wiki/index.php/GFOR_Usage
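Conceptually (an illustration only, not Jacket's actual codegen), GFOR turns a serial loop of kernel launches into one launch over all iterations. For a loop body like C(:,i) = A(:,i) * b(i), the CUDA equivalent is roughly:

    // all n loop iterations in a single kernel: rows along x, iterations along y
    __global__ void scale_columns(const float *A, const float *b, float *C,
                                  int rows, int n)
    {
        int r = blockIdx.x * blockDim.x + threadIdx.x;  // row index
        int i = blockIdx.y;                             // loop iteration -> column
        if (r < rows && i < n)
            C[i * rows + r] = A[i * rows + r] * b[i];   // column-major, like MATLAB
    }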

I'm not aware of any MATLAB programmer who has had a good experience with PCT-GPU.

Finally, because I'm so thrilled at this getting slashdotted (despite it being a link promoting PCT-GPU), I'd be happy to offer free 3-month Jacket subscriptions to anyone who emails me in the next 48 hours with the word "slashdot" in the subject, at john.melonakos@accelereyes.com

Cheers!

PS: Roblimo, if we can get some blurb love in your summary on the main slashdot.org page, it would really mean a ton to all our guys who have worked on this project for the last 4 years!
