James_Intel - Slashdot User

Comment Re:Performance Comparison? (Score 1) 158

by James_Intel on Friday July 27, 2007 @11:14AM (#20011143) Attached to: Intel Releases Threading Library Under GPL 2

TBB's not a replacement for OpenMP or MPI. If either work for you - you should use them. OpenMP is usually use as a shared memory programming model, and works well for Fortran and much C code. TBB is aimed for C++ and C programs, also shared memory. MPI is a distributed memory programming model.

First and foremost - TBB is easy to use, and still high performance.

A critical item for high performance in parallel programs is scaling - and TBB helps get better scaling more easily than you would find with hand coding. But OpenMP and MPI generally encourage/lead to scaling.

Another key on shared memory machines is managing caches well. TBB, again, does very well.

Benchmarks... it would be good to get ideas on what we should show. Here is what we have looked at / seen:
(1) Comparing with code written using pthreads/windows threads, TBB code is much easier to write and debug. We've seen serious programmers who don't write lots of parallel code be unable to get the hand threaded code to work, but get TBB to work. In each case I've seen - the programmer had to stop with hand threading because they just had to go work on something else after their effort to add pthreads to a big program didn't work well enough. We've seen experienced programmers get good scaling with TBB the first time, something they have to spend time on with hand coded threads.
(2) If a program can be written with OpenMP, we've seen comparable performance between OpenMP and TBB on first implementations of code - but then further tuning can lead to OpenMP out performing TBB if the OpenMP dynamic scheduling can be turned off as an advantage (use 'static scheduling' for a boost in speed). TBB is always dynamic, and will get beat in such cases. Of course, dynamic scheduling can be a huge win for many cases with problems that are even a little bit irregular.
(3) Distributed memory code (using MPI or a 'cluster' version of OpenMP) - usually out performs everything else. This is because, as a developer, you need to work out how to write a program with minimal dependencies between code running on each node of the computer. This work, is not easy... but once done (if it can be done) - the program has few synchronizations, not memory contention, etc. I don't recommend MPI for anyone thinking of coding for 2-4 cores and starting parallelism for the first time.

Slashdot Top Deals