Basically, libdispatch just creates a thread pool for each separate task, then uses some clever magic involving an inter-process semaphore to keep the threads blocked, so that no more threads than necessary (i.e. the number of CPUs) are running at any given time. Nifty, because it means that little needed to change in xnu.
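To make the "keep them blocking" idea concrete, here is a minimal sketch in plain C with pthreads. It is not libdispatch's code: `limiter_t`, `worker`, `run_limited`, and `cpu_count` are hypothetical names made up for illustration, the counting semaphore is built from a mutex and a condition variable inside a single process, and `_SC_NPROCESSORS_ONLN` is assumed to be available (it is a common extension on Linux, macOS, and the BSDs rather than strict POSIX).

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical limiter: a counting semaphore plus bookkeeping. */
typedef struct {
    pthread_mutex_t lock;      /* protects every field below */
    pthread_cond_t  slot_free; /* signalled when a slot is released */
    int free_slots;            /* how many more workers may run right now */
    int running;               /* workers currently holding a slot */
    int peak;                  /* highest value `running` has reached */
} limiter_t;

static void *worker(void *arg) {
    limiter_t *lim = arg;

    /* Acquire a slot: block while none are free. */
    pthread_mutex_lock(&lim->lock);
    while (lim->free_slots == 0)
        pthread_cond_wait(&lim->slot_free, &lim->lock);
    lim->free_slots--;
    lim->running++;
    if (lim->running > lim->peak)
        lim->peak = lim->running;
    pthread_mutex_unlock(&lim->lock);

    /* Stand-in for real work: sleep for one millisecond. */
    struct timespec pause = { 0, 1000000L };
    nanosleep(&pause, NULL);

    /* Release the slot and wake one waiting worker, if any. */
    pthread_mutex_lock(&lim->lock);
    lim->running--;
    lim->free_slots++;
    pthread_cond_signal(&lim->slot_free);
    pthread_mutex_unlock(&lim->lock);
    return NULL;
}

/* Starts `tasks` threads but lets at most `limit` of them run at once.
   Returns the peak number of simultaneously running workers, or -1 if
   the thread array could not be allocated. */
int run_limited(int tasks, int limit) {
    assert(tasks >= 0 && limit > 0); /* limit == 0 would block forever */

    limiter_t lim;
    pthread_mutex_init(&lim.lock, NULL);
    pthread_cond_init(&lim.slot_free, NULL);
    lim.free_slots = limit;
    lim.running = 0;
    lim.peak = 0;

    pthread_t *threads = malloc(sizeof *threads * (size_t)(tasks > 0 ? tasks : 1));
    if (threads == NULL)
        return -1;

    int started = 0;
    for (int i = 0; i < tasks; i++) {
        if (pthread_create(&threads[i], NULL, worker, &lim) != 0)
            break; /* stop creating threads; join the ones that started */
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(threads[i], NULL);

    free(threads);
    pthread_cond_destroy(&lim.slot_free);
    pthread_mutex_destroy(&lim.lock);
    return lim.peak;
}

/* Number of online CPUs, falling back to 1 if the query fails. */
int cpu_count(void) {
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return n > 0 ? (int)n : 1;
}
```

Calling `run_limited(16, cpu_count())` starts sixteen threads, but the value it returns (the peak number running at once) never exceeds the CPU count; the rest sit blocked on the condition variable until a slot frees up. Build with something like `cc -pthread`.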
libdispatch is also, theoretically, portable to other platforms, as long as one can provide a blocks runtime and a compiler capable of handling blocks. I noticed a patch on LLVM's mailing list today providing a Linux port of the blocks runtime, and llvm-gcc and clang both handle blocks and both run on Linux.