Why haven't you written such a thing before? Because it's too much hassle. Which is the very reason threading is underused.
LOL. Actually there's a better reason such a thread launch facility doesn't commonly get written - which is that, in most circumstances, it really doesn't help performance that much, if at all - and the added complexity makes for a big net minus. There are a number of issues:
Firstly, spawning threads is expensive. Yes, on Linux it's "cheap", but only compared to other implementations - it's still a lot compared to doing a modest amount of work on the local CPU. Why is it so expensive? Basically because there's a lot of housekeeping to do: the kernel has to create new kernel structures for the new thread of execution (much as it would for a new process), and the process's thread library has to allocate a stack for the new thread (which means modifying the process's page tables) and iterate through all loaded shared libraries to allocate any thread-local storage they require - all of which involves multiple syscalls, a TLB flush, and at least one context switch. To some extent this overhead can be reduced by maintaining a pool of pre-created threads, but that either takes away control of performance (if done automatically by your language/library) or substantially increases complexity (if you implement it yourself, since you then have to synchronise the pool's threads carefully).
The second problem is that, unless you're very careful, extra threads don't buy you much performance, and can actively hurt. Take the example you gave - doing some processing on each struct in an array, where each struct contains an int and a double (16 bytes total, including alignment padding). With 64-byte cache lines (typical on x86), that's 4 structs per cache line. If you distribute the processing across threads running on different cores, then instead of one core waiting for the cache line to come in from main memory and then processing all 4 structs very rapidly (since they're now in cache), you'll have up to 4 cores each waiting for the same data - i.e. up to a 4x slowdown for memory-bound work. And that's assuming the structs are only read from; if they're written to as well, the cache line has to bounce between cores (the classic "false sharing" problem), and the multithreading slowdown becomes many times worse. Now, if you ensure that structs in the same cache line get processed by the same core (ideally in sequence, and by the same kernel thread), then you do potentially get a big speedup - provided you don't hit any other gotchas - but the C++ code you're promoting doesn't seem to guarantee this in any way.
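To make the cache-line-aware partitioning concrete, here's one possible sketch - the Item struct and the body of process_chunk are made up for the example; the only point is that each thread gets a contiguous chunk whose boundaries are rounded to whole cache lines, so no line is shared between cores:

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

struct Item {          // 16 bytes with padding: 4 per 64-byte cache line
    int    id;
    double value;
};

// Process items[begin, end) on one thread. Because chunks are contiguous
// and chunk boundaries fall on cache-line multiples, each cache line is
// touched by exactly one core.
void process_chunk(std::vector<Item>& items, std::size_t begin, std::size_t end) {
    for (std::size_t i = begin; i < end; ++i)
        items[i].value *= items[i].id;   // placeholder work
}

void process_parallel(std::vector<Item>& items, unsigned n_threads) {
    constexpr std::size_t kPerLine = 64 / sizeof(Item);  // structs per cache line
    const std::size_t per_thread = (items.size() + n_threads - 1) / n_threads;
    // Round each chunk up to a whole number of cache lines.
    const std::size_t chunk = ((per_thread + kPerLine - 1) / kPerLine) * kPerLine;

    std::vector<std::thread> workers;
    for (unsigned t = 0; t < n_threads; ++t) {
        std::size_t begin = t * chunk;
        if (begin >= items.size()) break;
        std::size_t end = std::min(begin + chunk, items.size());
        workers.emplace_back(process_chunk, std::ref(items), begin, end);
    }
    for (auto& w : workers) w.join();   // wait for all workers to finish
}
```

(This still pays the spawn cost per call, of course, and assumes the vector's storage starts on a cache-line boundary - which std::vector doesn't guarantee - so treat it as an illustration of the chunking idea, not production code.)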
Third, and perhaps most importantly, data dependencies matter. In your example you're detaching all the threads; this isn't realistic, because it means you can never depend on their operations having finished. In the vast majority of cases you do need to know when an operation has finished: you're generally doing work for a reason - namely that you're going to use the result - and you can't begin to use that result until you know it has been produced. That, in and of itself, adds complexity: you have to analyse your program's dataflow much more carefully in the presence of threads, because C/C++ will quite happily let you read a variable before another thread has finished assigning to it, without any sort of warning or exception. The analysis can certainly be done, and synchronisation put in place to eliminate the problems - but that is further overhead, both to the program's performance and to the complexity of the program itself, and hence to the time taken to write it (and especially to enhance it later, when the synchronisation model may no longer be fresh in your mind).
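By contrast, here's a sketch of making the dependency explicit instead of detaching - parallel_sum is a made-up example function, but the pattern (a future whose get() blocks until the worker is done) is the standard std::async one:

```cpp
#include <cstddef>
#include <future>
#include <numeric>
#include <vector>

// Hand the worker's result back through a future, so the caller knows
// exactly when the data is ready to use - no detached thread racing us.
long parallel_sum(const std::vector<int>& v) {
    std::size_t mid = v.size() / 2;

    // First half computed on another thread...
    std::future<long> lo = std::async(std::launch::async, [&] {
        return std::accumulate(v.begin(), v.begin() + mid, 0L);
    });
    // ...second half on this one, concurrently.
    long hi = std::accumulate(v.begin() + mid, v.end(), 0L);

    // get() blocks until the worker has finished: the data dependency is
    // explicit, rather than a silent race on a half-written variable.
    return lo.get() + hi;
}
```

The synchronisation is still there - it's just been pushed into get() - which is exactly the overhead and design burden described above.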
Used correctly and in the right circumstances, threads on an N-core system can give an N-times speedup (or even greater, thanks to caching effects). Used badly, at best they'll reduce performance, and usually they'll also increase complexity and lead to subtle bugs that are hard to reproduce and debug.
The new thread features in modern C++ are very cool, but the fact they didn't exist before is not what's been preventing competent programmers from using threads all over the place :)