Although generality is good, we might ask what the "ideal" abstraction for a particular application is. In my opinion, it is a programming language that is designed precisely for that application: one in which a person can quickly and effectively develop a complete software system. It is not general at all; it should capture precisely the semantics of the application domain - no more and no less. In my opinion, a domain-specific language is the "ultimate abstraction".
One approach is not to directly program the GPU, but to use library provided (domain-specific) high-level parallel primitives (map,fold,reduce,..) to describe the computation. The library in question then compiles the final low-level code. These libraries are often implemented as domain-specific embedded languages. Topic is a subject for active research, but some more or less mature implementations already exist, some of which are:
- thrust provides STL-like algorithms for C++ while targeting CUDA and OpenMP as backends.
- ArBB implements a parallel array programming library for C++ built on a general purpose virtual machine targeting SSE, AVX and possibly MIC in the future.
- accelerate is an embedded language for array computations in Haskell and at the moment implements backends for CUDA and ArBB.