To achieve these results, the author (Anton) recently released a new version of the code with a faster implementation of database JOIN. The code uses the Thrust library for fast parallel SORT, SELECT, and SET algorithms, and the CUDPP library to implement a parallel hash JOIN.
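A hash JOIN works in two phases: a build phase that hashes one relation on its join key, and a probe phase that looks up each row of the other relation in that hash table. The sketch below is a minimal sequential CPU illustration of that build/probe structure in plain Python. It is not Alenka's code and does not use the CUDPP API; on a GPU, the build and probe loops would run in parallel across rows. The function name `hash_join` and the TPC-H-style column names in the usage example are assumptions chosen for illustration.

```python
from collections import defaultdict


def hash_join(build_rows, probe_rows, build_key, probe_key):
    """Equi-join two lists of dict rows with a build/probe hash join (illustrative sketch)."""
    # Build phase: hash every row of the (ideally smaller) build relation on its join key.
    table = defaultdict(list)
    for row in build_rows:
        table[row[build_key]].append(row)

    # Probe phase: look up each probe row's key and emit one merged row per match.
    # Using .get() avoids inserting empty entries for keys that have no match.
    out = []
    for row in probe_rows:
        for match in table.get(row[probe_key], ()):
            out.append({**match, **row})
    return out


# Hypothetical usage: join orders to line items on the order key.
orders = [{"o_orderkey": 1, "o_custkey": 10}, {"o_orderkey": 2, "o_custkey": 20}]
lineitem = [
    {"l_orderkey": 1, "l_quantity": 5},
    {"l_orderkey": 1, "l_quantity": 3},
    {"l_orderkey": 3, "l_quantity": 7},  # no matching order, so it is dropped
]
joined = hash_join(orders, lineitem, "o_orderkey", "l_orderkey")
```

Building the table on the smaller relation keeps the hash table compact, which matters even more on a GPU, where the table has to fit in device memory.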
While the codebase is not a complete implementation of SQL, it can execute several queries from TPC-H (an industry-standard data-warehousing benchmark). For Query 1 (SELECT, GROUP-BY), Alenka processes a 100GB dataset in 9.5 seconds, compared to 42.3 seconds for the HP system. For Query 3 (JOIN, GROUP-BY, SORT), Alenka takes 5.3 seconds, compared to 4.3 seconds for the HP system.
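A common way to implement GROUP-BY with aggregation on a GPU using Thrust is to sort the rows by the grouping key (`thrust::sort_by_key`) and then collapse each run of equal keys with a segmented reduction (`thrust::reduce_by_key`). Whether Alenka uses exactly this approach for Query 1 is an assumption here. The Python sketch below mirrors those two steps sequentially on the CPU; the function name `sort_reduce_by_key` and the sample values are hypothetical, and the keys loosely imitate Query 1's grouping on return flag and line status.

```python
from itertools import groupby
from operator import itemgetter


def sort_reduce_by_key(keys, values):
    """GROUP-BY ... SUM via sort + segmented reduction (illustrative sketch)."""
    # Step 1: sort (key, value) pairs by key so equal keys become contiguous,
    # analogous to thrust::sort_by_key.
    pairs = sorted(zip(keys, values), key=itemgetter(0))

    # Step 2: collapse each run of equal keys into a single (key, sum) pair,
    # analogous to thrust::reduce_by_key.
    out_keys, out_sums = [], []
    for key, run in groupby(pairs, key=itemgetter(0)):
        out_keys.append(key)
        out_sums.append(sum(v for _, v in run))
    return out_keys, out_sums


# Hypothetical usage: sum quantities grouped by (return flag, line status).
keys = [("A", "F"), ("N", "O"), ("A", "F"), ("R", "F")]
quantities = [17, 36, 8, 28]
group_keys, group_sums = sort_reduce_by_key(keys, quantities)
```

Sorting first turns an irregular grouping problem into contiguous segments, which is what lets a GPU reduce every group in parallel without a shared hash table.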
It will be interesting to see whether Alenka can deliver similar results across the entire TPC-H benchmark suite, and whether other database implementations can be accelerated by GPUs.
The source code for the Alenka system is available on GitHub.