Comment Why hashing? (Score 1) 164
Using Oracle RDBMS, I see that for queries over very large data sets, a hash join causes lots of disk activity (lots of paging, going to swap).
Though hash functions are fast, that performance comes from probing a hash table that's fully mapped in memory.
Once your hash table gets too big for the available memory, you start using disk space (unindexed, sequential full reads).
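To make the spill concrete, here is a rough sketch of what I mean, not how Oracle actually implements it. The names (`hash_join`, `max_rows_in_memory`, `num_partitions`) and the pickle temp files are my own assumptions for illustration. When the build side fits in the budget it is a plain in-memory hash join; when it doesn't, both inputs get partitioned to disk by hash and each partition is read back sequentially.

```python
import os
import pickle
import tempfile
from collections import defaultdict


def in_memory_hash_join(build_rows, probe_rows):
    """Classic hash join: build a dict on one side, probe it with the other."""
    table = defaultdict(list)
    for key, value in build_rows:
        table[key].append(value)
    for key, value in probe_rows:
        for match in table.get(key, ()):
            yield key, match, value


def _spill(rows, num_partitions, tmpdir, prefix):
    """Write each row to the partition file chosen by hash(key)."""
    paths = [os.path.join(tmpdir, f"{prefix}_{i}.bin") for i in range(num_partitions)]
    files = [open(p, "wb") for p in paths]
    try:
        for key, value in rows:
            pickle.dump((key, value), files[hash(key) % num_partitions])
    finally:
        for f in files:
            f.close()
    return paths


def _read(path):
    """Sequentially read back every row of one partition file."""
    with open(path, "rb") as f:
        while True:
            try:
                yield pickle.load(f)
            except EOFError:
                return


def hash_join(build_rows, probe_rows, max_rows_in_memory=1000, num_partitions=8):
    """Hypothetical join with a memory budget (assumed, not Oracle's algorithm)."""
    build_rows = list(build_rows)  # simplification: a real engine would stream this
    if len(build_rows) <= max_rows_in_memory:
        return list(in_memory_hash_join(build_rows, probe_rows))
    results = []
    with tempfile.TemporaryDirectory() as tmpdir:
        # Too big for memory: partition both sides to disk, then join
        # matching partitions one at a time with full sequential reads.
        build_parts = _spill(build_rows, num_partitions, tmpdir, "build")
        probe_parts = _spill(probe_rows, num_partitions, tmpdir, "probe")
        for b, p in zip(build_parts, probe_parts):
            results.extend(in_memory_hash_join(_read(b), _read(p)))
    return results
```

Every row on both sides gets written out once and read back once in the spilled case, which is the extra disk activity I'm talking about.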
Isn't this a bottleneck in a distributed database that relies on hash functions?
Wouldn't you want to have a distributed DB based on a distributed version of a B-Tree descendant (B+Tree, B*Tree, B**Tree)
that would use memory AND storage, and scale out beyond just the available memory on all your nodes?
Not only that, but you'd likely have better performance on range scans.
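A toy illustration of the range scan point, using a sorted Python list as a stand-in for the ordered leaf level of a B+Tree (that stand-in and the function names are my assumptions, this is not a real B+Tree). With ordered keys you binary-search to the start of the range and read a contiguous run; with a hash table you have to look at every key.

```python
import bisect

keys = list(range(0, 1000, 5))            # sorted keys, like B+Tree leaves
hashed = {k: f"row{k}" for k in keys}     # same keys in a hash table


def range_scan_sorted(sorted_keys, lo, hi):
    """Two binary searches, then one contiguous slice."""
    start = bisect.bisect_left(sorted_keys, lo)
    end = bisect.bisect_right(sorted_keys, hi)
    return sorted_keys[start:end]


def range_scan_hashed(table, lo, hi):
    """No key ordering to exploit: test every key, then sort the hits."""
    return sorted(k for k in table if lo <= k <= hi)


print(range_scan_sorted(keys, 12, 31))
print(range_scan_hashed(hashed, 12, 31))
```

Both calls return the same keys, but the hashed version touches all of them to get there.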
Just thinking...