Nerval's Lobster writes: Facebook stores its warehouse data in a set of enormous Hadoop/HDFS-based clusters. That helps the social network wrestle with the huge volume of user information it needs to store and analyze every day, but at a certain point (namely, once the warehouse grew to petabyte scale), its network administrators decided they needed something other than Hadoop MapReduce and Hive to process that data efficiently. Enter Presto, Facebook’s own distributed SQL query engine, designed with a focus on speed. The platform supports standard ANSI SQL, which means it’s capable of everything from complex queries and aggregations to joins and window functions. Presto is also built to be scalable and flexible; for example, with the addition of key plugins, it can handle Facebook data stored outside HDFS, such as data in HBase and custom systems. It doesn’t rely on MapReduce for processing, which lets it return query results faster. Facebook engineers began developing Presto near the end of 2012 and rolled it out to the entire company the following spring; employees currently use it to run roughly 30,000 queries per day, which together process around one petabyte of data. And now that the platform is stable enough, Facebook is open-sourcing it via GitHub and a dedicated website. Now all you need is an enormous amount of data that threatens to overwhelm your current setup.