Submission + - Facebook Open-Sources Its Presto SQL Query Engine (slashdot.org)

Nerval's Lobster writes: Facebook stores its warehouse data in a set of enormous Hadoop/HDFS-based clusters. That helps the social network wrestle with the enormous amounts of user information it needs to store and analyze every day. But once the warehouse grew to petabyte scale, its network administrators decided they needed something other than Hadoop MapReduce and Hive to process that data efficiently. Enter Presto, Facebook’s own distributed SQL query engine, designed with a focus on speed. The platform supports standard ANSI SQL, including complex queries, aggregations, joins, and window functions. Presto is also built to be scalable and flexible: with the addition of key plugins, it can query Facebook data that is not stored in HDFS clusters, such as data in HBase and custom systems. It doesn’t rely on MapReduce for processing, which is what lets it return query results quickly. Facebook engineers began developing Presto near the end of 2012 and rolled it out to the entire company the following spring; employees currently use it to run roughly 30,000 queries per day, which together process around one petabyte of data. Now that the platform is stable enough, Facebook is open-sourcing it via GitHub and a dedicated website. All you need is an enormous amount of data that threatens to overwhelm your current setup.
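
For readers who want a concrete picture of the query shapes mentioned in the summary (a join, an aggregation, and a window function in one statement), here is a minimal sketch. It is an illustration only: it runs against an in-memory SQLite database as a local stand-in, not against Presto, and the `users` and `events` tables and their columns are hypothetical. It assumes the SQLite bundled with your Python is version 3.25 or newer, which is when window functions were added.

```python
import sqlite3

# SQLite is a local stand-in for a SQL engine here; this is NOT Presto.
# The tables, columns, and rows below are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, country TEXT);
    CREATE TABLE events (user_id INTEGER, bytes INTEGER);
    INSERT INTO users VALUES (1, 'US'), (2, 'US'), (3, 'DE');
    INSERT INTO events VALUES (1, 100), (1, 50), (2, 300), (3, 70);
""")

query = """
    WITH totals AS (
        -- Join + aggregation: total bytes per user, tagged with country.
        SELECT u.country, u.user_id, SUM(e.bytes) AS total_bytes
        FROM events AS e
        JOIN users AS u ON u.user_id = e.user_id
        GROUP BY u.country, u.user_id
    )
    -- Window function: rank users within each country by total bytes.
    SELECT country, user_id, total_bytes,
           RANK() OVER (PARTITION BY country ORDER BY total_bytes DESC)
               AS rank_in_country
    FROM totals
    ORDER BY country, rank_in_country
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The statement itself is plain ANSI-style SQL, so a similar query should carry over to other engines that support window functions, though table names and any engine-specific syntax would need to be adapted.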
