
Comment Article contains plenty of misleading comments (Score 5, Insightful) 37

First of all, this article isn't a comparison or matchup - it's just a speculative post by someone who has done very little research and obviously lacks domain knowledge in the space. There is no mention of use cases, data sizes, performance, or costs.

Hadoop is an open-source framework for distributed data processing, specifically an implementation of the MapReduce programming model. BigQuery is a hosted service that allows you to run queries over massive datasets via an API. There are tools built on top of Hadoop that allow for fast querying over large datasets (Impala), and there are even tools that are not Hadoop-based that provide this as well (Spark + Shark). However, actually using these tools is a whole different game - the author makes no mention of how many nodes/VMs would be required to match the query performance of BigQuery.
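For readers unfamiliar with the model, here's a minimal sketch of what MapReduce means: a word count expressed as map and reduce phases in plain Python. This is only an illustration of the programming model, not Hadoop itself - Hadoop would run the same two phases distributed across many nodes, with a shuffle/sort step between them:

```python
from itertools import groupby
from operator import itemgetter

# Sketch of the MapReduce model: map emits (key, value) pairs,
# the framework shuffles/sorts by key, and reduce aggregates
# each key's values.

def map_phase(lines):
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def reduce_phase(pairs):
    # Simulate the shuffle/sort: group all values for the same key
    for word, group in groupby(sorted(pairs, key=itemgetter(0)),
                               key=itemgetter(0)):
        yield (word, sum(count for _, count in group))

counts = dict(reduce_phase(map_phase(["big query big data", "big data tools"])))
print(counts)  # {'big': 3, 'data': 2, 'query': 1, 'tools': 1}
```

The point of the model is that both phases are embarrassingly parallel: mappers can run on separate chunks of the input, and reducers on separate keys, which is what lets Hadoop scale the same logic across a cluster.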

Then there are data sizes. The author makes a strange claim that BigQuery "queries don’t run instantly; one of the samples took 3.3 seconds to grind through 3.49 Gigabytes of data. But that’s clearly fine for quick lookups." Huhn? What tool(s) are you comparing against? BigQuery allows users to run full-table aggregate ad-hoc queries over really, really big datasets (i.e. terabytes). In public talks, Google has demonstrated that it is possible to run regular expression match queries, with sums and aggregations, over several terabytes of data in under a minute. To do this with a MapReduce-based system, what would need to be done - use something like Hive, or write a custom MapReduce job? - and what is the performance in that case? For the same use case, what is the cost of using some of the "OLAP" tools that the author describes? Would love to see some benchmarks.
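To make that concrete, a query of the kind described - a regex match plus aggregation over a full table - looks roughly like this in BigQuery's SQL dialect. The table and column names below are invented for illustration; REGEXP_MATCH is the regex function in BigQuery's (legacy) SQL:

```python
# Hypothetical sketch: a full-scan regex + aggregation query of the
# kind BigQuery is built for. Table and column names are invented.
query = """
SELECT COUNT(*) AS matching_rows,
       SUM(response_bytes) AS total_bytes
FROM [mydataset.weblogs]
WHERE REGEXP_MATCH(user_agent, r'Firefox/[0-9]+')
"""
print(query.strip())
```

Expressing the same thing on Hadoop means either writing it in HiveQL (which compiles to MapReduce jobs) or hand-coding a MapReduce job whose mapper applies the regex and whose reducer sums the counts - and the interesting benchmark would be the wall-clock time and node count for each.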

Re: "In the end, BigQuery is just another database."

Huhn? BigQuery is not a database at all - it doesn't support CRUD operations on data - rather, it is an append-only analytics tool. And databases, relational or not, aren't really the right tools for full-table-scan ad-hoc queries over many terabytes, which is exactly what BigQuery is designed for. BigQuery is a developer's product, one that can be integrated with existing web apps via a RESTful API. Hadoop has its own development role and story (and tools like Cascading are really great), but it's not designed as the backend for interaction via a RESTful API out of the box - it takes a bit more work to provide Hadoop as a service for developers to integrate with an application.
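As a sketch of that "developer's product" point, here is roughly what a synchronous query against the BigQuery v2 REST API looks like from an application. This only builds the request - actually sending it requires attaching an OAuth 2.0 Bearer token - and the project ID is a placeholder:

```python
import json

def build_bigquery_request(project_id, sql, timeout_ms=10000):
    """Build the URL and JSON body for a synchronous BigQuery v2 query.

    Sketch only: the caller is assumed to attach an OAuth token and
    POST the body; nothing is sent over the network here.
    """
    url = ("https://www.googleapis.com/bigquery/v2/projects/"
           f"{project_id}/queries")
    body = json.dumps({"query": sql, "timeoutMs": timeout_ms})
    return url, body

url, body = build_bigquery_request(
    "my-project",  # placeholder project ID
    "SELECT COUNT(*) FROM [publicdata:samples.shakespeare]")
print(url)
```

That one POST-and-parse-JSON round trip is the whole integration story for a web app; standing up the equivalent query endpoint in front of a Hadoop cluster is a project in itself.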

Re: "The public version of BigQuery probably isn't even used by Google, which likely has something bigger and better that we'll see in five years or so."

BigQuery is based on Google's internal Dremel, which is used every day at Google. There is a very public research paper describing Dremel (much the same as how Google described MapReduce years ago). It's worth reading about what is available in Dremel versus what is available in BigQuery.

The Courts

Submission + - U.S. court denies Webcasters' stay petition

Michael Manoochehri writes: "Reuters reports that a 'federal appeals court has denied a petition by U.S. Internet radio stations seeking to delay a royalty rate hike due July 15 they say could kill the fledgling industry.' This royalty rate hike, put forth by the US Copyright Royalty Board, will increase royalty rates for webcast music tremendously, in some cases to more per year than many webcasters bring in as revenue. Save Net Radio, a coalition of webcasters, is telling listeners that 'We are appealing to the millions of Internet radio listeners out there, the webcasters they support and the artists and labels we treasure to rise up and make your voices heard again before this vibrant medium is silenced.'"

Submission + - Firefox 2.0 contains 'highly critical' security flaw

SpiritGod21 writes: Ruben Francia at posted this morning that "users of Firefox 2.0 and above are being warned of a 'highly critical' security glitch that could allow a hacker to execute arbitrary commands and take control of their computer."

"The problem is that Firefox registers the 'firefoxurl://' URI handler and allows Firefox to invoke arbitrary command line arguments. Using the '-chrome' parameter it is possible to execute arbitrary Javascript in chrome context," security research firm Secunia noted on its Web site.


Submission + - A. Gonzales' Intellectual Property Protection Act

Michael Manoochehri writes: "Wired News, and a number of other outlets, have reported that a new proposed law, entitled the 'Intellectual Property Protection Act,' would for the first time criminalize attempted copyright infringement. It's easy to pooh-pooh, but remember: the RIAA and co.'s current anti-consumer legal strategy operates on the assumption that an I.P. address is proof of guilt. The text of the law is here. From the Wired article: The IPPA would come down harder on those found to have violated the DMCA, subjecting them to new forfeiture and restitution provisions. 'Any property used, or intended to be used, in any manner or part, to commit or facilitate the commission of the offense' of violating the DMCA could be confiscated, according to the text of the legislation."

Slashdot Top Deals

"There is no statute of limitations on stupidity." -- Randomly produced by a computer program called Markov3.