Other approaches to scalable SQL
There are also two Hadoop subprojects that either support SQL or will shortly; both translate SQL queries into map/reduce programs.
There isn't a C++ port of Hadoop's map/reduce, but there is a C++ interface to the Java code. It is used by Yahoo's WebMap, which is the largest Hadoop application. It lets you write your mapper and reducer code as C++ classes.
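For a flavor of the Pipes interface, here is a sketch along the lines of the canonical word-count example from the Hadoop documentation. It is also the shape of map/reduce program a simple SELECT word, COUNT(*) ... GROUP BY word query would be translated into. The class names are illustrative.

    #include <string>
    #include <vector>
    #include "hadoop/Pipes.hh"
    #include "hadoop/TemplateFactory.hh"
    #include "hadoop/StringUtils.hh"

    // Map phase: emit (word, "1") for every word in the input line.
    class WordCountMapper : public HadoopPipes::Mapper {
    public:
      WordCountMapper(HadoopPipes::TaskContext& context) {}
      void map(HadoopPipes::MapContext& context) {
        std::vector<std::string> words =
            HadoopUtils::splitString(context.getInputValue(), " ");
        for (size_t i = 0; i < words.size(); ++i) {
          context.emit(words[i], "1");
        }
      }
    };

    // Reduce phase: sum the counts emitted for each word.
    class WordCountReducer : public HadoopPipes::Reducer {
    public:
      WordCountReducer(HadoopPipes::TaskContext& context) {}
      void reduce(HadoopPipes::ReduceContext& context) {
        int sum = 0;
        while (context.nextValue()) {
          sum += HadoopUtils::toInt(context.getCurrentValue());
        }
        context.emit(context.getInputKey(), HadoopUtils::toString(sum));
      }
    };

    int main(int argc, char* argv[]) {
      // The framework drives the task; we only supply the classes.
      return HadoopPipes::runTask(
          HadoopPipes::TemplateFactory<WordCountMapper, WordCountReducer>());
    }

The Java framework still handles splitting, shuffling, and task scheduling; the C++ code only runs inside the map and reduce slots.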
The Hadoop Distributed File System (HDFS) also has C bindings to let C programs access the system. If you want another alternative, the Kosmos File System (KFS) is also a distributed file system and was written in C++. Hadoop includes bindings for HDFS and KFS, so that the application code can transparently use either at run time depending on the path (hdfs://server/path instead of kfs://server/path).
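As a rough sketch of the C bindings (libhdfs), the following writes a small file to HDFS. The namenode address, path, and error handling are simplified assumptions, following the pattern of the libhdfs example program.

    #include <cstdio>
    #include <cstring>
    #include <fcntl.h>   // O_WRONLY, O_CREAT
    #include "hdfs.h"

    int main() {
      // "default" picks up the namenode address from the Hadoop
      // configuration on the classpath.
      hdfsFS fs = hdfsConnect("default", 0);
      if (fs == NULL) { fprintf(stderr, "connect failed\n"); return 1; }

      const char* path = "/tmp/example.txt";  // illustrative path
      hdfsFile out = hdfsOpenFile(fs, path, O_WRONLY | O_CREAT, 0, 0, 0);
      if (out == NULL) { fprintf(stderr, "open failed\n"); return 1; }

      const char* msg = "hello, hdfs\n";
      hdfsWrite(fs, out, (void*)msg, strlen(msg));
      hdfsFlush(fs, out);
      hdfsCloseFile(fs, out);
      hdfsDisconnect(fs);
      return 0;
    }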
In sorting a terabyte, Hadoop beat Google's time (62 seconds versus 68). For the petabyte sort, Google was faster (6 hours versus 16). The hardware was of course different (figures from Yahoo's blog and Google's blog):
Terabyte sort:
  Machines: Yahoo 1,407; Google 1,000
  Disks: Yahoo 5,628; Google 12,000
Petabyte sort:
  Machines: Yahoo 3,658; Google 4,000
  Disks: Yahoo 14,632; Google 48,000
Yahoo published its network specifications, but Google did not. Network bandwidth is clearly very relevant to a distributed sort.
The two takeaway points: Hadoop is getting faster, and it is closing in on Google's performance and scalability.
It's a naive, domestic operating system without any breeding, but I think you'll be amused by its presumption.