Comment Other approaches to scalable SQL (Score 1) 122

by owenomalley on Tuesday July 21, 2009 @05:32PM (#28775273) Attached to: Researchers Create Database-Hadoop Hybrid

There are also two Hadoop subprojects that either support SQL or will shortly. They both translate SQL queries into map/reduce programs. They are:

http://hadoop.apache.org/pig/
http://hadoop.apache.org/hive/
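
To make "translate SQL queries into map/reduce programs" concrete, here is a rough sketch of how a query like SELECT page, COUNT(*) FROM visits GROUP BY page could be expressed as a map step and a reduce step. This is not how Pig or Hive are implemented; the table, the rows, and every function name below are made up for illustration, and the "shuffle" is just an in-memory dictionary.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical rows of a "visits" table (illustrative data only).
VISITS: List[Dict[str, str]] = [
    {"user": "a", "page": "/home"},
    {"user": "b", "page": "/home"},
    {"user": "a", "page": "/about"},
]


def map_phase(row: Dict[str, str]) -> Iterator[Tuple[str, int]]:
    """GROUP BY column becomes the key; each row contributes a count of 1."""
    yield (row["page"], 1)


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all emitted values by key, as the framework would between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """COUNT(*) becomes a sum over the values collected for one key."""
    return (key, sum(values))


def run_query(rows: Iterable[Dict[str, str]]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in a single process."""
    pairs = (pair for row in rows for pair in map_phase(row))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(run_query(VISITS))
```

In a real cluster the map and reduce calls would run on different machines and the shuffle would move data over the network; the sketch only shows the shape of the translation.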

Comment Re:C++ port of Java Hadoop? (Score 1) 139

by owenomalley on Sunday May 17, 2009 @06:59PM (#27989567) Attached to: Open Source Solution Breaks World Sorting Records

There isn't a C++ port of Hadoop's map/reduce, but there is a C++ interface to the Java code. It is used by Yahoo's WebMap, which is the largest Hadoop application. It lets you write your mapper and reducer code as C++ classes.

The Hadoop Distributed File System (HDFS) also has C bindings to let C programs access the system. If you want another alternative, the Kosmos File System (KFS) is also a distributed file system and was written in C++. Hadoop includes bindings for HDFS and KFS, so that the application code can transparently use either at run time depending on the path (hdfs://server/path instead of kfs://server/path).
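
The idea of picking the file system from the path can be sketched as a lookup on the URI scheme. This is only an illustration of the dispatch pattern, not Hadoop's actual API: the class names, the registry, and the read_first helper are all hypothetical, and the "clients" do no I/O.

```python
from typing import Dict, Tuple, Type
from urllib.parse import urlparse


class FileSystem:
    """Hypothetical base class for a distributed file system client."""

    scheme = ""

    def __init__(self, server: str) -> None:
        self.server = server

    def describe(self, path: str) -> str:
        # A real client would open the file; this only reports what it would do.
        return f"{self.scheme}@{self.server}:{path}"


class HdfsFileSystem(FileSystem):
    scheme = "hdfs"


class KfsFileSystem(FileSystem):
    scheme = "kfs"


# Registry mapping a URI scheme to the class that handles it (an assumption).
REGISTRY: Dict[str, Type[FileSystem]] = {
    "hdfs": HdfsFileSystem,
    "kfs": KfsFileSystem,
}


def resolve(uri: str) -> Tuple[FileSystem, str]:
    """Choose a file system at run time from the scheme of the path."""
    parts = urlparse(uri)
    try:
        fs_class = REGISTRY[parts.scheme]
    except KeyError:
        raise ValueError(f"no file system registered for {parts.scheme!r}")
    return fs_class(parts.netloc), parts.path


def read_first(uri: str) -> str:
    """Application code: identical regardless of which file system is used."""
    fs, path = resolve(uri)
    return fs.describe(path)


if __name__ == "__main__":
    print(read_first("hdfs://server/path"))
    print(read_first("kfs://server/path"))
```

The application function never names HDFS or KFS; changing the prefix of the path is the only thing that switches the backend.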

Comment Re:Not quite as impressive as it sounds (Score 4, Interesting) 139

by owenomalley on Saturday May 16, 2009 @01:06PM (#27979959) Attached to: Open Source Solution Breaks World Sorting Records

In sorting a terabyte, Hadoop beat Google's time (62 versus 68 seconds). For the petabyte sort, Google was faster (6 hours versus 16 hours). The hardware is of course different; the figures below are from Yahoo's blog and Google's blog.

Terabyte:
Machines: Yahoo 1,407, Google 1,000
Disks: Yahoo 5,628, Google 12,000
Petabyte:
Machines: Yahoo 3,658, Google 4,000
Disks: Yahoo 14,632, Google 48,000
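
To put the hardware difference in numbers, here is a small back-of-the-envelope calculation using only the machine counts, disk counts, and times quoted in this comment. It assumes 1 terabyte = 10^12 bytes and 1 petabyte = 10^15 bytes, and it says nothing about the network, CPUs, or disk models; the Run class and its method names are just for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    """One sort run, using the figures quoted in the comment."""

    machines: int
    disks: int
    seconds: float
    bytes_sorted: float

    def disks_per_machine(self) -> float:
        return self.disks / self.machines

    def mb_per_sec_per_disk(self) -> float:
        # Average sort rate divided evenly over all disks (a crude measure).
        return self.bytes_sorted / self.seconds / self.disks / 1e6


TB = 1e12  # assumed decimal terabyte
PB = 1e15  # assumed decimal petabyte
HOUR = 3600.0

RUNS = {
    ("terabyte", "Yahoo"): Run(1407, 5628, 62.0, TB),
    ("terabyte", "Google"): Run(1000, 12000, 68.0, TB),
    ("petabyte", "Yahoo"): Run(3658, 14632, 16 * HOUR, PB),
    ("petabyte", "Google"): Run(4000, 48000, 6 * HOUR, PB),
}

if __name__ == "__main__":
    for (size, who), run in RUNS.items():
        print(
            f"{size:9s} {who:7s} "
            f"disks/machine={run.disks_per_machine():.1f} "
            f"MB/s/disk={run.mb_per_sec_per_disk():.2f}"
        )
```

In both runs the Yahoo clusters had 4 disks per machine and the Google clusters had 12, so Google had roughly three times as many disks on the petabyte sort with a similar number of machines.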

Yahoo published their network specifications, but Google did not. Clearly the network speed is very relevant.

The two takeaways are that Hadoop is getting faster and that it is closing in on Google's performance and scalability.

Submission + - Open Source Solution Breaks World Sorting Records

Submitted by allenw on Friday May 15, 2009 @05:18PM

allenw writes: In a recent blog post, Yahoo!'s grid computing team announced that Apache Hadoop was used to break the current world records in the annual GraySort contest in the Gray and Minute sorts in the general purpose (Daytona) category. Apache Hadoop is the only open source software to ever win the competition. Apache Hadoop also won the Terasort competition last year.
