Other approaches to scalable SQL
There are also two Hadoop subprojects that either support SQL or will shortly; both translate SQL queries into map/reduce programs.
There isn't a C++ port of Hadoop's map/reduce, but there is a C++ interface to the Java code. It is used by Yahoo's WebMap, which is the largest Hadoop application. It lets you write your mapper and reducer code as C++ classes.
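For a flavor of the Pipes interface, here is a sketch along the lines of the canonical word-count example from the Hadoop documentation. It is also the shape of map/reduce program a simple SELECT word, COUNT(*) ... GROUP BY word query would be translated into. The class names are illustrative.

    #include <string>
    #include <vector>
    #include "hadoop/Pipes.hh"
    #include "hadoop/TemplateFactory.hh"
    #include "hadoop/StringUtils.hh"

    // Map phase: emit (word, "1") for every word in the input line.
    class WordCountMapper : public HadoopPipes::Mapper {
    public:
      WordCountMapper(HadoopPipes::TaskContext& context) {}
      void map(HadoopPipes::MapContext& context) {
        std::vector<std::string> words =
            HadoopUtils::splitString(context.getInputValue(), " ");
        for (size_t i = 0; i < words.size(); ++i) {
          context.emit(words[i], "1");
        }
      }
    };

    // Reduce phase: sum the counts emitted for each word.
    class WordCountReducer : public HadoopPipes::Reducer {
    public:
      WordCountReducer(HadoopPipes::TaskContext& context) {}
      void reduce(HadoopPipes::ReduceContext& context) {
        int sum = 0;
        while (context.nextValue()) {
          sum += HadoopUtils::toInt(context.getCurrentValue());
        }
        context.emit(context.getInputKey(), HadoopUtils::toString(sum));
      }
    };

    int main(int argc, char* argv[]) {
      // The framework drives the task; we only supply the classes.
      return HadoopPipes::runTask(
          HadoopPipes::TemplateFactory<WordCountMapper, WordCountReducer>());
    }

The Java framework still handles splitting, shuffling, and task scheduling; the C++ code only runs inside the map and reduce slots.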
The Hadoop Distributed File System (HDFS) also has C bindings to let C programs access the system. If you want another alternative, the Kosmos File System (KFS) is also a distributed file system and was written in C++. Hadoop includes bindings for HDFS and KFS, so that the application code can transparently use either at run time depending on the path (hdfs://server/path instead of kfs://server/path).
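As a rough sketch of the C bindings (libhdfs), the following writes a small file to HDFS. The namenode address, path, and error handling are simplified assumptions, following the pattern of the libhdfs example program.

    #include <cstdio>
    #include <cstring>
    #include <fcntl.h>   // O_WRONLY, O_CREAT
    #include "hdfs.h"

    int main() {
      // "default" picks up the namenode address from the Hadoop
      // configuration on the classpath.
      hdfsFS fs = hdfsConnect("default", 0);
      if (fs == NULL) { fprintf(stderr, "connect failed\n"); return 1; }

      const char* path = "/tmp/example.txt";  // illustrative path
      hdfsFile out = hdfsOpenFile(fs, path, O_WRONLY | O_CREAT, 0, 0, 0);
      if (out == NULL) { fprintf(stderr, "open failed\n"); return 1; }

      const char* msg = "hello, hdfs\n";
      hdfsWrite(fs, out, (void*)msg, strlen(msg));
      hdfsFlush(fs, out);
      hdfsCloseFile(fs, out);
      hdfsDisconnect(fs);
      return 0;
    }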
In sorting a terabyte, Hadoop beat Google's time (62 seconds versus 68). For the petabyte sort, Google was faster (6 hours versus 16). The hardware was of course different (figures from Yahoo's blog and Google's blog):
Terabyte sort:
  Machines: Yahoo 1,407; Google 1,000
  Disks: Yahoo 5,628; Google 12,000
Petabyte sort:
  Machines: Yahoo 3,658; Google 4,000
  Disks: Yahoo 14,632; Google 48,000
Yahoo published its network specifications, but Google did not. Network bandwidth is clearly very relevant to a distributed sort.
The two takeaway points: Hadoop is getting faster, and it is closing in on Google's performance and scalability.
It's a naive, domestic operating system without any breeding, but I think you'll be amused by its presumption.