Comment Re:C++ port of Java Hadoop? (Score 1) 139

There isn't a C++ port of Hadoop's Map/Reduce, but there is a C++ interface to the Java code. It is used by Yahoo's WebMap, the largest Hadoop application, and it lets you write your mapper and reducer code as C++ classes.
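That C++ interface is Hadoop Pipes, where user classes derive from `HadoopPipes::Mapper` and `HadoopPipes::Reducer`. The sketch below only mimics that shape so it compiles without Hadoop: `Emitter`, `WordCountMapper`, `WordCountReducer` and the single-process `runLocal` driver are hypothetical stand-ins, not the Pipes API.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the framework's output context.
struct Emitter {
    std::vector<std::pair<std::string, std::string>> records;
    void emit(const std::string& key, const std::string& value) {
        records.emplace_back(key, value);
    }
};

// User code: emits (word, "1") for every whitespace-separated word.
class WordCountMapper {
public:
    void map(const std::string& line, Emitter& out) {
        std::istringstream words(line);
        std::string word;
        while (words >> word) out.emit(word, "1");
    }
};

// User code: sums the counts collected for one key.
class WordCountReducer {
public:
    void reduce(const std::string& key, const std::vector<std::string>& values,
                Emitter& out) {
        long sum = 0;
        for (const auto& v : values) sum += std::stol(v);
        out.emit(key, std::to_string(sum));
    }
};

// Toy single-process driver playing the framework's role: map every line,
// group values by key (std::map keeps keys sorted, like the shuffle/sort),
// then reduce each group.
std::map<std::string, long> runLocal(const std::vector<std::string>& lines) {
    WordCountMapper mapper;
    Emitter mapped;
    for (const auto& line : lines) mapper.map(line, mapped);

    std::map<std::string, std::vector<std::string>> groups;
    for (const auto& kv : mapped.records) groups[kv.first].push_back(kv.second);

    WordCountReducer reducer;
    Emitter reduced;
    for (const auto& g : groups) reducer.reduce(g.first, g.second, reduced);

    std::map<std::string, long> counts;
    for (const auto& kv : reduced.records) counts[kv.first] = std::stol(kv.second);
    assert(counts.size() == groups.size());  // one output record per key
    return counts;
}
```

In a real job the grouping and sorting that `runLocal` does here happen inside the Java framework, across many machines; only the two classes are application code.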

The Hadoop Distributed File System (HDFS) also has C bindings, so C programs can access it. If you want an alternative, the Kosmos File System (KFS) is another distributed file system, written in C++. Hadoop includes bindings for both HDFS and KFS, so application code can transparently use either at run time depending on the path (hdfs://server/path or kfs://server/path).
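Choosing a file system from the path can be sketched as a registry keyed on the URI scheme (the text before `://`). In Hadoop's Java API this lookup happens inside `FileSystem.get(URI, Configuration)`; the C++ names below (`FileStore`, `HdfsStore`, `KfsStore`, `schemeOf`, `openStore`) are hypothetical and only illustrate the idea.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical interface the application code is written against.
struct FileStore {
    virtual ~FileStore() = default;
    virtual std::string name() const = 0;
};

// Hypothetical stand-ins for the two backends.
struct HdfsStore : FileStore {
    std::string name() const override { return "HDFS"; }
};
struct KfsStore : FileStore {
    std::string name() const override { return "KFS"; }
};

// Returns the scheme of "scheme://server/path", e.g. "hdfs".
std::string schemeOf(const std::string& uri) {
    const auto pos = uri.find("://");
    if (pos == std::string::npos) throw std::invalid_argument("no scheme in " + uri);
    return uri.substr(0, pos);
}

// Chooses the backend at run time from the path alone, so the caller never
// names a concrete file system.
std::unique_ptr<FileStore> openStore(const std::string& uri) {
    using Factory = std::function<std::unique_ptr<FileStore>()>;
    static const std::map<std::string, Factory> registry = {
        {"hdfs", []() -> std::unique_ptr<FileStore> { return std::make_unique<HdfsStore>(); }},
        {"kfs",  []() -> std::unique_ptr<FileStore> { return std::make_unique<KfsStore>(); }},
    };
    const auto it = registry.find(schemeOf(uri));
    if (it == registry.end()) throw std::invalid_argument("unknown scheme in " + uri);
    auto store = it->second();
    assert(store != nullptr);
    return store;
}
```

Supporting another file system means registering one more entry; code that calls `openStore` with a different path does not change.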

Comment Re:Not quite as impressive as it sounds (Score 4, Interesting) 139

In sorting a terabyte, Hadoop beat Google's time (62 versus 68 seconds). For the petabyte sort, Google was faster (6 hours versus 16 hours). The hardware was, of course, different (figures from Yahoo's blog and Google's blog):

Terabyte:
    Machines: Yahoo 1,407   Google 1,000
    Disks:    Yahoo 5,628   Google 12,000
Petabyte:
    Machines: Yahoo 3,658   Google 4,000
    Disks:    Yahoo 14,632  Google 48,000
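A rough way to compare clusters of different sizes is to divide the sort rate by the number of machines or disks. The sketch below does that arithmetic with the figures above, assuming decimal units (1 TB = 10^12 bytes, 1 PB = 10^15 bytes) and the rounded times as reported; `SortRun` and the helper functions are illustrative names, and the calculation ignores the network entirely.

```cpp
#include <cassert>

// One sort run, using the rounded figures from the comment.
struct SortRun {
    double bytes;    // assumed decimal: 1 TB = 1e12 bytes, 1 PB = 1e15 bytes
    double seconds;  // 62 s, 68 s, 16 h = 57,600 s, 6 h = 21,600 s
    int machines;
    int disks;
};

const SortRun kYahooTB  {1e12, 62.0,         1407, 5628};
const SortRun kGoogleTB {1e12, 68.0,         1000, 12000};
const SortRun kYahooPB  {1e15, 16 * 3600.0,  3658, 14632};
const SortRun kGooglePB {1e15, 6 * 3600.0,   4000, 48000};

// Aggregate sort rate in GB/s (1 GB = 1e9 bytes).
double gbPerSecond(const SortRun& r) {
    assert(r.seconds > 0);
    return r.bytes / r.seconds / 1e9;
}

// Per-machine and per-disk rates in MB/s (1 MB = 1e6 bytes).
double mbPerSecondPerMachine(const SortRun& r) {
    assert(r.machines > 0);
    return r.bytes / r.seconds / r.machines / 1e6;
}

double mbPerSecondPerDisk(const SortRun& r) {
    assert(r.disks > 0);
    return r.bytes / r.seconds / r.disks / 1e6;
}
```

With these inputs the terabyte runs work out to roughly 2.9 MB/s per disk for Yahoo versus 1.2 MB/s for Google, but about 11.5 versus 14.7 MB/s per machine, so the ranking depends on which resource you normalize by.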

Yahoo published its network specifications, but Google did not. Network speed clearly matters here, since a distributed sort has to move nearly all of its data between machines.
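To see why so much data crosses the network, assume each of N machines runs one reducer and keys are spread uniformly across them. A record then stays on its own machine with probability 1/N, so about (N-1)/N of the data is shuffled. This is a minimal sketch under that assumption; `shuffledBytes` is a hypothetical helper, and it ignores data locality and replication of the output.

```cpp
#include <cassert>

// Expected bytes moved between machines during the shuffle, assuming one
// reducer per machine and uniformly hashed keys: a record stays local with
// probability 1 / nodes, so (nodes - 1) / nodes of the data moves.
double shuffledBytes(double totalBytes, int nodes) {
    assert(nodes > 0);
    return totalBytes * (nodes - 1) / nodes;
}
```

For the terabyte run on 1,407 machines, that is more than 99.9% of the input.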

The two takeaways: Hadoop is getting faster, and it is closing in on Google's performance and scalability.

Supercomputing

Submission + - Open Source Solution Breaks World Sorting Records

allenw writes: In a recent blog post, Yahoo!'s grid computing team announced that Apache Hadoop was used to break the current world records for the GraySort and MinuteSort in the general-purpose (Daytona) category of the annual Sort Benchmark contest. Apache Hadoop is the only open source software ever to win the competition; it also won the Terasort competition last year.

Yahoo!

Submission + - Yahoo building open source Map/Reduce and GFS (yahoo.net)

owenomalley writes: "Yahoo is developing Hadoop, an open source implementation of key pieces of Google's infrastructure (namely, Map/Reduce and GFS). Hadoop's framework lets you write applications that reliably and efficiently process very large datasets (hundreds of terabytes) on clusters of 1,000+ computers. Without a framework like Hadoop, writing applications for large clusters involves a lot of duplicated effort, because each application must handle distribution, reliability, and reporting itself. Hadoop handles those parts for you, so you only need to write your application logic.

Hadoop is managed under the Apache Software Foundation."
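The division of labor described in the submission can be sketched in a few lines: the application supplies only a per-record function, while a toy driver splits the input into tasks, retries a task that fails, and counts retries as a minimal form of reporting. Everything here (`RecordFn`, `runJob`, `splitSize`, `maxAttempts`) is hypothetical; Hadoop's real scheduling, data locality and reporting are far more involved.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <string>
#include <vector>

// The only part an application writer supplies: per-record logic.
using RecordFn = std::function<std::string(const std::string&)>;

// Toy "framework": cuts the input into fixed-size splits, runs each split as
// a task, and re-runs a task up to maxAttempts times if it throws. A failed
// attempt's partial output is discarded, so tasks must be safe to repeat.
std::vector<std::string> runJob(const std::vector<std::string>& input,
                                const RecordFn& fn, std::size_t splitSize,
                                int maxAttempts, int* retries = nullptr) {
    assert(splitSize > 0 && maxAttempts > 0);
    std::vector<std::string> output;
    for (std::size_t start = 0; start < input.size(); start += splitSize) {
        const std::size_t end = std::min(start + splitSize, input.size());
        for (int attempt = 1;; ++attempt) {
            try {
                std::vector<std::string> taskOut;
                for (std::size_t i = start; i < end; ++i) taskOut.push_back(fn(input[i]));
                output.insert(output.end(), taskOut.begin(), taskOut.end());
                break;  // task succeeded; move to the next split
            } catch (const std::exception&) {
                if (attempt >= maxAttempts) throw;  // give up: the job fails
                if (retries) ++*retries;            // report the retry
            }
        }
    }
    return output;
}
```

The application code never mentions splits or retries; a transient failure in one record just causes that task to be run again.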
