IBM

Submission: IBM Designing Superman Servers for World's Largest Telescope (slashdot.org)

Nerval's Lobster writes: "How’s this for a daunting task? By 2017, IBM must develop low-power microservers that can handle 10 times the traffic of today’s Internet, and resist blowing desert sands to boot. Sound impossible? Hopefully not. Those are the design parameters of the Square Kilometer Array (SKA) Project, the world’s largest radio telescope, which will be located in South Africa and Australia amid some of the world’s most rugged terrain. It will be up to the SKA-specific business unit of South Africa’s National Research Foundation, IBM, and ASTRON (the Netherlands Institute for Radio Astronomy) to jointly design the servers. Scientists from all three organizations will collaborate remotely and at the newly established ASTRON & IBM Center for Exascale Technology in Drenthe, the Netherlands. By peering into the furthest regions of space, the SKA project hopes to glimpse “back in time” to the earliest moments of the universe, before stars were formed, whose radio waves are still detectable. The hardware is powerful enough to pick up an airport radar on a planet 50 light-years away, according to the SKA team."
Books

Submission: Book Review: Hadoop Beginner's Guide (barnesandnoble.com)

sagecreek writes: "Hadoop is an open-source, Java-based framework for large-scale data processing. Typically, it runs on big clusters of computers working together to crunch large chunks of data. You can also run Hadoop in “single-node mode” on a Linux machine, Windows PC or Mac, to learn the technology or do testing and debugging. The Hadoop framework, however, is not quickly mastered. Apache’s Hadoop wiki cautions: “If you do not know about classpaths, how to compile and debug Java code, step back from Hadoop and learn a bit more about Java before proceeding.” But if you are reasonably comfortable with Java, the well-written Hadoop Beginner’s Guide by Garry Turkington can help you start mastering this rising star in the Big Data constellation.

Dr. Turkington is vice president of data engineering and lead architect for London-based Improve Digital. He holds a doctorate in computer science from Queen’s University Belfast in Northern Ireland. His Hadoop Beginner’s Guide provides an effective overview of Hadoop and hands-on guidance on how to use it locally, in distributed hardware clusters, and out in the cloud.

Packt Publishing provided a review copy of the book. I have reviewed one other Packt book previously.

Much of the first chapter is devoted to “exploring the trends that led to Hadoop's creation and its enormous success.” This includes brief discussions of Big Data, cloud computing, Amazon Web Services, and the differences between “scale-up” (using increasingly larger computers as data needs grow) and “scale-out” (spreading the data processing onto more and more machines as demand expands).

“One of the most confusing aspects of Hadoop to a newcomer,” Dr. Turkington writes, “is its various components, projects, sub-projects, and their interrelationships.”

His 374-page book emphasizes three major aspects of Hadoop: (1) its common projects; (2) the Hadoop Distributed File System (HDFS); and (3) MapReduce.

“Common projects,” he explains, “comprise a set of libraries and tools that help the Hadoop product work in the real world.”

The HDFS, meanwhile, “is a filesystem unlike most you may have encountered before.” As a distributed filesystem, it can spread data storage across many nodes. “[I]t stores files in blocks typically at least 64 MB in size, much larger than the 4-32 KB seen in most filesystems.” The book briefly describes several features, strengths, weaknesses, and other aspects of HDFS.
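To make the block-size point concrete, here is a minimal Python sketch (not code from the book) that splits a file into fixed-size blocks. It assumes the 64 MB block size quoted above and assumes the final block holds only the leftover bytes rather than being padded; `block_layout` is a hypothetical helper, not part of any Hadoop API.

```python
# Minimal sketch: how a file is carved into fixed-size blocks (HDFS-style).
# Assumptions: 64 MiB default block size; the last block is not padded.
MIB = 1024 * 1024
KIB = 1024


def block_layout(file_size: int, block_size: int = 64 * MIB) -> list[int]:
    """Return the size in bytes of each block needed to hold file_size bytes."""
    full_blocks, remainder = divmod(file_size, block_size)
    sizes = [block_size] * full_blocks
    if remainder:
        sizes.append(remainder)  # final partial block stores only what is left
    return sizes


if __name__ == "__main__":
    one_gib = 1024 * MIB
    print(len(block_layout(one_gib)))                      # 16 blocks of 64 MiB
    print(len(block_layout(one_gib, block_size=4 * KIB)))  # 262144 blocks of 4 KiB
    print([s // MIB for s in block_layout(200 * MIB)])     # [64, 64, 64, 8]
```

With 64 MiB blocks, a 1 GiB file needs only 16 blocks; with 4 KiB blocks it would need 262,144. Fewer, larger blocks mean far less per-block bookkeeping for a distributed filesystem to track.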

Finally, MapReduce is a well-known programming model for processing large data sets. Typically, it is used on clusters of computers that perform distributed computing. In the “Map” phase, a single problem is split into many subtasks, which a master computer assigns to individual computers known as nodes (and there can be sub-nodes). In the “Reduce” phase, the master computer gathers up the processed data from the nodes, combines it, and outputs an answer to the original problem. (MapReduce libraries are now available for many different programming languages; Hadoop provides a Java-based implementation.)

“The developer focuses on expressing the transformation between source and result data sets, and the Hadoop framework manages all aspects of job execution, parallelization, and coordination,” Dr. Turkington notes. He calls this “possibly the most important aspect of Hadoop. The platform takes responsibility for every aspect of executing the processing across the data. After the user defines the key criteria for the job, everything else becomes the responsibility of the system.”
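The model described above is usually illustrated with the classic word-count example. The following single-process Python sketch walks through the map, shuffle/sort, and reduce steps; it was written for this review, does not use Hadoop's actual Java API, and its function names are hypothetical. On a real cluster the framework would run many mappers and reducers in parallel on different nodes and perform the grouping itself.

```python
# Single-process sketch of the MapReduce model using word counting.
# It imitates map -> shuffle/sort -> reduce; it is not Hadoop's API.
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group every emitted value under its key, as the framework does between phases."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))  # keys reach the reducers in sorted order


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reducer: combine all counts for one word."""
    return key, sum(values)


def run_job(lines: list[str]) -> dict[str, int]:
    # On a cluster, each line (input split) could be mapped on a different node.
    mapped = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, vs) for k, vs in shuffle(mapped).items())


if __name__ == "__main__":
    print(run_job(["the quick fox", "the lazy dog", "The end"]))
    # {'dog': 1, 'end': 1, 'fox': 1, 'lazy': 1, 'quick': 1, 'the': 3}
```

Only `map_phase` and `reduce_phase` correspond to what a developer would write; the grouping done in `shuffle` is the kind of work the quote above says the framework takes care of.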

In this 11-chapter book, the first two chapters introduce Hadoop and explain how to install and run the software.

Three chapters are devoted to learning to work with MapReduce, from beginner to advanced levels. And the author stresses: “In the book, we will be learning how to write MapReduce programs to do some serious data crunching and how to run them on both locally managed and AWS-hosted Hadoop clusters.” [“AWS” is “Amazon Web Services.”]

Chapter 6, titled “When Things Break,” zeroes in on Hadoop’s “resilience to failure and an ability to survive failures when they do happen… much of the architecture and design of Hadoop is predicated on executing in an environment where failures are both frequent and expected.” But node failures and numerous other problems can still arise, so the reader is given an overview of potential difficulties and how to handle them.
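The core idea behind that resilience, re-running failed work somewhere else instead of giving up, can be shown in a toy Python sketch. It is not Hadoop's scheduler: the node names, the simulated failures, and the four-attempt limit are assumptions chosen purely for illustration.

```python
# Toy model of fault tolerance by re-execution: when a task fails on one node,
# try it again on another. Assumptions: node names, failures, and the attempt
# limit are hypothetical; Hadoop's real scheduler does much more than this.
from typing import Callable


class TaskFailed(Exception):
    """Raised by a task when the node running it fails."""


def run_with_retries(task: Callable[[str], str], nodes: list[str],
                     max_attempts: int = 4) -> str:
    """Run task on successive nodes until one succeeds or attempts run out."""
    errors = []
    for attempt in range(max_attempts):
        node = nodes[attempt % len(nodes)]  # rotate through the available nodes
        try:
            return task(node)
        except TaskFailed as exc:
            errors.append(f"attempt {attempt + 1} on {node}: {exc}")
    raise RuntimeError("task failed permanently: " + "; ".join(errors))


if __name__ == "__main__":
    crashed = {"node-a", "node-b"}  # hypothetical nodes that fail

    def count_records(node: str) -> str:
        if node in crashed:
            raise TaskFailed("node crashed")
        return f"counted records on {node}"

    print(run_with_retries(count_records, ["node-a", "node-b", "node-c"]))
    # counted records on node-c
```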

The next chapter, “Keeping Things Running,” lays out what must be done to properly maintain a Hadoop cluster and keep it tuned and ready to crunch data.

Three of the remaining chapters show how Hadoop can be used elsewhere within an organization’s systems and infrastructure, by personnel who are not trained to write MapReduce programs.

Chapter 8, for example, provides “A Relational View on Data with Hive.” What Hive provides is “a data warehouse that uses MapReduce to analyze data stored on HDFS,” Dr. Turkington notes. “In particular, it provides a query language called HiveQL that closely resembles the common Structured Query Language (SQL) standard.”

Using Hive as an interface to Hadoop “not only accelerates the time required to produce results from data analysis, it significantly broadens who can use Hadoop and MapReduce. Instead of requiring software development skills, anyone with a familiarity with SQL can use Hive,” the author states.
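As an illustration of that SQL-like style, here is a word-count query of the kind an analyst might write in HiveQL instead of a MapReduce program. Hive itself is not used: the sketch runs the statement against an in-memory SQLite database from Python's standard library, and the `words` table is hypothetical. HiveQL and SQLite differ in many details, so treat this as a sketch of the query's shape rather than tested Hive code.

```python
# Sketch: the SQL-style aggregation an analyst might write in HiveQL,
# run here on SQLite only to show the shape. The "words" table is hypothetical.
import sqlite3

QUERY = """
SELECT word, COUNT(*) AS n
FROM words
GROUP BY word
ORDER BY n DESC, word
"""


def word_counts(words: list[str]) -> list[tuple[str, int]]:
    """Load words into a throwaway table and return (word, count) rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE words (word TEXT)")
        conn.executemany("INSERT INTO words (word) VALUES (?)", [(w,) for w in words])
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(word_counts(["the", "quick", "the", "dog", "the"]))
    # [('the', 3), ('dog', 1), ('quick', 1)]
```

The same answer as a hand-written MapReduce job, expressed in a few lines of declarative SQL, is the point the author makes about broadening who can use Hadoop.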

But, as Chapter 9 makes clear, Hive is not a relational database, and it doesn’t fully implement SQL. So the text and code examples in Chapter 9 illustrate (1) how to set up MySQL to work with Hadoop and (2) how to use Sqoop to transfer bulk data between Hadoop and MySQL.

Chapter 10 shows how to set up and run Flume NG, a distributed service that collects, aggregates, and moves large amounts of log data from applications to Hadoop's HDFS.

The book’s final chapter, “Where to Go Next,” helps the newcomer see what else is available beyond the Hadoop core product. “There are,” Dr. Turkington emphasizes, “a plethora of related projects and tools that build upon Hadoop and provide specific functionality or alternative approaches to existing ideas.” He provides a quick tour of several of the projects and tools.

A key strength of this beginner’s guide is in how its contents are structured and delivered. Four important headings appear repeatedly in most chapters. The “Time for action” heading singles out step-by-step instructions for performing a particular action. The “What just happened?” heading highlights explanations of “the working of tasks or instructions that you have just completed.” The “Pop quiz” heading, meanwhile, is followed by short, multiple-choice questions that help you gauge your understanding. And the “Have a go hero” heading introduces paragraphs that “set practical challenges and give you ideas for experimenting with what you have learned.”

Hadoop can be downloaded free from the Apache Software Foundation’s Hadoop website.

Dr. Turkington’s book does a good job of describing how to get Hadoop running on Ubuntu and other Linux distributions. But while he assures readers that “Hadoop does run well on other systems,” he also notes: “Windows is supported only as a development platform, and Mac OS X is not formally supported at all.” He refers users to Apache’s Hadoop FAQ wiki for more information. Unfortunately, few details are offered there, so web searches become the best option for finding how-to instructions for Windows and Macs.

Running Hadoop on a Windows PC typically involves installing Cygwin and OpenSSH to simulate a Linux environment. But other options can be found via sites such as Hadoop Wizard and Hadoop on Windows with Eclipse.

To install Hadoop on a Mac running OS X Mountain Lion, you will need to search for websites that offer how-to tips. Here is one example.

There are other ways to get access to Hadoop on a single computer, using other operating systems or virtual machines. Again, web searches are necessary. The Cloudera Enterprise Free product is one virtual-machine option to consider.

Once you get past the hurdle of installing and running Hadoop, Garry Turkington’s well-written, well-structured Hadoop Beginner’s Guide can start you moving down the lengthy path to becoming an expert user.

You will have the opportunity, the book’s tagline states, to “[l]earn how to crunch big data to extract meaning from the data avalanche.”

(Si Dunn is an author, screenwriter, and technology book reviewer.)"

Google

Submission: Google talks about the dangers of user content (blogspot.com)

An anonymous reader writes: I stumbled on an interesting, in-depth article on the Google security blog about the dangers faced by modern web applications when hosting any user-supplied data. The surprising conclusion is that it's apparently almost impossible to host images or text files safely unless you use a completely separate domain. Is it really that bad? Why, after 15 years, can't we get it right?
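For readers wondering what "a completely separate domain" looks like in practice, here is a minimal Python sketch. The hostnames are hypothetical, the headers are a common defensive baseline rather than the blog post's exact recommendations, and the domain check uses a crude last-two-labels rule where production code would consult the Public Suffix List. The submission's claim is that, short of a separate domain, measures like these headers are very hard to get fully right.

```python
# Minimal sketch: serve user uploads from an unrelated "sandbox" domain.
# Assumptions: both hostnames are hypothetical; the registrable-domain check
# below is deliberately crude (it ignores suffixes such as .co.uk).
from urllib.parse import urlsplit

APP_HOST = "app.example.com"
SANDBOX_HOST = "usercontent.example.net"   # not a subdomain of the app's domain
INLINE_SAFE_TYPES = {"image/png", "image/jpeg", "image/gif"}


def registrable_domain(host: str) -> str:
    """Crude approximation: keep only the last two DNS labels."""
    return ".".join(host.split(".")[-2:])


def user_content_url(file_id: str) -> str:
    """URL for an uploaded file, always on the sandbox domain."""
    return f"https://{SANDBOX_HOST}/files/{file_id}"


def user_content_headers(declared_type: str) -> dict[str, str]:
    """Response headers that limit how a browser treats user-supplied bytes."""
    inline = declared_type in INLINE_SAFE_TYPES
    headers = {
        "Content-Type": declared_type if inline else "application/octet-stream",
        "X-Content-Type-Options": "nosniff",  # ask the browser not to sniff the type
    }
    if not inline:
        headers["Content-Disposition"] = "attachment"  # download, don't render
    return headers


def is_isolated(url: str) -> bool:
    """True if the URL's host does not share the app's registrable domain."""
    host = urlsplit(url).hostname or ""
    return registrable_domain(host) != registrable_domain(APP_HOST)


if __name__ == "__main__":
    print(user_content_url("abc123"))
    print(user_content_headers("text/html"))
    print(is_isolated(user_content_url("abc123")))        # True
    print(is_isolated("https://files.example.com/abc"))   # False: same site as the app
```

Note that a sibling subdomain such as `files.example.com` fails the check: it shares the app's registrable domain, which is exactly what a "completely separate domain" is meant to avoid.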
Sci-Fi

Majority of Americans Think Obama Is Better Suited To Handle an Alien Invasion

Geoffrey.landis writes "At last, a public opinion poll that gets the opinions of ordinary Americans on the issues that matter! Apparently, two-thirds of Americans polled think that Barack Obama is better suited than Mitt Romney to defend against an alien invasion, according to a survey from the National Geographic Channel, conducted to tout its upcoming TV series 'Chasing UFOs'. In follow-up questioning, Americans would rather call on the Hulk (21%) than either Batman (12%) or Spider-Man (8%) to save the day. No word on which candidate is better fit to defend America against shambling hordes of undead seeking to destroy civilization in the zombie apocalypse (perhaps that will be brought out in the debates)." The real question, of course, is how Obama would handle Galactus.
Space

Submission: No, asteroid 2012 DA14 will not hit us next February (discovermagazine.com)

The Bad Astronomer writes: "News is starting to spread about a small 45-meter-wide asteroid called 2012 DA14 that will make a close pass to Earth on February 15, 2013. However, some of these articles are claiming it has 'a good chance' of impacting the Earth. This is simply incorrect; the odds of an impact next year are essentially zero. Farther in the future the odds are unclear; another near pass may occur in 2020, but right now the uncertainties in the asteroid's orbit are too large to know much about that. More observations of DA14 are being made, and we should have better information about future encounters soon."
Privacy

Submission: Have we lost our privacy to the Internet? (guardian.co.uk)

An anonymous reader writes: An article in the Guardian, by Joss Wright and Tom Chatfield, asks whether Internet users in general are giving away far too much information about themselves to large corporations that profit handsomely from mining it. The article describes how contemporary Internet companies, perhaps predictably, are run on a "privacy is dead" motto. It considers what it means for the "future of our society" (by which the authors presumably mean UK society) to have all your private data out on the Internet, where it can be seen, searched, shared, retransmitted, and perhaps archived forever without your consent. The (rather long) article ends by noting that Gmail scans your email, that Facebook apps frequently send your private data straight to the app developer, that iPhones are known to log your geographic location, and that some smartphone apps read your address book and messages, then phone home to transmit this information to the company that developed the app.
Privacy

Submission: Verizon selling "new" iPhones with preloaded data?

dnahelicase writes: One of my employees just went to the Verizon store in town to upgrade his BlackBerry Storm to a "new" iPhone 4S. I told him the store would transfer his contacts, pictures, etc., and that I'd set up his work email when he returned. When he did, I was surprised to find "preloaded" apps, such as Facebook, Bank of America, and others, as well as purchased music and a Hotmail account, all set up for a recent high school grad 50 miles south of here.

Verizon denies selling him a used phone, and denies this has ever happened before. Has anyone else received a "new" phone with someone else's data on it?

I was able to post to this kid's Facebook wall to tell him, and to email him at both of his email accounts. Verizon has given me a window into his personal life, his contacts, his music. I can call his (new) girlfriend, his sister, his high school friends. Why didn't they just wipe the phone, so that we would have been none the wiser?
Desktops (Apple)

Submission: OS X 10.8 Mountain Lion is a cougar I'd take home (bgr.com)

zacharye writes: Apple just did something the company hasn’t done very often in recent years: it completely surprised nearly everyone with the announcement of OS X 10.8 Mountain Lion, the next OS for the Mac. Inching a step closer to bridging the gap between iOS and OS X, Mountain Lion brings practically all of iOS’s featured apps to desktop and laptop computers. From Messages, which I’ve been waiting for ever since iMessage was announced, to built-in iCloud support, Notification Center, Game Center, Reminders, Notes, a much-improved Safari browser, AirPlay and more, the two OSes are now practically the same in terms of system apps. This is the first developer preview of OS X 10.8 Mountain Lion, and the OS will only improve before it is released in the summer. If you’re itching for my thoughts, though, you’ll find them after the break...
Space

Submission: Felix Baumgartner Is Space Diving in Pursuit of Science (txchnologist.com)

An anonymous reader writes: This summer, an Austrian named Felix Baumgartner plans to ride a 600-foot-tall balloon halfway up the stratosphere. When he reaches 120,000 feet, he will jump.

What happens next is swathed in mystery, but a few things are certain. For a short time inside his pressurized spacesuit, Baumgartner, a professional BASE jumper, will be the fastest man alive. Thirty seconds after leaping, he’ll exceed the speed of sound in the thin upper atmosphere by traveling almost 700 miles per hour. And if he safely parachutes to the ground between 12 and 15 minutes later, he’ll walk away with at least four new records: the highest skydive, the longest free-fall, the first to reach supersonic speeds in free-fall, and the highest manned balloon ride.
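The supersonic claim hinges on the fact that sound travels more slowly in cold upper air. A back-of-envelope Python sketch using the ideal-gas formula c = sqrt(γRT) shows why; the dry-air constants and the two temperatures (roughly standard-atmosphere values at sea level and near 32 km, a little below the planned jump altitude) are assumptions, and actual conditions on the day will differ.

```python
# Back-of-envelope: speed of sound at two temperatures, in mph.
# Assumptions: ideal dry air (gamma = 1.4, R = 287.05 J/(kg*K)) and
# approximate standard-atmosphere temperatures; real conditions will differ.
import math

GAMMA = 1.4
R_AIR = 287.05              # specific gas constant of dry air, J/(kg*K)
MPS_TO_MPH = 1 / 0.44704    # 1 mph is exactly 0.44704 m/s


def speed_of_sound_mph(temp_k: float) -> float:
    """Ideal-gas speed of sound c = sqrt(gamma * R * T), converted to mph."""
    return math.sqrt(GAMMA * R_AIR * temp_k) * MPS_TO_MPH


if __name__ == "__main__":
    for label, temp_k in [("sea level, 288.15 K", 288.15),
                          ("about 32 km, 228.65 K", 228.65)]:
        print(f"{label}: {speed_of_sound_mph(temp_k):.0f} mph")
    # sea level, 288.15 K: 761 mph
    # about 32 km, 228.65 K: 678 mph
```

Under these assumptions a speed approaching 700 mph would be subsonic at ground level but supersonic in the colder air high above it.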

Earth

Submission: Californian seismologist testifies against scientists in quake manslaughter case (nature.com)

ananyo writes: The courthouse in L’Aquila, Italy, yesterday hosted a highly anticipated hearing in the trial of six seismologists and one government official indicted for manslaughter over their reassurances to the public ahead of a deadly earthquake in 2009 (see http://www.nature.com/news/2011/110526/full/news.2011.325.html). During the hearing, the former head of the Italian Department of Civil Protection turned from key witness into defendant, and a seismologist from California criticized Italy’s top earthquake experts. Lalliana Mualchin, former chief seismologist for the Department of Transportation in California, criticized the Italian analysis, which he says was based on a poor model. If the court agrees with Mualchin, the defendants could face up to 12 years in jail.
