An Algorithm That Can Predict Human Behavior Better Than Humans

Quartz describes an MIT study with the surprising conclusion that at least in some circumstances, an algorithm can not only sift numbers faster than humans (after all, that's what computers are best at), but also discern relevant factors within a complex data set more accurately and more quickly than can teams of humans. In a competition involving 905 human teams, a system called the Data Science Machine, designed by MIT master's student Max Kanter and his advisor, Kalyan Veeramachaneni, beat most of the humans for accuracy and speed in three tests of predictive power, including one about "whether a student would drop out during the next ten days, based on student interactions with resources on an online course." Teams might have looked at how late students turned in their problem sets, or whether they spent any time looking at lecture notes. But instead, MIT News reports, the two most important indicators turned out to be how far ahead of a deadline the student began working on their problem set, and how much time the student spent on the course website. ... The Data Science Machine performed well in this competition, and it was also successful in two other competitions: one in which participants had to predict whether a crowd-funded project would be considered “exciting,” and another in which they had to predict whether a customer would become a repeat buyer.
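For a rough sense of what a model built on those two indicators looks like, here is a minimal sketch using a standard logistic-regression classifier; the feature names and data are synthetic stand-ins, not the Data Science Machine's actual features or pipeline.

```python
# Minimal sketch (not the Data Science Machine itself): a standard classifier
# trained on the two indicators highlighted above. All data here is synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 1000
# Hypothetical features: hours before the deadline that work began,
# and total minutes spent on the course website.
hours_before_deadline = rng.exponential(scale=24, size=n)
minutes_on_site = rng.exponential(scale=120, size=n)
# Synthetic labels: students who start late and spend little time drop out more often.
p_dropout = 1 / (1 + np.exp(0.05 * hours_before_deadline + 0.01 * minutes_on_site - 2))
dropped_out = rng.random(n) < p_dropout

X = np.column_stack([hours_before_deadline, minutes_on_site])
model = LogisticRegression().fit(X, dropped_out)
print("coefficients:", model.coef_)  # negative weights: more lead time / more site time -> less dropout
```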
Data Storage

How a Frozen Neutrino Observatory Grapples With Staggering Amounts of Data

citadrianne writes: Deep beneath the Antarctic ice sheet, sensors buried in a billion tons of ice—a cubic kilometer of frozen H2O—are searching for neutrinos. "We get a neutrino from the atmosphere every ~10 minutes that we sort of care about, and one neutrino per month that comes from an astrophysical source that we care about a very great deal," researcher Nathan Whitehorn said. "Each particle interaction takes about 4 microseconds, so we have to sift through data to find the 50 microseconds a year of data we actually care about." Computing facilities manager Gonzalo Merino added, "If the filtered data from the Pole amounts to ~36TB/year, the processed data amounts to near 100TB/year." Because IceCube can't see satellites in geosynchronous orbit from the pole, internet coverage only lasts for six hours a day, Whitehorn explained. The raw data is stored on tape at the pole, and a 400-core cluster makes a first pass at the data to cut it down to around 100GB/day. A 4000-CPU dedicated local cluster crunches the numbers. Their storage system has to handle typical loads of "1-5GB/sec of sustained transfer levels, with thousands of connections in parallel," Merino explained.
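As a back-of-the-envelope check on those figures (roughly one astrophysical neutrino per month, about 4 microseconds per interaction), the quoted "50 microseconds a year" works out as follows:

```python
# Back-of-the-envelope check of the figures quoted above (assumptions, not IceCube's code).
SECONDS_PER_YEAR = 365 * 24 * 3600
interaction_us = 4                                     # ~4 microseconds per particle interaction

atmospheric_per_year = SECONDS_PER_YEAR / (10 * 60)    # roughly one every ~10 minutes
astrophysical_per_year = 12                            # roughly one per month

print(f"atmospheric events/year : {atmospheric_per_year:,.0f}")
print(f"astrophysical data/year : {astrophysical_per_year * interaction_us} microseconds")
# ~48 microseconds/year of astrophysical signal, consistent with the "50 microseconds" quoted.
```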
United Kingdom

Big Data Attempts To Find Meaning In 40 Years of UK Political Debate

An anonymous reader writes: International researchers have analyzed 40 years of political speeches from the UK Parliament in an effort to move the study of political theory from social science towards the quantitative analysis offered by Big Data analytics techniques. The group used Python to crunch a publicly available dataset comprising 3.7 million individual speeches. A few strange trends emerged in this initial experiment, such as the decline of 'health care' as a trending Parliamentary topic while 'welfare' is consistently on the rise, and the decrease of fervent interest in any particular topic during the more pacific years under Margaret Thatcher and Tony Blair.
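The article doesn't publish the group's code; the following is only a minimal sketch of the kind of topic-trend counting described, with a placeholder corpus and topic list:

```python
# A minimal sketch of topic-trend counting over a speech corpus.
# The speeches and topics below are hypothetical placeholders.
from collections import Counter

speeches = [
    {"year": 1979, "text": "the national health care system ..."},
    {"year": 1995, "text": "welfare reform and welfare spending ..."},
    # ... 3.7 million speeches in the real dataset
]
topics = ["health care", "welfare"]

counts = {topic: Counter() for topic in topics}
for speech in speeches:
    text = speech["text"].lower()
    for topic in topics:
        counts[topic][speech["year"]] += text.count(topic)

for topic in topics:
    print(topic, dict(sorted(counts[topic].items())))
```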

Dell, EMC Said To Be In Merger Talks

itwbennett writes: According to a Wall Street Journal report (paywalled), Dell might buy some or all of storage giant EMC. (The grain of salt here is that the Journal's report cited unnamed sources, and cautioned that the companies might not finalize any agreement.) If the report has it right, though, "a total merger would be one of the biggest deals ever in the technology industry," writes Stephen Lawson for IDG, "with EMC holding a market value of about US$50 billion. It would also bring together two of the most important vendors to enterprise IT departments."

Tracing the Limits of Computation

An anonymous reader writes: For more than 40 years, researchers have been trying to find a better way to compare two arbitrary strings of characters, such as the long strings of chemical letters within DNA molecules. The most widely used algorithm is slow and not all that clever: It proceeds step-by-step down the two lists, comparing values at each step. If a better method to calculate this "edit distance" could be found, researchers would be able to quickly compare full genomes or large data sets, and computer scientists would have a powerful new tool with which they could attempt to solve additional problems in the field.
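The step-by-step method described here is the classic quadratic-time dynamic-programming approach (commonly attributed to Wagner and Fischer); a minimal sketch:

```python
# The standard quadratic-time dynamic-programming algorithm for edit distance,
# the kind of "step-by-step" method described above.
def edit_distance(a: str, b: str) -> int:
    # prev[j] holds the edit distance between a[:i-1] and b[:j]; curr is row i.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution (or match)
        prev = curr
    return prev[-1]

print(edit_distance("kitten", "sitting"))  # 3
```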

Yet in a paper presented at the ACM Symposium on Theory of Computing, two researchers from the Massachusetts Institute of Technology put forth a mathematical proof that the current best algorithm was "optimal" — in other words, that finding a more efficient way to compute edit distance was mathematically impossible. But researchers aren't quite ready to record the time of death. One significant loophole remains. The impossibility result is only true if another, famously unproven statement called the strong exponential time hypothesis (SETH) is also true.

An AI Hunts the Wild Animals Carrying Ebola

the_newsbeagle writes: Outbreaks of infectious diseases like Ebola follow a depressing pattern: People start to get sick, public health authorities get wind of the situation, and an all-out scramble begins to determine where the disease started and how it's spreading. Barbara Han, a code-writing ecologist, hopes her algorithms will put an end to that reactive model. She wants to predict outbreaks and enable authorities to prevent the next pandemic. Han takes a big-data approach, using a machine-learning AI to identify the wild animal species that carry zoonotic diseases and transmit them to humans.
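Han's actual model isn't reproduced here; the following is only a rough sketch of the general approach, a supervised classifier over species traits, with synthetic traits and labels standing in for the real data:

```python
# A minimal sketch of the general approach described above: a supervised model
# trained on species traits to flag likely disease reservoirs. The traits and
# labels here are synthetic placeholders, not the features used in the actual study.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(1)
n_species = 500
# Hypothetical traits: body mass (kg), litter size, geographic range (1000 km^2)
traits = np.column_stack([
    rng.lognormal(mean=0.0, sigma=1.0, size=n_species),
    rng.integers(1, 12, size=n_species),
    rng.lognormal(mean=2.0, sigma=1.0, size=n_species),
])
is_known_reservoir = rng.random(n_species) < 0.1  # labels for the handful of known carriers

model = GradientBoostingClassifier().fit(traits, is_known_reservoir)
risk_scores = model.predict_proba(traits)[:, 1]   # rank species by predicted reservoir risk
print("top candidate indices:", np.argsort(risk_scores)[-5:])
```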

The Difficulty In Getting a Machine To Forget Anything

An anonymous reader writes: When personal information ends up in the analytical whirlpool of big data, it almost inevitably becomes orphaned from any permissions framework that the discloser granted for its original use; machine learning systems, commercial and otherwise, end up deriving properties and models from the data until the replication, duplication and derivation of that data can no longer be controlled or 'called back' by the originator. But researchers now propose a revision which can be imposed upon existing machine-learning frameworks, interposing a 'summation' layer between user data and the learning system, effectively tokenising the information without anonymising it, and providing an auditable path whereby withdrawal of the user information would ripple through all iterations of systems which have utilized it — genuine 'cancellation' of data.
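The researchers' framework isn't spelled out in the summary; as an illustration of the general "summation layer" idea, here is a minimal sketch in which a model is fit only from per-user running sums, so withdrawing one user's data simply subtracts those sums and refits:

```python
# A minimal sketch of the "summation layer" idea described above (not the authors'
# actual framework): the model is trained only on running sums, so withdrawing one
# user's data means subtracting that user's summations and recomputing the model.
import numpy as np

class UnlearnableLeastSquares:
    def __init__(self, n_features: int):
        self.xtx = np.zeros((n_features, n_features))  # running sum of x x^T
        self.xty = np.zeros(n_features)                # running sum of x * y
        self.per_user = {}                             # each user's own summations

    def add_user(self, user_id, X, y):
        sxtx, sxty = X.T @ X, X.T @ y
        self.per_user[user_id] = (sxtx, sxty)
        self.xtx += sxtx
        self.xty += sxty

    def forget_user(self, user_id):
        sxtx, sxty = self.per_user.pop(user_id)
        self.xtx -= sxtx                               # the withdrawal ripples through the sums
        self.xty -= sxty

    def weights(self, ridge=1e-6):
        n = self.xtx.shape[0]
        return np.linalg.solve(self.xtx + ridge * np.eye(n), self.xty)

# Usage: add two users, then "cancel" one of them.
rng = np.random.default_rng(0)
model = UnlearnableLeastSquares(n_features=3)
for uid in ("alice", "bob"):
    X = rng.normal(size=(20, 3))
    model.add_user(uid, X, X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=20))
print(model.weights())
model.forget_user("alice")
print(model.weights())   # recomputed as if alice's data had never been seen
```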

World's Most Powerful Digital Camera Sees Construction Green Light

An anonymous reader writes: The Department of Energy has approved the construction of the Large Synoptic Survey Telescope's 3.2-gigapixel digital camera, which will be the most advanced in the world. When complete, the camera will weigh more than three tons and take such high-resolution pictures that it would take 1,500 high-definition televisions to display one of them. According to SLAC: "Starting in 2022, LSST will take digital images of the entire visible southern sky every few nights from atop a mountain called Cerro Pachón in Chile. It will produce a wide, deep and fast survey of the night sky, cataloging by far the largest number of stars and galaxies ever observed. During a 10-year time frame, LSST will detect tens of billions of objects—the first time a telescope will observe more galaxies than there are people on Earth – and will create movies of the sky with unprecedented details. Funding for the camera comes from the DOE, while financial support for the telescope and site facilities, the data management system, and the education and public outreach infrastructure of LSST comes primarily from the National Science Foundation (NSF)."
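A quick arithmetic check of the "1,500 high-definition televisions" figure, assuming 1920x1080 panels:

```python
# Sanity-checking the claim above, assuming standard 1920x1080 HD displays.
hd_pixels = 1920 * 1080
camera_pixels = 3.2e9
print(camera_pixels / hd_pixels)  # ~1543 displays to show one 3.2-gigapixel image
```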

UNC Scientists Open Source Their Genomic Research

ectoman writes: The human genome specifies more than 500 "kinases," enzymes that regulate other proteins by attaching phosphate groups to them. Four hundred of them are still mysteries to us, even though knowledge about them could spark serious medical innovations. But scientists at the University of North Carolina, Chapel Hill, have initiated an open source effort to map them all—research they think could pioneer a new generation of drug discovery. As members of the Structural Genomics Consortium, the chemical biologists are spearheading a worldwide community project. "We need a community to build a map of what kinases do in biology," one said. "It has to be a community-generated map to get the richness and detail we need to be able to move some of these kinases into drug facilities. But we're just doing the source code. Until someone puts the source code out there and makes it available to everybody, people won't have anything to modify."
Data Storage

Object Storage and POSIX Should Merge

storagedude writes: Object storage's low cost and ease of use have made it all the rage, but a few additional features would make it a worthier competitor to POSIX-based file systems, writes Jeff Layton at Enterprise Storage Forum. Byte-level access, easier application portability and a few commands like open, close, read, write and lseek could make object storage a force to be reckoned with.

'Having an object storage system that allows byte-range access is very appealing,' writes Layton. 'It means that rewriting applications to access object storage is now an infinitely easier task. It can also mean that the amount of data touched when reading just a few bytes of a file is greatly reduced (by several orders of magnitude). Conceptually, the idea has great appeal. Because I'm not a file system developer I can't work out the details, but the end result could be something amazing.'
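Object stores already support HTTP range reads, which is roughly the hook Layton is pointing at; here is a minimal sketch of a file-like read/seek wrapper over an S3-style object (the bucket and key names are hypothetical, and error handling and caching are omitted):

```python
# A minimal sketch of the byte-range idea Layton describes: a thin file-like wrapper
# over an S3-style object using HTTP range reads. Bucket and key names are
# hypothetical placeholders.
import boto3

class ObjectFile:
    def __init__(self, bucket: str, key: str):
        self.s3 = boto3.client("s3")
        self.bucket, self.key = bucket, key
        self.pos = 0

    def seek(self, offset: int):
        self.pos = offset                      # analogous to lseek()

    def read(self, size: int) -> bytes:
        # Fetch only the requested byte range instead of the whole object.
        byte_range = f"bytes={self.pos}-{self.pos + size - 1}"
        resp = self.s3.get_object(Bucket=self.bucket, Key=self.key, Range=byte_range)
        data = resp["Body"].read()
        self.pos += len(data)
        return data

f = ObjectFile("example-bucket", "genomes/sample.dat")
f.seek(1024)
print(f.read(16))   # reads 16 bytes at offset 1024 without downloading the whole object
```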

Oracle To Debut Low-Cost SPARC Chip Next Month

jfruh writes: Of the many things Oracle acquired when it absorbed Sun, the SPARC processors have not exactly been making headlines. But that may change next month when the company debuts a new, lower-cost chip that will compete with Intel's Xeon. "Debut," in this case, means only an introduction, though -- not a marketplace debut. From the article: [T]he Sparc M7 will have technologies for encryption acceleration and memory protection built into the chip. It will also include coprocessors to accelerate database performance. "The idea of Sonoma is to take exactly those same technologies and bring them down to very low cost points, so that people can use them in cloud computing and for smaller applications, and even for smaller companies who need a lower entry point," [Oracle head of systems John] Fowler said. ... [Fowler] didn’t talk about prices or say how much cheaper the new Sparc systems will be, and it could potentially be years before Sonoma comes to market—Oracle isn’t yet saying. Its engineers are due to discuss Sonoma at the Hot Chips conference in Silicon Valley at the end of the month, so we might learn more then.
Data Storage

Ask Slashdot: How Do You Store a Half-Petabyte of Data? (And Back It Up?)

An anonymous reader writes: My workplace has recently had two internal groups step forward with a request for almost a half-petabyte of disk to store data. The first is a research project that will computationally analyze a quarter petabyte of data in 100-200MB blobs. The second is looking to archive an ever increasing amount of mixed media. Buying a SAN large enough for these tasks is easy, but how do you present it back to the clients? And how do you back it up? Both projects have expressed a preference for a single human-navigable directory tree. The solution should involve clustered servers providing the connectivity between storage and client so that there is no system downtime. Many SAN solutions have a maximum volume limit of only 16TB, which means some sort of volume concatenation or spanning would be required, but is that recommended? Is anyone out there managing gigantic storage needs like this? How did you do it? What worked, what failed, and what would you do differently?

As Cloud Growth Booms, Server Farms Get Super-Sized

1sockchuck writes: Internet titans are concentrating massive amounts of computing power in regional cloud campuses housing multiple data centers. These huge data hubs, often in rural communities, enable companies to rapidly add server capacity and electric power amid rapid growth of cloud hosting and social sharing. As this growth continues, we'll see more of these cloud campuses, and they'll be bigger than the ones we see today. Some examples from this month: Google filed plans for a mammoth 800,000 square foot data center near Atlanta, Equinix announced 1 million square feet of new data centers on its campus in Silicon Valley, and Facebook began work on a $1 billion server farm in Texas that will span 750,000 square feet.

Police Scanning Every Face At UK Download Festival

AmiMoJo writes: Leicestershire Police have announced that they will be scanning every face at the popular UK Download music festival. The announcement article on Police Oracle (paywalled) reads, "the strategically placed cameras will scan faces at the Download Festival site in Donington before comparing it with a database of custody images from across Europe." The stated goal is to catch mobile phone thieves. Last year only 91 of the 120,000 visitors to the festival were arrested, and it isn't clear if the data will be deleted once checked against the database. The linked article provides at least one image of a costume that would probably trip up any facial recognition technology yet devised.

NASA Releases Massive Climate Change Data Set

An anonymous reader writes: NASA is releasing global climate change projections to help scientists and planners better understand local and global effects of hazards. The data includes both historical measurements from around the world and simulated projections based on those measurements. "The NASA climate projections provide a detailed view of future temperature and precipitation patterns around the world at a 15.5 mile (25 kilometer) resolution, covering the time period from 1950 to 2100. The 11-terabyte dataset provides daily estimates of maximum and minimum temperatures and precipitation over the entire globe." You can download them and look through the projections yourself at NASA's Climate Model Data Services page.
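Assuming the projections are distributed as NetCDF files (the filename and variable name below are placeholders), pulling one grid cell's time series might look something like this:

```python
# A minimal sketch of reading one grid cell from the projections, assuming NetCDF
# files; the filename and variable name are hypothetical placeholders.
import xarray as xr

ds = xr.open_dataset("tasmax_projection_example.nc")                # daily maximum temperature
point = ds["tasmax"].sel(lat=40.0, lon=255.0, method="nearest")     # one ~25 km grid cell
print(point.sel(time=slice("2050-01-01", "2050-12-31")).mean().values)
```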
