Businesses

Startup Uses Sensor Networks To Debug Science Experiments (xconomy.com) 7

gthuang88 writes: Environmental factors like temperature, humidity, or lighting often derail life science experiments. Now Elemental Machines, a startup from the founders of Misfit Wearables, is trying to help scientists debug experiments using distributed sensors and machine-learning software to detect anomalies. The product is in beta testing with academic labs and biotech companies. The goal is to help speed up things like biology research and drug development. Wiring up experiments is part of a broader effort to create "smart labs" that automate some of the scientific process.
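The summary doesn't say how Elemental Machines' anomaly detection works; as a rough illustration of the general idea, a rolling-window outlier check over a stream of lab sensor readings might look like the sketch below. The window size, threshold, and example values are arbitrary assumptions, not anything the company has published.

```python
# Rough illustration only: flag sensor readings that drift outside their
# recent norm. Window size and z-score threshold are arbitrary assumptions.
from collections import deque
from statistics import mean, stdev

def detect_anomalies(readings, window=60, threshold=3.0):
    """Yield (index, value) for readings far outside the recent norm."""
    history = deque(maxlen=window)
    for i, value in enumerate(readings):
        if len(history) >= 2:
            mu, sigma = mean(history), stdev(history)
            if sigma == 0:
                if value != mu:          # any deviation from a flat baseline
                    yield i, value
            elif abs(value - mu) / sigma > threshold:
                yield i, value
        history.append(value)

# Example: a -80 C freezer that briefly warms up mid-experiment.
temps = [-80.0] * 100 + [-61.0] + [-80.0] * 100
print(list(detect_anomalies(temps)))     # [(100, -61.0)]
```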
Businesses

How Uber Profits Even When Its Drivers Aren't Earning Money (vice.com) 179

tedlistens writes: Jay Cassano spoke to Uber drivers about "dead miles" and what work means when your boss is an algorithm, and considers a new frontier of labor concerns and big data. "Uber is the closest thing to an employer we've ever seen in this industry," Bhairavi Desai, founder of the New York Taxi Workers Alliance, told him. "They not only direct every aspect of a driver's workday, they also profit off the entire day through data collection, not just the 'sale of a product.'"
The Media

How To Build a TimesMachine (nytimes.com) 41

necro81 writes: The NY Times has an archive, the TimesMachine, that allows users to find any article from any issue from 1851 to the present day. Most of it is shown in the original typeset context of where an article appeared on a given page, like sifting through a microfiche archive. But when the original newspaper scans are 100-MB TIFF files, how can this information be conveyed efficiently to the end user? These and other computational challenges are described in this blog post on how the TimesMachine was realized.
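The blog post has the details; one standard way to serve a huge scan efficiently is to pre-cut it into a pyramid of small tiles, the same trick slippy web maps use, so the viewer fetches only the region and zoom level on screen. A sketch using Pillow follows; the tile size, zoom levels, and output format are illustrative assumptions, not necessarily what the Times' pipeline does.

```python
# Cut a large page scan into a pyramid of small web-friendly tiles so a
# viewer only fetches what is visible. Parameters here are illustrative.
import os
from PIL import Image

def build_tile_pyramid(scan_path, out_dir, tile=256, levels=4):
    Image.MAX_IMAGE_PIXELS = None            # page scans can be enormous
    page = Image.open(scan_path).convert("L")
    for z in range(levels):
        scale = 2 ** (levels - 1 - z)        # z=0 is the most zoomed-out level
        level_img = page.resize((page.width // scale, page.height // scale))
        os.makedirs(f"{out_dir}/{z}", exist_ok=True)
        for y in range(0, level_img.height, tile):
            for x in range(0, level_img.width, tile):
                box = (x, y, min(x + tile, level_img.width),
                       min(y + tile, level_img.height))
                level_img.crop(box).save(f"{out_dir}/{z}/{x}_{y}.jpg", quality=80)

# build_tile_pyramid("page-scan.tif", "tiles")
```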
Software

Hunting Malware With GPUs and FPGAs (hackaday.com) 44

szczys writes: Rick Wesson has been working on a way to identify variants of the same piece of malware that have been altered through polymorphism (a common method of escaping detection). While the bits are scrambled from one sample to the next, he has found that mapping them with a space-filling curve makes it easy to cluster polymorphically similar malware samples. Forming the fingerprint with these curves is computationally expensive, and this is an Internet-scale problem: he currently needs to inspect 300,000 new samples a day. Moving the calculation to a GPU yielded roughly four orders of magnitude better efficiency than CPUs, reaching about 200,000 samples a day. Rick has begun testing FPGA processing, with the goal of handling 10 million samples in four hours on a machine drawing 4,000 watts.
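A minimal sketch of the fingerprinting idea, assuming a Hilbert curve as the space-filling curve and a simple grid-averaging scheme; both are illustrative choices, not necessarily Wesson's actual pipeline.

```python
# Lay a sample's bytes along a Hilbert curve so bytes near each other in the
# file stay near each other in 2D, then compare low-resolution fingerprints.
import numpy as np

def d2xy(n, d):
    """Standard mapping of distance d along a Hilbert curve to (x, y) on an n x n grid."""
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:                          # rotate the quadrant if needed
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

def fingerprint(data: bytes, order=6):
    """Average byte value per cell of a 2^order x 2^order Hilbert-ordered grid."""
    n = 2 ** order
    cells = n * n
    sums = np.zeros((n, n))
    counts = np.zeros((n, n))
    for i, b in enumerate(data):
        x, y = d2xy(n, i * cells // len(data))
        sums[y, x] += b
        counts[y, x] += 1
    return sums / np.maximum(counts, 1)

def distance(fp_a, fp_b):
    """Mean absolute difference; smaller means more similar fingerprints."""
    return float(np.abs(fp_a - fp_b).mean())

# Polymorphic variants of one family should land close together:
# distance(fingerprint(sample_a), fingerprint(sample_b))
```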
Businesses

Uber Scaling Up Its Data Center Infrastructure (datacenterfrontier.com) 33

1sockchuck writes: Connected cars generate a lot of data. That's translating into big business for data center providers, as evidenced by a major data center expansion by Uber, which needs more storage and compute power to support its global data platform. Uber drivers' mobile phones send location updates every 4 seconds, which is why the design goal for Uber's geospatial index is to handle a million writes per second. It's a reminder that as our cars become mini data centers, the data isn't staying onboard, but will also be offloaded to the data centers of automakers and software companies.
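For a rough sense of why this workload calls for a geospatial index, here is a toy cell-bucketed index for location fixes: each write is a cheap append into a grid cell, and "drivers near X" becomes a lookup of a handful of cells. The cell scheme and API below are assumptions for illustration, not Uber's actual design.

```python
# Toy geospatial index: bucket location fixes into fixed-resolution grid cells.
from collections import defaultdict

CELL_DEG = 0.01                          # roughly 1 km cells at the equator

def cell_id(lat, lon, cell_deg=CELL_DEG):
    return (int(lat // cell_deg), int(lon // cell_deg))

index = defaultdict(dict)                # cell -> {driver_id: latest fix}

def record_fix(driver_id, lat, lon, ts):
    # A real index would also evict the driver's entry from their previous cell.
    index[cell_id(lat, lon)][driver_id] = (lat, lon, ts)

def drivers_near(lat, lon):
    ci, cj = cell_id(lat, lon)
    nearby = {}
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            nearby.update(index.get((ci + di, cj + dj), {}))
    return nearby

record_fix("driver-42", 37.7749, -122.4194, ts=1)
print(drivers_near(37.7752, -122.4180))
```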
Networking

Enterprise Datacenter Hardware Assumptions May Be In For a Shakeup (acm.org) 100

conner_bw writes: For the entire careers of most practicing computer scientists, a fundamental observation has consistently held true: CPUs are significantly more performant and more expensive than I/O devices. The fact that CPUs can process data at extremely high rates, while simultaneously servicing multiple I/O devices, has had a sweeping impact on the design of both hardware and software for systems of all sizes, for pretty much as long as we've been building them. This assumption, however, is in the process of being completely invalidated.
Businesses

Panasonic To Commercialize Facebook's Blu-Ray Cold Storage Systems (cio.com) 56

itwbennett writes: A couple of years ago, Facebook revealed it was using Blu-ray discs as a cost-efficient way to archive the billions of images that users upload to its service. When Facebook users upload photos, the images are often viewed frequently in the first week, so Facebook stores them on solid-state drives or spinning hard disks. But as time goes on, the images are viewed less and less. At a certain point, Facebook dumps them onto high-capacity Blu-ray discs, where they might sit for years without being looked at. Now Panasonic has said it plans to commercialize the technology for other businesses, and is working on new discs that will hold a terabyte of data.
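A minimal sketch of the tiering policy the summary describes; the age and traffic thresholds are invented for illustration, since Facebook hasn't published exact cut-offs here.

```python
# Illustrative hot/warm/cold tiering by age and recent view count.
from datetime import datetime, timedelta

def choose_tier(uploaded_at, views_last_30d, now=None):
    now = now or datetime.utcnow()
    age = now - uploaded_at
    if age < timedelta(days=7) or views_last_30d > 100:
        return "ssd"        # hot: first week, or still heavily viewed
    if views_last_30d > 0:
        return "hdd"        # warm: occasional views
    return "blu-ray"        # cold: untouched for a month or more

print(choose_tier(datetime.utcnow() - timedelta(days=400), views_last_30d=0))
```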
Security

Over 650 TB of Data Up For Grabs From Publicly Exposed MongoDB Databases (csoonline.com) 96

itwbennett writes: A scan performed over the past few days by John Matherly, the creator of the Shodan search engine, has found that there are at least 35,000 publicly accessible and insecure MongoDB databases on the Internet, and their number appears to be growing. Combined, they expose 684.8 terabytes of data to potential theft. Matherly originally sounded the alarm about this issue back in July, when he found nearly 30,000 unauthenticated MongoDB instances. He decided to revisit the issue after a security researcher named Chris Vickery recently found information exposed in such databases that was associated with 25 million user accounts from various apps and services, including 13 million users of the controversial OS X optimization program MacKeeper, as reported on Slashdot on Wednesday.
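For context, an unauthenticated MongoDB instance will list its databases to any client that connects, which is essentially what such scans detect. A small sketch using pymongo; the address in the comment is a documentation placeholder, not a real target.

```python
# Check whether a MongoDB instance answers without credentials.
from pymongo import MongoClient
from pymongo.errors import PyMongoError

def is_wide_open(host, port=27017):
    try:
        client = MongoClient(host, port, serverSelectionTimeoutMS=3000)
        names = client.list_database_names()   # raises if auth is required
        return True, names
    except PyMongoError:
        return False, []

# exposed, dbs = is_wide_open("203.0.113.7")   # placeholder address
```

The usual fix is to enable authorization in mongod's configuration and bind the server to localhost or a private interface rather than all addresses.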
AI

An Algorithm That Can Predict Human Behavior Better Than Humans (mit.edu) 84

Quartz describes an MIT study with the surprising conclusion that at least in some circumstances, an algorithm can not only sift numbers faster than humans (after all, that's what computers are best at), but also discern relevant factors within a complex data set more accurately and more quickly than can teams of humans. In a competition involving 905 human teams, a system called the Data Science Machine, designed by MIT master's student Max Kanter and his advisor, Kalyan Veeramachaneni, beat most of the humans for accuracy and speed in three tests of predictive power, including one about "whether a student would drop out during the next ten days, based on student interactions with resources on an online course." Teams might have looked at how late students turned in their problem sets, or whether they spent any time looking at lecture notes. But instead, MIT News reports, the two most important indicators turned out to be how far ahead of a deadline the student began working on their problem set, and how much time the student spent on the course website. ... The Data Science Machine performed well in this competition. It was also successful in two other competitions, one in which participants had to predict whether a crowd-funded project would be considered “exciting” and another in which they had to predict whether a customer would become a repeat buyer.
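As a rough sketch of the two winning indicators described above, here they are computed by hand from a hypothetical clickstream table and fed to an off-the-shelf classifier. The column names and model choice are assumptions for illustration; the point of the Data Science Machine is that it synthesizes features like these automatically.

```python
# Hand-built versions of the two features the article highlights.
import pandas as pd
from sklearn.linear_model import LogisticRegression

def build_features(events: pd.DataFrame) -> pd.DataFrame:
    """events has one row per student action with columns:
    student_id, timestamp, deadline, action ('site_visit' or 'start_problem_set')."""
    starts = events[events.action == "start_problem_set"]
    lead_time = (starts.deadline - starts.timestamp).dt.total_seconds()
    lead_time = lead_time.groupby(starts.student_id).mean()

    visits = events[events.action == "site_visit"].groupby("student_id").size()

    return pd.DataFrame({"mean_lead_time_s": lead_time,
                         "site_visits": visits}).fillna(0)

# X = build_features(events_df)
# model = LogisticRegression().fit(X, dropped_out)   # dropped_out: 0/1 labels
```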
Data Storage

How a Frozen Neutrino Observatory Grapples With Staggering Amounts of Data (vice.com) 49

citadrianne writes: Deep beneath the Antarctic ice sheet, sensors buried in a billion tons of ice (a cubic kilometer of frozen H2O) are searching for neutrinos. "We collect...one neutrino from the atmosphere every ~10 minutes that we sort of care about, and one neutrino per month that comes from an astrophysical source that we care about a very great deal," researcher Nathan Whitehorn said. "Each particle interaction takes about 4 microseconds, so we have to sift through data to find the 50 microseconds a year of data we actually care about." Computing facilities manager Gonzalo Merino added, "If the filtered data from the Pole amounts to ~36TB/year, the processed data amounts to near 100TB/year." Because IceCube can't see satellites in geosynchronous orbit from the pole, internet coverage only lasts for six hours a day, Whitehorn explained. The raw data is stored on tape at the pole, and a 400-core cluster makes a first pass at the data to cut it down to around 100GB/day. A 4000-CPU dedicated local cluster crunches the numbers. Their storage system has to handle typical loads of "1-5GB/sec of sustained transfer levels, with thousands of connections in parallel," Merino explained.
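The "50 microseconds a year" figure follows directly from the quoted rates, which is worth making explicit:

```python
# Sanity-checking the quoted rates (inputs from the article; the rest is arithmetic).
INTERACTION_US = 4                    # microseconds per particle interaction
astro_per_year = 12                   # ~one astrophysical neutrino per month
atmos_per_year = 365 * 24 * 6         # ~one atmospheric neutrino per 10 minutes

print(astro_per_year * INTERACTION_US)        # 48 -> the "~50 microseconds a year"
print(atmos_per_year * INTERACTION_US / 1e6)  # ~0.21 s/year of atmospheric events
```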
United Kingdom

Big Data Attempts To Find Meaning In 40 Years of UK Political Debate (thestack.com) 44

An anonymous reader writes: International researchers have analyzed 40 years of political speeches from the UK Parliament in an effort to move the study of political theory from social science towards the quantitative analysis offered by Big Data analytics techniques. The group used Python to crunch publicly available data from theyworkforyou.com, comprising 3.7 million individual speeches. A few strange trends emerged in this initial experiment, such as the decline of 'health care' as a trending Parliamentary topic, with 'welfare' consistently on the rise, and the decrease of fervent interest in any particular topic during the more pacific years under Margaret Thatcher and Tony Blair.
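A minimal sketch of this kind of trend analysis in Python; the input format is an assumption, and the researchers' actual pipeline is considerably more involved.

```python
# Count how often a phrase appears in each year's speeches and watch the curve.
from collections import Counter

def phrase_trend(speeches, phrase):
    """speeches: iterable of (year, text) pairs -> Counter of mentions per year."""
    trend = Counter()
    for year, text in speeches:
        trend[year] += text.lower().count(phrase.lower())
    return trend

corpus = [(1985, "The health care system needs reform."),
          (2005, "Welfare reform and welfare spending dominate the agenda.")]
print(phrase_trend(corpus, "welfare"))   # Counter({2005: 2})
```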
Businesses

Dell, EMC Said To Be In Merger Talks (itworld.com) 97

itwbennett writes: According to a Wall Street Journal report (paywalled), Dell might buy some or all of storage giant EMC. (The grain of salt here is that the Journal's report cited unnamed sources, and cautioned that the companies might not finalize any agreement.) If the report has it right, though, "a total merger would be one of the biggest deals ever in the technology industry," writes Stephen Lawson for IDG, "with EMC holding a market value of about US$50 billion. It would also bring together two of the most important vendors to enterprise IT departments."
Math

Tracing the Limits of Computation 82

An anonymous reader writes: For more than 40 years, researchers had been trying to find a better way to compare two arbitrary strings of characters, such as the long strings of chemical letters within DNA molecules. The most widely used algorithm is slow and not all that clever: It proceeds step-by-step down the two lists, comparing values at each step. If a better method to calculate this "edit distance" could be found, researchers would be able to quickly compare full genomes or large data sets, and computer scientists would have a powerful new tool with which they could attempt to solve additional problems in the field.
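The slow-but-standard method being described is the classic quadratic dynamic-programming computation of edit distance, which fills in a table of prefix-to-prefix distances and therefore takes time proportional to the product of the string lengths. A compact sketch:

```python
# Standard dynamic-programming edit distance (Levenshtein distance),
# keeping only one row of the (m+1) x (n+1) table at a time.
def edit_distance(a: str, b: str) -> int:
    prev = list(range(len(b) + 1))              # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        curr = [i]                              # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                    # delete ca
                            curr[j - 1] + 1,                # insert cb
                            prev[j - 1] + (ca != cb)))      # substitute or match
        prev = curr
    return prev[-1]

print(edit_distance("GATTACA", "GCATGCU"))      # 4
```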

Yet in a paper presented at the ACM Symposium on Theory of Computing, two researchers from the Massachusetts Institute of Technology put forth a mathematical proof that the current best algorithm was "optimal" — in other words, that finding a more efficient way to compute edit distance was mathematically impossible. But researchers aren't quite ready to record the time of death. One significant loophole remains. The impossibility result is only true if another, famously unproven statement called the strong exponential time hypothesis (SETH) is also true.
AI

An AI Hunts the Wild Animals Carrying Ebola 45

the_newsbeagle writes: Outbreaks of infectious diseases like Ebola follow a depressing pattern: People start to get sick, public health authorities get wind of the situation, and an all-out scramble begins to determine where the disease started and how it's spreading. Barbara Han, a code-writing ecologist, hopes her algorithms will put an end to that reactive model. She wants to predict outbreaks and enable authorities to prevent the next pandemic. Han takes a big-data approach, using a machine-learning AI to identify the wild animal species that carry zoonotic diseases and transmit them to humans.
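An illustrative sketch of that approach: train a classifier on traits of species whose reservoir status is already known, then rank the remaining species by predicted risk. The trait columns, values, and model choice below are assumptions made for illustration, not Han's actual feature set.

```python
# Toy species-trait classifier for ranking potential disease reservoirs.
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier

# Hypothetical trait table: one row per species.
traits = pd.DataFrame({
    "body_mass_g":      [45, 300, 1200, 60],
    "litters_per_year": [4.0, 2.0, 1.0, 5.5],
    "range_km2":        [1.2e6, 3.0e5, 8.0e4, 2.5e6],
    "known_reservoir":  [1, 0, 0, 1],          # label where status is known
})

X = traits.drop(columns="known_reservoir")
y = traits["known_reservoir"]
model = GradientBoostingClassifier().fit(X, y)

# Rank species by predicted probability of being a reservoir.
print(model.predict_proba(X)[:, 1])
```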
Google

The Difficulty In Getting a Machine To Forget Anything 79

An anonymous reader writes: When personal information ends up in the analytical whirlpool of big data, it almost inevitably becomes orphaned from any permissions framework that the discloser granted for its original use; machine learning systems, commercial and otherwise, end up deriving properties and models from the data until its replication, duplication, and derivation can no longer be controlled or 'called back' by the originator. But researchers now propose a revision that can be imposed upon existing machine-learning frameworks, interposing a 'summation' layer between user data and the learning system, effectively tokenising the information without anonymising it, and providing an auditable path whereby withdrawal of the user's information would ripple through all iterations of systems that have utilized it: genuine 'cancellation' of data.
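A toy sketch of the summation idea as described: the model is derived only from aggregate sums over user data, so withdrawing one user means subtracting that user's contribution from the sums and re-deriving the model, rather than hunting through every derived copy of the raw data. The per-feature count model below is a stand-in for a real learner, not the researchers' system.

```python
# Forgettable aggregates: a "summation layer" between user data and the model.
from collections import Counter, defaultdict

class ForgettableCounts:
    def __init__(self):
        self.totals = Counter()                  # the summation layer
        self.per_user = defaultdict(Counter)     # auditable per-user contributions

    def add(self, user, features):
        update = Counter(features)
        self.per_user[user] += update
        self.totals += update

    def forget(self, user):
        """Withdraw one user's data: subtract their contribution from the sums."""
        self.totals -= self.per_user.pop(user, Counter())

    def frequency(self, feature):
        total = sum(self.totals.values())
        return self.totals[feature] / total if total else 0.0

store = ForgettableCounts()
store.add("alice", ["cat", "cat", "dog"])
store.add("bob", ["dog"])
store.forget("alice")
print(store.frequency("dog"))    # 1.0 once alice's data is withdrawn
```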
