waderoush writes: If you’re a human searching the Web for the answer to a homework assignment, a health problem, or a trivia question, you need help sifting through billions of pages for the most relevant and reliable one. That’s the problem Google solved back in 1998 with PageRank, laying the groundwork for a search, advertising, and mobile empire. But today, the challenges involved in organizing the world’s information and making it useful (to quote from Google’s own mission statement) are very different. A growing percentage of Web traffic isn’t from humans at all; it’s from automated agents that only care about specific parts of Web pages. Think of Instapaper, which provides simplified views of news articles, or Siri, which can check the weather or find open tables at a restaurant. To do their jobs, these bots need to understand the inner structure of Web pages. And that’s what Diffbot is helping with. The seven-man startup in Palo Alto, CA, uses computer vision and machine learning to recognize and classify the components of Web pages. Developers at hundreds of companies, from Digg to Onswipe to Pinterest, are tapping Diffbot’s four existing APIs to grab and repurpose specific data from news articles, images, product pages, and home pages (APIs for 16 more page types are in the works). At the same time, Diffbot is building its own global index of structured Web data. Once it's complete, it could become the substrate for a new economy of what CEO Mike Tung calls 'mini-AIs' that create new knowledge from old knowledge. 'Once we have all 20 [page types], we will essentially be able to cover the gamut, and convert most of the Web into a database structure,' Tung says.