Comment The Shannon Limit is the Big Bottleneck Problem (Score 1) 78
Today, unstructured-information search (findability) is limited by the Shannon limit; this is a fundamental physical limit, since all pattern-search engines are statistical decoders. Google does a little better than the Shannon limit by looking at which search results get selected; this is a communal-intelligence technique based on how we "vote" for the right result. Unfortunately, this only works well for high-volume searches, which is why Google works best if you know exactly what you're looking for, or you're looking for what everyone else is looking for. In commercial search, Google is losing out to companies like Amazon that are investing in editorial findability enhancements using lots of work by folks on Mechanical Turk.
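The "Shannon limit" framing treats a query as a message sent over a noisy channel: a statistical decoder can recover no more intent than the channel's capacity allows. A minimal sketch, using the binary symmetric channel as my own illustrative choice (the post doesn't specify a channel model):

```python
import math

def bsc_capacity(p):
    """Capacity of a binary symmetric channel with crossover probability p:
    C = 1 - H(p), where H is the binary entropy function, in bits per use."""
    if p in (0.0, 1.0):
        return 1.0  # a deterministic channel carries one full bit per use
    h = -p * math.log2(p) - (1 - p) * math.log2(1 - p)
    return 1.0 - h

# Noisier "channels" (vaguer or more ambiguous queries) carry less
# recoverable intent, no matter how good the decoder is.
for p in (0.01, 0.1, 0.3, 0.5):
    print(f"crossover p={p:.2f}  capacity={bsc_capacity(p):.3f} bits/use")
```

At p = 0.5 the capacity is zero: when the signal is pure noise, no decoder, however clever, can recover anything, which is the sense in which the limit is "physical."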
eBay is actually doing some of the most interesting work in findability research, because it has the 'everything' search problem, which is harder than the popular-search problem that Google is mostly concerned with now. Google seems to have given up on technical, scientific, commercial, practical, professional, industrial, and other sorts of specialized search. This sort of information usually has a very low Shannon limit, which is why professional search usually relies on extensive manual indexing, such as that provided by Westlaw.
The holy grail of machine translation is automatically extracting the exposition structure (rigor, rationale, rhetoric) from texts, but almost no progress has been made despite decades of research, and here again the problem is the Shannon limit. Presumably this is the problem MEMEX must solve in order to succeed, but it can't be solved by machine intelligence alone; that's a physical impossibility, so even fashionable techniques such as deep learning are out the window.
In the publishing world there are techniques that turn exposition structures into texts; this is authoring automation. It could be used to generate sample texts for some target search, and machine intelligence could then score matches between the generated samples and the search texts. In this way it might be possible to automate educated guessing of the exposition structure, partially getting around the Shannon limit through systematic editorial augmentation.
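The generate-and-score idea above can be sketched roughly as follows. Everything here is hypothetical illustration, not the post's actual method: the three one-line "templates" stand in for an authoring-automation system that would generate far richer sample texts, and cosine similarity over bags of words stands in for whatever matching score a real system would use.

```python
from collections import Counter
import math

def bow(text):
    """Bag-of-words representation of a text."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical samples "generated" from three exposition structures
# (the rigor / rationale / rhetoric triad mentioned above).
templates = {
    "rigor":     "we prove the theorem by induction on the lemma",
    "rationale": "we chose this design because the tradeoffs favor it",
    "rhetoric":  "imagine a world where search simply understands you",
}

def guess_structure(search_text):
    """Score each generated sample against the search text and return the
    best-matching exposition structure with its similarity score."""
    q = bow(search_text)
    scores = {name: cosine(q, bow(sample)) for name, sample in templates.items()}
    return max(scores.items(), key=lambda kv: kv[1])

print(guess_structure("why we chose this design and its tradeoffs"))
```

The point of the sketch is the loop structure, not the scoring function: generation supplies the editorial augmentation, and the matcher only has to rank candidates, which is an easier problem than decoding the structure from the raw text.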