Wikipedia Founder Working on User-Powered Search
An anonymous reader writes "Jimmy Wales, founder of the Wikia corporation, has revealed plans to offer a user-driven search engine. Ars Technica reports that the plan is to leverage user preferences to pick the 'best' site for any given search term, while at the same time utilizing advertising for commercial gain. The article admits this may not be the ideal solution: 'Users may be reluctant to contribute to the betterment of a commercial site that may end up being bought by a bigger company. Consider, for example, the tragic death of TV Tome, a comprehensive community-driven television content guide that was eventually bought by CNET and transformed into a garish, excessively commercialized Web 2.0 monstrosity of significantly less value to users.' Just the same, Wales seems very enthusiastic in the Times Online article highlighting this venture."
Time will tell who is right but (Score:5, Informative)
Searching the Web is a very challenging problem (that's why so few companies do it). The volume of data is huge, and you only appreciate the value of good algorithms when poor ones make a run take weeks, fail near the end, and force you to restart and wait another week. You can either try to handle this very big problem, which is very hard even if you have the money (look at Amazon's A9, funded with millions, yet they licensed Google's code and database), or you can try to reduce the problem by focusing only on a handful of "important" pages. Yahoo did that when it was a human-edited directory/search engine hybrid.
It seems to me that Mr Wales entertains the illusion that a very small number of manually checked pages will be enough to satisfy the vast majority of search queries (and it has to be 98%+, as I won't be hopping from one search engine to another). If that were the case, we would still be using Yahoo, which did pretty much just that. Yet almost everyone (including Yahoo) moved to algorithmic search engines because that is the only way to handle billions of pages, and billions of pages you will have to handle: even if you only index the homepages of all registered domain names, you will be dealing with 100 million+ pages. That's a good 20 times more than there are articles in Wikipedia, and checking pages can be far duller than reading a nice article you have some personal interest in.
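To put a rough number on that, here is a back-of-the-envelope sketch. Only the 100 million figure comes from the paragraph above; the 30 seconds per page and the 2,000 working hours per year are my own assumptions, and the function name is made up for illustration.

```python
def review_person_years(pages, seconds_per_page=30, hours_per_year=2000):
    """Person-years needed to check every page once by hand.

    seconds_per_page and hours_per_year are assumed values, not measurements.
    """
    total_hours = pages * seconds_per_page / 3600
    return total_hours / hours_per_year


homepages = 100_000_000  # the "100 million+" homepages figure from the post
print(round(review_person_years(homepages)))  # about 417 person-years for one pass
```

And that is a single pass over homepages only, with no re-checking when pages change.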
What I find ironic is that our own search engine concept was removed from Wikipedia because we were supposedly "not notable enough". That's a sign of how they handle the problem of "too much data" at Wikipedia: they reduce the problem by cutting the dataset down greatly. Sometimes this is done wrongly, sometimes rightly, and it might well work for Wikipedia, but it sure as hell won't work for Web-scale search. Oh, and by the way, who said Google and others don't use human reviewers? They sure do; just check TrustRank [wikipedia.org], a link that is ranked as the #1 match on Google for the search "TrustRank"! Notice what Wikipedia tells us: "While human experts can easily identify spam, it is too expensive to evaluate manually a large number of pages."
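For anyone who hasn't looked at it, the general idea behind TrustRank is that humans vet only a small seed set and an algorithm spreads that trust along links. Below is a minimal sketch of that idea as a seed-biased PageRank. It is not any search engine's actual code; the toy graph, the page names, the damping factor, and the iteration count are all assumptions for illustration.

```python
def trust_rank(links, trusted_seeds, damping=0.85, iterations=50):
    """Propagate trust from hand-reviewed seed pages along outgoing links.

    links: dict mapping each page to the list of pages it links to.
    trusted_seeds: non-empty set of pages a human reviewer marked as good.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    # Teleport distribution: uniform over the human-approved seeds only.
    seed = {p: (1.0 / len(trusted_seeds) if p in trusted_seeds else 0.0)
            for p in pages}
    trust = dict(seed)
    for _ in range(iterations):
        nxt = {p: (1.0 - damping) * seed[p] for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if not targets:
                continue  # dangling page: its trust is dropped in this sketch
            share = damping * trust[page] / len(targets)
            for target in targets:
                nxt[target] += share
        trust = nxt
    return trust


# Hypothetical toy web: one reviewed hub, two pages it vouches for,
# and a pair of spam pages that only link to each other.
links = {
    "good_hub": ["news", "blog"],
    "news": ["blog"],
    "blog": ["news"],
    "spam_farm": ["spam_shop"],
    "spam_shop": ["spam_farm"],
}
seeds = {"good_hub"}
scores = trust_rank(links, seeds)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

In this toy graph the spam pair ends up with zero trust because nothing reachable from the reviewed seed links to it. One human judgment covers several pages, which is the opposite of checking every page by hand.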
Human input plays an important role in state-of-the-art search engines (although a fairly unknown one, as they prefer to keep it secret). However, the suggestion that humans can handle billions of pages, and/or that a handful of pages will be sufficient for a general-purpose search engine, is wrong. It is a very backwards move that will result in exactly the kind of wrong attitude present in Wikipedia now.
Doomed to repeat it, I guess (Score:4, Informative)
An orphaned ref to Magellan, the human powered search engine [gocee.com]
Didn't work before, when there were a lot fewer sites out there; not likely to work this time, either.
jh
A new social search is already out there. (Score:2, Informative)
Actually this is not going to happen (Score:3, Informative)
On a WMF mailing list Angela said that there was no substance to all this. I had also heard from other channels that there is not much to this.
So even though it is nice to speculate, there is not much to all this.
Thanks,
GerardM
After getting burned with CDDB, forget it (Score:4, Informative)