
Wikia Search Engine to be Launched on January 7th

Posted by timothy
from the wisdom-of-crowds dept.
cagnol writes "The Washington Post reports that Jimmy Wales, the founder of the online encyclopedia Wikipedia, has announced that a new open-source search engine, Wikia Search, will launch on January 7th, 2008. The project will let the community help rank search results, in a model similar to Wikipedia's. However, the company behind it is a for-profit organization. The new search engine is intended to challenge Google and Yahoo."
  • Challenging Google? (Score:2, Interesting)

    by sykopomp (1133507) on Tuesday January 01, 2008 @10:06PM (#21878576)
    I guess that's their response to Google's Knol (http://en.wikipedia.org/wiki/Knol). Pity to see things heat up between the 'good guys'.
  • by Bombula (670389) on Tuesday January 01, 2008 @10:09PM (#21878600)
    Since this project would seem to depend on the participation and goodwill of users in order to work, my guess is that a nonprofit version will follow shortly afterwards, paralleling the open-source model. I also predict that without the benefit of a massive Microsoft-esque head start, the for-profit version will be put out of business in short order.
  • Re:Easily Abused? (Score:5, Interesting)

    by jrothwell97 (968062) <jonathan.notroswell@com> on Tuesday January 01, 2008 @10:15PM (#21878624) Homepage Journal

    Point well made - while spam attacks may be pretty obvious, they could be spread out over time to make them less obvious.

    Additionally, I can see this search engine being very much affected by public mood. For example, say there was a royal death and a certain right-wing 'upmarket' tabloid newspaper [dailyexpress.co.uk] decided to claim that it was a Government conspiracy to kill the royal off. The story is linked to from said newspaper's web site, and thus people improve its ranking. It therefore floats to the top of the results pile, gaining more exposure and setting off a vicious cycle.

    Just a hypothetical situation, but certainly possible. Such a model would also make it possible to carry out smear attacks and to ruin the rankings of competing companies, parties, organisations, whatever - a practice that IMHO should be left to search engine admins.
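The vicious cycle jrothwell97 describes can be illustrated with a toy simulation. It assumes, purely for illustration, that ranking is by raw vote count and that votes arrive in proportion to a result's position (the top slot gets far more attention than the second). How Wikia Search actually ranks results is not described here. The function name `simulate` and all numbers are hypothetical.

```python
def simulate(scores, rounds, position_votes):
    """Toy rich-get-richer model (hypothetical): each round, rank results by
    score; the result at position i then gains position_votes[i] votes."""
    scores = dict(scores)  # don't mutate the caller's dict
    for _ in range(rounds):
        ranking = sorted(scores, key=scores.get, reverse=True)
        for name, votes in zip(ranking, position_votes):
            scores[name] += votes
    return scores

# Made-up starting scores: the conspiracy piece has only a small early lead.
start = {"conspiracy story": 12, "official report": 10}
print(simulate(start, rounds=5, position_votes=[10, 1]))
# {'conspiracy story': 62, 'official report': 15}
```

In this model, a 2-vote head start grows to a 47-vote gap after five rounds. Nothing about the content matters; the exposure from ranking first does all the work.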

  • Re:Easily Abused? (Score:3, Interesting)

    by Kjella (173770) on Tuesday January 01, 2008 @11:04PM (#21878884) Homepage

    it's completely at the whims of whoever created it and that's the problem.
    Funny, I prefer it to be under the control of someone who's in the business of making good search results rather than a bunch of wankers/trolls/bots trying to lure me to their site, even though there are a hundred others that would be more relevant to my search.
  • by Anonymous Coward on Tuesday January 01, 2008 @11:24PM (#21879010)
    200-400 boxes can handle the crawling/processing/indexing for the current 'important' parts of the web, similar to Google's current crawling. It's handling the query load and replicated availability anywhere near what Google does that would scale that to a few thousand boxes around the world. It's hard to tell how much storage is required for AdWords or all of Google's non-web searches and projects. I figure that is where most of their tons of capacity really goes now.
  • by Odiumjunkie (926074) on Tuesday January 01, 2008 @11:33PM (#21879038) Journal
    I completely agree. I am continually amazed at how good Google's input correction is. If I do a search for 'pale gire', it knows to correct it to 'pale fire [wikipedia.org]', yet if I do a search for 'canadian gire', it's clever enough to work out that I mean 'canadian tire [wikipedia.org]'. I'm also continually amazed that people running other search services haven't yet realised just how vital this feature is; it's probably one of my favourite things about Google. Less so for monosyllables, but it's useful for words like "monosyllables". I'm particularly surprised that prominent online dictionaries don't have similar functionality, seeing as I would imagine a large portion of their usage is to find the correct spelling of words.
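The context-sensitive correction Odiumjunkie describes can be approximated in a few lines. This is a minimal sketch using a well-known textbook technique, not Google's actual method. It generates every string one edit away from an unknown word (deletion, adjacent transposition, substitution, insertion), keeps the candidates that appear in a vocabulary, and picks the one seen most often after the previous word. The vocabulary and bigram counts below are made up for illustration. A real system would derive them from a very large corpus. The function names are hypothetical.

```python
import string
from collections import Counter

LETTERS = string.ascii_lowercase

def edits1(word):
    """Every string one edit away: deletion, adjacent transposition,
    substitution (a mistyped key), or insertion."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    substitutes = [a + c + b[1:] for a, b in splits if b for c in LETTERS]
    inserts = [a + c + b for a, b in splits for c in LETTERS]
    return set(deletes + transposes + substitutes + inserts)

def correct_query(query, vocabulary, bigrams):
    """Replace each unknown word with the in-vocabulary candidate seen most
    often after the previous word (bigrams is a Counter: missing pairs count 0)."""
    out = []
    for word in query.lower().split():
        if word in vocabulary or not out:
            out.append(word)  # known word, or no left context to judge by
            continue
        cands = edits1(word) & vocabulary
        if not cands:
            out.append(word)  # nothing close enough; leave it alone
            continue
        prev = out[-1]
        # Highest bigram count wins; ties go to the alphabetically first candidate.
        out.append(max(sorted(cands), key=lambda c: bigrams[(prev, c)]))
    return " ".join(out)

# Toy, made-up data standing in for a large corpus of queries or web text.
VOCAB = {"pale", "canadian", "fire", "tire", "hire"}
BIGRAMS = Counter({("pale", "fire"): 900, ("pale", "tire"): 3,
                   ("canadian", "tire"): 1200, ("canadian", "fire"): 40})

print(correct_query("pale gire", VOCAB, BIGRAMS))      # pale fire
print(correct_query("canadian gire", VOCAB, BIGRAMS))  # canadian tire
```

The same misspelling, 'gire', produces the same three candidates (fire, tire, hire) in both queries. Only the neighbouring word decides which one wins, which matches the behaviour described above.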
  • by ThreeGigs (239452) on Tuesday January 01, 2008 @11:44PM (#21879086)
    It looks like you've entered some sort of partnership with Grub http://www.grub.org/ [grub.org].
    If so, kudos... Grub's been languishing in not-ready-for-primetime land for far too long, and the ability to crawl your own site to keep results current is a bonus, too.
  • by TaoPhoenix (980487) <TaoPhoenix@yahoo.com> on Wednesday January 02, 2008 @12:18AM (#21879284) Journal
    There have been only two fundamental revenue models for content for 25 years now: EndUser and Advertiser. The ISPs went through the throes of the switch from PerHour to FlatRate pricing in the 1990s, and the RIAA is struggling with it now.

    I don't know anyone who would "pay to search" casual queries. There are some professional databases which do operate on this principle for high powered content.

    From the RIAA threads we learn that people don't want to pay as end users for their content. The post above asks about the advertiser model.

    The absolutely tough part about Free Open Source models is that it takes MUCH longer for the benefits to come back around through the social benefit cycle, while the monthly rent or mortgage comes due much sooner. The first person to absolutely nail this problem will be the mogul of the 2010s.

  • Mod Parent (Score:3, Interesting)

    by Anonymous Coward on Wednesday January 02, 2008 @12:55AM (#21879480)
    Trollish as the parent may be, he is unfortunately speaking a trollish truth.

    Speaking explicitly as a reader of Slashdot, with all the group-think biases a site like this introduces, Wikipedia is floundering in a mire of its own arrogance, and the dissatisfaction with this needs to be heard.
  • by Stan Vassilev (939229) on Wednesday January 02, 2008 @12:58AM (#21879494)
    Wikipedia receives most of its traffic from its articles appearing in Google's search results, since Wikipedia is relevant content and Google is the top search engine.

    How is Wikipedia to draw traffic to its search engine? Obviously not via Google, as search engines are content-free on their own. By integrating it with Wikipedia? But again, Wikipedia is the end target, not a starting point, so how could this work?

    I don't think Wikipedia has the strategy or money for this to reach critical mass and show its potential, but it'll be interesting as an experiment.
  • Re:Easily Abused? (Score:2, Interesting)

    by timothy (36799) on Wednesday January 02, 2008 @01:00AM (#21879512) Homepage Journal
    Hey, what would you say to another Slashdot interview so you could answer more questions at greater leisure? :)

    timothy
  • by jwales (97533) on Wednesday January 02, 2008 @01:46AM (#21879696) Homepage
    "You operate under the sham of an open community, yet exclude those outside a very narrow political agenda. Your a fraud, using open source principals as a smokescreen that presents your personal world-view set as fact to the world."

    Actually, no. Wikipedia can be criticized on a lot of grounds, some of them even valid :-); but the claim that it presents my personal world-view, or that we exclude people outside a narrow political agenda, is just... not grounded in fact.

    Perhaps you'd like to come to my talk page at Wikipedia and tell me what you're upset about.
  • by Baricom (763970) on Wednesday January 02, 2008 @02:59AM (#21880004)
    Google's mentioned a variety of techniques publicly, although there's sure to be some secret sauce as well. The most obvious check would be a dictionary-based spellchecker. They can also look for letter transpositions, misstruck keys, word-form matching, etc.

    They also run a variety of statistical analyses on a ridiculously large data set. For example, if a particular phrase appears over and over again, and all of the words in the query match the phrase save one, it may be more likely that the non-matching word is incorrect.

    Google often (always?) tracks click-throughs on search pages, so it can gauge the accuracy of its suggestion by seeing whether a user clicks through to a given result and doesn't come back to the search results. Google also correlates terms that frequently appear together.

    It's amazing what kind of stats you can do with a workforce full of Ph.D.s and half a million servers :)
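Baricom's click-through signal can be sketched as a simple ratio. Consider how often a shown correction leads to a click that the user does not come back from. The log record `Impression`, its fields, and the function name are hypothetical. Google's actual logging and scoring are not public, and a real system would weigh many more signals.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Impression:
    suggestion: str  # the correction shown to the user (hypothetical log field)
    clicked: bool    # user clicked a result for the corrected query
    returned: bool   # user came back to the results page afterwards

def acceptance_rates(log):
    """Per suggestion: fraction of impressions where the user clicked a result
    and did not return, treated here as a proxy for a correct suggestion."""
    shown = defaultdict(int)
    satisfied = defaultdict(int)
    for imp in log:
        shown[imp.suggestion] += 1
        if imp.clicked and not imp.returned:
            satisfied[imp.suggestion] += 1
    return {s: satisfied[s] / shown[s] for s in shown}

# Made-up sessions for illustration only.
log = [
    Impression("pale fire", clicked=True, returned=False),
    Impression("pale fire", clicked=True, returned=False),
    Impression("pale fire", clicked=False, returned=False),
    Impression("pale tire", clicked=True, returned=True),
    Impression("pale tire", clicked=False, returned=False),
]
print(acceptance_rates(log))
```

A click followed by a quick return counts against a suggestion here, on the assumption that the user did not find what they wanted.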
  • Re:What a joke... (Score:3, Interesting)

    by jwales (97533) on Wednesday January 02, 2008 @03:36AM (#21880162) Homepage
    My response? That you are misleading people.

    There are a huge number of sites in the interwiki link map:
    http://meta.wikimedia.org/wiki/Interwiki_map [wikimedia.org]

    Including, for example, uhm, Slashdot. And Citizendium. And Merriam-Webster.

    And finally, I have nothing to do with the list. I've never edited it, never asked anyone to edit it, and I have no input into what goes on it.

    I am sure you will apologize for spreading this misinformation. Right?

