Slashdot
Wikia Search Engine to be Launched on January 7th
Posted by
timothy
on Tuesday January 01, @09:01PM
from the wisdom-of-crowds dept.
cagnol writes "The Washington Post reports that Jimmy Wales, the founder of online encyclopedia Wikipedia, has announced the launch of a new open-source search engine, Wikia Search, on January 7th, 2008. The project will allow the community to help rank search results, in a model close to Wikipedia. However the company is a for-profit organization. This new search is supposed to challenge Google and Yahoo."
Related Stories
Wikia Search Launches Alpha, Not Ready Yet 107 comments
babooo404 writes "Jimmy Wales' latest project, Search Wikia has launched into alpha this morning. Most reviews have been negative. The system is a 'social search' and uses the Nutch search algorithm. You can friend people along with creating profiles, and the system uses a Wikipedia-style format for 'mini articles.'"
189 comments
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
Challenging Google? (Score:2, Interesting)
Re:Challenging Google? (Score:5, Informative)
It'd be sort of cool if we could create a search engine in a week or two to respond to Knol, but actually it takes a bit longer.
I see Larry and Sergey socially from time to time. I spoke about the search project at Google Zeitgeist a few months ago. Going to a Google party next month. The media loves a "fight" but really, that's just a nice story arc the press makes up. (Notice: Google is not in the search business, Google is in the advertising matching business. This search engine doesn't hurt that business at all, indeed it probably makes it marginally less likely we will see the emergence of a proprietary competitor to topple them.)
It is actually possible for people to just enjoy doing cool stuff without being bastards about it. People forget this sometimes, maybe due to the reputation of a certain dominant software provider.
Re:Challenging Google? (Score:4, Interesting)
If so, kudos... Grub's been languishing in not-ready-for-primetime land for far too long, and the ability to crawl your own site to keep results current is a bonus, too.
Re:Challenging Google? (Score:5, Funny)
Re:Challenging Google? (Score:5, Funny)
Oh, come on. The people who matter already know that most Linux users aren't elitist snobs.
Re:Challenging Google's Revenue Model (Score:5, Insightful)
Is content ever going to be totally free? It will be if people understand the inherent rewards of an open society. Information's negligible cost of duplication is the revolutionary thing that is shattering the old models (cf. http://homes.eff.org/~barlow/EconomyOfIdeas.html [eff.org]). Wikipedia is already doing that. As much as I'm a critic of Jimmy Wales, Citizendium, etc. (with their NPOV lunacy), the system he's helped build is saving people's lives and improving quality of life in ways the old world just doesn't understand yet.
Personally, I'm hopeful that as long as we still have the Right to Read (cf. http://www.gnu.org/philosophy/right-to-read.html [gnu.org]), we're on the path to freedom and salvation. A corporation that makes up a new "model" to take advantage of content producers isn't going to take hold anymore. There's just not a point anymore. The price of content is already quite low for common knowledge. Even if the arbiters of knowledge try to keep it from common knowledge, we can paraphrase it. The greatest risk to real productive use of our knowledge still remains Patents. Information may finally be free, but the freedom to tinker is not.
Easily Abused? (Score:5, Insightful)
Re:Easily Abused? (Score:5, Funny)
Re:Easily Abused? (Score:5, Insightful)
Re:Easily Abused? (Score:5, Interesting)
Point well made - while spam attacks may be pretty obvious, they could be spread out over time to make them less obvious.
Additionally, I can see this search engine being very much affected by public mood. For example, say there was a royal death and a certain right-wing 'upmarket' tabloid newspaper [dailyexpress.co.uk] decided to claim that it was a conspiracy by the Government to kill the royal off. This is linked to from said newspaper's web site, and thus people improve its ranking. Therefore it floats to the top of the results pile, giving it more exposure and setting off a vicious cycle.
Just a hypothetical situation, but certainly possible. Such a model would also make it possible to carry out smear attacks and to ruin the rankings of competing companies, parties, organisations, whatever - a practice that IMHO should be left to search engine admins.
Re:Easily Abused? (Score:5, Informative)
One of the first lines of defense in the early days will be use of a community (wiki) generated whitelist [wikia.com] of sites to crawl. We will want to work outward from there, but basically the first thing is for us to assess "look, what are the most important must-have sites on the net" and crawl them. One thing that the mainstream media never seems to report very well, mostly because I think they don't get why it is important, is that we are doing everything here under free licenses. The software GPL, the data we generate under free licenses, etc. The aim here is not just to create a good search engine, but to create it and *give it all away* in a way that I think has a chance to restructure the entire search industry. Well, maybe not, maybe so, but what the hell, it'll be fun to see. :-)
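The whitelist idea described above can be sketched in a few lines. This is only an illustration, not Wikia's actual crawler code: the `WHITELIST` contents and the helper names `is_whitelisted` and `filter_frontier` are hypothetical, and a real crawler would also have to deal with redirects, robots.txt, and domains changing hands.

```python
from urllib.parse import urlparse

# Hypothetical community-maintained whitelist of domains (assumption).
WHITELIST = {"example.org", "example.com"}


def is_whitelisted(url, whitelist=WHITELIST):
    """Return True if the URL's host is a whitelisted domain or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == domain or host.endswith("." + domain) for domain in whitelist)


def filter_frontier(urls, whitelist=WHITELIST):
    """Keep only the candidate URLs the crawler is allowed to fetch."""
    return [url for url in urls if is_whitelisted(url, whitelist)]


candidates = [
    "http://www.example.org/page",
    "http://notexample.org/",
    "http://example.org.evil.net/",
]
allowed = filter_frontier(candidates)
```

Matching on the full host name or a dot-separated suffix, rather than a plain substring, keeps look-alike hosts such as `example.org.evil.net` out of the crawl.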
Your track record says otherwise (Score:4, Insightful)
The thought that Jimmy Wales, cofounder of Wikipedia, could have an open site without abuse is laughable. You operate under the sham of an open community, yet exclude those outside a very narrow political agenda. You're a fraud, using open source principles as a smokescreen that presents your personal world-view set as fact to the world. I don't buy what you're selling, and I'm calling your bluff. The sad thing is that this will probably make you a fair amount of money if more people don't start to see through you.
But then the wonderful thing about leading revisionist history is you can substitute your own revisions for reality....
Re:Your track record says otherwise (Score:5, Insightful)
you mean the one that you have been documented [wikipedia.org] (and here [wikitruth.info]) not only editing, but wiping clean the edit history on, trying to bury your tracks?
The game you're playing is dirty and how dare you come here unwilling to meet us on equal ground.
Re:Easily Abused? (Score:5, Insightful)
I thought so. Your solution is already broken.
yeah (Score:4, Funny)
Not only that, Wikipedia is reporting that its marketshare has tripled in the last six months.
Market share of... (Score:5, Funny)
Re:Market share of... (Score:4, Funny)
I don't care how they arrive at a rank! (Score:5, Insightful)
Personally, I don't care how search engines rank the websites they return as long as what is returned is proper, relevant and useful.
Vandalism (Score:2)
My prediction: killed by nonprofit competition (Score:3, Interesting)
What I always wanted (Score:2)
Biased Rankings? (Score:1)
I'm glad you told me (Score:1, Funny)
first things first (Score:5, Insightful)
Re:first things first (Score:5, Insightful)
As long as I need to use google to search Wikipedia, I don't see Wikipedia creating a google killer.
Re:first things first (Score:5, Interesting)
What a joke... (Score:3, Insightful)
If you think Wikipedia gets vandalized, wait until there's money involved. Wikipedia, for all its trappings, doesn't directly influence spam. But a search engine... IF, and this is a big IF, this thing becomes mainstream, having the code public will make it very easy for the bot herders to control it. The idea is simply flawed. Google is currently dealing with bot herders attempting to manipulate its page ranks - while the idea of it being open source sounds great (well, ok it doesn't to me - I don't have the love affair with open source that most slashdotters do - I've never bought into the security myth that there are GOOD coders out there with so much free time on their hands that they are walking OTHER people's code. I don't like doing that when I'm PAID to do it. Not to mention there just aren't that many good coders out there....but I digress) it's simply going to work right into the hands of the malware crowd - especially now that it's more organized crime than it is vandalism.
EK
Re:What a joke... (Score:5, Informative)
And, if you read the linked article, you would know that *zero* donations from Wikipedia have anything at all to do with this: Wikia is a completely separate organization.
Also don't make the classic mistake of thinking that "open source" automatically means "volunteer coders". It generally does not, and the classic FUD from the proprietary world fails to describe reality for precisely this reason.
And finally, one of the most important concepts here is that of a broad deep whitelist, which is something that I think can be done reliably and well with appropriate tools in the hands of the end users. The entire problem of bot-driven spam comes from a lack of reliable quantities of human oversight in the process. All you have to do to massively spam google is fool a computer. (Well, even then, google does a pretty damned good job of preventing massive spam though of course there are always some problems.) Pretty hard to get that nonsense by a properly organized community effort.
(But of course, the design of a community which can move things forward quickly without a lot of useless work is nontrivial.)
I can see... (Score:3, Funny)
Just in Time for the Election (Score:2)
Fix the wikipedia search! (Score:1)
Google search for Wikia (Score:1)
Someone had to do it
What a great idea! (Score:2)
Hope it works better than wikipedia's search (Score:5, Insightful)
Don't get me wrong, I like wikipedia, but their search on the site is next to worthless.
Well, good news (Score:1)
citation needed (Score:2)
Sooo.... (Score:1)
Re:Sooo.... (Score:5, Informative)
For another thing, Mahalo is "human edited" search results for the top queries, which is not a bad idea of course, but it is not intended to be a full search engine. Mahalo have indicated an interest in replacing their google search backup with our open source alternative, if we get to be good enough, which is obviously a far from foregone conclusion.
Wikia, the place to go for furry fan fiction (Score:4, Insightful)
Wikia has been something of a dud. What Wikia really does is monetize fancruft. Their big wikis are for Star [Trek|Wars|Gate|Craft], Everquest, Marvel comics, Yu-Gi-Oh, and similar subjects. They're the resting place for fan articles thrown out of Wikipedia. [wikia.com]
Wikia's search engine, based on the user demographic they have now, is going to have great coverage of furry fan fiction. [wikia.com]
There's already a good manually-updated search engine. It's called Open Directory [dmoz.org]. It's quite useful as a data source for answering the question "what is this web site about"? It tends to run months behind changes to the web, since it's manually updated. While not many people query DMOZ manually, it's used by Yahoo, Google, etc. to get some basic information about a web site.
As an example of how great Wikia search is going to be, Wales suggested searching for "Tampa hotels". [techcrunch.com] The major search engines return too many bottom-feeder reseller and directory sites for searches like that. As I point out occasionally, we've already solved that problem over at SiteTruth [sitetruth.com], which looks for business legitimacy. Type in "Tampa hotels" there and watch it push the marginal sites to the bottom of the search results. We have that one handled.
Wikipedia works because people are willing to do substantial work for free for a non-profit organization. That doesn't work for a commercial business. You can get people to write about themselves (Myspace, Facebook, etc.) but beyond that, "crowdsourcing" doesn't go very far.
Funny thing is (Score:2)
The problem here is... (Score:3, Interesting)
How is Wikipedia to draw traffic to their search engine? Obviously not via Google, as search engines are content free on their own. Integrating it with Wikipedia? But again, Wikipedia is the end target, not a start point, so how could this work.
I don't think Wikipedia has the strategy or money for this to reach critical mass and show its potential, but it'll be interesting as an experiment.
It's called relevance feedback (Score:2)
And since people are bringing up Google as competition: Google Search has an estimated retrieval accuracy* of around 10%. Not very hard to beat, except that the Internet is a rather large document set. Have you ever browsed to the 50th page of results on Google? Good. Don't.
The problem is that to give decent results an engine needs time, and people are just not prepared to wait. That's why general purpose search engines on the web try to give you the best answer on the top hit. Results deteriorate a little (next 10) then improve again (next 20) then go completely nuts as you proceed. This fits the business plan, and almost everyone is happy. Google may have superb query processing and a decent Index system, but retrieval can be made to improve a lot if, say, there is an option to wait a little and get something better. Maybe Wikia can do this. If the users who get the most "insightful" (ergo time consuming) results get their feedback weighed more heavily than the point-and-click folks, this project can be very interesting.
*accuracy is a complicated metric that involves precision (fraction of retrieved that is relevant) and recall (fraction of relevant that is retrieved).
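The two fractions in the footnote are the quantities usually called precision and recall. A minimal sketch of how they are computed for a single query, assuming made-up document IDs and a hypothetical helper named `precision_recall` (real evaluations average over many queries and judged result sets):

```python
def precision_recall(retrieved, relevant):
    """Compute precision and recall for one query.

    precision = |retrieved AND relevant| / |retrieved|
    recall    = |retrieved AND relevant| / |relevant|
    """
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: 4 documents returned, 5 documents actually relevant.
retrieved_docs = ["d1", "d2", "d3", "d4"]
relevant_docs = ["d2", "d4", "d7", "d8", "d9"]
p, r = precision_recall(retrieved_docs, relevant_docs)
# Two of the four returned documents are relevant: p is 0.5.
# Two of the five relevant documents were returned: r is 0.4.
```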
Scary Implications (Score:4, Funny)
Wikipedia fails (Score:2)
Why would the search website work?
I doubt it works.
so.... (Score:2)
Promoting stuff isn't their primary business I guess.
Hope they don't do ads....
Google already does this? (Score:2)
Because of Google's anti-spam techniques this method is very hard to game. Better still, the defences are automatic. On Wikipedia, a lot of vandalism goes unnoticed for a long time because it's all mostly down to human oversight, with only basic anti-vandalism bots.
Personal preference ranking... (Score:2)
What I want is to get rid of things outside my daily context. I work in web development so when I do a search for "CSS 3 column hack" I don't want to get results related to some sport team abbreviated CSS where they have a coach who's a hack but likes to use a 3 column lineup.... stretching a bit but you get the idea. More simply put I'd like to filter out results for anything that also ranks high in a variety of categories that have nothing to do with my daily tasks (fishing, fashion, first graders, fanboys... ) and it's not enough to let me use a -fishing in my search...
Also can I please remove a website from the results listing... ie: there are often high ranking sites that I've already bookmarked and read daily... I don't need to see them in my 'search' results, if I need to search them I'll use their own search or use site:www.domain.com 'keywords' (given that I'm using google for this one).
One-Upping the Least Bad (Score:2, Insightful)
Funny to read this today, after I spent a couple of hours yesterday searching Google for something that doesn't exist -- a Plucker [plkr.org] type app for the iPod Classic. What struck me was just how badly Google performed. Any search containing the word "iPod" seems to return pages upon pages of blog entries about the (long since released) iPhone. What one tends to find with a Google search are a lot of loud, content-light blog entries, popping with ads, with short dashed-off articles broken across several pages. "Relevance" in Google seems to have the most to do with activity -- posts per day per site, repeated introductory blurbs on every page, modestly-trafficked forums devoid of meaningful discussion. Google does a pretty decent job with common searches, reasonably well with obscure searches, but very badly with the rest -- the middle of the long tail.
Google rose to prominence by being the best of a pretty weak set of players. It's still only the least bad solution, and there are a lot of things it does poorly. In classic AltaVista, you could type a few words of a song in quotes and find the title and lyrics. Type a long quoted string into Google, and you're likely to come up with nothing.
If Wikia manages to best Google in any type of search I'll applaud it. Search choices beyond Google and Trying to Be Google would be most welcome.
We already have this? (Score:2, Insightful)
Wikipedia content is either right or wrong. It's not meant to be subjective, hence it can be patrolled and corrected. Now they want to apply it to subjective content; I don't see that making sense, albeit at first glance. User A is a technocrat who loves Monty Python. Hardly an isolated case. User B is a 15-year-old who likes whatever he/she likes this week. There's no "patrolling" this, except to address systematic abuse.
The concept is fine for slashdot, or any "closed" system, where the users generally share a common set of expectations. At
Expand this out to the general internet user, and the result will, of course, reflect the general focus of human society. That will be interesting, to say the least, though I'll bet $5 that anything entertainment- and religion-based will always be at the top of the results. Is that what people want? Ipso facto perhaps, but sure as hell not I.
Let's keep in mind that (no offence to anyone specific) ~80% of Americans believe in God, less than 50% subscribe to Darwin, ~30% believe in "UFOs, witches and astrology" (if you can believe this poll [physorg.com] that is). Of course, smart people believe weird things [sciam.com] too.
Add to this, that 81% [dogeatdogfilms.com] of those who have seen two or more "Police Academy" movies believe that O.J. is innocent, and you have a recipe for disaster.
Make search usable and teach people to use it (Score:1)
Compete with Google and Yahoo? When pigs fly! (Score:2, Insightful)
Re:Search Engine based in Wiki? (Score:1)
Re:the worst idea ever (Score:1)
...but there's a big difference between "knowing the secret" and actually being able to break it. The "secret" to breaking RSA is factoring really big numbers, but you can't actually do that.
It sounds like the "secret" to breaking this new system, like Wikipedia, would be to overwhelm the community that is guarding the data. We know that Wikipedia is working fine (for the most part), but things get a bit more complicated with search. Wikipedia, at least, knows when every single edit occurs. But with a whitelist or "reputation" list of URLs, there's no notification when domains (or subdomains or such) change hands (I think?), and re-vetting too often is probably untenable. And you don't really want results based upon the URL's reliability of staying on the whitelist, people might want relevance based on the most-recent data right now, sometimes even if it might disappear under a nasty registration/subscription barrier in a week.
But we'll see whether Wales is onto something good here, I guess. :)
Re:Why does everything have to be communist (Score:1)