Nutch: An Open Source Search Engine

Posted by Hemos on Wednesday August 13, 2003 @04:51PM from the but-will-it-matter dept.

Anonymous Coward writes "Someone forwarded me this site working to create an open source search engine called Nutch. In the age of weighted rankings on search engines for profits, there's an obvious need for an unbiased search engine. After all, isn't a search engine supposed to be for finding relevant data, not as an indirect and sometimes slimy method of advertising? Nutch is clearly in its initial stages, but it would certainly get my vote." You can find the project on SF.net, and also read the Business 2.0 article on it.

Patents. (Score:5, Interesting)

by Christopher Thomas ( 11717 ) writes: on Wednesday August 13, 2003 @04:52PM (#6689370)

I hope the authors of this project do their homework. My impression is that most of the good search and indexing schemes have already been patented, which will make it difficult to release such a project without stepping on someone's toes.

- Re:Patents. (Score:2, Interesting)
  
  by socrates32 ( 650558 ) writes:
  
  "most of the good search and indexing schemes have already been patented" Not at all... just the easy ones.
  If this is to be cheap to run, it will probably have to be distributed, and thus a very different architecture than most of what we've seen up to now.
  - - - Re:Patents. (Score:2, Informative)
        
        by alwayslurking ( 555708 ) writes:
        
        I still don't think you can describe google's setup as distributed. They have multiple data centers each running a very large cluster and containing a similar, but not identical, snapshot of the database, indices, etc. A truly distributed engine is likely to require an innovative step or three to emulate that with no centralised control, unknown hardware and bandwidth resources and the real possibility that some "clients" may be corrupted by their owners to distort results. I haven't got any arguments abou
- Re:Patents. (Score:5, Insightful)
  
  by Feztaa ( 633745 ) writes: on Wednesday August 13, 2003 @05:23PM (#6689678) Homepage
  
  I hope the authors of this project do their homework. My impression is that most of the good search and indexing schemes have already been patented, which will make it difficult to release such a project without stepping on someone's toes.
  
  Hmmm, I just realized something... with patents, you end up stepping on people's toes. Without patents, you get to stand on their shoulders. Which do you think is the better vantage point?
  
  - Re:Patents. (Score:2, Insightful)
    
    by SpaceCadetTrav ( 641261 ) writes:
    
    Depends... are you the one standing on the top or the bottom?
    - Re:Patents. (Score:5, Insightful)
      
      by AstroDrabb ( 534369 ) writes: on Wednesday August 13, 2003 @08:31PM (#6690956)
      
      Does it matter? There are no innovations. ALL knowledge is based on prior knowledge. Look in any field of study and you will soon learn that advancement is not possible without prior knowledge. What we know about computer science today is thanks to the knowledge gained by those before us. It is this way in EVERY field: astronomy, medical science, mathematics, etc. Humankind does not grow by leaps and bounds; we grow by incremental improvements. I have not heard of ONE discovery or innovation in which the discoverer or innovator was not educated in prior knowledge. Now the question we need to ask ourselves, and especially the government, is: do we really want the advancement of our society to be hindered by the monetary interests of the greedy?
      
  - Re:Patents. (Score:2)
    
    by mblase ( 200735 ) writes:
    
    with patents, you end up stepping on people's toes. Without patents, you get to stand on their shoulders.
    
    On the other hand, if you're the one who didn't get the patent, you run the risk of being crushed when too many people show up for a free piggyback ride.
  - Re:Patents. (Score:4, Insightful)
    
    by X ( 1235 ) writes: <x@xman.org> on Wednesday August 13, 2003 @05:52PM (#6689926) Homepage Journal
    
    In practice you may be right, but the intent of patents is the reverse. The key thing to think about is that without patents there is an incentive to keep ideas secret. So, you end up standing *beside* people until the idea comes out. If something gets patented, it is public knowledge, and you can stand on the person's shoulders so long as you pay them a "small" fee. Even without their consent you can do research that takes advantage of the knowledge in the patent.
    
    Of course, in practice patents are a mess. ;-)
    
- - Re:Lucene (index and search engine) (Score:5, Informative)
    
    by cpeterso ( 19082 ) writes: on Wednesday August 13, 2003 @05:45PM (#6689877) Homepage
    
    Lucene and Nutch are related:
    
    http://scriptingnews.userland.com/2003/08/13#When: 12:20:53PM [userland.com]
    
    Paul Nakada, via email: "It appears that the coding muscle for Nutch is Doug Cutting, the author of Lucene, an Apache Project open source search engine. We use it here at salesforce and have a huge amount of respect for Doug's coding."
    
The purpose of a search engine (Score:2)

by Stalemate ( 105992 ) writes:

After all, isn't a search engine supposed to be for finding relevant data, not as an indirect and sometimes slimy method of advertising?

I'm pretty sure a search engine is supposed to be for whatever purpose the people making it want it to be.
- Re:The purpose of a search engine (Score:3, Funny)
  
  by yamcha666 ( 519244 ) writes:
  
  I'm pretty sure a search engine is supposed to be for whatever purpose the people making it want it to be.
  
  And I'm sure many Slashdotters would love a search engine dedicated to find pr0n and anti-Microsoft propaganda. Right?
  - Re:The purpose of a search engine (Score:4, Funny)
    
    by AVryhof ( 142320 ) writes: <amos@NospaM.vryhofresearch.com> on Wednesday August 13, 2003 @05:12PM (#6689558) Homepage
    
    Here you go.
    
    Porn [sublimedirectory.com]
    
    Anti-Microsoft Propaganda. [slashdot.org]
    
Google? (Score:5, Informative)

by devphaeton ( 695736 ) writes: on Wednesday August 13, 2003 @04:54PM (#6689378)

Last I heard, Google still doesn't accept bribes for page ranking.

Unobtrusive ads in the right-hand column notwithstanding.

- Re:Google? (Score:2)
  
  by billstr78 ( 535271 ) writes:
  
  Yeah, and the last I heard: this was the only search engine anyone used.
- Re:Google? (Score:4, Insightful)
  
  by delcielo ( 217760 ) writes: on Wednesday August 13, 2003 @05:06PM (#6689500) Journal
  
  I have to agree. And I don't see my allegiance to Google as a sell-out. I see it as a reward for good work.
  
- Re:Google? (Score:2)
  
  by capedgirardeau ( 531367 ) writes:
  
  Maybe they don't take money, I can't really say for sure, but they do adjust the rankings for some pages.
  
  They have been called on it before, as I recall, and refused to reveal their criteria for when they would manually adjust a page's rank.
  
  Read their support pages: nowhere do they say they do not manually adjust the page ranks.
  
  But they are still the best thing in town.
  - Re:Google? (Score:3, Informative)
    
    by fireboy1919 ( 257783 ) writes:
    
    Yeah, they've been known to do that when people make server farms to attempt to influence Google's rankings. It is in their best interest to ensure that the pages people actually want to see come up first, not the advertisers' pages.
    
    That's why people use Google. If they stacked the deck in favor of places people don't care about (advertisers' pages, for instance), then we'd all jump ship and use another search engine.
    
    They're like the Swiss and Consumer Reports. Part of the reason they make money is
- Anyone ever heard of grub? (Score:2, Informative)
  
  by nadadogg ( 652178 ) writes:
  
  Grub is another open source search engine. I have the client running right now; it's nice and distributed. I think this kind of idea is great.
- Re:Google? (Score:3, Interesting)
  
  by g1zmo ( 315166 ) writes:
  
  See this article [msn.com] on slate for some interesting ideas on why Google's page-ranking system is being undermined due to the evolution of ecommerce and price-comparing portals.
  - Re:Google? (Score:3, Informative)
    
    by RedWizzard ( 192002 ) writes:
    
    See this article on slate for some interesting ideas on why Google's page-ranking system is being undermined due to the evolution of ecommerce and price-comparing portals.
    
    That article has already been dealt with on Slashdot (here [slashdot.org]). Using a bit of intelligence when searching will avoid the problems cited.
Slimey adverts? (Score:3, Insightful)

by Acidic_Diarrhea ( 641390 ) writes: on Wednesday August 13, 2003 @04:54PM (#6689384) Homepage Journal

Yes, having advertising affecting search results is not good for the end user but (and I'm just bringing this up as a discussion topic), in what other ways can a search engine make money? It's clear that running a search engine has costs associated with it. To offset these costs, it seems like advertising is the only way to go. Now I can see that some search engines handle this in a more "slimey" way than others (I am happy with Google) but this project seems to want to avoid advertising at all costs. Where does the money come from then?
Also of note is that companies can still influence search engines in slimey ways - Google can be manipulated to make a page rank higher, although Google keeps an eye on this activity and works around it.

- Re:Slimey adverts? (Score:3, Funny)
  
  by M.C. Hampster ( 541262 ) writes:
  
  To offset these costs, it seems like advertising is the only way to go. Now I can see that some search engines handle this in a more "slimey" way than others (I am happy with Google) but this project seems to want to avoid advertising at all costs. Where does the money come from then?
  
  You speak blasphemy! How dare you speak of such practical issues as money when talking about free software!
- Re:Slimey adverts? (Score:5, Insightful)
  
  by Anonymous Coward writes: on Wednesday August 13, 2003 @05:14PM (#6689573)
  
  This project is the SOFTWARE to run a search engine. Not a corporation that needs to generate income to justify the resources required to run the search engine.
  
  Anyone could take this source code and with enough money, challenge Google.com as the top search engine.
  
  I see this project as a competitor to shrink-wrapped search engines, i.e. the Google appliance [google.ca] or maybe even Folio-based products. Typically corporations have many documents that need to be indexed and searchable according to their needs.
  
  I haven't seen it mentioned on the homepage: it doesn't list what content it can index. I hope it can at least index PDFs and popular Office documents. Maybe even media files? And what about indexed XML fields, or external metadata?
  
  - Shameless plug for SWISH++ (Score:4, Informative)
    
    by pauljlucas ( 529435 ) writes: on Wednesday August 13, 2003 @11:34PM (#6692182) Homepage Journal
    
    I see this project as a competitor to shrink-wrapped search engines, i.e. the Google appliance or maybe even Folio-based products. Typically corporations have many documents that need to be indexed and searchable according to their needs.
    
    SWISH++ [mac.com] fills this niche nicely. It can index hundreds of thousands of documents very quickly, indexes not only HTML, but e-mail, news, man pages, LaTeX, RTF, and even the ID3 tags of MP3 files; can apply filters on-the-fly (convert PDF to text, then index that), can do incremental indexing, and can run as a multi-threaded search daemon.
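For readers wondering what such a tool does internally, here is a minimal in-memory inverted index in Python showing the two features mentioned above in their simplest form: a per-format filter that converts a file to text before indexing, and incremental re-indexing of a single document. The class name, the extension-based filter table, and the toy HTML tag stripper are assumptions for illustration only; this is not how SWISH++ itself is implemented.

```python
import re
from collections import defaultdict
from typing import Callable

# A filter turns a file's raw bytes into plain text before indexing.
Filter = Callable[[bytes], str]


class InvertedIndex:
    """Minimal in-memory inverted index: word -> set of document ids."""

    def __init__(self, filters: dict[str, Filter]) -> None:
        self.filters = filters  # hypothetical table: file extension -> filter
        self.postings: dict[str, set[str]] = defaultdict(set)
        self.doc_words: dict[str, set[str]] = {}

    def add(self, doc_id: str, raw: bytes) -> None:
        """Index one document; re-adding an id replaces its old entries."""
        ext = doc_id.rsplit(".", 1)[-1].lower()
        to_text = self.filters.get(ext)
        if to_text is None:
            return  # no filter for this format, so skip the file
        self.remove(doc_id)
        words = set(re.findall(r"[a-z0-9]+", to_text(raw).lower()))
        self.doc_words[doc_id] = words
        for w in words:
            self.postings[w].add(doc_id)

    def remove(self, doc_id: str) -> None:
        """Drop a document's postings and delete words that no longer occur anywhere."""
        for w in self.doc_words.pop(doc_id, set()):
            self.postings[w].discard(doc_id)
            if not self.postings[w]:
                del self.postings[w]

    def search(self, *terms: str) -> set[str]:
        """Return ids of documents containing every term (an AND query)."""
        if not terms:
            return set()
        sets = [self.postings.get(t.lower(), set()) for t in terms]
        return set.intersection(*sets)


def decode_text(raw: bytes) -> str:
    return raw.decode("utf-8")


def strip_html(raw: bytes) -> str:
    # Toy filter: replace tags with spaces (not a real HTML parser).
    return re.sub(r"<[^>]+>", " ", raw.decode("utf-8"))


if __name__ == "__main__":
    idx = InvertedIndex({"txt": decode_text, "html": strip_html})
    idx.add("notes.txt", b"quarterly sales report")
    idx.add("home.html", b"<h1>Sales</h1> team page")
    print(sorted(idx.search("sales")))  # ['home.html', 'notes.txt']
    idx.add("notes.txt", b"holiday schedule")  # re-index one file only
    print(sorted(idx.search("sales")))  # ['home.html']
```

Re-adding `notes.txt` touches only that document's postings, which is the essence of incremental indexing: the rest of the index is left alone.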
    
- Re:Slimey adverts? (Score:2, Insightful)
  
  by Blue Lozenge ( 444566 ) * writes:
  
  Yes, having advertising affecting search results is not good for the end user but (and I'm just bringing this up as a discussion topic), in what other ways can a search engine make money?
  Uhh... how about having advertising that does not affect search results. [google.com] You see... ads on google are relevant to your search criteria, yet are separate from the results.
  - Re:Slimey adverts? (Score:2)
    
    by Steven Blanchley ( 655585 ) writes:
    
    You see... ads on google are relevant to your search criteria, yet are separate from the results.
    More and more the first part of that is becoming untrue. I searched for "hiccups" about an hour ago and got an ad for eBay. I'm starting to tune out the ads completely, since they're no longer relevant.
- Re:Slimey adverts? (Score:2)
  
  by mblase ( 200735 ) writes:
  
  Yes, having advertising affecting search results is not good for the end user but (and I'm just bringing this up as a discussion topic), in what other ways can a search engine make money.
  
  Indeed. Open source is great when you're talking about just software. A web-based search engine, however, involves a LOT of hardware and bandwidth as well, all of which cost mucho bucks.
  
  The only other option I can see is for the search software to be open and for miscellaneous companies to take it (for free) and build t
- Advertising != Manipulating the rankings (Score:3)
  
  by alexhmit01 ( 104757 ) writes:
  
  On Google.com, it is VERY clear which results are paid ads and which are "real" results. With MSN, for example, they list Featured Sites (you pay MSN), followed by Overture (you pay per click), followed by the Looksmart Directory Listings (it used to be a fee just for submission; for the past year, Looksmart has charged $0.15/click for those results).
  
  After the "paid" listings come the Inktomi listings. Those crawler based listings include PFI (pay for inclusion, you pay for daily spidering, but no "boost" in rankings) and the
Biased listings (Score:5, Insightful)

by Champaign ( 307086 ) writes: on Wednesday August 13, 2003 @04:55PM (#6689392) Homepage Journal

I think many commercial search engines have learned that biasing themselves toward sites that have paid them is a good way to erode consumer confidence and damage their readership/userbase. Just as newspapers have to at least provide the image of objectivity, the same demands are on search engines.

I'm quite comfortable with how Google does this (present commercial links clearly marked to the side), and am not convinced a non-commercial (open source) alternative is needed.

just don't get it (Score:4, Insightful)

by Astrorunner ( 316100 ) writes: on Wednesday August 13, 2003 @04:55PM (#6689401) Journal

I think that you absolutely have to have a closed source algorithm for ranking pages, because otherwise you'll get people who will simply tune their pages to be high on the list. I can see how making the majority of the search engine open source would be beneficial, but the algorithm itself? It's like saying "Here's the keys to my car" and thinking that, because everyone has access to the keys, no one's going to drive away with it. Sure, everyone has the opportunity to make your search engine better, but never underestimate the tenacity of a web-wanna-be-millionaire.

- Re:just don't get it (Score:5, Insightful)
  
  by cduffy ( 652 ) writes: <charles+slashdot@dyfis.net> on Wednesday August 13, 2003 @05:03PM (#6689475)
  
  Think about cryptosystems: The whole point about the really good ones is that you can know the algorithm, but still not break it. Granted, pulling that off for a search engine is prone to be much, much harder -- but I *do* believe it's well within the realm of possibility. Ambitious in the extreme? Certainly... but there's something to be said for high-risk-high-reward projects.
  
If it's like every other SourceForge project... (Score:2, Insightful)

by realmolo ( 574068 ) writes:

Here's what I expect to see on the webpage in a few months: "Currently Nutch is in the alpha stage- it doesn't index any web pages, doesn't return any results, and has no user interface. Programmer's needed!" Google has WON the search engine war, probably forever. Find some other mountain to climb, guys.
- Search engine game is NOT over (Score:5, Insightful)
  
  by AtariAmarok ( 451306 ) writes: on Wednesday August 13, 2003 @05:00PM (#6689449)
  
  "Google has WON the search engine war, probably forever. Find some other mountain to climb, guys."
  
  At one time, Oldsmobile won the auto company wars. Where are they now?
  
  IBM ruled the PC roost. Hmmmm....
  
  Command-line OS's were king. But now???
  
  Altavista and infoseek and Lycos were search engine kings at one time. Whither this trio?
  
  The point is, it is not over.
  
- I wouldn't count on it (Score:3, Informative)
  
  by Wesley Felter ( 138342 ) writes:
  
  Nutch has four developers, one of whom is Doug Cutting [sourceforge.net] who wrote several indexing engines. They count Alexa founder Brewster Kahle as a "friend" and are sponsored by Overture.
Accuracy is relevance (Score:3, Informative)

by AtariAmarok ( 451306 ) writes: on Wednesday August 13, 2003 @04:56PM (#6689407)

To me, accuracy is the most important "Relevance".

The problem with Google is that there are errors in it: you ask for something and sometimes you get something else.

A search on "to be or not to be" produces an error (non-matching results) in three of the first ten results: a 30% search failure rate. It used to be worse, when most of the links were bad.

Since it seems like Google will never fix this problem, I'm looking forward to something with all of Google's great features, plus accuracy.

- Re:Accuracy is relevance (Score:4, Informative)
  
  by binaryDigit ( 557647 ) writes: on Wednesday August 13, 2003 @05:06PM (#6689499)
  
  A search on "to be or not to be" produces an error (non-matching results) in three of the first ten results: a 30% search failure rate. It used to be worse, when most of the links were bad.
  
  This is a bit of a misrepresentation. Google will toss the words 'to' 'be' and 'or'. So you effectively end up searching on 'not'. It does this to eliminate words that show up too frequently and make the searches faster (and because of the overloading of the word 'or'). If you really want that text, then either quote the whole thing, or place a '+' in front of those words, which will give you exactly what you're looking for. So there is no problem with its accuracy when you understand the proper way to ask it for something.
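A minimal Python sketch of the difference described above, assuming a small hypothetical stop-word list (the actual list and query parser Google used are not given in the thread): an unquoted query is reduced to its non-stop-words, which may appear anywhere on a page, while a quoted phrase must appear word for word and in order.

```python
import re

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"a", "an", "and", "be", "is", "of", "or", "the", "to"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def keyword_terms(query: str) -> list[str]:
    """Unquoted query: drop stop words and keep the remaining terms."""
    return [t for t in tokenize(query) if t not in STOP_WORDS]


def matches_keywords(doc: str, query: str) -> bool:
    """Match if every non-stop-word term appears anywhere in the document."""
    words = set(tokenize(doc))
    return all(t in words for t in keyword_terms(query))


def matches_phrase(doc: str, phrase: str) -> bool:
    """Quoted query: all words, stop words included, must be contiguous and in order."""
    words, target = tokenize(doc), tokenize(phrase)
    n = len(target)
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))


if __name__ == "__main__":
    query = "to be or not to be"
    doc = "Or Not To Be: a novel"
    print(keyword_terms(query))          # ['not']
    print(matches_keywords(doc, query))  # True
    print(matches_phrase(doc, query))    # False
```

With this toy list, the unquoted query collapses to the single term `not`, so a page titled "Or Not To Be" counts as a hit; the quoted phrase check rejects it.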
  
  - Re:Accuracy is relevance (Score:2)
    
    by antibryce ( 124264 ) writes:
    
    Thank you for pointing that out. It seems most people when pointing out problems with google are really just highlighting their lack of understanding of how it works.
    
    Imagine if I complained that Linux needed lots more work because when I'm at the command line I get an error from typing "move my email inbox to the floppy disk."
  - That's the problem (Score:2)
    
    by AtariAmarok ( 451306 ) writes:
    
    "Google will toss the words 'to' 'be' and 'or'."
    
    That is the problem. The reason I put such words in phrases is because I want an exact match.
    
    " It does this to eliminate words that show up too frequently and make the searches faster"
    
    I would hope that Google solves this by getting faster servers, instead of producing bad results. Besides, if I did not want the results to include all the words in the phrase, I would not have included them in the phrase in the first place.
    
    " If you really want that text,
    - - Details. (Score:2)
        
        by AtariAmarok ( 451306 ) writes:
        
        I use www.google.com (not www.goohle.com or gogle!).
        
        The third result is a site with bee cartoons. It contains "2Bee", etc. Close, but not a match. (The word referring to that insect was not in my search request).
        
        Link 9 goes to a book at Amazon called "Or Not To Be". That partial phrase appears throughout the link. However, the entire phrase that I asked for does not appear.
        
        Link 10 is to the papermsce site. It contains no funny but false variations on the phrase, nor any fragments larger than "to be" fo
  - Re:Accuracy is relevance (Score:2)
    
    by WTFmonkey ( 652603 ) writes:
    
    Well, if you slap some double-quotes around it (which I'm assuming is what was intended), you get accurate, but maybe not what you were [probably] looking for.
    
    The first link is about Barium Enemas, I shit you not.
    The second is about BeOS, and the third is some randomass link at funbrain.com.
    In the fourth we finally get some Shakespeare.
    Point is, these are all links that "capitalized" on the "to be or not to be" cliche and so are accurate results. Although, probably not what you were looking for. Next tim
    - They were not accurate. (Score:2)
      
      by AtariAmarok ( 451306 ) writes:
      
      Three of these top 10 links were not accurate results. I searched on the phrase "to be or not to be", not variations or misspellings. Phrases that capitalize on it, but do not match it, are close (but not accurate matches).
      - Re:They were not accurate. (Score:2)
        
        by WTFmonkey ( 652603 ) writes:
        
        The first two were exact matches. Look at the source. That phrase appears no less than four times. In a Shakespeare play, it only appears once. By that criterion, the first two were actually more accurate. You're right about that "bee" thing, that was just weird. But the first Shakespeare link is just as inaccurate, because the only place it contains that phrase is in the URL--and even there, without spaces. BUT, you would still consider that a positive search result, if you were blindly looking for "t
        
        - You are right: 40% error rate. (Score:2)
          
          by AtariAmarok ( 451306 ) writes:
          
          You are right: one more of these was a bad result. That's 40% error.
- Re:Accuracy is relevance (Score:2)
  
  by ColdGrits ( 204506 ) writes:
  
  Funny, I just tried a search on "To be or not to be" and of the first 10 results, all 10 were related to the phrase "To be or not to be".
  
  You did put "" round the phrase, didn't you?
- - Re:Accuracy is relevance (Score:2)
    
    by AtariAmarok ( 451306 ) writes:
    
    " Why is a non-zero failure rate such an abominable thing? At some times, maybe finding something you weren't expecting is a positive."
    
    If you reach into the freezer without really looking, thinking that you are grabbing a freezer-pop, and get an 8 month old leg of lamb instead, are you going to shrug and eat the lamb anyway?
    
    " Why is a non-zero failure rate such an abominable thing? "
    
    Come to think of it, I have to ask. Which development team has Steve Ballmer assigned you to?
    - Re:Accuracy is relevance (Score:4, Informative)
      
      by randyest ( 589159 ) writes: on Wednesday August 13, 2003 @06:42PM (#6690276) Homepage
      
      If you reach into the freezer without really looking, thinking that you are grabbing a freezer-pop, and get an 8 month old leg of lamb instead, are you going to shrug and eat the lamb anyway?
      
      Of course not. I'd put it back and try more carefully to get what I want. I, what's the word I'm looking for, . . . wait for it . . . refine my search :)
      
      Regarding your comments above about google inaccuracy: I searched for +"to be or not to be" [google.com] and consider the first page of 10 hits to definitely be 100% "correct". In fact, all of the 104,00 results that I checked (about 50, hehe) are 100% correct in that the sites on the list, or the sites linking to the sites on the list, contain the phrase "to be or not to be". Check the '2bee or nottoobee' link in google's cache and where you normally see the search term highlight colors, you'll see
      
      These terms only appear in links pointing to this page: to be or not to be
      
      Just because you wanted "Shakespeare" doesn't mean that "Shakespeare" is any more correct as an "answer" to "to be or not to be". If it were more popular (on the web), I'm confident that it would be higher on the list. That is, whether we like it or not, on the current www there are exactly 3 things more relevant to that famous phrase than Shakespeare, and they are, in order: barium enemas, beOS, and a kids' grammar game starring a bee. Or, more accurately and revealingly: an article about barium enemas titled "To BE or Not to BE?", an article about BeOS titled "TO Be OR NOT TO be?", and a kids' grammar game starring a bee called "2Bee or Nottoobee" which is linked to by sites containing the phrase "to be or not to be" in or near those links.
      
      Lucky for us that ol' Bill is still in the top 10 at all, I'd say.
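The cache note "These terms only appear in links pointing to this page" suggests that the anchor text of inbound links is indexed as if it belonged to the target page. A rough Python sketch of that idea, assuming a hypothetical bag-of-words index (no word positions, no weighting) and made-up example URLs; it is not Google's actual implementation:

```python
import re
from collections import defaultdict


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def build_index(pages: dict[str, str], links: list[tuple[str, str, str]]):
    """Index each page under its own words plus the anchor text of inbound links.

    pages: url -> body text
    links: (source_url, target_url, anchor_text)
    Returns (body_terms, anchor_terms), each a dict of url -> set of words.
    """
    body_terms = {url: tokenize(text) for url, text in pages.items()}
    anchor_terms = defaultdict(set)
    for _source, target, anchor in links:
        anchor_terms[target] |= tokenize(anchor)
    return body_terms, anchor_terms


def explain_match(url, query, body_terms, anchor_terms):
    """Return (matches, query terms that only appear in links pointing to this page)."""
    wanted = tokenize(query)
    body = body_terms.get(url, set())
    anchors = anchor_terms.get(url, set())
    matches = wanted <= (body | anchors)
    anchor_only = (wanted - body) & anchors
    return matches, anchor_only


if __name__ == "__main__":
    pages = {"2bee.example/game": "2Bee or Nottoobee: a grammar game"}
    links = [("fan.example/page", "2bee.example/game", "to be or not to be")]
    body, anchors = build_index(pages, links)
    ok, only = explain_match("2bee.example/game", "to be or not to be", body, anchors)
    print(ok, sorted(only))  # True ['be', 'not', 'to']
```

The hypothetical game page never contains "to", "be" or "not" itself, yet it matches because a linking page used the phrase as anchor text.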
      
Seems pretty pointless (Score:5, Insightful)

by cryptochrome ( 303529 ) writes: on Wednesday August 13, 2003 @04:57PM (#6689419) Journal

Free and open code is good and all... but the one real cost of a search engine is RUNNING it. It requires a far-from-trivial amount of bandwidth and hardware, and somebody has to pay for all of it. Unless someone comes up with a novel P2P solution (and many are trying) it just won't happen.

What they should be doing is pressuring the existing search engine companies for some integrity.

- Re:Seems pretty pointless (Score:2, Interesting)
  
  by jawtheshark ( 198669 ) * writes:
  
  Yes, that would hold true if you want to index the WWW. But what about indexing an intranet? Businesses now pay Google for indexing servers (not that I think that is bad), but an open source search engine could save costs for medium-sized businesses. Just toss in another quad Xeon with a few gigs of RAM and it will do fine for a normal intranet.
Forget It. (Score:2)

by Boss, Pointy Haired ( 537010 ) writes:

In the commercial Internet, the mechanism by which you find commercial sites must be paid for by the sites which you find, otherwise basic economics breaks down and it will not work (abuse etc.).

Thousands of companies provide $product - free search engines simply direct all users to one supplier of $product. That's not right.

Searching for a supplier of $product is not like searching for information - it is not something that can be done outside of payment by the supplier of $product.
- Re:Forget It. (Score:2)
  
  by mblase ( 200735 ) writes:
  
  Thousands of companies provide $product - free search engines simply direct all users to one supplier of $product. That's not right.
  
  Neither is that sentence. One subject per verb, please?
Nutch? (Score:2)

by burgburgburg ( 574866 ) writes:

Acronym, non-obvious pun, obscure reference?
The FAQ doesn't explain the name.
- Re:Nutch? (Score:3, Funny)
  
  by qwerty823 ( 126234 ) writes:
  
  who knows... but as soon as they get it working, they can use it to search for a better name!
- The answer is "Nutch"... (Score:2)
  
  by Gudlyf ( 544445 ) writes:
  
  *Opens sealed envelope*
  The question is: "What did Sean Connery say when he saw the reviews for 'League of Extraordinary Gentlemen'?"
  - SNL Celebrity Jeopardy Quote (Score:2)
    
    by burgburgburg ( 574866 ) writes:
    
    [Connery brogue]
    Hey Trebek, tell your mother I had a good time last night.
    You suck, Trebek. I hate you and your ass.
    [/Connery brogue]
that business2.0 article.. (Score:2)

by joeldg ( 518249 ) writes:

it reads more like some strange marketing propaganda than anything.

That project has no releases, has nothing in cvs and very scant details on what it even "is" ..

There are many many projects out there with so much more info available, why is this one that has not released anything getting so much attention?
- - Re:that business2.0 article.. (Score:2)
    
    by joeldg ( 518249 ) writes:
    
    well..
    
    then I stand corrected.
    
    sf cvs must have hiccuped when I went to look.
not a good idea.... (Score:4, Interesting)

by edrugtrader ( 442064 ) writes: on Wednesday August 13, 2003 @04:59PM (#6689435) Homepage

google is already ideal... the weight of search results is not sold, just text ads.

people are already 'googlebombing' to try and get better rankings by signing up tons of domains and cross linking them all with the keyword that they want to be #1...

if the algorithm that determines who is #1 were public, then the best possible strategy to cheat the system could be devised... instead of paying the search engines for weight, you would be paying web developers to make the search engine think you were #1. and as a web developer i feel that.... oh... wait, proceed.
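To make the link-farm concern concrete, here is a simplified PageRank computed by power iteration in Python, using the damping factor of 0.85 from the original PageRank paper. It is only a sketch of the published idea, not Google's production ranking (which combines many other signals), and the page names are hypothetical.

```python
def pagerank(
    links: dict[str, list[str]], damping: float = 0.85, iters: int = 50
) -> dict[str, float]:
    """Simplified PageRank by power iteration.

    links maps each page to the pages it links to. A page with no outlinks
    spreads its rank evenly over all pages (a common dangling-node fix).
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iters):
        new = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                share = damping * rank[p] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                for t in pages:
                    new[t] += damping * rank[p] / n
        rank = new
    return rank


if __name__ == "__main__":
    web = {"news": ["shop", "blog"], "blog": ["news"], "shop": ["news"]}
    before = pagerank(web)
    # Googlebomb: five throwaway pages that exist only to link to "shop".
    farmed = dict(web, **{f"farm{i}": ["shop"] for i in range(5)})
    after = pagerank(farmed)
    print(before["shop"] > before["blog"], after["shop"] > after["blog"])  # False True
```

In the honest three-page web, `shop` and `blog` are symmetric and tie; five farm pages linking only to `shop` lift it above `blog` even though no real reader linked to it. Anyone who knows the formula can compute exactly how many such pages they need.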

- The funny thing is (Score:2)
  
  by Omkar ( 618823 ) writes:
  
  My /. page [slashdot.org] became the #1 result for "Omkar" before I posted a single journal article. Google is great, but as this illustrates, it's certainly not infallible.
- Re:not a good idea.... (Score:3, Informative)
  
  by curunir ( 98273 ) * writes:
  
  You've entirely missed the point of this project.
  
  I highly doubt that Nutch is going to offer an alternative to Google in the area of web search. What they seem to be doing is offering an alternative in the area of Enterprise search.
  
  Currently, the company that I work for pays Verity (used to be Inktomi, before that Infoseek) tens of thousands of dollars a year for the use of their software. We use their software to make our own site searchable. If Nutch offered us a free alternative to our Ultraseek ser
Bandwidth Costs (Score:2)

by NDPTAL85 ( 260093 ) writes:

Who's going to pay for them if it's a non-profit open source project? Bandwidth doesn't grow on trees you know.

And slimy adverts? Google has slimy adverts? I thought they only had relevant adverts? Oh well I guess we need another dot.com that will go bust in 6 months or so.
Can this work? (Score:5, Insightful)

by jmkaza ( 173878 ) writes: on Wednesday August 13, 2003 @05:00PM (#6689444)

I think the idea is good in principle, but could it actually succeed? Google gets hit with millions of requests each day. They've got hardware that can support thousands of slashdottings a day and a fat pipe to feed all of that info out. That takes a lot of money. Financing an open source project is difficult enough, but financing an open source service such as that would seem next to impossible. Ideas?

The other major problem would be that, with the ranking criteria being available for all to see, it would be relatively simple to manipulate page rankings.

- Re:Can this work? (Score:3, Insightful)
  
  by casio282 ( 468834 ) writes:
  
  I think it's a fabulous idea, the kind of idea that makes me slap my head and say "why didn't I think of this?" You're right -- the biggest obstacle to producing a truly free (as in speech, natch) search engine solution is not in producing the software (patent minefield notwithstanding), but in the "physical" costs of hardware and bandwidth.
  
  I think the way to overcome this obstacle is to develop a distributed system... run a Nutch node on your server, host a few GBs of index data. There could be master nodes
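One way the node idea could look, sketched in Python under loudly hypothetical assumptions: documents are assigned to nodes by hashing their URL, each node scores its own shard with a toy term-count score, and a master node asks every shard for its local top k and merges the results. This is not Nutch's design; the class names and scoring are illustrative only.

```python
import hashlib
import heapq
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


class IndexNode:
    """One volunteer node holding a shard: url -> term counts (hypothetical)."""

    def __init__(self) -> None:
        self.docs: dict[str, Counter] = {}

    def add(self, url: str, text: str) -> None:
        self.docs[url] = Counter(tokenize(text))

    def search(self, terms: list[str], k: int) -> list[tuple[int, str]]:
        """Local top k by a toy score: total occurrences of the query terms."""
        scored = []
        for url, counts in self.docs.items():
            score = sum(counts[t] for t in terms)
            if score > 0:
                scored.append((score, url))
        return heapq.nlargest(k, scored)


class MasterNode:
    """Routes each document to a shard by URL hash and merges shard results."""

    def __init__(self, nodes: list[IndexNode]) -> None:
        self.nodes = nodes

    def shard_for(self, url: str) -> IndexNode:
        digest = hashlib.sha1(url.encode("utf-8")).digest()
        return self.nodes[int.from_bytes(digest[:4], "big") % len(self.nodes)]

    def add(self, url: str, text: str) -> None:
        self.shard_for(url).add(url, text)

    def search(self, query: str, k: int = 10) -> list[str]:
        terms = tokenize(query)
        partial = [hit for node in self.nodes for hit in node.search(terms, k)]
        return [url for _score, url in heapq.nlargest(k, partial)]


if __name__ == "__main__":
    master = MasterNode([IndexNode() for _ in range(3)])
    master.add("http://a.example/", "open source search engine")
    master.add("http://b.example/", "search search search")
    master.add("http://c.example/", "cooking recipes")
    print(master.search("search", k=2))  # ['http://b.example/', 'http://a.example/']
```

Merging per-node top-k lists still gives the correct global top k, because any document in the global top k must also be in the top k of the node that holds it. A real deployment would also have to handle the problems raised earlier in the thread, such as untrusted nodes and unknown bandwidth.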
- Re:Can this work? (Score:2)
  
  by Wesley Felter ( 138342 ) writes:
  
  Nutch is not a service; it's just the software. Running it is up to you.
A Tough Challenge (Score:5, Interesting)

by Cloudmark ( 309003 ) writes: on Wednesday August 13, 2003 @05:01PM (#6689455) Homepage

One of the biggest issues with running a search-engine, open-source or otherwise, is that you can't eliminate bias in the results. No matter what scheme you put in place to handle rankings, someone will find a way to take advantage of it. It's a fact of any major system - there's always a way to twist it. Part of the challenge that Google and similar sites face is that they have to work constantly to protect themselves from systems designed to take advantage of their algorithm. While a completely unbiased search service would be nice, I think it would require the impossible. It would require that no one out here took advantage of it to further their own interests, be they political, commercial, or otherwise. That's fairly unlikely.

Most of the major engines today, including Google, make an effort to prevent horribly unbalanced results (e.g. the recent controversy over blogs outweighing professional sites in the rankings due to linking and other factors). Some even admit (again, Google does) to manually messing with the rankings a little. If you search for suicide methods, they will bend the engine to make sure you get reasons why you shouldn't commit suicide before you get the how-to. That's in their own public docs. It's also discussed in Wired.

I honestly don't know if open-source could do a better job. The algorithm might be better (likely, given the manpower), but would it really be that much fairer?

- Re:A Tough Challenge (Score:2)
  
  by rmohr02 ( 208447 ) writes:
  
  In fact, since the algorithm would be completely open, it would probably be easier to subvert. I'm sure Google has enough trouble working against people who guess at their algorithms, so you could imagine the trouble when people know the algorithm. Then again, many of the people who attempt to subvert search engines are probably fans of open source, and, as you said, there might be more manpower to work against them. Merely comparing open and closed search engines, it's a hard sell either way, but in thi
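  A toy illustration of the point above, assuming a deliberately naive ranker that scores pages by raw query-term frequency (a hypothetical stand-in, not Nutch's or Google's actual scoring): once that formula is public, a page that simply repeats the query term outranks an honest one.

```python
from collections import Counter


def tf_score(text: str, query: str) -> int:
    """Toy relevance score: total occurrences of the query terms in the page (hypothetical ranker)."""
    counts = Counter(text.lower().split())
    return sum(counts[term] for term in query.lower().split())


def rank(pages: dict[str, str], query: str) -> list[str]:
    """Return page names by descending toy score, breaking ties by name."""
    return sorted(pages, key=lambda name: (-tf_score(pages[name], query), name))


pages = {
    "honest-review": "a careful review of open source search engine software",
    "spam-page": "search search search search search buy cheap pills",
}

print(rank(pages, "search engine"))  # ['spam-page', 'honest-review']
```

  The honest page matches both terms once (score 2), while the stuffed page repeats one term five times (score 5), which is exactly the kind of gaming a published formula invites.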
- Re:A Tough Challenge (Score:2)
  
  by PMuse ( 320639 ) writes:
  
  there's an obvious need for an unbiased search engine
  
  Umm, all search engines are biased. That is, each must choose a way to present results. Not to mention a way to acquire data and a way to compare criteria to the data. Trying to "eliminate" bias is futile. What searchers need is to know what the bias of a search engine is. Then they can decide whether that engine will serve for their task. Then they can know what "the results" mean.
  
  A program that calculates "averages" might return median, mode, m
- Re:A Tough Challenge (Score:2)
  
  by PMuse ( 320639 ) writes:
  
  After all, isn't a search engine supposed to be for finding relevant data, not as an indirect and sometimes slimy method of advertising?
  
  True enough -- a search engine that gives you results based on how much entrants paid for placement is good only for finding companies who paid a lot for placement.
  
  Of course, sometimes that's what you're looking for -- ever notice that large, full-service businesses often have large, full-color ads in the print yellow pages, while use of a cheap basic listing correlates w
Business 2.0 is paid access only (Score:2)

by prostoalex ( 308614 ) writes:

To read the second page of this article use subscriber code 079751240X.

Go to "Magazine subscribers: Enter here", then "Sign in using the account number on your subscription label" and enter the account number above.

Courtesy of TechDirt.com [techdirt.com]
Nutch will never get out of alpha stage (Score:2, Insightful)

by xannik ( 534808 ) writes:

I fail to see the point of such an endeavor. Without advertising, Nutch cannot possibly hope to become a serious contender with search engines such as Google or Overture. Advertising provides the money that enables search engines to have lots of bandwidth to send those results quickly back to users, lots of computing power to quickly process each search, even the ability to hire people to do research into new areas for better search results. Even if the search engine is selling its resources to other portals l
Are they thinking too big? (Score:3, Insightful)

by xanderwilson ( 662093 ) writes: on Wednesday August 13, 2003 @05:03PM (#6689472) Homepage

I think they're setting themselves up for something that will get too big and too expensive before it can get finished, and they'll have to figure out a way to (gasp) get some funding beyond donations.

I don't see a solution in one great open-source, independent search engine, but many individual specialized search engines, each mastering its own niche, stand a chance to compete, especially if run by people who focus on their areas of expertise. Alternative news search engines, music search engines, literary search engines, etc., each run by people who know what to filter in and out.

If Nutch.org could create the technology that would allow each of these search engines to exist autonomously, it could also be the hub/portal/start-page/blahblahblah that links all these engines and databases together.

Alex.

A suggestion that Google adopted (Score:2)

by afflatus_com ( 121694 ) writes:

I wrote to Google some time back with an algorithm suggestion that was adopted by them. An open source search engine is certainly welcome to it. It is a minor improvement, but every bit helps.
For citations of most websites, some of the citing people will link to http://www.someplace.com, and some will link to http://someplace.com.
Therefore, compare the pages returned by each URL, and if they are the same page, sum the reverse citations to calculate their total rank.
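A minimal sketch of that idea, assuming we already have a list of (source, target) links. The poster's suggestion compares the pages the two URLs actually return; this sketch swaps in a simpler, hypothetical shortcut that strips a leading "www." from the hostname and assumes both forms serve the same site, then sums inbound citations per canonical host.

```python
from collections import Counter
from urllib.parse import urlsplit


def canonical_host(url: str) -> str:
    """Lower-case hostname with a leading 'www.' dropped (assumes both forms serve the same site)."""
    host = (urlsplit(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def inbound_citations(links: list[tuple[str, str]]) -> Counter:
    """Count links pointing at each canonical host, merging www/non-www variants."""
    return Counter(canonical_host(target) for _source, target in links)


links = [
    ("http://a.example/page", "http://www.someplace.com"),
    ("http://b.example/page", "http://someplace.com"),
    ("http://c.example/page", "http://someplace.com/"),
    ("http://d.example/page", "http://otherplace.com"),
]

print(inbound_citations(links))  # Counter({'someplace.com': 3, 'otherplace.com': 1})
```

A real crawler would confirm the two hosts really return the same page (for example by comparing content) before merging, since some sites serve different content on the bare domain.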
Distributed Open Search Network (Score:2, Interesting)

by Massacrifice ( 249974 ) writes:

It'd be nice if they could make it distributed. Kinda like P2P search engines, but for the web. That way, the main searching server farm wouldn't be tied to any company in particular. That would give Google a run for their money, and would keep Microsoft at bay for a while longer.

As an open search network, it would let some peer servers specialize in searching what they're hosting, making it possible to index otherwise dynamically generated content. These specialized hosts would act as "search plugins" for so
The answer is "Nutch" (Score:4, Funny)

by Gudlyf ( 544445 ) writes: <<moc.ketsilaer> <ta> <fyldug>> on Wednesday August 13, 2003 @05:11PM (#6689548) Homepage Journal

*Blows open envelope*
The answer: "What did Sean Connery say when he saw the reviews for 'League of Extraordinary Gentlemen'?"

Well (Score:2)

by CausticWindow ( 632215 ) writes:

It's not the technology that prevents thousands of Google clones from popping up. It's the simple fact that to initially succeed, you need either a lot of cash or heavy backers.

It's not like Google's PageRank is so unique that it's impossible to do better any other way. It's just that 1) you have to do better or equal, and 2) people have to know about you.

Point 2 equals a lot of cash.
Unbiased Searching is Absurd or Useless (Score:2)

by smack.addict ( 116174 ) writes:

An unbiased search engine is completely useless. In short, an unbiased search engine would either list results randomly or according to useless biases such as alphabetical listings.
Any useful search engine will have an algorithm for ranking page relevance. Because search engine placement is so important to business, there will always be people out there who attempt to optimize (and in some cases, abuse) their pages to boost search engine ranking.
The most useful search engine is the one whose biases matc
- Unbiased is good enough for me (Score:2)
  
  by AtariAmarok ( 451306 ) writes:
  
  " An unbiased search engine is completely useless."
  
  Unbiased is fine for me. When I search, I am just looking for matches. That is all. I don't care so much about ranking decisions as long as the search produces accurate results. (that is, words or phrases found in the resulting documents).
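  For what the poster describes (documents that simply contain the words, with no ranking judgment), a minimal sketch is a boolean AND query over an inverted index. The documents and IDs here are hypothetical; results come back in plain document-ID order, which is itself an arbitrary ordering rather than a relevance ranking.

```python
from collections import defaultdict


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each lower-cased word to the set of document IDs that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def boolean_and(index: dict[str, set[str]], query: str) -> list[str]:
    """Return IDs of documents containing every query word, sorted by ID (no ranking)."""
    terms = query.lower().split()
    if not terms:
        return []
    matches = set.intersection(*(index.get(term, set()) for term in terms))
    return sorted(matches)


docs = {
    "doc3": "open source search engine",
    "doc1": "nutch is a search engine",
    "doc2": "search the archive",
}
index = build_index(docs)
print(boolean_and(index, "search engine"))  # ['doc1', 'doc3']
```

  Every matching document is "accurate" in this sense, but once a query matches thousands of pages, some ordering has to be chosen, which is where the ranking debate above comes back in.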
What about the hardware? (Score:2)

by foo fighter ( 151863 ) writes:

How are they going to afford the massive hardware and bandwidth costs associated with running a tier 1 search engine?
Let's check out the credits page... (Score:3, Interesting)

by baggachipz ( 686602 ) writes: on Wednesday August 13, 2003 @05:22PM (#6689654)

Ooh, what's this?

Overture Research has donated hardware and helped to fund development.

So, even an "open source," "unbiased" search engine is funded by a commercial search organization.

funding (Score:3, Interesting)

by bindaaas ( 659754 ) writes: on Wednesday August 13, 2003 @05:27PM (#6689712) Homepage

Let's see where the funding is coming from. Project is funded by Overture [overture.com] which is to be bought by Yahoo [yahoo.com]. More info is here [corporate-ir.net]. Hmm.. So I guess Yahoo needs a revival...

Distributed (Score:2)

by verloren ( 523497 ) writes:

I'll take the 'A's, hands up who wants to work on the 'B's...

Cheers, Paul
Can I contribute to the source code? (Score:2)

by mblase ( 200735 ) writes:

for (i=0; i<intMaxSearchResults; i++) { if (searchResultURL.host == "www.myfavoritedomain.com") intSearchRanking = 1; else intSearchRanking = 1000; }
Distributing the Power (Score:4, Interesting)

by FsG ( 648587 ) writes: on Wednesday August 13, 2003 @05:39PM (#6689823)

I think having an open source search engine that people can modify and deploy would be an excellent thing, and here is why. Currently, Google has the complete power to highlight or censor anything on the web. So far, they have used this power wisely, but that's no guarantee that it'll always be so. If they go public, you may find this power being used to increase the shareholders' wealth, rather than according to the highest standards of fairness, as it is today.

With that in mind, how would this project help? It would allow webmasters to quickly & easily modify it for their needs, and deploy their own niche engines; in other words, Google would be supplemented by 10,000 niche search engines, each focusing on a specific field (microsoft propaganda, for instance). This would create a balance of power, ensuring that no single search engine accumulates an insane amount of control over the web as a whole.

Here is the Google cache for it ;) (Score:2, Funny)

by evil-osm ( 203438 ) writes:

Nutch [google.ca]
Bias: Inevitable (Score:3, Insightful)

by handy_vandal ( 606174 ) writes: on Wednesday August 13, 2003 @05:42PM (#6689848) Homepage Journal

"In the age of weighted rankings on search engines for profits, there's an obvious need for an unbiased search engine."

Bias is inevitable -- we're talking about ranking, which necessarily means bias.

The question is: what bias do you want? What bias suits your purposes?

My ideal search engine would offer a variety of biases from which to pick.
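One way to read "a variety of biases from which to pick" is pluggable scoring functions. This is a sketch only: the per-page attributes (`term_hits`, `inbound_links`, `age_days`) and the bias names are hypothetical and do not come from Nutch; the user simply picks which scoring function orders the same result set.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Page:
    url: str
    term_hits: int      # hypothetical: how often the query terms occur on the page
    inbound_links: int  # hypothetical: number of citing pages
    age_days: int       # hypothetical: days since the page last changed


# Each "bias" is just a scoring function; higher scores sort first.
BIASES: dict[str, Callable[[Page], float]] = {
    "relevance": lambda p: p.term_hits,
    "popularity": lambda p: p.inbound_links,
    "freshness": lambda p: -p.age_days,
}


def search(pages: list[Page], bias: str) -> list[str]:
    """Order pages by the chosen bias, breaking ties by URL."""
    score = BIASES[bias]
    return [p.url for p in sorted(pages, key=lambda p: (-score(p), p.url))]


pages = [
    Page("a.example", term_hits=9, inbound_links=2, age_days=300),
    Page("b.example", term_hits=3, inbound_links=50, age_days=30),
    Page("c.example", term_hits=5, inbound_links=10, age_days=1),
]
for bias in BIASES:
    print(bias, search(pages, bias))
```

The same three pages come out in three different orders, which makes the bias an explicit, user-visible choice rather than a hidden one.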

Beowulf Cluster of Server Farms (Score:2)

by Superfreaker ( 581067 ) writes:

I can't see this OS project getting too much traction. One quickly realizes, when setting out to build a search engine, that it takes a ton of computing power in terms of pipe, drive space, and database space. I found out the hard way.

It may be fun for some small intranet stuff though....
Search Engine Monoculture (Score:5, Interesting)

by peachawat ( 466977 ) writes: on Wednesday August 13, 2003 @05:52PM (#6689929) Journal

Why is it that, when it comes to OSes, everyone bitches and screams about how bad the monoculture created by Microsoft Windows is, but otherwise feels warm and fuzzy and swears to God that Google is and always will be the only search engine they use?

The point is, are you really comfortable having one, and only one, effective search engine? No matter how well it searches?

O'Reilly [userland.com] put it best:

Actually, Nutch has no ambitions to dethrone Google. It's just trying to provide an open source reference implementation of search to help keep Google and other search engines honest, by letting people compare the results of an engine whose algorithms and methodologies are transparent and accessible. It also aims to give a platform for people outside of the search heavyweights to research new search algorithms.

Irrational fear of money (Score:4, Informative)

by KalvinB ( 205500 ) writes: on Wednesday August 13, 2003 @06:02PM (#6689996) Homepage

That's nice that they want to open source the engine, but that's the least of what a search engine needs. They're going to need multiple high-end servers to process the searches and plenty of bandwidth to get the results to the users.

How do they plan to pay for that? Apparently advertising is out. And we just had another monephobe complaining about lack of funds for his accounting software who expected people to donate because he couldn't figure out that maybe, just maybe he should find a way to sell his product in some form while also keeping one form free. I can get RedHat for free OR pay money to get a hard copy with some bonus stuff. Net result is that RedHat makes money and everyone is happy. Those who refuse to pay don't have to and those who are willing to pay have a reason to. Most people are not going to just give you money out of the goodness of their heart and accept nothing in return if they don't have to. Why do you think PBS gives you gifts with your donations?

I'd be more impressed with such undertakings if the owners weren't convinced the bandwidth fairy was real and that money will fall from the sky like manna.

When someone comes along who recognizes that the bandwidth fairy doesn't exist and that money needs to be acquired through marketing to get any real amount, then I'll think twice before laughing it off.

Free is a pretty dream but free don't pay the bills.

Ben

Large scale and DB (Score:3, Interesting)

by webhat ( 558203 ) writes: <{slashdot} {at} {specialbrands.net}> on Wednesday August 13, 2003 @08:38PM (#6690995) Homepage Journal

I was looking over the site and a number of things concerned me.

Firstly, the choice of Java: personally I have no gripe about this. And the decision to use language-independent formats is a good idea. My main concern is for the larger scaling and distribution over multiple machines.
At present I make the educated guess that a project on this scale, in Java, would still be best run on a `hardware base as uniform as possible', like UltraSparc 450s with a fibre backplane.

My second concern is that there is so much choice of indexing and searching technique that there are sure to be some problems due to patent restrictions.
Just browsing the US patent office gave me a couple of possible patent nasties:
6,463,428 or 6,278,992. (And about 10 others I glanced at...)

Lastly, the DB: in the short time I've been looking at the code, it seems to me that a choice was made to implement a DB built for the problem. Although this could be a good thing, it is usually better to reuse existing products. I found SleepyCat (DB4) to match the requirements. And if the choice is final, read this [1].

I hope these comments are useful to somebody at least.

[1] http://www.xlnt-software.com/xml_dl.html

Some commentary... (Score:3, Insightful)

by Colm Buckley ( 589428 ) writes: <colm@tuatha.org> on Wednesday August 13, 2003 @08:39PM (#6691008) Homepage
I have a few comments on this development:
- The article as posted contains some pretty snide commentary, apparently designed to intimate that all current search engines deliberately weight their results in favour of their advertisers. This is demonstrably not the case; in fact, with Google providing a strong, well-publicised counterexample, to do so would be suicide for any search engine with pretensions to market leadership.
- The principal difficulty with an open-source search engine algorithm is that it would definitively be open to abuse. Once the ranking algorithm was known, it would be fairly trivial to develop ways to subvert it. One of the reasons why this hasn't happened to Google is because the details of the ranking algorithm are closed. There is a largish industry devoted to figuring out how to influence Google (which is why Google keep tweaking their algorithm). A search engine using an open algorithm would very quickly become unusable as this industry figured out how to play the system.
- The funding from Overture is very suspicious, to be honest. Overture, assuming the Yahoo! takeover is given the all-clear, will soon be part of one of the largest commercial search engines, and with a history of business practices which are, shall we say, perhaps less than totally congruent with the open-source ideals.
- Running a large, successful search engine requires vast, dedicated resources. I don't know the exact scale of the Yahoo!, Google or MSN search operations, but I'll warrant that they're surprising to anyone who's expecting to run a search engine from a couple of thousand distributed nodes.
An open search engine application is a nice idea, but unfortunately it's one of those applications which are essentially useless without an enormous ASP architecture behind it. An earlier poster indicated that it might be useful for searching and indexing intranets and the like, analogously to the Google Search Appliance. This is indeed a valid potential application, but then, HT://Dig exists already. Is this dramatically better?
Comments and suggestions... (Score:3, Insightful)

by rice_burners_suck ( 243660 ) writes: on Wednesday August 13, 2003 @09:25PM (#6691341)
Suppose you have just finished developing a free software search engine. And suppose it has the best algorithms in the world and the ratings are weighted based on some sort of moderation system.
This is exactly like the problem the mice had one day. They couldn't come out of their mouse hole because there was a dangerous cat prowling around. One day, as food was getting scarce and everyone was afraid to leave the hole, the mice called a meeting to discuss the problem. One excited young mouse came up with the most wonderful idea: Let's put a bell around the cat's neck, so that when the cat is nearby, the mice would have advance warning and could escape! All the mice got excited at this proposal, until a very old, very wise mouse came over and asked, "And who will tie the bell around the cat's neck?"
What I'm trying to say is: If the search engine is free software and companies don't pay to increase their ranking... who will pay for the bandwidth to host the engine? I can tell you this much:
- Individuals will not pay a fee to perform a search unless this search engine gives them some incredibly compelling reasons to do so. Open moderation will not likely fulfill that requirement.
- Companies will not pay to increase their ranking because that is the definition of this project. They will not pay to search for the same reason that individuals won't pay.
- The government probably won't pay because there are plenty of "free" (cost) search engines around. That is, unless someone can give them an incredibly compelling reason to do so.
- Universities probably won't pay for the same reasons as everyone else.
Proposed solution? Make it a distributed search engine, like SETI@home, or the DNS.
This is much easier said than done because:
1. RAID-like distributed storage technology would have to be developed, so that the indexing database could be distributed among all computers worldwide that donate bandwidth and storage. This would have to guarantee statistically that all the data will be available at any point in time, even if people turn off their computers for extended periods of time. However, this technology could make reliable clustered storage a reality, and the resulting free software implementation could be licensed for corporate use for an exorbitant price, which would go to the EFF, FSF and other organizations that develop free software and/or support the development thereof.
2. An efficient P2P-like protocol, along with a network topology of some sort (like the DNS system has), would have to be developed to support the searching. It would have to be damn fast and, like before, very resilient to computers being shut off, chunks of data becoming lost at any moment, etc. Furthermore, changes would need to propagate at blazing speeds so that new items on the Internet could be found shortly after appearing.
3. Bandwidth and disk quota would need to be managed at each participating host, so that limits set by the user are not exceeded.
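Points 1 and 3 hinge on deciding which volunteer machines hold which piece of the index. As one possible approach (a swapped-in technique, not something Nutch is known to use), a consistent-hashing ring can assign each index shard to several distinct nodes, so a shard stays reachable as long as one of its replica holders is online. The node and shard names below are hypothetical.

```python
import bisect
import hashlib


def ring_position(key: str) -> int:
    """Map a key to a stable 64-bit ring position using MD5 (for spreading, not security)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Consistent-hashing ring; each node gets several virtual points to even out load."""

    def __init__(self, nodes: list[str], vnodes: int = 50) -> None:
        self._ring = sorted(
            (ring_position(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._positions = [pos for pos, _ in self._ring]
        self._node_count = len(set(nodes))

    def owners(self, shard: str, replicas: int = 3) -> list[str]:
        """Walk clockwise from the shard's position, collecting distinct nodes as replicas."""
        wanted = min(replicas, self._node_count)
        if wanted <= 0:
            return []
        start = bisect.bisect(self._positions, ring_position(shard))
        result: list[str] = []
        for offset in range(len(self._ring)):
            _, node = self._ring[(start + offset) % len(self._ring)]
            if node not in result:
                result.append(node)
                if len(result) == wanted:
                    break
        return result


ring = HashRing(["alice-pc", "bob-server", "carol-laptop", "dave-box"])
for shard in ("index-shard-A", "index-shard-B"):
    print(shard, ring.owners(shard))
```

When a machine leaves, only the shards whose replica lists included it change owners, so volunteers can come and go without reshuffling the whole index. Re-copying those affected shards (point 1) and enforcing each host's quota (point 3) would still have to be built on top.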
Governments, companies, universities and individuals would likely support an effort like this by donating some bandwidth and storage, rather than money.
In the spirit of worldwide computing on the Internet, I hope this makes some amount of sense.
- - Re:Hook it up to slashdot! (Score:3, Informative)
    
    by Anonymous Coward writes:
    
    Just use google. Search for "SEARCH-STRING site:slashdot.org"
    - Re:Hook it up to slashdot! (Score:3, Insightful)
      
      by Steven Blanchley ( 655585 ) writes:
      
      No, many comments don't end up getting indexed by Google, and recent discussions aren't indexed at all. I've tried that method in the past with little success.
    - Re:Hook it up to slashdot! (Score:4, Informative)
      
      by randyest ( 589159 ) writes: on Wednesday August 13, 2003 @06:13PM (#6690072) Homepage
      
      167 posts and no mention of ht://dig [htdig.org]? It's a great open source search engine, and I've been using it daily (well, cron really uses it now, not me) to spider about 100 sites on my intranet, which has servers all over the world.
      
      While not currently designed for massive whole-web spidering (it's aimed at single websites or intranets), ht://dig is a great starting point (and a lot further along than the Nutch 'nascent effort' mentioned in the story). Some database optimization to ht://dig seems easier than starting over with Nutch. Plus, the name 'Nutch' sucks.
      
    - Re:Hook it up to slashdot! (Score:4, Informative)
      
      by lvdrproject ( 626577 ) writes: on Wednesday August 13, 2003 @08:34PM (#6690972) Homepage
      
      Interestingly enough, if I had read this story a few months ago, I would've said "Poppycock! Google should be good enough for anyone!". But lately I've been noticing that Google turns up a lot of garbage results. Like, if you search for something "generic" (no brand name or product name or anything like that), you're going to find a whole bunch of results that just lead to pop-up search sites.
      For example, look at the results [google.com] for the search 'convert wmv mpeg'. The first three results lead to the exact same search site. (Whether they have pop-ups or not, I can't tell, because I block them.) The fourth result is another search site. And then the last three are the same as the first three.
      Of course, this obviously works with stuff you'd expect it to, like 'mp3s' and 'warez' and 'porn', but it works with legitimate stuff too. I wonder if there'll be anything to combat this trend, whether it be implemented by Google or by someone else....
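      One common mitigation for several top results coming from the same site is to cap how many results any one host may contribute to a results page (sometimes called host crowding). A minimal sketch, assuming results arrive already ranked and that a cap of two per host is acceptable; the URLs are hypothetical.

```python
from urllib.parse import urlsplit


def crowd_by_host(ranked_urls: list[str], per_host: int = 2) -> list[str]:
    """Keep the ranked order, but drop results once their host has used up its quota."""
    seen: dict[str, int] = {}
    kept: list[str] = []
    for url in ranked_urls:
        host = (urlsplit(url).hostname or "").lower()
        if seen.get(host, 0) < per_host:
            kept.append(url)
            seen[host] = seen.get(host, 0) + 1
    return kept


ranked = [
    "http://searchsite.example/q?convert",
    "http://searchsite.example/q?wmv",
    "http://searchsite.example/q?mpeg",
    "http://tools.example/wmv-to-mpeg",
    "http://searchsite.example/q?video",
]
print(crowd_by_host(ranked))
```

      This only limits repeats from one host; it does nothing about many different hosts that all point to the same search site, which would need some form of duplicate-content detection.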
      
  - Re:Hook it up to slashdot! (Score:2)
    
    by SpaceCadetTrav ( 641261 ) writes:
    
    Piece of cake, but you'd probably have to make an investment into some proprietary software, which is forbidden here.
- Re:Not making nutch sense (Score:3, Funny)
  
  by AtariAmarok ( 451306 ) writes:
  
  Don't worry. It is just a stepping stone to full project maturity reached when it is fully coded in Borland Turbo Pascal.
- Hardware and Bandwidth (Score:2)
  
  by metalhed77 ( 250273 ) writes:
  
  According to http://www.nutch.org/docs/credits.html the Internet Archive is hosting nutch, and Overture has given them hardware. Sounds pretty sweet. Probably not the 20,000-strong Linux cluster Google has going, though.
- Re:Hardware? (Score:2, Informative)
  
  by AsparagusChallenge ( 611475 ) writes:
  
  Don't worry too much. This is software, not a service. When available it may be implemented by someone and be the infrastructure of a company, which may then provide bugfixes and development to the original project. Or it may not. Who knows.
- Re:Seems like /. (Score:2)
  
  by qtp ( 461286 ) writes:
  
  You mean like the moderators are going to do to the rest of your posts?
- Re:"written in Java" ?!? -- trashcan. Next ? (Score:3, Interesting)
  
  by forkboy ( 8644 ) writes:
  
  That's all they're teaching the kids in college these days. Seriously. At the school I go to (I'm not a CS major) you have to take C/C++ as an elective. The core CS curriculum is all Java. I don't think they even teach assembly there. Good schools are probably different, of course, but who can afford a good school anymore?
  
  I met someone the other day who had an associate's degree in Computer Science from a community college and had never used anything but an AS/400 and a Mac. (Not even Windows! Seriously!)
