
Comment Re:Distributed effort ? (Score 2) 86

Ummmm - no patent - it's already been done.

The main problem with doing anything like this in a co-operative fashion, as other posters have commented, is the large commercial value of the results. It also requires those taking part to have a significant amount of bandwidth (to pull in all of the content and then to exchange indexes).

The spidering part of the process is one of the least processor intensive - once you've completed it you're left with a large glob of data. You then need to convert that into an inverted index, which is still large and has to be passed to a central server, which in turn has to do further processing to merge it into the whole.
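To make the two steps concrete, here is a rough Python sketch of building an inverted index from crawled pages and then merging several crawlers' indexes centrally. The function names, the whitespace tokenizer and the example URLs are all made up for illustration; a real engine would do far more (stemming, ranking, compression).

```python
from collections import defaultdict


def build_inverted_index(pages):
    """Map each term to the sorted list of URLs whose text contains it.

    `pages` is a dict of URL -> page text (hypothetical crawler output).
    """
    index = defaultdict(set)
    for url, text in pages.items():
        # Naive tokenizer: lowercase and split on whitespace.
        for term in text.lower().split():
            index[term].add(url)
    return {term: sorted(urls) for term, urls in index.items()}


def merge_indexes(*partials):
    """Merge partial indexes from several crawlers into one combined index."""
    merged = defaultdict(set)
    for partial in partials:
        for term, urls in partial.items():
            merged[term].update(urls)
    return {term: sorted(urls) for term, urls in merged.items()}


# Two crawlers each index their own pages, then a central server merges.
crawler_a = build_inverted_index({"http://a.example/": "distributed web search"})
crawler_b = build_inverted_index({"http://b.example/": "web crawler"})
whole = merge_indexes(crawler_a, crawler_b)
```

Even in this toy version the merge has to touch every term of every partial index, which is why the central server ends up doing so much of the work.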

The Harvest Indexing system (http://www.tardis.ed.ac.uk/harvest) sought to develop a system like this. It separated the searching and crawling tasks, so it would be possible to have a large number of crawlers (probably topologically close to the sites they were indexing), which then gave their results to an indexing system which collated them and presented them to the world.

The problem here is that you've still got one large, monolithic system at the indexing end. TERENA, as part of the TF-CHIC project, developed a referral system (based on WHOIS++) to allow there to be one central gateway which then passed search requests to a large number of individual engines, each of which could run different software. Kind of like a fancy metasearch engine.
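The gateway idea can be sketched in a few lines of Python. This is only an illustration of the referral pattern, not the real protocol or the TF-CHIC software: the `Gateway` class, the topic lists each engine advertises, and the in-memory "engines" are all assumptions invented for the example.

```python
def make_engine(docs):
    """Return a search function over a small in-memory collection.

    Stands in for an independent search engine running its own software.
    """
    def search(query):
        q = query.lower()
        return [url for url, text in docs.items() if q in text.lower()]
    return search


class Gateway:
    """Central gateway that refers each query to the engines likely to answer it."""

    def __init__(self):
        self.engines = {}  # name -> (advertised topics, search callable)

    def register(self, name, topics, search):
        self.engines[name] = (set(topics), search)

    def refer(self, query):
        """Names of engines whose advertised topics include any query word."""
        words = set(query.lower().split())
        return sorted(
            name for name, (topics, _) in self.engines.items() if words & topics
        )

    def search(self, query):
        """Pass the query only to the referred engines and collect their hits."""
        results = []
        for name in self.refer(query):
            _, engine = self.engines[name]
            results.extend((name, url) for url in engine(query))
        return results


gateway = Gateway()
gateway.register("uk", ["physics", "chemistry"],
                 make_engine({"http://uk.example/p": "Physics department"}))
gateway.register("nl", ["history"],
                 make_engine({"http://nl.example/h": "History archive"}))
hits = gateway.search("physics")
```

The gateway only holds a small summary of what each engine covers, so no single machine needs the whole index.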

The original argument for devolving things locally was that if the indexes were generated by people who know the pages, you'd get a higher standard of index. Aliweb, for instance, had a file per server which contained an index of all of the objects available on that server.

The problem with this is easily shown up by metatag abuse. If the person running the spider has a commercial interest in the sites they're indexing, they'll often go and fabricate the index so that their sites appear higher on searches.

Cheers.

Simon.
