P2P Web searches
prostoalex writes "Researchers at UCLA are looking for easier ways to implement Web searches by using peer-to-peer techniques to decrease the workload. 'Queries need to be passed along only a few links rather than flooded throughout the network, which keeps search-related traffic low,' reports Technology Research News."
Too many people trying to use p2p (Score:1, Interesting)
Re:Too many people trying to use p2p (Score:1)
Re:Too many people trying to use p2p (Score:4, Insightful)
If you'd take some time to actually read the article, you'd see that the story is about research that addresses congestion problems with existing p2p methods.
Besides, much if not most traffic from p2p networks is from file downloads, not query routing. Moving files to a centralized server isn't going to reduce that traffic at all. In fact, the bottlenecks that result can make congestion even worse.
Moving files to central servers only seems to help congestion because central servers with anything interesting to download tend to be shut down quickly.
Re:Too many people trying to use p2p (Score:2)
Which is why I still prefer walking everywhere, using chalk and slate for taking notes, and refuse to use a zipper...
In other words, wtf?!?!
Re:Too many people trying to use p2p (Score:2, Funny)
And why I like using my RPN calculator to change the TV station...
Re:Too many people trying to use p2p (Score:2)
Re:Too many people trying to use p2p (Score:1)
Zippers are obsolescent, you insensitive Luddite.
KFG
Re:Too many people trying to use p2p (Score:2, Interesting)
The point is that new technologies are adopted when they improve on an existing method. We already have super-fast, super-robust, complete search technologies that are not p2p.
Google is already so fast that I would not notice if it were any faster. The best a p2p search technology could achieve would be equivalent speed, with the added cost of consuming my bandwidth.
Re:Too many people trying to use p2p (Score:2, Interesting)
The magic of p2p is that you can build the same thing out of 'thin air'. There are no expensive server rooms and gigabit lines, just a bunch of nodes that are slightly more complicated than simple clients. You use it, you provide it. Fair game, and you get exactly the kind of service you want with no strings attached. At least theoretically.
Re:Too many people trying to use p2p (Score:5, Insightful)
You're right, but consider this:
The entertainment industry is trying very hard to convince the US government that all P2P can be used for is copyright infringement, so it should be banned completely.
Any non-infringing use obviously proves them wrong, no matter how out there it is.
Right now, I think we need as many off-the-wall uses as possible for P2P, even if it's not the most efficient way to accomplish the task.
Calling mass attention to these uses wouldn't hurt, either.
Re:Too many people trying to use p2p (Score:1)
I think a proper cache is enough for web search; as long as you have a server farm with tons of memory holding everything, search results can be fairly fast.
Re:BOOM! (Score:1)
You tricked me into RTFAing!!!
What P2P search offers [Re: BOOM!] (Score:2)
--
Try Nuggets [mynuggets.net], the mobile search engine. We answer your questions via SMS, across the UK.
If it's P2P... (Score:5, Insightful)
Re:If it's P2P... (Score:4, Interesting)
> In this last step, all of the initially queried nodes percolate the query throughout the network so that the query is guaranteed to reach a core sub-network of highly-connected nodes. "Since a copy of the query is in one of the nodes in the core network, and since the content list of a node is cached at one of these high-degree nodes, one is guaranteed to find the content as long as at least one node in the network has it," said Roychowdhury.
So in other words, the "major sharers", i.e. nodes which are "high degree", i.e. have a lot of connections, form the "core network", and collectively host the entire index. However, this is starting to lose the advantages of being a peer-to-peer network. Obviously, you can't have it both ways.
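For the curious, the invariant in that quote is easy to mock up. This is just my own toy sketch (the register/query names and the greedy climb are my assumptions, not anything from their paper): every node leaves its content list at the highest-degree node it can climb to, and a random walk from any peer tends to pass through exactly those nodes.

import random

class Node:
    def __init__(self, nid):
        self.nid = nid
        self.neighbors = []
        self.cache = {}      # filename -> id of the node sharing it

def register(node, filename):
    # Climb toward ever-higher-degree neighbors and leave the content
    # listing at the local degree maximum (a "core" node).
    current = node
    while current.neighbors:
        best = max(current.neighbors, key=lambda n: len(n.neighbors))
        if len(best.neighbors) <= len(current.neighbors):
            break
        current = best
    current.cache[filename] = node.nid

def query(start, filename, ttl=50):
    # Random-walk the overlay; the walk statistically passes through
    # high-degree nodes, which is where the cached listings live.
    current = start
    for _ in range(ttl):
        if filename in current.cache:
            return current.cache[filename]
        if not current.neighbors:
            return None
        current = random.choice(current.neighbors)
    return None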
I foresee.. (Score:5, Interesting)
Save the server load on the main google server!
Plus maybe some smart guy will figure out how to trade mp3s over the GoOgLe-P2p network!
Re:I foresee.. (Score:5, Insightful)
Error 404: No such main server found.
Google is such a distributed computing network that when a single computer in a cluster fails, they've discovered that it'd cost them more to go to the broken node and repair it than the value of the computing resources they've lost. Google just lets such failed computers sit useless, and waits until there are enough downed computers to justify sending in the repair people.
Besides, using P2P services to respond to your Google query would mean that your query would end up in the hands of a dreaded "untrusted third party", and I don't think anybody here wants all of their searches available to their next door neighbor.
Re:I foresee.. (Score:1)
Wow. Rather than letting it sit there useless and depreciating, I'd rather they find some cheap and efficient way to just sell that machine (cheaply!) outright, and then order a new replacement to go back into that empty pigeonhole.
Re:I foresee.. (Score:3, Informative)
Google is such a distributed computing network that when a single computer in a cluster fails, they've discovered that it'd cost them more to go to the broken node and repair it than the value of the computing resources they've lost.
This is nothing more than a myth. They continually have job postings for Data Center Technicians, whose entire job is to crawl through their massive cluster and repair downed nodes. I should know; I interviewed for the position just a month or two ago.
Re:I foresee.. (Score:2)
Re:I foresee.. (Score:1)
One of the coolest things I've seen was a little ticker that WebCrawler used to run that was just a constant stream of random search queries other people had made. You could click on any of them as they scrolled by and it would bring up the results.
Totally anonymous, very addictive. Sad to see it gone.
Re:I foresee.. (Score:3, Informative)
me too [metaspy.com].
Re:I foresee.. (Score:3, Interesting)
Re:I foresee.. (Score:2)
I'm more worried about my next door neighbour being able to serve up the search results!
"Google Appliance" does your intranet (Score:1)
Last time I checked, (Score:5, Funny)
Results 1 - 10 of about 6,290,000 for p2p [definition]. (0.19 seconds)
Re:Last time I checked, (Score:5, Informative)
Google's even encouraging this behavior by linking their free websearch feature to their AdSense service, and giving publishers a share of the AdWords revenue when a search that came from their site results in an ad click.
Re:Last time I checked, (Score:2, Informative)
I am just about to put a 50,000 message mailing list archive online and the search facility will be Google, which is far better than any of the other solutions I've investigated.
Re:Last time I checked, (Score:1)
I agree - many times I see a search box on a website with no "advanced search" link, and you never know how it'll work. Usually you find that (unlike Google) it matches any word rather than all of them, so you get lots of really irrelevant material. You don't know what boolean operators it supports, etc.
Another quite simple advantage of using a Google search on your website is that it's a cons
Re:Last time I checked, (Score:2)
But we need a Free search engine, so we don't depend on any big corporation to run our lives, and P2P is the way to overcome the huge cost of running a single system to serve the whole Internet.
Timing Google (Re:Last time I checked,) (Score:2)
UCLA discovers ultrapeers! (Score:5, Interesting)
Re:UCLA discovers ultrapeers! (Score:5, Interesting)
About a year ago, right before starting my senior year at UCLA, I was offered an opportunity to work on this P2P project. At the time it was called "Gnucla," and was being developed by the UCLA EE department's Complex Networks Group. I turned it down, because I had already committed to working on a p2p system in the CS department. But since in all honesty their research was more novel than ours (and my friend was in their group), I subscribed to their mailing list and kept informed on what they were doing.
What they've done isn't find a novel way of picking ultrapeers. Let's review what motivated ultrapeers -- in the beginning, there was Gnutella. Gnutella was a power-law network. What this meant is that there was no real "topology" to it, unlike the peer-to-peer networks that were emerging based on Distributed Hash Tables (such as Chord [mit.edu], Pastry [microsoft.com], and Kademlia [psu.edu], on which Coral [nyu.edu] is based). It had nice properties: a low diameter, and strong resilience to the attacks common on p2p networks. (Loads of peers dropping simultaneously could not partition the network, unlike, say, in Pastry -- unless they were high-degree nodes.) But the big problem was that to search the network, you had to flood it. And that generated so much traffic that the network eventually tore itself apart under its own load.
So someone thought that maybe if only a few, select, high-capacity nodes participated in the power-law network, it wouldn't tear itself apart because they could handle the load. These would become the ultrapeers. The nodes that couldn't handle the demands of a flooding, power-law network would connect to ultrapeers and let the ultrapeers take note of their shared files, and handle search requests for them. Thus, when a peer searches, no peer connected to an ultrapeer ever sees the search unless they have the file being searched for, because the searching happens at a level above them. Between low-capacity nodes and ultrapeers, it's much like a client-server model. Between ultrapeers, it's still a power-law network.
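If that's hard to picture, here's a minimal sketch of the two-tier split. The classes and method names are my own illustration, not anything from an actual Gnutella implementation: leaves hand their file lists to an ultrapeer, and queries are flooded only among ultrapeers.

class Leaf:
    def __init__(self, name, files):
        self.name, self.files = name, set(files)

class Ultrapeer:
    def __init__(self):
        self.peers = []    # other ultrapeers (the power-law overlay)
        self.index = {}    # filename -> leaves that share it

    def attach(self, leaf):
        # The leaf uploads its file list; it never sees searches again.
        for f in leaf.files:
            self.index.setdefault(f, []).append(leaf)

    def search(self, filename, seen=None):
        # The flood stays at the ultrapeer tier.
        seen = seen if seen is not None else set()
        if id(self) in seen:
            return []
        seen.add(id(self))
        hits = list(self.index.get(filename, []))
        for up in self.peers:
            hits.extend(up.search(filename, seen))
        return hits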
But the ultrapeer network has problems of its own, so this group sought a way to search a power-law network, such as Gnutella, without flooding. They exploited the fact that, in a power-law network, select nodes have very high-degree connectivity. If you take a random walk on a power-law network (meaning: starting from your own PC, randomly jump to a node connected to you, then randomly jump to a node connected to that node, and so on), you'll end up at, or pass through, a node with very high connectivity. Thus, those nodes were a natural rendezvous point for clients wishing to share files and clients wishing to download files. Perhaps, in this sense, they are "ultrapeers," but we haven't separated the network into two different architectures like before. The network is still entirely power-law based, and retains all its wonderful properties.
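You can check that hub-finding property empirically in a few lines. Again, this is my own toy, using preferential attachment as a stand-in for a power-law topology:

import random
from collections import defaultdict

def preferential_attachment(n, m=2):
    # Each node id appears in `stubs` once per incident edge, so sampling
    # from `stubs` picks old nodes in proportion to their degree.
    adj = defaultdict(set)
    adj[0].add(1); adj[1].add(0)
    stubs = [0, 1]
    for v in range(2, n):
        picks = set()
        while len(picks) < min(m, v):
            picks.add(random.choice(stubs))
        for u in picks:
            adj[v].add(u)
            adj[u].add(v)
            stubs += [u, v]
    return adj

def random_walk(adj, start, steps=10):
    # Record the highest-degree node the walk touches.
    node, best = start, start
    for _ in range(steps):
        node = random.choice(list(adj[node]))
        if len(adj[node]) > len(adj[best]):
            best = node
    return best

adj = preferential_attachment(1000)
hub = random_walk(adj, start=999)
print(len(adj[hub]), "vs max degree", max(len(nbrs) for nbrs in adj.values()))

Even a ten-step walk from a fresh, low-degree node usually lands on one of the biggest hubs, which is exactly why they make good rendezvous points.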
But that's not the entire story, just the gist of it. There are other neat tricks to it... Trust me, this is really good stuff we're talking about here. They recently won Best Paper Award at the 2004 IEEE International Conference on Peer-to-Peer Computing [google.com]. (See paper here [femto.org].)
"Brunet," as they call it, is designed to be a framework for any peer-to-peer application that could exploit the percolation search outlined above. Google-like searching is just one possible approach (and perhaps a little unrealistic...). Right now I can tell you that they have a chat program in the works, and it is working well. The framework should be released when it's ready.
Please don't flood me with questions -- remember, I'm not actually in their research group.
- sm
The Ask Slashdot section (Score:5, Funny)
Q: What is $search_term and how does it work?
A: A simple google search shows that $search_term is $blahblah and you use it like $this (repeated a hundred times)
Add another hundred replies about how the poster should search before submitting, and how AskSlashdot is degenerating into AskPeopleToGoogleForYou, and there you have it. P2P searching in all its glory.
islands of users (Score:3, Informative)
Re:islands of users (Score:1)
Re:islands of users (Score:2)
Exactly. They wouldn't.
Do note that I was talking about areas of users not being able to connect to anyone else. P2P is not the same as explicit IP addresses like on the web. For example, it would be a lot harder for me to get to slashdot by only clicking on links than by typing the address into my browser's bar.
In other news... (Score:2)
This was already tried... (Score:5, Interesting)
Re:This was already tried... (Score:2)
The brilliant aspect of Infrasearch (later JXTASearch [jxta.org]) is that, unlike most peer-to-peer search implementations, it doesn't just act like a metasearch engine, broadcasting or propagating a query to a bunch of specialized indexing nodes and then aggregating
Huh? (Score:3, Insightful)
Google, Yahoo, etc. of course crawl the web at large, but even if you want to throw a peer network at crawling, aren't you compromising freshness?
What I can see is a DNS-like system for propagating metadata into the interior of the network, and maybe a caching mechanism as a result... not sure if this is what they mean.
There already is distributed crawling (Score:3, Interesting)
An alternative idea for complete indexing.... (Score:5, Interesting)
Every website has DNS servers, so what if the same company that ran the DNS servers indexed the pages of the sites it hosted? Daily?
Wouldn't that then provide a complete index of the web?
Start a search and somehow get the results back through that distributed method. Haven't figured that out yet... but if you can...
PROFIT!!!!!
Re:An alternative idea for complete indexing.... (Score:2, Interesting)
Re:An alternative idea for complete indexing.... (Score:3, Informative)
There is a mailing list for people involved with writing and running web crawlers (aka spiders or robots), and several years ago there was a lot of talk about making crawling and indexing more efficient by enhancing the 'robot exclusion protocol' (i.e. robots.txt) by creating
another senseless Slashdot story title (Score:4, Interesting)
It's an article describing a new p2p query routing method. Nothing more. There are already a lot of such algorithms out there. This one seems to exhibit some nice completeness properties that hold in idealized scale-free networks. But I'm not convinced such a theoretical property would hold in the real world. While p2p networks tend to be roughly scale-free, the "roughly" and "tend to be" qualifiers are what make such theoretical properties unlikely to hold in practice.
Nice to see they plan to release some software based on the technique though.
Re:another senseless Slashdot story title (Score:1)
Re:another senseless Slashdot story title (Score:2)
It's important to get the scaling right. Many of the P2P networks out there have algorithms that scale very badly; there's way too much unnecessary P2P traffic. The earliest P2P algorithms were horribly inefficient. There's been some progress, but not enough. Kids should be able to find the latest pirated Britney Spears video in about 2 hops, without blithering all over the planet looking for it. There's probably a copy on the local cable LAN segment, after all, and that's where it should come from.
Ants p2p Impliments A Distributed search engine (Score:4, Interesting)
Ants P2P is designed to protect the identity of its users by using a series of middle-men nodes to transfer files from the source to destination. As additional security, transfers are Point to Point secured and EndPoint to EndPoint secured.
1. Distributed search Engine - Each node performs periodic random queries over the network and keeps an indexed table of the results it gets. When you do a query you will get files with or without sources. If you get files simply indexed (without a source), you can schedule the download. As soon as Ants finds a valid source, it will begin the download. This will also solve the problem of unprocessed queries. This way you will get almost all the files in the network that match your query with a single search.
http://sourceforge.net/projects/antsp2p/ [sourceforge.net]
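Here's roughly how I read that passive-indexing feature, as a sketch (the class and method names are mine, not Ants P2P's actual code): each node remembers what it hears from its periodic random probes, answers later searches from that local index, and queues sourceless hits until a source turns up.

class AntsIndex:
    def __init__(self):
        self.index = {}      # filename -> set of known source ids
        self.pending = set() # files queued with no source yet

    def record(self, filename, source=None):
        # Called for every result overheard from the periodic random queries.
        entry = self.index.setdefault(filename, set())
        if source:
            entry.add(source)

    def search(self, term):
        # Answered locally; matches may or may not have sources yet.
        return {f: s for f, s in self.index.items() if term in f}

    def schedule(self, filename):
        # Download now if a source is known, otherwise wait for one.
        if self.index.get(filename):
            self.download(filename)
        else:
            self.pending.add(filename)

    def on_source_found(self, filename, source):
        self.record(filename, source)
        if filename in self.pending:
            self.pending.discard(filename)
            self.download(filename)

    def download(self, filename):
        print("downloading", filename, "from", self.index[filename])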
P2P is a cheap excuse for a system.. (Score:4, Interesting)
In this case, a couple of text links which may interest me (Google reference: check).
I don't want to have to share my bandwidth with 50 other people so they can do the same. If you want to run a service, website, or game server, you should pay for it. Don't start passing the bandwidth bill off onto us users.
Either get used to the heat (price) or get out of the kitchen (market).
Mmm, buzzwords. (Score:3, Insightful)
Step 2) Add the word 'p2p' in front of it.
Step 3) ???
Step 4) Profit
I assume Step 3) is now as simple as "show the name of the new product with 'p2p' in the subject and explain how it's NOT related to pirating movies or music" (to increase investor confidence that they're not going to get taken to town by the RIAA/MPAA); then it's just sit back and watch the fat investment/grant dollars roll in!
Hehe... (Score:2)
And if the whole net gets too congested... (Score:2)
Well, actually they might be on to something. As I said in a comment some months ago (why can't I peruse all my comments without a subscription?), a p2p encrypted backup technology would be a good idea, an idea which was later picked up and written about [pbs.org].
I said it'll be peer-to-peer everything (in this case, p2p RAID, for redundancy rather than performance), using certs.
No link, but anyone else read (Score:2)
Sorry, I hope that makes sense in context.
AnomicHTTPProxy (Score:1)
-- http://www.anomic.de/AnomicHTTPProxy/index.html
"If the index-sharing someday works fine, maybe the browser producer like Opera or Konqueror would like to use the p2p-se to index the browser's cache and therefore provide each user with an open-source, free
the old ideas... (Score:1)
Harvest [sourceforge.net]
BugBear
old stuff (Score:1)
Basically it was a big, fat broadcast of all queries to all hosts, regardless of whether the query mattered to that host or not. Only very few clients could cope with the linearly growing bandwidth requirement; the others just "missed" the queries, and so the net fragmented.
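For anyone who never saw it, the old model was essentially this (my own minimal sketch, not actual Gnutella code); note that the traffic counter grows with the number of reachable hosts, not with the number of results:

class Host:
    def __init__(self):
        self.neighbors = []

def flood(host, query, ttl, seen, traffic):
    # Rebroadcast to every neighbor until the TTL expires; every reachable
    # host pays the cost whether or not it has anything matching the query.
    if ttl == 0 or host in seen:
        return
    seen.add(host)
    traffic[0] += 1
    for n in host.neighbors:
        flood(n, query, ttl - 1, seen, traffic)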
There were a lot of people who knew this.
One of the first "academic" solutions that came up (at least to my knowledge) was P-Grid (http://www.p-grid.org/), which uses extremely intere