The Internet

"Random Walkers" may speed P2P networks 129

sean23007 writes "New Scientist has an article about an innovative method of controlling P2P traffic to maximize speed over a very large network. The idea, thought up by researchers at Princeton, Berkeley, AT&T, and Cisco, involves sending random "walkers" around the network looking for a particular file, which would theoretically yield much better search speed than existing networks such as Gnutella. They claim this could result in a network capable of facilitating a massive distributed supercomputer."
  • by Space Coyote ( 413320 ) on Saturday July 06, 2002 @11:36AM (#3833094) Homepage
    It shouldn't be too difficult to build a system like this that could operate over something like Gnutella. You could monitor the search queries and record the most popular requests, then use the random walker technique to search out the best sources for these requested files. Anybody else running the same software could enter a search query and have it look first in the list of popular items before doing the standard crawl-type search over the Gnutella network (something like the sketch below).
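    A minimal sketch of that idea in Python, assuming hypothetical crawl_search and walker_search helpers (nothing here is a real Gnutella client API): count recent queries, cache sources that answered before, and only spend walkers on queries that have proven popular.

```python
# Hypothetical sketch: count recent queries, cache known-good sources,
# and fall back to a full crawl only for cold queries.
from collections import Counter

popular = Counter()      # query -> how often we've seen it recently
source_cache = {}        # query -> hosts that answered it before

def search(query, crawl_search, walker_search):
    popular[query] += 1
    if query in source_cache:          # popular item already located: reuse sources
        return source_cache[query]
    if popular[query] > 10:            # hot query: worth sending walkers after it
        hits = walker_search(query)
    else:                              # cold query: ordinary crawl-type search
        hits = crawl_search(query)
    if hits:
        source_cache[query] = hits
    return hits
```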
    • I still want to see an option on Gnutella to automatically download songs from a top 10 list [billboard.com]

      Wouldn't be too hard:

      cron - schedule the script execution while I'm asleep
      wget - to download the html
      perl - to strip out the file names and write out a "wanted list"

      maybe a script would even burn CD-RWs of the top weekly songs.

      I'm sure these songs are the ones creating the most traffic, so it's almost the same concept. I like my idea better because you don't need to manually type a bunch of stuff. (A rough sketch follows below.)
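      For what it's worth, here is the cron/wget/perl pipeline collapsed into one small Python script (Python rather than perl purely for brevity). The chart URL and the regex are placeholders; the real page's markup would have to be inspected first.

```python
# Placeholder sketch of "scrape the chart, write a wanted list".
# CHART_URL and the regex are made up; adjust to the real page's HTML.
import re
import urllib.request

CHART_URL = "https://www.billboard.com/charts/hot-100/"   # placeholder

def write_wanted_list(path="wanted.txt", count=10):
    html = urllib.request.urlopen(CHART_URL).read().decode("utf-8", "replace")
    titles = re.findall(r"<h3[^>]*>([^<]+)</h3>", html)    # hypothetical pattern
    with open(path, "w") as out:
        for title in titles[:count]:
            out.write(title.strip() + "\n")

if __name__ == "__main__":
    write_wanted_list()   # run from cron, e.g. "0 4 * * * python top10.py"
```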

      • <flamebait type="offtopic">
        Maybe it's just my local crowd, but no software engineer I know would ever build this, because the top 10 songs consistently suck. So no one I know would ever have a need to build this.

        Scratch that. I might want to write something like this, except in reverse. I want a Gnutella client that automatically ignores all songs that have ever been on a top 10 list. Of course the lists would be updated weekly, to filter out any new crap that comes along.
        </flamebait>

        Besides, I'm partial to getting full albums rather than singles (despite the accusations of theft that will inevitably be leveled against me). IMO, any band that can't make a dozen decent songs doesn't deserve my ear-time for a 3-minute single. One-hit wonders (read: just about all pop artists) be damned.

        Hopefully the "walker" technology could be used to find all songs of a particular album or even artist. Since most MP3 collectors don't bother to include the album name in the filename, maybe (extending on your idea), future p2p clients could interface to a database like CDDB to obtain a list of songs in an album, and to even verify that the downloaded songs are of the correct length (give or take a few seconds). Anyone up for this challenge?
        • Check out the different lists on Billboard.com [billboard.com] sometime. I agree with you that most of the top 40 stuff sucks (e.g. Britney Spears, boy bands, etc.).
          I happen to prefer the Modern Rock [billboard.com] list myself.

          Perhaps what we all need is a "don't play their own instruments" Filter :-)
          Seriously though, an ignore list, or a feature like Amazon's "other customers who bought this also liked...." feature might have potential.

          One thing is for sure: hard drive companies would love this, because we'd all fill our hard drives fast.
    • Gotta love these vague "shouldn't be too difficult" statements that litter /. How many of you guys actually go and try programming this "not too difficult" stuff in your own time? Where are the results of previous "not too difficult" projects? The cynic in me says it's just idle pissing-in-the-wind speculation from wanker kids wearing "got root?" T-shirts.
  • .. why anybody is even putting any effort into this... we all know google is going to come up with something amazing, free, and fast that'll put everything else to shame
  • by StandardCell ( 589682 ) on Saturday July 06, 2002 @11:40AM (#3833114)
    "...such file networks have particular difficulty locating uncommon documents, and that walkers are especially bad at it. This is because there are relatively few walkers, which might have to visit every node on the network to find the file.

    "Langley also says the different connection speeds and processing power of the individual computers in the network may complicate the model: "Random walkers are going to end up at the well connected nodes."


    My biggest pleasure from P2P is finding obscure or rare music. I could care less how many copies of N*Sync or Britney Spears there are. Give me rare studio cuts or bootleg recordings of the Grateful Dead any day. This strategy won't help searches for those types of items, but it sure will help the sheeple get their music.
    • I'm going to get modded down for this, but... you're CowboyNeal, aren't you?
    • "Give me rare studio cuts or bootleg recordings of the Grateful Dead any day."

      Try www.etree.org [etree.org]. Since the Grateful Dead were cool with bootlegs, there are more reliable distribution means than P2P systems.

  • by Anonymous Coward
    So if the walkers end up at the well-connected nodes most of the time, aren't those the nodes that are most likely to be able to serve files at a decent clip?

    This sounds like how much of the Internet has progressed: along with the decentralization of information comes a centralization of content in a convenient form.

    btw - the optimal number of random walkers used is 42.
  • by Juhaa ( 588855 ) on Saturday July 06, 2002 @11:45AM (#3833130)
    If you're on a marginally slow network (say, up to a T1, i.e. 24 lines at 64 kbps) and you enable full node peering support on a Gnutella-like network, you know why random walkers are needed. Because the networks have become so huge, most of the bandwidth is taken up by search messages. Even if you are not a peering node, you would still be saturated by these.

    The random walker idea isn't new; it was tried a few times on the original Gnutella network when it first came out. But I don't think random walkers are the answer for this network: the bigger the network gets, the more even random walkers would saturate the bandwidth, and all you'd have is these bots finding the optimal path to files.

    Presently the network works much like how old routers worked. Search packets are broadcast to peering nodes, which then broadcast the same packets off to their own neighbors. But the problem is that their own neighbors are sometimes the ones that sent the original packets. If this were done efficiently (the way routers implement it), nodes would keep buffers to hold searches: once a search is received it would not be propagated to the other peers immediately, but held for a given time period (say a few milliseconds); the node would then wait for the same search to appear from other neighboring nodes, and if it does, it wouldn't send that search off down those links. This would cut down a lot of the wasted bandwidth, and I am not sure why the Gnutella people didn't model p2p after these routing methods.

    Walkers in an intelligent routing setup would be the most optimal way of doing p2p; hopefully this would loosen up some of the congestion on current p2p networks and let people with less bandwidth access the shared files with the least sacrifice of bandwidth for searches.
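    A minimal sketch of the duplicate-suppression part of the idea above: remember recently seen query IDs and never forward the same query twice, or back to the neighbor it came from. The "hold for a few milliseconds and aggregate" refinement would need a timer per query, which is omitted here; all names are illustrative, not any real client's API.

```python
# Illustrative duplicate-search suppression for a Gnutella-style node.
import time

SEEN_TTL = 30.0     # forget a query after 30 seconds
seen = {}           # query_id -> time we first forwarded it

def on_search(query_id, from_neighbor, neighbors, forward):
    now = time.monotonic()
    for qid, t in list(seen.items()):      # expire old entries so the table stays small
        if now - t > SEEN_TTL:
            del seen[qid]
    if query_id in seen:
        return                             # already forwarded: drop the duplicate
    seen[query_id] = now
    for n in neighbors:
        if n != from_neighbor:             # never send a search back where it came from
            forward(n, query_id)
```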


    • Buffering searches would definitely solve the problem.

      Why don't we buffer the searches and cache the route somehow on certain computers, so that when anyone else searches, it goes directly down the preset buffered path?

      OK, since you have the solution, go implement it. Code it!
    • How about things like:

      "I'm looking for that too" searching.

      Only letting nodes return results if they have an uptime of more than 10 minutes or so.

      Nodes caching recent searches and performance stats.

      Quick searches (hitting popular search caches) and deep searches.
      • I agree different heuristics should be used on different queries. A node can be promoted up the ladder depending on the speed, the quality (see the fake mp3 problem), and the reliability of service it provides.

        Searches need to be cached, and store-and-forward techniques need to be used instead of the current broadcast methods. We need to store search results for a given amount of time and intelligently work out which peers would give the best results.

        We also need efficient QoS methods, dynamically adapted to the network environment. QoS for a modem connection should be quite different from QoS for someone on an OC3 connection.

        Also, as someone mentioned, we need better methods of keeping track of efficient routes. I believe the key lies in making the nodes behave more like routers, and intelligent routers at that. We have an almost limitless amount of memory for routing storage; why not use it?
    • What bothers me about p2p is the fact that 1 search = 1 packet. Every packet has overhead, a lot of it. There are people out there running (a guess) 99 searches at a time, and resending the search every few minutes/seconds.
      What about making all the searches that originated from a single client take one packet? Add in optional compression, so that people who need it can enable it, and you could easily go from the 100,000-user threshold to probably 500,000 users, without needing these 'walkers.' Walkers are very bad at finding rare stuff, so they only help people looking for the most common stuff. But yeah, with better routing and better packet design, Gnutella would go a lot farther.
      • > What bothers me about p2p is the fact that 1 search = 1 packet.
        This is not quite true on the Gnutella network. A search query is a Gnutella-level packet, and a Gnutella packet containing a search for "britney" is 15 bytes or so. Every Gnutella node passes search queries on to all its peers by writing them into the TCP socket going to each peer. The computer's TCP/IP stack breaks that stream of data into packets, addresses them to the other computer, and sends them out onto the internet. With TCP overhead, about 97 searches for "britney" can fit in one 1500-byte TCP packet (see the quick arithmetic below).

        Ultrapeers were recently added to Gnutella, which has helped make it slightly less horribly inefficient. A search query will usually be one per packet when talking to an ultrapeer, but only over the one hop to the ultrapeer.

        You mention compression; I agree that compressing the search queries and results going over Gnutella connections would make a big difference. It's such a good idea it's been included in version 0.6 of the protocol spec, see http://groups.yahoo.com/group/the_gdf/message/7053
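        A quick back-of-envelope check of those figures, taking the parent's numbers at face value (a ~15-byte query and ~40 bytes of TCP/IP headers in a 1500-byte segment):

```python
MTU = 1500            # typical Ethernet-sized TCP segment
TCPIP_HEADERS = 40    # 20-byte IP header + 20-byte TCP header, no options
QUERY_BYTES = 15      # the parent's rough size for a small Gnutella query

print((MTU - TCPIP_HEADERS) // QUERY_BYTES)   # -> 97 queries per packet
```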
    • The Gnut client will drop any connection that it sees too many duplicate packets on. However, that doesn't help with the bandwidth problem as far as I can see, since the bandwidth consumption is due to the rebroadcasting nature of the network and not necessarily the fact that you're receiving duplicate packets.
    • Walkers in an intelligent routing setup would be the most optimal way of doing p2p; hopefully this would loosen up some of the congestion on current p2p networks and let people with less bandwidth access the shared files with the least sacrifice of bandwidth for searches.
      Freenet [freenetproject.org] does something quite like this. Requests are routed based on local heuristic knowledge of the network, and information gravitates towards the demand for that information through a form of dynamic mirroring.

      This paper (PDF) [freenetproject.org] gives a great overview.

      Presently the network works much like how old routers worked. Search packets are broadcast to peering nodes, which then broadcast the same packets off to their own neighbors. But the problem is that their own neighbors are sometimes the ones that sent the original packets. If this were done efficiently (the way routers implement it), nodes would keep buffers to hold searches: once a search is received it would not be propagated to the other peers immediately, but held for a given time period (say a few milliseconds); the node would then wait for the same search to appear from other neighboring nodes, and if it does, it wouldn't send that search off down those links. This would cut down a lot of the wasted bandwidth, and I am not sure why the Gnutella people didn't model p2p after these routing methods.

      What you're essentially proposing is that each client keeps a hashtable of all the searches it has seen. The length of time entries are stored probably shouldn't be just a few milliseconds. This technique can reduce traffic from the duplicates you mention (circling around in the graph of nodes) and from two peers asking for the same search. The hashtable should also store the TTL of each search: if the TTL of an incoming search is greater than the TTL of the hashed search, it gets forwarded on and overwrites the old entry; otherwise it gets dropped. This way you even improve the maximum reach of searches (as in the sketch below).
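      A minimal sketch of that TTL-aware cache, with made-up names; the point is just that a query is re-forwarded only when it arrives with a larger TTL than any copy seen before.

```python
# Illustrative TTL-aware duplicate handling for search forwarding.
best_ttl = {}    # query_id -> largest TTL we have forwarded so far

def on_search(query_id, ttl, from_neighbor, neighbors, forward):
    if ttl <= best_ttl.get(query_id, -1):
        return                            # seen with an equal or greater TTL: drop it
    best_ttl[query_id] = ttl              # overwrite the old entry with the larger TTL
    if ttl > 0:
        for n in neighbors:
            if n != from_neighbor:
                forward(n, query_id, ttl - 1)
```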
    • the problem is that their own neighbors are sometimes the ones that sent the original packets
      No, that's not the problem. Your analogy to routers is inaccurate: routers don't request routing updates every time someone uses a route... Indeed, a router's job is quite different.

      The problem with Gnutella is this: each query must be sent to every node. That's a lot of traffic! The TTL can help keep the broadcasts from going out of control, but the more efficient you make the searches, the fewer results come back.

      What needs to be done to solve this problem? I think it's simple... Store lists of shared files from each node.

      Each node would maintain a list of the files it is sharing, a hash for each file, and a hash of the list itself. Each node collects the lists of every node it is connected to, periodically requests the hash, and if the hash has changed, requests the new list (or perhaps, like CVS, just the diff).
      So, when my node connects to another Gnutella node, that node sends me a cumulative list of all the shared files it knows about, as well as its own list of shared files. After that, my node periodically requests the hashes to make sure nothing has changed, and can request individual hosts' file lists from my directly connected nodes (a sketch of the hash-checking part follows at the end of this comment).

      Now THAT is how Gnutella should work.

      One problem with that system is full disclosure... Anyone can then easily work out who is sharing the most files, and target them. With the current Gnutella scheme, disclosure is more limited, but the bandwidth and scalability are a large price to pay for controlled, limited disclosure.

      One more thing Gnutella needs: queuing of downloads/uploads. Right now, even if you've been trying to get a file for hours, someone who's just requested it can get it first; it's simply a matter of requesting it when there happens to be a download slot. I'd much rather see a FIFO queue of uploads and downloads, and a message-passing system, so that each node knows it has 10 other nodes in the queue ahead of it.
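      A sketch of the hash-the-list idea under the assumptions above. The peer object and its get_list_hash/get_file_list calls are hypothetical; only the hashing and refresh logic is shown.

```python
# Hash a peer's file list so it only has to be re-fetched when it changes.
import hashlib

def list_hash(file_list):
    """file_list: iterable of (filename, file_hash) pairs."""
    h = hashlib.sha1()
    for name, fhash in sorted(file_list):
        h.update(name.encode() + b"\0" + fhash.encode() + b"\n")
    return h.hexdigest()

def refresh_peer(peer, cached_lists, cached_hashes):
    remote = peer.get_list_hash()                    # hypothetical RPC
    if cached_hashes.get(peer.id) != remote:         # changed since last time?
        cached_lists[peer.id] = peer.get_file_list() # or request just the diff
        cached_hashes[peer.id] = remote
```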
  • by r00tarded ( 553054 ) on Saturday July 06, 2002 @11:49AM (#3833148)
    to know the path you must walk the path
  • by axehat ( 590580 )
    Yes, it does seem that p2p networks are illegal; I wonder how these people got funding. I haven't yet heard of anything that Google is doing with p2p networks, but mighty gods that they are, I'm sure they will roll out something very nice.
    • Blockquoth the poster:

      Yes, it does seem that p2p networks are illegal

      Well, it may seem like they are illegal to those who can't separate the software and the content. Oh, and didn't they tell you? Driving is illegal. You might kill somebody.

      • Aha, funny man. I meant that the media / government says they are illegal. I know that transferring things that are not copyrighted is not illegal, but the media / government doesn't see it that way. Sorry, I shall have to emphasize my point better than 'it does seem', key word there being 'seem'.
        • Hmm... well p2p doesn't seem illegal to me, mostly because it isn't, so that's how I work on it.

          Anyway, the media are afraid of it, and so transparently so that you'd think more people would realize exactly what's going on.
  • The research paper (Score:4, Informative)

    by sitcoman ( 161336 ) on Saturday July 06, 2002 @11:53AM (#3833163)
    The paper that was presented, entitled Search and Replication in Unstructured Peer-to-Peer Networks, can be found here [princeton.edu] (top of the page)
  • by popeydotcom ( 114724 ) on Saturday July 06, 2002 @11:54AM (#3833171) Homepage
    Why let some algorithm trawl around your network when you can let some human dweeb do it for you? The problem with robots, or computers in general, is that they can be fooled [slashdot.org] into thinking something is something else.

    Personally I use Filemirrors [filemirrors.com]. It shows what people are really downloading.
  • More Q&A (Score:5, Funny)

    by sam_handelman ( 519767 ) <samuel.handelmanNO@SPAMgmail.com> on Saturday July 06, 2002 @11:55AM (#3833177) Journal
    Q: Can we use it to predict the weather?
    A: Yes.

    Q: Well, that's not bad. Can it model the behavior of biological molecules?
    A: You betcha.

    Q: Still, that's rather pedestrian. Can it find large prime numbers?
    A: Numbers so big we can't even say them.

    Q: Hmm.. almost there. Can it evaluate complex object-relational predicates to get me EXACTLY the porn I want?
    A: Er..... Yes.

    Q: Excellent! Now we're talking. From the output, can we say conclusively that all of the porn which I want has been found?
    A: Please go away.

    Q: What about porn that I don't want - gay porn, scatological stuff. Can you guarantee I won't get that?
    A: Fine, yes. It predicts what you want from your past behavior and is always right. Happy?

    Q: Isn't that an invasion of my privacy?
    A: Arghhh!
  • by Jeremiah Blatz ( 173527 ) on Saturday July 06, 2002 @12:02PM (#3833208) Homepage
    Ohh, bad pun, sorry.

    But really, I think the way to think about P2P downloading is to use non-interactive search agents. These guys are on the right path; the random walkers appear to be a bit brighter than an exponentially widening search.

    The problem with current p2p networks is that the database is constantly churning. It's not like the web, where data is relatively stable. Two identical searches performed within minutes of each other will return different results. The problem, of course, is that polling the network with these huge searches inflicts a massive bandwidth cost.

    A better system would be to have a search agent. You would enter a set of search criteria (with a richer syntax than the existing model), and every 30 seconds or so the software would fire off an agent. The agent does a random walk, then downloads anything that matches. You then go about your business. When the agents have collected a sufficient number of results, the software alerts you (a toy version of this loop is sketched at the end of this comment).

    This lacks the immediate satisfaction of searching and getting results right now, but when you're looking for obscure stuff, you don't get that satisfaction anyway.

    Also note that the autonomous downloading assumes you're looking for something reasonably small, like an mp3, and not Girls Gone Wild [mrcranky.com].

    Of course, it also has the advantage that it gives hosts an incentive to remain connected longer, adding to the stability of the entire system.
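    A toy version of that agent loop, assuming hypothetical walk_once and download helpers; nothing here corresponds to a real client's API.

```python
# Fire off one random walk every interval seconds and collect matches
# until there are "enough" results to show the user.
import time

def agent_loop(criteria, walk_once, download, enough=5, interval=30):
    found = []
    while len(found) < enough:
        for hit in walk_once(criteria):     # one random walk over the network
            if hit not in found:
                found.append(hit)
                download(hit)               # autonomously grab small matches
        time.sleep(interval)                # come back in ~30 seconds
    return found                            # caller alerts the user
```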
    • The problem with current p2p networks is that the database is constantly churning. It's not like the web, where data is relatively stable. Two identical searches performed within minutes of each other will return different results. The problem, of course, is that polling the network with these huge searches inflicts a massive bandwidth cost.
      And to add to this (excellent) point, even where the content may be fairly stable, the IP addresses aren't. Several times I've inherited the IP of someone who's been using P2P stuff, and each time I wind up getting hammered on Gnutella and Kazaa ports for hours - sometimes days - as a result of my predecessor's filesharing. (And even if I fire up a Gnutella servent so that inbound download requests get answered with "Not Found," the fscking servents don't care and will continue to hammer away as if the file might magically appear on my HD... But that's another rant altogether.)

      So you might have a user who always shares the exact same files, and as such his "node" is static and the content is always available from him... Though if he gets a new IP address every time he reboots, any stored listing of his shared files will quickly become outdated.

      I'm not sure a random crawler would be too effective for this application. It does work for network mapping, and I believe Limewire has been doing it for some time. In order to compile a reasonably accurate list of files, though, such a crawler would need to either a) purge any entries more than an hour or so old, or b) constantly verify the validity of each result. Choice a) would give us a listing that's no more accurate than what the network's own search functionality provides, and as you mentioned, choice b) would consume an enormous amount of bandwidth. Neither option seems too appealing.

      Maybe if IPv6 ever kicks in, and IPs are reasonably static across the board, such a crawler might work; at least half of the problem would be solved.

      Shaun
  • What if walkers visiting a computer left information about what they were searching for and which way they went, and, once the file was found, walked back along exactly the same route to report where it was found (or where it is no longer found)?
    This way there would be a distributed, dynamic map of content, and the chance that rare things are found would increase with the number of nodes that had to be visited to find them (rough sketch below).
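    A rough sketch of that breadcrumb idea, with an entirely hypothetical node object (has_file, neighbors, remember): a walker records its path and, on success, retraces it so every node along the way learns where the file lives.

```python
# Illustrative breadcrumb walker: nodes on the return path learn the file's location.
import random

def walk(query, node, path=None, max_hops=50):
    path = path or []
    if node.has_file(query):              # hypothetical node API
        for hop in reversed(path):        # walk back exactly the same route...
            hop.remember(query, node)     # ...telling each node where it was found
        return node
    if len(path) >= max_hops:
        return None                       # give up; a real system would report "not found"
    nxt = random.choice(node.neighbors())
    return walk(query, nxt, path + [node], max_hops)
```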
  • by jeffehobbs ( 419930 ) on Saturday July 06, 2002 @12:07PM (#3833234) Homepage

    It occurred to me a while ago that a peer-to-peer network's ability to provide a particular file would go up dramatically if less emphasis were put on "real-time" searching and more on a sort of subscription model: the user would request a file, and the request could propagate overnight from client to client while the network itself decided the best/most reliable way to provide that particular file to that particular user. Beyond privacy concerns and TTL spoofing on requests (which are probably both non-trivial problems with this idea), I'd certainly be willing to wait a couple of days to get a particular file or set of files, as long as I didn't have to "babysit" the client and as long as I could be assured that the file would be full and complete when it finally arrived.

    ~jeff
    • "It occured to me a while ago that the ability of a peer-to-peer's network to provide a particular file would go up dramatically if less emphasis was put on "real-time" searching and instead in a sort-of subscription model -- the user would request a file, and the request could propagate overnight from client to client while the network itself decided the best/most reliable way to provide that particular file to that particular user... beyond privacy concerns and TTL spoofing on requests (which are probably both non-trivial problems with this idea) I'd certainly be willing to wait a couple days to get a particular file or set of files as long as I didn't have to "babysit" the client and as long as I could be assured that the file would be full and complete when it finally arrived."

      This was the secret sauce of AudioGalaxy: the fact that you could queue your requests, that it would fetch them from whichever client had them, and that you DIDN'T HAVE TO BABYSIT THE DAMN CLIENT.

      The way they did it was to store your requests on the (easy non-moving target) central server.

      Whoever can create a p2p network duplicating this functionality will get every orphaned AG user descending upon them with shrieks of joy and large hard drives full of goodies. Walkers might be the way to implement this.

      And THAT will make the RIAA shit an entire brickworks, which is almost reason enough.

      • If you have gripes against the RIAA (and who doesn't, I do), the solution isn't to pirate their music, but to support music not associated with the RIAA.
      • WinMX [winmx.com] does this with "Autocomplete". It's my P2P client of choice. Too bad it's closed source, and Windows only.
        • "WinMX [winmx.com] does this with "Autocomplete". It's my P2P client of choice. Too bad it's closed source, and Windows only."

          Well, it doesn't quite - I've been using it for a few weeks, and it still needs way too much babysitting to avoid being horribly delayed in getting anything.

          Going from AudioGalaxy to WinMX is like going from OS X to Windows 95.

    • You might like eDonkey 2000.

      While I don't know exactly how searches work, they do continue to add new results long after I send them. As for downloading files, you give it a file size and an MD4 hash, and it'll go running around the network trying to find that file. It seems to do a good job of downloading from multiple sources at once.

      I run a third party client, mldonkey, as a daemon. A couple times a day I telnet into the daemon and check on the status of my existing transfers. It's not fast, but it will eventually get the file...

      As for searching, I barely use that anymore, since there are a couple of indexing web sites out there (namely http://www.sharereactor.com) that keep a large index of reviews of particular files (such as video quality) and will provide you with the size/hash of that file. When you go to ed2k, you'll get exactly that file. It seems to work pretty well.
    • If this whole "peer-to-peer" thing weren't about piracy, the problem would be easy. Just put all new music releases on Usenet as binaries. Each new song would traverse each netnews link no more than once. Major ISPs would have local netnews servers, and everybody would just download from their local server. There would be user-friendly music-oriented clients for end users, from the same people who develop "jukeboxes" now. Problem solved.

      But no, we can't do it that way. It takes orders of magnitude more bandwidth to support music "sharing" as currently implemented. What we have now is comparable in efficiency to buying groceries on eBay.

    • The network's ability to provide me with a particular file would go up dramatically if everyone else switched to my revolutionary real-time non-searching model.
    • It seems you are suggesting the requests be processed only when the network load is relatively light. But even if we don't put any time constraint on it, the search still demands some crawling around, and this will still eat up a significant amount of bandwidth if many people are querying/replying simultaneously (nonetheless it is more network-friendly than real-time crawling). Moreover, due to the nature of a P2P network, files that existed a few days ago may not be available now: the owners can be offline, or the files may have been deleted. Finally, how to effectively probe the network condition is a research problem in itself.
  • The paper is online (Score:3, Interesting)

    by jdbarillari ( 590703 ) <joseph+slashdot@barillari.org> on Saturday July 06, 2002 @12:12PM (#3833255) Homepage
    Qin Lv, one of the researchers, was a TA in one of my classes last semester. She has a website [princeton.edu], and the paper is posted there.
  • This might be easier for non-conforming members of the P2P net (the RIAA) to disrupt. If everybody followed the rules, you'd be fine. But since the point of origin is contained in the walker itself, it would be trivial to crapflood the entire system with random walkers, all with different points of origin. Since random walkers are more bandwidth intensive than normal searches, you'd make everything slow as syrup.
    • Random walkers certainly should NOT be more bandwidth intensive if they are implemented like JavaSpaces(TM), which is what I suspect. Bandwidth is only used during walking; the searching is local, and the walker does not necessarily have to report home, as long as it eventually comes home.

      The problem is that this runs in the server's space, so it's a tax on the server computer as opposed to the client.

      • It has to know how to stop somehow.

        One way, just having the random walker work for a set number of node changes, would either A) be useless, because it doesn't check enough nodes before ending, or B) be a bandwidth hog, because the walkers last so long and an attacker (or someone who just wants to search faster) can create a great many extra random walkers. Both ways would be nearly useless for finding uncommon stuff, because of search replication.

        The other way is to check back home to see whether the thing has already been found.
  • How well does this work shortly after changing the directory structure? Specifically, adding new branches.
  • Are there any clients that implement this, or are planning to?
  • This paper only applies to unstructured networks like Gnutella... for optimization purposes, wouldn't it be much smarter to simply create a hierarchy of nodes and cache information from lower nodes in the higher ones? This would greatly reduce message overhead, since a search request to one super-node would effectively search all the subnodes.
  • Even better yet (Score:3, Interesting)

    by PureFiction ( 10256 ) on Saturday July 06, 2002 @01:00PM (#3833444)
    Use a non-random walk (i.e. one tuned to a specific domain or set of preferences) and let it walk the network with a lightweight UDP protocol:

    cubicmetercrystal.com/alpine/ [cubicmetercrystal.com]

    The conservative use of bandwidth with an intelligent search mechanism can provide scalable decentralized search for peer networks of any size.
    • The main advantage of intelligent walking is the ability to find obscure or unpopular data quickly once a peer group is tuned to your preferences (all done locally using implicit query feedback when interacting with peers).

      This is important as the paper mentions the problems random walkers have with regards to unpopular resources.
  • It's easy. Just give a Googlebot a large bottle of grog (I suggest Bundaberg Rum.)
    • Just give a Googlebot a large bottle of grog

      Problem is it keeps eating through my bottle before I can make it to Google.

      I've emailed Guybrush Threepwood for suggestions, he hasn't gotten back to me, something about a wedding and root beer.
  • This sounds exactly like what JavaSpaces(TM) does. I'm surprised they were bold enough to say they thought of it. Unless of course they also invented JavaSpaces(TM) and they're just double dipping.

    http://java.sun.com/products/javaspaces/faqs/jsfaq.html
    Well, for some reason Slashdot kept inserting a stoopid space in the link...

    There's the FAQ, though I don't think the average person will be able to see the sameness. Plus, JavaSpaces is a technology, while this thing seems to be a use of one.
  • Quoth Commander Taco:
    "which would theoretically yield much better search speed than such other networks as Gnutella."

    I think a network of carrier pigeons would yield a better search speed than Gnutella. :)
  • Sandvine Incorporated has an interesting solution to this situation... Sandvine, Inc. [sandvine.com]
  • by Anonymous Coward
    Random walkers amount to little more than a way of slowing down the queries. Instead of asking everyone at once, you only ask one person, and they ask someone else. As the network gets larger, your query takes longer to propagate through the whole network, leaving more resources available for other people's searches in the meantime (a toy comparison appears at the end of this comment).

    The main problem I see here is that while it's in the interests of the network for you not to flood it with requests, it's in YOUR PERSONAL interest to do so.

    I suspect that random walk queries will be MUCH slower than regular broadcast queries.

    When they say the optimum number of walkers is between 16 and 64, they mean the best number for getting results without fucking up the network for everyone else.

    If you don't care about fucking up the network and only want the best results for you then you send out random walkers to every node at once just like a regular flood/broadcast query :)

    The more walkers you can send out the sooner you'll get a response so there's little incentive to play by the rules.
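    A toy simulation of the trade-off described above, on a made-up random graph: flooding costs roughly one message per edge no matter what, while a batch of walkers stops as soon as one lands on a node holding a reasonably popular file. The graph and parameters are purely illustrative.

```python
import random

N, DEGREE = 1000, 4
graph = {i: random.sample([j for j in range(N) if j != i], DEGREE) for i in range(N)}
holders = set(random.sample(range(N), 10))     # the file lives on 1% of nodes

def flood_cost():
    return N * DEGREE                          # every node forwards to every neighbor once

def walker_cost(k=32, max_steps=10000):
    positions = random.sample(range(N), k)     # k walkers start at random nodes
    messages = 0
    for _ in range(max_steps):
        for i, pos in enumerate(positions):
            if pos in holders:
                return messages                # found: all walkers stop
            positions[i] = random.choice(graph[pos])
            messages += 1
    return messages

print("flood:", flood_cost(), "messages")
print("32 walkers:", walker_cost(), "messages (varies from run to run)")
```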
  • by dbretton ( 242493 ) on Saturday July 06, 2002 @03:03PM (#3833994) Homepage

    I can picture it now, playing Return to Castle Wolfenstein, with Gnutella running in the background...
    Suddenly, my framerate grinds to a halt.
    "F%#!!@ Gnutella findfast"

    • I always just leave my searches "open" in Gnucleus (IMHO the best) and my search just grows and grows.

      It's great for when you are looking for seasons and not just episodes of your favorite shows to archive.

      Say, for example, I want to get DivX copies of the Friends episodes my girlfriend has on tape: I simply search for "Friends AVI" and leave it open. It finds new hosts all the time, and even gives me more places to download my existing selections from.
  • Cool idea, I'm wondering what other uses there might be for this. I do have one question though: how random is the randomness? Will there be some applied chaos theory at work (such as strange attractors)? How do you decide upon the point where the approach to pure randomness (starting from a non-random beginning) is too expensive in network terms?
  • Related Paper (Score:2, Informative)

    by chaih ( 170120 )
    The paper mentioned in the above article is available (full version) here [princeton.edu]

    Extended version here [princeton.edu]

  • by jon_c ( 100593 ) on Saturday July 06, 2002 @04:40PM (#3834304) Homepage
    Is a project called Circle [monash.edu.au]. It uses the idea of a distributed hash table.

    The author has an overview here [monash.edu.au].

    My brief synopsis:

    The network is organized in a circle, like a circular linked list: each node knows about its neighbors, its neighbors' neighbors, and maybe a little more; basically, every node knows about maybe 6 or so other nodes. Each node keeps a section of the hash table, say 0x0500-0x1000 or so.

    Each search item is hashed and then sent left or right depending on whether its hash is less than or greater than the range you are storing, so 0x3000 would be sent to the rightmost node because it's larger. That node repeats the process, which makes the search a lot like a binary search (rough sketch below).

    -Jon
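    A stripped-down sketch of that ring lookup (not Circle's actual code): each node owns a slice of the hash space and forwards a key left or right until it reaches the owner. Real systems keep extra long-range links to make this logarithmic; only the basic idea is shown here.

```python
# Minimal hash-ring lookup: route a key left or right until the owning node is found.
import hashlib

SPACE = 2 ** 16

def key_hash(name):
    return int(hashlib.sha1(name.encode()).hexdigest(), 16) % SPACE

class Node:
    def __init__(self, low, high):
        self.low, self.high = low, high   # slice of the hash space this node owns
        self.left = self.right = None     # ring neighbors

    def lookup(self, key):
        if self.low <= key < self.high:
            return self                   # we own this key
        if key < self.low:
            return self.left.lookup(key)  # forward toward smaller hashes
        return self.right.lookup(key)     # forward toward larger hashes

# Four nodes covering the whole space, linked in a line (wrap-around omitted).
nodes = [Node(i * SPACE // 4, (i + 1) * SPACE // 4) for i in range(4)]
for a, b in zip(nodes, nodes[1:]):
    a.right, b.left = b, a

owner = nodes[0].lookup(key_hash("some_file.mp3"))
print("key lands on node owning", hex(owner.low), "-", hex(owner.high))
```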
  • by dh003i ( 203189 ) <dh003i@gmai[ ]om ['l.c' in gap]> on Saturday July 06, 2002 @04:52PM (#3834349) Homepage Journal
    Here's an idea...

    Rather than dynamically searching for the file you want each and every time you type in the name, why not have each user build a database of the files all the other users on the network are sharing?

    Once you get on the network, it does a search and accesses other people's databases, so your own file database gets updated.

    This would require much less bandwidth than searching dynamically every time you typed in a search term (though it would require a little bit of hard drive space), and it would allow search results to be produced very quickly, as you'd essentially be searching a file on your hard drive.
  • Perhaps files should be stored (or at least indexed) from newest to oldest? "The team also showed that deliberately storing information in a random fashion made the network function more efficiently and that there is an optimum number of copies of a file, related to the number of requests that file gets." If files were searched in order from newest to oldest, they would effectively be stored randomly in an alphabetic sense. More importantly, people are more likely to be searching for recent files and so would get results for them sooner. Caching the results of walkers for a period of time before they are sent on would surely help as well.
  • The problem with random walkers and the internet is hidden in the text "Random walkers are going to end up at the well connected nodes". It's the same problem that complicates search agents like those used by Google: they always end up at the websites or nodes which have a large number of links to them. On the one hand, random messages instead of flooding decrease the traffic load. On the other, it is probably as bad as Gnutella when it comes to searching for infrequent files.
