Hypernets -- Good (G)news for Gnutella 169

Red Roo writes: "This online article addresses the recent criticism of Gnutella's network scalability by pointing out that the Gnutella network is a Cayley tree. As a viable candidate for massively scalable P2P bandwidth, all trees are dead! But by going to higher-dimensional virtual networks (aka "hypernets"), e.g. hypercubes or hypertori, near-linear scalability can be achieved for P2P populations on the order of several million peers, each with only 20 open connections. This concept seems to have been entirely overlooked by critics and developers alike."
This discussion has been archived. No new comments can be posted.

  • by Anonymous Coward on Sunday February 17, 2002 @06:02AM (#3021164)
    I had a four dimensional hypernet using a router from Heinlein Technologies installed in our off-site server room and it caused no end of grief.

    Firstly, we had a continuing problem with dropped packets, but things really started screwing up when our time-domain packet switching started picking up packets that HAD NOT BEEN SENT YET!

    The collision rate went up to an astounding 53% by the time I was able to pull the plug. In short, it sucked big time!

    By the way - don't waste your time buying any of those California lotto tickets this week because just before I downed the thing I surfed the net....
    • this is funny. get it, a joke.
    • by Anonymous Coward
      I guess they must have messed up the 4th dimension - time - if it can cause a collision before you send them. Wow!!

      As much as I would like the open source stuff, Gnutella is not ready for the big time for average users who have stuff to share. The windoze version choked my old 486 at times while doing nothing. Compared to Morpheus, it was like Visual Basic slopware that uses lots of resources. There was no content on the net. Anything remotely interesting was behind a firewall and was either unreachable or had zero bandwidth. There was too much traffic even when not transferring files - about 1/4 of my uplink on a DSL line. :(

      Morpheus, on the other hand, was quite usable on the old 486 windoze 95 box (see below). I run AtGuard, so I didn't even know there were pop-up ads. It takes up only 20% of the CPU time during transfers and is idling most of the time. There is rarely any traffic when idle - a few blips every ten minutes or so. It would seem their supernode architecture is the right approach.

      As to content, there are over 600,000 users sharing ~650 terabytes of files. It is not perfect, but it is quite simple to use, even for average users, who pull in all kinds of content from all over the world.

      Note:
      I use 95 because it has the smallest memory & HD footprint - disk space is for shared files. The quietest 486 box generates the least amount of heat, and it survived 3 extended (2-3 days before I knew) fan failures over the hottest days in the summer without crashing. This is my Morpheus (only) machine that runs 24/7, and I reboot it maybe every other month. I would not trust my main box, which has the takeoff noise and hot exhaust of a Harrier, to be left unattended overnight.
    • OK, since you liked this I'll admit that the Anonymous Coward was me

      Sorry, I just couldn't resist.

    • Of course, you realize that 4-D would simply be 3-D space + time. If you've got a 3-D router (or a 3-D world, for that matter) NOTHING WOULD EVER HAPPEN.

      So, because your router is not frozen in time, you can see the future?

      Besides, I can't remember the last time I went to the lotto website and it wasn't down.
  • More Info (Score:3, Informative)

    by Mattygfunk ( 517948 ) on Sunday February 17, 2002 @06:07AM (#3021173) Homepage
    For more information on the structure, try Relation Nets and Hypernets [uni-mannheim.de] in PDF form.
  • what a crock (Score:1, Informative)

    by Anonymous Coward
    anyone who has taken basic algebra knows this article is a complete farce...let me count the ways:

    - a hypertorus has d^2 orthogonal dimensions, not d, as the "article" states
    - his explanation of network diameter is a complete fantasy. the latency between points on a network has absolutely nothing to do with the chordal length, as he would well know if he'd get out of his ivory tower and do actual internet studies, as I have
    - equating the path length with the peer horizon is utter speculation, and Ritter himself has denounced this type of mathematical voodoo
    - "Little's Law," as he calls it, has been discredited at least a half dozen times by researchers at Harvard, Stanford, and elsewhere
    - anyone with a grounding in mathematical principles knows that there is no such thing as a binary hypercube, any more than there is such a thing as a square circle

    I can't believe Slashdot just posts this pseudomathematical nonsense without doing even elementary fact-checking
    • I can't believe Slashdot just posts this pseudomathematical nonsense without doing even elementary fact-checking

      You must be new.
      Or at least have an everlasting optimism.
    • Re:what a crock (Score:4, Informative)

      by Perdo ( 151843 ) on Sunday February 17, 2002 @07:15AM (#3021255) Homepage Journal
      Binary Hypercube [wolfram.com]
      • This link describes a way to label the nodes of a hypercube in binary. From a mathematician's standpoint, that and saying something is a binary hypercube could be two very different things. I think this is the point that was being made.

    • I can't believe Slashdot just posts this pseudomathematical nonsense without doing even elementary fact-checking

      At first, I thought you were referring to your own post, and was tempted to give you a counteracting +1 Insightful.

      -- MarkusQ

    • anyone with a grounding in mathematical principles knows that there is no such thing as a binary hypercube, any more than there is such a thing as a square circle

      Sure there is.

      The n-dimensional cube graph, aka the n-dimensional hypercube graph, is the graph consisting of 2^n vertices, each corresponding to one of the binary words of length n. Two vertices are adjacent if and only if the binary words they represent differ in exactly one bit position. It has n*2^(n-1) edges.

      Binary hypercubes and Gray codes have been used in tons of applications for a long time. They are used in computer architecture, database design, digital communications, solving the Chinese ring puzzle, etc.

      In short, there is such a thing as a Binary Hypercube.
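      A minimal Python sketch of the definition above (an illustration only, not from any poster): vertices are the n-bit words, adjacency is a one-bit difference, and the edge count comes out to n*2^(n-1).

      # Build the n-dimensional binary hypercube graph described above.
      # Vertices are the integers 0 .. 2^n - 1 (the n-bit binary words);
      # two vertices are adjacent iff their labels differ in exactly one bit.
      def hypercube_edges(n):
          edges = []
          for v in range(2 ** n):
              for bit in range(n):
                  w = v ^ (1 << bit)   # flip one bit to get a neighbour
                  if v < w:            # record each edge only once
                      edges.append((v, w))
          return edges

      n = 4
      edges = hypercube_edges(n)
      assert len(edges) == n * 2 ** (n - 1)   # 32 edges for n = 4
      print(len(edges), edges[:4])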
    • I can't believe Slashdot just posts this pseudomathematical nonsense without doing even elementary fact-checking
      what, you new here?
  • Anyone have any good learn math links?
    • Allow me to translate:

      Cayley Tree - A network where the nodes are connected together without any central server.

      Hypernet - A network where smaller nodes connect to high-capacity supernodes, which are in turn connected to other supernodes, each with their own sub-network, i.e. the same thing as FastTrack, but without the central encryption servers (at least, that's how I understand it).

      In short, this is good technology, but, once you scrape the marketing bullshit off, hardly new.

      • Thank you. You saved me half an hour of reading. ;-)
      • by Nindalf ( 526257 ) on Sunday February 17, 2002 @09:59AM (#3021441)
        A Cayley tree is a tree (a network with no cycles, a cycle being a path along connections that you can take in a circle to get back to the same node) in which every node, except those right at the edges, has the same number of connections. As a tree, if you cut any connection, you're separating the network into two unconnected networks (or isolating one node). Noting that it's a Cayley tree is pointing out that as it grows, growth happens at the edges while each node keeps the same fixed number of connections (all clients connecting to one server would be a tree, too, but the number of connections to the server would keep on growing, which means it wouldn't be a Cayley tree).

        A hypernet is like a grid: imagine the nodes like the places where the lines cross on graph paper, so each node (except at the edges) is connected to 4 others, in a regular, predictable pattern with lots of cycles. Now imagine a 3d grid, like a lot of stacked sheets of graph paper with each node connected not only North, East, South, and West, but also up and down. Each connected to 6 others, in a regular, predictable pattern with lots of cycles. That's as far as you can go with physical models, but in a freely-connecting network like the internet, you can keep going to 8 connections per node (a 4-dimensional hypercube network), 10 connections (5 dimensions), and so on.

        That explanation was for a hypercube; a hypertorus would be like going from a set of connections around a circle to a regular set of connections over the surface of a donut, and so forth.

        Either way, it's one huge mass of cycles: lots of redundancy, lots of routing choices. If you cut a connection it doesn't matter much; naturally, if one user bogs a connection (or chain of connections) down with a heavy load, it's practically as if it had been cut. Hypernetworks give you the freedom to route around traffic jams, and the regular structure (cube or torus) simplifies routing compared with an unstructured network of random connections.
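        A small Python sketch of that "graph paper" picture (an illustration only): the neighbours of a node on a d-dimensional hypertorus with k nodes per side, giving 2*d connections per node, which matches the 4 (2-D) and 6 (3-D) counts above.

        # Neighbours of a node on a d-dimensional hypertorus (a grid with
        # wrap-around), k nodes along each dimension: 2*d neighbours per node.
        def torus_neighbours(coords, k):
            nbrs = []
            for dim in range(len(coords)):
                for step in (-1, +1):
                    nbr = list(coords)
                    nbr[dim] = (nbr[dim] + step) % k   # wrap around the torus
                    nbrs.append(tuple(nbr))
            return nbrs

        print(torus_neighbours((0, 0), 4))          # 2-D: 4 neighbours
        print(len(torus_neighbours((1, 2, 3), 4)))  # 3-D: 6 neighbours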
        • Ah. I wasn't really sure what a Cayley tree was, so I just went with the article's explanation ("a tree with no root"), and put it together with what I knew about Gnutella.

          What would be really interesting would be if the hypernet could seamlessly connect totally different network types. Unfortunately, I doubt this would work, since even if you were able to translate the search data, the methods for getting the actual files would probably be incompatible.

      • Now, it has been A LONG time since I could have answered this authoritatively - BUT - I'm pretty sure that the hypercube being mentioned in the article is a mathematical construct, so your definition of Hypernet may not be entirely appropriate. A hypercube is a four-dimensional construct, if I remember correctly.

        There were supercomputers built by Intel in the early '90s that used hypercube connectivity schemes to minimize connection times for data, as I recall. This has been done before.
    • Yeah, try this one. [sesamestreet.com]
    • Here, mesmerize yourself looking at some four dimensional shapes [nyu.edu], courtesy of the elite Ken Perlin.

      (Note: This page is much more fun while under the influence of hallucinogenics.)
  • by Anonymous Coward on Sunday February 17, 2002 @06:14AM (#3021180)
    The author, Neil Gunther, is semi-famous in the Unix community for giving very good talks and courses on performance.

    Check out his open source performance analysis software PDQ (Pretty Damn Quick):

    http://www.perfdynamics.com/Tools/PDQ.html

  • Jeez. Editors and submitters take note: if you want somebody to click the link to the story, don't make the blurb a pile of almost-unintelligible mush. A much better way to submit it would be:
    Check out this article. It explains more or less how we might shift the paradigm of a Gnutella-like P2P network towards a more geometric model that has *much* less bandwidth overhead and possibly much better performance. Beware the big words, but do yourself a favor and check it out.

    oh yeah, and if the editor's michael, you can tag this on the end:

    i fucking hate all p2p networks, and this is a really stupid concept, but check it out.
    • oh yeah, and if the editor's michael, you can tag this on the end:

      i fucking hate all p2p networks, and this is a really stupid concept, but check it out.

      and if the editor's Timothy, tag this onto the end:

      i bet this doesn't work on windows.

    As far as I can tell, what this is saying is that Gnutella is scalable because it is structured not as a tree (with each node connected to only a few 'nearby' others) but rather as a more complex graph, with each node having connections to many nodes which aren't really nearby. In a true tree, there's really only one route from one node to another. In contrast, a hypercube has many, many paths from one node to another.

    In reality, I'm pretty sure no actual large-scale networks are like this, for obvious reasons, but I surmise they tend to be more treelike than Gnutella is, where each client tends to try to make as many connections to other clients as it can. Therefore, it should be pretty scalable; if each new client is making connections to a bunch of other clients that might not otherwise have a short distance between them, there aren't really going to be any bottlenecks.
    • by kris_lang ( 466170 ) on Sunday February 17, 2002 @07:03AM (#3021238)
      Actually, there was a system in the 1980s that did implement a 10-dimensional binary hypercube topology for communication and implemented it well. Daniel Hillis' Thinking Machine was a highly parallel 1024 node machine which implemented inter-node communication via a 10-bit binary hypercube address system.

      Despite what one of the other posters has said about "binary hypercubes" being nonsensical, they are simply a way of describing the nodes of a hypercube with a binary address. Each node in a 1024-node network has a ten-bit address. Each one has to connect only to ten other nodes: the ten nodes which are specified by flipping only ONE bit of that node's address. This provides multiple redundant paths by which a message can be routed in the case of the failure or congestion of a node.

      And by intermittently polling the other nodes to which it connects, it can keep a routing table which optimizes the paths to be used. The major difference is that in Hillis' Thinking Machines the number of nodes was capped; in fact it was a fixed number of nodes. Implementing a dynamic network in a hypercube or hypertoroid topology brings on a new set of problems, such as dynamically re-allocating the hypercube node addresses as users fall off and climb back onto the network. This can be more of a bear than people realize.
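      A minimal Python sketch of the flip-one-bit routing idea described above (an illustration of the general scheme, not the Thinking Machines code): greedily fix one differing address bit per hop, so on a 10-bit hypercube any destination is at most 10 hops away. Fixing the bits in a different order gives the redundant alternative paths mentioned.

      # Greedy hypercube routing: repeatedly flip one address bit on which the
      # current node and the destination still differ. The hop count equals the
      # Hamming distance between the two addresses (at most 10 on a 10-bit cube).
      def route(src, dst):
          path = [src]
          cur = src
          while cur != dst:
              diff = cur ^ dst          # address bits that still differ
              cur ^= diff & -diff       # hop to the neighbour with the lowest such bit flipped
              path.append(cur)
          return path

      print([format(n, "010b") for n in route(0b0000000001, 0b1000000110)])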

      • by MarkusQ ( 450076 ) on Sunday February 17, 2002 @11:06AM (#3021595) Journal
        Implementing a dynamic network in a hypercube or hypertoroid topology brings on a new set of problems, such as dynamically re-allocating the hypercube node addresses as users fall off and climb back onto the network. This can be more of a bear than people realize.

        This shouldn't be too hard (at least it doesn't look too hard sitting here on a Sunday morning, half way through my first cup of liquid brains). The key is to note that we can't (and therefore needn't bother trying to) enforce the topology at all times. Instead, we just want to bias the network towards the desired form. For example:

        New nodes pick a random twenty+ bit ID. This would probably not be enough to prevent collisions (cf. the Birthday Paradox [burtleburtle.net]), but that can be dealt with presently.
        New nodes connect up to whoever they can find. This would be pretty much what happens on most p2p now.
        Once the node has the desired number of connections, it can rank them by comparing the number of bits its ID has in common with the IDs of each peer (the Hamming Distance [vub.ac.be] between the IDs). As further peers are discovered, it can establish connections to any that are Hamming-closer than its worst ranked peer and drop the worst ranked peers from its connections.
        Likewise, if a node detects a pair of peers that are Hamming-closer to each other than either is to it, it can "introduce" them to each other.
        If two peers meet and discover they have the same randomly chosen ID, the younger can flip one bit (for each ID bit, count the number of active connections that differ on that bit; flip the one with the max count) and continue.
        Fat pipe peers can run more than one ID if they wish (cf UltraPeers)

        This should at least be functional; no doubt there are a number of clever hacks that could be made...

        -- MarkusQ
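        A rough Python sketch of the ranking step in the scheme above (one possible reading, not MarkusQ's code; the constants ID_BITS and MAX_PEERS are assumed values): keep the Hamming-closest peers, drop the worst-ranked one when over the limit, and spot pairs of peers worth "introducing" to each other.

        import random

        ID_BITS = 20     # "twenty+ bit ID" (assumed value)
        MAX_PEERS = 20   # desired number of open connections (assumed value)

        def hamming(a, b):
            """Number of bit positions in which two IDs differ."""
            return bin(a ^ b).count("1")

        class Node:
            def __init__(self):
                self.node_id = random.getrandbits(ID_BITS)   # pick a random ID
                self.peers = {}                              # peer_id -> connection info

            def consider_peer(self, peer_id, conn="conn"):
                """Connect to whoever we find, then keep only the Hamming-closest."""
                self.peers[peer_id] = conn
                if len(self.peers) > MAX_PEERS:
                    worst = max(self.peers, key=lambda p: hamming(self.node_id, p))
                    del self.peers[worst]                    # drop the worst-ranked peer

            def introductions(self):
                """Pairs of peers that are Hamming-closer to each other than to us."""
                ids = list(self.peers)
                return [(a, b) for i, a in enumerate(ids) for b in ids[i + 1:]
                        if hamming(a, b) < min(hamming(self.node_id, a),
                                               hamming(self.node_id, b))]

        node = Node()
        for _ in range(50):
            node.consider_peer(random.getrandbits(ID_BITS))
        print(len(node.peers), node.introductions()[:3])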

        • we just want to bias the network towards the desired form. For example...New nodes pick a random twenty+ bit ID...New nodes connect up to whoever they can find.

          Except that you describe routing based on Hamming distance (which won't work because of looping issues) rather than shared prefix/suffix, this sounds a lot like Tapestry [berkeley.edu].

          • MarkusQ: we just want to bias the network towards the desired form. For example...New nodes pick a random twenty+ bit ID...New nodes connect up to whoever they can find.

            Salamander: Except that you describe routing based on Hamming distance (which won't work because of looping issues) rather than shared prefix/suffix, this sounds a lot like Tapestry.

            Actually, I was just using the Hamming distance as a rough gauge of "how far is this node from where it should be in a perfect binary hypercube" (since in a perfect binary hypercube, each node is connected to all the nodes, and only the nodes, at Hamming distance 1 from its ID). So what I described was a way to 1) pick a corner at random, 2) wander towards it, and 3) go to another nearby location if the one you picked turns out to be occupied.

            I'm not familiar with the looping issues you refer to. Can you elucidate? (Note also that this would not be so much a way to "route" as a way to "partially broadcast" (e.g., maybe decrement TTL by two when sending "in the wrong direction", or propagate ox-bow removal info when something is broadcast to you via a sub-optimal path). We wouldn't count on nodes being where they should be, just Hamming-close.)

            -- MarkusQ

            • I'm not familiar with the looping issues you refer to. Can you elucidate? (Note also that this would not be so much a way to "route"

              You're right. I was conflating the issue of how to establish and maintain a topology with that of how to route messages within that topology. I tend to do that, because I find that in real systems the interactions between these two supposedly-separate issues are so strong that they become inseparable.

              Upon reflection, I find it hard to judge how well a system such as yours would work. On the one hand, you might run into a classic hill-climbing problem. Two nodes that "should be" adjacent to each other in a nearly-ideal hypercube might initially be too far separated (in terms of the underlying IP network) to find each other before they each settle on local maxima instead. On the other hand, it might not be a good thing if they did find each other, because you might end up with a really neat hypercube at the upper level where each "adjacency" in the hypercube is really a long multihop route across the continent at the lower level.

              Trying to find a balance between these two extremes might be difficult. In fact, trying to impose a hypercube overlay-network topology on top of an IP network whose physical structure is most definitely not hypercube-like might be a fundamentally doomed idea. I don't mean to say it's not worth it to try. Only detailed simulation or even real-life deployment can truly provide the answers to these sorts of questions. I'm just saying that this particular problem domain tends to be "swampy"; things that appear promising at first run into pitfalls much further down the road, much to everyone's frustration and indignation.

              • Only detailed simulation or even real-life deployment can truly provide the answers to these sorts of questions. I'm just saying that this particular problem domain tends to be "swampy"; things that appear promising at first run into pitfalls much further down the road, much to everyone's frustration and indignation.

                *smile* That's why in real life I tend to bid such jobs by the hour rather than by the project.

                -- MarkusQ

            • If you're trying to minimize Hamming distance, wouldn't it be better to establish the first connection without an ID then ask the peer for its ID and the IDs around it? Then we pick a non-occupied one and use it.

              Then we can just start making connections, getting their IDs, and dropping the furthest out, yes?

              I like the idea of "introduction" via peers - watch addresses that come by, if one is a certain Hamming distance or shorter away from one of our connections, tell that connection. 's that basically how it works?

              Finding Hamming distance sounds like a parity-class problem (XOR and sum, right?) so it shouldn't be bad at all. Though is there an easier way for the sum part of it than "if the last bit is one, increment counter, rotate right, repeat"? This might not be a problem, and that's an O(n) loop (right?) - not bad, but is there an O(1) solution? Maybe we'll just want to check addresses at random rather than every one - save cycles for more useful things. Also, saturated hosts will probably not even bother with comparisons - if all n of our local peers are right, then why bother?

              If we use the introduction method, won't the "core" of the network [those hosts that stay on for a long long time] eventually fall into place as a proper hypercube?

              How do we handle address space conflicts? If, say, the network were to partition itself in two, somehow (links just happened to be dropped in the right way), and then a host comes up that bridges the two segments, what do we do? Worse, what happens if two perfect hypercubes overlap [all n connections in each are only one bitflip away, and all have the same numbers]?

              --Knots
              • I should state up front that my first post on this topic was off the cuff and contemporaneous with my first cup of coffee this morning, and I haven't sat down and put anything on paper. So some or all of this may fall flat on more detailed analysis. But with that caveat:

                If you're trying to minimize Hamming distance, wouldn't it be better to establish the first connection without an ID then ask the peer for its ID and the IDs around it? Then we pick a non-occupied one and use it.

                I suppose I'm assuming that there are a relatively few "doors" into the network, and worried about clustering; also, my gut feeling is that starting at a random point lets you disregard a number of potential problems on statistical grounds.

                Then we can just start making connections, getting their IDs, and dropping the furthest out, yes?

                Hmmm. What you are describing (if you drop my assumption of random starting location) results in what is called flocking. Basically, everyone winds up trying to move to the center of the cloud as they see it. Nice dense connectivity, but (since we aren't charged for hamming-distance) there's no real advantage to it, and it increases the risk of fragmentation.

                I like the idea of "introduction" via peers - watch addresses that come by, if one is a certain Hamming distance or shorter away from one of our connections, tell that connection. 's that basically how it works?

                Yep.

                Finding Hamming distance sounds like a parity-class problem (XOR and sum, right?) so it shouldn't be bad at all. Though is there an easier way for the sum part of it than "if the last bit is one, increment counter, rotate right, repeat"? This might not be a problem, and that's an O(n) loop (right?) - not bad, but is there an O(1) solution?

                It's O(n), where n is the number of bits in the address, which scales as the log of the maximum size of the network; not bad at all. You could reduce it further by using a lookup table (array [0..255] of 0..8 = (0..255).collect{ |i| i.bits_set } or some such), but there isn't really any need; the amount of work is << the amount required just to receive a packet.
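                For what it's worth, the byte-wise table idea might look something like this in Python (a toy sketch of the same trick, not anyone's actual code):

                # Precompute the number of set bits for every byte value, then
                # sum over the bytes of the XOR of the two IDs.
                POPCOUNT = [bin(i).count("1") for i in range(256)]

                def hamming(a, b, id_bytes=8):
                    """Hamming distance between two IDs, one byte at a time."""
                    x = a ^ b
                    return sum(POPCOUNT[(x >> (8 * i)) & 0xFF] for i in range(id_bytes))

                assert hamming(0b1011, 0b0001) == 2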

                Maybe we'll just want to check addresses at random rather than every one - save cycles for more useful things. Also, saturated hosts will probably not even bother with comparisons - if all n of our local peers are right, then why bother?

                Cynical answer: there's no harm (all of this should be very cheap), and we can avoid having the behaviour change after the beast has been running for hours, which would complicate the code and might open the door for mysterious & hard-to-find bugs. ("It ran fine for about a week, then it just went cross-eyed and dumped grape jelly onto the hard drive!") If I'm going to leave something running for a long time, I like (where possible) to have it hitting the same code paths at the end as it was in the beginning.

                How do we handle address space conflicts? If, say, the network were to partition itself in two, somehow (links just happened to be dropped in the right way), and then a host comes up that bridges the two segments, what do we do? Worse, what happens if two perfect hypercubes overlap [all n connections in each are only one bitflip away, and all have the same numbers]?

                This is one of the "you can ignore it on statistical grounds" points. With random starting IDs the odds of this scale towards the odds of all the air being on the other side of the room the next time you inhale. It could happen, but I wouldn't hold my breath.

                -- MarkusQ

        • Or why not go with a 32-dimensional hypersphere? It seems to me we already have that address space handled, routing and all -- IPv4.

          A client can do everything you suggest (to try to shape the network) based on its IP address. This way, the current TCP/IP network routing can be used, which is probably more comprehensive and efficient (in real-world terms) than any "virtual" network would be. Even better... it could be fitted into the current Gnutella protocol by only adding to the connection process.
          • Or why not go with a 32-dimensional hypersphere? It seems to me we already have that address space handled, routing and all -- IPv4.

            Sounds good...I'm reluctant to give up the random address selection but this seems reasonable...hmmm...

            -- MarkusQ

    • I have a question. I am interested in, but know next to nothing about, this stuff. I was wondering if anyone could tell me whether the networks being talked about in these articles are theoretical ideals that are different in form from the networks that are actually in use.

      As far as I understand it, a lot of the models used in scalability analyses of Gnutella seem to assume a homogeneously connected network. Whereas analyses of the Internet show there to be (a) a few highly connected sites (b) a large number of sites that are not well connected at all, and (c) a tendency for new network connections to appear on already well-connected sites, rather than on less-well connected ones.

      I was wondering if there would be a difference in a network's scalability if (a) the distribution of edges between nodes was considered homogeneous, or if (b) the distribution was considered skewed (e.g. a Zipf distribution). Is case (b) more scalable than case (a)?

      Thanks.

  • The complaint seems to be that the "network" is being used inefficiently. But who cares? I pay for my bandwidth, and I'm glad to use it all up. It costs the same for me.

    Sure, as the number of users gets larger, the quality of my searches stops getting better. But I don't care, because the quality of my searches is already good enough.

    This whole thing sounds like sour grapes from people who want to control from a central server.
    • Re:Does it matter? (Score:3, Insightful)

      by GigsVT ( 208848 )
      You don't pay enough to suck down all your bandwidth 24/7. If you were led to believe that was your right when you bought your broadband, then I'm sorry you were misled, but the only people who are sold lines that they are allowed to max out 24/7 are the people that pay for real Internet access, i.e. a T1 from a first tier provider. Otherwise, you have to share.
  • All you P2P advocates (and yes, that includes me) had better quit programming P2P platforms and get on your lawyer wigs, because in less than 24 months, Microsoft's DRM will be COMPULSORY on all new computers, and then some.
    Tell your local member of parliament you think this fucking stinks. Or bye-bye P2P, scalable or not.
    • I'm not one for saying "it'll never happen" because sometimes if you just sit there that's exactly what happens.

      But I'm not convinced of this particular threat.
      It would require worldwide cooperation at every level of computing. It would be difficult to draft an international law AND define what a computer was. Does my digital watch need DRM?

      To get this passed as law, one country (probably the US) is going to have to implement it unilaterally. Chaos will ensue. I think it's just too much hassle for a government to embark on.

      So mark my words, and then punch me with them when you can't play your .ogg files no more.
        It would require worldwide cooperation at every level of computing. It would be difficult to draft an international law AND define what a computer was. Does my digital watch need DRM?

        We, all of us, tend to concentrate more on our own domestic politics than on international political trends, especially when thinking about and discussing things like privacy, encryption, and yes, the digital copy prevention technology euphemistically referred to as digital rights management (DRM).

        But keep in mind that the Hague convention is already passing and allows, indeed requires, any national law regarding copyright (and via the DMCA that includes copy protection, i.e. DRM) to be applied to every signatory country.

        And there are other treaties being railroaded through Geneva by the Copyright Cartels as we discuss this.

        I too once hoped that the DMCA would make the United States so uncompetitive that the problem would be self-solving. This would be true, were it not for the fact that the corporate interests pushing these sorts of things are doing so internationally (both locally in various parliaments, e.g. the UK, the EU as a whole, and Australia, and at the international treaty level).

        In five or ten years, America's unilateral rewrite of copyright law into criminal law would make it uncompetitive in the new, digital economy (with ripple effects into other parts of the economy, most likely), but we are not going to have five or ten years before things like the DMCA become international law and the playing field is levelled once more, at a much lower common denominator.
    • maybe a lot of ppl would switch to linux boxes...at least for the extra computer. i know enough ppl who have one computer dedicated just for their file sharing, it's pretty plausible.
    • All you P2P advocates (and yes, that includes me) had better quit programming P2P platforms and get on your lawyer wigs, because in less than 24 months, Microsoft's DRM will be COMPULSORY on all new computers, and then some. Tell your local member of parliament you think this fucking stinks. Or bye-bye P2P, scalable or not.

      ----BEGIN divergence from topic
      Haha! In the good 'ole US of A we kicked out the tyrants and their tyrannical parliamentary system! The American bicameral system and electoral college provide a much better form of abstraction, indirection, and power balance in order to minimize the influence of special interests and big money. The founding fathers were true visionaries and foresaw the problems the Europeans are having.

      As recent events will show, we Americans could never... oh wait, sorry, I was sleep-typing again. It appears to have been a rather good and almost realistic dream. Such a shame I woke up. Maybe I should get myself some extra strong coffee and pay extra close attention next election instead of sleep-voting again. If nobody votes with a clue, how is Congress supposed to get a clue... This is representative government, remember?

      Seriously, if you start voting with a clue, cluing your neighbors in, and actively helping with campaigns you believe in, then the system will work closer to the way the founding fathers intended. The system was designed for a bunch of riled-up revolutionary citizens, not the bunch of apathetic whiners we've become. (Yes, I have volunteered my time with a political campaign, I have written my congresscritters, and I have submitted an opinion on the MSFT case.) You can't expect the legislature to have a clue unless you personally have done something to clue in the voters. It's that simple. Wasting my time doing something is better than doing nothing.
      ----END divergence from topic

      I wish DRM circumvention wasn't driven by greed in most citizens, but it still represents some sort of civil disobedience, in the same way that visiting speakeasies during Prohibition did. Writing P2P systems and using them shows congresscritters in some small way how we feel about DRM legislation. I just wish it were a more altruistic demonstration.

      Yes, I do realize that the poster's choice of language appears to place him/her on my side of the "big pond".

  • Hypercube? Cayley tree? All this for giving someone else a file? What happened to the good old days, when you either used ftp or walked over and gave the guy a floppy?
    • Re:Good lord!!! (Score:3, Interesting)

      by GigsVT ( 208848 )
      Even though you might have been trying to be a little facetious, I think your point is very valid.

      I can grab files, with no small amount of effort, from an online file sharing service, and maybe get 2 GB a day. If I network, in person, with people who share similar tastes in music, I can get a lot more bang for the buck. As soon as we see larger portable technologies (It's already happening), trading media will be just like trading games in user groups was back in the 80s and early 90s. We all just bring our 300GB portable disks to meetings, link them all up, and take them home to copy onto our TB home systems.

      The only thing that would prevent this from happening is a very rapid growth of broadband, to something like reliable 10Mbit levels. I don't see that happening before hard disk space grows to the sizes I quoted above.
      • I aim to be honest yet sarcastic. Anyway, my friends and I just use CD-RWs to trade big stuff. Otherwise there's nothing I really need that I can't get with my modem and 3 hours between 3-6AM ;-).

        python & forte 3.0 tonite and java 1.4 last nite... not bad for 6.5 kB max bandwidth per sec (a 56kb modem is about a 6.5 kB/s transfer rate... I think they probably just use bits to make it sound faster).
  • by Rhinobird ( 151521 ) on Sunday February 17, 2002 @07:07AM (#3021244) Homepage

    Correct me if I'm wrong here, but the article seems to be saying that packet switching is more efficient than old-style circuit switching (hierarchical switching?). It says that bouncing stuff around nodes connected to a bunch of other nodes, and letting the stuff find a path to its destination, is more efficient and scalable than any kind of tree structure where stuff goes down to the trunk and back up a different branch to reach its destination.

    Unfortunately, with the way things are set up right now, I think our beloved internet is set up like a toroid instead of a cube. You have a backbone as the middle loop, and coming off of it are local rings that provide service to local ISPs, which then sell to their end users. In the end I picture a fuzzy toroid. According to the article, those are more scalable than trees, but not as much as the cube. However, the article says that they are harder to implement than the cubes; not so, as they seem to have evolved naturally in the marketplace, and setting up a cube-like network in the real world is harder.

    But they're talking about this applied to software and virtual networks, not real-world hardware. Still, seeing as how the real world has moved from a tree-based telecom system to the toroid scheme of the current system, it would be interesting to see what happens when the toroidal system in the real world runs into scalability problems and goes for the cube.

    • Well.. to have a huge search network.. this is better.

      As long as it has some sort of geometric shape, we can make searching more efficient.

      Gnutella has no shape; it's a handful of spaghetti thrown on the floor.

      Then there's FastTrack... it's like throwing a handful of SpaghettiOs on top of the first pile of spaghetti.

    • I would think that the toroid shape of the network didn't occur naturally; it occurred financially. It is cheaper and more selfish (i.e., you can make money charging for bandwidth this way).

      And there are many examples of cubic networks in use. Just look at some schemes for Beowulf clusters.
  • p2p evolution (Score:5, Interesting)

    by nabucco ( 24057 ) on Sunday February 17, 2002 @07:13AM (#3021254)
    It should be noted that in its evolution since the Nullsoft Gnutella 0.56 server/client, modern Gnutella servents have reduced traffic and improved network scalability by means of
    - routing pushes instead of broadcasting them
    - caching pings/pongs, and even queryhits
    - use of UltraPeer/leaf relationship, which increases the speed at which traffic is routed

    There are other ideas that Gnutella developers like those at Limewire have been kicking around, which are similar to ideas that publishing networks like Freenet and MojoNation have, such as data specialization (ie. queries are directed to those likely to have the data, not broadcast to the entire world).

    I'm glad whenever mathematicians or people with specialties like traffic analysis examine existing p2p systems, or give their ideas on p2p systems - they might come up with some good ideas or give a good critique that clarifies elements of a p2p network. This paper is certainly less arrogant than ones with names like "Why Gnutella Can't Scale. No Really." A hypernet is an interesting idea, although I can think of a number of reasons why current p2p sharing networks would not implement them. Namely, because authoritarian networks like Napster were shut down by trade associations like the MPAA/RIAA, while more anarchic networks like Gnutella are more immune from such actions, we must consider not only the survival of the scaling network under technical constraints, as Dr. Gunther does, but also its survival under legal constraints orchestrated by large corporations. Then there's the question of how many peers the network is designed for - scalability is just one factor in the reasons why I would use a particular p2p client. Luckily, we will have competition between p2p networks like FastTrack, Gnutella, Freenet and MNet (MojoNation), and perhaps different ones will be used for different purposes, just like Usenet, distributed.net and so forth.
    • Exactly backwards (Score:4, Insightful)

      by MarkusQ ( 450076 ) on Sunday February 17, 2002 @09:55AM (#3021432) Journal
      ...modern Gnutella servents have reduced traffic and improved network scalability by means of...use of UltraPeer/leaf relationship

      A hypernet is an interesting idea, although I can think of a number of reasons why current p2p sharing networks would not implement them. Namely, because authoritarian networks like Napster were shut down by trade associations like the MPAA/RIAA, while more anarchic networks like Gnutella are more immune from such actions, we must consider not only the survival of the scaling network under technical constraints, as Dr. Gunther does, but also its survival under legal constraints orchestrated by large corporations.

      I think your concern here is exactly backwards. Specifically, higher dimensional topology would decrease the need for central "UltraPeer"s (also known as lawyer bait) and thus make the network harder to shut down. If the trend towards depending more on some "peers" than others continues to the natural limit, you wind up right back at Napster (one UberUltraPeer to rule them all, and in the darkness...get eaten by a grue if it's lucky, sued by the RIAA if it's not).

      On the other hand, if the topology is made more scalable, the targets won't be as tempting at any given network size, and the whole thing would be harder to take down by force. If all nodes are equal, cutting one will likely create enough publicity to attract seven more to take its place.

      -- MarkusQ

  • Hypercube (Score:3, Informative)

    by nr ( 27070 ) on Sunday February 17, 2002 @08:28AM (#3021306) Homepage
    Don't know what a hypercube is? Click here [wolfram.com]
  • This paper is a great germ for further study, but until there is some practical experimentation to support it, it's hot air. Good hot air, perhaps, but still vapor. If theoreticians want the support of engineering types, we need to see some experimental tests. I've seen dozens of really good theories on improving peer and grid networks that turn sour as soon as anyone tries to implement them. This paper mentions much about peer networks and nothing about peer review. I'm withholding judgement until I read the peer review.
  • Sounds just like something I heard about once: the speedball. Everyone who does them loves them. The mix of heroin and cocaine is insane: one slows you down, the other picks you up just as you are about to crash. Of course, everyone who does them dies. Tagging something like this onto a broken model is just ludicrous. The point is this: when the current model is stretched to the point of death, adding one more element to work around it isn't going to make it work better; it needs to be taken to rehab and given a clean start, or possibly taken out back and shot, ending the problem entirely.
  • by RobL3 ( 126711 ) on Sunday February 17, 2002 @09:03AM (#3021327)
    A form of this organizational structure is already being implemented by LimeWire. Here is an excerpt from their FAQ:

    we've created a new Gnutella hierarchy, whereby high performance machines become 'Ultrapeers'. These machines accept connections from many LimeWire clients while also connecting to the rest of the Gnutella network. Moreover, the Ultrapeer shields these 'regular' LimeWire clients from the CPU and bandwidth requirements associated with Gnutella, directing traffic to clients in an efficient manner. The reason you see only one connection in your connections tab is because you are a LimeWire client connected to an Ultrapeer. Unfortunately, not all Ultrapeers are as good as others. If you find that you aren't getting many search results with the Ultrapeer you are connected to, simply disconnect and connect. You'll probably connect to a different Ultrapeer, which is more 'connected'. Also, as time goes on and the network grows, you'll receive more results. Moreover, we are currently working hard to ensure that any Ultrapeer you connect to will be well connected - stay tuned to future versions of LimeWire.

    My success with the new structure is mixed. Downloads and searches seem to work almost as well as before, but I'm getting considerably fewer uploads, which must mean that someone, somewhere, is getting screwed. LimeWire itself is not a bad little product; its main claim to fame, of course, is that it runs well on both Mac OS X and Linux.
  • by Anonymous Coward
    I don't have a /. account yet, but I have written a (prototype) P2P agent that uses a 64-dimensional binary space to arrange itself. This allows for non-centralized address resolution in a statistically well-distributed network.

    Every node takes a random 64-bit number as its address (collisions are possible but unlikely with 64 bits) and, once seeded, searches to position itself among its closest peers. The distance to another node is simply the number of differing bits in the address (plus a higher weighting of high bits in case of a tie).
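    Read literally, that distance metric might be sketched in Python roughly as follows (the exact high-bit tie-break weighting here is a guess, not the original code):

    def distance(a, b, bits=64):
        """Primary key: number of differing bits; tie-break: weight high bits more heavily."""
        x = a ^ b
        differing = bin(x).count("1")
        high_weight = sum(1 << i for i in range(bits) if (x >> i) & 1)  # guessed tie-break
        return (differing, high_weight)

    # Sorting candidate peers by this tuple ranks them by Hamming distance first.
    print(sorted([0x0F, 0xF0, 0x03], key=lambda peer: distance(0x00, peer)))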

    Upon this infrastructure, a group mechanism is implemented, where any member of a group stores all directory information for that group, and a list of known hosts for the identified content, as well as the peers for the next group.

    Groups are hierarchically arranged, like a directory. Membership in one group mandates membership in all "higher" groups, up to the "root" group. Therefore, it is possible to navigate through the whole system like through a file directory tree.

    Source code is available for the Macintosh (think like "Hotline" without servers). It still has a minor memory leak, limiting stability to a couple of hours, and several other drawbacks that prevent it from becoming full featured, like not being able to reach behind NAT, or limited protection against malevolent nodes.

    Ultimately, I stopped development because an IP-to-Content relation can be established and therefore the network is attackable-by-content. If anybody wants to pick it up and push the work ahead, the source (PowerPlant Net classes & UI) is up for grabs. Contact me at "komet163@gmx.net".

    Thought this might interest someone in the context of this multidimensional network discussion...
    • The problem with forcing stuff into a pre-defined address space is you'll have peers connected to each other based on a purely arbitrary notion without regard to actual network speeds or conditions. I don't want to find myself connected to some slow dial-up half a world away just because it happens to have a similar address.

      Peers should seek other peers based on quality, and better-connected peers should automatically move to the center of things. FastTrack works quite well, and it only has one layer of organization (regular nodes vs. supernodes). It might be better to use nature as an example and have some simple set of rules peers would follow to naturally find the best spot for whatever they're doing (the old "ant" idea). You could factor in things like bandwidth, how many searches produced good results, how much overhead traffic is being generated, file types shared, etc. Peers would then keep or drop connections based on these factors, the goal being for each to find its ideal spot - not just for connectivity, but for productivity. Peers with similar searches and files could naturally group together, making it easier to find the files you actually care about.

      A simple approach to the IP address issue would be for each peer to generate a GUID and use this like a dynamic DNS name (similar to your 64-bit idea but with less chance of collision). You could then find another peer even when it re-connected under a different IP, and more intelligent routing could be set up.
  • by shimmin ( 469139 ) on Sunday February 17, 2002 @09:24AM (#3021367) Journal
    While the article is interesting in the sense that it shows that efficient p2p network topologies are possible (for suitably small definitions of efficient), actually implementing it on a network of untrusted peers could be problematic.

    This is because it assumes the peers are already arranged in the network in the topology one wants.

    If a central addressing authority exists, it is no problem to simply give new peers addresses, and the addresses of their neighbors, in such a way that the network acquires any topology the authority wants. The authority can even cope with peers leaving the network more or less arbitrarily.

    However, a real question is: how do you get peers to "self-assemble" into the desired topology in such a way that a small population of peers that choose not to play by the generally accepted rules cannot dramatically affect the outcome? In other words, how can peers be persuaded to place themselves on the points of a cubic hyperlattice solely by contacting a few already installed peers, some of which may not be telling the truth?

    • Exactly.

      Moreover, what are we to make of this?

      The dominant constraint for hardware implementations of high-dimensional networks is the cost of the physical wires on the interconnect backplane. Since the hypernets discussed here would be implemented in software, no such constraints would prevent reaching the desired level of scalability.

      I really don't see how you can sweep the actual physical infrastructure under the rug like this. Eventually, virtual hypercubes turn into real packets on a real network - a network that is subject to the very same topological limitations this article discusses. Any wonder that the Tandem Himalaya architecture he mentions was implemented in hardware, rather than as a "virtual" topology implemented on top of a traditional TCP/IP network?

      When it comes to complicated mathematics like this, though, intuition has often led me astray...
        I really don't see how you can sweep the actual physical infrastructure under the rug like this. Eventually, virtual hypercubes turn into real packets on a real network.

        It doesn't. The problem with the tree topology used by Gnutella is that it utilizes the available bandwidth in a fashion that becomes highly inefficient as the number of nodes becomes large. At around 10^6 nodes, it uses 15-20% of the total available capacity, whereas a hypercube topology at 10^6 nodes uses essentially 100% of the available bandwidth.

        So obviously, if your application runs up against the physical capacity of your underlying communications systems, you can't send more data. But via correct choices of virtual network topology, you can ensure that the physical capacity is being used productively.

        • via correct choices of virtual network topology, you can ensure that the physical capacity is being used productively.

          Would you say that the performance of the system would depend on your choice of which virtual nodes are connected to which other virtual nodes? Or can the performance of a virtual hypercube topology be considered completely independently of the underlying physical network?
          • The first one -- the virtual network performance obviously cannot be independent of the physical network, since you clearly can't exceed the physical layer's capacity. However, to first order you don't really need to worry about the underlying communication system, since it won't be any different for any arrangement of virtual nodes.

            The issues examined by this paper revolve around things like, how many copies of a given packet need to be created to deliver it, how many nodes must it traverse, and what subset of nodes and links is carrying a disproportionate amount of traffic. This last issue is why the hypercube comes out on top in this analysis -- traffic is perfectly distributed over all nodes and links.

            What this analysis does not consider, though, are complications such as routing protocol overhead, and the mapping of virtual links onto physical ones, among other things. While you can consider a computer and its phone line as a unit, you really need to think about the fact that, if your network spans two continents, a large number of your virtual links are going to be sharing a single physical connection. But again, to first order you can neglect these effects.
    • While the article is interesting in the sense that it shows that efficient p2p network topologies are possible (for suitably small definitions of efficient), actually implementing it on a network of untrusted peers could be problematic.

      ...to say the least. Besides the obvious algorithmic problems of establishing and/or maintaining such a topology in an environment where nodes enter and leave at such a high rate, there's a serious overhead issue. Any serious discussion of ad-hoc routing protocols (which is what this is) nowadays needs to include an analysis of the number of packets needed by the routing protocol itself, in addition to the efficiency with which "user" packets are routed. A network that always delivers user packets over an optimal path isn't really all that useful if 90% of the network's capacity is consumed by route updates. I was very disappointed to see that this particular paper attempts no such analysis of routing overhead; without it, the paper's conclusions must be regarded as highly suspect.

    • you could say the same for gnutella now. Yet only LOCAL information is needed, nobody needs to see the big picture.

      for example, a node that newly connects to the network could jump from node to node until it finds a suitable place in the network (you should be able to easily work out this method for a binary tree). It could even (for efficiency) ask the network to give it a free spot. Nodes should then jump inwards to keep the network compact.
  • by Anonymous Coward
    I really cannot understand why we keep harping on about gnutella. Gnutella was the experiment, and not the solution to all your file sharing needs. Ever.
    Just get over it. Gnutella came, was good, and went. Now there are better things. Kazaa/FastTrack. Distributed Napster. Etc.
    Stop trying to add new components to your VW Bug when there are Ferraris to be had.

    Hopfrog.
    • Quite to the contrary, gnutella-style networks are quite young and there is much more work to be done. In grad school we did some research with "virtual" or "logical" networks.. where you are building an overlay network without knowledge of the underlying network structure (you're a host, not a router). There are many cool distributed algorithms which would easily outperform gnutella's basic structure.. There is *much* room for improvement! Just adding some level of hierarchy would improve scalability.. Gnutella is a flat network which uses flooding.. anyone in networks should recognize this is a bad approach.

  • Hypercubes have been around for a long time. The first commercial implementation was the Connection Machine CM-2 [base.com], circa 1985, by Thinking Machines Corporation. It's a rather cool box with lots of blinky lights. Its successor, the CM-5, used a fat tree [syr.edu] network, which was a more economical way of achieving scalable bandwidth.

    Thinking Machines Corp went out of business around 1994.

    As far as I was aware, the issue of the scalability of Gnutella is something that came out of a simplistic design; indeed, I cannot really imagine a distributed P2P system of a simpler nature. There is obviously a point to reshuffling its topology (after all, we'll get to use words like hypertori!) but I do not see it immediately. Why not look at something like Freenet [freenetproject.org], which has been
    • designed by people who seem to have a good idea about what they are doing
    • made scalable
    • made efficient
    • made truly anonymous
      • I guess the point of this is to ask ourselves a question: why bother? Let the best programs survive.
  • This sounds like a great idea, but unfortunately, it's dead wrong.

    Moving to a higher order of dimensions of course makes the maximum path length shorter; but it also makes the number of edges per vertex increase. Or, in other words, the number of simultaneous connections needed. Most GNet users have already pushed the number of simultaneous connections up to the maximum they can handle, thanks to bandwidth limitations, and are still experiencing the scaling problem.
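    As a rough sanity check on those numbers (a back-of-the-envelope estimate, not from the article): in a binary d-cube, both the connections per node and the worst-case hop count equal d = log2(N), so populations in the millions land right around the 20 connections the summary mentions.

    import math

    # Degree and diameter of a binary d-cube both equal d = ceil(log2(N)).
    for peers in (10_000, 1_000_000, 4_000_000):
        d = math.ceil(math.log2(peers))
        print(f"{peers:>9} peers -> about {d} connections per node and {d} worst-case hops")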

    I'm convinced the ultimate solution is a hybrid between Napster and Gnutella, with most end users connecting up like Napster clients, and a few volunteering to be index nodes, with a GNet-type organization between them.
    • Bandwidth limitations: artificial, and should never have happened. But now that the content providers are becoming the bandwidth providers, we won't get bandwidth to burn anymore. Though 802.11 networks might give us the capability sometime, since they are constrained by neither profit nor control.

      Hybrid Napster/Gnutellas: now legally impossible in the U.S., and shortly everywhere else in the world. Any volunteer index node owners would be in a world of legal pain, with their homes and future threatened. So sadly, not an option.
      • Point one: Bandwidth limitations aren't entirely artificial, and are undeniably there for people who don't have DSL or cable connections. A network where everyone has a 20-way connection would require something like 100K/s (800kbps) of throughput per user. Multiply that by the number of users suggested, and the nation's current infrastructure would collapse under its weight.

        Point two: Hybrid Napster/Gnutellas would be no more legally impossible than current peer-to-peer systems are now. Every node is involved in relaying search terms around, and is therefore technically involved in contributory copyright infringement. The only reason Napster went down was that its servers were fixed targets. Volunteer-run servers tied together over a Gnutella-like network could spring up and drop out as necessary, and the network would remain up.
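
        Rough arithmetic behind point one, as a Python sketch; the 5 KB/s of relay overhead per connection is an assumption chosen only to reproduce the 100 KB/s figure above:

          connections_per_peer = 20
          kbytes_per_connection = 5                 # assumed relay overhead per connection
          per_peer_kbps = connections_per_peer * kbytes_per_connection * 8   # 800 kbps

          peers = 1_000_000
          aggregate_gbps = peers * per_peer_kbps / 1_000_000
          print(per_peer_kbps, "kbps per peer,", aggregate_gbps, "Gbps aggregate")  # 800, 800.0
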
  • You can do all the computer simulations you want of the Gnutella network, but I know that it just works. I've never had any problems finding music that I want; it just takes a little more time. It's kind of like stamp collecting. Even after Napster fell and there was an upsurge in nodes, I was still able to use it well. Who cares if you are on a subnet of Gnutella if you still find the music you want?
  • by plastik55 ( 218435 ) on Sunday February 17, 2002 @12:46PM (#3021917) Homepage
    You can turn the Gnutella network into some kind of "virtual" hypercube or hypertorus all you like, but that doesn't change the physical reality of the system, viz:

    The Internet has a structure with physical limitations!

    What good does it do if your many multiply redundant connections let messages travel over fewer virtual hops, when every connection going out of your college dorm goes over the same physical wire? The number of connections over which a search request must travel is a liability, not an asset, when many of those connections happen to use the same physical wire. The author of this "paper" has conveniently ignored this fact, and his conclusion (that adding virtual links to your network allows you to manufacture bandwidth out of thin air) follows directly from that omission.

    On a separate topic, the assertion that the virtual connectivity of Gnutella is anything like a Cayley tree is absurd, because a tree implies no closed paths. Consider: in order to discover and connect to a new host on the Gnutella network, you need to catch a search request originating from that host. The fact that you were able to receive that search request in the first place means there was already a path between you and the remote host; therefore you create at least one closed cycle by forming the new connection.
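
    A tiny illustration of that closed-path argument, as a toy Python example using networkx (the node names are invented; this is not Gnutella code):

      import networkx as nx

      # You could only have heard D's search request because a path to D already
      # existed, so connecting to D directly necessarily closes a cycle.
      g = nx.Graph()
      g.add_edges_from([("you", "B"), ("B", "C"), ("C", "D")])   # path the request travelled
      g.add_edge("you", "D")                                     # the new connection
      print(nx.cycle_basis(g))    # one cycle through you, B, C, D (node order may vary)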

    • I wish slashdotters would not respond to a technical article unless they understood what they were talking about.

      The author fully understands the nature of the internet. He is limiting the scope of his study to "theoretical bandwidth limitations" because that provides an upper bound on scalability regardless of the underlying hardware network structure. Yeah, okay, you won't actually hit that upper bound, but that's not the point. You can't possibly *exceed* the upper bound. That's the point. Earlier papers attempted to prove much lower upper bounds, and this is a (quite good) refutation of them.

      Most posters are also confused about the relationship between hardware and software network topologies. Yes, the Internet connection going into your college or whatever has a fixed limited network topology. Yes, that will cause a 20-cube to have less bandwidth utilization than the theoretical limit. But guess what... the current logical tree structure that gnutella uses *also* does not match the underlying hardware structure, and so suffers in the same way.

      By discussing the differences between the Cayley tree and the hypercube, we can do things like quantify the *rate* at which each of them suffers from a bottleneck like this (one concrete measure is sketched near the end of this comment).

      This is how science is done. People who ignore the science and just say "I am going to hack some super l33t heuristics into the network so searches work better" are being fools. Any heuristic you could bolt onto a poorly performing topology is going to work even better on top of a well-performing one.

      You can install Linux on a 486, or you can install it on a 1GHz Athlon. The slashdotters dissing this paper are essentially saying "Well, it's stupid to install Linux on the Athlon, it won't go any faster because, you know, those instruction cycle counts for the Athlon are all theoretical and you will never hit that performance in the real world... there are going to be cache misses and FPU stalls and stuff. So let's stay with the 486! Yeah, it's slow, but if we just optimize our apps, they will run faster!" Well, you can optimize your apps and fool yourself into thinking you are doing a good job. But if you then turn around and run the newly optimized apps on the Athlon (or put the same amount of effort into optimizing them differently for the different CPU architecture) you will grind the 486 into a pile of rubble.
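
      One concrete measure of that rate of suffering is bisection width: the number of links you have to cut to split the network into two halves, which is also the bottleneck all cross-half traffic must squeeze through. The Python comparison below uses standard textbook results, not figures from the paper:

        # A balanced tree can be split in half by cutting a single link near the
        # root, while a d-dimensional hypercube with N = 2**d nodes needs N/2 cuts.
        def bisection_width_tree():
            return 1

        def bisection_width_hypercube(d):
            return 2 ** (d - 1)

        for d in (10, 16, 20):
            print(2 ** d, "nodes:", bisection_width_tree(), "vs", bisection_width_hypercube(d))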

      -N.
    • True. The fact that the physical layer of the network plays a large part in the performance of a P2P net is often neglected. I personally believe that the biggest performance gains over the next couple of months/years will come from designing 'virtual networks' around their physical sublayers rather than from mathematical analysis alone. Both are important, but for now one of them is completely overlooked. Since I'm seriously considering writing my master's thesis about coupling physical with virtual nets in the most efficient manner, I'd like to hear about current research related to per-peer bandwidth and latency issues (being the first to address them sounds fine too, though ;) ). Mathematical proofs are very important when practicing 'good' research, but practical issues always seem to screw up those theoretical proofs, so I believe we should try to address them too.
    • You appear to have failed to grasp the true meaning of this paper. It is not saying "Scrap all that Gnutella is! It's a load of $EXCREMENT!" It is proposing an alternative internetworking topology, grounded in mathematical theory, that offers higher theoretical bandwidth utilisation than is currently enjoyed once the network scales to a large number of nodes.

      To simplify: If the Gnutella network used a different node addressing (and associated routing) scheme, total aggregate bandwidth of the entire network would be increased, even as the network scaled upwards of a million nodes.

      The underlying physical wiring is exactly the same for this method as for the existing Gnutella topology. That's the whole point. By simply changing the way Gnutella nodes connect to one another you can increase the aggregate network efficiency. It isn't manufacturing bandwidth out of thin air, it is making more efficient use of what is already there. Read over sections 4 and 5 again perhaps.

      Finally, I have to admit to being somewhat boggled by your last paragraph. Are you suggesting that in order to become part of a network you must already be part of the network? Various routing protocols have solutions for dynamically inserting a new node into a topology. STP, OSPF and EIGRP have differing methods, not particularly analogous to Gnutella, but useful nonetheless. A 'closed cycle' as you refer to it could also be thought of as a routing loop. It is possible to have multiple paths without having a routing loop; loops are exactly what routing protocols are generally designed to prevent. What you're really saying is that you have to have connectivity into a network in order to become a part of it, which is somewhat obvious. I mean, it does generally work better if you plug it in.
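
      For the curious, the paper's "different node addressing (and associated routing) scheme" can be made concrete with a hypercube-style overlay: give each node a d-bit label and route by fixing one differing bit per hop. A minimal Python sketch follows; it is purely illustrative and omits everything a real overlay needs (join/leave, failures, address assignment):

        def next_hop(current, destination):
            """Flip the lowest-order bit in which the two labels differ (one dimension)."""
            diff = current ^ destination
            if diff == 0:
                return current                  # already at the destination
            return current ^ (diff & -diff)

        def route(src, dst):
            path = [src]
            while path[-1] != dst:
                path.append(next_hop(path[-1], dst))
            return path

        print(route(0b00000, 0b10110))   # [0, 2, 6, 22]: never more than d hops for d-bit labels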

  • by Sanity ( 1431 ) on Sunday February 17, 2002 @12:48PM (#3021925) Homepage Journal
    The effectiveness of a search in Gnutella is directly proportional to the number of nodes who receive the query. This paper really only talks about the order in which those queries would be received by those nodes, which is - of course - largely irrelevant.

    Until Gnutella starts directing searches intelligently, towards nodes that are more likely to have the data being sought (as Freenet does), it will always be an inefficient way to search for data.
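
    A toy Python sketch of that directed-search idea, greedy key-based routing loosely in the spirit of Freenet (this is not Freenet's actual algorithm, and the numeric node ids and data placement below are invented):

      # Instead of flooding, each hop forwards the query to the one neighbour
      # whose identifier is numerically closest to the key.
      def lookup(start, key, neighbours, store, max_hops=20):
          node, visited = start, set()
          for _ in range(max_hops):
              if key in store.get(node, set()):
                  return node                                   # found a node holding the key
              visited.add(node)
              candidates = [n for n in neighbours.get(node, []) if n not in visited]
              if not candidates:
                  return None                                   # dead end
              node = min(candidates, key=lambda n: abs(n - key))
          return None

      neighbours = {1: [4, 9], 4: [1, 7], 7: [4, 9], 9: [1, 7]}
      store = {7: {8}}                          # node 7 holds the data for key 8
      print(lookup(1, 8, neighbours, store))    # 7, reached in two hops without flooding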

  • huh? (Score:3, Interesting)

    by Magila ( 138485 ) on Sunday February 17, 2002 @02:08PM (#3022198) Homepage
    While I admit I have no clue what the article is talking about (hyper what?), I was under the impression that we could already do better than linear scaling with a simple hierarchical P2P network. Just have a dynamic cluster of loosely connected "servers", with each client maintaining a persistent connection to only one of them. With a little intelligent routing of queries between servers (i.e., don't just spit the query out to every other server you know) you get O(1) scaling on the client side and something (probably significantly) better than O(n) on the server side.
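
    A minimal Python sketch of that two-tier idea (the class and method names are invented, and real implementations route queries far more selectively than this single level of forwarding):

      from collections import defaultdict

      # Clients upload their keyword lists to one "server"; queries are answered
      # from the local index and forwarded only to other servers, never to clients.
      class Server:
          def __init__(self):
              self.index = defaultdict(set)    # keyword -> client ids sharing a match
              self.peers = []                  # other servers in the loose cluster

          def register(self, client_id, keywords):
              for kw in keywords:
                  self.index[kw].add(client_id)

          def query(self, keyword, forwarded=False):
              hits = set(self.index.get(keyword, set()))
              if not forwarded:                # one level of forwarding keeps it simple
                  for peer in self.peers:
                      hits |= peer.query(keyword, forwarded=True)
              return hits

      a, b = Server(), Server()
      a.peers, b.peers = [b], [a]
      b.register("client-42", ["zeppelin", "live"])
      print(a.query("zeppelin"))               # {'client-42'}
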
  • by jeske ( 7383 ) on Sunday February 17, 2002 @03:51PM (#3022571) Homepage
    The author is using bandwidth calculations intended for physical network topologies and applying them to virtual networks. This is flawed and renders his analysis invalid.

    In layman's terms: if you are a node in one of his example networks and you're sitting on a DSL connection, does the bandwidth you contribute to the net change depending on whether you have 20 outbound TCP connections or 2? No. It is constant (toy numbers at the end of this comment). The author computes the "bandwidth" of these different network topologies as if you were stringing a separate DSL line to each person you open a TCP connection to.

    Available network bandwidth in a peer network like Gnutella is determined only by the physical interconnects of the nodes (i.e., whether they are on an SBC DSL line or sitting on a North American OC-3).

    The only useful analysis is that which determines the amount of data-transfer required between each node (and all nodes) for common operations when using different topologies. When performing this study, you are looking for the topology which will transfer the smallest amount of data over the smallest number of nodes when performing searches. For a great analysis, see the Gnutella Performance Paper by Ritter [darkridge.com], referenced by the above paper.

    Careful analysis will tell you the same thing that common sense does -- that the best architecture involves centralized dedicated servers (supernodes), located on the machines with the most physical bandwidth available (i.e., exactly what Napster did).

    In order to create an efficient peer network which scales, Gnutella 'merely' needs to 1) order the network by physical topology, 2) identify the nodes with the best combination of physical bandwidth, longevity, and CPU/disk resources, and 3) fully utilize those machines as supernodes.

    Good luck. :)
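
    To restate the fixed-uplink point in toy Python form (the numbers are assumed, purely illustrative): a peer's uplink is a fixed physical resource, so opening more overlay connections only slices the same pipe thinner.

      uplink_kbps = 128                        # assumed home DSL uplink of the era
      for connections in (2, 20):
          per_connection = uplink_kbps / connections
          print(connections, "connections ->", per_connection, "kbps each,",
                uplink_kbps, "kbps total")     # the total never changes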

  • Google or Babelfish don't have a fuzzy-math-to-English convertor tool.
  • Having done P2P for my Honours project (yeesh!), I can tell you right now that Gnutella doesn't scale gracefully. It ends up being a collection of nodes that tend not to be able to see each other due to bandwidth limitations. So while you might see 2000-3000 hosts on a good day, it's going to suck up a whole load of bandwidth.

    FastTrack, on the other hand, has a semi-hierarchical structure, so it uses less bandwidth, with the peers that have the fastest connections doing the routing.

    (Plug: View my Honours dissertation by clicking here [n3.net])

    • I think your research (and dissertation) is a bit out of date. The architecture of LimeWire's "UltraPeers" is very similar to the architecture used on the FastTrack network. In fact, the UltraPeers' use of intelligent query routing (a toy sketch at the end of this comment) makes them almost assuredly more efficient than the similarly designed FastTrack implementation, although it's really difficult to know because FastTrack is not open source and its messages are heavily encrypted.

      All that aside, FastTrack has not changed at all over the past 8 months, whereas Gnutella has made huge strides. What's more, many of the clients (like LimeWire and Gnucleus) are open source, and Gnutella is an open protocol. We need open standards like this to keep the Internet free (as in free speech).
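
      A toy Python sketch of the keyword-routing-table idea, loosely in the spirit of what ultrapeer-style clients do; the hashing scheme and table size here are invented and do not match the real protocol:

        # Each leaf sends its ultrapeer a table of hashed keywords; the ultrapeer
        # forwards a query to a leaf only if every query keyword hits the table.
        TABLE_SIZE = 1 << 16

        def keyword_table(keywords):
            return {hash(kw.lower()) % TABLE_SIZE for kw in keywords}

        def might_match(table, query_keywords):
            return all(hash(kw.lower()) % TABLE_SIZE in table for kw in query_keywords)

        leaf_table = keyword_table(["led", "zeppelin", "live", "1975"])
        print(might_match(leaf_table, ["zeppelin", "live"]))   # True
        print(might_match(leaf_table, ["britney"]))            # False (almost certainly)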

"Protozoa are small, and bacteria are small, but viruses are smaller than the both put together."

Working...