The Internet

New Peer-to-Peer Designs

Posted by michael
from the pie-in-the-sky dept.
We've received a lot of peer-to-peer submissions, including the one that follows and this one. Perhaps people will post links to those systems which they think have a decent chance of solving the known problems of p2p networks? PureFiction writes "Given the recent ruling against Napster and the various happenings at the O'Reilly P2P conference, this is a good time to mention a new type of decentralized searching network that can scale linearly and adapt to network conditions. This network is called the ALPINE Network and might be a future alternative to searching networks like Napster and Gnutella while remaining completely decentralized and responsive. You can find an overview of the system as well as a Frequently Asked Questions document on the site."
This discussion has been archived. No new comments can be posted.

  • We have no control over the ads. O'Reilly's ad probably is timed to coincide with their peer-to-peer conference, and the reason all these companies are unveiling their systems is also due to the conference, so there is some sort of correlation here, but it's not as if we chose to run a particular ad on a particular story.
  • I think you are taking generalizations too far. Each maintained connection uses a measurable amount of bandwidth, say c. If your total capacity is B, then you will be able to maintain roughly B/c connections. Of course, you need to have enough bandwidth left over to actually do useful work, so your actual bandwidth will be decreased by some arbitrary constant k of your choosing: B' = (B - k).

    Now, perhaps a T1 is a wide enough pipe for, say, 100,000 users. Maybe at some point the network will scale beyond this, and you'll need a T3, etc etc. The point is not to search the entire network, but to search a large enough segment of it to find what you are looking for.

    If you are searching for something extremely rare (or nonexistent) and your bandwidth is small with respect to the scope of the network you may be required to cycle your connections many times until you achieve hits. As intended, the network allows you to search at the maximum speed allowed by your bandwidth--but gives you the option of doing a long (but exhaustive) search regardless of whether you have a 14.4 or a gigabit connection.
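
As a rough illustration of the B/c argument above, the sketch below computes how many connections a link could sustain once some bandwidth is held back for useful work. The function name, the 100 bit/s per-connection cost, and the half-pipe reserve are all assumptions chosen for illustration; the thread gives no actual figures for c or k.

```python
def max_connections(total_bps: float, reserve_bps: float, per_conn_bps: float) -> int:
    """Connections sustainable after setting aside reserve_bps for useful work."""
    if per_conn_bps <= 0:
        raise ValueError("per_conn_bps must be positive")
    usable = max(total_bps - reserve_bps, 0.0)  # B' = B - k
    return int(usable // per_conn_bps)          # roughly B' / c


# Illustrative figures only: a 56 kbit/s modem versus a 1.544 Mbit/s T1,
# each keeping half the pipe free, at an assumed 100 bit/s per connection.
for name, link_bps in (("modem", 56_000), ("T1", 1_544_000)):
    print(name, max_connections(link_bps, link_bps / 2, 100))
```

With a different assumed per-connection cost the absolute numbers change, but the ratio between the two links stays the same.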
  • split the search tasks up into hierarchies. you search for N inside a given range. if the result can't get found within that range, you propagate the request up the tree

    Which is exactly how Napster works. And look what happened to the company that hosts the biggest nap network [napster.com].


    All your hallucinogen [pineight.com] are belong to us.
  • by PureFiction (10256) on Thursday February 15, 2001 @08:23AM (#430165)
    We meet again, ;)

    Sending the same data to 10K hosts in separate packets not only doesn't scale, but it's an extremely antisocial abuse of the network

    Funny, I thought web servers acted this way...

    Even at 60 bytes per packet, if you're trying to send to 10000 nodes that's 600K. Then the replies start coming in - in clumps - further clogging that pipe.


    If you find the reply you're looking for, then there is no need to query the remaining peers. Also, you will not clog the incoming pipe. I've covered this quite a bit: you control how many queries you send out and when, and also to which peers they are sent. The adaptive nature of the protocol ensures that successive queries will be more likely to find what they are looking for sooner.

    You would only query 10,000 in a worst case scenario.

    The traffic patterns ALPINE will generate are like nothing so much as a DDoS attack, with the query originator and their ISP as the victims.

    No, each of these 'victims' would only receive a single 60 byte packet. This is the opposite of a DoS attack, as you are sending a large number of packets, but each peer is only receiving one of them.

    Omnifarious, are a little naive, but well-known technology in mesh routing and distributed broadcast can easily enough be applied to create and maintain self-organizing adaptive distributed broadcast trees (phew, that was a mouthful) for this purpose. Read the literature.


    I understand what you're getting at, but you're missing the main purpose of this network. If you need to search a large number of peers for dynamic content in real time, you need to reach all of them to do it. Whether you do this using a tree/routing/forward approach, or a single peer using multiple unicast packets, you have to reach them to do it.

    The design of this network is so that the resources you use are your own and that you can tailor the bandwidth, peers, and effectiveness of the search to your own preferences.

    This is a highly specific network architecture with a very specific purpose using very small packets. This is why alpine can bend the common conceptions about scalability and performance and still remain efficient and scalable.
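
The claim that a search can stop as soon as a reply arrives can be sketched as a batched loop. This is a minimal model, not ALPINE's actual algorithm: `batched_search`, the batch size, and the `has_item` callback are hypothetical, and it assumes the sender waits for each batch's replies before sending the next, which is exactly the point disputed elsewhere in this discussion.

```python
from typing import Callable, Optional, Sequence, Tuple

QUERY_BYTES = 60  # packet size quoted in the thread


def batched_search(
    peers: Sequence[int],
    has_item: Callable[[int], bool],
    batch_size: int,
) -> Tuple[Optional[int], int]:
    """Query peers batch by batch and stop after the first batch with a hit.

    Returns (first peer that answered or None, total query bytes sent).
    """
    sent = 0
    for start in range(0, len(peers), batch_size):
        batch = peers[start:start + batch_size]
        sent += len(batch) * QUERY_BYTES          # one unicast packet per peer
        hits = [p for p in batch if has_item(p)]  # stand-in for collecting replies
        if hits:
            return hits[0], sent
    return None, sent


peers = list(range(10_000))
print(batched_search(peers, lambda p: p == 250, batch_size=100))
print(batched_search(peers, lambda p: False, batch_size=100))
```

The second call is the worst case from the quoted objection: every one of the 10,000 peers is queried and 600,000 bytes go out.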
  • It said on the site that it would be using UDP. I thought UDP was relatively unreliable, providing no error checking and other such services? Would that really make a good p2p network?
  • Hrm, from the FAQ:

    What about actually getting the data? Is it transferred over DTCP as well?

    ALPINE will be heavily dependent on alternate delivery systems to actually *transfer* the data located within the network. The entire ALPINE is primarily used for information location. This is the big hole in most peer based networks, as it is probably one of the more complex tasks. Once a resource has been located, you may use OceanStore, Freenet, Swarmcast, FTP, etc, to actually retrieve the data. Trying to transfer anything of decent size over DTCP would be insane.

    So it sounds like initially they require the user to utilize two different programs to achieve their goal: 1) ALPINE to *find* the data, 2) something else to get it. I think in order for this system to reach widespread use (especially in the Windoze community), these two functions need to be combined into one interface. Is that not what helped Napster proliferate: people who barely know how to turn on the computer can quickly find and download stuff from one program? Perhaps they will incorporate both 'features' into a final product...or did I miss that in the FAQ?

    Secondly, doesn't this facilitate finding an end user's location? After finding the information, now I get to manually enter the IP address into FTP to connect and download. Does this not make it easier for a program to simply track down 'file X', log IP addresses to file and then resolve these IPs and hunt down the users? It would seem that in the early stages of the network's growth, it could be easily quashed by the corporate forces as the number of users would be small and easy to track down/handle. OTOH, if it scales as easily as it says it does into the billions of connections...at that point it might become futile trying to track down and wrangle up everyone. Still, industries could start going after random individuals and will probably enact a new law dispensing severe penalties for those caught (probably from precedents set nowadays in the Napster case).

    This is where having the same program find the files and transfer them could come in handy. Instead of ever presenting the final address, perhaps it could transfer this data amongst the network in an encrypted fashion. Then when the user sees a match has been found for the data/file being searched, he/she tells the program to get it. Keeping the addressing route encrypted within itself should help with the issue of anonymous usage (I think this was mentioned earlier already as well).

    Interesting system concept anyhow (what with the multiplexing schema).

    Not your normal AC.

  • No, you only keep track of this information for the peers you are currently connected to..

    Oh ok. But that means I start all over again with the "adaptive" process each time I 'log on'. Probably ok, since statistically I'd be getting a different group each time. (Did people ever notice how the people in your Napster Hotlist were never on ever again?)

    I don't see your point. Each 10,000 ME's would have their own ISP, and would use their own bandwidth.

    How do the 10,000 others query him without getting to him? They have to get to his computer somehow. Thus...10,000 searches (at a time) going through the 1 client's bandwidth. (replace 10,000 with whatever number we're working with here).

    Yes, I've seen lights on when I was on Napster. But all the searching was one directional--> To Napster's server. You're bypassing that now. So that means more bandwidth coming to me.

    Rader

  • The ISP will have to handle 10,000 user requests of ME. And you can't reiterate the B.S. about throttling search requests

    I don't see your point. Each 10,000 ME's would have their own ISP,

    WTF??? No more than one user per ISP?

    and would use their own bandwidth.

    My God, you are so fucking stupid. Bandwidth is never entirely "your own" - unless we're talking about an isolated home LAN - it is shared with many other users of your ISP (because the ISP has only a limited number of outgoing pipes) and from there on with the other users of the intervening networks. If 1000 people on one ISP start clogging up that ISP's pipes with this crap, and that ISP has a clue, they will kick those people off.

    You obviously have no clue about real world networking problems. You'd even make a bad marketroid. I suggest you become a management consultant.

  • see my post #130. I talk about a distributed index database that allows you to do searches with only one single search query. My example there is simplified but that is what I am talking about: revise the entire searching strategy. No matter how good your protocol is, it still inherits all the bad points about the worst scaling existing distributed protocol: Gnutella. You see, no matter how well you narrow down your query paths, you still are left with exponential growth problems. Don't compare Alpine to Napster, these are very different, Napster is centralized - one query per search. Alpine is distributed - send as many search queries as it takes until you find something. This does not solve the scalability problems, it just postpones the moment at which all networks go down under Gnutella or Alpine pressure.
  • Perhaps it would help if you, oh I dunno, provided a link?!
  • The more and more I read about P2P, the more it starts to sound like an internet within the internet. Can anyone post a model of just how we can ping someone across the net? I'm sure P2P is going to be solved with some form of decentralized hierarchy. The best machine gets the highest energy level. Like valence electrons or something. A fast machine (bandwidth first) would be higher on the hierarchy. Similar to an ISP. Your ISP has an ISP. His ISP has an ISP.... Instead of focusing on a centralized system of an individual's machine, we need to look at it like a bunch of protons/neutrons (fast machines) with a whole bunch of electrons (56k) floating around. Or maybe go subatomic and work from the inside out. In reverse...
  • by smutt (35184) on Thursday February 15, 2001 @08:26AM (#430173)
    I came across this amazing P2P system the other day that completely blew my mind. It scales well and can handle any kind of file type. It has mature clients for all major platforms including Linux/Solaris/IRIX/SCO/AIX/BSD/Windows even Amiga. It's so powerful it even includes a meta-search tool for searching for P2P servers.
    It's called Archie and the meta-search tool is called Veronica. You should try it out; it's amazing.
  • That doesn't make any sense. (so it must be me who isn't making any sense).

    But if a modem user can only have 200 people to be connected to, then only those 200 people can be connected to him. That means even if I had an OC48 running and "could" handle everyone's connection, I'm still off-limits to the modem people that have already "picked" their 200 people.

    If that's not the case, then maybe you're thinking that a modem person is only allowed to search 200 people's files, but can receive searches from EVERYONE. That seems backwards, since the network-hogging activity is the 1,000's of search requests you receive.

    Rader

  • Well said. Here's my pet peeve about the P2P hype - companies that don't really fit the definition of P2P riding the bandwagon in order to get coverage in the industry press, for example distributed computing companies like Popular Power riding the P2P bandwagon [oreilly.com]. What the hell is "P2P" about distributed.net/SETI@home-style distributed computing? Peers never communicate, the only communication is with a central server.

    I think these companies are soon going to experience "Marimba Syndrome" - if you recall, Marimba rode the push wave of '97 until it became painfully apparent that the push emperor had no clothes, then they tried to distance themselves from it as quickly as possible and never really recovered.

  • The problem isn't how many queries you send out. It's how many you get. If people can query everybody in the network, it isn't going to scale. People with high-bandwidth connections will send out tons of queries because they can, and kill the clients on low-bandwidth connections. You can claim that eventually the folks on low-bandwidth connections will stop getting hits, but in the meantime they are DOA. And if new clients are coming up all the time and searching them, they'll probably never have a usable connection.

    Not to mention the problem of having the searcher send out an individual query to every client it wants to search. If I understand this correctly, if I want to search 3,000 hosts I have to send out 3,000 otherwise identical packets. This is not what is known as a scalable protocol. In fact, from a network point of view, it's a worst-case scenario.
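
To make the inbound-load concern concrete, here is a back-of-the-envelope calculation. It assumes the 60-byte query size quoted elsewhere in the thread and ignores replies; the function name and the sample figures (3,000 peers, one query every ten seconds) are illustrative only.

```python
QUERY_BYTES = 60  # query packet size quoted in the thread


def incoming_query_bps(querying_peers: int, seconds_between_queries: float) -> float:
    """Inbound bit rate at one host when every connected peer queries it."""
    if seconds_between_queries <= 0:
        raise ValueError("seconds_between_queries must be positive")
    queries_per_sec = querying_peers / seconds_between_queries
    return queries_per_sec * QUERY_BYTES * 8


load = incoming_query_bps(3_000, 10)
print(f"{load / 1000:.0f} kbit/s inbound, modem saturated: {load > 56_000}")
```

Under these assumptions the inbound query stream alone is 144 kbit/s, well beyond a 56 kbit/s modem, which is why the later replies lean on smaller peer groups and throttling for slow hosts.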
  • by Jim McCoy (3961) on Thursday February 15, 2001 @07:30AM (#430177) Homepage
    While it is probably not very important to the people reading this, there be dragons ahead for this project that I do not think the implementor is aware of. We implemented a system very much like this for Mojo Nation [mojonation.net] to achieve the swarm distribution (parallel downloads) which is one of the key features of our technology. Windows does not like to hold lots of open connections and you quickly eat up local resources and run out of file descriptors. It works like a charm under Linux and other "real" operating systems, but backporting this to make it available to the un-enlightened will be a very, very unpleasant task for whoever tries to actually implement this. jim
  • by deft (253558) on Thursday February 15, 2001 @07:33AM (#430178) Homepage
    and it will do so because this community just will NOT take no for an answer....there's too many bright minds out there. I'm personally interested in the guys over at www.musiccity.com in league with napigator.

    The main problem legally with napster is that there is a central server. That problem is being solved by having multiple and/or moving servers. This makes it much harder to levy a lawsuit against anyone.

    We all know napster works, but it's illegal (or will be soon). Warez is illegal, but it will never go away because you just can't prosecute.

  • by luqin (3559)
    ignore that last post please.

    ---
  • I don't think we're talking about the same thing, but hey, if it isn't a flame, it's still a good topic.

    Let's take your example...of searching only what you need. Or if it's rare, and you need to slowly search everyone. I'm just saying that a T3 guy who can handle everyone is still off-limits to the modem guy that can only handle 300 people, and he's maxed out with the 300 nodes already 'picked'.

    Rader

  • Why is it that hotline never gets mentioned???

    Because Hotline is a pain. Yes, it's simple enough to use and has potential, but the reality is pretty unpleasant to use. You find that someone has something you're interested in. You have to see what particular hoops you have to jump through to get access "...go to this web site and sign up for this spam-bait..." To hell with that. I share over AudioGalaxy [audiogalaxy.com] and Napster [napster.com] and the BearShare [bearshare.com] Gnutella client because I like to share, not to try to make a nickel from people.

    Surely more bytes have been transferred over Hotline servers than ANY other file (not just mp3) sharing peer to peer system!!!

    Probably, but if you're interested in MP3s, and are not looking for movies or warez, everything else is less of a pain.

  • Orrin Hatch has said [inside.com] that he might introduce legislation in response to threats to Napster. He said that 50 million constituents writing to him and his colleagues would compel it. I suggest that we start writing those letters.
  • obviously most of us are smart enough to realize that "p2p" and the "2-way web" are just rephrasing of the ideals and regrouping of the same technology that the internet is based on. for those of us that know what to do already, we just see a regroup and proprietizing of services and a deviation from standards.

    but then again, the web browser (and things spawned from it) is the interface Joe Blow knows well. *sure* he could run Apache, use an FTP client, use Gopher or WAIS, fiddle through IRC and Newsgroups... but all that came and went, arguably, when Netscape made the web browser big.

    my dad, a 48 year old man, doesn't like to juggle a different app for every service. however, my dad could easily be a non-technophile entrepreneur or a small business owner or an engineer of some sort running some sort of over-net collaboration...

    p2p is "amazing" to these people because it funnels all these other "mysterious" services into one window that they're willing to pay attention to.

    p2p == buzzwords. crap. silly. etc. but, then again... lots of ideas are recycled. very few things are "revolutionary" or "insanely great."

  • IMO [NOT A TROLL] half a million p2p projects are out there. Most of them are vaporware or under development.

    However they are going slow as bananas - for obvious reasons. The people making p2p are legit, while most p2p users are not (pirates and the like - come on, how many legit people are on napster?). The people that want to use p2p the most are the pirates - i.e. it's safer... however ironically they are the ones that are willing to put in the least amount of effort...

    Don't get me wrong, I want p2p - I just think that it is funny every time someone starts complaining about how sloooooooowwww it is going when they are not willing to contribute that much.

    What p2p needs is a set of rules/standards - or a company to make and release a freeware version of something and keep it updated. Ordinary programmers meeting over the internet is going to take forever - example:freenet - to produce a working product.

    Yawn... [I prefer IRC or HL anyway]
  • My God, you are so fucking stupid. Bandwidth is never entirely "your own" - unless we're talking about an isolated home LAN - it is shared with many other users of your ISP (because the ISP has only a limited number of outgoing pipes) and from there on with the other users of the intervening networks. If 1000 people on one ISP start clogging up that ISP's pipes with this crap, and that ISP has a clue, they will kick those people off.

    Perhaps you misinterpreted what I meant with that statement.

    Regardless of any peering application or network you use, you will be using bandwidth. If this application is maxed out, you're using *all* of your allocated bandwidth, i.e. your pipe. This happens all the time.

    ISPs continually operate at near peak usage. They don't leave lots of empty bandwidth lying around because someone *might* use it.

    Also, 1000 people on alpine would be no different than 1000 people on napster, 1000 people on freenet, etc. Show me a peering application that does not maximize use of your bandwidth in a large network.

    And finally, you can tune the amount of bandwidth you use. If you want to use half of your DSL line, and leave the other half free for surfing, etc. you can. UDP gives you complete control over when and how large a packet is sent. TCP cannot do this; you can only send a buffer, and it may go out as one packet, maybe five, it may be delayed a fraction of a second, etc.
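
One way to read "tune the amount of bandwidth you use" is a token bucket that gates each datagram before it is sent. The sketch below is an assumption about how such pacing could look, not the DTCP implementation; `SendPacer` and its parameters are hypothetical names, and the clock is injectable so the logic can be exercised without real time passing.

```python
import time
from typing import Callable


class SendPacer:
    """Token bucket: permits sends while keeping the average rate under rate_bps."""

    def __init__(self, rate_bps: float, burst_bytes: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate_bytes = rate_bps / 8.0   # refill rate in bytes per second
        self.capacity = float(burst_bytes)  # largest burst allowed at once
        self.tokens = float(burst_bytes)
        self.clock = clock
        self.last = clock()

    def try_send(self, packet_bytes: int) -> bool:
        """Return True and spend tokens if the packet may go out now."""
        now = self.clock()
        refill = (now - self.last) * self.rate_bytes
        self.tokens = min(self.capacity, self.tokens + refill)
        self.last = now
        if packet_bytes <= self.tokens:
            self.tokens -= packet_bytes
            return True
        return False


# Half of an assumed 128 kbit/s uplink reserved for queries, 60-byte packets.
pacer = SendPacer(rate_bps=64_000, burst_bytes=600)
print(sum(pacer.try_send(60) for _ in range(20)), "of 20 queries sent immediately")
```

A caller would check `try_send(len(packet))` and only then hand the packet to `socket.sendto`; when it returns False the query simply waits for the bucket to refill.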
  • Good idea, and it would work... per se, as the one weak link in your idea is the fact that the "index" or search engine is still centralized (ie Google.) =)

    But you're on the right track. The difficulty is building a client that acts as a server too, while also being able to perform a distributed search of other clients.

    Cheers,
    Chris
  • Oh ok. But that means I start all over again with the "adaptive" process each time I 'log on'.

    You don't have to. Part of DTCP provides persistent connections. You can resume a connection when you log on, even if your IP and port changed in the interim. So, you only need to start the adaptive process whenever you create a new connection.

    Thus...10,000 searches (at a time) going through the 1 client's bandwidth. (replace 10,000 with whatever number we're working with here).

    Yes, you are correct. And that is where slow users would have a smaller peering group that they are connected to, as well as throttling peers who query too aggressively. They can even outright ban peers who are abusing bandwidth.

    They may also use a proxy, which would handle the replies.

    And last, you control how many peers you query and when. If you find what you're looking for after querying 100 peers, then there is no need to query the rest.

    Likewise, if you start getting a large number of responses, you can slow or halt the broadcast of additional queries.
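
The throttling and banning described above could be sketched as a per-peer sliding window. Nothing here comes from the ALPINE code; `QueryGate`, the window length, and the strike count are hypothetical, chosen only to show the shape of the idea.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Set


class QueryGate:
    """Accept at most max_per_window queries per peer per window; ban repeat offenders."""

    def __init__(self, max_per_window: int, window_sec: float, strikes_to_ban: int) -> None:
        self.max_per_window = max_per_window
        self.window_sec = window_sec
        self.strikes_to_ban = strikes_to_ban
        self.recent: Dict[str, Deque[float]] = defaultdict(deque)  # accepted query times
        self.strikes: Dict[str, int] = defaultdict(int)
        self.banned: Set[str] = set()

    def allow(self, peer_id: str, now: float) -> bool:
        if peer_id in self.banned:
            return False
        times = self.recent[peer_id]
        while times and now - times[0] >= self.window_sec:
            times.popleft()                 # forget queries outside the window
        if len(times) < self.max_per_window:
            times.append(now)
            return True
        self.strikes[peer_id] += 1          # over the limit: drop and count a strike
        if self.strikes[peer_id] >= self.strikes_to_ban:
            self.banned.add(peer_id)
        return False


gate = QueryGate(max_per_window=2, window_sec=1.0, strikes_to_ban=2)
print([gate.allow("greedy", t / 10) for t in range(5)], gate.banned)
```

A slow host would presumably pick a small `max_per_window`; a well-connected one could afford to be more generous.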
  • and my grammer, as well as my proof-reading, sucks when i'm in a rush, sorry.
  • Hotline is a little tricky to describe or download for a brief tryout. Briefly, it's a mini-BBS system designed for the Internet. I found out about it from a Salon article [salon.com] a while back, which I recommend (both parts [salon.com], in fact).
  • I think in order for this system to reach widespread use (especially in the Windoze community), these two functions need to be combined into one interface.

    You are correct, and they are combined. Right now a simple TCP transfer ala FTP/HTTP will be used, with additional transfer types provided using pluggable modules.

    Secondly, doesn't this facilitate finding an end user's location? After finding the information, now I get to manually enter the IP address into FTP to connect and download. Does this not make it easier for a program to simply track down 'file X', log IP addresses to file and then resolve these IPs and hunt down the users?

    Only if the reference you provide for the content is on your machine. You may simply provide a Freenet key and the user can then obtain the file anonymously using Freenet. You may provide an FTP location on some offshore server that is outside the bounds of US jurisdiction. It could be anywhere. The majority may be on your machine, but this isn't a requirement.

    Instead of ever presenting the final address, perhaps it could transfer this data amongst the network in an encrypted fashion

    The final address is only used during a reply. Where you actually get the data is another issue. So, for the paranoid, they may always upload their music into Freenet, but locate it using Alpine.

    This would be the best of both worlds for fast searching and anonymous downloading.
  • I think you are taking generalizations too far. Each maintained connection uses a measurable amount of bandwidth, say C.

    This amount of bandwidth is not a constant. Each connection shares the total bandwidth b, and the adaptive nature of the ALPINE protocol, as well as filtering and throttling, ensures the fair use of this limited bandwidth. You can use as much as you want; this is a configurable setting in the DTCP stack.

    If you are searching for something extremely rare (or nonexistent) and your bandwidth is small with respect to the scope of the network you may be required to cycle your connections many times until you achieve hits. As intended, the network allows you to search at the maximum speed allowed by your bandwidth--but gives you the option of doing a long (but exhaustive) search regardless of whether you have a 14.4 or a gigabit connection

    Yes, and this is a drawback, but no different than Napster for example. If you can't find your song in the 3,000 to 10,000 peers on the server you're at, you can keep searching, or try a different server.
  • I really wish the freenet [sourceforge.net] project would get up a little more steam and start creating say a nice Freenet to web interface and start having a community. Uncrackable and indestructible and totally anonymous!
  • by Salamander (33735) <jeff.pl@atyp@us> on Thursday February 15, 2001 @08:33AM (#430193) Homepage Journal
    • Sending the same data to 10K hosts in separate packets not only doesn't scale, but it's an extremely antisocial abuse of the network

    Funny, I thought web servers acted this way...

    Increasingly, in the era of second-gen content distribution networks, they don't. Where they do, they pay dearly for the privilege of sucking up so much bandwidth. I don't think you do yourself any favors by pushing a first-gen "solution" when the second gen is already out there and some people - such as myself - are already working on gen three.

    If you find the reply you're looking for, then there is no need to query the remaining peers

    You won't get the answer until you've already sent queries to the next batch. Net result: not only are you consuming all this bandwidth and creating all this congestion, then you turn around and drop those packets on the floor. That's just adding insult to injury, as far as your upstream is concerned.

    The adaptive nature of the protocol

    Please describe how this adaptation occurs. The details are not on your website, it's a complex problem, and I think you're just handwaving about something you don't understand.

    you are sending a large number of packets, but each peer is only receiving one of them

    But the intervening routers are receiving them - and the replies - in huge clumps. That's just like a DDoS.

  • But if a modem user can only have 200 people to be connected to, then only those 200 people can be connected to him. That means even if I had an OC48 running and "could" handle everyone's connection, I'm still off-limits to the modem people that have already "picked" their 200 people

    That's correct. I haven't gone into connection cycling, but let's say one of those 200 is a lowly rated peer as far as quality is concerned (again, this ties back to the quality metrics).

    This peer would probably decide to bump him off, and give you a chance. If you turned out to be a quality peer, you would migrate towards the top of his query list, and would be less likely to be bumped off in return.

    If you are a rogue/leecher peer then you may end up in a situation where no one wants to allow you a connection, and your T1 goes to waste.
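
The bumping behaviour described above resembles a bounded table that evicts its lowest-rated entry. A minimal sketch follows, assuming a single numeric quality score per peer; the thread does not say how ALPINE computes its quality metrics, so `PeerTable`, `record`, and the probation score are invented for illustration.

```python
from typing import Dict, List, Optional, Tuple


class PeerTable:
    """Fixed-size set of connected peers, ranked by an assumed quality score."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.quality: Dict[str, float] = {}

    def record(self, peer_id: str, delta: float) -> None:
        """Adjust a connected peer's score, e.g. +1 for a useful reply."""
        if peer_id in self.quality:
            self.quality[peer_id] += delta

    def offer(self, newcomer: str, probation_score: float = 0.0) -> Tuple[bool, Optional[str]]:
        """Try to admit newcomer. Returns (admitted, evicted peer or None)."""
        if newcomer in self.quality:
            return True, None
        if len(self.quality) < self.capacity:
            self.quality[newcomer] = probation_score
            return True, None
        worst = min(self.quality, key=self.quality.get)
        if self.quality[worst] < probation_score:
            del self.quality[worst]          # bump the lowest-rated peer
            self.quality[newcomer] = probation_score
            return True, worst
        return False, None                   # nobody rated low enough to bump

    def query_order(self) -> List[str]:
        """Peers sorted best-first, the order in which queries would go out."""
        return sorted(self.quality, key=self.quality.get, reverse=True)


table = PeerTable(capacity=2)
table.offer("modem-friend")
table.offer("leecher")
table.record("modem-friend", 3)
table.record("leecher", -1)
print(table.offer("oc48-host"), table.query_order())
```

In this model the OC48 host gets in only because an existing peer has fallen below the probation score, which matches the "give you a chance" wording but is otherwise a guess at the policy.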
  • If it's a "truly flat network", wouldn't it be more appropriate to call it "The NORDIC Network"? :P
  • ISPs continually operate at near peak usage. They don't leave lots of empty bandwidth lying around because someone *might* use it.

    Yes, they do. Network traffic is notoriously bursty, so to accommodate peak usage a network provider does indeed overprovision so that during non-peak there's a lot of unused bandwidth. There are actually some really neat opportunities there, for a heavy-data-transfer application that's smart enough to use time-shifting and caching to move traffic off-peak.

    1000 people on alpine would be no different than 1000 people on napster, 1000 people on freenet

    Wrong. Protocol affects bandwidth need/usage, and a brain-dead protocol will use more bandwidth than a cleverly-designed one to accomplish the same task. That's what I and apparently several others have been trying to tell you.

    Show me a peering application that does not maximize use of your bandwidth in a large network

    You're missing the point. There's a difference between effective bandwidth and total bandwidth (which includes protocol overhead). Maximizing effective bandwidth is good; maximizing total bandwidth is antisocial, and ultimately reduces the effective bandwidth left over for getting real work done. With your protocol, all of the capacity will be sucked into a black hole doing queries, and actual downloads will slow to a crawl.
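
The distinction between total and effective bandwidth can be illustrated with a little arithmetic. This is one simplified reading of the argument, treating search traffic as overhead that competes with a download on the same link; the function names and the sample figures (a 4 MB file on a 128 kbit/s link) are assumptions.

```python
def effective_bps(total_bps: float, query_fraction: float) -> float:
    """Bandwidth left for transfers when query_fraction of the link carries queries."""
    if not 0.0 <= query_fraction <= 1.0:
        raise ValueError("query_fraction must be between 0 and 1")
    return total_bps * (1.0 - query_fraction)


def download_seconds(file_bytes: int, total_bps: float, query_fraction: float) -> float:
    """Time to fetch file_bytes using only the bandwidth queries leave over."""
    usable = effective_bps(total_bps, query_fraction)
    return float("inf") if usable == 0 else file_bytes * 8 / usable


for fraction in (0.0, 0.5, 0.9):
    secs = download_seconds(4_000_000, 128_000, fraction)
    print(f"queries use {fraction:.0%} of the link: download takes about {secs:.0f} s")
```

The link is equally "maxed out" in every row; what changes is how much of that traffic is the download itself.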

  • The issue isn't whether some parts of the software are performing server functions and some are performing client functions - it's that everybody's a server, and there aren't any centralized resources - it's all decentralized. One big difference between a Gnutella-like P2P and your hypothetical Mozpache is that everything your client-side downloads in Gnutella or Napster is advertised for uploading by your server-side, and the object naming convention is something that supports this. By contrast, Mozpache could symlink the Mozilla cache directory and the apache exportable-files directories together, with a bit of work, but Apache doesn't have a useful way to extract information from fat.db, and the cache directory file naming convention isn't very exportable (random-looking names, but better than taking all the files and naming them "index.html".)
  • I don't see how. If a computer can connect to the internet (and thus other computers) there's really no stopping it.

    Sure, some apps might get stopped, but that just means we move back to something less crude. (until the next version of something nice comes out again)

    Rader

  • by Alien54 (180860) on Thursday February 15, 2001 @07:36AM (#430199) Journal
    From the FAQ -

    What about latency? I don't want to wait 2 minutes for a reply!

    Get a DSL Line! ;) Also, this is assuming you query the *entire* group. Part of the purpose of the ALPINE protocol is to adapt to the responses you receive during queries. The first query you make may take 2.5 seconds. The next you may query the responsive peers first, and you may find what you are looking for in 1 second. The next query may be further refined and your peers are organized so that you find what you're looking for in a fraction of a second.

    You can only do this type of adaptive configuration tailored to *each* peer and their use of the network if you allow them to do the querying themselves, and order the queries themselves. This implies a direct connection to the people they are querying.

    You cannot perform this type of custom adaptive configuration without an extreme amount of overhead in a routed architecture, thus the need for DTCP.

    I do not know about you, but an awful lot of users out there do not have high speed access yet. And I can think of many folks whose first action would be to search everything.

    Remember, half the population is below average.
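
For a sense of scale on the latency question, the arithmetic below estimates how long it takes just to transmit one query to every peer. It assumes 60-byte queries (the size quoted elsewhere in this discussion) and ignores round-trip time and replies; the function name, the 10,000-peer group, and the link speeds are illustrative.

```python
QUERY_BYTES = 60  # query packet size quoted in the thread


def full_sweep_seconds(peers: int, uplink_bps: float, query_share: float = 1.0) -> float:
    """Seconds needed to push one query to every peer over the given uplink.

    query_share is the fraction of the uplink devoted to queries.
    """
    if uplink_bps <= 0 or not 0.0 < query_share <= 1.0:
        raise ValueError("uplink_bps must be positive and query_share in (0, 1]")
    return peers * QUERY_BYTES * 8 / (uplink_bps * query_share)


for name, bps in (("56k modem", 56_000), ("T1", 1_544_000)):
    secs = full_sweep_seconds(10_000, bps, query_share=0.5)
    print(f"{name}: about {secs:.1f} s to query 10,000 peers")
```

On these assumptions a modem user who insists on searching everything spends close to three minutes merely sending the queries, so the concern about dial-up users is not unreasonable.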

  • Wow...PureFiction is doing a heck of a marketing job for this, if nothing else. I can remember at least 4 comments (threshold 2) from that account on the Gnutella will never scale discussion which promoted this system.

    I don't know what the technical merits are, but the marketing is solid! :)

  • by Wraithlyn (133796) on Thursday February 15, 2001 @07:36AM (#430201)
    This looks really cool, however I foresee a lot of problems with users that don't have a direct internet connection. Namely, you cannot transmit a UDP packet to someone behind a proxy/firewall/NAT unless they have sent a packet out to you first. Still, they do mention NAT in the overview, so at least they are thinking of this.
  • If I remember correctly, one of the touted benefits of Gnutella, (but not Napster?) was that the transfer was (or could be) anonymous.

    I'm not sure how important this is, but will the described "flat" structure of this system allow both source and destination to choose anonymity (assuming both ends agree to it) - If so, how can the end requesting anonymity guarantee that they really can't be traced?

    Or perhaps this is just the imaginings of a madman?
    --
  • by dinky (58716) on Thursday February 15, 2001 @07:38AM (#430203)
    From the FAQ: (perhaps you should read it)
    How do you support 100,000 connections? Won't the host system crash long before then?

    These connections are all multiplexed over a single UDP connection. This is one of the functions of DTCP, to provide a multiplexing protocol for use over UDP. The multiplexing value is 4 bytes, which allows for a theoretical maximum of over 4 billion connections.


    This thing uses one single UDP socket so I don't think porting it to Windows would be too hard now would it.
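
    A minimal sketch of how many logical connections could share one UDP socket, as the FAQ excerpt above describes. The wire layout (a 4-byte big-endian connection ID in front of the payload) and the names `pack_datagram`, `unpack_datagram` and `Demultiplexer` are assumptions for illustration, not DTCP's actual format.

```python
import struct

# Assumed layout: one 4-byte unsigned connection ID, network byte order.
HEADER = struct.Struct("!I")

def pack_datagram(conn_id, payload):
    """Prefix a payload with its 4-byte connection ID."""
    return HEADER.pack(conn_id) + payload

def unpack_datagram(datagram):
    """Split a datagram into (connection ID, payload)."""
    (conn_id,) = HEADER.unpack_from(datagram)
    return conn_id, datagram[HEADER.size:]

class Demultiplexer:
    """Routes datagrams read from one shared socket to per-connection inboxes."""

    def __init__(self):
        self.inboxes = {}  # connection ID -> list of payloads

    def open(self, conn_id):
        self.inboxes[conn_id] = []

    def deliver(self, datagram):
        """Return True if the datagram belonged to a known connection."""
        conn_id, payload = unpack_datagram(datagram)
        inbox = self.inboxes.get(conn_id)
        if inbox is None:
            return False  # unknown connection: drop the datagram
        inbox.append(payload)
        return True
```

    With a 4-byte ID the per-connection state is just a dictionary entry, which is why the limit is memory rather than socket count.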
  • You're missing the point. There's a difference between effective bandwidth and total bandwidth (which includes protocol overhead). Maximizing effective bandwidth is good; maximizing total bandwidth is antisocial, and ultimately reduces the effective bandwidth left over for getting real work done. With your protocol, all of the capacity will be sucked into a black hole doing queries, and actual downloads will slow to a crawl.

    No, queries only use half the bandwidth on a pipe at most, and can be configured to use less.

    And I don't see what your implied difference is between effective bandwidth and total bandwidth.

    I can log activity on my internet interface for napster, for freenet, for gnutella, and they all max out my available bandwidth.

    I guess I don't understand what part of this you're implying is different? Bandwidth in the overall internet with regards to these services? Bandwidth through my ISP?
  • I think I have seen this before, PureFiction isn't hypemaster extraordinaire Zanshin is he?
  • And they're on high-speed T3 or OCx connections to the Internet, connections that are designed to handle such a load.

    Yes, that was a poor response on my part...

    What if your query isn't an exact match to one file? For instance, I'm looking for "songs by The Offspring, in .ogg or .mp3 format, at bitrate >= 160 kilobit/s,"

    You can query as long as you want, however, if you received 100 replies you could automatically halt querying to see if they would suffice. If you want more, you can continue. This is not an atomic operation.

    Also, the adaptive features would increase the likelihood that you would find those hundred hits faster with each successive query.

    It's up to you how long, how far, and how fast you want to query.
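
    The "halt once you have enough replies" idea can be sketched as a simple loop. This is a simplified, synchronous illustration: `ask` is a hypothetical callable returning a list of hits for one peer, and a real client would have packets in flight rather than waiting on each peer in turn.

```python
def iterative_query(peers, ask, enough=100):
    """Query peers in order, stopping once `enough` hits have arrived.

    Returns (hits, number of peers actually queried).
    """
    hits = []
    asked = 0
    for peer in peers:
        if len(hits) >= enough:
            break  # the search is not atomic: stop whenever we are satisfied
        asked += 1
        hits.extend(ask(peer))
    return hits, asked
```

    The caller chooses `enough`, so the same loop covers both "first match wins" and "give me everything" searches.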

    From every single user who's searching. Say a user searches a 20,000 user network once every 10 minutes (this takes into account inactive users). You'll have to handle (on the average) 2,000 queries a minute, over 30 a second. That's not even counting peak use. Can your hardware and network connection keep up?

    On a DSL line you could handle 40-60 queries per second, although you would likely have a smaller network than this, or at least throttle or exclude the really noisy peers.

    A DSL line can handle over a thousand a second. This shouldn't be a problem, and again, this is all configurable and adaptable. You get to make the rules.

    But whenever I think of the obvious solution to this problem (proxies that cache search requests for a group of users), I realize that such a topology would be equivalent to that of the existing OpenNap network

    True, and there are specific instances where a multitude of other systems would be more efficient and responsive than the ALPINE network.

    This is not intended to be the be-all-end-all of peer searching. It is simply a usable completely decentralized searching network, and in some instances, this may be a very nice solution.

    I can think of quite a few better ways to get certain things or locate certain things,

    Google, AltaVista, etc,
    OpenNap, Napster,

    etc. etc... but none of these are completely decentralized searching networks. That is where ALPINE is intended to function.
  • > No, each of these 'victims' would only receive a single 60 byte packet. This is the opposite of a DoS attack, as you are sending a large number of packets, but each peer is only receiving one of them.

    Well, *if* I am the only one to query. In the general case, they would receive 60-byte packets for every query done in the network.

    This is the major flaw of all gnutella-like systems. If only the client knows what is on its disks, then you can kiss scalability goodbye, no matter how hard you try.

    Cheers,

    --fred
  • I'll let everybody else hash out _how_ to get data from point to point and offer searching capabilities. What I'm concerned with in the "P2P" business is that everybody wants to eradicate record labels when it comes to MP3 transferring, right? So how do we do that?

    My take on it (and I'm sure there are others that agree.. I spent a few nights up 'till 2am reading thesis papers on this stuff) is how you CATALOG the data. Filename keyword searches suck -- really.

    I log onto Napster or another P2P network and I already know what I want because the radio has crammed some stuff my way I like. What I would like to see is mathematical representations of music "feeling" really. If I like a song I want to be able to grab its feeling signature, plop that into a search engine and say I want songs that deviate 5% or so from that. It's outside the realm of P2P technology... but it relates to it. I don't want to sift through artist after artist to find something I "jive" with.

    This really applies to all media... take background pictures for my desktop. I like certain things, generally blue/green stuff (it's a personality thing). Discovering ways to mathematically represent this "mood" of something I think is totally essential for P2P to take off... and if it does take off people now have a way of finding stuff they like in a concrete mathematical way.

    It's far fetched... but it's very exciting to me. I've considered going back to college simply to expound my mathematical knowledge (instead of computer science) to get this kind of handle on things. It's "uber cool" in my mind... anybody else think so? :)

    Justin Buist
  • > Remember, half the population is below average.

    Below *median*

    Cheers,

    --fred
  • You won't get the answer until you've already sent queries to the next batch. Net result: not only are you consuming all this bandwidth and creating all this congestion, then you turn around and drop those packets on the floor. That's just adding insult to injury, as far as your upstream is concerned.


    No, there is no batch. The query process is iterative, and can be halted or slowed at any point in time. While there may be a dozen to a few hundred packets in transit before you start receiving replies, you can slow or stop the process once you see that you have enough replies, or that you have found what you're looking for, or just want to cancel.

    Please describe how this adaptation occurs. The details are not on your website, it's a complex problem, and I think you're just handwaving about something you don't understand.


    Sure, there are various criteria that indicate a bad or good peer. These include, among other things:

    - Did the peer respond to your query?
    - Did the peer misrepresent the response?
    - Is the file or resource valid?
    - Is the peer sending you too many queries?

    Etc. These properties, and others, control where in the list of peers to search an individual peer is located. A high quality peer, who often responds and has quality files, will be queried long before a peer that never responds will.

    For negative behavior there are even ban lists and so forth to prevent them from bothering you further.
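
    One way the criteria above could feed into a query order is sketched here. The fields, the weights and the `noise_limit` threshold are invented for illustration; nothing in the thread specifies how ALPINE actually scores peers.

```python
from dataclasses import dataclass

@dataclass
class PeerStats:
    address: str
    responses: int = 0         # queries this peer answered
    bad_responses: int = 0     # misrepresented or invalid results
    queries_received: int = 0  # how many queries this peer has sent us
    banned: bool = False

def score(peer, noise_limit=1000):
    """Higher is better. The weights are arbitrary, for illustration only."""
    value = peer.responses - 5 * peer.bad_responses
    if peer.queries_received > noise_limit:
        value -= 10  # penalise peers that flood us with queries
    return value

def query_order(peers):
    """Return the non-banned peers, best score first."""
    return sorted((p for p in peers if not p.banned), key=score, reverse=True)
```

    Each query then walks `query_order(...)` from the front, so responsive peers are asked first and banned peers are never asked at all.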

    But the intervening routers are receiving them - and the replies - in huge clumps. That's just like a DDoS.

    Only your initial upstream router is receiving them, and from there the packets fan out to their respective destinations. And any ISP that cannot handle the bandwidth generated by a customer has much more major problems.
  • My money has always been on Freenet, ever since it was made public. I will start making heavier use of it, when searchability is added. I am currently running a node, but I haven't inserted anything yet.
  • Talk about target marketing!! My rights have been violated by /.!

    Rader

  • if you received 100 replies you could automatically halt querying to see if they would suffice.

    Sorry, I realized that as soon as I hit submit.

    A DSL line can handle over a thousand a second. This shouldn't be a problem, and again, this is all configurable and adaptable. You get to make the rules.

    But a good dial-up connection runs at only 50 kilobit/s. Are you saying that users who want to use ALPINE should pack up and move to an area where DSL is available?

    This is not intended to be the be-all-end-all of peer searching. It is simply a usable completely decentralized searching network ... I can think of quite a few better ways to get certain things

    Not to put down the ALPINE system, but I'm beginning to think a completely decentralized approach just won't work for dial-up users.


    All your hallucinogen [pineight.com] are belong to us.
  • by Rader (40041)
    Makes sense to me. I'm going to write a book labeled "Whores through time" about an organization of hot babes that travel through time, and the minutes before a violent act, they pop in and seduce the perp.

    No way they'd resist...Kill someone, or have hot sex...hmmm..

    Problem solved. Till 10 minutes later .

    Rader

  • Check out Espra [espra.net].

    We released beta 1 at the p2p conference in SF today and things are moving at the speed of light in development. ESP Worldwide is funding the development and the program is free and open source (don't ask how we make money, it's too complex ;) ).

  • Below *median*

    Well, true, although with a genuine bell curve in a very large normal population, the results are identical.

    (I gotta fix this caffeine deficiency problem I have from time to time.)

  • Just create a well-known URL that runs a CGI to list the contents of your HD.
    --
  • Unfortunately, peer to peer networks that have the ability to allow persons to trade copyrighted material without compensating the owner of the work should be banned...according to this article [mp3newswire.net] about the European Parliament.

    Freedom of depress. The Linux Pimp [thelinuxpimp.com]

  • If people can query everybody in the network, it isn't going to scale.

    They cannot query everybody on the network. They can only query everyone they are connected to. So, modem users would obviously have a smaller connection pool compared to a DSL user.

    If a peer they are connected to is causing too much load, they have them slow down, or they drop them entirely.

    Someone on a T1 connection may indeed be able to connect to just about everyone, but they would also have the bandwidth and memory to do so.

    Not to mention the problem of having the searcher send out an individual query to every client it want to search. If I understand this correctly, if I want to search 3,000 hosts I have to send out 3,000 otherwise identical packets. This is not what is known as a scalable protocol. In fact, from a network point of view, it's a worst-case scenario.

    Worst case scenario is a forwarded broadcast. And at any rate, 3,000 queries to find what you're looking for is indeed a worst case search.

    Part of the ALPINE protocol is the adaptive configuration of the query list so that quality peers are queried first, thus greatly increasing the chances that you don't need to query more than a few hundred to find what you're looking for.
  • But a good dial-up connection runs at only 50 kilobit/s. Are you saying that users who want to use ALPINE should pack up and move to an area where DSL is available?


    They would most likely try to locate a proxy peer. They may have to gain favor by providing quality content or resources, or simply ask nicely to get a proxy connection, but they are not left out in the cold.

    Also, the network is still usable for modem users. It may take a few minutes to locate something, assuming you have to search everyone in your peer group, but even that may be acceptable to most users.

    The adaptive configuration of searching should reduce this as much as possible, so that you may only query a few hundred peers before you locate what you're looking for.
  • Napster wasn't just a way to get illegal music, it was a highly available, high capacity filesystem. Commercial products like that usually cost a bundle and require a team of specialists to configure and maintain. I think we're gonna see applications stop using databases and start using big p2p networks.
  • If I'm following this, then the user who can handle 200 connections only performs his search on the 200 people they are directly connected to. Queries are not forwarded from client to client like in Gnutella, correct? So doesn't this just mean that instead of a single global search space, the users are divided up into lots of little islands depending on the quality of their link? While this approach might very well lead to a practical application, it doesn't seem to be addressing the scalability problems encountered by Gnutella. Rather, it seems that it's just a method for making sure that the users get partitioned into lots of separate networks whose size is small enough that the scaling problems don't crop up.

    I think I know what your answer is going to be from later posts, so let me guess. The answer is that clients will do the search, and if it fails then they will begin dropping the "lame" connections and opening new ones. The problem with this approach is that it will probably generate the same scaling issues that Gnutella faces. The overhead of the clients constantly closing and opening connections might begin to consume significant amounts of bandwidth. Clients who are "full" can still have their bandwidth eaten up by other clients asking them to open new connections. Not to mention the problem of discovering clients to connect to in the first place. The addresses of clients need to be flooded through the network somehow. Nobody is going to type 200 IP addresses into their config! How well is this part of the protocol going to scale?
  • It's hard to say how big of a pipe you would need to run a server. It really depends on a lot of factors if you want to get any kind of estimate. How many servers are in the network? How many clients? How many server-server links? How many clients on the server? Are the servers caching responses (essentially trading local disk and memory usage for network usage)? How efficient is the server-server network? Do servers forward queries they can answer on to other servers? If no, you may miss relevant hits. If yes, the queries consume a lot more bandwidth. There are a lot of design choices to be made, and I don't know what the right answers are (if any answers can even be said to be "right").

    That said, there are a lot of college students with T3 quality connections. A lot of cable modems might be suitable also.

    The hard part is designing the server-server protocol so that you don't waste any bandwidth with redundant copies of the same query. When a server sends out a query, it should get sent to every other server exactly once. This is a tricky problem. Fortunately, there is a solution. I hesitate to mention it for fear that some coder who isn't half as "l33t" as he thinks he is will screw things up.

    The solution is: multicast. Most major universities are connected to I2 which provides native multicast. These same connections also tend to be very fast. This solution pushes all the complexity of getting the queries broadcast out to all the servers onto the network where it belongs. The problem that needs to be solved is a multicast problem. I.e., a single host sends a single packet to a subset of hosts on the network which want to receive that packet. All the servers need to do is join a given multicast group address. When they want to issue a query, they put it into a UDP packet and send it to the group address. The network automagically sends a copy to all the other servers listening to that group address. Any server who has an answer replies back to the issuing server directly. This functionality could probably be dropped into OpenNAP with about 100 lines of code.
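
    The join-and-send pattern described above can be sketched with the standard `socket` module. The group address, port and TTL are placeholders chosen for illustration, and this is not OpenNAP code; it only shows the two socket operations the paragraph relies on.

```python
import socket
import struct

GROUP = "239.255.42.99"  # hypothetical administratively scoped group address
PORT = 6699              # hypothetical port

def membership_request(group):
    """Build the ip_mreq structure: group address plus 'any' local interface."""
    return struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton("0.0.0.0"))

def open_listener(group=GROUP, port=PORT):
    """Join the group so the network delivers other servers' queries to us."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    membership_request(group))
    return sock

def send_query(text, group=GROUP, port=PORT, ttl=16):
    """Send one datagram to the group; the routers fan it out to all members."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, ttl)
    sock.sendto(text.encode("utf-8"), (group, port))
    sock.close()
```

    A listening server would call `recvfrom` on the socket from `open_listener` and reply by unicast to the source address it receives, which matches the "replies back to the issuing server directly" step.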

    Right now, most folks on the Internet don't have multicast connectivity. But, if you have a tiered network, only the servers need to be on the fast multicast connections. The clients just need to connect to one of the available servers via unicast. It's so simple I'm really amazed no one has done it already. Of course, now the scalability issue has been pushed into the network's multicast implementation. I'm guessing that you could build a network with a dozen servers without I2 even noticing. They probably wouldn't complain if you cranked it up to a hundred servers. When your network reached a thousand servers, you will start to get hate mail from the network administrators. But, would there be any need for a thousand servers? Probably not until you had over a million users. BTW, I'm guesstimating the number of servers you can have based on the experience of the Ramen worm, which had a bug that created tables in the routers similar to a multi-thousand server network.
  • Someone earlier mentioned that the RIAA/MPAA would close down all the P2P programs next. Well there's lots of technical arguments against that, I figure the only method left for the RIAA to take it to the next step is to start setting up sting operations.

    They pose as undercover traders (heheh) and they trade with you. Under Policy Act 5.4.11.c they log your illegal activity, turn it in to a judge and then prosecute you at the $25,000 - $100,000 fine for each copyright violation.

    However, I think the mild trading done over the internet would be small fries compared to the assholes like me who trade 100's of albums at a time through the mail.

    Rader

  • Wawawawaawawawaweeeeeeeeeeee!

    BOOM!

    Inappropriate!

    ------

  • ...and thus resistance to attack. There are two kinds of centralization here: legal centralization and network centralization. Napster has a single point of "legal failure"--you only have to sue one entity to bring it down. It also has relatively few network points of failure. The opposite is Gnutella, which is as unusable as it is unsueable. We've already demonstrated that a network as centralized as Napster won't work for legal reasons. However, there are existing networks out there that have enough decentralization to be highly resistant against lawyers, but are centralized enough to take the bandwidth burden away from the ordinary user.

    There are already two systems in place that would work. The first is the oldest form of P2P on the net: IRC. IRC fileswapping channels have been around for a while; the problem is that they don't have the "critical mass" of users to make them really useful. Someone, however, should write a script that reflects searches from one channel to the other. For instance, if someone sends out a search on channel 1, the bot will send the same search to channel 2. If it is sent any results, it will echo them to the original searcher. This doesn't put any bandwidth burden on the original searcher, but extends his search radius considerably, especially if the reflectors are configured to examine multiple servers (for instance, #mp3 on DALnet reflecting to #mp3 on EFnet. I don't know how you'd write it as a mIRC script, but it could be done in C...). The bots would probably be configured to keep a record of all the fileswapping channels that they have heard of. (Reflector bots should mention all the channels that they are reflecting to/for every 10 minutes or so). When the bot logs on, it should join each of its list of channels, and then determine which channel links are already being maintained. This is easy to do: send a query and see if someone reflects that query to the other channels. If any channels do not already have a reflector linking them, the bot starts bouncing queries between them. It then leaves all channels that it's not serving a function in, to cut down on its bandwidth use. Anytime a reflector hears another reflector give its periodic status report, including a line like "I am a member of channels #foo, #bar, #fnord, etc...", it should add any new channels to its list.
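
    The core of the reflector idea can be sketched as plain logic, with no IRC library involved. The class and method names are hypothetical, and the sketch leaves out the step where a bot probes for channel pairs that already have a reflector.

```python
class Reflector:
    """Bounces search queries between channels and relays hits back."""

    def __init__(self, channels):
        self.channels = set(channels)  # channels this bot currently links
        self.pending = {}              # query text -> (origin channel, nick)

    def on_query(self, channel, nick, text):
        """Return (channel, message) pairs to send for an incoming search."""
        self.pending[text] = (channel, nick)
        return [(other, text) for other in sorted(self.channels) if other != channel]

    def on_result(self, text, result):
        """Return (origin channel, nick, result) for the original searcher."""
        origin = self.pending.get(text)
        if origin is None:
            return None  # we never reflected this query
        channel, nick = origin
        return (channel, nick, result)

    def on_status_report(self, reported_channels):
        """Learn new channels from another reflector's periodic report."""
        new = set(reported_channels) - self.channels
        self.channels |= new
        return new
```

    A real bot would wire these three handlers to channel messages, private messages and the periodic status lines respectively.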

    A system like this is halfway between Gnutella and Napster in terms of bandwidth use. (The regular users in the channels who aren't running servers could safely squelch reflectors to cut down on bandwidth.) It has no single legal point of failure: the IRC network is protected as a common carrier with a significant non-infringing use, and there are too many people running servers and reflectors to sue them.

    note: I'm merely talking about technology, not endorsing or condemning its use for any specific purpose. -Entropius
  • From what I can tell, if i root a few boxes on nice fast t3 lines then i can send out a query to all hosts on the alpine network searching for all songs with a "3" in the filename. That should return every mp3 on the planet :)

    Now since the query is udp based I can spoof the return address to www.ebay.com. Ooops :) For every tiny search packet I send out, hundreds of results get fired at ebay from a whole pile of hosts that would normally use their site for legitimate reasons.

    Also you can't throttle your bandwidth on ALPINE. Certainly you can send out fewer searches, but if you have 100,000 users online then there will be about 5000 searches a minute, and every 33.6 user will have to download all those queries.

    Mathematically it doesn't work out quite as badly as gnutella, but it still sucks :)
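
    The modem claim above can be checked with a few lines of arithmetic. The 60-byte query size is borrowed from other comments in this thread, and the calculation assumes, as this comment does, that every query reaches the modem user.

```python
def query_load_bits_per_second(queries_per_minute, bytes_per_query):
    """Inbound query traffic in bits per second."""
    return queries_per_minute * bytes_per_query * 8 / 60

load = query_load_bits_per_second(5000, 60)
modem = 33_600  # bits per second for a 33.6k modem

print(load)          # 40000.0
print(load > modem)  # True: the queries alone exceed the modem's capacity
```

    Under those assumptions the load is 40 kbit/s, so whether it is a problem depends entirely on how many peers a modem user actually accepts connections from.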
  • From what I can tell, if i root a few boxes on nice fast t3 lines then i can send out a query to all hosts on the alpine network searching for all songs with a "3" in the filename. That should return every mp3 on the planet :)


    Yes, you could, but only if they allowed the connection.

    Second, they would not send an entire list of MP3s. They would send back a single packet that contains the number of hits found. like, 1,234.

    To get the list of MP3s you need to do some more work.

    Now since the query is udp based I can spoof the return address to www.ebay.com.

    No, the return is sent to the originating connection. That is you. The handshake protocol for establishing a connection makes this as resistant to spoofing as TCP. (Which isn't perfect, but at least it's better than nothing)

    Also you can't throttle your bandwidth on ALPINE. Certainly you can send out fewer searches, but if you have 100,000 users online then there will be about 5000 searches a minute, and every 33.6 user will have to download all those queries.

    Again, a modem user will have fewer connections. And the response to broadcast queries is only a single packet with the number of hits found, if anything is sent at all.

    The combined configuration of how many searches you perform, and how many connections you have active control how much bandwidth you use.
  • The query process is iterative, and can be halted, slowed, at any point in time.

    The more you throttle it down, the longer it takes to get past the overwhelming majority of negative response to the few positive ones, so you can have slow response because you didn't throttle your traffic or slow response because you did. Yippee.

    Sure, there are various criteria that indicate a bad or good peer. These include, among other things:

    Is a framework for collecting, collating, and using this information already thought out, or did you make up this list only in response to my query?

    Only your initial upstream router is receiving them

    You need to look further than that. If you have a Napster-like number of users there will be thousands of routers out there connected to thousands of ALPINE users each generating queries. When you multiply things out to get total traffic, as was just done for Gnutella, you do get a level of traffic that will make the router owners sit up and take notice.

  • Why don't we write a P2P program that just piggybacks on the power of IRC servers? This is a protocol that can't be shut down, and has decent scaling properties.

    A front end could be written so that no one even has to know that the info is being sent through IRC on the back end.

    Rader

  • If you'd read a little further, you would have noticed that there's an answer to this question on the site. The main reason appears to be that maintaining TCP connections is too resource intensive for the job (they want something like 50000 or even more connections). On top of that TCP is a bit overkill for the purpose. The connections don't have to be reliable and timeouts are no real problem.

    In fact it is really a smart way of reusing the way IP provides peer to peer connections. This DTCP thing sounds like it just might work (unlike gnutella for instance).
  • Because then you'd have iMesh.
    A listing of what is online and offline. Mostly offline due to statistics.

    How would you know what was available NOW? Not only that, but posting UP info isn't going to happen. Look at all the leeches that were on Napster. Not only that, but lets say people did try...they're still not going to reasonably post UP their changes all the time. Maybe we could automate it. But now you're never going to find Free anon web pages that can handle that.

    Rader

  • by OlympicSponsor (236309) on Thursday February 15, 2001 @07:45AM (#430234)
    Isn't "p2p" the same as "client/server" in the special case where client==server? So, for instance, HTTP is P2P if I'm running netscape and apache and so are you and we connect to each other? Or does it only count as P2P if it's a single piece of software? If so, then I'm announcing Mozpache, a web browser AND server.

    "But how do you search," I hear you cry. How do you search NOW? Google, right? Same deal here, just use DynDNS (or whatever) to get the link to stay stable.

    "P2P," sheesh--it's amazing what some people think is amazing.
    --
  • by roman_mir (125474) on Thursday February 15, 2001 @07:46AM (#430235) Homepage Journal
    This system does not fully eliminate the Gnutella problem of having too many search queries on the network. With Gnutella your queries will be propagated from your node to all the nodes you are connected to and then to all the nodes that your neighbours are connected to, which creates search clashes (same node searched gets the same query from neighbours over and over.) With Alpine the overlap is eliminated but the point is, you still will have to search every node every time you want to find something. I do not see Alpine as a huge step forward in terms of scalability, what they achieved is basically elimination of repeated search queries but not the real problem - sending as many queries as there are users. I am not sure whether they will eliminate Ping Pong, I don't think so.

    It is necessary to revise the entire searching strategy, not simply linearly reduce the number of queries.
  • I know Napster this and Napster that, but we are talking about something that is much bigger. P2P sharing will always be around. Before Napster there were DCC bots on IRC and ratio FTP servers that were basically the predecessors to P2P. People upload, people download. There's just a layer between. People are getting more and more used to this type of sharing anyway.

    There are hundreds of ftp server applications for Windows 98, or whatever. When a large group of people learn to put up their own ftp servers, there's nothing sponsoring this other than the end users. It's at their own risk. There may not be pretty interfaces and chat rooms anymore, but seriously, did any of you ever use that?

    In the future, I see listserves with people sharing today's port and password to a community of millions.


    Dissenter

  • by PureFiction (10256) on Thursday February 15, 2001 @07:47AM (#430237)
    Yes, you are correct. And you will always send a packet first. If you are behind a NAT firewall this will be a NAT discovery packet.

    A reply is then returned which has your masqueraded IP and port which the NAT router is using. From this point on, this masqueraded address is what you use to identify yourself.

    Some systems may need to turn on loose UDP masquerade or the equivalent to allow reply packets from sources other than the initial destination to which you sent the discovery packet.

    There are additional details, but the end result is NAT users are supported.
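
    The discovery exchange described above amounts to a peer echoing back the source address it observed. A minimal sketch, assuming IPv4 and a made-up 6-byte reply format (4 address bytes plus a 2-byte port); the real ALPINE/DTCP packet layout is not given in this thread.

```python
import socket
import struct

def encode_endpoint(ip, port):
    """Pack an IPv4 address and port into 6 bytes (assumed reply format)."""
    return socket.inet_aton(ip) + struct.pack("!H", port)

def decode_endpoint(data):
    """Inverse of encode_endpoint: return (ip, port)."""
    ip = socket.inet_ntoa(data[:4])
    (port,) = struct.unpack("!H", data[4:6])
    return ip, port

def discovery_reply(observed_addr):
    """Build the reply a peer sends to a discovery packet.

    `observed_addr` is the (ip, port) that recvfrom reported, which for a
    NAT user is the router's public address and mapped port.
    """
    ip, port = observed_addr
    return encode_endpoint(ip, port)
```

    The client behind NAT decodes the reply and from then on advertises that address instead of its private one.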
  • by HairyBN (252481)
    This one [flipr.com] is a free application that provides people with a way to share music and other media while keeping track of all the transfers to be able to pay the artists.

    Check it out.. The server is on linux too...
  • Could have it designed so that you send out a query to say 10 peers, but each time they send to say 60% fewer. That way you don't keep sending queries forever, but chances are you will find stuff. The query just widens out, but loses its intensity like a ripple in a pond.
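
    A rough model of the ripple: each hop forwards to 60% fewer peers than the hop before, rounding down, until the fanout reaches zero. It assumes no two senders ever pick the same peer, so the numbers are an upper bound; the function name and parameters are made up for illustration.

```python
def ripple_reach(initial_fanout=10, decay_percent=60):
    """Return the number of peers reached at each hop until the query dies out."""
    reached_per_hop = []
    senders, fanout = 1, initial_fanout
    while fanout >= 1:
        reached = senders * fanout
        reached_per_hop.append(reached)
        senders = reached
        # Integer arithmetic avoids floating-point rounding surprises.
        fanout = fanout * (100 - decay_percent) // 100
    return reached_per_hop

print(ripple_reach())       # [10, 40, 40]
print(sum(ripple_reach()))  # 90
```

    With these numbers the query dies after three hops, having touched at most 90 peers.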
  • Even at 60 bytes per packet, if you're trying to send to 10000 nodes that's 600K. Then the replies start coming in - in clumps - further clogging that pipe.

    Funny, I thought web servers acted this way...

    A web server only sends out to its 10,000 users. Those users aren't also web servers sending out 10,000 packets each. Web servers are getting away with murder compared to 10,000 searching ALPINE users.

    Rader

  • ... No, each of these 'victims' would only receive a single 60 byte packet. This is the opposite of a DoS attack, as you are sending a large number of packets, but each peer is only receiving one of them.

    Yea, each victim only gets ONE single 60 byte packet. FROM ME. But we're talking about 10,000 users doing the same, then ALL of them will be getting 10,000 packets.

    There is only one thing in the back of my mind that would support where you're going with this information you're sharing...is that your research shows that 90% of the people connected are just connected to be nice, (went to bed, etc) and they're not active. Leaving a rotating 10% of active users. (active=searching)

    Rader

  • Project ELF [projectelf.com] looks pretty cool. It uses "tag routing," in that each ELF packet (encapsulated in TCP/IP) has a unique 6-byte tag. I assume that each client remembers the incoming IP address associated with each tag. File requests sent to a client are broadcast (i.e. sent out to all known IPs except the one the file request came in on), and search hits are reverse-path-forwarded using tag routing.

    My main concern about Project Elf is its scaling issues using "broadcast" packets for searches...what happens if there are a million clients out there?
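
    Tag routing as this comment understands it could be sketched like this. The sketch follows the commenter's own guess (each client remembers which neighbour a tag arrived from); the duplicate check is an added assumption, and none of this is taken from Project ELF's code.

```python
class TagRouter:
    """Forwards requests to all neighbours and sends hits back along the path."""

    def __init__(self, neighbours):
        self.neighbours = set(neighbours)
        self.came_from = {}  # tag -> neighbour the request arrived from

    def on_request(self, tag, sender):
        """Return the neighbours to forward to; empty if the tag was seen before."""
        if tag in self.came_from:
            return []  # assumed duplicate suppression
        self.came_from[tag] = sender
        return sorted(self.neighbours - {sender})

    def on_hit(self, tag):
        """Return the neighbour a search hit should be passed back to."""
        return self.came_from.get(tag)
```

    The table grows by one entry per request seen, which is where the scaling worry about a million clients comes in.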
  • Archie plus apache plus *ftpd plus Linux/*BSD

    BTW, I've got this great idea for a round device. You put a stick thru the middle of it and you can easily move things around. Any ideas on how to improve it?
  • by Rader (40041) on Thursday February 15, 2001 @09:31AM (#430254) Homepage
    .... No, there is no batch. The query process is iterative, and can be halted, slowed, at any point in time.

    What is an appropriate sized batch? 200 queries at once? 100? Seems like searches will take forever by stalling a query.

    Sure, there are various criteria that indicate a bad or good peer. These include, among other things:

    Wow, this seems like a lot of information to keep track of on the client side. Not only am I keeping track of every IP-node user out there, but I have to keep track of it over time. In a napster-success scenario, I'd have 2 million entries to keep track of. Not only that, but it seems like a lot of wasted overhead? Even if a user doesn't have what I want, I have to compute statistics into his/her record each time.

    ... And any ISP that cannot handle the bandwidth generated by a customer has much more major problems.

    Um...look I'm just one user. Any searching done by me, yes, is only one person's activity. But I'm logging into a group of 10,000 active users? The ISP will have to handle 10,000 user requests of ME. And you can't reiterate the B.S. about throttling search requests. That's like saying there'll be less pee in the world if we all just pee'd slower. (Yes, the only analogy I could think of. I'll brb, i gotta go P)

    Rader

  • This is an idea friends and I have talked about. Allowing anyone with the resources to become more of a server, while other users connect to these various levels of servers. Reminds me a lot of IRC.

    However, once you start doing this, the popular servers will get pressured from the RIAA and be forced to shut down.

    So what sized machine/bandwidth are we talking about being able to handle being on the wide "backbone" you spoke of? I'm curious to see how many people out there would be able to be part of the backbone of the system. From what I've seen, the bandwidth is more important than the speed of the computer (the ratio of computation vs. bandwidth being pretty small, so any decent computer could handle the computations). If only T1's were a requirement, then I'd see quite an inexhaustible supply of volunteers, but if it required more than a T3, then I see an easy target for the RIAA.

    Rader

  • you still will have to search every node every time you want to find something

    This is not the case. You only have to search until you *find* what you're looking for. This is a big difference, and part of the ALPINE protocol is adapting to the responses and peers you're communicating with to ensure that you search fewer peers each time you're looking for something.

    This is covered in the documents, and is a major benefit. The network adapts to your preferences and optimizes accordingly.
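
    A minimal sketch of "search until you find it", assuming peers are queried in small batches and the loop stops as soon as enough matches arrive. The `iterative_search` name, the `batch_size` default, and the `has_match` callable (standing in for a network query to one peer) are illustrative assumptions, not part of the ALPINE documents.

```python
def iterative_search(peers, has_match, batch_size=100, wanted=1):
    """Query peers batch by batch; stop once `wanted` matches are found.

    peers: ordered list of peer ids (best candidates first).
    has_match: callable standing in for a network query to one peer.
    Returns (matches, number_of_peers_queried).
    """
    matches = []
    queried = 0
    for start in range(0, len(peers), batch_size):
        for peer in peers[start:start + batch_size]:
            queried += 1
            if has_match(peer):
                matches.append(peer)
        if len(matches) >= wanted:
            break  # halt: the remaining peers are never contacted
    return matches, queried
```

    The saving depends entirely on where the matching peer sits in the list: a hit in the second batch of 100 costs 200 queries, while a miss still costs one query per connected peer.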
  • I would like to start building a P2P system based on the ideas here [slashdot.org] and The StreamModule System [omnifarious.org]. I expect that it can be built fully decentralized and completely scalable. I also want a lot of careful protocol documentation along the way so people can easily see how it works so holes can be poked before it gets too big.

  • by Salamander (33735) <jeff.pl@atyp@us> on Thursday February 15, 2001 @08:08AM (#430264) Homepage Journal

    I have to admit that it's a little bit strange posting something with such a subject line from the conference hall at the O'Reilly P2P conference in SF, but I can't help myself.

    Implementing a pseudo-broadcast by sending separately to all destinations is stupid. Real network designers have known this for years. First off, to send to N destinations you have to shove N packets down your local pipe, which may be narrow. Even at 60 bytes per packet, if you're trying to send to 10000 nodes that's 600K. Then the replies start coming in - in clumps - further clogging that pipe. That single UDP socket you're using does have a finite queue depth, so it will start dropping replies left and right after the first few. Well, maybe not, but only because your ISP's routers will have dropped them first because they overflowed their own queue depths.

    Sending the same data to 10K hosts in separate packets not only doesn't scale, but it's an extremely antisocial abuse of the network. The traffic patterns ALPINE will generate are like nothing so much as a DDoS attack, with the query originator and their ISP as the victims. In the same Gnutella thread in which you started hyping ALPINE, some slightly clueful people were suggesting tree-based approaches. Those ideas, as stated e.g. by Omnifarious, are a little naive, but well-known technology in mesh routing and distributed broadcast can easily enough be applied to create and maintain self-organizing adaptive distributed broadcast trees (phew, that was a mouthful) for this purpose. Read the literature. The pitfalls in what you're suggesting are already so well known that they should be part of any computer-networking curriculum, and much more reasonable solutions to the same problems are only scarcely less well known. There is no need to reinvent the wheel, especially if your wheel is square.

    As Clay Shirky mentioned in his talk here yesterday, "peer to peer" can be considered a little bit of a misnomer. It's a lot more about addressing and identity issues, and even more about scalability, and having N^2 connections in a network of N nodes is no route to scalability. ALPINE's scaling characteristics will be worse than Gnutella's. Pemdas made a good point [slashdot.org] that you seem to have a talent for marketing. Stick to it. Unlike Pemdas I can evaluate the technical merits of what you're proposing, and you are headed 180 degrees away from a solution.
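
    The fan-out arithmetic above is easy to check. The sketch below assumes the 60-byte figure is the whole packet and uses nominal link rates (real throughput is lower); `fanout_cost` and the `LINKS` table are hypothetical names for illustration.

```python
def fanout_cost(peers, packet_bytes, uplink_bits_per_second):
    """Return (total_bytes, seconds) to send one packet to each peer."""
    total_bytes = peers * packet_bytes
    seconds = total_bytes * 8 / uplink_bits_per_second
    return total_bytes, seconds


# Nominal line rates, assumed for illustration only.
LINKS = {
    "56k modem": 56_000,
    "T1": 1_544_000,
}

for name, rate in LINKS.items():
    total, secs = fanout_cost(10_000, 60, rate)
    print(f"{name}: {total} bytes, {secs:.1f} s to send one query")
```

    This only covers the outgoing side of one query; the replies arriving on the same pipe are extra.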

  • by BeBoxer (14448) on Thursday February 15, 2001 @08:10AM (#430265)
    This is absolutely correct. I talked about this in the Gnutella scalability thread yesterday. Even if you ignore the overhead of your "backbone", the process of even trying to send every query to every client is fundamentally broken. If you want to support people on less than 100bT dorm networks, this is not going to scale.

    Just figure out how big a query is, then figure out how many queries per second have to be in the network before all of the client's bandwidth is consumed. If you estimate a query packet to be 1000 bits, your modem users max out at 56 queries per second on the network. And that's an absolute best case which will never ever be achieved in practice.

    Until this problem is addressed, these networks will never scale. You have to have some hierarchy of high-bandwidth servers which get the queries and low-bandwidth clients which don't. This can still be a truly distributed network, but you have to distinguish between the machines that have the resources to handle lots and lots of queries and those that don't.

    Imagine a two-level network where you have a Gnutella-style network of OpenNap servers which the napster-style clients connect to. The servers distribute the queries amongst themselves to perform the searches. Each server knows what files its clients are sharing Napster-style, and can answer for them. With this architecture, the well-connected hosts on cable networks and dorm subnets do the heavy lifting of the searches while the dial-up clients get good performance because they aren't being clogged with a bunch of queries. The network scales better because you aren't trying to do lots of work on really slow links. Your network is also more stable because you don't have the clients (which come and go quickly) changing the topology of your "backbone".
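
    A toy sketch of that two-level layout, assuming each server keeps a Napster-style index of its own clients and floods queries only to other servers. The `Server` class and its methods are hypothetical, and an in-memory object graph stands in for the network; this is not OpenNap or Gnutella code.

```python
class Server:
    """Hypothetical well-connected host that indexes its own clients."""

    def __init__(self, name):
        self.name = name
        self.peers = []  # other servers (the "backbone")
        self.index = {}  # filename -> set of client names

    def register(self, client, filenames):
        """A client connects and reports what it shares."""
        for f in filenames:
            self.index.setdefault(f, set()).add(client)

    def unregister(self, client):
        """A client leaves; the backbone topology does not change."""
        for owners in self.index.values():
            owners.discard(client)

    def query(self, filename, seen=None):
        """Flood among servers only; clients never see the query."""
        seen = set() if seen is None else seen
        if self.name in seen:
            return set()
        seen.add(self.name)
        hits = set(self.index.get(filename, ()))
        for peer in self.peers:
            hits |= peer.query(filename, seen)
        return hits
```

    Clients coming and going only touch one server's index, which is the stability argument made above; the flooding cost is paid by the servers alone.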
  • The point you have to remember is that you control exactly how much bandwidth you use for queries and how many peers you query. Also, the alpine protocol adapts to the responses you receive so that you tend towards a more efficient search.

    Similar peers that have similar content and quality service will gravitate towards the top of each other's query lists. Thus, these higher quality peers will be queried before the others (if the others are queried at all).

    The net result is that each query you make with success enhances the probability and speed with which the next query will be answered.

    For example, napster has grown to millions of users, but whenever you execute a napster query, you are only searching among a group of 3,000-10,000! And these are randomly selected.

    Alpine will allow you to search 3,000 to 100,000+ of *selective* peers, which you have tuned to optimal result.
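
    One way such a query list could be kept, purely as an assumption for illustration: score each peer with a moving average of hits and misses, and query in score order. The thread does not say how ALPINE actually weighs peers, so the `PeerList` class, the `weight` parameter, and the averaging rule are all hypothetical.

```python
class PeerList:
    """Hypothetical query list ordered by past usefulness."""

    def __init__(self, peers):
        self.score = {p: 0.0 for p in peers}

    def record(self, peer, answered, weight=0.5):
        """Blend the latest outcome into the peer's running score."""
        hit = 1.0 if answered else 0.0
        self.score[peer] = (1 - weight) * self.score[peer] + weight * hit

    def query_order(self):
        """Highest score first; ties keep insertion order (sort is stable)."""
        return sorted(self.score, key=self.score.get, reverse=True)
```

    Peers that keep answering drift to the front, and a peer that stops answering decays back down instead of being remembered as good forever.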
  • It is not a simple matter to trace the requestor of a file on Freenet, unless the attacker can do some good traffic analysis. Read this [sourceforge.net] and dive into the documentation if you have doubts.
  • They cannot query everybody on the network. They can only query everyone they are connected to. So, modem users would obviously have a smaller connection pool compared to a DSL user.

    Someone on a T1 connection may indeed be able to connect to just about everyone, but they would also have the bandwidth and memory to do so.

    You seem to be contradicting yourself. If a modem user can limit (or has to limit) the number of connections in his/her group, then how is it possible for a T1 user to have everyone in their group? Both cannot happen.

    Rader

  • Not only am I keeping track of every IP-node user out there, but I have to keep track of it over time. In a napster-success scenario, I'd have 2 million entries to keep track of. Not only that, but it seems like a lot of wasted overhead?

    No, you only keep track of this information for the peers you are currently connected to. This may be 3,000 to 10,000 for a napster sized group (not all one million napster users are on the same server!) or more if you have a beefy machine that can handle it.

    It is entirely up to each user how many connections and how much bandwidth they wish to use.

    The ISP will have to handle 10,000 user requests of ME. And you can't reiterate the B.S. about throttling search requests

    I don't see your point. Each 10,000 ME's would have their own ISP, and would use their own bandwidth.

    Ever watch your modem/DSL lights when you're on napster? This is no different, and the throttling does work, unlike TCP streaming where the bandwidth is always wide open (unless you explicitly throttle sending in your application).
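
    A minimal sketch of sender-side throttling, assuming fixed-size packets and a byte-per-second cap chosen by the user. `paced_schedule` is a hypothetical helper that only computes when each packet may leave; a real sender would wait until each time before calling `socket.sendto`.

```python
def paced_schedule(num_packets, packet_bytes, max_bytes_per_second):
    """Yield (packet_index, send_time_seconds) without exceeding the cap."""
    interval = packet_bytes / max_bytes_per_second  # seconds between sends
    for i in range(num_packets):
        yield i, i * interval
```

    Pacing caps the outgoing rate, but the total number of bytes is unchanged: 10,000 packets of 60 bytes at 600 bytes/s simply take about 1,000 seconds to go out.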
  • by roman_mir (125474) on Thursday February 15, 2001 @07:56AM (#430277) Homepage Journal
    Yes, I read that too, note that statistically less than 30% of users have what you need and out of those 30% not everyone will let you download what you want. Let's say that in the best case scenario Alpine has a network that can run 70% faster than Gnutella on networks with large node numbers, this is good, but only linearly good, exponential growth of the network will cause the same problems with Alpine that exist with Gnutella, since infinity/2 is still infinity :)
  • I'm not implying that you intentionally tied the ad to the story, but I am implying that perhaps you were more inclined to run the story because of the ad. In the future you might want to consider checking to see if you have an ad running which is directly related to a story you're planning to post, and adding a brief disclosure statement ("Slashdot is currently running an ad for the book 'Peer-to-Peer' which is published by O'Reilly") if there is such an ad.


    fnord.
  • Peer to peer will always survive and it will do so because this community just will NOT take no for an answer.

    I hate to do this, because it paints P2P technologies with an unethical light, but if there is ever an official P2P war, it will have the same results as the war-on-drugs, or prohibition of alcohol, or trying to keep Marijuana illegal.

    I truly believe that pot will (eventually) become legal to grow, and smoke, and the governments will tax it heavily (as they do tobacco) and profit from it. I'm not HOPEFUL that this will happen, nor am I opposed to it. I just believe it will happen.

    The "war-on-drugs" is mildly successful, but, if I wanted to go out and get a shot of heroin, or a cap of Mescaline tonight, I wouldn't have a whole lot of trouble finding someone to sell it to me.

    And we all know how prohibition of alcohol turned out.

    Warez is illegal, but it will never go away because you just can't prosecute.

    You _can_ prosecute, it's just difficult. It's a losing battle. Prosecuting one person in one town isn't going to solve anything, and prosecuting too many people just becomes ultimately more expensive than the projected "loss" by individuals 'pirating' your software.

    It's like arresting one junkie for possession. It doesn't solve the problem. Our prisons just aren't big enough to hold everyone who violates the law, which is why we have varying levels of prosecution.
  • You seem to be contradicting yourself. If a modem user can limit (or has to limit) the number of connections in his/her group, then how is it possible for a T1 user to have everyone in their group? Both cannot happen.

    It would be very unlikely, but all that would need to occur is that one of the 10,000 connections that every peer has would be to the T1 server. The rest of the connections may be to random peers, but the T1 user would still be connected to everyone, while everyone else maintains only 10,000 connections.
  • Problems:
    1. no standard (growing) packet sizes leading to real delivery failures and more.
    2. packets cannot interact with each other and cancel each other, so only one packet can be sent to query the entire network. This is not bad but drastically reduces searching speed, since the packet will have to traverse the entire network and return to you. Also the packet will have to keep track of the entire route with every traversed node (imagine the size of the packet by the 100'th node), and it'll probably be lost if a receiving user node is dropped before the packet is redirected... how long are you willing to wait for a response for your search query?
    3. the worst part is that there is no heuristic for the search, just because your packet is on node A and nodes B and C are connected to node A, there is no way to predict which direction to go, there is no preference in B and C.

    But there is still hope, it should be possible to build a network where the search is done on a number of self-proclaimed servers that index the rest of the network. These servers must have a number of clones so that no info is lost once a server goes offline, and the distributed index should be able to update and redistribute itself. This will reduce the total number of search packets sent within the network. Primitive example: Imagine 26 nodes on the network, each one has info on all files stored on the net that start with a particular letter of the English alphabet. The servers are cloned a few times and your queries go to the closest server that has info on your query that (for example) starts with a particular letter.......
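
    The 26-node example can be sketched in a few lines, assuming one index shard per letter and a fixed number of clones per shard. The `LetterIndex` class, the `replicas` parameter, and the `down` set used to simulate offline servers are hypothetical, and the sketch ignores how clones would resynchronize after an outage.

```python
import string


class LetterIndex:
    """Hypothetical index sharded by first letter, each shard replicated."""

    def __init__(self, replicas=3):
        # letter -> list of replica dicts (filename -> set of hosts)
        self.shards = {c: [dict() for _ in range(replicas)]
                       for c in string.ascii_lowercase}
        self.down = set()  # (letter, replica_number) pairs that are offline

    def _letter(self, filename):
        c = filename[:1].lower()
        if c not in self.shards:
            raise ValueError("this toy index only covers names starting a-z")
        return c

    def publish(self, filename, host):
        """Write to every clone so any one of them can answer."""
        for replica in self.shards[self._letter(filename)]:
            replica.setdefault(filename, set()).add(host)

    def lookup(self, filename):
        """Ask the first clone of the right shard that is still online."""
        letter = self._letter(filename)
        for n, replica in enumerate(self.shards[letter]):
            if (letter, n) not in self.down:
                return set(replica.get(filename, ()))
        raise LookupError("all replicas for this letter are offline")
```

    A lookup touches exactly one server instead of the whole network, which is the reduction in search packets described above; the cost moves to keeping the clones in step.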
  • I can't find any detailed technical spec on Espra; at least there are some on Alpine. Until then, Espra is at the bottom of the list of p2p applications I will pay attention to, until it faces the real world. Alpine is already getting criticism for a questionably "flawed" design... Let's not get burned by the p2p hype, take it slow guys...

  • Funny, I thought web servers acted this way

    And they're on high-speed T3 or OCx connections to the Internet, connections that are designed to handle such a load.

    If you find the reply you're looking for, then there is no need to query the remaining peers

    What if your query isn't an exact match to one file? For instance, I'm looking for "songs by The Offspring, in .ogg [xiph.org] or .mp3 format, at bitrate >= 160 kilobit/s," in whatever query language the system uses. (I picked a random P2P-friendly band.) I'm not "Feeling Lucky [google.com]"; I know my query is vague, but I want to survey the net around me and see what Offspring tracks are on hosts close to mine. The reply is the set of results I get back, not just the chronologically first element.

    If, on the other hand, I typed in "artist contains Offspring, title contains Pretty Fly, length within +/- 3 s of [whatever the real length is], Ogg Vorbis format, bitrate 160-192 kbps, on a persistent connection," I would accept a "first reply" response.

    No, each of these 'victims' would only receive a single 60 byte packet

    From every single user who's searching. Say a user searches a 20,000 user network once every 10 minutes (this takes into account inactive users). You'll have to handle (on the average) 2,000 queries a minute, over 30 a second. That's not even counting peak use. Can your hardware and network connection keep up?

    But whenever I think of the obvious solution to this problem (proxies that cache search requests for a group of users), I realize that such a topology would be equivalent to that of the existing OpenNap network.
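
    The per-peer load estimate above (a 20,000-user network, one search per user every 10 minutes, every search reaching every peer) can be recomputed with a short sketch. `incoming_query_rate` is a hypothetical helper, and the 60-byte packet size is the figure quoted in this thread.

```python
def incoming_query_rate(users, minutes_between_searches):
    """Queries per second reaching one peer if every search hits every peer."""
    searches_per_minute = users / minutes_between_searches
    return searches_per_minute / 60


rate = incoming_query_rate(20_000, 10)
print(f"{rate:.1f} queries/s, {rate * 60:.0f} bytes/s at 60 bytes each")
```

    This is an average under the stated assumptions; peak use and the replies each peer sends back are not included.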


    All your hallucinogen [pineight.com] are belong to us.
  • The packet steamers of the 19th century were the first example of packet pier to peer communication.
  • There is a new peered sharing network, Project ELF [projectelf.com], which allows truly anonymous sharing of any file type. The point of this system is actually privacy, rather than speed, but there are some features which will actually make it faster the larger the network gets, including downloading pieces of the same file from multiple sites simultaneously. Pretty cool!
