The Internet

Can Poisoning Peer to Peer Networks Work?

andrewchen writes "Can poisoning peer to peer networks really work? Business 2.0 picked up my research paper from Slashdot and wrote an article about it. In my paper, I argue that P2P networks may have an inherent "tipping point" that can be triggered without stopping 100% of the nodes on the network, using a model borrowed from biological systems. For those who think they have a technical solution to the problem, I outlined a few problems with the obvious solutions (moderation, etc.)."
  • by CreatorOfSmallTruths ( 579560 ) on Tuesday September 03, 2002 @10:12AM (#4188572)
    By trying to deactivate part of the net, you can't stop all of it.
    For example, let's take a net of 2^n nodes, and let's say 80% of them have been poisoned... the other 20% will still be able to resist the attack.
    Take, for example, IRC - splits will never kill it (when I say splits I really mean poisoning, of course).
    Another example is the Iraqi internet during the Gulf War. It didn't come down. Why? Because with distributed networks (such as P2P and the net itself), the resilience is just plain great.
  • by Kragg ( 300602 ) on Tuesday September 03, 2002 @10:14AM (#4188587) Journal
    Although this idea [checksums] works for newsgroups and some other centralized services, it does not with P2P. Basically, it comes down to the fact that you must trust whomever is actually doing the checksumming, or else they can just lie and publish false checksums. In the case of P2P networks, the checksumming is done by the same person you want to figure out if you can trust! As far as I know, this is an unresolvable problem.


    So, um... how about this... If it's a standard file, such as, say, the Deviance rip of Neverwinter Nights, or the new MPEG of The Two Towers, then it should always have the same checksum.

    Somebody somewhere needs to maintain a website with these checksums on it. Then there's no dependence on the person you're pulling the file from.

    Obviously doesn't work for random porn videos (although it would for more popular ones... which might also tell you whether they're any good).

    And there's nothing illegal about it.

    Problems?
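    As a rough illustration of this published-checksum idea (a sketch only: the file names and the tab-separated listing format are invented for the example, not any real service):

```python
# Hypothetical sketch: verify a downloaded file against a published checksum list.
# "known_checksums.txt" (one "name <TAB> sha1-hex" entry per line) is an invented format.
import hashlib

def sha1_of_file(path, chunk_size=1 << 20):
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def load_known_checksums(listing_path):
    known = {}
    with open(listing_path) as f:
        for line in f:
            name, digest = line.rstrip("\n").split("\t")
            known[name] = digest
    return known

known = load_known_checksums("known_checksums.txt")   # placeholder path
if sha1_of_file("download.avi") == known.get("download.avi"):
    print("matches the published checksum")
else:
    print("mismatch or unlisted: treat as suspect")
```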
  • faked hashes (Score:3, Interesting)

    by vurtigo ( 605110 ) on Tuesday September 03, 2002 @10:24AM (#4188647) Homepage

    The problem of faked hashes can be addressed using trees of checksums rather than a single simple checksum, although a workable implementation would require embedding it into the P2P protocol.

    The idea is you break the file up into smallish blocks (100k or so) and generate a hash for each one. For every 8 first-level hashes, you feed them into a crypto hash function to generate a second-level hash. For every 8 second-level hashes, you generate a third-level hash, and so on. This allows a continuous (per 100k block) proof that the content is valid. The size of the proof grows with the log of the content, so it is not much of a problem.
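    A rough sketch of this tree-of-checksums idea (block size and fan-out taken from the comment above; nothing here corresponds to any real protocol):

```python
# Hash 100k blocks, then hash every 8 digests together, repeating until one root remains.
import hashlib

BLOCK_SIZE = 100 * 1024
FANOUT = 8

def block_hashes(data):
    # first-level hashes, one per 100k block
    return [hashlib.sha1(data[i:i + BLOCK_SIZE]).digest()
            for i in range(0, len(data), BLOCK_SIZE)]

def tree_root(hashes):
    # combine hashes 8 at a time until a single root hash remains
    while len(hashes) > 1:
        hashes = [hashlib.sha1(b"".join(hashes[i:i + FANOUT])).digest()
                  for i in range(0, len(hashes), FANOUT)]
    return hashes[0]

data = b"example content" * 50000       # stand-in for a shared file
print("root hash:", tree_root(block_hashes(data)).hex())
```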

  • Shameless plug... (Score:1, Interesting)

    by CoderByBirth ( 585951 ) on Tuesday September 03, 2002 @10:32AM (#4188695)
    I'm currently in the process of designing an open-source peer-to-peer network that will take care of some of these issues.
    The network will be semi-server-centered, with a design similar to the NeoModus Direct Connect network.
    The basic new idea is to reward users who share information by giving them more access to the network.
    Hopefully this will make the network somewhat self-moderating, since users sharing undesirable content will not rise in network status.

    As I said, the project is still in the design phase, with a preliminary protocol spec just finished.
    If you would like more details or want to contribute to the project, visit:
    Bitpeddler project page [sourceforge.net]
    or
    Bitpeddler homepage (with design/protocol spec) [sourceforge.net]
  • by perljon ( 530156 ) on Tuesday September 03, 2002 @10:58AM (#4188859) Homepage
    This would be changing constantly. First of all, joining a P2P network is pretty easy, assuming it is open to the public. And as I am out searching for Eminem (they throw out a lot of poison), I download a poisoned file, and now I am either a) blocked from the network or b) passing out poison myself.

    A blocking system can't work fast enough.
  • Re:Simple! (Score:4, Interesting)

    by decathexis ( 451196 ) on Tuesday September 03, 2002 @11:12AM (#4188974) Homepage
    A more 'toothful' modification of this idea would be to require all files to include some DMCA-protected text, like DeCSS.

    Or, maybe, a "licence":

    By making this File available on the Network, directly or through an Agent, the Distributor hereby
    gives up any and all Rights to its Content, as well as any other Works of Art matching this File in name.


    If the labels distributed content together with such licenses attached (or hired someone to do so), it might be a bit harder for them to defend copyright claims for individual songs.

  • by jidar ( 83795 ) on Tuesday September 03, 2002 @11:31AM (#4189094)
    Taken from Andrew Chen's responses to the solutions:

    Although this idea works for newsgroups and some other centralized services, it does not with P2P. Basically, it comes down to the fact that you must trust whomever is actually doing the checksumming, or else they can just lie and publish false checksums. In the case of P2P networks, the checksumming is done by the same person you want to figure out if you can trust! As far as I know, this is an unresolvable problem.

    Actually, I believe the checksums should still work, in much the same way that file sizes work now. Consider why the injected files are set to the same size as the real file: the purpose is to mask these files from the naked eye. Checksums could be used for the same purpose.
    The reason is that as people find good files they will tend to keep them while deleting the bad ones. Sure, if we only get one result back we don't know one way or the other, but if we get 10 results back and 8 of the 10 have the same checksum, we can assume those 8 are the good files.
    Of course, the problem with this is that a great many people don't bother to delete bad files after downloading, but should poisoning become too much of a problem, we can entice more people to clean up their shared files by way of the client interface.

    All in all, I think this would combat poisoning very well.
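    A toy illustration of that majority idea, with a made-up result list (the digests and the 70% agreement threshold are invented for the example):

```python
# Prefer the checksum that most search results for the "same" file agree on.
from collections import Counter

results = [
    ("peerA", "3f2a11aa"), ("peerB", "3f2a11aa"), ("peerC", "9d1cbad0"),
    ("peerD", "3f2a11aa"), ("peerE", "3f2a11aa"),
]

counts = Counter(digest for _, digest in results)
digest, votes = counts.most_common(1)[0]
if votes >= 0.7 * len(results):          # arbitrary agreement threshold
    print("likely-good checksum:", digest)
else:
    print("no clear consensus; download cautiously")
```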
  • block checksum (Score:3, Interesting)

    by bogado ( 25959 ) <bogado&bogado,net> on Tuesday September 03, 2002 @01:02PM (#4189747) Homepage Journal
    One could keep a trusted block signature for each file. Say you have a signature file that has one MD5 for each x bytes of the file. This file and its MD5 hash are the identity of the file. One would then choose to download this signature file before the file itself, and then download the blocks of x bytes from the file in a randomized order, possibly from different nodes. I guess this would add some otherwise unneeded downloads, but it would help to restart stopped downloads and would detect poisoned nodes easily.

    Too bad I am so late in posting this...
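    A sketch of that per-block signature-file idea, assuming a trusted list of MD5 digests is already in hand; fetch_block() is a placeholder for the real network call, and the block size is arbitrary:

```python
# Verify blocks fetched in random order, possibly from different peers,
# against a trusted per-block MD5 signature file.
import hashlib
import random

BLOCK = 256 * 1024   # the "x bytes" from the comment; value is arbitrary

def make_signature_file(data):
    return [hashlib.md5(data[i:i + BLOCK]).hexdigest()
            for i in range(0, len(data), BLOCK)]

def download(signatures, fetch_block):
    order = list(range(len(signatures)))
    random.shuffle(order)                          # randomized block order
    blocks = {}
    for i in order:
        payload = fetch_block(i)                   # placeholder network call
        if hashlib.md5(payload).hexdigest() == signatures[i]:
            blocks[i] = payload
        else:
            print(f"block {i} failed its MD5 check: poisoned or corrupt source")
    return blocks
```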
  • Two Problems (Score:2, Interesting)

    by Fascist Christ ( 586624 ) on Tuesday September 03, 2002 @01:33PM (#4189996)

    I see two problems with this idea.

    1. Their problem: They don't want to change. They don't want to give in to this non-physical technology. They don't understand it, so they condemn it. It's human nature; they aren't simply hard-headed.
      -or-
    2. Our problem: They will sell it to us for $5 per 64 kbps MP3 to make up for the "lost sales" on the "pirated" copies. 128 kbps will cost you $10. They won't offer any higher quality because it would "take away from CD sales."
  • by Arcaeris ( 311424 ) on Tuesday September 03, 2002 @01:58PM (#4190167)
    It's more ridiculous than you might think.

    Searching on Kazaa yesterday for LOTR - The Two Towers (yes, I know, I'm such a pirate), I found about 4 files of 800MB - 1 GB in size. They all said, "Incomplete" or "Does not Work" in the filename.

    It's not just that these files exist and don't work, but that people have them and just don't care. With HDs getting so large, who can blame them, either? Even a gig here and there isn't hurting most people.

    So, I decided to download one of them and see. The Kazaa description was "LOTR - The Two Towers." The filename was "Eight-Legged Freaks Part 1 of 2." The actual file, upon some A/V work on my part, turned out to be several hours of audio from a trailer for The Scorpion King.

    I mean, Jesus. Sorry if it's a little off-topic.
  • by Archfeld ( 6757 ) <treboreel@live.com> on Tuesday September 03, 2002 @01:58PM (#4190170) Journal
    Look at DVDs... they provide so much material that it is more work to pirate one than it is to buy it. Why does a DVD cost the SAME as a CD? Last time I checked, a movie was SIGNIFICANTLY more expensive to produce than an album, and yet DVDs sell for the same or LESS, and quite often contain the BLOODY soundtrack as well. If a CD included multimedia stuff, editing-room-floor tracks, useless bio info and oodles of extra crap at a reasonable price, it would be more trouble to rip it than to buy it. When the RIAA wakes up and realizes that, maybe, just maybe, things will turn around; otherwise, one way or another, the industry is dead. The MPAA is actually beginning to come around, slowly and not without a FIGHT, but they are evolving. I don't hold out the same hope for the record industry.
  • MD5 Hash (Score:2, Interesting)

    by Xannor ( 174984 ) on Tuesday September 03, 2002 @02:12PM (#4190280) Homepage
    After reading this and some of the comments from the old posting, I realised the MD5 hash is not a bad approach. When a client scans its HD, it creates MD5 checksums of its files. When someone requests a file, the checksum is sent with the reply. When the file is d/l'ed, the checksum is checked. If the checksum fails, the user is notified and they can either re-try the d/l or accept it; afterwards they can test the file. If (with a valid checksum) the file is corrupt, the client can store the checksum and filter it from future requests; these bad checksums can also be shared to prevent others from d/l'ing the file as well. This system could still be temporarily defeated by having many versions of the same file, but again, that could be tested as well (too many bad files flags a bad host, etc.).
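    One possible reading of that filtering step, with an invented field layout for the search results (this is not how any real client stores its data):

```python
# Remember checksums the user has flagged as bad and drop matching hits
# from future search results; the bad list could also be shared with peers.
bad_checksums = set()

def flag_bad(checksum):
    bad_checksums.add(checksum)

def filter_results(results):
    # results: list of (filename, size, checksum) tuples from a search reply
    return [r for r in results if r[2] not in bad_checksums]

flag_bad("9d1cdeadbeef")
hits = [("song.mp3", 4200000, "3f2a11aa22bb"),
        ("song.mp3", 4200000, "9d1cdeadbeef")]
print(filter_results(hits))   # the known-bad hit is dropped
```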
  • by adamshamblin ( 524400 ) on Tuesday September 03, 2002 @02:14PM (#4190295) Homepage

    This proposed solution, or most any solution based upon moderation, has a few serious flaws. First of all, the use of public key encryption would require some sort of central authority to both assign moderator status to select members of the P2P network and distribute the keys to the masses. In the case of the Gnutella network, this could be said to be both the antithesis of the network model and relatively impossible to enforce: with the disjointed nature of the Gnutella network, it is conceivable that segments of the network would not be visible to a logged-in moderator. In fact, to ensure moderator coverage, moderator status would have to be given to a statistically high number of individuals. Second, the creation of the central authority necessary to administer this proposed 'solution' would give organizations like the RIAA and MPAA easy and - from their point of view - logical individuals to target in their foolhardy quest for the Copyright Grail.

    Perhaps a means of voluntary moderation could be accommodated in the Gnutella protocol itself. 'Karma' could be built up on a network node based upon many criteria, which could include positive feedback from peers, etc. By writing moderation into the protocol itself, client developers could implement these features at their own discretion. The idea of moderation would then be put to the test of software natural selection.
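    A minimal sketch of what client-side "karma" bookkeeping might look like (weights and thresholds are invented; nothing here is part of the Gnutella protocol):

```python
# Each client keeps its own running score per peer, bumped on good transfers
# and docked harder on transfers whose hashes did not match.
from collections import defaultdict

karma = defaultdict(int)

def record_transfer(peer_id, hash_matched):
    karma[peer_id] += 1 if hash_matched else -3

def preferred_sources(peer_ids):
    # query higher-karma peers first; skip peers well below zero
    return sorted((p for p in peer_ids if karma[p] > -5),
                  key=lambda p: karma[p], reverse=True)
```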

  • by bwt ( 68845 ) on Tuesday September 03, 2002 @02:50PM (#4190556)
    In particular, our analysis of the model leads to four potential strategies, which can be used in conjunction:

    1. Randomly selecting and litigating against users engaging in piracy
    2. Creating fake users that carry incorrectly named or damaged files
    3. Broadcasting fake queries in order to degrade network performance
    4. Selectively targeting litigation against the small percentage of users that carry the majority of the files


    This mostly summarizes the war on drugs and the government's enforcement of alcohol prohibition in the 1920s. Neither worked, and the countermeasures are simple and straightforward.

    A "directed" web of trust, objective quality measurement, and knowledge compartimentalization defeat the above strategy. The countermeasure of creating large numbers of mutally trusting attackers doesn't work when trust "flow" is taken into account. The keys to such a system are:
    1) trust is assymetric
    2) nodes define and change who they trust based on their own assessments
    3) Nodes protect their knowledge of the web of trust

    To see how this works, consider the cops and the drug dealers. The fact that the cops all trust each other does not result in the drug dealers trusting them. When a dealer is compromised, no matter how high up the chain it goes, trust shifts to rivals. Even when a kingpin falls, lines of trust will still exist that aren't compromised.
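    An illustrative sketch of those three properties (all names and numbers invented): trust is a directed, private edge, and each node adjusts its own edges from its own experience.

```python
# Asymmetric, locally-held trust: a node's trust map is never shared,
# and trusting a peer does not imply being trusted back.
class Node:
    def __init__(self, name):
        self.name = name
        self.trust = {}                      # private: peer name -> score in [0, 1]

    def assess(self, peer, delivered_good_data):
        old = self.trust.get(peer, 0.5)
        target = 1.0 if delivered_good_data else 0.0
        self.trust[peer] = 0.8 * old + 0.2 * target   # slow, local adjustment

    def trusts(self, peer, threshold=0.6):
        return self.trust.get(peer, 0.5) >= threshold

a, b = Node("a"), Node("b")
a.assess("b", delivered_good_data=True)
print(a.trusts("b"), b.trusts("a"))          # trust need not be mutual
```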

    Drug dealing is not as popular as file sharing, is substantially more damaging to people's lives and to society, and has motivated levels of funding that are not matchable by publicly traded firms (who must demonstrate at least mid-range ROI). Despite all of these advantages, the war on drugs has been a dismal failure. The bottom line is that the internet makes distribution of content a commodity, where it was formerly a task of enormous complexity and value-add. Economics will determine the rest, unless the US adopts and maintains a totalitarian government.
  • by tealeaves844 ( 164316 ) on Tuesday September 03, 2002 @04:10PM (#4191093)
    Here's another way to look at the problem: the physics of evolution. If we treat P2P as an ecosystem, we can apply the same kinds of energy balances. The paper isn't talking about the extinction of P2P; it's talking about a change in the observable patterns it exhibits. Because stressing a network can't eliminate P2P, a new one will pop up in its place. If you treat user demand as "free energy," the most stable state of those users is sharing. Fundamentally, when you stress an ecosystem, it can "fail" in the sense that the species in it aren't the same, but new ones pop up. The dinosaurs went extinct, but here we are!
  • by Jim McCoy ( 3961 ) on Tuesday September 03, 2002 @04:38PM (#4191277) Homepage
    Mr. Chen correctly points out that an attacker can easily forge the hash values it reports to the network. Self-verification won't happen until the user has downloaded a good portion (if not all) of the file. At that point the attack has already been successful.


    You can send out a bad copy once, but if well-known and trusted copies already exist on the network you are not going to be able to replace them with bad copies; self-verification does not prevent the single-point attack you describe, it prevents the propagation of this attack throughout the network. If an attacker serves up bad files (ones that do not match the advertised SHA1 hash), then the downloader should treat the host as malfunctioning and query a more reliable source. The downloading agent does not need to unpack the file and see what is inside; it just checks the SHA1 hash, and on a mismatch can simply assume there was a transmission error and try another source. Eventually the malicious node will be trimmed from everyone else's peer list, a new node identity will have to be generated, and the game starts again.
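    A sketch of that check-and-trim behavior (fetch() is a stand-in for the actual transfer, not a real API):

```python
# Verify the advertised SHA-1 after each download attempt; a peer that serves
# a mismatched payload is dropped from the peer list and another source is tried.
import hashlib

def try_download(advertised_sha1, sources, fetch, peer_list):
    for peer in list(sources):
        data = fetch(peer)
        if hashlib.sha1(data).hexdigest() == advertised_sha1:
            return data                      # good copy; keep this peer
        peer_list.discard(peer)              # mismatched payload: trim the node
    raise IOError("no source delivered a matching copy")
```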


    This single attack costs the attacker as much as it does the downloader (and you can bet the RIAA is paying more per MB of data sent than someone downloading the data via a DSL or cable modem line). A few simple changes to the system would raise the cost further: favor trusted peers (ones who have not given you mismatched hash/payload data) as the first nodes to query, and only move down the local reputation food chain if you need to expand your query or search for alternate sources. Unless an attacker can pretend to be a vast majority of the nodes in the system, it is not going to be able to make this attack scale up in the manner you suggest.


    There is a difference between an attack that works on a single download and an attack that would be viable for a network-wide assault. The case you and Mr. Chen bring up here is clearly in the first category, an inconvenience for individual users but not something that will be a significant problem for the network as a whole.

    Moderation and peer reputation require some method of recording "ratings" of users on the network, something not present in the current Gnutella network. But if implemented, it would have to be distributed as well. This means that there must, at some point, be blind trust between clients to complete these "ratings." That blind trust will lead to poisoning of the ratings system and make it worthless.


    "Ring of trust" simply does not work in a distributed environment that is truly open to anyone. Closed distributed environments, or virtually closed environments within an open environment would be the only way. However new users would not be able to enter them and that is how Gnutella keeps itself alive.


    Which is why I think that things like Raph Levien's work on reputation systems (he has actually coded up working examples of such a system; see refs below) are rather attractive: they solve this specific problem in a rather elegant fashion and make such simplistic attacks much more difficult and expensive to pull off. [Here's a quick hint: have you ever noticed that most people seem to care about Roger Ebert's opinion rather than yours when it comes to what movies to go see? This is because a distributed trust system can deal with voter-flooding attacks by limiting how much influence comes from untrusted sources.]
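    A toy version of the point in brackets, with invented trust values: weight each vote by how much the local node trusts the voter, so a flood of unknown identities contributes almost nothing.

```python
# Unknown voters get a tiny default weight; established peers dominate the score.
def weighted_score(votes, local_trust, default_trust=0.001):
    # votes: iterable of (voter_id, +1 or -1)
    return sum(v * local_trust.get(voter, default_trust) for voter, v in votes)

votes = [("long_time_peer", +1)] + [(f"sock{i}", -1) for i in range(100)]
print(weighted_score(votes, {"long_time_peer": 0.9}))   # the flood barely dents it
```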


    You seem to think, Mr. McCoy, that there are obvious solutions. Yet you really don't present any nor do you present any existing real-world examples.



    One of the problems I addressed in the original paper was the fact that it was poorly researched in certain aspects. It seems that everyone is too lazy to actually do any research these days, but since spending five minutes doing Google searches on various terms related to reputation systems seems to be too much work for either you or Mr. Chen, here is a quick summary of a few minutes' work (although I selected papers that I am familiar with after Google returned a hit).


    1) For starters, look at Google itself. Google is the single biggest distributed reputation system on the internet. That is what PageRank is: the "reputation" of a particular link for a particular subject, using link count as the voting mechanism. It can be attacked and subverted on a small scale, as various Google-juicing experiments prove, but it is also very effective at filtering out these attacks (see some of the Scientology Google-juicing wars to see how hard it is to really influence a massively distributed reputation system implemented by people who know how to pick the best ideas from current research and invent a few of their own).


    2) eBay seller rankings. These can also be attacked and tweaked, but even when money is involved (making the incentive for dishonest behavior very high, much more so than any P2P system will ever have to deal with), eBay manages to keep fraud to a manageable level, and recent research into seller/buyer identity-blinding and reputation-cluster filtering can make the seller ranking system even more attack-resistant.


    3) Amazon buyer ratings and recommendations. Yet another example of a real-world distributed trust management system.


    4) Advogato [advogato.org] is a community forum site that implements some of Raph's Ph.D. work on reputations and distributed trust management to create a flow-constrained reputation system with some very good attack-resistance characteristics. Raph has been running Advogato using his distributed trust metric for several years now.


    5) Pattie Maes' agents group at MIT [mit.edu], specifically the Yenta reputation-clustering system, though just about everything to come out of this group is a source of good ideas and practical research in this area.


    6) Check out some of the available research bibliographies (like this one [umich.edu]) and places like CiteSeer for other research on the subject.


    One thing you will notice about these real-world examples is that none of the systems tries to be "perfect", just good enough to get the job done.

"Engineering without management is art." -- Jeff Johnson

Working...