
Faster Feeds Using FeedTree Peer-To-Peer

Posted by ScuttleMonkey
from the instant-gratification-generation dept.
dsandler writes "Researchers at Rice University have just released version 0.7 of FeedTree, a peer-to-peer system for distributing Web feeds faster. Instead of polling feeds independently, FeedTree users cooperate to share news updates using multicast in Pastry, a scalable p2p overlay network. FeedTree reduces the update delay for existing RSS and Atom feeds to a few minutes without putting extra stress on the webserver (anyone who's ever been temporarily banned by Slashdot's RSS feed knows this is a real concern). Feed publishers can also choose to push digitally signed updates for immediate, tamper-proof delivery to subscribers. The client software (download) runs on Linux, OS X, and Windows, and works with any desktop feed reader."
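To put rough numbers on the load claim in the summary, here is a back-of-the-envelope sketch in Python. The function names and figures are hypothetical. The model assumes that only a few cooperating nodes contact the origin server while everyone else receives updates over the overlay, and it deliberately leaves out peer-to-peer relay traffic.

```python
def origin_requests_per_hour(clients: int, poll_minutes: float) -> float:
    """Independent polling: every client hits the origin server once per interval."""
    return clients * 60 / poll_minutes


def cooperative_requests_per_hour(pollers: int, poll_minutes: float) -> float:
    """Cooperative polling (assumed model): only `pollers` nodes hit the origin;
    everyone else gets updates relayed through the overlay, which is not counted."""
    return pollers * 60 / poll_minutes


# Hypothetical numbers: 10,000 readers, each polling every 30 minutes.
print(origin_requests_per_hour(10_000, 30))  # 20000.0

# One cooperative poller checking every 2 minutes: fresher data, far less load.
print(cooperative_requests_per_hour(1, 2))   # 30.0
```

Under these assumptions, updates arrive up to 15 times sooner (a 2-minute interval instead of 30) while the server sees more than 600 times fewer requests. How many nodes actually poll in FeedTree is a design detail that this sketch does not model.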

  • Why? (Score:3, Interesting)

    by TFGeditor (737839) on Monday February 20, 2006 @02:54PM (#14762565) Homepage
    WIth Bittorrent et al firmly established, why do we need another P2P?
    • Re:Why? (Score:5, Informative)

      by idonthack (883680) on Monday February 20, 2006 @02:57PM (#14762599)
      It's not "just another p2p", it's a p2p specifically for distributing newsfeeds. BitTorrent doesn't really work too well for that because it doesn't have the infrastructure, and downloading the real feed would be easier than downloading the torrent first. This software bypasses any user interaction and grabs it off its established network.
      • Check out uTorrent. The beta version has the option to check RSS feeds from any site, and it will automatically add the torrents to the client. It even has the option of filtering out torrents with simple expressions, and it lets you save your files in other folders for organization purposes.
        It works really well for downloading TV shows off of some sites, for example.

        Naturally, because it's bittorrent, it's great at downloading b-i-g files, whereas FeedTree sounds like it's more about dist
        • by RPoet (20693)
          No no no. FeedTree is about cooperatively distributing RSS files. Your Torrent doesn't distribute RSS files, it reads them, thus putting strain on the server hosting those files. This is what FeedTree is meant to solve.

          It is unfortunate that most people think "peer-to-peer" and "file sharing" are synonymous. They're not. Peer-to-peer has many, many uses outside of file sharing/distribution.
          • Yeah, that's about what I figured it was for, but the site wasn't especially clear. They said that it was about reducing costs and all, but one had to read between the lines on the front page. They need a concise summary on their home page, instead of burying those details in links.

            If they simply said it was for distributing RSS feeds across a peer-to-peer network to take load off of servers hosting RSS content, that would have been much clearer. Even if they said that it would allow you to host a very po
      • What about putting .nyud.net:8080 in the URL and using the Coral Cache? Wouldn't that reduce the strain on the server lots?
    • Re:Why? (Score:2, Insightful)

      by gkhan1 (886823)
      This is not really the same thing. This is for feed distribution, not cooperative downloading of large files. The files are rather small in this case.
    • Re:Why? (Score:5, Insightful)

      by Jerf (17166) on Monday February 20, 2006 @03:13PM (#14762691) Journal
      As already mentioned, this doesn't compete with BitTorrent, because BitTorrent isn't designed for RSS feeds. Along with the file size issue idonthack mentioned (torrents are only a win when the size of the file being transferred is much, much larger than the coordination overhead, which is generally not the case for RSS), BitTorrent is also not designed for files that change over time; it would require a complete overhaul of the protocol because the file hashes that are the foundation of the protocol would be constantly changing.

      There is room for coordination with BitTorrent, though; imagine a Pastry-based P2P feed that then used RSS enclosures to tie into a (trackerless?) BitTorrent feed for a fully distributed pod-/vid-/file-casting solution that anybody could run with no fear of the bandwidth involved.

      Tack in some sort of P2P web system, and in theory, you could run a massively popular podcast/blog with millions of hits a day off of your cable modem. (Although something with a bit more upstreaming oomph would be good for the rarely-requested content that falls out of the P2P; anyhow, any ol' webhost could handle this kind of bandwidth.)

      I think this is a worthy goal, as if nothing else, popular websites run for fun would no longer be faced with the dilemma of advertising to cover bandwidth costs or going offline.
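Jerf's point about hashes can be shown with a small Python sketch. BitTorrent v1 metainfo stores a SHA-1 digest for every fixed-size piece of the payload, and the torrent's info-hash is computed over the info dictionary that contains that piece list, so an edited feed yields a different torrent. The feed strings and the 16-byte piece size below are made-up values chosen to keep the example small; real torrents use far larger pieces.

```python
import hashlib


def piece_hashes(data: bytes, piece_length: int) -> list[str]:
    """SHA-1 digest of each fixed-size piece, as listed in a BitTorrent v1 torrent."""
    return [
        hashlib.sha1(data[i:i + piece_length]).hexdigest()
        for i in range(0, len(data), piece_length)
    ]


# Two hypothetical versions of the same feed: a new item is prepended.
feed_v1 = b"<rss><channel><item>Story A</item></channel></rss>"
feed_v2 = b"<rss><channel><item>Story B</item><item>Story A</item></channel></rss>"

old = piece_hashes(feed_v1, 16)
new = piece_hashes(feed_v2, 16)

# Only the untouched leading piece keeps its hash; every byte after the edit shifts.
print(old == new)                                  # False
print(sum(1 for a, b in zip(old, new) if a == b))  # 1
```

Prepending a single item shifts every byte after it, which is why even a small feed update invalidates almost every piece and would require publishing a new torrent.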
    • Re:Why? (Score:1, Interesting)

      by Anonymous Coward
      The particular p2p application isn't really newsworthy. The overlay network (Pastry) is. The Pastry codebase appears to be mostly sponsored by Microsoft, is written in Java, and has a 'BSD-like' license. If all that doesn't give you the shivers, then you must have been in Microsoft hell long enough to start getting comfortable.

      Mark my words: Microsoft is going to attempt to co-opt the term 'p2p', and make it their own.
    • Ignorance is curable

      The cure: RTFA.

    • My ISP, Rogers Cable, has implemented some scheme which makes BitTorrent painfully slow, to the point of making it not worth using at all. You'll be lucky to get 10k/sec using BitTorrent, while I can get over 300k/sec from websites, FTP, etc. Even if I try to download a legitimate torrent, my upload rate will be higher than my download rate 99% of the time. So now I'm back to using IRC and xdcc list bots for my downloading needs.
  • by Animats (122034) on Monday February 20, 2006 @03:01PM (#14762622) Homepage
    It looks like they just re-invented the netnews protocol, which works in a very similar way.
    • Exactly. All they need now is support for posting new articles from client software.
    • by That's Unpossible! (722232) on Monday February 20, 2006 @03:19PM (#14762733)
      That's like saying IMAP just reinvented POP3.

      This is designed for USERS to help each other get the very latest RSS feeds using p2p tech.

      netnews is designed to let SERVERS help each other distribute messages posted by users.

      I don't really see how it is a re-invention at all.
      • > This is designed for USERS to help each other get the very latest
        > RSS feeds using p2p tech.

        > netnews is designed to let SERVERS help each other distribute
        > messages posted by users.

        > I don't really see how it is a re-invention at all.

        Usenet is a peer to peer network of "servers". This is a re-invention of the way articles propagate in Usenet.
        • Usenet is a peer to peer network of "servers". This is a re-invention of the way articles propagate in Usenet.

          Except that FeedTree doesn't propagate articles for usenet, it propagates entries posted to RSS feeds. I'm really trying to understand how this is a re-invention of netnews.
          • The assertion is that this method of transmitting nuggets of information (news entries) is similar to an older method of transmitting nuggets of information (usenet posts). Do you really not understand that the goals here are comparable?

            That said, RSS/ATOM have a single source of the truth, while usenet is a web of inserters and receivers. RSS/ATOM are uniformly linear in nature, usenet is not. RSS/ATOM are by intention very very short entries; usenet posts can be much larger. RSS/ATOM are not intended
          • Sounds like you're getting caught up in trees (terms) and missing the forest.

            USENET is a way for articles to be propagated among coordinating servers, and then users would poll those distributed servers. RSS is a way of distributing articles with only one server, and the users query it directly. What this tech does is create a way for articles to be propagated among coordinating servers, but those servers are also the users. The users then query other users, who are acting as servers, and then become se
            • > It's the USENET concept that has been more decentralized in the
              > content distribution.

              Actually the decomposition of Usenet into servers and clients is a relatively recent phenomenon. Originally we read news directly from the spool using local clients. Indeed, it is still quite possible to run your own local server as a leaf node, receiving only those newsgroups you are interested in. I have been doing just that for about twenty years.
            • So basically, this would do the same thing that USENET did, but without the network of static coordinating servers. It, instead, replaces the static servers with dynamic servers and a method of locating those dynamic servers.

              That's the way USENET used to be used, and a lot of USENET software still supports that usage, including automatically locating and subscribing to newsgroups only when a user demands it. It's become more static and centralized because users preferred using it that way, not because of a
          • Except that FeedTree doesn't propagate articles for usenet, it propagates entries posted to RSS feeds.

            USENET propagates news items with metadata in a tree-like fashion, overlayed over the Internet. FeedTree propagates news items with metadata in a tree-like fashion, overlayed over the Internet.

            There are some minor differences in standards (MIME vs. XML) and usage (well-known article hierarchies vs. ad-hoc RSS feeds), but that doesn't make FeedTree new technology. I don't think it's even a "re-invention",
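The mechanism this subthread keeps circling is flood propagation with duplicate suppression. Here is a minimal Python sketch of the Usenet style, under stated assumptions: the peering graph and the article ID are invented, and real news servers keep seen IDs in a history database and also consult the Path header, neither of which is modeled. This is not FeedTree's own algorithm, which per the summary uses multicast in Pastry.

```python
from collections import deque


def flood(peers: dict[str, list[str]], origin: str, message_id: str) -> list[str]:
    """Flood one article through a peer graph, Usenet style.

    Each node remembers the Message-IDs it has seen and forwards an article
    only the first time it arrives, so cycles in the peering graph cannot loop.
    Returns the nodes in the order they first received the article.
    """
    seen: dict[str, set[str]] = {node: set() for node in peers}
    order: list[str] = []
    queue = deque([origin])
    seen[origin].add(message_id)
    while queue:
        node = queue.popleft()
        order.append(node)
        for neighbor in peers[node]:
            if message_id not in seen[neighbor]:  # duplicate suppression
                seen[neighbor].add(message_id)
                queue.append(neighbor)
    return order


# Hypothetical peering graph containing a cycle (a-b-c-a).
peers = {"a": ["b", "c"], "b": ["a", "c", "d"], "c": ["a", "b"], "d": ["b"]}
print(flood(peers, "a", "<1234@example.org>"))  # ['a', 'b', 'c', 'd']
```

The cycle a-b-c-a is harmless because each node forwards an article only the first time it sees that Message-ID.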
          • FeedTree and Netnews both allow users to write blobs of text, in specified open formats that originally came from other applications, and use multicast technology to flood-propagate them to a network of servers where a wide range of clients can fetch them for display to people who want to read them. The indexing details are different, and the specific formats are different, and the clients are different even though people have written Mozilla readers and probably Emacs macros for both, but it's really the
    • I'm not that familiar with netnews http://mcntp.sourceforge.net/ [sourceforge.net],
      but a quick check of the sites shows a rather different architecture.

      This seems more targeted towards RSS type feeds, and looks like one of
      those rather simple and clever ideas that strike one as:

      "This looks like the way it should have been done from the beginning"

      It addresses a very real problem with current RSS news feeds, and has what
      looks like simple (that's a compliment), complete, compatible, easy to install
      software for a reasonable varie
    • No, it is more like that they are reinventing BBS: http://en.wikipedia.org/wiki/Bbs [wikipedia.org]
  • What's the best OS X feed reader to use with FeedTree? I don't care for the way Safari handles RSS.
  • s/t - has anyone run this on FreeBSD? Perhaps it works with the Linux compat modules loaded? I'd like to try this out tonight, since I have 3 sites on my FreeBSD box that have feeds that are constantly being hit...this sounds like a solution for the long term.
    • The client and publisher both run on any system with the Sun Java runtime, 1.4.2 or newer. (The networking code in Pastry requires Sun's NIO implementation.) As for the publisher helper scripts, the configurator is Python, and the run control scripts are Bourne shell.

      In other words, It Ought To Work(TM) out of the box on FBSD. If not, file a ticket [feedtree.net].

  • MMMmmmmmm, Pastry.
  • by Refried Beans (70083) on Monday February 20, 2006 @03:16PM (#14762714) Homepage
    I remember seeing something like this in my logs over a year ago. I would see lines like this in my access log:

    66.177.198.139 - Anonymous [04/Apr/2005:03:04:17 -0500] "GET /rdf10_xml HTTP/1.1" 200 5322 "" "Shrook/76p (Distributed; +http://www.fondantfancies.com/shrook/distfaq.php)"

    I haven't seen a hit from this in a while, perhaps that effort didn't gain much traction. Who knows if this one will... I never saw Shrook mentioned on Slashdot.
  • GMail RSS (Score:5, Insightful)

    by jfengel (409917) on Monday February 20, 2006 @03:17PM (#14762717) Homepage Journal
    I wonder: If GMail were to incorporate an RSS reader (the way Thunderbird does), it could potentially update many, many users with a single hit of each RSS site.

    I'm leaning towards using RSS as a way to do announcements rather than maintain a mailing list. Rather than tell me you want me to send you updates (and deal with being potentially a spammer, deal with your unsubscribe, your email address change, etc.), just poll my site every so often (days, for the lists I'm talking about; hours, for Slashdot) and let it show up in your mail queue.

    The idea isn't quite ready for prime time; too few people use RSS. But GMail could make that happen in one fell swoop. Well, two fell swoops: you'd need some sort of browser extension to make the little orange "RSS feed" button notify GMail.

    I wonder if just having GMail (and hotmail, aol, and yahoo) handle that would solve the problem to the point where we no longer needed a P2P RSS distribution system.

    Alternatively, if ISPs were to cache the RSS feeds the way some do with certain web pages, that might also take a lot of the load off. People will still impolitely set their RSS readers to check the feed every 10 seconds, but at least it never gets out onto the backbone if it's cached at the ISP.
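The ISP-caching idea amounts to a time-limited cache in front of the origin server. Here is a minimal Python sketch. The class name, the URL, and the fixed TTL are hypothetical, and a real HTTP proxy would honor the feed's own caching headers rather than a hard-coded expiry.

```python
import time
from typing import Callable


class FeedCache:
    """Minimal TTL cache, like one a hypothetical ISP-side proxy might keep for feeds.

    `fetch` stands in for a real HTTP request to the origin server.
    """

    def __init__(self, fetch: Callable[[str], bytes], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.fetch = fetch
        self.ttl = ttl_seconds
        self.clock = clock
        self.entries: dict[str, tuple[float, bytes]] = {}
        self.origin_hits = 0

    def get(self, url: str) -> bytes:
        now = self.clock()
        cached = self.entries.get(url)
        if cached and now - cached[0] < self.ttl:
            return cached[1]  # served locally; the origin never sees this request
        self.origin_hits += 1
        body = self.fetch(url)
        self.entries[url] = (now, body)
        return body


# Simulated clock so the example runs instantly.
fake_now = [0.0]
cache = FeedCache(lambda url: b"<rss/>", ttl_seconds=300, clock=lambda: fake_now[0])
for second in range(0, 600, 10):  # every 10 seconds for 10 minutes
    fake_now[0] = float(second)
    for _ in range(100):          # 100 impatient readers behind the proxy
        cache.get("http://example.org/feed.rss")
print(cache.origin_hits)  # 2
```

Six thousand polls from impatient readers turn into two requests to the origin server, at the cost of readers seeing content up to five minutes old.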
    • That's the way Yahoo does it, AFAIK. But I don't think they update enough. Doesn't Google also use cached RSS for their custom homepage thing? At least, for things like Slashdot.
    • Re:GMail RSS (Score:3, Informative)

      by thing12 (45050)
      Gmail has Web Clips [google.com], which is almost what you want. But what you really want is for Google Reader [google.com] to be integrated into Gmail. It probably won't be too long before that happens anyway.
      • Thanks; I didn't know about either of those. They have so much stuff I can't keep up. I still use my own domain, so I don't use my gmail much, but I hear they're planning to solve that problem, too.
    • If GMail were to incorporate an RSS reader (the way Thunderbird does), it could potentially update many, many users with a single hit of each RSS site.

      There are already several large Web-based aggregators that work this way, but for various reasons many people prefer local aggregators (just as many people prefer local mail clients instead of GMail). FeedTree solves the bandwidth and latency problem for local aggregators.
    • Haven't used Google Reader [google.com] yet, have you?

      They'll likely integrate this with GMail at some point. But that's just my opinion.
  • Wow... (Score:3, Funny)

    by SoundGuyNoise (864550) on Monday February 20, 2006 @03:21PM (#14762747) Homepage
    Try saying that headline 5 times fast!
  • by xiphoris (839465) on Monday February 20, 2006 @03:21PM (#14762752) Homepage
    As a Rice Computer Science student I would like to point out that Pastry [freepastry.org] actually originated at Rice, under Dan Sandler [rice.edu]. The first framework was in Java. You can see from his web page that he's responsible for FeedTree, too.

    Microsoft Research became interested in the product and ported it to C#, effectively turning it into the form it is now. Many classes at Rice have now "backported" it, I guess you could say, and it's used in many of our classes that involve distributed networks, such as the current COMP 410 [rice.edu] class, which previously turned out a distributed file and process system, codename Voltron [rice.edu].

    Here's a link to the paper [rice.edu] co-authored by Sandler and others at Rice.
    • Actually, my advisor, Peter Druschel [mpi-sws.mpg.de], developed Pastry with Ant Rowstron (of Microsoft Research). Since then, a number of bright researchers from Rice and elsewhere have contributed to the project; their names and publications are listed on the official Pastry website [freepastry.org].

      There are a number of implementations of the Pastry design; FeedTree uses the Java-based FreePastry [freepastry.org] package, which is under active development by Rice and the Max Planck Institute for Software Systems [mpi-sws.mpg.de] and is available under a BSD-like licen

      • If I understand it correctly, I think that Squirrel [freepastry.org] looks like a much more exciting application.

        from the site:

        SQUIRREL is a fully decentralized, peer-to-peer cooperative web cache, based on the idea of enabling web browsers on desktop machines to share their local caches.

        If everybody used this, then there'd be no need for mirrordot [mirrordot.org] and the slashdot effect would be a thing of the past and more people could afford to host pr0n on their personal websites ;)

      • Hmmm, really? I didn't know. Sorry for saying you built it. I talked to Mathias Ricken about it today (a doctoral student under Corky in the PLT group) and he told me you were largely responsible for it.

        Dr. Wong also said, during my time in COMP 410, that Rice U. was entirely responsible for Pastry before MS took it.

        So what's the real story? Did we make it or not? X_x

        Congratulations on the /.ing, btw. Where's your office? I'll come by sometime and say hi. :) I'm an undergraduate junior from Bro
    • I've heard about this trend before, but it is still very disturbing to see something like this where an application that screams out for a universal client that can be run on any platform is funded by Microsoft who dictates that the language be changed to C# leaving the original Java version to languish. Although, it's nice to see that the original is still available and has an open source license, it's disappointing that MS couldn't simply fund it as it was. As well as being a waste of money to do a port
      • Hey... wait a second... I did a little more digging and it does look like there is at least a java version of the code base. The Linux version seems to run on pure java and the library contains the pastry.jar file. Even though there's a src/net tree, where much of the code seems to reside, I'm not seeing ANY C# code.

        So, it seems I should've dug deeper before making my previous comment. Sorry about that folks :(
      • I'm not entirely sure what happened at Rice w.r.t. Pastry. What I was told by Dr. Wong [rice.edu] is that Rice had a Pastry version, and MS adopted it, converted it to C#, then allowed us to use it freely. All of this was part of an elective class called COMP 410. Basically, a team of 10-20 people acts like a software company, self-organizes, meets with a "client" (a professor acting as one) and builds a huge system.

        And yes, we use entirely Microsoft software. But I think it's a good thing. When I to
      • Oh, and I should also point out that C# and .NET are actually much more "free" technologies than Java is.

        Java is, and always has been, a proprietary technology completely specified by Sun. Sun owns the specs and decides what language features to add. Period.

        The .NET platform and C# language are fully-specified and are on their way [microsoft.com] into acceptance as international standards by the ISO. Quote:

        In July 2005, Ecma submitted [the C# and .NET] TRs to ISO/IEC JTC 1 via the latter's Fast-Track process. This

        • C# and .Net are actually much more "free" technologies than Java is

          Microsoft is in complete control over the future of the C# language and the .Net libraries and runtime. Just because they do the standards dance doesn't mean they've given up control. Do you honestly think that C# or .Net can change in a way Microsoft doesn't approve of?

          The ECMA even allows the standard to be patent-encumbered as long as Microsoft provides "reasonable and non-discriminatory" licensing fees. That makes me feel com

          • Aside from the production implementation...

            Um, no. .NET itself (the platform, SDK, etc.) is entirely free, just like Java. The only thing Microsoft has control over is the development tools. Microsoft's Visual Studio is not open source, but so what? In the grandparent post to this I pointed out several open-source .NET projects and one IDE. And there are plenty of popular non-open-source Java IDEs [jetbrains.com] out there too. No one has problems with them.

            ... and the related patents, right?

            Sorry, but that's
            • I know the production .Net implementation is free, but the Java one is semi open source. Don't strawman the IDE stuff; I never said anything about Visual Studio.

              I didn't know about the patent grant. That's good news.

              Now all that's left is your claim Sun controls Java while C# and .Net are more open because they are ECMA standards. For all practical purposes, Microsoft is in at least as much control of the future of C# and the .Net libraries as Sun is in control of Java. It doesn't matter how many standa
      • So, I work on Pastry. There are two branches of Pastry: MSPastry [microsoft.com] (developed by Microsoft Research) and FreePastry [freepastry.org] (developed initially by Rice, open source, now developed primarily at The Max Planck Institute for Software Systems [mpi-sws.mpg.de] (where I work)). They were started at roughly the same time, while Prof. Peter Druschel [mpi-sws.mpg.de] (formerly of Rice, now at MPI-SWS) was on sabbatical at MSR.

        Microsoft didn't co-opt anything, and in fact allowed and encouraged the open source Java version initially. These days I understan
  • I can't find any mention of the license terms on the Web site.
    • From the license file:
      Copyright (c) 2006, Rice University
      All rights reserved.

      Redistribution and use in source and binary forms, with or without
      modification, are permitted provided that the following conditions are met:

      * Redistributions of source code must retain the above copyright notice,
      this list of conditions and the following disclaimer.
      * Redistributions in binary form must reproduce the above copyright notice,
      this l
  • An excellent project, it deserves to become dominant in internet
    RSS news distribution.

    It's nice to be able to browse the source code.

    What can we do to encourage adoption of this, before some wretched
    proprietary format tries to muscle in?
  • I personally use Bloglines [bloglines.com] - a web based news reader. This lets me check and read my subscriptions from home and work without having to read posts twice. Google Reader [google.com] is a similar application but has tagging and merges all your feeds into one.

  • Is because RSS doesn't pay. There's no way to monetize the RSS feed, which can often be a large burden on a server in terms of CPU (if dynamic) and bandwidth.

    Micropayments would solve this. Pay 0.001 automatically for every reload and you wouldn't need a solution like this. Fix that and solve thousands of small problems at once.
    • This actually isn't true. Feedburner, for one, offers RSS advertising. I don't know if it's a pay-by-impression or pay-for-click model, though.
      • People put their RSS-aggregator to reload every fifth minute, if you're lucky you might get a real ad impression once a day. Getting any money from that kind of advertising is extremely hard compared to the cost of providing the service.
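For scale, here is the arithmetic implied by the two comments above, using the 0.001-per-reload price (the currency is unspecified in the original) and a reader polling every five minutes:

```latex
\frac{24 \times 60\ \text{min/day}}{5\ \text{min/reload}} = 288\ \text{reloads/day},
\qquad
288 \times 0.001 = 0.288\ \text{per reader per day} = 8.64\ \text{per 30-day month}.
```

The same aggressive polling that yields almost no ad impressions would, under a per-reload price, add up quickly for each reader.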
  • Scribe [microsoft.com][Technical paper pdf warning!] is a framework to do very similar things. Is this an application developed on top of that? Scribe works by building a multicast tree of the participants too.
    One interesting thing to note is that as a participant in scribe, you'll have to pass on notifications of feeds even if you're not interested in them, because you're a part of the tree and pretty much the only path to the guys below you. How does FeedTree deal with cheating/lying nodes that refuse to pass on message
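The cheating question follows from how Scribe builds its trees, as described in the Scribe paper. Each subscriber's join request is routed through the overlay toward a root node for the topic, and every node along that route becomes a forwarder, whether or not it subscribes itself. Here is a minimal Python sketch, with the routes hard-coded as hypothetical values; in Pastry they would come from prefix routing on node IDs.

```python
def build_tree(routes: dict[str, list[str]]) -> dict[str, set[str]]:
    """The union of subscribers' routes toward the topic root forms the multicast tree.

    `routes` maps each subscriber to its hop-by-hop path, ending at the root.
    Returns parent -> children edges; updates flow from the root toward the leaves.
    """
    children: dict[str, set[str]] = {}
    for path in routes.values():
        for child, parent in zip(path, path[1:]):
            children.setdefault(parent, set()).add(child)
    return children


# Hypothetical overlay routes to the root node "R" for one feed's topic.
routes = {
    "s1": ["s1", "x", "R"],
    "s2": ["s2", "x", "R"],
    "s3": ["s3", "y", "R"],
}
tree = build_tree(routes)

# Interior nodes that relay updates without having subscribed themselves.
forwarders = set(tree) - set(routes) - {"R"}
print(sorted(forwarders))  # ['x', 'y']
```

Nodes x and y relay every update for this feed without having asked for it. A node that silently drops messages would cut off its whole subtree, which is exactly the concern raised.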
  • by DrHanser (845654) on Monday February 20, 2006 @04:25PM (#14763109) Homepage
    I'm afraid I don't understand what problem this is solving. It's like a solution that's still looking for a problem to solve. As an end user, why should I care? I'm not trolling; I just don't get it.
    • The problem is that users aren't able to retrieve RSS feeds as often as they would like. They must wait longer between updates of the feed, to save bandwidth and decrease load on the server that is hosting the content.

      This solution, however, would allow one source to poll the server and use P2P to transfer the content to as many clients as want it, as often as they want it. No additional strain is put on the source, and the clients are all happy.
  • I've been thinking for quite some time of utilizing this type of P2P distributed caching proxy concept with many different protocols. RSS is just one possibility amongst many that could utilize the basic technology here. Some others might include distributed file systems, distributed caching http proxies, or even a Google competitor that uses a distributed P2P implementation of the database and utilizes everyone's everyday web activity to augment the spidering (i.e. every time anyone who is part of the P2

  • The client software (download) runs on Linux, OS X, and Windows, and works with any desktop feed reader.

    New game in town: never use the word Java. BTW, it doesn't run on Linux and Windows. Except if you install Java of course.
  • I know it's a crazy suggestion, but instead of having hundreds of people polling a single RSS feed, why not have the server which hosts the RSS feed actually PUSH the updates out to the people who are interested?

    We already have a nice and simple protocol (XMPP) which could be used for this, although admittedly PubSub isn't as final as it could be.

    • instead of having hundreds of people polling a single RSS feed, why not have the server which hosts the RSS feed actually PUSH the updates out to the people who are interested?

      FeedTree can operate in that mode, or if the server operator is too lazy to install it, FeedTree will still provide some benefit.

      We already have a nice and simple protocol (XMPP) which could be used for this, although admittedly PubSub isn't as final as it could be.

      Doesn't this lead to potentially high fanout (with the attendant conce
      • Once there is working PubSub, the work will be focused around the PubSub nodes. The site will send one message to the PubSub service, and that service would be one which is built with large scale messaging in mind.
        • If this PubSub service is centralized, then it won't be free. If it's decentralized, then it's essentially similar to FeedTree.
          • Well, I imagine that the PubSub for each site would be centralised (either at the site, or hosted somewhere else.) But each site would probably have its own distribution node, so it's decentralised in that respect. Either way it's not free, because someone still pays for all the bandwidth, and the site still pays for its hosting. ;-)
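The fanout trade-off debated here can be sketched in Python under idealized assumptions. A central PubSub service sends every copy itself, while a balanced relay tree, of the kind an overlay multicast system builds, limits each node to a fixed number of sends at the cost of extra hops. The fanout of 16 and the subscriber count are hypothetical.

```python
def star_cost(subscribers: int) -> tuple[int, int]:
    """Central server sends every copy: (max copies sent by one node, hops to a reader)."""
    return subscribers, 1


def tree_cost(subscribers: int, fanout: int) -> tuple[int, int]:
    """Idealized balanced relay tree where each node forwards to at most `fanout`
    children (fanout must be >= 2).

    Depth is the smallest d with fanout**d >= subscribers, computed with integers
    to avoid floating-point rounding in logarithms.
    """
    depth, reach = 0, 1
    while reach < subscribers:
        reach *= fanout
        depth += 1
    return min(fanout, subscribers), max(depth, 1)


print(star_cost(10_000))      # (10000, 1)
print(tree_cost(10_000, 16))  # (16, 4)
```

The central node's load grows linearly with the audience, while in the tree no node sends more than 16 copies and readers are at most four hops away.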
  • More details at How I Invented a Decentralised Scaleable Push-Based Micronews System in 2000 [1729.com].

    If nothing else, my documented but unimplemented invention might be good prior art, should it be needed.

  • by Zwets (645911)

    Distributed peer-to-peer web 2.0 rss news updates? You young whippersnappers and your fancy-schmancy names!

    In my day we simply called it gossip!
