The Internet

When RSS Traffic Looks Like a DDoS (443 comments)

An anonymous reader writes "InfoWorld's CTO Chad Dickerson says he has a love/hate relationship with RSS. He loves the changes to his information production and consumption, but he hates the behavior of some RSS feed readers. Every hour, InfoWorld "sees a massive surge of RSS newsreader activity" that "has all the characteristics of a distributed DoS attack." So many requests in such a short period of time are creating scaling issues." We've seen similar problems over the years. RSS (or as it should be called, "Speedfeed") is such a useful thing, it's unfortunate that it's ultimately just very stupid.
This discussion has been archived. No new comments can be posted.


Comments Filter:
  • RSS maybe (Score:3, Funny)

    by Anonymous Coward on Tuesday July 20, 2004 @03:33PM (#9752005)
    RSS may be ultimately stupid, but you didn't get first post, did you! Rookie!
  • Yesterday (Score:3, Interesting)

    by ravan_a ( 222804 ) on Tuesday July 20, 2004 @03:34PM (#9752009)
    Does this have anything to do with the /. problems yesterday?
    • Re:Yesterday (Score:2, Interesting)

      by afidel ( 530433 )
      Oh how prophetic. I went to check the first reply to your post and Slashdot again did the white page thing (top and left borders with a white page and no right border). Earlier today (around noon EST) I was getting nothing but 503s. This new code has not been good to Slashdot.
  • netcraft article (Score:5, Informative)

    by croddy ( 659025 ) on Tuesday July 20, 2004 @03:34PM (#9752010)
    another article [netcraft.com]
  • by xplosiv ( 129880 ) on Tuesday July 20, 2004 @03:34PM (#9752019)
    Can't one just write a small PHP script or something that returns an error (e.g. 500)? Less data to send back, and hopefully the reader would just try again later.
    • That kind of eliminates the point of having the RSS at all, as the user no longer gets up-to-the-minute information.

      Also, I doubt that the major problem here is bandwidth, more the number of requests the server has to deal with. RSS feeds are quite small (just text most of the time). The server would still have to run that PHP script you suggest.
      • Well, sure, if you want it up to the absolute second, but if you spread the requests across, say, 5 minutes or something similar, it would certainly help, and I doubt most people would complain.
      • I think that the problem is the peak load - unfortunately the RSS readers all download at the same time (they should be more uniformly distributed within the minimum update period). This means that you have to design your system to cope with the peak load, but then all that capacity is sitting idle the rest of the time.

        The electricity production system has the same problem.

    • by mgoodman ( 250332 ) * on Tuesday July 20, 2004 @03:40PM (#9752103)
      Then their RSS client would barf on the input and the user wouldn't see any of the previously downloaded news feeds, in some cases.

      Or rather, anyone who programs an RSS reader so badly that every client downloads information every hour on the hour would probably also write one that barfs on the input of a 500 or 404 error.

      Most RSS readers *should* just download every hour from the time they start, making the download intervals between users more or less random and well-dispersed. And if you want it more often than every hour, well, then edit the source and compile it yourself :P
    • by ameoba ( 173803 ) on Tuesday July 20, 2004 @03:58PM (#9752369)
      It seems kinda stupid to have the clients basing their updates on clock time. Doing an update on client startup and then every 60 minutes after that would be just as easy as doing it on the clock time & would basically eliminate the whole DDoS-esque thing.
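
      A minimal sketch of that startup-relative approach in Python (the feed URL and the amount of jitter are made up for illustration):

        import random
        import time
        import urllib.request

        FEED_URL = "http://example.com/rss/news.xml"   # hypothetical feed
        BASE_INTERVAL = 60 * 60                         # nominal one-hour poll

        def fetch_feed():
            # Plain GET for brevity; a real reader would also send conditional request headers.
            with urllib.request.urlopen(FEED_URL) as resp:
                return resp.read()

        def poll_forever():
            fetch_feed()  # first check at startup, not on the clock hour
            while True:
                # +/- 5 minutes of jitter so clients drift apart over time
                time.sleep(BASE_INTERVAL + random.randint(-300, 300))
                fetch_feed()

      Because the schedule is anchored to each client's start time rather than to the wall clock, the hourly stampede flattens out on its own.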
  • Simple HTTP Solution (Score:3, Informative)

    by inertia187 ( 156602 ) * on Tuesday July 20, 2004 @03:34PM (#9752020) Homepage Journal
    The readers should issue a HEAD request to see if Last-Modified has changed... And the feed rendering engines should make sure their Last-Modified is accurate.
    • by skraps ( 650379 ) on Tuesday July 20, 2004 @03:45PM (#9752181)

      This "optimization" will not have any long-lasting benefits. There are at least three variables in this equation:

      1. Number of users
      2. Number of RSS feeds
      3. Size of each request

      This optimization only addresses #3, which is the least likely to grow as time goes on.

      • There are at least three variables in this equation:
        1. Number of users
        2. Number of RSS feeds
        3. Size of each request


        And I'll add:
        4. Time at which each request occurs

        If RSS requests were evenly distributed throughout the hour, the problems would be minimal. When every single RSS reader assumes that updates should be checked exactly at X o'clock on the hour, you get problems.
    • by ry4an ( 1568 ) <ry4an-slashdot@ry[ ].org ['4an' in gap]> on Tuesday July 20, 2004 @03:48PM (#9752231) Homepage
      Better than that, they should use the HTTP/1.1 (RFC 2616) If-Modified-Since: header in their GETs, as specified in section 14.25. That way, if it has changed, they don't have to do a subsequent GET.

      Someone did a nice write-up about doing so [pastiche.org] back in 2002.
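
      A minimal sketch of such a conditional GET in Python (the feed URL would be hypothetical; ETag/If-None-Match is shown alongside If-Modified-Since since servers may support either validator):

        import urllib.request
        from urllib.error import HTTPError

        def fetch_if_modified(url, last_modified=None, etag=None):
            req = urllib.request.Request(url)
            if last_modified:
                req.add_header("If-Modified-Since", last_modified)
            if etag:
                req.add_header("If-None-Match", etag)
            try:
                with urllib.request.urlopen(req) as resp:
                    # 200: feed changed; remember the validators for next time
                    return resp.read(), resp.headers.get("Last-Modified"), resp.headers.get("ETag")
            except HTTPError as e:
                if e.code == 304:
                    # Not modified: tiny response, no feed body transferred
                    return None, last_modified, etag
                raise

      Unlike HEAD-then-GET, a changed feed comes back in the same round trip, and an unchanged feed costs only the 304 status line and headers.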

    • So, he's writing from infoworld and complaining that RSS feed readers grab feeds whether the data has changed or not. So, I went to look for infoworld's RSS feeds. Found them at:

      http://www.infoworld.com/rss/rss_info.html

      Trying the top news feed, got back:

      date -u ; curl --head http://www.infoworld.com/rss/news.xml
      Tue Jul 20 19:51:44 GMT 2004
      HTTP/1.1 200 OK
      Date: Tue, 20 Jul 2004 19:48:30 GMT
      Server: Apache
      Accept-Ranges: bytes
      Content-Length: 7520
      Content-Type: text/html; charset=UTF-8

      How do I write an RSS re
    • by jesser ( 77961 ) on Tuesday July 20, 2004 @04:08PM (#9752508) Homepage Journal
      Even if every RSS reader used HEAD (or if-modified-since) correctly, servers would still get hammered on the hour when the RSS feed has been updated during the hour. If-modified-since saves you bandwidth over the course of a day or month, but it doesn't reduce peak usage.
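
      Even so, an accurate Last-Modified (which the InfoWorld feed shown above doesn't send at all) is the prerequisite for any of this. A rough server-side sketch in Python, assuming a static feed file at a hypothetical path:

        import os
        from email.utils import formatdate, parsedate_to_datetime
        from http.server import BaseHTTPRequestHandler, HTTPServer

        FEED_PATH = "news.xml"  # hypothetical static feed file

        class FeedHandler(BaseHTTPRequestHandler):
            def do_GET(self):
                mtime = int(os.path.getmtime(FEED_PATH))
                ims = self.headers.get("If-Modified-Since")
                if ims:
                    try:
                        if int(parsedate_to_datetime(ims).timestamp()) >= mtime:
                            self.send_response(304)   # unchanged: no body sent
                            self.end_headers()
                            return
                    except (AttributeError, TypeError, ValueError):
                        pass                          # unparseable date: fall through to 200
                with open(FEED_PATH, "rb") as f:
                    body = f.read()
                self.send_response(200)
                self.send_header("Content-Type", "application/rss+xml")
                self.send_header("Last-Modified", formatdate(mtime, usegmt=True))
                self.send_header("Content-Length", str(len(body)))
                self.end_headers()
                self.wfile.write(body)

        HTTPServer(("", 8000), FeedHandler).serve_forever()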
  • ...so could someone recommend a couple of really good ones for Windows and *nix?
  • Call me stupid (Score:5, Informative)

    by nebaz ( 453974 ) on Tuesday July 20, 2004 @03:35PM (#9752031)
    This [xml.com] is helpful.
  • by Patik ( 584959 ) * <.cpatik. .at. .gmail.com.> on Tuesday July 20, 2004 @03:37PM (#9752047) Homepage Journal
    I don't really care for RSS either, but damn, was that necessary?
  • We've seen similar problems over the years. RSS (or as it should be called, "Speedfeed") is such a useful thing, it's unfortunate that it's ultimately just very stupid.

    And it seems to have gotten worse since the new code was installed; I get 503 errors at the top of every hour now on Slashdot.
  • by el-spectre ( 668104 ) on Tuesday July 20, 2004 @03:37PM (#9752051) Journal
    Since many clients request the new data every 30 minutes or so... how about a simple system that spreads out the load? A page that, based on some criteria (domain name, IP, random seed, round robin), gives each client a time it should check for updates (e.g. 17 past the hour).

    Of course, this depends on the client to respect the request, but we already have systems that do (robots.txt), and they seem to work fairly well, most of the time.

    • Just thinking about it... all you really need is a script that has a cycling counter from 0-59 and responds to a GET. It would take about two minutes to write in the language of your choice.
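
      A minimal sketch of that counter script in Python (hypothetical port; it only helps if clients actually honour the minute they're handed):

        from http.server import BaseHTTPRequestHandler, HTTPServer
        from itertools import count

        counter = count()  # increments across requests

        class SlotHandler(BaseHTTPRequestHandler):
            def do_GET(self):
                minute = next(counter) % 60          # cycle through minutes 0-59
                body = str(minute).encode()          # "check at :MM past each hour"
                self.send_response(200)
                self.send_header("Content-Type", "text/plain")
                self.send_header("Content-Length", str(len(body)))
                self.end_headers()
                self.wfile.write(body)

        HTTPServer(("", 8001), SlotHandler).serve_forever()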
    • by cmdr_beeftaco ( 562067 ) on Tuesday July 20, 2004 @03:50PM (#9752260)
      Bad idea. Everyone knows that most headlines are made at the top of the hour. Thus, AM radio always gives news headlines "at the top of the hour." RSS readers should be given the same timely updates.
      Related to this is the fact that most traffic accidents happen "on the twenties." Human nature is a curious and seemingly very predictable thing.
    • RSS already supports the <ttl> element type [harvard.edu], which indicates how long a client should wait before looking for an update. Additionally, HTTP servers can provide this information through the Expires header.

      Furthermore, well-behaved clients issue a "conditional GET" that only requests the file if it has been updated, which cuts back on bandwidth quite a bit, as only a short response saying it hasn't been updated is necessary in most cases.
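
      A rough sketch of honouring <ttl> on the client side (Python; the feed URL would be hypothetical, and it falls back to an hour when the element is missing):

        import urllib.request
        import xml.etree.ElementTree as ET

        def poll_interval_minutes(feed_url, default=60):
            with urllib.request.urlopen(feed_url) as resp:
                tree = ET.parse(resp)
            ttl = tree.find("./channel/ttl")          # RSS 2.0 <ttl> is in minutes
            if ttl is not None and ttl.text and ttl.text.strip().isdigit():
                return int(ttl.text.strip())
            return default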

  • by Russ Nelson ( 33911 ) <slashdot@russnelson.com> on Tuesday July 20, 2004 @03:37PM (#9752058) Homepage
    RSS just needs better TCP stacks. Here's how it would work: when your RSS client connects to an RSS server, it would simply leave the connection open until the next time the RSS data got updated. Then you would receive a copy of the RSS content. You simply *couldn't* fetch data that hadn't been updated.

    The reason this needs better TCP stacks is because every open connection is stored in kernel memory. That's not necessary. Once you have the connecting ip, port, and sequence number, those should go into a database, to be pulled out later when the content has been updated.
    -russ
    • Several people have pointed out that you want a schedule. No, you don't. That just foments the stampeding herd of clients. You really want to allow people to connect whenever they want, and then receive data only when you're ready and able to send it back.

      Basically, you use the TCP connection as a subscription. Call it "repeated confirmation of opt-in" if you want. Every time the user re-connects to get the next update (which they will probably do immediately; may as well) that's an indication that th
    • Yeah, because there's nothing like using a sledgehammer to crack a hazelnut.

      For starters, how about the readers play nice and spread their updates around a bit instead of all clamoring at the same time.

    • I'm not sure the server could handle having that many open connections...hence its current process of providing an extremely small text file, creating a connection, transferring the file, and destroying the connection.

      Correct me if I'm wrong.
    • by Salamander ( 33735 ) <jeff AT pl DOT atyp DOT us> on Tuesday July 20, 2004 @03:58PM (#9752372) Homepage Journal

      Leaving thousands upon thousands of connections open on the server is a terrible idea no matter how well-implemented the TCP stack is. The real solution is to use some sort of distributed mirroring facility so everyone could connect to a nearby copy of the feed and spread the load. The even better solution would be to distribute asynchronous update notifications as well as data, because polling always sucks. Each client would then get a message saying "xxx has updated, please fetch a copy from your nearest mirror" only when the content changes, providing darn near optimal network efficiency.

      • Sigh. You don't get it, do you? You suggest different protocols, when TCP works just fine. The reason you want to stay with TCP is because of the infinite number of ways people have implemented TCP. Just as one HUGE example, consider all the people behind NAT. How are you going to "distribute asynchronous update notifications"?

        I'd like to hear one person, just one person say "Hmmm.... I wonder why Russ didn't suggest asynchronous update notifications?" And then maybe go on to answer themselves by sa
    • Reimplementing TCP using a database is excessive. Making a lightweight connectionless protocol that does something similar to what you described would be a lot simpler and wouldn't require reimplementing everyone's TCP stack.

      Also, as much as I hate the fad of labelling everything P2P, having a P2P-ish network for this would help, too. The original server can just hand out MD5s, and clients propagate the actual text throughout the network.

      Of course (and this relates to the P2P stuff), every newfangled toy these days is ju

    • by Anonymous Coward
      Yeah, just use a database backend for TCP, good idea. Oh! I know! Let's use XML instead! Jesus Christ, if you are this stupid, just shut your hole. Don't propose retarded solutions to problems you don't understand just because you are bored.
    • Need better TCP stacks? I don't understand why you would break standard TCP in order to accomplish a scattered update. If it's really an issue, the server should just set a hard limit on hits to uselessinfo.rss and cease to return anything other than a 20-byte error message once that limit has been reached. There's _much_ more potential for DDoS with a modified TCP stack. Sounds like you learned a little bit about protocols in your compsci class and now everything is a TCP issue -- the problem is poor c
  • RSS readers and aggregators shouldn't gather new feeds every hour on the hour. They should gather them when the application is first run and then every hour after that (probably not on the hour). I'd hope most GUI applications already run this way. I guess most of this traffic just comes from daemon processes -- and that should be changed.
  • Idea (Score:5, Interesting)

    by iamdrscience ( 541136 ) on Tuesday July 20, 2004 @03:38PM (#9752073) Homepage
    Well maybe somebody should set something up to syndicate RSS feeds via a peer to peer service. BitTorrent would work, but it could be improved upon (people would still be grabbing a torrent every hour, so it wouldn't completely solve the problem).
    • Re:Idea (Score:5, Interesting)

      by ganhawk ( 703420 ) on Tuesday July 20, 2004 @03:53PM (#9752296)
      You could have a system based on JXTA. Instead of the BitTorrent model, it would be something like P2P radio. When the user asks for the feed, a neighbour who just received it can give it to the user (overlay network, JXTA based), or the server can point to one of the users who just received it (similar to BitTorrent, but the user gets the whole file from a peer instead of parts. The user also does not come back to the server at all if the transfer is successful. The problem is that this user need not serve others and can just leech).

      I feel the overlay network scheme would work better than a BitTorrent/tracker based system. In the overlay network scheme, each group of the network will have its own ultra peer (JXTA rendezvous) which acts as tracker for all files in that network. I wanted to do this for the slashdot effect (p2pbridge.sf.net) but somehow the project has been delayed for long.

  • Why have developers made their RSS readers so that they query the master site at each hour sharp? Why haven't they done it like Opera or Konqueror, e.g. query the server every sixty minutes after the application has been started?

    Or did the RSS reader authors hope that their applications wouldn't be used by anybody except for a few geeks?

    • Won't help (Score:2, Interesting)

      by Animats ( 122034 )
      Doesn't matter. If lots of people poll every hour, eventually the polls will synch up. There's a neat phase-locking effect. Van Jacobson analyzed this over twenty years ago.

      We have way too much traffic from dumb P2P schemes today, considering the relatively small volume of new content being distributed.

    • At first I didn't understand the article...
      You mean RSS readers are programmed to fetch the feed at hour xx:00??

      That's fantastic

      Some programmers should be shot...
    • I believe that's what SharpReader does. One thing I personally do is adjust the refresh rate for each feed from the one hour default. There's no point in banging on a feed every hour when it changes a few times a week.

      One good idea would be for the protocols to allow each feed to suggest a default refresh rate. That way slow changing or overloaded sites could ask readers to slow down a little. A minimum refresh warning rate would be good too. (i.e. Refreshing faster than that rate might get you nuked.) I k

  • by SuperBanana ( 662181 ) on Tuesday July 20, 2004 @03:39PM (#9752090)

    ...is what one would say to the designers of RSS.

    Mainly, IF your client is smart enough to communicate that it only needs part of the page, guess what? The pages, especially after gzip compression (which, with mod_gzip, can even be done ahead of time), are tiny anyway... the real overhead is all the nonsense, both at the protocol level and for the server in terms of CPU time, of opening and closing a TCP connection.

    It's also the fault of the designers for not including strict rules as part of the standard for how frequently the client is allowed to check back, and, duh, the client shouldn't be user-configured to check at common times, like on the hour.

    Bram figured this out with BitTorrent: the server can instruct the client on when it should next check back.

    • No, it's not necessary to add scheduling. All that's needed is better TCP stacks which can handle millions of concurrent open connections. Presumably this would happen in a database in userland, and not in the kernel.
      -russ
    • It's also the fault of the designers for not including strict rules as part of the standard for how frequently the client is allowed to check back

      Inevitably the most popular clients are the most poorly-written ones which ignore as much of the spec as possible. Telling them what they should do is useless, because they don't listen.

      As an example, consider all the broken BitTorrent implementations out there.
  • it's the PULL,stupid (Score:4, Interesting)

    by kisrael ( 134664 ) * on Tuesday July 20, 2004 @03:40PM (#9752099) Homepage
    "Despite 'only' being XML, RSS is the driving force fulfilling the Web's original promise: making the Web useful in an exciting, real-time way."

    Err, did I miss the meeting where that was declared as the Web's original promise?

    Anyway, the trouble is pretty obvious: RSS is just a polling mechanism to do fakey Push. (Wired had an interesting retrospective on their infamous "PUSH IS THE FUTURE" hand cover about PointCast.) And that's expensive, the cyber equivalent of a horde of screaming children asking "Are we there yet? Are we there yet? How about now? Are we there yet now? Are we there yet?" It would be good if we had an equally widely used "true Push" standard, where remote clients would register as listeners, and then the server can actually publish new content to the remote sites. However, in today's heavily firewall'd internet, I dunno if that would work so well, especially for home users.

    I dunno. I kind of admit to not really grokking RSS, for me, the presentation is too much of the total package. (Or maybe I'm bitter because the weird intraday format that emerged for my own site [kisrael.com] doesn't really lend itself to RSS-ification...)
  • It's 'simple,' stupid. :)
  • Proposed Solution (Score:2, Interesting)

    by Dominatus ( 796241 )
    Here's a solution: Have the RSS readers grab data every hour or half hour starting from when they are started up, not on the hour. This would of course distribute the "attacks" on the server.
  • I'd like to dispute the characterization of my client as stupid.

    I'd really, really like to.

    Obviously, I can't, but boy would I like to.

    Stupid RSS.

  • In any commons, co-operation is key. I doubt most people will update their clients to work with HEAD or some sort of checksumming without reason, so the first obvious step is to block clients for a period. If a client retrieves information from a host, place a ban on all requests from said client until either the information changes or there is a timeout.

    On the client side, the software needs to be written to check for updates to the data before pulling the data. This will lessen the burden.

    The
  • Why not have RSS readers that check on startup, then, after a random amount of time has passed, check again at a user-specified interval?
    user starts program at 3.15 and it checks rss feed.
    user sets check interval to 1 hour.
    rand()%60 minutes later (let's say 37) it checks feed
    every hour after that it checks the feed.

    Simplistic, sure, but isn't RSS in general?

    On an aside, for any of you (few) non-programmers interested in creating RSS feeds, I put out some software that facilitates it:
    hunterdavis.com/ssrss.html
  • Push, not pull! (Score:5, Interesting)

    by mcrbids ( 148650 ) on Tuesday July 20, 2004 @03:46PM (#9752192) Journal
    The basic problem with RSS is that it's a "pull" method - RSS clients have to make periodic requests "just to see". Also, there's no effective way to mirror content.

    That's just plain retarded.

    What they *should* do...

    1) Content should be pushed from the source, so only *necessary* traffic is generated. It should be encrypted with a certificate so that clients can be sure they're getting content from the "right" server.

    2) Any RSS client should also be able to act as a server, NTP style. Because of the certificate used in #1, this could be done easily while still ensuring that the content came from the "real" source.

    3) Subscription to the RSS feed could be done on a "hand-off" basis. In other words, a client makes a request to be added to the update pool on the root RSS server. It either accepts the request or redirects the client to one of its already-set-up clients, whereupon the process starts all over again. The client requests subscription to the service, and the request is either accepted or deferred. Wash, rinse, repeat until the subscription is accepted.

    The result of this would be a system that could scale to just about any size, easily.

    Anybody want to write it? (Unfortunately, my time is TAPPED!)
    • Re:Push, not pull! (Score:3, Interesting)

      by stratjakt ( 596332 )
      Too many firewalls in today's world for "push" anything to work.

      Too many upstream bandwidth restrictions, especially on home connections. Last thing people want is getting AUPped because they're mirroring slashdot headlines.

      My solution? Multicast IPs. Multicast IPs solve every problem that's ever been encountered by mankind. Join Multicast, listen till you've heard all the headlines (which repeat ad nauseum), move on with life. Heck, keep listening if ya want. All we have to do is make it work.

      Frank
    • Re:Push, not pull! (Score:4, Informative)

      by laird ( 2705 ) <lairdp@gmail.TWAINcom minus author> on Tuesday July 20, 2004 @04:37PM (#9752903) Journal
      The ICE syndication protocol has solved this. See http://www.icestandard.org.

      The short version is that ICE is far more bandwidth efficient than RSS because:
      - the syndicator and subscriber can negotiate whether to push or pull the content. So if the network allows for true push, the syndicator can push the updates, which is most efficient. This eliminates all of the "check every hour" that crushes RSS syndicators. And while many home users are behind NAT, web sites aren't, and web sites generate tons of syndication traffic that could be handled way more efficiently by ICE. Push means that there are many fewer updates transmitted, and that the updates that are sent are more timely.
      - ICE supports incremental updates, so the syndicator can send only the new or changed information. This means that the updates that are transmitted are far more efficient. For example, rather than responding to 99% of updates with "here are the same ten stories I sent you last time" you can reply with a tiny "no new stories" message.
      - ICE also has a scheduling mechanism, so you can tell a subscriber exactly how often you update (e.g. hourly, 9-5, Monday-Friday). This means that even for polling, you're not wasting resources being polled all night. This saves tons of bandwidth for people doing pull updates.
  • by Misch ( 158807 ) on Tuesday July 20, 2004 @03:46PM (#9752193) Homepage
    I seem to remember the Windows scheduler being able to randomize scheduled event times within a one-hour window. I think our RSS readers need similar functions.
  • Comment removed (Score:5, Interesting)

    by account_deleted ( 4530225 ) on Tuesday July 20, 2004 @03:54PM (#9752302)
    Comment removed based on user account deletion
  • by wfberg ( 24378 ) on Tuesday July 20, 2004 @03:54PM (#9752310)
    Complaining about people connecting to your RSS feeds "impolitely" is missing the mark a bit, I think. Even RSS readers that *do* check when the file was last changed, still download the entire feed when so much as a single character has changed.

    There used to be a system where you could pull a list of recently posted articles off of a server that your ISP had installed locally, and only get the newest headers, and then decide which article bodies to retrieve.. The articles could even contain rich content, like HTML and binary files. And to top it off, articles posted by some-one across the globe were transmitted from ISP to ISP, spreading over the world like an expanding mesh.

    They called this.. USENET..

    I realize that RSS is "teh hotness" and Usenet is "old and busted", and that "push is dead" etc. But for Pete's sake, don't send a unicast protocol to do a multicast (even if it is at the application layer) protocol's job!

    It would of course be great if there was a "cache" hierarchy on Usenet. Newsgroups could be styled after content providers' URLs (e.g. cache.com.cnn, cache.com.livejournal.somegoth) and you could just subscribe to crap that way. There's nothing magical about what RSS readers do that requires the underlying stuff to be all RSS-y and HTTP-y...

    For real push you could even send the RSS via SMTP, and you could use your ISPs outgoing mail server to multiply your bandwidth (i.e. BCC).
    • by fiftyvolts ( 642861 ) <mtoia@noSPAm.fiftyvolts.com> on Tuesday July 20, 2004 @04:28PM (#9752785) Homepage Journal

      You make some very good points. The old saying "When all you have is a hammer, everything looks like a nail" seems to ring true time and time again. These days it seems that everyone wants to use HTTP for everything and quite frankly it's not equipped to do that.

      RSS over SMTP sounds pretty cool. Heck, just sending a list of subscribers an email of RSS and let their mail clients sort it out would be pretty nice.

      Heh, my favorite posts are when someone suggests something that sounds totally novel and then someone else points out "Yeah! Like <insert old and underused technology>. It seems to do that damn well." The internet cannot forget its roots!

  • The main problem with most RSS feeds is that they update all information. Most of these run off a simple JavaScript that runs on a timer to get all the data again and again. A better solution would be to implement an XML RSS (or any language, really) that uses a simple ID system for news items. When it's time to update the news feed, find any new IDs; don't retrieve existing data, only new data. This would cut down a large chunk of bandwidth. A better idea would be to implement some type of
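
    A rough client-side sketch of that ID idea using the <guid> element RSS items already carry (Python; the feed URL would be hypothetical, and note this only avoids re-processing old items, not re-downloading the file itself):

      import urllib.request
      import xml.etree.ElementTree as ET

      seen_ids = set()

      def new_items(feed_url):
          with urllib.request.urlopen(feed_url) as resp:
              root = ET.parse(resp).getroot()
          fresh = []
          for item in root.iterfind("./channel/item"):
              # Fall back to the link when a feed omits <guid>
              item_id = item.findtext("guid") or item.findtext("link")
              if item_id and item_id not in seen_ids:
                  seen_ids.add(item_id)
                  fresh.append(item.findtext("title"))
          return fresh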
  • by ToadMan8 ( 521480 ) on Tuesday July 20, 2004 @03:59PM (#9752399)
    /. is especially pissy with this, but I want breaking news, not whatever is new each hour. So I hammer the shit out of it with my client (and get banned). I'd like to see a service where I download one client (that has front-ends in the GNOME panel, the Windows tray, etc.) that the site (/., cspan, etc.) _pushes_ new updates to when I sign up. Those with dynamic IPs could, when they sign on, have their client automagically connect to a server that holds their unique user ID with their IP.
  • Using the distributed DNS system, or a system like DNS, we could push RSS content down to local servers. You'd still have to go to the site for the actual content, but the headlines would be distributed.

    This would be an ideal solution, since most RSS feeds are a few K. There's room for a lot of RSS in 1 megabyte.

    Of course, a caching proxy server would do the same thing.
  • "Speedfeed" is such a useful thing, it's unfortunate that it's ultimately just very stupid.

    Yeah, it is stupid, which is why most of us just call it RSS.
  • by PCM2 ( 4486 ) on Tuesday July 20, 2004 @04:05PM (#9752465) Homepage
    Am I the only one who finds it easier to get the information I want from the home pages of the sites I trust, rather than relying on an RSS feed? For one thing, in an RSS feed every story has the same priority ... stories keep coming in and I have no idea which ones are "bigger" than others. Sites like News.com, on the other hand, follow the newspaper's example of printing the headlines for the more important stories bigger. With RSS, it's just information overload, especially with the same stories duplicated at different sources, etc. Everyone seems really excited about RSS, but when I tried it I just couldn't figure out how to use it such that it would actually give me some real value vs. the resources I already have.
  • Hmmm. I'm neck-deep in DNS code anyway [doxpara.com]; is there any interest in a protocol that would encode update times -- probably not the updates themselves -- in DNS?

    The concept is that every time you updated your blog, you'd do a Dynamic DNS push to a RSS name, say, rss.www.slashdot.org's TXT record, containing the Unix time in seconds of the last update (alternatively, and this is how I'd probably implement it in my custom server, lookups to rss.www.slashdot.org would cause a date-check on the entry). The TTL of
    • Actually, I'll make this even easier, and use the fact that 32 bit IP addresses fit 32 bit unix timestamps juuuuuuuust fine. So you'd do a gethostbyname and recast the response!

      --Dan
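
      A rough sketch of that recast trick (Python; the rss.* hostname is hypothetical, and the whole scheme assumes the A record really holds a packed Unix timestamp rather than an address):

        import socket
        import struct
        from datetime import datetime, timezone

        def last_update_via_dns(name="rss.www.slashdot.org"):
            ip = socket.gethostbyname(name)                      # dotted-quad string
            packed = socket.inet_aton(ip)                        # 4 raw bytes
            ts = struct.unpack("!I", packed)[0]                  # 32-bit unsigned int
            return datetime.fromtimestamp(ts, tz=timezone.utc)   # treat it as seconds since the epoch

      The client would then only do an HTTP fetch when the decoded timestamp has advanced, so the per-check cost is a cacheable DNS lookup instead of a feed download.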
  • Better than doing a HEAD first to see if the feed has been updated is to use the If-Modified-Since and/or ETag headers. If the feed hasn't been updated, the server sends a very small response saying so (roughly the same size as the response to HEAD) and doesn't send the whole feed -- that all happens in one request/response. Doing HEAD first, and then GET if the feed has been updated, requires two requests and two responses any time the feed has been updated.
  • Wasn't there a /. article a while back about one of the ntp servers out there (some .edu in Washington IIRC) that was getting DDoS'd by a bunch of home-user grade DSL/Cable routers updating their clocks all the time? Isn't this basically the same problem?
  • There's a variety of ways to deal with this issue. The solution many seem to be suggesting is to randomize request times so that there aren't big spikes in traffic every hour at the hour. That's certainly a good idea. Clients should also respect the ttl [feedvalidator.org] (polling at the interval that is listed in the feed), support conditional GET [nwfusion.com], and handle 304 (not modified) responses to minimize the number of requests they make for the full feed.

    But the primary solution will end up being caching. With the exception
  • To quote their webpage:

    a blog with unlimited bandwidth

    blogs are software systems that allow you to easily post a series of documents to your website over time. Many people use blogs to display daily thoughts, rants, news stories, or pictures. If you run a blog, your readers can return to your site regularly to see the new content that you have posted. Before blogs came along, maintaining a website (and updating it regularly) was a relatively tedious process. Some might call blogging a social revolution--
  • Others have already mentioned that RSS is an attempt to fake a "push" in a technology that is all "pull".

    I have what, after about 10 minutes of thought on the subject, appears to be a better solution - every web site that currently publishes an RDF page should instead push new entries to an NNTP newsgroup. I'd suggest that a hierarchy be created for it, then sort of a reverse of the URL for the group name, like rdf.org.slashdot or rdf.uk.co.theregister. Then the articles get propagated in a distributed manner and
  • We need a way to make RSS scale, the sooner the better, before the mainstream browsers make it easy for a hundred million people to subscribe to a popular feed. Distribute feeds around using something like NNTP, so that users can poll a server near them and new items propagate out to closer servers, rather than every user polling the source.

    Read this [stevex.org] for some more thoughts on this..

  • Publish/Subscribe (Score:5, Informative)

    by dgp ( 11045 ) on Tuesday July 20, 2004 @04:19PM (#9752656) Journal
    That is mind-bogglingly inefficient. It's like POP clients checking for new email every X minutes. Polling is wrong wrong wrong! Check out the select() libc call. Does the Linux kernel go into a busy-wait loop listening for every ethernet packet? No! It gets interrupted when a packet is ready!

    http://www.mod-pubsub.org/ [mod-pubsub.org]
    The apache module mod_pubsub might be a solution.

    From the mod_pubsub FAQ:
    What is mod_pubsub?

    mod_pubsub is a set of libraries, tools, and scripts that enable publish and subscribe messaging over HTTP. mod_pubsub extends Apache by running within its mod_perl Web Server module.

    What's the benefit of developing with mod_pubsub?

    Real-time data delivery to and from Web Browsers without refreshing; without installing client-side software; and without Applets, ActiveX, or Plug-ins. This is useful for live portals and dashboards, and Web Browser notifications.

    Jabber also saw a publish/subscribe [jabber.org] mechanism as an important feature.
  • Common Sense? (Score:4, Informative)

    by djeaux ( 620938 ) on Tuesday July 20, 2004 @04:27PM (#9752772) Homepage Journal
    I publish 15 security-related RSS feeds (scrapers) at my website. In general, they are really small files, so bandwidth is usually not an issue for me. I do publish the frequency with which the feeds are refreshed (usually once per hour).

    I won't argue with those who have posted here that some alternative to the "pull" technology of RSS would be very useful. But...

    The biggest problem I see isn't newsreaders but blogs. Somebody throws together a blog, inserts a little gizmo to display one of my feeds & then the page draws down the RSS every time the page is reloaded. Given the back-and-forth nature of a lot of folks' web browsing pattern, that means a single user might draw down one of my feeds 10-15 times in a 5 minute span. Now, why couldn't the blogger's software be set to load & cache a copy of the newsfeed according to a schedule?

    The honorable mention for RSS abuse goes to the system administrator who set up a newsreader screen saver that pulled one of my feeds. He then installed the screen saver on every PC in every office of his company. Every time the screen saver activated, POW! one feed drawn down...

  • Not to flame... (Score:4, Interesting)

    by T3kno ( 51315 ) on Tuesday July 20, 2004 @04:42PM (#9752973) Homepage
    But isn't this what TCP/IP multicast was invented for? I've never really understood why multicast has never really taken off. Too complicated? Instead of entering an rss server to pull from just join a multicast group and have the RSS blasted once every X minutes. Servers could even send out updates more often because there are only a few connections to send to. Of course I could be completely wrong and multicast may be the absolute wrong choice for this sort of application, it's been a while since I've read any documentation about it.
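
    A minimal sketch of the listening side (Python; the group address and port are hypothetical, and this assumes the network between you and the feed actually routes multicast):

      import socket
      import struct

      GROUP, PORT = "239.1.2.3", 5007   # hypothetical administratively-scoped group

      sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
      sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
      sock.bind(("", PORT))
      # Join the group: packed group address + wildcard local interface
      mreq = struct.pack("4s4s", socket.inet_aton(GROUP), socket.inet_aton("0.0.0.0"))
      sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)

      while True:
          data, _ = sock.recvfrom(65535)   # one headline blast per datagram
          print(data.decode("utf-8", "replace"))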

    • Re:Not to flame... (Score:3, Informative)

      by stienman ( 51024 )
      The practical problem with multicast was that it requires an intelligent network and dumb clients. In other words: routers have to be able to keep a table of information on which links to relay multicast information, and that has to be dynamically updated periodically.

      There is a multicast overlay on top of the internet which consists of routers that can handle this load.

      But the combination of no hardware/software support in the network, and no real huge push for this technology left multicast high an
  • by Orasis ( 23315 ) on Tuesday July 20, 2004 @04:51PM (#9753075)
    The main problem here is that RSS lacks any sort of distributed flow control, much as the Internet did back in the early days with tons of UDP packets flying around everywhere and periodically bringing networks to their knees.

    One completely backwards-compatible way to add flow control to RSS would be to use the HTTP 503 response when server load is getting too high for your RSS files. The server simply sends an HTTP 503 response with a Retry-After header indicating how long the requesting client should wait before retrying.

    Clients that ignore the retry interval or are overly aggressive could be punished by further 503 responses thus basically denying those aggressive clients access to the RSS feeds. Users of overly aggressive clients would soon find that they actually provide less fresh results and would place pressure on implementors to fix their implementations.
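
    A rough sketch of that 503/Retry-After throttle on the server side (Python; the request threshold, retry window, and feed path are hypothetical knobs):

      import time
      from http.server import BaseHTTPRequestHandler, HTTPServer

      MAX_REQUESTS_PER_MINUTE = 600          # hypothetical load threshold
      window_start, window_count = time.time(), 0

      class ThrottlingFeedHandler(BaseHTTPRequestHandler):
          def do_GET(self):
              global window_start, window_count
              now = time.time()
              if now - window_start >= 60:   # start a fresh one-minute window
                  window_start, window_count = now, 0
              window_count += 1
              if window_count > MAX_REQUESTS_PER_MINUTE:
                  self.send_response(503)                  # overloaded
                  self.send_header("Retry-After", "300")   # ask the client to wait 5 minutes
                  self.end_headers()
                  return
              with open("news.xml", "rb") as f:            # hypothetical feed file
                  body = f.read()
              self.send_response(200)
              self.send_header("Content-Type", "application/rss+xml")
              self.send_header("Content-Length", str(len(body)))
              self.end_headers()
              self.wfile.write(body)

      HTTPServer(("", 8000), ThrottlingFeedHandler).serve_forever()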
  • Told Ya So (Score:4, Interesting)

    by cmacb ( 547347 ) on Tuesday July 20, 2004 @05:27PM (#9753502) Homepage Journal
    I think this was more or less the first thought I had about RSS when I first looked into it and found out that it was a "pull" technology rather than a "push" as the early descriptions of it implied.

    Yes, it's "cool" that I can set up a page (or now use a browser plug-in) to automatically get a lot of content from hundreds of web pages at a time when I really opened up the browser to check my e-mail.

    What would have REALLY been cool would be some sort of technology that would notify me when something CHANGED. No effort on my part, no *needless* effort on the server's part.

    Oh wait... We HAD that, didn't we? I think they were called listservers, and they worked just fine. (Still do, actually, as I get a number of updates, including Slashdot, that way.) RSS advocates (and I won't mention any names) keep making pronouncements like "e-mail is dead!" simply because they have gotten themselves and their hosting companies on some black hole lists. Cry me a river now that your bandwidth costs are going through the roof and yet nobody is clicking through on your web page ads, because, guess what? Nobody is visiting your page. They have all they need to know about your updates via your RSS feeds.
