The Internet

When RSS Traffic Looks Like a DDoS

An anonymous reader writes "Infoworld's CTO Chad Dickerson says he has a love/hate relationship with RSS. He loves the changes to his information production and consumption, but he hates the behavior of some RSS feed readers. Every hour, Infoworld "sees a massive surge of RSS newsreader activity" that "has all the characteristics of a distributed DoS attack." So many requests in such a short period of time are creating scaling issues." We've seen similar problems over the years. RSS (or as it should be called, "Speedfeed") is such a useful thing, it's unfortunate that it's ultimately just very stupid.
This discussion has been archived. No new comments can be posted.

When RSS Traffic Looks Like a DDoS

Comments Filter:
  • netcraft article (Score:5, Informative)

    by croddy ( 659025 ) on Tuesday July 20, 2004 @03:34PM (#9752010)
    another article [netcraft.com]
  • Simple HTTP Solution (Score:3, Informative)

    by inertia187 ( 156602 ) * on Tuesday July 20, 2004 @03:34PM (#9752020) Homepage Journal
    The readers should send a HEAD request to see if the Last-Modified value has changed... and the feed rendering engines should make sure their Last-Modified is accurate.
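
    A minimal sketch of that approach in Python (the feed URL and the idea of remembering the previous header value are illustrative assumptions, not from any particular reader):

    # Sketch: HEAD the feed and only GET it when Last-Modified changes.
    import urllib.request

    FEED_URL = "http://www.example.com/rss/news.xml"  # hypothetical feed

    def fetch_if_changed(previous_last_modified):
        head = urllib.request.Request(FEED_URL, method="HEAD")
        with urllib.request.urlopen(head) as resp:
            last_modified = resp.headers.get("Last-Modified")
        # If the server never says when it changed, we have to fetch anyway.
        if last_modified is None or last_modified != previous_last_modified:
            with urllib.request.urlopen(FEED_URL) as resp:
                return last_modified, resp.read()
        return previous_last_modified, None  # unchanged, nothing downloaded
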
  • Call me stupid (Score:5, Informative)

    by nebaz ( 453974 ) on Tuesday July 20, 2004 @03:35PM (#9752031)
    This [xml.com] is helpful.
  • We've seen similar problems over the years. RSS (or as it should be called, "Speedfeed") is such a useful thing, it's unfortunate that it's ultimately just very stupid.

    And it seems to have gotten worse since the new code was installed; I get 503 errors at the top of every hour on Slashdot now.
  • by SuperBanana ( 662181 ) on Tuesday July 20, 2004 @03:39PM (#9752090)

    ...is what one would say to the designers of RSS.

    Mainly: even IF your client is smart enough to communicate that it only needs part of the page, guess what? The pages are small, especially after gzip compression (which, with mod_gzip, can be done ahead of time). The real overhead is all the nonsense, both at the protocol level and for the server in terms of CPU time, of opening and closing a TCP connection.

    It's also the fault of the designers for not including strict rules in the standard for how frequently the client is allowed to check back, and, duh, clients shouldn't let users configure them to check at common times, like on the hour.

    Bram figured this out with BitTorrent: the tracker can instruct the client on when it should next check back.
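
    A rough sketch of letting the server dictate the next poll, the way a tracker returns an interval. Reading the hint from the HTTP Expires header is my own assumption here, not anything BitTorrent-specific, and the feed URL is hypothetical:

    import email.utils
    import time
    import urllib.request

    FEED_URL = "http://www.example.com/rss/news.xml"  # hypothetical feed
    DEFAULT_DELAY = 3600  # fall back to hourly if the server says nothing

    def next_poll_delay(headers):
        expires = headers.get("Expires")
        if expires:
            expires_at = email.utils.parsedate_to_datetime(expires).timestamp()
            return max(expires_at - time.time(), 60)  # never hammer the server
        return DEFAULT_DELAY

    while True:
        with urllib.request.urlopen(FEED_URL) as resp:
            feed = resp.read()  # hand off to the real parser here
            delay = next_poll_delay(resp.headers)
        time.sleep(delay)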

  • by ry4an ( 1568 ) <ry4an-slashdot@ry[ ].org ['4an' in gap]> on Tuesday July 20, 2004 @03:48PM (#9752231) Homepage
    Better than that, they should use the HTTP/1.1 (RFC 2616) If-Modified-Since: header in their GETs, as specified in section 14.25. That way, if it has changed, they don't have to do a subsequent GET.

    Someone did a nice write-up about doing so [pastiche.org] back in 2002.
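
    For illustration, a bare-bones conditional GET along those lines (hypothetical URL; assumes the server honours If-Modified-Since):

    import urllib.error
    import urllib.request

    FEED_URL = "http://www.example.com/rss/news.xml"  # hypothetical feed

    def conditional_get(last_modified=None):
        # RFC 2616 section 14.25: either a 304 comes back (nothing new) or
        # the full feed arrives in the same round trip.
        headers = {"If-Modified-Since": last_modified} if last_modified else {}
        req = urllib.request.Request(FEED_URL, headers=headers)
        try:
            with urllib.request.urlopen(req) as resp:
                return resp.headers.get("Last-Modified"), resp.read()
        except urllib.error.HTTPError as err:
            if err.code == 304:
                return last_modified, None  # not modified; no body transferred
            raise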

  • by Dr. Sp0ng ( 24354 ) <mspong.gmail@com> on Tuesday July 20, 2004 @03:49PM (#9752242) Homepage
    On Windows I use RSS Bandit [rssbandit.org]. Haven't found a non-sucky one for *nix, although I haven't looked all that hard. On OS X I use NetNewsWire [ranchero.com], which while not great, does the job.
  • by Eslyjah ( 245320 ) on Tuesday July 20, 2004 @03:54PM (#9752307)
    If you're using NetNewsWire on OS X, try the Atom Beta [ranchero.com], which, I'm sure it will come as no shock to you, adds support for Atom feeds.
  • by johnbeat ( 685167 ) on Tuesday July 20, 2004 @03:55PM (#9752320) Homepage
    So, he's writing from InfoWorld and complaining that RSS feed readers grab feeds whether the data has changed or not. I went to look for InfoWorld's RSS feeds and found them at:

    http://www.infoworld.com/rss/rss_info.html

    Trying the top news feed, got back:

    date -u ; curl --head http://www.infoworld.com/rss/news.xml
    Tue Jul 20 19:51:44 GMT 2004
    HTTP/1.1 200 OK
    Date: Tue, 20 Jul 2004 19:48:30 GMT
    Server: Apache
    Accept-Ranges: bytes
    Content-Length: 7520
    Content-Type: text/html; charset=UTF-8

    How do I write an RSS reader that only downloads this feed if the data has changed?

    Jerry
  • by bbh ( 210459 ) on Tuesday July 20, 2004 @04:09PM (#9752519)
    I'm using Liferea [sourceforge.net] version 0.5.1 under Linux right now. Compiles from source fine on Fedora Core 2 and has worked great for me so far.

    bbh
  • Publish/Subscribe (Score:5, Informative)

    by dgp ( 11045 ) on Tuesday July 20, 2004 @04:19PM (#9752656) Journal
    That is mind-bogglingly inefficient. It's like POP clients checking for new email every X minutes. Polling is wrong wrong wrong! Check out the select() libc call. Does the Linux kernel go into a busy-wait loop listening for every ethernet packet? No! It gets interrupted when a packet is ready!

    http://www.mod-pubsub.org/ [mod-pubsub.org]
    The apache module mod_pubsub might be a solution.

    From the mod_pubsub FAQ:
    What is mod_pubsub?

    mod_pubsub is a set of libraries, tools, and scripts that enable publish and subscribe messaging over HTTP. mod_pubsub extends Apache by running within its mod_perl Web Server module.

    What's the benefit of developing with mod_pubsub?

    Real-time data delivery to and from Web Browsers without refreshing; without installing client-side software; and without Applets, ActiveX, or Plug-ins. This is useful for live portals and dashboards, and Web Browser notifications.

    Jabber also saw a publish/subscribe [jabber.org] mechanism as an important feature.
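
    For flavour, the general publish/subscribe pattern (this is only an illustration of the idea, not mod_pubsub's or Jabber's actual API) looks something like:

    from collections import defaultdict

    class Broker:
        """Subscribers register once; the publisher pushes, nobody polls."""
        def __init__(self):
            self.subscribers = defaultdict(list)  # topic -> callbacks

        def subscribe(self, topic, callback):
            self.subscribers[topic].append(callback)

        def publish(self, topic, item):
            for callback in self.subscribers[topic]:
                callback(item)  # push the update out immediately

    broker = Broker()
    broker.subscribe("top-news", lambda item: print("new story:", item))
    broker.publish("top-news", "When RSS Traffic Looks Like a DDoS")
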
  • by anomalous cohort ( 704239 ) on Tuesday July 20, 2004 @04:19PM (#9752660) Homepage Journal
    it's unfortunate that it (RSS) is ultimately just very stupid.

    The folks over at Netscape and/or UserLand should have studied the CDF [w3.org] standard first. Then they would have realized the value of specifying schedule information.

  • Common Sense? (Score:4, Informative)

    by djeaux ( 620938 ) on Tuesday July 20, 2004 @04:27PM (#9752772) Homepage Journal
    I publish 15 security-related RSS feeds (scrapers) at my website. In general, they are really small files, so bandwidth is usually not an issue for me. I do publish the frequency with which the feeds are refreshed (usually once per hour).

    I won't argue with those who have posted here that some alternative to the "pull" technology of RSS would be very useful. But...

    The biggest problem I see isn't newsreaders but blogs. Somebody throws together a blog, inserts a little gizmo to display one of my feeds & then the page draws down the RSS every time the page is reloaded. Given the back-and-forth nature of a lot of folks' web browsing pattern, that means a single user might draw down one of my feeds 10-15 times in a 5 minute span. Now, why couldn't the blogger's software be set to load & cache a copy of the newsfeed according to a schedule?

    The honorable mention for RSS abuse goes to the system administrator who set up a newsreader screen saver that pulled one of my feeds. He then installed the screen saver on every PC in every office of his company. Every time the screen saver activated, POW! one feed drawn down...
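
    A rough sketch of the kind of server-side caching being asked for here (the path, refresh period, and feed URL are all made-up values):

    import os
    import time
    import urllib.request

    FEED_URL = "http://www.example.com/rss/news.xml"  # hypothetical feed
    CACHE_FILE = "/tmp/newsfeed-cache.xml"
    REFRESH = 3600  # fetch from the publisher at most once an hour

    def cached_feed():
        try:
            age = time.time() - os.path.getmtime(CACHE_FILE)
        except OSError:
            age = None  # no cached copy yet
        if age is None or age > REFRESH:
            with urllib.request.urlopen(FEED_URL) as resp:
                data = resp.read()
            with open(CACHE_FILE, "wb") as fh:
                fh.write(data)
            return data
        with open(CACHE_FILE, "rb") as fh:
            return fh.read()  # every page view in between hits the cache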

  • by JimDabell ( 42870 ) on Tuesday July 20, 2004 @04:31PM (#9752827) Homepage

    RSS already supports the <ttl> element type [harvard.edu], which indicates how long a client should wait before looking for an update. Additionally, HTTP servers can provide this information through the Expires header.

    Furthermore, well-behaved clients issue a "conditional GET" that only requests the file if it has been updated, which cuts back on bandwidth quite a bit, as only a short response saying it hasn't been updated is necessary in most cases.
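
    A small sketch of a client honouring <ttl> when deciding how long to wait. The feed URL is hypothetical, it assumes a plain un-namespaced RSS 2.0 feed, and it falls back to an hour when the element is missing:

    import time
    import urllib.request
    import xml.etree.ElementTree as ET

    FEED_URL = "http://www.example.com/rss/news.xml"  # hypothetical feed

    def poll_once():
        with urllib.request.urlopen(FEED_URL) as resp:
            root = ET.fromstring(resp.read())
        ttl = root.findtext("channel/ttl")  # minutes between polls, per the spec
        minutes = int(ttl) if ttl and ttl.strip().isdigit() else 60
        return minutes * 60  # seconds until the next poll

    while True:
        time.sleep(poll_once())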

  • Re:Push, not pull! (Score:4, Informative)

    by laird ( 2705 ) <lairdp@@@gmail...com> on Tuesday July 20, 2004 @04:37PM (#9752903) Journal
    The ICE syndication protocol has solved this. See http://www.icestandard.org.

    The short version is that ICE is far more bandwidth efficient than RSS because:
    - the syndicator and subscriber can negotiate whether to push or pull the content. So if the network allows for true push, the syndicator can push the updates, which is most efficient. This eliminates all of the "check every hour" that crushes RSS syndicators. And while many home users are behind NAT, web sites aren't, and web sites generate tons of syndication traffic that could be handled way more efficiently by ICE. Push means that there are many fewer updates transmitted, and that the updates that are sent are more timely.
    - ICE supports incremental updates, so the syndicator can send only the new or changed information. This means that the updates that are transmitted are far more efficient. For example, rather than responding to 99% of updates with "here are the same ten stories I sent you last time" you can reply with a tiny "no new stories" message.
    - ICE also has a scheduling mechanism, so you can tell a subscriber exactly how often you update (e.g. hourly, 9-5, Monday-Friday). This means that even for polling, you're not wasting resources being polled all night. This saves tons of bandwidth for people doing pull updates.
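
    Stripped of ICE's actual wire format, the incremental-update idea amounts to something like this (all names here are illustrative, not from the ICE spec):

    stories = []  # (sequence_number, story), newest last

    def publish(story):
        stories.append((len(stories) + 1, story))

    def updates_since(cursor):
        """Return only what the subscriber hasn't seen, or a tiny empty reply."""
        new = [(seq, story) for seq, story in stories if seq > cursor]
        if not new:
            return cursor, []  # the "no new stories" message
        return new[-1][0], [story for _, story in new]

    publish("story one")
    publish("story two")
    cursor, items = updates_since(0)       # first sync: both stories
    cursor, items = updates_since(cursor)  # later sync: empty, nearly free
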
  • PulpFiction (Score:3, Informative)

    by Cadre ( 11051 ) on Tuesday July 20, 2004 @04:44PM (#9752998) Homepage

    I recommend PulpFiction for an RSS/Atom reader on OS X [freshsqueeze.com]. I much prefer the interface and how it treats the news compared to NNW.

  • by timothyf ( 615594 ) on Tuesday July 20, 2004 @04:49PM (#9753035) Homepage
    If you don't use one computer all the time and you want to check your feeds from other places, I'd recommend going with a web-based news-aggregation service. I personally use BlogLines [bloglines.com], but there are other services out there as well.
  • by Fat Cow ( 13247 ) on Tuesday July 20, 2004 @04:58PM (#9753165)

    I think the problem is the peak load - unfortunately the RSS readers all download at the same time (they should be more uniformly distributed within the minimum update period). This means you have to design your system to cope with the peak load, but then all that capacity sits idle the rest of the time.

    The electricity production system has the same problem.
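
    The fix on the reader side is trivial: pick a random offset once so polls spread out uniformly instead of piling up at the top of the hour. A sketch (the URL and period are assumptions):

    import random
    import time
    import urllib.request

    FEED_URL = "http://www.example.com/rss/news.xml"  # hypothetical feed
    UPDATE_PERIOD = 3600  # the publisher's minimum update period

    def poll_feed():
        with urllib.request.urlopen(FEED_URL) as resp:
            return resp.read()

    time.sleep(random.uniform(0, UPDATE_PERIOD))  # personal slot, chosen once
    while True:
        poll_feed()
        time.sleep(UPDATE_PERIOD)  # keeps the same offset ever after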

  • Re:Idea (Score:2, Informative)

    by kingman ( 710269 ) on Tuesday July 20, 2004 @04:58PM (#9753167)
    Shrook [fondantfancies.com] for Mac OS X appears to do almost that, where a central server collects updates and has ONE randomly-chosen client check for updates as frequently as every five minutes, but all other clients just refer to the central server to see if feeds are updated.
  • Re:Oh, come on (Score:3, Informative)

    by Mitchell Mebane ( 594797 ) on Tuesday July 20, 2004 @05:10PM (#9753299) Homepage Journal
    Or maybe something like this [pastiche.org].
  • Re:Not to flame... (Score:2, Informative)

    by Ernesto Alvarez ( 750678 ) on Tuesday July 20, 2004 @05:23PM (#9753443) Homepage Journal
    TCP cannot multicast. It's impossible due to its connection-oriented, two-way properties.

    IP can multicast, but it needs support from the network to do it. The problem is that the internet is not under one authority that can say "from today onwards, we do multicast in such and such a way". There have been experiments with multicasting (the MBone), but some things cannot be solved easily: e.g. how do you register as a multicast client, and (important part here) how do you make every router from source to destination know about it and act accordingly (remember, those routers are NOT under the same authority)? So, even though you could multicast with UDP/IP, the logistics make it very difficult to do.

    However, within an autonomous system (which IS under a single authority) you can multicast, provided the network supports it. In fact, both standard routing protocols (OSPF and RIP), as well as NTP, can do it and have multicast groups assigned to them.

    It's too bad, but that's how the real world is....
  • Re:Not to flame... (Score:1, Informative)

    by Anonymous Coward on Tuesday July 20, 2004 @05:43PM (#9753684)
    There's another, more practical reason why multicast is not supported over the internet: it can very easily be used for a DDoS attack.

    Imagine being able to send a ping with a forged return address to the IP address *.*.*.* and getting four billion replies sent to the person who owns the forged address.
  • by blowdart ( 31458 ) on Tuesday July 20, 2004 @05:44PM (#9753694) Homepage
    You're missing the point I assume the original poster was making.

    Not all web servers provide Last-Modified or ETag headers. Infoworld doesn't, so even a well-written RSS reader has to bring the whole feed down, as it has no way to know whether it has changed or not.
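
    When the server gives you nothing to condition on, about the best a reader can do is hash what it downloaded so it at least skips re-parsing; the bandwidth is still wasted, which is exactly the problem:

    import hashlib
    import urllib.request

    FEED_URL = "http://www.infoworld.com/rss/news.xml"

    def fetch(previous_digest=None):
        # Full download every time, since there is no Last-Modified or ETag.
        with urllib.request.urlopen(FEED_URL) as resp:
            body = resp.read()
        digest = hashlib.sha1(body).hexdigest()
        if digest == previous_digest:
            return digest, None  # same bytes as last time; nothing to do
        return digest, body      # genuinely new content to parse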

  • by Poulpy ( 698167 ) on Tuesday July 20, 2004 @05:53PM (#9753786)
    Neither Windows nor Unix, but I've set up Feed on Feeds [sourceforge.net] on my webserver and I like it!

    It's a "PHP/MySQL server side RSS/Atom aggregator", so you can read your feeds wherever you are, you only need a web browser on the client side.

    Pros:
    1) you don't need to synchronize the state between the multiple workstations you might use.
    2) no platform/os problem on the client side.

    Cons:
    1) you need some web hosting with PHP and MySQL available (I pay 45 a year for my domain name + 30MB webspace + 30MB FTP + 30MB MySQL base + 100*25MB pop/imap accounts + SSL everywhere).
    2) there's no installer, so you'll need some computing skills to set it up (not that hard).
    3) there's no automated update; you have to click "Update", so you may miss some news when you're offline (i.e. away from any internet access) for a long period...

    It changed my online life, as I no longer have to install anything on the client side (useful when away from your home/office) or synchronize my feeds, either with some removable storage (my USB key failed after 250+ daily syncs) or through the net (BottomFeeder [cincomsmalltalk.com], a Smalltalk implementation which works on every platform I have ever come across, allows you to sync with an FTP location).

    Regards,
    Poulpy.
  • by johnbeat ( 685167 ) on Tuesday July 20, 2004 @06:34PM (#9754214) Homepage
    Uh, no.

    Pastiche knows when the document was last modified and can support my writing an rss reader that checks last-modified:

    curl --head http://fishbowl.pastiche.org/nerdfull.xml
    HTTP/1.1 200 OK
    Date: Tue, 20 Jul 2004 22:16:33 GMT
    Server: Apache/1.3.26 (Unix) Debian GNU/Linux mod_gzip/1.3.19.1a mod_jk/1.1.0
    Last-Modified: Mon, 19 Jul 2004 02:52:46 GMT
    ETag: "28620-8faa-40fb377e"
    Accept-Ranges: bytes
    Content-Length: 36778
    Content-Type: text/xml

    But infoworld does not. As far as I can tell from the headers I displayed in the previous post, infoworld's server does not provide such data. Without the last-modified or etag or something similar, there is no way to ask for a conditional get, because there is nothing to base the conditional on, and most likely the server doesn't know how to compare the conditional anyway since it clearly is not keeping track of when the document was last modified.

    I could easily be getting the syntax wrong, but whenever I ask it to only send me the XML feed if it has been modified within the last fraction of a second, I still get the page back:

    date > datestamp; curl --time-cond datestamp http://www.infoworld.com/rss/news.xml

    This returns a bunch of xml.

    Running the same command on Pastiche's xml feed returns, as I would expect, absolutely nothing:

    date > datestamp; curl --time-cond datestamp http://fishbowl.pastiche.org/nerdfull.xml

    Jerry
  • Re:Not to flame... (Score:3, Informative)

    by stienman ( 51024 ) <adavis&ubasics,com> on Tuesday July 20, 2004 @06:43PM (#9754294) Homepage Journal
    The practical problem with multicast was that it requires an intelligent network and dumb clients. In other words: routers have to be able to keep a table of information on which links to relay multicast information, and that has to be dynamically updated periodically.

    There is a multicast overlay on top of the internet which consists of routers that can handle this load.

    But the combination of no hardware/software support in the network, and no real huge push for this technology left multicast high and dry.

    Brief idea of how multicast works:
    1) A source sends out an "I have a multicast feed" message to its immediate routers. Those routers 'publish' this feed to their connected routers until every segment on the internet has seen the feed broadcast.
    2) At the end points, individual computers see this message on their segment. They can subscribe to the feed by sending a message to their upstream router (the subscriber end is sketched below). This router places an entry in its table saying, "Someone on segment X wants feed Y, which I get from segment Z." It then sends a subscribe message to the router it got the original broadcast from, which does the same thing on upward until it hits the originating server.
    3) Each router, when it sees a multicast packet, consults its table to see which (if any) segments it should forward the packet to. Eventually the packet makes its way to all the endpoints of the network.
    4) The publish broadcast is repeated periodically. Each router also periodically checks its table to see whether it has received a re-subscribe message since the last publish broadcast. If no one re-subscribes, the table entry is not refreshed - there is no unsubscribe; if you no longer want the feed, just ignore it and it'll go away if no one else on your segment wants it. Only one subscriber per segment needs to subscribe, so if my co-worker and I both want the feed and I see his subscribe packet before I send mine, I won't send mine, since the feed will be put on my segment anyway.
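
    From the endpoint's side, the subscribe step (2) looks roughly like this with the standard socket API. The group address and port are made up, and whether anything actually arrives depends entirely on the routers in between:

    import socket
    import struct

    GROUP = "224.1.1.1"  # hypothetical multicast group carrying the feed
    PORT = 5007

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", PORT))

    # This is the "subscribe message to the upstream router" step: the kernel
    # sends an IGMP membership report on our behalf.
    mreq = struct.pack("4sl", socket.inet_aton(GROUP), socket.INADDR_ANY)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)

    while True:
        data, sender = sock.recvfrom(65535)
        print("update from", sender, data[:60])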

    It's quite elegant, but when a router is dealing with 40+ Gbps of packets it barely has time to figure out where each packet goes, never mind statefully inspecting multicast packets and forwarding them appropriately. Not impossible, but it hasn't been rolled out, and few providers see any money in supporting it.

    -Adam
