The Internet

When RSS Traffic Looks Like a DDoS

An anonymous reader writes "Infoworld's CTO Chad Dickerson says he has a love/hate relationship with RSS. He loves the changes to his information production and consumption, but he hates the behavior of some RSS feed readers. Every hour, Infoworld "sees a massive surge of RSS newsreader activity" that "has all the characteristics of a distributed DoS attack." So many requests in such a short period of time are creating scaling issues." We've seen similar problems over the years. RSS (or as it should be called, "Speedfeed") is such a useful thing; it's unfortunate that it's ultimately just very stupid.
This discussion has been archived. No new comments can be posted.

  • Yesterday (Score:3, Interesting)

    by ravan_a ( 222804 ) on Tuesday July 20, 2004 @03:34PM (#9752009)
    Does this have anything to do with the /. problems yesterday?
  • by xplosiv ( 129880 ) on Tuesday July 20, 2004 @03:34PM (#9752019)
    Can't one just write a small PHP script or something that returns an error (e.g. 500)? That's less data to send back, and hopefully the reader would just try again later.
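    A minimal sketch of that suggestion, in Python rather than PHP, and using a 503 with Retry-After rather than a bare 500 since 503 lets the server hint when to come back. The concurrency budget and feed body are placeholders.

        # Shed load by answering a tiny 503 instead of the full feed.
        from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
        import threading

        MAX_ACTIVE = 50                  # hypothetical concurrency budget
        active = 0
        lock = threading.Lock()
        FEED = b"<rss version='2.0'>...</rss>"  # real feed body goes here

        class FeedHandler(BaseHTTPRequestHandler):
            def do_GET(self):
                global active
                with lock:
                    busy = active >= MAX_ACTIVE
                    if not busy:
                        active += 1
                if busy:
                    # A few hundred bytes instead of the whole feed; a
                    # polite reader sees Retry-After and backs off.
                    self.send_response(503)
                    self.send_header("Retry-After", "300")
                    self.end_headers()
                    return
                try:
                    self.send_response(200)
                    self.send_header("Content-Type", "application/rss+xml")
                    self.end_headers()
                    self.wfile.write(FEED)
                finally:
                    with lock:
                        active -= 1

        if __name__ == "__main__":
            ThreadingHTTPServer(("", 8080), FeedHandler).serve_forever()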
  • by Rezonant ( 775417 ) on Tuesday July 20, 2004 @03:35PM (#9752026)
    ...so could someone recommend a couple of really good ones for Windows and *nix?
  • by el-spectre ( 668104 ) on Tuesday July 20, 2004 @03:37PM (#9752051) Journal
    Since many clients request the new data every 30 minutes or so... how about a simple system that spreads out the load? A page that, based on some criteria (domain name, IP, random seed, round robin), gives each client a time it should check for updates (e.g. 17 past the hour).

    Of course, this depends on the client to respect the request, but we already have systems that do (robots.txt), and they seem to work fairly well, most of the time.
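    One way the "server assigns each client a time" idea could look, sketched in Python: hash the client's IP to a stable minute of the hour and advertise it to the client. The header name here is invented for the example.

        import hashlib

        def assigned_minute(client_ip: str) -> int:
            """Map a client IP to a stable minute-of-the-hour (0-59)."""
            digest = hashlib.sha1(client_ip.encode()).digest()
            return int.from_bytes(digest[:4], "big") % 60

        # The feed server could advertise this, e.g. in a hypothetical
        # response header:  X-Suggested-Poll-Minute: 17
        print(assigned_minute("203.0.113.42"))  # same answer every time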

  • by Anonymous Coward on Tuesday July 20, 2004 @03:37PM (#9752054)
    Why not make it standard that the starting time is chosen randomly or assigned by the remote site? "Forty-three minutes after the hour is pretty empty, from now on you can check the news at that time" or something similar.
  • by Russ Nelson ( 33911 ) <slashdot@russnelson.com> on Tuesday July 20, 2004 @03:37PM (#9752058) Homepage
    RSS just needs better TCP stacks. Here's how it would work: when your RSS client connects to an RSS server, it would simply leave the connection open until the next time the RSS data got updated. Then you would receive a copy of the RSS content. You simply *couldn't* fetch data that hadn't been updated.

    The reason this needs better TCP stacks is that every open connection is stored in kernel memory. That's not necessary. Once you have the connecting IP, port, and sequence number, those should go into a database, to be pulled out later when the content has been updated.
    -russ
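    What russ describes is close to what later practice calls long polling. A client-side sketch, assuming a hypothetical endpoint that simply holds the request open until the feed changes (the URL and its blocking behavior are invented here):

        import socket
        import urllib.error
        import urllib.request

        FEED_URL = "http://example.com/feed.rss?wait=1"  # hypothetical

        def watch_feed():
            while True:
                try:
                    # The server parks this request and answers only when
                    # the feed actually changes; no-change polls never happen.
                    with urllib.request.urlopen(FEED_URL, timeout=3600) as resp:
                        handle_update(resp.read())
                except (socket.timeout, urllib.error.URLError):
                    continue  # timed out or dropped; park a fresh request

        def handle_update(feed_bytes):
            print("feed updated:", len(feed_bytes), "bytes")

        if __name__ == "__main__":
            watch_feed()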
  • Idea (Score:5, Interesting)

    by iamdrscience ( 541136 ) on Tuesday July 20, 2004 @03:38PM (#9752073) Homepage
    Well maybe somebody should set something up to syndicate RSS feeds via a peer to peer service. BitTorrent would work, but it could be improved upon (people would still be grabbing a torrent every hour, so it wouldn't completely solve the problem).
  • it's the PULL,stupid (Score:4, Interesting)

    by kisrael ( 134664 ) * on Tuesday July 20, 2004 @03:40PM (#9752099) Homepage
    "Despite 'only' being XML, RSS is the driving force fulfilling the Web's original promise: making the Web useful in an exciting, real-time way."

    Err, did I miss the meeting where that was declared as the Web's original promise?

    Anyway, the trouble is pretty obvious: RSS is just a polling mechanism to do fakey push. (Wired had an interesting retrospective on their infamous "PUSH IS THE FUTURE" hand cover about PointCast.) And that's expensive, the cyber equivalent of a horde of screaming children asking "Are we there yet? Are we there yet? How about now? Are we there yet now? Are we there yet?" It would be good if we had an equally widely used "true push" standard, where remote clients would register as listeners and the server could actually publish new content to the remote sites. However, on today's heavily firewalled internet, I dunno if that would work so well, especially for home users.

    I dunno. I kind of admit to not really grokking RSS; for me, the presentation is too much of the total package. (Or maybe I'm bitter because the weird intraday format that emerged for my own site [kisrael.com] doesn't really lend itself to RSS-ification...)
  • Proposed Solution (Score:2, Interesting)

    by Dominatus ( 796241 ) on Tuesday July 20, 2004 @03:41PM (#9752119)
    Here's a solution: Have the RSS readers grab data every hour or half hour starting from when they are started up, not on the hour. This would of course distribute the "attacks" on the server.
  • by cmdr_beeftaco ( 562067 ) on Tuesday July 20, 2004 @03:46PM (#9752190)
    And there is a one-word solution: peer-to-peer. The whole torrent concept is what is needed.
  • Push, not pull! (Score:5, Interesting)

    by mcrbids ( 148650 ) on Tuesday July 20, 2004 @03:46PM (#9752192) Journal
    The basic problem with RSS is that it's a "pull" method - RSS clients have to make periodic requests "just to see". Also, there's no effective way to mirror content.

    That's just plain retarded.

    What they *should* do...

    1) Content should be pushed from the source, so only *necessary* traffic is generated. It should be encrypted with a certificate so that clients can be sure they're getting content from the "right" server.

    2) Any RSS client should also be able to act as a server, NTP style. Because of the certificate used in #1, this could be done easily while still ensuring that the content came from the "real" source.

    3) Subscription to the RSS feed could be done on a "hand-off" basis. In other words, a client makes a request to be added to the update pool on the root RSS server, which either accepts the request or redirects the client to one of its already-established clients, whereupon the process starts all over again. The client requests subscription to the service, and the request is either accepted or deferred. Wash, rinse, repeat until the subscription is accepted.

    The result of this would be a system that could scale to just about any size, easily.

    Anybody want to write it? (Unfortunately, my time is TAPPED!)
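    Point 1 hinges on clients being able to verify relayed copies, which is really a signature rather than encryption. A sketch of that part, using the third-party "cryptography" package; key distribution is assumed to happen out of band (e.g. fetched once at subscription time):

        from cryptography.hazmat.primitives.asymmetric.ed25519 import (
            Ed25519PrivateKey,
        )

        # --- at the origin server ---
        signing_key = Ed25519PrivateKey.generate()
        feed = b"<rss version='2.0'>...</rss>"
        signature = signing_key.sign(feed)   # published alongside the feed

        # --- at any client, even one that got the copy from a peer ---
        public_key = signing_key.public_key()  # in reality, fetched once
        public_key.verify(signature, feed)     # raises InvalidSignature if tampered
        print("feed verified; safe to re-serve to other peers")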
  • by Misch ( 158807 ) on Tuesday July 20, 2004 @03:46PM (#9752193) Homepage
    I seem to remember the Windows scheduler being able to randomize scheduled event times within a one-hour window. I think our RSS readers need similar functions.
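    A sketch of that kind of jitter in a reader: pick a random offset once at startup and keep polling on that shifted schedule, so the population spreads across the hour. (As a story further down in this thread points out, random choice spreads the average load but not necessarily the peaks.)

        import random
        import time

        POLL_INTERVAL = 3600                       # seconds
        offset = random.uniform(0, POLL_INTERVAL)  # chosen once per client

        def poll_forever(fetch):
            while True:
                # Sleep until the next tick of a schedule shifted by our offset.
                wait = (offset - time.time()) % POLL_INTERVAL or POLL_INTERVAL
                time.sleep(wait)
                fetch()

        poll_forever(lambda: print("fetching feed"))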
  • Won't help (Score:2, Interesting)

    by Animats ( 122034 ) on Tuesday July 20, 2004 @03:47PM (#9752202) Homepage
    Doesn't matter. If lots of people poll every hour, eventually the polls will synch up. There's a neat phase-locking effect. Van Jacobson analyzed this over twenty years ago.

    We have way too much traffic from dumb P2P schemes today, considering the relatively small volume of new content being distributed.

  • by maharg ( 182366 ) on Tuesday July 20, 2004 @03:47PM (#9752220) Homepage Journal
    RSSOwl - http://rssowl.sourceforge.net/ [sourceforge.net] is pretty good.
  • Re:Idea (Score:5, Interesting)

    by ganhawk ( 703420 ) on Tuesday July 20, 2004 @03:53PM (#9752296)
    You could have a system based on JXTA. Instead of the BitTorrent model, it would be something like P2P radio. When the user asks for the feed, a neighbour who just received it can give it to the user (overlay network, JXTA-based), or the server can point to one of the users who just received it (similar to BitTorrent, but the user gets the whole file from a peer instead of in parts, and does not come back to the server at all if the transfer is successful; the problem is that such a user need not serve others and can just leech).

    I feel the overlay-network scheme would work better than a BitTorrent/tracker-based system. In the overlay-network scheme, each group of the network has its own ultra peer (a JXTA rendezvous) which acts as the tracker for all files in that network. I wanted to do this for the Slashdot effect (p2pbridge.sf.net), but the project has been delayed for a long time.

  • Re:Yesterday (Score:2, Interesting)

    by afidel ( 530433 ) on Tuesday July 20, 2004 @03:53PM (#9752300)
    Oh, how prophetic: I went to check the first reply to your post and Slashdot again did the white-page thing (top and left borders with a white page and no right border). Earlier today (around noon EST) I was getting nothing but 503s. This new code has not been good to Slashdot.
  • by wfberg ( 24378 ) on Tuesday July 20, 2004 @03:54PM (#9752310)
    Complaining about people connecting to your RSS feeds "impolitely" is missing the mark a bit, I think. Even RSS readers that *do* check when the file was last changed still download the entire feed when so much as a single character has changed.

    There used to be a system where you could pull a list of recently posted articles off a server that your ISP had installed locally, get only the newest headers, and then decide which article bodies to retrieve. The articles could even contain rich content, like HTML and binary files. And to top it off, articles posted by someone across the globe were transmitted from ISP to ISP, spreading over the world like an expanding mesh.

    They called this... USENET.

    I realize that RSS is "teh hotness" and Usenet is "old and busted", and that "push is dead" etc. But for Pete's sake, don't send a unicast protocol to do a multicast (even if it is at the application layer) protocol's job!

    It would of course be great if there were a "cache" hierarchy on Usenet. Newsgroups could be styled after content providers' URLs (e.g. cache.com.cnn, cache.com.livejournal.somegoth) and you could just subscribe to crap that way. There's nothing magical about what RSS readers do that requires the underlying stuff to be all RSS-y and HTTP-y.

    For real push you could even send the RSS via SMTP, and you could use your ISP's outgoing mail server to multiply your bandwidth (i.e., BCC).
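    For what it's worth, that SMTP aside is about as small as push gets; a throwaway sketch, with the relay host and recipient list as placeholders:

        import smtplib
        from email.message import EmailMessage

        msg = EmailMessage()
        msg["From"] = "feed@example.com"
        msg["To"] = "feed@example.com"     # real recipients go in BCC only
        msg["Subject"] = "RSS update"
        msg.set_content("<rss version='2.0'>...</rss>")

        subscribers = ["alice@example.net", "bob@example.org"]  # placeholder
        with smtplib.SMTP("smtp.example.com") as relay:
            # BCC: hand the real recipients to send_message directly, so
            # they never appear in the headers; the relay does the fan-out.
            relay.send_message(msg, to_addrs=subscribers)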
  • Re:Push, not pull! (Score:3, Interesting)

    by stratjakt ( 596332 ) on Tuesday July 20, 2004 @03:58PM (#9752365) Journal
    Too many firewalls in today's world for "push" anything to work.

    Too many upstream bandwidth restrictions, especially on home connections. The last thing people want is getting AUPped because they're mirroring Slashdot headlines.

    My solution? Multicast IPs. Multicast IPs solve every problem that's ever been encountered by mankind. Join the multicast group, listen till you've heard all the headlines (which repeat ad nauseam), move on with life. Heck, keep listening if ya want. All we have to do is make it work.

    Frankly, who said you have to let everyone in the world onto your RSS feed? If your server can't handle X concurrent RSS requests, it's hardly the protocol's "fault", IMO.

  • by ToadMan8 ( 521480 ) on Tuesday July 20, 2004 @03:59PM (#9752399)
    /. is especially pissy with this, but I want breaking news, not whatever is new each hour. So I hammer the shit out of it with my client (and get banned). I'd like to see a service where I download one client (with front-ends in the GNOME panel, the Windows tray, etc.) that the site (/., cspan, etc.) _pushes_ new updates to when I sign up. Those with dynamic IPs could, when they sign on, have their client automagically connect to a server that associates their unique user ID with their IP.
  • by PCM2 ( 4486 ) on Tuesday July 20, 2004 @04:05PM (#9752465) Homepage
    Am I the only one who finds it easier to get the information I want from the home pages of the sites I trust, rather than relying on an RSS feed? For one thing, in an RSS feed every story has the same priority ... stories keep coming in and I have no idea which ones are "bigger" than others. Sites like News.com, on the other hand, follow the newspaper's example of printing the headlines for the more important stories bigger. With RSS, it's just information overload, especially with the same stories duplicated at different sources, etc. Everyone seems really excited about RSS, but when I tried it I just couldn't figure out how to use it such that it would actually give me some real value vs. the resources I already have.
  • Overall traffic isn't what anybody is complaining about; as I noted, the 503 errors seem to come at the top of every hour (I just got through not being able to read Slashdot for a few minutes), which means, essentially, Slashdot is receiving a slashdotting. Do I know that RSS is doing it? Not from this location, which has limited investigation tools or capability to figure out what's really going on. But it might explain recent behavior of the site.
  • Re:Won't help (Score:2, Interesting)

    by AndroidCat ( 229562 ) on Tuesday July 20, 2004 @04:19PM (#9752649) Homepage
    Maybe not--it depends on how the programs work. If they check a feed an hour from the start of the last check rather than an hour from when the last check ended, they won't drift.

    However, the smart money is on Murphy. :)
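    The distinction, as two tiny loops: scheduling from when the last fetch *finished* drifts by the fetch duration every cycle, while anchoring to when it *started* holds its phase.

        import time

        INTERVAL = 3600  # seconds

        def fixed_delay_loop(fetch):
            while True:
                fetch()
                time.sleep(INTERVAL)     # next poll = last FINISH + 1h: drifts

        def fixed_rate_loop(fetch):
            next_run = time.time()
            while True:
                fetch()
                next_run += INTERVAL     # next poll = last START + 1h: stable
                time.sleep(max(0, next_run - time.time()))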

  • by Anonymous Coward on Tuesday July 20, 2004 @04:22PM (#9752703)
    There's a two-fold effect to this problem that even a push solution would not solve. With everyone grabbing simultaneously, you have to deal with the initial precursor blast of traffic (the RSS fetch), and then with the big shock wave of people coming in to get the actual content (the content fetch).

    A push method may stop the precursor, but you're still going to have to deal with everyone jamming into your site at the same time... probably even worse, because if it became a 'standard' for clients, you would be faced with a lot more simultaneous content fetches than with a mixed pull-on-the-half-hour/pull-every-30-minutes crowd.

    I feel that the best method is to enforce RSS polling frequency through the delivered XML (I was actually quite dumbfounded when I didn't find that in the RSS 2.0 spec), and to have clients operate not on the hands of the clock but distributed based on app start time. Additionally, site designers should be implementing caching and quick-delivery schemes for their RSS feeds, and using HTTP headers with content expiration times (not that you can fully expect all clients to adhere to them).

    - JR
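    JR's last point is the cheapest one to act on. A sketch of a feed endpoint that states its own freshness, so caches and well-behaved readers know when a refetch is pointless; the 30-minute window is an arbitrary example:

        from datetime import datetime, timedelta, timezone
        from wsgiref.simple_server import make_server

        FEED = b"<rss version='2.0'>...</rss>"
        MAX_AGE = 1800  # seconds the copy stays fresh

        def app(environ, start_response):
            expires = datetime.now(timezone.utc) + timedelta(seconds=MAX_AGE)
            start_response("200 OK", [
                ("Content-Type", "application/rss+xml"),
                ("Cache-Control", f"max-age={MAX_AGE}"),
                ("Expires", expires.strftime("%a, %d %b %Y %H:%M:%S GMT")),
            ])
            return [FEED]

        if __name__ == "__main__":
            make_server("", 8080, app).serve_forever()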
  • Trivial solution! (Score:2, Interesting)

    by Maljin Jolt ( 746064 ) on Tuesday July 20, 2004 @04:39PM (#9752928) Journal
    Random intervals. I already patched my desktop RSS reader to request the feed every 73±13 minutes.
  • Not to flame... (Score:4, Interesting)

    by T3kno ( 51315 ) on Tuesday July 20, 2004 @04:42PM (#9752973) Homepage
    But isn't this what TCP/IP multicast was invented for? I've never understood why multicast hasn't really taken off. Too complicated? Instead of entering an RSS server to pull from, just join a multicast group and have the RSS blasted out once every X minutes. Servers could even send out updates more often, because there are only a few connections to send to. Of course, I could be completely wrong and multicast may be the absolute wrong choice for this sort of application; it's been a while since I've read any documentation about it.
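    For the curious, joining a multicast group really is only a few lines; the group address and port below are arbitrary examples, and the hard part, multicast routing between ISPs, is exactly what never materialized:

        import socket
        import struct

        GROUP, PORT = "224.1.1.42", 5007   # example group, not assigned

        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM,
                             socket.IPPROTO_UDP)
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind(("", PORT))

        # Tell the kernel to join the group on the default interface.
        mreq = struct.pack("4s4s", socket.inet_aton(GROUP),
                           socket.inet_aton("0.0.0.0"))
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)

        while True:
            chunk, sender = sock.recvfrom(65535)
            print(f"{len(chunk)} bytes of feed data from {sender}")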

  • Nope. I just don't get RSS either. Every time there's a story about it I give another reader another shot, and every time I just end up thinking "how is this different than checking my bookmarks regularly?"
  • Told Ya So (Score:4, Interesting)

    by cmacb ( 547347 ) on Tuesday July 20, 2004 @05:27PM (#9753502) Homepage Journal
    I think this was more or less the first thought I had about RSS when I first looked into it and found out that it was a "pull" technology rather than a "push" as the early descriptions of it implied.

    Yes, it's "cool" that I can set up a page (or now use a browser plug-in) to automatically get a lot of content from hundreds of web pages, when all I really opened the browser for was to check my e-mail.

    What would have REALLY been cool would be some sort of technology that would notify me when something CHANGED. No effort on my part, no *needless* effort on the server's part.

    Oh wait... We HAD that, didn't we? I think they were called listservers, and they worked just fine. (Still do, actually, as I get a number of updates, including Slashdot, that way.) RSS advocates (and I won't mention any names) keep making pronouncements like "e-mail is dead!" simply because they have gotten themselves and their hosting companies on some black-hole lists. Cry me a river now that your bandwidth costs are going through the roof and yet nobody is clicking through on your web-page ads, because, guess what? Nobody is visiting your page. They have all they need to know about your updates via your RSS feeds.
  • by grcumb ( 781340 ) on Tuesday July 20, 2004 @06:31PM (#9754182) Homepage Journal

    True story:

    We ran a network operations center to provide support for several hundred servers spread over two continents. Each hour, every server would 'phone home' to see if it needed updates or configuration changes. This was a fairly data-heavy operation, requiring many database lookups. We knew that we didn't want every server calling at the same time, so we had each server derive its own random integer between 1 and 59, and to use that as the minute of the hour to contact the NOC.

    Before long we found that the NOC was dragging itself into a death spiral of overwork. The problem? By chance, an unusually large number of servers chose a very small range of numbers. Worse, they just happened to choose numbers close to 05, which just happened to be when some very large cron tasks were running as well.

    Try rolling a die 100 times. Even though the odds are the same before every roll, the actual frequency of occurrence of the individual numbers is not even. Leaving the choice of retrieval time to the client does not reliably reduce the chance of a server being overwhelmed. In fact, it more or less guarantees traffic spikes.

    I'm not intimately familiar with RSS client or server implementations, but I suspect that it would be fairly easy to format a suggested refresh interval and refresh time on the server and send that to the client.
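    The die-rolling point is easy to check numerically. A quick simulation: 600 clients each pick a random minute, so the mean load is 10 per minute, yet the busiest minute routinely carries about twice that:

        import random
        from collections import Counter

        def busiest_minute(n_clients: int) -> int:
            """Load on the single most popular minute of the hour."""
            picks = Counter(random.randrange(60) for _ in range(n_clients))
            return max(picks.values())

        trials = [busiest_minute(600) for _ in range(100)]
        print("mean load/minute: 10, worst minute per trial:",
              min(trials), "to", max(trials))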

  • by Russ Nelson ( 33911 ) <slashdot@russnelson.com> on Tuesday July 20, 2004 @11:45PM (#9756417) Homepage
    Sigh. You don't get it, do you? You suggest different protocols, when TCP works just fine. The reason you want to stay with TCP is because of the infinite number of ways people have implemented TCP. Just as one HUGE example, consider all the people behind NAT. How are you going to "distribute asynchronous update notifications"?

    I'd like to hear one person, just one person say "Hmmm.... I wonder why Russ didn't suggest asynchronous update notifications?" And then maybe go on to answer themselves by saying "Oh! I get it! Russ is right! Hey, that's a great idea! It's backwards compatible and yet does exactly what is needed to turn RSS into a packet-efficient protocol."

    Instead, you get weenies who say something slightly more erudite than "duh" but which could be summarized thusly. You also get people (stand up and take a bow, Salamander) who say "Geez, that idea has OBVIOUS PROBLEMS" even though I obviously anticipate those OBVIOUS PROBLEMS and suggest a solution. Honestly, I see why people have such a low opinion of slashdot posters. Yer all a bunch of dummies!
    -russ
    p.s. pant, pant, pant, pant, okay, I feel better now.
