The Internet

How Much Bandwidth is Required to Aggregate Blogs? 209

Kevin Burton writes "Technorati recently published that they're seeing 900k new posts per day. PubSub says they're seeing 1.8M. With all these posts per day how much raw bandwidth is required? Due to inefficiencies in RSS aggregation protocols a little math is required to understand this problem." And more importantly, with millions of posts, what percentage of them have any real value, and how do busy people find that .001%?
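As a rough illustration of the arithmetic the summary alludes to, the sketch below converts the two quoted daily post counts into posts per second and a raw payload rate. The average post size is a made-up placeholder (it is not given in the summary), and the calculation ignores the polling overhead that is the real subject of the article.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Assumption: a placeholder average post size; the summary gives no figure.
HYPOTHETICAL_AVG_POST_BYTES = 4096


def posts_per_second(posts_per_day):
    """Average arrival rate if posts were spread evenly over the day."""
    return posts_per_day / SECONDS_PER_DAY


def raw_bits_per_second(posts_per_day, avg_post_bytes):
    """Payload-only bandwidth: no HTTP headers, no repeated polling."""
    return posts_per_second(posts_per_day) * avg_post_bytes * 8


for name, per_day in (("Technorati", 900_000), ("PubSub", 1_800_000)):
    rate = posts_per_second(per_day)
    bits = raw_bits_per_second(per_day, HYPOTHETICAL_AVG_POST_BYTES)
    print(f"{name}: {rate:.1f} posts/s, {bits / 1e6:.2f} Mbit/s of raw payload")
```

The point of the sketch is only that the payload itself is small; whatever an aggregator really spends goes to fetching feeds that have not changed.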
This discussion has been archived. No new comments can be posted.

  • by ranson ( 824789 ) * on Sunday August 14, 2005 @06:30PM (#13318009) Homepage Journal
    How much bandwidth is required? A lot less if everyone would take the 5 minutes required to implement GZip compression on their Apache servers. It saves you bandwidth, it speeds up your site for users (especially those on dialup), and saves the bandwidth of aggregators (assuming they advertise an Accept-Encoding header for gzip, deflate).

    So my plea to the internet community today: make sure your web server is configured to send gzipped content. TFA says he doesn't know how many RSS feeds can support gzip. The answer is easy, really: any feed being served by Apache (plus a LOT of other web servers; AOLserver even added gzip support recently). Here's how to set up Apache [whatsmyip.org] and here's where to check [whatsmyip.org] whether your site is using gzip and get an idea of the bandwidth savings you should see. If your site isn't gzipping, show your admin (if it's someone else) the 'how-to' above and ask them to implement it -- it's an absolute no-brainer win-win for everyone that takes no time at all to set up. It's really absurd IMO that it's not enabled in Apache by default.
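To get a feel for the savings described in the comment above without touching a server, here is a minimal sketch using Python's standard `gzip` module. The feed content is invented for the example and is far more repetitive than a real feed, so the printed figure is illustrative only.

```python
import gzip


def gzip_savings(payload: bytes) -> float:
    """Fraction of bytes saved by gzipping the payload (0.0 means no saving)."""
    compressed = gzip.compress(payload)
    return 1 - len(compressed) / len(payload)


# Assumption: a made-up, highly repetitive RSS-like document.
ITEM = (
    "<item><title>Post {n}</title>"
    "<link>http://example.org/{n}</link>"
    "<description>Some text about post {n}.</description></item>"
)
feed = (
    '<rss version="2.0"><channel>'
    + "".join(ITEM.format(n=n) for n in range(200))
    + "</channel></rss>"
).encode("utf-8")

print(f"{len(feed)} bytes uncompressed, {gzip_savings(feed):.0%} saved by gzip")
```

A client only receives the compressed form if it sends an `Accept-Encoding` header that lists gzip, which is the aggregator-side half of the plea above.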
  • Slashdot? (Score:4, Insightful)

    by djsmiley ( 752149 ) <djsmiley2k@gmail.com> on Sunday August 14, 2005 @06:31PM (#13318013) Homepage Journal
    "And more importantly, with 9M posts, what percentage of them have any real value, and how do busy people find that .001%?"

    On slashdot.... Oh wait....
  • by Anonymous Coward on Sunday August 14, 2005 @06:32PM (#13318016)
    9M*0.001 = 9000...
  • Some Answers (Score:4, Insightful)

    by RAMMS+EIN ( 578166 ) on Sunday August 14, 2005 @06:35PM (#13318033) Homepage Journal
    ``How Much Bandwidth is Required to Aggregate Blogs?''

    Less than it currently takes, what with pull, HTTP, and XML used instead of more efficient technologies.

    ``what percentage of them have any real value, and how do busy people find that .001%?''

    Using a scoring system, like Slashdot's?

    It's not like all of this is rocket science. It's just that people go along with the hyped technology that's "good enough for any conceivable purpose", ignoring the superior technology that had been invented before and wasn't hyped as much. Nothing new here.
  • by davecrusoe ( 861547 ) on Sunday August 14, 2005 @06:37PM (#13318047) Homepage
    And more importantly, with 9M posts, what percentage of them have any real value, and how do busy people find that x%
    Well, the significant percent is probably much larger than you might think. For example, if you aren't a chef, chances are you won't desire to read anything that relates to cooking. So, knock off X% of all blogs. You might not be interested in knitting, so deduct another X%.

    In actuality, my guess is that there are few blogs you might decide to visit, and of those you do, several may have content you find worthwhile. Remember, worthwhile is all in the perception of the reader - there is no real definition for quality or value. Perhaps through trial and error - in essence digital tinkering - you find and derive your own value.

    cheers, --dave
  • by TCM ( 130219 ) on Sunday August 14, 2005 @06:38PM (#13318054)
    Of course every server is powerful enough that CPU time can't possibly become an issue, right?
  • Re:All at once (Score:5, Insightful)

    by ranson ( 824789 ) * on Sunday August 14, 2005 @06:42PM (#13318067) Homepage Journal
    I'm trying to understand how this would help. If everyone incorporated generally accepted HTTP practices into their XML generation scripts (e.g., including Last-Modified and/or Expires headers, providing an ETag, etc.), the aggregators could use conditional GET requests with If-Modified-Since to save an unthinkable amount of bandwidth. As it is right now, since most RSS feeds are generated on the fly from some database, that doesn't happen and the aggregators just have to pull the entire XML at regular intervals to ensure nothing was missed. I find it silly that some basic functionality of the WWW like smart caching rules started being ignored when RSS came along.
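The conditional-request idea in the comment above can be sketched as a small server-side decision function. This is a simplified illustration, not any particular feed generator's code: the function name `respond` is hypothetical, `If-None-Match` is compared as a single exact value (real headers may carry a list or `*`), and timestamps are compared at the one-second precision HTTP dates allow.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def respond(request_headers, etag, last_modified, body):
    """Return (status, headers, body) for a feed request.

    last_modified must be a timezone-aware UTC datetime.
    """
    last_modified = last_modified.replace(microsecond=0)
    headers = {
        "ETag": etag,
        "Last-Modified": format_datetime(last_modified, usegmt=True),
    }
    if_none_match = request_headers.get("If-None-Match")
    if_modified_since = request_headers.get("If-Modified-Since")

    if if_none_match is not None:
        # When both validators are sent, the ETag comparison takes precedence.
        not_modified = if_none_match == etag
    elif if_modified_since is not None:
        not_modified = last_modified <= parsedate_to_datetime(if_modified_since)
    else:
        not_modified = False

    if not_modified:
        return 304, headers, b""  # headers only, no feed body on the wire
    return 200, headers, body


changed_at = datetime(2005, 8, 14, 18, 30, tzinfo=timezone.utc)
status, headers, _ = respond({}, '"v1"', changed_at, b"<rss/>")
repeat_status, _, _ = respond(
    {"If-Modified-Since": headers["Last-Modified"]}, '"v1"', changed_at, b"<rss/>"
)
print(status, repeat_status)
```

An aggregator that stores the `ETag` and `Last-Modified` values from its last fetch and echoes them back pays only for headers when the feed is unchanged.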
  • by ranson ( 824789 ) * on Sunday August 14, 2005 @06:54PM (#13318130) Homepage Journal
    > Of course every server is powerful enough that CPU time can't possibly become an issue, right?

    On moderately busy servers, most have found that mod_gzip helps with both CPU and RAM, since users stay connected to your server for shorter durations, resulting in overall fewer concurrent connections.
  • by Rob Carr ( 780861 ) on Sunday August 14, 2005 @06:58PM (#13318150) Homepage Journal
    Most blogs are both drivel and worthwhile, depending upon the individual reading them (including mine). They become worthwhile in context.

    If a friend is going through cancer treatment, her blog is worthwhile. If you find a youth group leader like yourself and can learn from his posts, his blog is worthwhile. A mother fighting for her health so that she can take care of her two sons and husband can share insights that are worthwhile. Someone fighting depression might have a worthwhile blog. A grandmother might have a view of the world that makes her blog worthwhile, just to get a different view. Perhaps a blog by someone who totally disagrees with you will be worthwhile, just to stretch your mind.

    I've just described why I read the blogs on my blog roll. You can choose differently.

    Top political blogs? You can find them easily among Technorati's top 100 list. Tags at Technorati will let you pick out specialties like science or "Master Blasters" or diabetes or the Tour de France. Google will turn up blogs if you search right, which is the trick for using Google.

    "Worthwhile" is a much more difficult variable to calculate than "bandwidth." Perhaps it's the sheer variety of blogs that makes them interesting, because they are so individual and someone, somewhere will speak to your mind or your heart.

    Worthwhile is what's worthwhile to you, and maybe to very few others. Not everyone will agree, and that's not a bad thing.

  • by Anonymous Coward on Sunday August 14, 2005 @07:01PM (#13318170)
    I call BS. Gzip compresses streams in memory. It can't corrupt your hard drive.

    This reads like a generic troll. "We actually had been using $PRODUCT_NAME for quite a long time on a server at home..."
  • by jandrese ( 485 ) * <kensama@vt.edu> on Sunday August 14, 2005 @07:45PM (#13318331) Homepage Journal
    That depends a lot on what you're hosting your servers on. CPU time is expensive on Tandems and to a lesser extent Suns. On PCs the CPU is cheap, especially since most PC installations are clusters and even 1U boxes tend to come with overpowered processors.

    One thing is for certain though, for many users bandwidth is NOT cheap.
  • by ebrandsberg ( 75344 ) on Sunday August 14, 2005 @07:57PM (#13318383)
    Yes, but with a post like that, it should end up on Slashdot in a few days anyway... after every news site has posted it a few days earlier.
  • by magefile ( 776388 ) on Sunday August 14, 2005 @08:00PM (#13318398)
    Erm ... if it's static, just store 2 copies and route accordingly. You're not serving gzipped stuff to save space, you're serving it to save bandwidth.
  • by womby ( 30405 ) on Sunday August 14, 2005 @08:22PM (#13318482)
    With even the least intensive compression algorithms, HTML can end up almost 10 times smaller.
    That results in a 10 times shorter transfer time,
    which results in 10 times fewer simultaneous connections,
    which results in 10 times fewer Apache processes,
    which results in massively reduced memory and processor requirements.

    That unused processor and memory is what would be used to perform the gzip operations. Let's say, for argument's sake, that compressing the output doubles the processor usage (a ridiculously high number). Then cutting the number of Apache processes by an order of magnitude only has to reduce CPU requirements by 50% to come out on top.

    If the gzip operation only inflicts a 10% overhead, cutting the Apache processes by a factor of ten only needs to free more than 9% to come out on top.

    Look at your server: would cutting the number of Apache processes from 400 to 40 save more than 10% of the CPU usage? Would it save more than 50%?

    [All numbers in this post were selected for ease of calculation, not for their real-world precision.]
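The break-even figures in the comment above follow from one small formula. The sketch below assumes one reading of the argument: gzip multiplies CPU cost by (1 + overhead), running fewer processes multiplies it by (1 - saving), and break-even is where the product equals 1. The function name is hypothetical.

```python
def break_even_saving(gzip_overhead):
    """CPU fraction that fewer processes must free to offset gzip.

    Solves (1 + gzip_overhead) * (1 - saving) == 1 for saving.
    """
    return 1 - 1 / (1 + gzip_overhead)


for overhead in (1.0, 0.1):  # gzip doubles CPU use, or adds 10%
    needed = break_even_saving(overhead)
    print(f"overhead {overhead:.0%}: need to free more than {needed:.1%}")
```

Under this reading, the 100% overhead case gives the 50% threshold and the 10% case gives roughly 9%, matching the numbers in the post.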
  • by ZorbaTHut ( 126196 ) on Sunday August 14, 2005 @08:57PM (#13318591) Homepage
    That's true. LJ is a very CPU-heavy site (surprisingly), and therefore anything that can spare CPU is welcomed. A site that mostly transmitted static pages would probably find gzipping to be an obvious win.
  • by jp10558 ( 748604 ) on Sunday August 14, 2005 @09:05PM (#13318633)
    Couldn't you GZIP each page once per change? (Obviously no good for dynamic pages, but for blogs, each post would only need to be done once. Unless you get comments like on slashdot, it's unlikely you'd have to gzip more than once every few minutes or so.) And then serve that file like you would any other file?
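The compress-once idea raised here (and the "store 2 copies and route accordingly" suggestion earlier in the thread) can be sketched with the standard library. The helper names are hypothetical, and the `Accept-Encoding` parsing is deliberately simplified: it ignores quality values such as `gzip;q=0`.

```python
import gzip
import os


def ensure_precompressed(path):
    """Write path + '.gz' if it is missing or older than the source file."""
    gz_path = path + ".gz"
    if (not os.path.exists(gz_path)
            or os.path.getmtime(gz_path) < os.path.getmtime(path)):
        with open(path, "rb") as src, gzip.open(gz_path, "wb") as dst:
            dst.write(src.read())
    return gz_path


def pick_file(path, accept_encoding):
    """Choose which stored copy to serve for a given Accept-Encoding header."""
    encodings = [
        token.split(";")[0].strip().lower()
        for token in accept_encoding.split(",")
    ]
    if "gzip" in encodings:
        return ensure_precompressed(path)
    return path
```

With this layout the compression cost is paid once per change to the page rather than once per request, at the price of storing a second copy on disk.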
  • by doktor-hladnjak ( 650513 ) on Sunday August 14, 2005 @10:00PM (#13318755)
    Who says a whole lot of people need to read your blog? Only a small handful of friends read mine, mostly people I live far away from. It's a weirdly indirect way of keeping in touch with those people (I read theirs, they read mine). Still, I find my blog to be more of a diary to keep track of things that happen in my life for my own personal purposes more than anything else.
  • s/blog/website (Score:3, Insightful)

    by ubernostrum ( 219442 ) on Monday August 15, 2005 @06:04AM (#13320037) Homepage

    Time to ditch the World Wide Web, right?
