The Internet

How Much Bandwidth is Required to Aggregate Blogs?

Kevin Burton writes "Technorati recently published that they're seeing 900k new posts per day. PubSub says they're seeing 1.8M. With all these posts per day, how much raw bandwidth is required? Due to inefficiencies in RSS aggregation protocols, a little math is required to understand this problem." And more importantly, with millions of posts, what percentage of them have any real value, and how do busy people find that .001%?
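
The "little math" is easy to sketch. Below is a rough back-of-the-envelope estimate in Python; every constant in it (feed count, polling interval, average feed size, change rate) is an assumption picked for illustration, not a figure from Technorati or PubSub:

```python
# Back-of-the-envelope estimate of raw RSS polling bandwidth.
# All constants are illustrative assumptions, not published figures.

FEEDS = 1_000_000          # assumed number of feeds being watched
POLLS_PER_DAY = 48         # assumed poll every 30 minutes
AVG_FEED_BYTES = 30_000    # assumed average uncompressed RSS payload

naive = FEEDS * POLLS_PER_DAY * AVG_FEED_BYTES

# With conditional GET (ETag/Last-Modified), an unchanged feed costs only
# a tiny 304 exchange; assume 5% of polls actually find new content.
CHANGED_FRACTION = 0.05
NOT_MODIFIED_BYTES = 300

smart = FEEDS * POLLS_PER_DAY * (
    CHANGED_FRACTION * AVG_FEED_BYTES
    + (1 - CHANGED_FRACTION) * NOT_MODIFIED_BYTES
)

gb = 1024 ** 3
print(f"naive polling:   {naive / gb:,.0f} GB/day")    # ~1,341 GB/day
print(f"conditional GET: {smart / gb:,.0f} GB/day")    # ~80 GB/day
```

The gap between the two numbers is the inefficiency the submission is pointing at: most polls fetch a feed that has not changed.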
  • 900k a day, not 9m (Score:2, Informative)

    by Anonymous Coward on Sunday August 14, 2005 @06:32PM (#13318018)
    order of magnitude out there, fella... better try again with this newfangled "math" stuff
  • by Madd Scientist ( 894040 ) on Sunday August 14, 2005 @06:40PM (#13318060)
    I used gzip with Apache at an old job and we ran into a problem with it... some obscure header problem in conjunction with mod_rewrite.

    So I wouldn't say ANY site using Apache... but probably most. The real problem there is the compression load on the servers... gzip compression doesn't just happen, you know; it takes CPU cycles that could otherwise be used to push data rather than encode it.
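
    To put a rough number on that trade-off, here is a small Python sketch (synthetic payload; timings will vary by machine) that times zlib at a few compression levels, showing how higher levels shave bytes at the cost of more CPU per response:

    ```python
    # Time compression of a synthetic RSS-like payload at a few zlib
    # levels; the payload and the numbers are illustrative only.
    import time
    import zlib

    payload = (b"<item><title>Sample post</title>"
               b"<description>Lorem ipsum dolor sit amet.</description>"
               b"</item>") * 2000  # ~180 KB of repetitive XML

    for level in (1, 6, 9):
        start = time.perf_counter()
        compressed = zlib.compress(payload, level)
        elapsed = time.perf_counter() - start
        ratio = len(compressed) / len(payload)
        print(f"level {level}: {ratio:.1%} of original size, "
              f"{elapsed * 1000:.2f} ms")
    ```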

  • by llZENll ( 545605 ) on Sunday August 14, 2005 @06:42PM (#13318072)
    Answer: Not enough to justify the cost of doing it. Which goes to show you that if a site as popular as Slashdot can't save money doing this, no other site on the net can justify converting to XHTML, economically speaking of course.

    "Though a few KB doesn't sound like a lot of bandwidth, let's add it up. Slashdot's FAQ, last updated 13 June 2000, states that they serve 50 million pages in a month. When you break down the figures, that's ~1,612,900 pages per day or ~18 pages per second. Bandwidth savings are as follows:

    Savings per day without caching the CSS files: ~3.15 GB bandwidth
    Savings per day with caching the CSS files: ~14 GB bandwidth
    Most Slashdot visitors would have the CSS file cached, so we could ballpark the daily savings at ~10 GB bandwidth. A high volume of bandwidth from an ISP could be anywhere from $1 - $5 cost per GB of transfer, but let's calculate it at $1 per GB for an entire year. For this example, the total yearly savings for Slashdot would be: $3,650 USD!"
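
    As a sanity check, the quoted figures do hang together as rough arithmetic. A short Python sketch using only the article's own round numbers:

    ```python
    # Re-derive the quoted Slashdot savings estimate; all inputs are the
    # article's round numbers, so treat the outputs as ballpark values.
    pages_per_month = 50_000_000
    pages_per_day = pages_per_month / 31        # ~1,612,900
    pages_per_second = pages_per_day / 86_400   # ~18.7

    savings_per_day_gb = 10   # the article's ballpark with CSS cached
    cost_per_gb_usd = 1
    yearly_savings_usd = savings_per_day_gb * cost_per_gb_usd * 365

    print(f"{pages_per_day:,.0f} pages/day, {pages_per_second:.1f} pages/sec")
    print(f"implied saving: ~{14e9 / pages_per_day / 1024:.1f} KB per page")
    print(f"yearly savings: ${yearly_savings_usd:,}")   # $3,650
    ```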
  • That's 900,000 posts (Score:4, Informative)

    by epeus ( 84683 ) on Sunday August 14, 2005 @06:57PM (#13318143) Homepage Journal
    I run the spiders at Technorati, and it is 0.9 million posts a day, which Kevin Burton had correct in the post cited. Is this the no dot effect?
  • by cyberfunk2 ( 656339 ) on Sunday August 14, 2005 @06:57PM (#13318146)
    First, as some AC points out, 0.001 PERCENT of 9 million is 90.

    Second, that would be posts; I'm assuming the intelligent stuff tends not to be spread across 90 separate posts, but comes as multiple intelligent posts from the same people.

    Third, since the original poster somehow messed up and cited the number 9 million instead of the correct number, 900,000, that number is reduced to 9 posts a day, a reasonable amount to read.
  • by Bogtha ( 906264 ) on Sunday August 14, 2005 @07:04PM (#13318178)

    It has absolutely sod-all to do with XHTML. HTML 4.01 and XHTML 1.0 are functionally identical. You can use table layouts and <font> elements with XHTML 1.0 and you can use CSS with HTML 4.01.

    You are referring to separating the content and the presentation through the use of stylesheets. This has nothing to do with XHTML, although it would save a hell of a lot of bandwidth if Slashdot implemented it. They are implementing it [slashdot.org].

  • by lukewarmfusion ( 726141 ) on Sunday August 14, 2005 @08:03PM (#13318411) Homepage Journal
    Are you saying that you read the logs directly/manually?

    See AWStats [sourceforge.net]
  • by epeus ( 84683 ) on Sunday August 14, 2005 @08:55PM (#13318585) Homepage Journal
    If your weblog server implements ETag and Last-Modified, my spider can send a one-packet request with the values I last saw from you, and you can send a one-packet 304 response if nothing has changed.

    Charles Miller [pastiche.org] explained this well a few years ago.

    (I run the spiders at Technorati).
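
    A minimal sketch of that conditional GET in Python, assuming a placeholder feed URL and hypothetical validator values cached from a previous poll:

    ```python
    # Conditional GET: send the validators from the last fetch and accept
    # a body-less 304 if nothing changed. URL and cached values below are
    # placeholders, not real data.
    import urllib.request
    from urllib.error import HTTPError

    FEED_URL = "http://example.org/index.rss"   # placeholder feed
    cached_etag = '"abc123"'                    # hypothetical ETag from last poll
    cached_last_modified = "Sun, 14 Aug 2005 18:00:00 GMT"

    request = urllib.request.Request(FEED_URL, headers={
        "If-None-Match": cached_etag,
        "If-Modified-Since": cached_last_modified,
    })

    try:
        with urllib.request.urlopen(request) as response:
            body = response.read()              # feed changed: full payload
            print(f"200 OK, {len(body)} bytes")
    except HTTPError as err:
        if err.code == 304:
            print("304 Not Modified, no body")  # unchanged: tiny response
        else:
            raise
    ```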
  • Re:All at once (Score:3, Informative)

    by jrockway ( 229604 ) <jon-nospam@jrock.us> on Monday August 15, 2005 @03:45AM (#13319752) Homepage Journal
    Most sane webservers gzip the content. XML compresses extremely well. (In other words, gzipped XML is about as space-efficient as a binary memory dump, and much easier for mere people to understand.)
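
    A quick illustration in Python; the feed below is synthetic and highly repetitive, so treat the ratio as illustrative rather than typical:

    ```python
    # Gzip a synthetic RSS-like document and report the compression ratio.
    import gzip

    items = "".join(
        f"<item><title>Post {n}</title>"
        f"<link>http://example.org/{n}</link></item>"
        for n in range(500)
    )
    feed = f'<?xml version="1.0"?><rss><channel>{items}</channel></rss>'.encode()

    compressed = gzip.compress(feed)
    print(f"{len(feed):,} bytes raw -> {len(compressed):,} bytes gzipped "
          f"({len(compressed) / len(feed):.1%})")
    ```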
  • The long tail (Score:4, Informative)

    by Eivind Eklund ( 5161 ) on Monday August 15, 2005 @04:28AM (#13319836) Journal
    I think most of these blogs have something of interest to somebody, and that the value of blogs is in their diversity - in a lot of things having value to a small number of people.

    This effect is called the long tail [wikipedia.org] effect, and is visible all over the web. For instance, Amazon.com says that on any given day it sells more copies of books that did not sell the day before than of books that did. In other words, the aggregate sales of items that sell less than once every other day exceed the aggregate sales of items that sell more often than that.

    Eivind.
