How Much Bandwidth is Required to Aggregate Blogs?
Kevin Burton writes "Technorati recently published that they're seeing 900k new posts per day. PubSub says they're seeing 1.8M. With all these posts per day, how much raw bandwidth is required? Due to inefficiencies in RSS aggregation protocols, a little math is required to understand this problem." And more importantly, with millions of posts, what percentage of them have any real value, and how do busy people find that .001%?
900k a day, not 9m (Score:2, Informative)
Re:How much? If everyone GZipped, a lot less! (Score:5, Informative)
So I wouldn't say ANY site using Apache... but probably most. The real problem there is the compression load on the servers: gzip compression doesn't just happen, you know. It takes CPU cycles that could otherwise be used to just push data rather than encode it.
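The tradeoff the comment describes can be made concrete with a small sketch using Python's standard gzip module on a synthetic, markup-heavy feed fragment (the feed content here is invented for illustration): repetitive XML compresses very well, but only by spending CPU time.

```python
import gzip
import time

# A synthetic, markup-heavy RSS fragment (hypothetical feed content).
item = (
    "<item><title>Example post</title>"
    "<link>http://example.com/post</link>"
    "<description>Lorem ipsum dolor sit amet.</description></item>\n"
)
feed = "<rss><channel>" + item * 200 + "</channel></rss>"
raw = feed.encode("utf-8")

start = time.perf_counter()
compressed = gzip.compress(raw, compresslevel=6)
elapsed = time.perf_counter() - start

print(f"raw: {len(raw)} bytes, gzipped: {len(compressed)} bytes")
print(f"ratio: {len(compressed) / len(raw):.2%}, CPU time: {elapsed:.4f}s")
```

On feed-like markup the gzipped size is typically a small fraction of the raw size, which is why the bandwidth argument usually wins despite the encoding cost.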
Re:Bandwidth wasted for non-xhtml pages? (Score:2, Informative)
"Though a few KB doesn't sound like a lot of bandwidth, let's add it up. Slashdot's FAQ, last updated 13 June 2000, states that they serve 50 million pages in a month. When you break down the figures, that's ~1,612,900 pages per day or ~18 pages per second. Bandwidth savings are as follows:
Savings per day without caching the CSS files: ~3.15 GB bandwidth
Savings per day with caching the CSS files: ~14 GB bandwidth
Most Slashdot visitors would have the CSS file cached, so we could ballpark the daily savings at ~10 GB bandwidth. High-volume bandwidth from an ISP could cost anywhere from $1 to $5 per GB of transfer, but let's calculate it at $1 per GB for an entire year. For this example, the total yearly savings for Slashdot would be: $3,650 USD!"
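The quoted back-of-the-envelope math checks out; here's a quick sketch reproducing it (the per-day GB savings are taken straight from the comment, not derived here):

```python
# Reproducing the quoted comment's arithmetic.
pages_per_month = 50_000_000
pages_per_day = pages_per_month / 31       # ~1,612,900 pages/day
pages_per_second = pages_per_day / 86_400  # ~18.7 pages/second

# Daily savings figures asserted by the comment (GB):
savings_no_cache_gb = 3.15  # CSS refetched on every page view
savings_cached_gb = 14.0    # CSS cached by returning visitors
ballpark_daily_gb = 10      # comment's ballpark between the two

cost_per_gb = 1.0           # USD, low end of the quoted $1-$5 range
yearly_savings = ballpark_daily_gb * cost_per_gb * 365

print(round(pages_per_day), round(pages_per_second), yearly_savings)
```

10 GB/day at $1/GB over 365 days does indeed come to $3,650/year.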
That's 900,000 posts (Score:4, Informative)
Re:busy people read 9000 blogs per day?? (Score:3, Informative)
Secondly, that would be posts; I'm assuming the intelligent stuff tends not to be spread across 90 separate posts, but rather comes as multiple intelligent posts from the same person.
Third, since the original poster somehow messed up and cited the number 9 million instead of the correct number, 900,000, that number is reduced to 9 posts a day, a reasonable amount to read.
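The thread's percentages work out as claimed; a quick check (.001% of N posts is simply N / 100,000):

```python
# .001% of the daily post volume, per the thread's figures.
posts_claimed = 9_000_000  # the original poster's mistaken figure
posts_actual = 900_000     # Technorati's published figure

print(posts_claimed / 100_000)  # the 90 posts/day the parent objected to
print(posts_actual / 100_000)   # the corrected 9 posts/day
```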
Re:Bandwidth wasted for non-xhtml pages? (Score:2, Informative)
It has absolutely sod-all to do with XHTML. HTML 4.01 and XHTML 1.0 are functionally identical. You can use table layouts and <font> elements with XHTML 1.0 and you can use CSS with HTML 4.01.
You are referring to separating the content and the presentation through the use of stylesheets. This has nothing to do with XHTML, although it would save a hell of a lot of bandwidth if Slashdot implemented it. They are implementing it [slashdot.org].
Re:Don't forget the robots (Score:3, Informative)
See AWStats [sourceforge.net]
Gzip helps, but the real win is conditional get (Score:5, Informative)
Charles Miller [pastiche.org] explained this well a few years ago.
(I run the spiders at Technorati).
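A minimal sketch of why conditional GET is the big win for aggregators (hypothetical server-side logic, not Technorati's actual code): the client echoes back the `Last-Modified` value it last saw via `If-Modified-Since`, and when nothing changed the server answers 304 Not Modified with an empty body, so the feed itself never crosses the wire.

```python
from email.utils import parsedate_to_datetime

def respond(request_headers, feed_last_modified, feed_body):
    """Return (status, headers, body) for a conditional GET of a feed.

    feed_last_modified is an HTTP-date string,
    e.g. 'Sat, 06 Aug 2005 12:00:00 GMT'.
    """
    ims = request_headers.get("If-Modified-Since")
    if ims is not None:
        try:
            if parsedate_to_datetime(ims) >= parsedate_to_datetime(feed_last_modified):
                # Nothing new: 304 with no body; only headers are sent.
                return 304, {"Last-Modified": feed_last_modified}, b""
        except (TypeError, ValueError):
            pass  # unparseable date: fall through to a full response
    return 200, {"Last-Modified": feed_last_modified}, feed_body

# First fetch gets a full 200; later fetches echo Last-Modified back.
modified = "Sat, 06 Aug 2005 12:00:00 GMT"
status, hdrs, body = respond({}, modified, b"<rss>...</rss>")
status2, _, body2 = respond(
    {"If-Modified-Since": hdrs["Last-Modified"]}, modified, b"<rss>...</rss>"
)
print(status, len(body), status2, len(body2))
```

For a spider polling hundreds of thousands of mostly unchanged feeds, nearly every request can be a 304, which is why this saves far more than gzip alone.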
Re:All at once (Score:3, Informative)
The long tail (Score:4, Informative)
This effect is called the long tail [wikipedia.org] effect, and is visible all over the web. For instance, Amazon.com says that every day, it sells more copies of books that *didn't* sell the day before than of books that did. In other words, the many items that each sell less than once every other day add up to more total sales than the few items selling more often than that.
Eivind.