This article is a good start, but there are two areas where it could improve.
1) They measured the site's total data size, but not the load time. Load time was estimated as size/bandwidth, which ignores latency, parsing time, and data dependencies between requests. Actual load time should be (much?) higher.
2) Strangely, they did not account for HTTP compression, so they cannot have measured actual traffic size. This also implies they only measured uncached loads. Most people visit a news site more than once, so cached loads would be more relevant. On a cached load, scripts, design elements, and other static assets do not need to be refetched; only the advertising does. That makes the relative overhead of ads much larger. On the German news site bild.de, a cached load is 17% content data and 83% advertising and tracking.
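To make point 1 concrete, here is a rough back-of-the-envelope sketch of why size/bandwidth is only a lower bound. All numbers (page size, bandwidth, RTT, number of dependency levels) are hypothetical, and parsing/JS execution time is not even modeled:

```python
# Naive estimate: load time = size / bandwidth.
# Real loads also pay round-trip latency, and resources often form
# dependency chains (HTML -> JS -> ad calls), which serialize requests.
# All numbers below are made up for illustration.

size_mb = 10              # total transfer size in megabytes (hypothetical)
bandwidth_mbps = 50       # link speed in megabits per second (hypothetical)
rtt_s = 0.05              # round-trip latency per request: 50 ms (hypothetical)
dependency_depth = 5      # sequential request levels (hypothetical)

naive_s = size_mb * 8 / bandwidth_mbps
# Even with fully parallel fetches, each dependency level costs
# at least one round trip on top of the transfer time.
with_latency_s = naive_s + dependency_depth * rtt_s

print(f"naive estimate:           {naive_s:.2f} s")
print(f"with latency (lower bound): {with_latency_s:.2f} s")
```

Even this crude model shows the naive estimate coming in noticeably low, and it still ignores parsing and script execution entirely.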
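And to illustrate point 2: HTML and JS are text and compress very well, so on-the-wire traffic with HTTP compression can be far smaller than the uncompressed resource size. A quick sketch with Python's zlib (the sample markup is synthetic; real pages compress less dramatically than repeated text, but the effect is still large):

```python
import zlib

# Synthetic, highly repetitive HTML stands in for a real page body.
html = ("<div class='article'><p>Example paragraph text.</p></div>\n" * 200).encode()
compressed = zlib.compress(html, level=6)  # roughly what gzip does on the wire

ratio = len(compressed) / len(html)
print(f"uncompressed: {len(html)} bytes")
print(f"compressed:   {len(compressed)} bytes ({ratio:.0%} of original)")
```

Measuring uncompressed size therefore overstates the bytes actually transferred, which is exactly why the article's numbers cannot be actual traffic.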