While they discuss individual SSDs, modern flash storage arrays (http://www.violin-memory.com/products/6000-flash-memory-array/) can hide write latency and its effects on read latency. Once you start talking about 16TB SSDs, the same techniques apply.
As far as bandwidth and IOPS go, they assume a 4K/8K write size for MLC/TLC, but MLC already ships with 8K pages and can program more than one plane at once, which doubles write bandwidth. Double the page size again and you double the bandwidth again.
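To make that arithmetic concrete, here is a rough per-die sketch in Python. The tPROG and page sizes are assumed typical MLC values for illustration, not numbers taken from the paper:

    # Rough per-die NAND program bandwidth; tPROG is an assumed
    # typical MLC value (~1.3 ms), not a figure from the paper.
    def write_bw_mb_s(page_kb, planes, t_prog_ms=1.3):
        """Sustained program bandwidth of one die in MB/s."""
        bytes_per_program = page_kb * 1024 * planes
        return bytes_per_program / (t_prog_ms / 1000.0) / 1e6

    print(write_bw_mb_s(4, 1))    # ~3.2 MB/s: 4K page, one plane (the baseline)
    print(write_bw_mb_s(8, 2))    # ~12.6 MB/s: 8K page, two planes -> 4x
    print(write_bw_mb_s(16, 2))   # ~25.2 MB/s: 16K page, two planes -> 8x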
Now, bigger pages only help reads if you can use more than a single user read's worth of data from each page, which may be possible depending on what the system knows about access patterns. But even without assuming that data likely to be read together can be stored together, garbage collection, which can wind up reading more bytes than the user does, can use most of the data in a page.
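As a rough illustration (Python again, with an assumed 4K user read size and page-granularity GC, neither taken from the paper), the useful fraction of each page read depends on how many user reads land in the same page, while a GC relocation uses the whole page:

    # Fraction of a page read that serves useful bytes; the 4K user
    # read size and page-granularity GC are assumptions here.
    def useful_fraction(page_kb, user_read_kb=4, hits_per_page=1):
        return min(hits_per_page * user_read_kb / float(page_kb), 1.0)

    print(useful_fraction(16))                   # 0.25: a lone 4K read wastes 3/4 of a 16K page
    print(useful_fraction(16, hits_per_page=4))  # 1.0: co-located data uses the whole page
    # GC relocates whole pages of valid data, so it effectively gets 1.0.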
So there are factors of 2X, 4X, maybe 8X in performance that the paper misses out on.
As far as density goes, it is not necessary to move to smaller features to get more bits per chip. 3D techniques such as Toshiba's P-BiCS (Pipe-shaped Bit Cost Scalable) MLC NAND stack cells vertically, increasing density without the worse performance and lifetime that come with smaller features.
The group at UCSD that authored this has done some nice work, so I don't mean to be too negative, but they are trying to predict too far ahead from a limited and faulty set of assumptions, which unfortunately undermines much of the paper's validity.
jon
p.s. in the interests of full disclosure, I make the arrays in the first link :)