It only takes a queue depth of 2 or 3 to reach maximum linear throughput.
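To make "queue depth" concrete: it's just the number of I/O requests in flight at once. Here's a minimal sketch (my own illustration, not ATTO's method) that issues small sequential 4 KiB reads at a few different depths using a thread pool; a real benchmark would use async I/O and time the runs, but the shape is the same.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

BLOCK = 4096  # 4 KiB: the small transfer size discussed in this thread
NBLOCKS = 64

def read_block(fd, offset):
    # pread lets multiple threads read the same fd at different offsets
    # without sharing a file position.
    return len(os.pread(fd, BLOCK, offset))

# Scratch file to read back (stands in for a real disk workload).
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"\0" * BLOCK * NBLOCKS)
    path = f.name

fd = os.open(path, os.O_RDONLY)
for depth in (1, 2, 4):  # queue depth = number of in-flight reads
    with ThreadPoolExecutor(max_workers=depth) as pool:
        total = sum(pool.map(lambda off: read_block(fd, off),
                             range(0, BLOCK * NBLOCKS, BLOCK)))
    assert total == BLOCK * NBLOCKS  # every 4 KiB block was read
os.close(fd)
os.unlink(path)
```

Wrap each depth's loop in a timer and you have a crude version of what ATTO's queue-depth sweep shows.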
I have no idea why you're so upvoted, because you're flat-out wrong. Five minutes with a benchmark like ATTO lets you see the performance of small sequential I/O at various queue depths; ATTO's sequential results for small transfers make the point on their own.
And you're sort of right: the OS will do a certain amount of prefetch and so on, but that doesn't help when things are fragmented, or when the application is requesting data in a pattern that isn't easily predictable (say, booting a system without a ReadyBoot-optimized layout).
Try it out yourself: grab the old Sysinternals Disk Monitor and watch the size attribute. It's in 512-byte sectors, and on my machine roughly a third of the I/Os are listed as "8", i.e. 4K. Heck, the example screenshot on the linked page is all 8s except for a single 16.
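The sector-to-bytes arithmetic in that "size" column is trivial but worth spelling out:

```python
SECTOR = 512  # Disk Monitor's "size" column counts 512-byte sectors

def sectors_to_bytes(sectors):
    """Convert a Disk Monitor sector count to a transfer size in bytes."""
    return sectors * SECTOR

print(sectors_to_bytes(8))   # 4096 -> the common 4K transfer
print(sectors_to_bytes(16))  # 8192 -> the lone 16-sector entry in the screenshot
```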
So yes, small I/O transfers are still an issue, and will be until we get OSes that can solve the hard problem of consolidating unpredictable I/O streams. Heck, a lot of people turn SuperFetch off because it slows things down; aggressive prefetch isn't necessarily faster.