Yes, that's my understanding as well---the point of slow start is to go easy on the output queues of whichever routers experience congestion. So if congestion happens only on the last mile, a hypothetical bad slow-start tradeoff does indeed affect only that one household (though not necessarily only that one user); but if it happens deeper within the Internet, it's everyone's problem, contrary to what some other posters on this thread have been saying.
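To put numbers on the tradeoff, here is a toy sketch (in Python) of the burst slow start injects into the bottleneck queue each RTT; the constants are invented for illustration, not taken from any real stack:

    MSS = 1460        # bytes per segment
    INIT_CWND = 3     # segments; the default everyone is haggling over
    SSTHRESH = 64     # segments

    def slow_start_bursts(cwnd=INIT_CWND, ssthresh=SSTHRESH):
        # Yield the burst size (in segments) injected each RTT.
        while cwnd < ssthresh:
            yield cwnd
            cwnd *= 2   # every ACKed segment grows cwnd, doubling it per RTT
        yield ssthresh

    for rtt, burst in enumerate(slow_start_bursts()):
        print("RTT %d: %d segments (%d bytes) into the bottleneck queue"
              % (rtt, burst, burst * MSS))

Bump INIT_CWND from 3 to 10 and the very first burst more than triples, which is exactly the queue pressure being argued about.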
WFQ is nice, but it currently seems to be too complicated to implement in an ASIC, so Cisco only turns it on by default on some <2Mbit/s interfaces. Another WFQ question is: on what inputs do you compute the queue hash? Cisco's default hashes on the TCP flow, which helps for this discussion, but I will bet you (albeit a totally uninformed bet) that a CMTS will do WFQ per household, putting all of one household's flows into the same bucket, since the operator's goal is to share the channel among customers, not to improve the experience within individual households---they expect the people inside the house to yell at each other to use the internet ``more gently,'' which is pathetic. Done that way, WFQ won't protect a household's skype sessions from being blasted by MS fast-start the way Cisco's default per-flow WFQ would.
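To make the hashing point concrete, here is a sketch of the two key choices (field names are made up for illustration; this is no vendor's actual implementation):

    def flow_key(pkt):
        # Cisco-default style: one WFQ bucket per TCP/UDP flow.
        return hash((pkt["src_ip"], pkt["dst_ip"],
                     pkt["src_port"], pkt["dst_port"], pkt["proto"]))

    def household_key(pkt):
        # Hypothetical CMTS style: one bucket per cable modem, so a
        # fast-start burst and a skype call land in the SAME queue.
        return hash(pkt["dst_ip"])

    skype = {"src_ip": "1.2.3.4", "dst_ip": "10.0.0.7",
             "src_port": 5004, "dst_port": 5004, "proto": "udp"}
    burst = {"src_ip": "5.6.7.8", "dst_ip": "10.0.0.7",
             "src_port": 80, "dst_port": 49152, "proto": "tcp"}

    print(flow_key(skype) == flow_key(burst))            # False: separate queues
    print(household_key(skype) == household_key(burst))  # True: shared fate

With household_key the two flows share one queue and one drop policy, which is the failure mode I mean.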
If anything, cable plants may actually make TCP-algorithm-related congestion worse: I heard a rumor that they try to conserve space on their upstream channel by batching TCP ACKs, which introduces jitter, forces the window size to be larger, and makes the TCP downstream more ``microbursty'' than it needs to be. If they are going to batch the upstream on purpose, maybe they should timestamp upstream packets in the customer device and delay them in the CMTS to simulate a fixed-delay link. They could do this TS+delay per-flow rather than per-customer if they don't want to batch all kinds of packets (e.g., maybe let DNS packets through instantly).
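The TS+delay idea might look something like this on the CMTS side (a sketch; FIXED_DELAY and the packet fields are my assumptions):

    import heapq, itertools

    FIXED_DELAY = 0.020   # 20 ms: must exceed the worst-case batching delay

    class DelayEqualizer:
        # Release each packet exactly FIXED_DELAY after the customer
        # device stamped it, so a jittery batched upstream looks like a
        # constant-delay link to TCP's RTT estimator.
        def __init__(self):
            self.heap = []
            self.seq = itertools.count()   # tie-breaker for equal timestamps

        def ingress(self, pkt):
            # pkt["ts"] was stamped by the customer device on transmit
            heapq.heappush(self.heap,
                           (pkt["ts"] + FIXED_DELAY, next(self.seq), pkt))

        def egress(self, now):
            out = []
            while self.heap and self.heap[0][0] <= now:
                out.append(heapq.heappop(self.heap)[2])
            return out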
RED is not too complicated to implement in an ASIC, but (a) many routers, including DSLAMs, actually seem to be running *FIFO*, which is much worse than even RED, because it can cause synchronization when there are many TCP flows---all the flows start and stop at once. And (b) RED is not that good anyway, because it has parameters that need to be tuned according to approximately how many TCP flows there are. I think BLUE is much better in this respect, and is also simple enough to implement in an ASIC, but AFAIK nobody has.
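For reference, here is the whole of BLUE as I remember it from the paper: a single drop probability nudged up when the buffer overflows and down when the link goes idle, with none of RED's flow-count-dependent knobs (constants here are illustrative, not the paper's exact values):

    import random

    D_INCR, D_DECR = 0.0025, 0.00025   # step sizes for the probability
    FREEZE_TIME = 0.1                  # min seconds between adjustments

    class Blue:
        def __init__(self):
            self.p = 0.0           # current drop/mark probability
            self.last_update = 0.0

        def on_overflow(self, now):    # queue overflowed: we were too timid
            if now - self.last_update > FREEZE_TIME:
                self.p = min(1.0, self.p + D_INCR)
                self.last_update = now

        def on_idle(self, now):        # link went idle: we were too aggressive
            if now - self.last_update > FREEZE_TIME:
                self.p = max(0.0, self.p - D_DECR)
                self.last_update = now

        def should_drop(self):
            return random.random() < self.p

Note there is no queue averaging and nothing to retune as the number of flows changes, which is the whole appeal.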
I think much of the conservatism on TCP implementers' part can be blamed on router vendors failing to step up and implement decades-old research on practical, ASIC-implementable queueing algorithms. I have the impression that even the latest edge gear focuses either on deep, stupid (FIFO) queues (Arista?) or on minimizing jitter (Nexus?). Cisco has actually taken RED *off* the menu for post-6500 platforms: the 3550 had it on the uplink ports, but the 3560 has ``weighted tail drop,'' which AFAICT is just fancy FIFO. I'd love to be proved wrong by someone who knows more, but I think they are actually moving backwards rather than stepping up and implementing BLUE.
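To show why I call it fancy FIFO: my reading of weighted tail drop boils down to per-class fill thresholds on a single queue (thresholds invented for illustration):

    QUEUE_LIMIT = 256
    THRESHOLDS = {0: 0.5, 1: 0.75, 2: 1.0}   # fraction of queue each CoS may fill

    def admit(queue_depth, cos):
        # Pure tail drop past a per-class watermark: no early drop, no
        # probabilistic feedback, so flows still synchronize like plain FIFO.
        return queue_depth < THRESHOLDS[cos] * QUEUE_LIMIT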
And I very much like your point that caching window sizes per /32 is the right way to solve this, rather than haggling over the appropriate default, especially in the modern world of megasites and load balancers, where a clever site could appear to share this cached knowledge quite widely. But IMHO routing equipment vendors need to be stepping up to the TCP game, too.
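At the sender that might be as simple as this (a sketch in the spirit of per-destination metric caching; the TTL and fields are my assumptions):

    import time

    CACHE_TTL = 600   # seconds; stale paths shouldn't inherit big windows

    class CwndCache:
        def __init__(self):
            self.table = {}   # dst /32 -> (cwnd_segments, timestamp)

        def record(self, dst_ip, cwnd):
            # called when a connection to dst_ip closes cleanly
            self.table[dst_ip] = (cwnd, time.time())

        def initial_cwnd(self, dst_ip, default=3):
            entry = self.table.get(dst_ip)
            if entry and time.time() - entry[1] < CACHE_TTL:
                return entry[0]   # reuse what this path sustained last time
            return default        # fall back to the conservative default

A megasite behind a load balancer shows up as a handful of /32s, so one good connection seeds the cache for every later connection to that site.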