It's been nearly 6 years since the start of the bufferbloat project. Have you or has your ISP fixed your bufferbloat yet?
First: In-Q-Tel is the venture capital arm of all of the U.S. intelligence services, including DHS, FBI, etc; not just CIA. DHS, for example, will be blamed for any big security disaster; you should not presume that the motives of the agencies are uniform. Nor is all of what those agencies do bad.... It's the pervasive surveillance we *must* stop, and compromising our security standards. See: https://www.iqt.org/about-iqt/ for In-Q-Tel rather than the Wikipedia entry for Dan.
Second: Dan has never taken a security clearance, over his entire career.
Third: He's actually not a In-Q-Tel employee, but a consultant (full time) for them. This is so that he does *not* have to sign a employee agreement, but can remain able to speak freely. Which he does regularly: See http://geer.tinho.net/pubs for some of his publications. One I sparked him to write recently is: http://geer.tinho.net/geer.lawfare.15iv14.txt in reaction to the information I cover in my Berkman Center talk you can find at: https://cyber.law.harvard.edu/events/luncheon/2014/06/gettys
Fourth: people who know Dan, who is really one of the founders of the computer security field, hold him in very high regard and trust, as I do.
If you look at Dan Geer's career, rather than jumping to unfounded, ill informed presumptions based on news reports that don't bother to go beyond reading the Wikipedia entry, you will find:
1) he managed the development of Kerberous at Project Athena (where I got to know him)
2) he co-authored the famous Microsoft is a dangerous monoculture paper a bit over a decade ago (which Microsoft hated so much they
got @Stake to fire him.
3) he is a holder of the USENEX Flame award https://www.usenix.org/about/flame
In short, guys, he's one of "us"....
Don't be ill-informed slashdotters....
*All* content is signed in CCNx by the publisher.
You can get a packet from your worst enemy, and it's ok. The path it took to get to you doesn't matter. If you need privacy, you encrypt the packets at the time of signing.
The day that CDE finally appeared (badly late) on my workstation was the day that I knew there was no hope for the UNIX desktop. Design by committee never works, and it was a camel of 5 humps.
Since all broadband connections have bufferbloat (to some degree or other), in all technologies (fiber, DSL and cable alike), it isn't a good idea to volunteer to run an NTP server on such a connection, even if it is/has been reliable. Bufferbloat will induce transient bad timing into your time service; even more fun, in often a asymmetric way, pretty much any time you do anything over that link.
AQM's don't usually look at the contents of what they drop/mark.
We expect CoDel to be running on any bulk data queue; voip traffic, properly classified, would be in a independent queue, and not normally subject to policing by a CoDel.
While 10 years ago, a decent AQM like CoDel might have been able to get latencies down for most applications where they should be, browser's abuse of TCP, in concert with hardware features such as smart nics that send line rate bursts of packets from single TCP streams has made me believe we must also do fair queuing/classification to get the latencies (actually, jitter) where they need to be due to these "bursts" of packets arriving.
The article's subtitle is: "A modern AQM is just one piece of the solution to bufferbloat." We certainly expect to be doing fair queuing and classification in addition to AQM in the edge of the network (e.g. your laptop, home router and broadband gear). I don't expect fair queuing to be necessary in the "core" of the network.
I'll also say that an adaptive AQM is an *essential* piece of the solution to bufferbloat, and a piece we've had no good solution to (until, we think, now).
That's why this article represents "fundamental progress".
It is *any* transition from fast to slow, including your computer to your wireless link or vice versa from your home router to your computer.
Bufferbloat is an equal opportunity destroyer of time.
You are correct that replacing one bad constant with another is a problem, though I certainly argue many of our existing constants are egregiously bad and substituting a less bad one makes the problem less severe: that is what the cable industry is doing this year in a DOCSIS change that I hope starts to see the light of day later this year. That can take bloat in cable systems down by about an order of magnitude, from typically > 1 second to of order 100-200ms; but that's not really good enough for VOIP to work as well as it should. The enemy of the good is the perfect: I'm certainly going to encourage obvious mitigation such as the DOCSIS changes while trying encourage real long term solutions, which involve both re-engineering of systems and algorithmic fixes. There are other places where similar "no brainer" changes can help the situation.
I'm very aware of the research over a decade old, and the fact that what exists is either *not available* where it is now needed (e.g. any of our broadband gear, our OS's, etc.), and *doesn't work* in today's network environment. I was very surprised to be told that even where AQM was available, it was often/usually not enabled, for reasons that are now pretty clear: classic RED and derivatives (the most common available) require manual tuning, and if untuned, can hurt you. As you, I had *thought* this problem was a *solved* problem in the 1990's; it isn't....
RED and related algorithms are a dead end: see my blog entry on the topic: http://gettys.wordpress.com/2010/12/17/red-in-a-different-light/ and in particular the "RED in a different light" paper referenced there (which was never formally published, due to reasons I cover in the blog posting). So thinking we just apply what we have today is *not correct*; when Van Jacobson tells me RED won't hack it (which was originally designed by Sally Floyd and Van Jacobson) I tend to believe him.... We have an unsolved research problem at the core of this headache.
If you were tracking kernel changes, you'd see "interesting" recent patches to RED and other queuing mechanisms in Linux; this shows you just how much such mechanisms have been used, that bugs are being found in this day and age in such algorithms in Linux: in short, what we have had in Linux has often been broken, showing little active use.
We have several problems here:
1) basic mistakes in buffering, where semi-infinite statically sized buffers have been inserted in lots of hardware/software. BQL goes a long way toward addressing some of this in Linux (the device driver/ring buffer bufferbloat that is present in Linux and other operating systems).
2) variable bandwidth is now commonplace, in both wireless and wired technologies. Ethernet scales from 10Mbps to 10 or 40Gps.... Yet we've typically had static buffering, sized for the "worst case". So even stupid things like cutting the buffers proportionately to the bandwidth you are operating at can help a lot (similar to the DOCSIS change), though with BQL we're now in a better place than before.
3) the need for an AQM that actually *works* and never hurts you. RED's requirement for tuning is a fatal flaw; and we need an AQM that adapts dynamically over orders of magnitude of bandwidth *variation* on timescales of tens of milliseconds, a problem not present when RED was designed or most of the AQM research of the 1990's done. Wireless was a gleam in people's eyes in that era.
I'm now aware of at two different attempts at a fully adaptable AQM algorithms; I've seen simulation results of one of those which look very promising. But simulations are ultimately a guide (and sometimes a real improving insight): running code is the next steps, and comparison with existing AQM's in real systems. Neither of these AQM's have been published, though I'm hoping to see either/both published soon and their implementation happening immediately thereafter.
So no, existing AQM algorithms won't hack it; the size of this swamp is staggering.
No, it isn't based on who you are talking to....
There are significant networks that do not look like the consumer edge Internet, one of which reportedly collapsed in a nasty way (not necessarily in the same way as 1986). Don't presume that the network that may collapse is the global internet (though time based self synchronizing phenomena are a worry there). One of the functions of AQM algorithms is to ensure that TCP flows don't synchronize their behavior. And those AQM algorithms are MIA on many networks today.
Those of us who lived through the 1986 congestion collapse are somewhat worried.
That we don't know is what worries us, as we're flying in an area we don't fully understand.
To solve this, I think we need both AQM fully at the edge of the network, and some sort of "fair" queueing at the edge. The headache is that the classic AQM algorithms won't work in the edge case (and are flawed). "Fairness" is in the eyes of the beholder and a complex question. You may consider it "fair" your kids get half the bandwidth you do, for example. But having a situation where talking to a local CDN 10ms away gets tons more bandwidth than something across the globe may not be "fair" in your view, nor that your web browser can put a horrible transient into the queues as it opens 10 TCP connections simultaneously, and it's unfair what it does to other people sharing your home circuit.
1) Some sort of AQM is necessary to tell the end points to slow down in a timely fashion (by signalling the end point TCP's). Or the buffers fill, and stay full, and you are in fundamental trouble. Consider this the "high order" bit to solving the problem.
2) TCP does not guarantee any "fairness" between flows of different RTT's, Note that what you consider "fair" inside your house is your business: your ISP has some obligation around "fairness" between customers. This is the next bit of precision. Anyone who thinks TCP is "fair" by itself believes in magic that does not exist.
Note these two issues might be addressed by one algorithm, or multiple algorithms, and what is best might be different in a host than in your home router, and different yet again elsewhere in the network. Exactly what we need (and will work well) is as yet unknown, though almost anything (even going back to semi-sane buffer sizes selected with even trivial amounts of thinking) can improve things a lot. And getting there is going to require analysis and testing. I encourage people to help out: for example, we've not yet really played with SFB to see how it behaves in the face of variable bandwidth. Classic RED won't work in a sane fashion at all in that case.
But hugely oversized, dumb, unmanaged, tail-drop buffers have got to go.
Thankfully, at the edge of the network (your host and home router), we have lots more cycles to play with per packet than in a core router and can burn some of them well for this purpose. And that's where we're almost always congested (your ISP may have other congestion problems, but at the aggregation level of those devices, classic AQM algorithms can function, even if maybe not ideally).
The article that was posted yesterday is the short CACM article; Kathy and I have a much longer paper in preparation that will go into this in more detail. But CACM has a word count limit we could not meet, and the "fairness" topic is mostly on the cutting room floor, and now being picked up and finished.
I wasn't expecting the Queue posting to go "live" until next month (i was being naive), so the long paper is not finished. CACM really wanted to lead off January with a bang, so redirected our efforts toward the short, and in many ways incomplete, discussion.
It talks primarily about what I observed on my *home* connection...
The Netalyzr data is all about edge connections.
There are problems in the core, but the worst problems I've seen are at the edge, and the edge is where most of the congestion is these days.
You can have an overall congested network. I've seen this on occasion.
But it is very easy (and even more common) for you (or people in your house) to do it to you, than to have the overall ISP network congested. This is something a simple file copy can/does do to you, in practice.
Some ISP's run AQM properly (e.g. RED) in the cores of their networks; some do not. On the ones that do not, you'll see problems at peak hours. Similarly on corporate networks.
"Pay no attention to the man behind the curtain." -- The Wizard Of Oz