The Internet

The 2.4.x Kernel, ECN And Problem Websites

Posted by timothy
from the cable-modems-are-overrated dept.
mitd writes: "Enterprise Linux Today is running an article about how some network devices, i.e. routers, do not support ECN (Explicit Congestion Notification), causing WWW sites to be unavailable to 2.4.x kernel based hosts." The article does show you an easy workaround, though. (Read more below.)

"Nice quote: 'The answer is that Linux is once again on the cutting edge of networking technology ...' The article points out some major sites that have not updated their routers to handle ECN packets."

Anything that helps destroy congestion at least has my attention. (And in a parallel universe, legions of Windows users are howling that the Linux hegemonists have again chosen to implement new standards in order to drag them into the fold ;) )

This discussion has been archived. No new comments can be posted.

  • by Anonymous Coward
    And if you do find it is on by some odd mischance, you do not even have to recompile your kernel. Just 'echo 0 > /proc/sys/net/ipv4/tcp_ecn' and voila, turned off in realtime. Anyone making this sound like a bug really does owe the kernel devel team an apology.
  • by Anonymous Coward
    CONFIG_INET_ECN:

    Explicit Congestion Notification (ECN) allows routers to notify
    clients about network congestion, resulting in fewer dropped packets
    and increased network performance. This option adds ECN support to the
    Linux kernel, as well as a sysctl (/proc/sys/net/ipv4/tcp_ecn) which
    allows ECN support to be disabled at runtime.

    Note that, on the Internet, there are many broken firewalls which
    refuse connections from ECN-enabled machines, and it may be a while
    before these firewalls are fixed. Until then, to access a site behind
    such a firewall (some of which are major sites, at the time of this
    writing) you will have to disable this option, either by saying N now
    or by using the sysctl.

    If in doubt, say N.
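
For anyone unsure whether their distributor's kernel was built with this option, one way to check is to look for CONFIG_INET_ECN in the kernel's build configuration. This is only a minimal Python sketch: the function name and the sample config text are made up for illustration, and where your distribution keeps its config file is something you will have to find out yourself.

```python
def ecn_compiled_in(config_text):
    """Return True if the config text contains CONFIG_INET_ECN=y."""
    for line in config_text.splitlines():
        line = line.strip()
        # Disabled options appear as comments: "# CONFIG_FOO is not set".
        if line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        if name == "CONFIG_INET_ECN":
            return value == "y"
    return False


# Hypothetical excerpt of a kernel .config, for illustration only.
sample = """\
CONFIG_INET=y
# CONFIG_INET_ECN is not set
CONFIG_SYN_COOKIES=y
"""
print(ecn_compiled_in(sample))
```

Even when the option is compiled in, the sysctl mentioned in the help text still decides whether ECN is actually used at runtime.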
  • Since when does having a lower Slashdot ID make you a smarter poster?
  • by Frater 219 (1455) on Monday April 23, 2001 @06:07PM (#269523) Journal
    Contrary to the article's implications, ECN is not enabled by default in the 2.4.x kernels as Linus shipped them. In order to enable ECN, you must reconfigure and recompile the kernel. The configuration documentation for the ECN option explicitly states that turning it on will cause some routers and firewalls to drop your connections, and suggests that you leave it off unless you know you need ECN.

    If you find ECN enabled in your distributor's 2.4.x kernel package by default, please consider this a severe mistake on your distributor's part. Please do not consider it a bug in "the 2.4.x kernel". The author of the Enterprise Linux Today article owes Linus and the kernel developers a retraction and correction.

  • ECN, if enabled in the kernel configuration, will be enabled by default on the computer. (This is the opposite to khttpd, which is DISabled after being compiled in, and must be explicitly ENabled to be used.)

    IMHO, the kernel needs a standard on this. Should a network protocol be on or off, at boot time?

    My next thought is that ECN is a Good Thing(tm) for these low-grade routers and firewalls. Either people upgrade (and thus remove security holes), or they lose sales, because nobody can reach them.

    IMHO, someone needs to write an ECN module for Wintoes, to exploit this potential force for a quality Internet.

    We =do= want a quality Internet... ...right??

  • So my O/S gets to influence what sites my browser can and can't see? Ouch.

    When is this ever not the case?

    Bander


    --
  • Bzzztt..

    Nope, try again.

    It's <b>draft proposed standard</b>, not [B]draft proposed standard[/B]

    html is really quite simple.

    Keep away from them fancy "tools" and you can learn to type it in your sleep...

    t_t_b
    --
    I think not; therefore I ain't®

  • Does anybody know how (or if) I can determine which router/firewall drops packets with ECN set? I would like to email people responsible for their setup to inform about this misconfiguration.
    The bit currently used for ECN used to be marked as reserved, to be ignored. A packet with this bit set should not be dropped.
  • Is something as useful as ECN presentable as a mandatory update for infrastructure providers (i.e. Cisco) or merely a nice addition to be added when other software changes/updates are applied to routers and major servers?

    It seems that it could have major benefits in improving response times, but only if compliance was the rule rather than the exception. What other OSes currently support ECN? Anyone know? I haven't found much info yet.

  • > Rest assured, it's not enabled by default.

    Not being a Linux 2.4 user, I did not know this, but the blurb alludes to this fact.

    > And it's a great idea. I wouldn't knock it, lest you understand it

    I understand it, have read the RFC multiple times, and have implemented pieces of the standard. Still, I stand behind my original statement that it is a crummy standard. It breaks compatibility with other devices and implements network congestion avoidance at the transport layer.

    I am not a member of "legions of Windows users", or a troll. Many Linux users have this lemming mindset that anything implemented by the kernel development team must be the correct way of doing things, and everyone else is wrong. They are the "stupid trolls", as someone so eloquently stated.
  • It's also important to note (for those that don't read the insanely useful Kernel Traffic [zork.net]) that Rik had a good point, the LKML admin person eventually agreed with him, and they worked out an alternate solution.

    Ahhh, open source. It's a bit messy, but works out nicely in the end more often than not...

    So, I'd extrapolate and give them the benefit of the doubt on the ECN thing. They appear to be quite reasonable when presented with coherent arguments.

  • Seriously, man. You are making reasoned arguments, I'll grant you that, but your basis is a bit dodgy.

    Here's a link for ya. LKML FAQ on ECN [tux.org]. Nifty.

    As an aside, I thought it was entirely funny watching that stairstep. Did you notice that you got totally outgunned on slashdot IDs? Every single person trying to reason with you had been around for longer than you, and your ID indicates you're no slouch.

    Anyway, it appears from the FAQ, the RFCs, and the circumstantial evidence of major vendors providing bug-fix patches for this thing that it's not a "deny by default" thing like blocking HTML tags, it's a real-deal out-of-spec problem, and networking vendors need to get their act together.

    I didn't enable that option though, so I don't particularly care either way...

  • I'd actually agree with you there - the slashdot ID thing was what generated the "Hilarious" I put in the subject. When I saw that it was 4 vs zip on the stairstep I pictured a group of aging Linux zealots (possibly bearded? witness LKML April fools posting describing Dirty GNU Hippies) going after some hot shot young network punk.

    Barring some statistically significant correlation between length of time on /. and general knowledge of networking, I'd believe it has no bearing.

    However, I'd speculate that length of time on /. does have a correlation with fervor of defense in linux-critical articles and comments, of which this stairstep was one. That cracked me up.

    I dig the 638 though - I prostrate myself before your greater /. glory ;-)

  • Somebody please mod this one up - clearly a lot of people think 'RFC = Standard', when the ECN RFC is clearly Experimental and explicitly not meant for production usage...

    Currently there is *absolutely no practical benefit* in setting ECN bits in Internet packets today, because you need ECN capable routers throughout a network (or at least at bottleneck points) for ECN to be useful.

    ECN is intended to work like this:

    - ECN-capable host sends packets, setting the ECN-Capable bit in the IP Header's TOS byte to 1 so that routers know ECN is worth using

    - packet experiences congestion in a router somewhere, i.e. router queue is filling up but not yet full

    - router, rather than dropping the packet (which it could do, see WRED), chooses to forward the packet but mark it as 'congestion experienced' using a spare bit in the TOS byte of the IP header.

    - host senses that congestion was experienced and does something about it - essentially the same as if the packet was dropped (e.g. TCP will halve its window size) but with the benefit of being able to process the packet rather than having to wait.

    The end result should be quicker adaptation to congestion conditions, by avoiding some timeouts and retransmissions.

    ECN is an interesting technique, but it will take a long time for it to be tested and debugged in realistic conditions, and for people to deploy it widely (perhaps in a modified version that is Standards Track within the IETF). Some routers, particularly routers in the core of the Internet, may never use ECN, since dropping packets is easier than modifying one bit.

    Turning on ECN now will at least mean that some firewalls won't drop packets with ECN bits set, which is probably a good thing, but it's only going to help the ECN researchers in practical terms.
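
The four steps above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not anything taken from a real stack: the bit values follow the usual layout of the two low-order bits of the old TOS byte (ECT = 0x02, CE = 0x01), the router's thresholds are invented, and a real RED/WRED queue marks or drops probabilistically rather than at a fixed cut-off. In real TCP the receiver also has to echo the mark back to the sender, which is the side that halves its window; that round trip is folded into one function here.

```python
ECT = 0x02  # "ECN-Capable Transport" bit in the old TOS byte
CE = 0x01   # "Congestion Experienced" bit


def sender_mark(tos):
    """An ECN-capable host advertises that fact on outgoing packets."""
    return tos | ECT


def router_forward(tos, queue_len, mark_threshold, queue_limit):
    """Return the TOS byte to forward, or None if the packet is dropped."""
    if queue_len >= queue_limit:
        return None          # queue is full: nothing to do but drop
    if queue_len >= mark_threshold:
        if tos & ECT:
            return tos | CE  # filling up: mark instead of dropping
        return None          # non-ECN flow: an early drop is the only signal
    return tos               # no congestion: pass through untouched


def react_to_mark(tos, cwnd):
    """Halve the congestion window when a packet arrives marked CE."""
    if tos & CE:
        return max(1, cwnd // 2)
    return cwnd


tos = sender_mark(0x00)
forwarded = router_forward(tos, queue_len=8, mark_threshold=5, queue_limit=10)
print(hex(forwarded), react_to_mark(forwarded, cwnd=16))
```

The point of the exercise: in the marked case the packet still arrives and can be processed, while the non-ECN flow has to notice the loss and retransmit.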
  • Problem???? Where's the problem???? ECN is explicitly marked as experimental in the kernel info. Much more, it says that MANY ROUTERS ARE NOT PREPARED TO HANDLE IT. I can't believe somebody wrote a piece about an experimental feature in the Linux kernel AND that Slashdot links to it. Unbelievable. Fsacking amazing. Next year Slashdot will link to a story about Apache 4.0.0.0.0a having trouble serving pages because there's no 4.0.0.0.0a code written yet.
  • And as of now (24 April), vger.kernel.org is still not using ECN. Unless something between it and here is removing the 'WE' flags, as I see incoming mail from vger arrive with only the SYN flag set.
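
The 'WE' flags mentioned above are presumably the two TCP header flags that ECN adds, CWR and ECE; a host asking for ECN sets both on its opening SYN. A minimal Python sketch of that check on the TCP flags byte follows. The function names are illustrative, and the one-letter labels are only a guess at the notation the poster's packet dump uses.

```python
# Bit positions in the TCP flags byte, lowest bit first.
FIN, SYN, RST, PSH, ACK, URG, ECE, CWR = (1 << i for i in range(8))


def is_ecn_setup_syn(flags):
    """True for an initial SYN that also carries both ECE and CWR."""
    is_initial_syn = bool(flags & SYN) and not (flags & ACK)
    return is_initial_syn and bool(flags & ECE) and bool(flags & CWR)


def flag_letters(flags):
    """Short label for the flags byte (letters chosen for illustration)."""
    names = [(SYN, "S"), (FIN, "F"), (RST, "R"), (PSH, "P"),
             (ACK, "."), (URG, "U"), (CWR, "W"), (ECE, "E")]
    return "".join(letter for bit, letter in names if flags & bit)


plain_syn = SYN
ecn_syn = SYN | ECE | CWR
print(flag_letters(plain_syn), is_ecn_setup_syn(plain_syn))
print(flag_letters(ecn_syn), is_ecn_setup_syn(ecn_syn))
```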
  • Why should (hardware) vendors not release source drivers? The hardware vendor is in the business of selling hardware not (normally) drivers. The vendors produce drivers so that they can sell (more) hardware. So increasing the availability of drivers increases the potential hardware sales. Even if the hardware requires the driver to download firmware, this should not rule out a source level driver, as the downloadable firmware can be supplied either as a separate binary file which the driver reads, or as an initialised byte array in the source files.
  • Sally Floyd's ECN Page [aciri.org] lists ECN implementations. Thanks to some hard work, Linux features prominently in the list, having implementations that date back to 2.0 series kernels.

    ECN does not require universal adoption to be helpful: every packet that is marked instead of dropped helps. However, it does require that firewalls stay out of the way to be successful.

  • Maybe RFC 1812 - "4.2.2.6 Unrecognized Header Options: RFC 791 Section 3.1 A router MUST ignore IP options which it does not recognize." (caps emphasis theirs, not mine)
  • interesting. except for burstnet, the register (both running linux) and e3expo (nt) all the sites in that list run solaris.

    and i'd bet that burstnet & the register use some sort of linux load-balancer, skewing the results.
    ---
  • No, the 'obvious' problem is present equipment -not- following spec. If routers don't know what the ECN bits are, they should leave them alone and pass them through (as those bit positions were marked as experimental/reserved for future use). The problem is in routers that are too intelligent for their own damn good, that busily reset flags that they shouldn't be touching. Things -were- designed for backward/forwards compatibility.

    Router mfgrs saw fit to ignore that.
  • The problem is not that ECN support is needed at the other end, the problem is that ECN uses bits which were otherwise reserved, and routers which don't know ECN are dropping packets based on the contents of reserved fields which they don't know anything about. If anything is broken, it's network hardware that's assigning new meanings to bits that it shouldn't assume anything about.

  • I don't know which article you read, but I didn't see any statement that ECN was enabled by default. The author did not state where they got their 2.4.x kernel from but I assumed that they compiled it themselves. Especially since the first recommendation for disabling ECN was removing it from the kernel config file and recompiling. There is no need for a retraction, correction or apology. Please calm down 8^)

  • Binary-only modules really aren't supported, you're not going to hear much crying on linux-kernel if they don't work. If you really-really-really cannot distribute modules precompiled for the major stock kernels (stock RH, Mandrake, Debian, SuSE, Caldera) or source then you can always do what 4-Front does, use a small shim that can be distributed as source. Recompile the shim on the target machine and voila! Linux will always be source compatible throughout a stable release, and that is what matters most.

  • And if you would finish the story you would know that the vger admin turned off the DUL when he learned that it was causing problems. Case closed.

  • Way to go. You tell 'em!

    Perhaps one way to describe the situation succinctly would be:
    The problem is network devices that don't implement ECN and fail to act passively with regard to the formerly reserved bit now used for ECN.

  • Because it gives me a bigger kernel. I've now finally managed to exceed 1MB! Small kernels are for embedded systems;)

  • by cpeterso (19082)

    Subject: Explicit Congestion Notification (ECN) now enabled on [lwn.net]
    ftp/filehub.kernel.org

    From: "H. Peter Anvin"
    To: "kernel.org FTP administrator" ,
    mirrors@linux.kernel.org
    Subject: Explicit Congestion Notification (ECN) now enabled on
    ftp/filehub.kernel.org
    Date: Sun, 29 Apr 2001 21:55:49 -0700

    I have enabled Explicit Congestion Notification on zeus.kernel.org, the
    machine which contains ftp.kernel.org and filehub.kernel.org. This means
    that some sites which are behind broken firewalls may have trouble
    accessing it. If you are a mirror site, I would appreciate it if you
    took the time and verified that you can still access filehub.kernel.org.

    Jeff Garzik has a very good page listing ways to fix your firewall to
    deal with these kinds of problems. If someone reports problems with ECN,
    I suggest pointing them to it:

    http://gtf.org/garzik/ecn/

    In particular Cisco have production-level fixes out for all their
    affected products.

    -hpa
  • According to this message on linux-kernel [lwn.net], David S. Miller plans to upgrade vger.kernel.org, the linux-kernel mailing list server, Real Soon Now. This will prevent users behind routers that don't understand ECN from using the linux-kernel mailing list!

    Is this irresponsible or just a good incentive for the entire internet to upgrade their routers?

  • Please refer to the bold, red warning prefacing the linux-kernel mailing list FAQ [tux.org]:

    Hot off the Presses:

    On 22-FEB-2001, vger.kernel.org will enable ECN. You may need to switch ISP in order to receive linux-kernel email. See the section on ECN for more details.

    On 25-JAN-2001, David Miller announced that vger.kernel.org will enable ECN in 4 weeks time. This means if your email account is with an ISP which has a buggy router, you will no longer be able to receive linux-kernel mail (as well as other mailing lists hosted on vger). You should check if your ISP is ECN tolerant, and get them to fix their routers or switch to another ISP.


    Of course, these are the same people that use the MAPS DUL to block dial-up modem users [zork.net] from posting to the linux-kernel mailing list. Rik van Riel threw a temper tantrum, saying the DUL was class prejudice based on internet connection and that "DUL is an unethical list to use because it assumes guilty by default. Anyway, since linux-kernel has chosen to not receive email from me I won't bother answering VM bugreports or anything here." Alan Cox quickly replied, "Thats ok. Andrea will I am sure be happy to take over as maintainer [of the VM subsystem]."

  • Before opening your pie hole, read the RFCs. Only broken routers who DO NOT OBEY the RFCs fail to pass ECN.
  • but if any Linux admins working for me were upgrading production servers to each new kernel 'just because it was available', they'd get some lecturing. You upgrade production boxes when you NEED to. ie: A security patch...
    It only takes moments to skim the kernel changelog for each new version.

    Also, as I've said before, why on earth would you turn on something like ECN not knowing what it was? And the help file for ECN *DOES* say specifically that it will cause problems on the internet, because many routers don't support it yet.

    This has nothing to do with instability. The kernel is very stable; this has to do with people using things without doing the research.

    The reason a new 'version' isn't released once or twice a year only? OPEN SOURCE. Whenever there are a reasonable number of bug fixes, a new version comes out.

  • Whether ECN is experimental or not, *standards* dictate that the bits in use should be simply passed through by other routers. If a router doesn't understand certain option bits, it's supposed to IGNORE them. It is routers NOT following this *long-standing* standard that are causing the problem.
  • These are not the same bits. The bits often used for TOS were specced for something else, but never really used.
    The bits ECN is using were originally flagged as 'other'.

    And the main issue is packet filters that say 'these options aren't recognized, so it must be an attack! block it!'
  • by mindstrm (20013) on Tuesday April 24, 2001 @02:12AM (#269555)
    I find it strange. In moving to 2.4 kernels, the first thing I did was, of course, run through the configuration.

    For each option that I didn't recognize, I hit the help button. The help button for ECN (which defaults to off) specifically states that ECN is not supported by some routers, and currently may cause problems with reaching websites on the Internet, so I left it off.

    So my question is: Why would you turn on a new network option without knowing what it was?
  • Sturgeon's law: 90% of everything is crud.

    Presumably, "people" is a subset of "everything".

  • Unused bits in packets, be it IP or another protocol, could be used for a subliminal channel. So your statement that they should always be left alone isn't always true. The paranoid among us should always clear them.

    That said, you're probably right most of the time. Why fiddle with them when they're of no concern to you?

    ----------------------------------------------
  • If you want the benefits of ECN but still need to connect to sites behind broken routers/firewalls, you can temporarily switch it off:

    echo 0 > /proc/sys/net/ipv4/tcp_ecn

    And then a 1 to turn it back on again. No need for a reboot.
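
The same toggle can be scripted. Below is a minimal Python sketch, assuming the /proc path quoted above; the helper names are made up, writing to the real file needs root, and the demo runs against a stand-in temp file so it can be tried anywhere.

```python
import tempfile
from pathlib import Path

# The sysctl file named above. Only present when ECN support is compiled in.
TCP_ECN = Path("/proc/sys/net/ipv4/tcp_ecn")


def get_ecn(path=TCP_ECN):
    """Read the current setting as an integer (0 means off)."""
    return int(Path(path).read_text().strip())


def set_ecn(enabled, path=TCP_ECN):
    """Write 1 or 0, the same thing the echo command does."""
    Path(path).write_text("1\n" if enabled else "0\n")


# Demo against a stand-in file rather than the real /proc entry.
with tempfile.TemporaryDirectory() as tmp:
    fake = Path(tmp) / "tcp_ecn"
    fake.write_text("1\n")
    set_ecn(False, fake)   # switch off to reach a site behind a broken firewall
    print(get_ecn(fake))
    set_ecn(True, fake)    # and back on afterwards
    print(get_ecn(fake))
```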

  • Of course, these are the same people that use the MAPS DUL

    Please report the whole story, not just half. MAPS DUL usage was dropped shortly after.

    Also, please note that using DUL generally does not block dial-up users: it forces them to use the ISP's server as a relay, as it should be. Unfortunately, it seems that there are troubles for some dial-up users to do so, and for the sake of them DUL has been dropped. But the vast majority is not affected at all.

    Heck, if you want to use your local sendmail anyway (which makes sense with a dial-up account), setting it up to smart-relay your mail through your ISP's servers is quite simple.

    OTOH, ECN is really a benefit for every user on the net, and we should put pressure on ISPs and network admins to properly configure/update their routers, otherwise it will be just a really nice thing dropped for laziness.

  • If a firewall doesn't understand a packet, and wants to protect a server behind it, it should drop the packet.

    Or the firewall manufacturer could be forward-thinking, realise that someday someone might have a useful reason to set that bit, and reject the packet, probably by sending a RST with ECN unset. That way the experimental host can be notified of the problem, and can try again without ECN if it chooses.

    I have no disagreement with firewalls being paranoid. I do disagree with firewalls dropping these packets silently. Especially seeing as upgrades fixing the problem have been available since mid-2000, according to here. [tux.org]

    -Spiv.

  • by Spiv (32991) on Monday April 23, 2001 @07:30PM (#269561)

    In fact, there's a very strong argument to be made that linux is being non-standards compliant

    Actually, ECN is designed to be backwards compatible - if a host doesn't understand ECN, it should respond with a packet with the ECN bit turned off, and the ECN-aware originating host will behave accordingly.

    The problem is routers that drop these packets silently. They should either let them through, or if paranoid, reject them, sending a RST back to the original host, which can then retry without ECN. Dropping silently just makes the connection attempt "hang", until it times out.

    Further, it is *not* enabled by default, can be toggled at runtime via /proc/sys/net/ipv4/tcp_ecn and comes with warnings in the appropriate build option. I'd say that's perfectly responsible way to introduce a new feature.

    -Spiv.
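
To make the three cases above concrete, here is a toy Python model of a client that asks for ECN and reacts to what comes back. Everything in it is hypothetical: the result strings, the callback, and the simulated peers are invented for illustration, and the retry-on-RST step is the behaviour the comment says a host could adopt, not a claim about what the 2.4 kernel does.

```python
def connect(try_syn):
    """try_syn(use_ecn) is a hypothetical callback returning one of
    'synack+ece', 'synack', 'rst' or 'timeout'."""
    reply = try_syn(True)
    if reply == "synack+ece":
        return "connected, ECN negotiated"
    if reply == "synack":
        # Peer does not understand ECN and answered without the bit.
        return "connected, ECN off"
    if reply == "rst":
        # Explicit rejection: the host learns of the problem and can retry.
        if try_syn(False) == "synack":
            return "connected after retry without ECN"
        return "refused"
    # Silent drop: nothing comes back, so the attempt just hangs.
    return "timed out"


# Simulated peers (illustrative only).
def ecn_host(use_ecn):
    return "synack+ece" if use_ecn else "synack"

def legacy_host(use_ecn):
    return "synack"

def rejecting_firewall(use_ecn):
    return "rst" if use_ecn else "synack"

def dropping_firewall(use_ecn):
    return "timeout" if use_ecn else "synack"


for peer in (ecn_host, legacy_host, rejecting_firewall, dropping_firewall):
    print(peer.__name__, "->", connect(peer))
```

Only the last peer leaves the client with nothing to act on, which is the complaint about silent drops.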

  • Also, please note that using DUL generally does not block dial-up users: it forces them to use the ISP's server as a relay, as it should be.

    It is highly debatable if forcing the use of a third party relay is a good thing or not. My own opinion is that the intention should be to eliminate these. The more third party machines an email appears to have passed through the harder it is to find out where it really came from.
  • The problem is in routers that are too intelligent for their own damn good, that busily reset flags that they shouldn't be touching.

    Or even software designers thinking they are doing something clever when in fact they are being completely daft. A common problem, certainly not confined to IP coding in routers.
  • However, it's worth pointing out that this isn't trying to force the user to use an arbitrary third-party relay. Instead, this is trying to get dialup users to relay through their own ISP's mail server.

    With certain ISP business models an ISP third party relay is little different in practice from an open third party relay.

    If properly configured, the result is to increase accountability.

    That can be a very big if :)

    Some ISPs add headers to identify the message source and, even if they don't, they've got server logs to allow them to track things in the event of spamming.

    A necessary first step is to verify someone's identity before giving them access. But then knowing which account used which IP, when (static or at least fairly static IP addressing helps here) is the information you'd actually need.

    Also there are advantages to spammers in using third party relays, any third party relays... e.g. you only need to handle a subset of SMTP conditions when sending exclusively through a third party relay.
  • I'd say that for stuff which is expected to be common to all installations, for example basic IP, it should be enabled by default. For stuff which may or may not be used by any particular site, eg ECN or khttpd, it should be disabled by default.

    This can allow a default kernel to be shipped, and if the user wants anything out the ordinary, then they can customize the startup scripts, without having to rebuild the kernel.

  • seeing as upgrades fixing the problem have been available since mid-2000, according to here.

    Upgrading a large network comprised of hundreds or thousands of routers takes time to plan, and you don't want to do too often, or until you're sure the new code base is going to work properly. A year is not unreasonable to obtain, test, plan & implement such an upgrade.
  • > Why wasn't backwards compatibility built in to this?

    Actually, backwards compatibility was built into this. The problem is buggy equipment, which misbehaves when presented with option bits which it doesn't understand. This behavior violates RFC 791 Section 3.1 "A router MUST ignore IP options which it does not recognize.". Which means, pass on the packet with these options unchanged, rather than silently dropping them.

  • It is a bug in the router if it doesn't pass through ECN packets. Some paranoid routers Hotmail was using thought ECN was some kind of security exploit and screwed up all communications _trying_ to use it, i.e. those attempting from ECN-enabled Linux 2.4 hosts. I'm not sure what the resolution has been but it's clear that blocking ECN is an abnormal activity that violates RFC's as well as common sense.
  • "this is a nice feature, it would make internet go faster, but some broken routers/firewalls are stupid enough to drop them, so disable it"

    Sorry, but if we disable it, we slightly reduce chances of having proper ECN support everywhere. So we will stick with congestion, although defense techniques do exist and are implemented. Just because of some lame software/hardware/sysadmin.

    Not using a nice feature because of broken third party software is not a thing to do. Enable the feature, and bother at non-conformant sites.

    Should we stick to HTML 1.0 because some rare clients still have a very old browser lying around ? No. Having only a HTML 1.0 compliant browser is totally silly nowadays, clients have to upgrade. So why isn't it the same scenario with ECN ?

    It's just like IPv6. IPv6 drafts have existed for a long time. There are implementations for all major operating systems. Everything is widely documented. But almost no one has taken a single step toward IPv6. Why? Because many pieces of software still don't support IPv6. And why don't they? Because their developers think "almost no one moved to IPv6, anyway". And you get a marvellous vicious circle. And we stay with a shitty technology while there are alternatives.

    Please do the step.

    http://www.pureftpd.org

    IPv6 compliant.
  • I agree actually, but some would say if you're the kind of person that turns kernel options on and off without reading all the text first and understanding all of it then you shouldn't be turning kernel options on and off - leave it to the distributions (who afaik all have ecn off by default)

    Also, have you submitted a patch to fix the documentation?
  • I agree with you that testing new things is good. Linux, like all unix variants, is an OS for power users. The fact is that I still think this is not newsworthy, for the simple reason that this advisory says almost exactly the same thing as the warning on the kernel, and the kernel option is off by default. So, if you have turned on this option, you have almost definitely seen this warning already...
  • by spinkham (56603) on Monday April 23, 2001 @06:17PM (#269572)
    ECN is disabled by default, there's big warnings in the kernel help... This is hardly newsworthy ;-)
  • A *standard* RFC says that if a router doesn't know what to do with one of the reserved bits, it should leave it alone. The router doesn't have to understand ECN to do this.
    --
  • by lizrd (69275) <`adam' `at' `bump.us'> on Tuesday April 24, 2001 @06:59AM (#269574) Homepage
    Hardware vendors aren't really in the business of selling hardware so much as they are in the business of integrating hardware with custom firmware and ASICs and selling the end product. The problem with handing out the driver source is that it likely gives more insight into exactly how their custom firmware and ASICs work. The actual process of soldering some chips onto the board is fairly easy and inexpensive, it's getting good firmware into the chips on the board and making the system work well together that's difficult. Anyone who buys a board can take it apart and look at each chip and look at the traces and be able to put together a pretty similar board without all that much work, so you have to protect the part that people can't just see and the best way to do that is to not release the source to the drivers.

    The thing that I don't understand is why the license agreement that comes with most drivers prohibits me from making copies of the drivers. Honestly, are you going to sell any fewer products if I give a copy of the driver to my friend?

  • I disagree that it is not newsworthy. I was having this very problem, and this article helped me correct it.

    True, it does say in the kernel configuration that this option might get you into trouble. So do several options. What the kernel help doesn't say is any good way to tell that ECN is giving you problems. No diagnostic measures to try in the event of problems.

    Some of us like to try new things. We like to see what happens if we enable a feature, because we like to find bugs and squash them. Many people who are running Linux just want a stable system to work with, and that's good. However, those of us who remember what it was like before Linux went mainstream want to continue to push the envelope.
  • I think the real problem is that ECN makes some wild assumptions about which bits in the headers to use. The IPv4 TOS byte is overloaded to blazes. Bad idea to use that as it should have been considered no man's land, and there appears to be no escape mechanism to negotiate behaviour between two hosts.

    Had I done this, I would have added the extra bits elsewhere... perhaps TCP header extensions.

    Hint to implementors. I would attempt to see if a clear path exists for those bits to work before applying it to a circuit.

    As I said before I think that the protocol is broken because the discovery mechanism is unreliable as we have now seen.

    Time to go back to the drawing board folks...
  • ok. thanks for the clarification.

    I guess the firewalls/routers have their own reasons for rejecting such packets. The reality is that ECN in the spirit of not breaking backward compatibility should be able to work around these scenarios. It current;y doesn't which is why I suggest it needs to be sent back to the drawing board.

    My suggestion is to try ECN first, if a RST is encountered, try again without ECN and mark the path as ECN unfriendly. If the firewall is dropping packets, then perhaps alternate TCP SYN's between ECN & non-ECN packets until a connection is established or timeout.

    It is not a solution to insist that the errant routers/firewalls be replaced. They may have very good reasons for their paranoid rules, and in many cases it may be impractical to perform an upgrade.

    Consider also the case where the reserved TCP bits might be in use locally in a site for traffic management (perhaps in error). Site policies might mandate that exploits using these bits be terminated ASAP.

    Anyway, my main gripe is that the solution to the scenarios is not to fix the routers/firewalls, but to implement better workarounds in the protocol itself. The appropriate and well tested method of extending protocols like TCP is through header options, and not by manipulating the reserved bits.

    P
  • Too right. Not that Mandrake is by any means perfect, but their 'hackkernel' release under 7.2 (2.4.0test10, I think) had ECN enabled. Took me a couple of days to figure out why I couldn't get to some sites. Now I have Mdk 8.0, with 2.4.3 by default, and ECN is not enabled. Get that author!!
  • it's clear that blocking ECN is an abnormal activity that violates RFC's as well as common sense.

    No, it isn't. ECN is an experimental protocol, and there is no requirement for everybody to implement every experimental protocol invented.

    In fact, there's a very strong argument to be made that linux is being non-standards compliant, since the first rule of experimental protocols is "don't send packets to people who haven't asked for them".
  • Get the story right, guys. This isn't a "linux is up to date while other people aren't" story -- this is a "linux is using a protocol marked as EXPERIMENTAL" story. EXPERIMENTAL protocols are protocols which are not only not internet standards, but are not even standards track.

    If using an EXPERIMENTAL protocol breaks stuff, don't use it. You certainly shouldn't expect people to conform to your own non-standard behaviour.
  • Only broken routers who DO NOT OBEY the RFCs fail to pass ECN.

    Right... only routers which do not obey an EXPERIMENTAL RFC run into problems. Guess what? You don't have to obey experimental RFCs. That's why they're *experimental*, not *standards*.
  • ECN is mature and at the Minn. IETF meeting it was voted to be added to the host requirement standard.

    BZZZZZT. Nope, try again. There is now a *draft proposed standard* for ECN. That's it. It isn't a standard yet, and won't be for quite some time yet.
  • Which RFC would that be? I can't seem to find it anywhere.
  • Maybe RFC 1812 - "4.2.2.6 Unrecognized Header Options:

    Which doesn't apply here, since ECN is implemented via bits in the TOS octet, not in an optional IP header.
  • The fact ECN is written up as a request for comments document (RFC) means it *is* well on its way to becoming an Internet standard. Even the process itself of becoming an Internet standard is written up as an RFC. Look at the main web page at www.ietf.org and click on the link marked "The Internet Standards Process." Look at what is there! RFC 2026!

    In case people are too lazy to look up RFC 2026 themselves, here's the relevant section:
    4.2 Non-Standards Track Maturity Levels


    Not every specification is on the standards track. A specification
    may not be intended to be an Internet Standard, or it may be intended
    for eventual standardization but not yet ready to enter the standards
    track. A specification may have been superseded by a more recent
    Internet Standard, or have otherwise fallen into disuse or disfavor.

    Specifications that are not on the standards track are labeled with
    one of three "off-track" maturity levels: "Experimental",
    "Informational", or "Historic". The documents bearing these labels
    are not Internet Standards in any sense.

    4.2.1 Experimental

    The "Experimental" designation typically denotes a specification that
    is part of some research or development effort. Such a specification
    is published for the general information of the Internet technical
    community and as an archival record of the work, subject only to
    editorial considerations and to verification that there has been
    adequate coordination with the standards process (see below). An
    Experimental specification may be the output of an organized Internet
    research effort (e.g., a Research Group of the IRTF), an IETF Working
    Group, or it may be an individual contribution.

    And from the top of RFC 2481:
    A Proposal to add Explicit Congestion Notification (ECN) to IP


    Status of this Memo

    This memo defines an Experimental Protocol for the Internet
    community. It does not specify an Internet standard of any kind.
    Discussion and suggestions for improvement are requested.
    Distribution of this memo is unlimited.
  • Putting aside for now the arguments about supporting experimental protocols and the use of once-used-and-now-reserved bits, there is a very simple issue here regarding firewall design.

    Secure firewalls are designed to block traffic by default.

    In other words, if the firewall doesn't understand the packets being sent through it, it will drop them. There's nothing wrong with this behaviour; in fact, if you try to build a "default-accept" firewall by blocking off packets which you know to be undesirable, you'll inevitably run into problems. However, anyone who has tried to get streaming media, or play Warcraft, or use any other new protocols through an old firewall will be able to say that this policy can be a nuisance.

    Which, of course, is one reason why there is an internet *standards track* giving people time to adapt to new protocols.
  • If a router doesn't understand certain option bits, it's supposed to IGNORE them.

    I don't know about these specific routers (have we been told which they are?) but the problem might be that they do understand those bits -- in a different meaning. The TOS bits have been redefined a number of times, and the bits used by ECN have been used for other things in the past.
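
    To make the overlap concrete, here is a minimal sketch of how the TOS octet splits under RFC 2481, which places ECT and CE in the two low-order bits. The function name `describe_tos` is made up for illustration; the note about RFC 1349 in the comments is from memory and worth checking against that RFC.

```python
# Bit masks for the two low-order bits of the IPv4 TOS octet, as RFC 2481
# assigns them: ECT (ECN-Capable Transport) and CE (Congestion Experienced).
ECT = 0x02
CE = 0x01
DSCP_MASK = 0xFC  # upper six bits (the Differentiated Services field)

# Assumption to verify: under the older RFC 1349 layout, the bit now used
# for ECT (0x02) carried the "minimize monetary cost" TOS value.


def describe_tos(tos: int) -> dict:
    """Split a TOS octet into its DS field and the two RFC 2481 ECN bits."""
    if not 0 <= tos <= 0xFF:
        raise ValueError("TOS must fit in one octet")
    return {
        "dscp": (tos & DSCP_MASK) >> 2,
        "ect": bool(tos & ECT),
        "ce": bool(tos & CE),
    }
```

    A device that still reads the octet by an older layout would see a set ECT bit as a request it may not expect, which is one way "understanding the bits in a different meaning" could play out.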
  • Trying every outgoing connection twice (once with, and once without) would work much better, but I don't know how many people would like that sort of behavior.

    I suggested this at the time it was being discussed on lkml, but Dave Miller considered this to violate the RFCs. There are two ways in which these firewalls misbehave with ECN: either they send an RST packet ("connection refused"), in which case the kernel cannot retry the connection, or they just discard the packet (and the connection times out). In the latter case, the kernel retries anyway - but keeping ECN enabled.

    I suggested retrying with ECN disabled on these retransmissions, but this was regarded as too much of a "hack". One problem is that these routers are broken - violating the RFCs - and Dave Miller is reluctant to work around this sort of problem. He wants as many hosts as possible to hit this problem, to force the owners of these routers to upgrade to RFC-compliant software instead. The trouble is, according to the IETF's survey, 8% of hosts are unreachable with ECN enabled - so enabling ECN is a big problem! (One site with ECN blocked when this topic came up on lkml is Hotmail - enable ECN, and you cut off a *LOT* of sites!)

  • Industry is dying trying to keep up with the kernel releases and from personal experience, a lot of the industry is getting sick of supporting Linux because there is a new friggin' kernel every friggin' month, which wreaks havoc when a kernel module needs to be released with a product...

    Bull. The major kernel releases are more than a year apart, and the minor releases for user kernels are bugfixes, not "new kernels". If the bugfix breaks your module, you were doing something wrong.

    The frequent minor releases are what makes the kernel *stable* because otherwise, not enough people could participate in debugging in the open environment.

  • Dried frog pills, most likely.
  • I seem to remember an option somewhere in the module support section that lets your modules run on different kernel versions without recompiling.
  • Mr Gates, I told you to not drink when you're on your medication. I thought your performance during the trial would have convinced you that I was right.
  • Correction. The only routers that are broken are those which do not obey the original RFC for TCP, which states that they should ignore the reserved bits (later earmarked for ECN) if they don't understand them.

    Note the RFC for ECN does not even enter into this. They should just silently pass a packet with that bit set along like any other.

    It's not so much that Linux is on the bleeding edge; it's more that said router programmers didn't RTFM.

  • It's right in my help file. It has been publicized on the mailing lists. Do you normally enable options that you don't know the consequences of? This is not a good idea on production servers.
  • Why wasn't backwards compatibility built in to this? Is there some major technical reason why it would be impossible? Seems to me that a "cutting edge" "experimental technology" ought to at least be backwards compatible with all the old stuff.

    Sheesh!


    Dlugar
  • Rest assured, it's not enabled by default. You have to explicitly choose it. The Kernel help tells you it will break major sites.

    You can enable and disable it on the fly. And it's a great idea. I wouldn't knock it unless you understand it.
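
    For anyone who wants to script the on-the-fly switch, here is a rough sketch that reads and writes the sysctl file named in the kernel help (`/proc/sys/net/ipv4/tcp_ecn`). The helper names `ecn_enabled` and `set_ecn` are made up, and treating any non-zero value as "enabled" is a simplifying assumption.

```python
from pathlib import Path

# Path of the runtime switch named in the kernel help text.
TCP_ECN_SYSCTL = Path("/proc/sys/net/ipv4/tcp_ecn")


def ecn_enabled(path: Path = TCP_ECN_SYSCTL) -> bool:
    """Return True if the sysctl file holds a non-zero value."""
    return int(path.read_text().strip()) != 0


def set_ecn(enabled: bool, path: Path = TCP_ECN_SYSCTL) -> None:
    """Write 1 or 0 to the sysctl file (needs root on a real system)."""
    path.write_text("1\n" if enabled else "0\n")
```

    Both helpers take the path as a parameter, so they can be tried against an ordinary scratch file before being pointed at /proc.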
    signature smigmature
  • Where was this article two weeks ago, when we were upgrading all of our production servers to the 2.4.3 kernel, and couldn't figure out why we couldn't hit www.ibm.com or www.sabre.com?

    After much troubleshooting, we found the problem. Perhaps the kernel help for ECN should have the warning about certain routers not supporting ECN nearer to the top of the help, instead of in the second paragraph :)

    - James
    signature smigmature
  • And RFC1149 is well on its way to becoming a standard?

    Link.
    1. This was not in the IP options section, it was in the TOS section.
    2. These are probably not routers.
    3. Standards don't mean shit.
    4. From RFC 2481: "Because of the unstable history of the TOS octet, the use of the ECN field as specified in this document cannot be guaranteed to be backwards compatible with all past uses of these two bits."
    5. In RFC 791, the bits in question are shown set to zero.
    6. If a firewall doesn't understand a packet, and wants to protect a server behind it, it should drop the packet. Better for an experimental user to not be able to reach a site than for a system to be crashed or hacked.
    Sure, the devices which fail should be updated, as this is now going to become somewhat common. If they are firewalls, and they are good ones, they've probably already notified someone of the increase in dropped packets of this type, and the solution is already in the works.
  • So my O/S gets to influence what sites my browser can and can't see? Ouch.
  • ....is the upgrades to using IPv6 stuff. Now when Mozilla or Konqueror try to look up *.bbc.co.uk or *.doubleclick.net (ok, the second one isn't as annoying) they fail to find them, probably due to IPv6 addresses rather than IPv4 ones being returned, but the rest of the net not being able to route me to them.
  • Although the linux-kernel ECN question went unanswered, linux-kernel DOES NOT use DUL.
    ---
  • I thought slashdot had patented that whole "making www pages unavailable" deal?

    They did, but it would be hypocritical for them to enforce the patent. So OSDN will be doing it instead.


    My mom is not a Karma whore!

  • It turns out that it's not the IP ECN bits from the old ToS byte that cause the problem - it's the ECN bits in the TCP flags field that are used in the TCP connection setup negotiation to negotiate the use of ECN. Some firewalls mistakenly think that those bits signal some sort of attack. RFC 793 says these bits should be ignored on receipt by old TCP implementations, so any firewall that resets such connections is simply broken and should be replaced.
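
    As a sketch of which bits are involved: ECE and CWR sit directly above URG in the TCP flags byte, and an ECN-setup SYN carries both. The function names below are made up for illustration, and `strict_firewall_rejects` is a hypothetical model of the over-strict rule, not any vendor's actual logic.

```python
# Flag bits in byte 13 of the TCP header. ECE and CWR occupy two bits that
# RFC 793 left reserved; RFC 2481 gave them their ECN meaning.
FIN, SYN, RST, PSH, ACK, URG, ECE, CWR = (1 << i for i in range(8))


def is_ecn_setup_syn(flags: int) -> bool:
    """An ECN-setup SYN carries SYN together with both ECE and CWR."""
    return (flags & (SYN | ACK | ECE | CWR)) == (SYN | ECE | CWR)


def is_ecn_setup_synack(flags: int) -> bool:
    """An ECN-setup SYN-ACK carries SYN, ACK and ECE, but not CWR."""
    return (flags & (SYN | ACK | ECE | CWR)) == (SYN | ACK | ECE)


def strict_firewall_rejects(flags: int) -> bool:
    """Hypothetical over-strict rule: reject any use of the once-reserved bits."""
    return bool(flags & (ECE | CWR))
```

    Under this model a plain SYN (0x02) passes while an ECN-setup SYN (0xC2) is rejected, which matches the symptom of sites being unreachable only from ECN-enabled hosts.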

    For more details on tests to verify what's happening, see http://www.aciri.org/tbit [aciri.org]

    - Fzz

  • One problem is that these routers are broken - violating the RFCs - and Dave Miller is reluctant to work around this sort of problem. He wants as many hosts as possible to hit this problem, to force the owners of these routers to upgrade to RFC-compliant software instead.

    That's a lovely thought, and I think it's a fine idea, but it's horribly impractical. I think the solution is to provide both behaviors, make it a module option, and let people set it if they like.

    The fact of the matter is that you will never get everyone to do everything properly. It's just plain not going to happen. Yes, you should try, but you should also try to play well with others, even if they aren't inclined to do the same.


    --
    ALL YOUR KARMA ARE BELONG TO US

  • In other words, if the firewall doesn't understand the packets being sent through it, it will drop them. There's nothing wrong with this behaviour; in fact, if you try to build a "default-accept" firewall by blocking off packets which you know to be undesirable, you'll inevitably run into problems.

    I think you're reading the wrong things into this. Yes, a firewall is generally default deny, and with good reason. But bits "reserved" in the specification for TCP/IP should not even be examined by a firewall which doesn't actually know that they're supposed to be used for something. In this way, you preserve compatibility with future upgrades. Sure, you'll be missing some functionality, but then I've always thought that firewalls are, without exception, crap. Every firewall should be user extensible WITHOUT tweaking code; you should be able to name a bit, and then use that name in your matching criteria. Alternatively, specifying bit masks would be a crappier but workable option.
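
    Something like the following is what I have in mind, as a rough sketch only. The names `compile_rule` and `evaluate` and the rule layout are made up for illustration and are not taken from any real firewall. Bits that a rule does not name are simply never examined.

```python
from typing import Dict, List, Tuple

# A rule is (mask, expected value, verdict): it matches when
# (flags & mask) == value. Names map to masks so rules stay readable.
Rule = Tuple[int, int, str]


def compile_rule(names: Dict[str, int], must_set: List[str],
                 must_clear: List[str], verdict: str) -> Rule:
    """Build a mask/value rule from user-defined bit names."""
    value = 0
    for name in must_set:
        value |= names[name]
    mask = value
    for name in must_clear:
        mask |= names[name]
    return mask, value, verdict


def evaluate(flags: int, rules: List[Rule], default: str = "deny") -> str:
    """Return the verdict of the first matching rule, else the default."""
    for mask, value, verdict in rules:
        if (flags & mask) == value:
            return verdict
    return default
```

    A rule such as "SYN set, ACK and RST clear" then accepts a SYN whether or not the formerly reserved bits are set, while the default-deny policy still applies to everything no rule matches.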


    --
    ALL YOUR KARMA ARE BELONG TO US

  • Actually, if you're releasing binary-only modules, it does wreak havoc since your 2.2.17 module won't insmod on a 2.2.18 kernel - meaning you have to supply binary versions for every possible kernel combo to keep insmod happy.

    At least - that's my understanding of how modules work. As far as I know, they contain specific symbols that link them to one kernel compile at a time - which is why the NVIDIA kernel module is a small C module to interface with the actual binary module. You compile the small C module which just links in at compile time the actual NVIDIA binary-only module. That solves the kernel versioning issues - although there may be ways to use out-dated modules in newer builds (like a 2.2.17 module in 2.2.18, but definitely not 2.2.17 in 2.4.3). But if there is, I don't know what it is.

    Of course, in the perfect Open Source world, you can recompile the modules for every new kernel, but if people want vendor Linux driver support, some sacrifices must be made...

  • Whee, that's insightful. Not.

    It might have helped if you had decided to read the comment you replied to. alehmann said nothing about implementing ECN. alehmann did say something about blocking ECN. There is a world of difference. See, routers shouldn't just throw packets away if they have extra information in them. This is rude, and hinders adoption of new protocols, which don't hinder the router's operation in the least, and will often allow hosts on either side of the router to utilize these new protocols, even though the router in question cannot.

    Go away and come back when you have learned about the Internet.

    --

  • I've been using the 2.4 kernel on my laptop since the week it was released, and I've had no problems (to my recollection, at least) visiting any sites. Granted, I use Datek instead of E-trade =). Looks like the sites that have older equipment have been quickly updating though, and I see no reason to disable this forward-thinking ECN.

  • To clarify, if they give a recommendation for disabling ECN this implies it is already enabled.
  • huh? the evidence you present supports Frater 219's interpretation quite nicely. you owe Frater 219 a retraction :)
  • Think about it. If you only communicate with hosts using the criterion you mentioned for the "opt-in" version, you will never end up using ECN. The only time you will use ECN is when talking to hosts that always use ECN. Apply a bit of induction.

    Trying every outgoing connection twice (once with, and once without) would work much better, but I don't know how many people would like that sort of behavior.
  • My linux box can't connect to my school's mail server, but my Windows Box that is being masq'd by my linux box can. Is this the same problem? Uninformed.
  • by JCCyC (179760) on Monday April 23, 2001 @06:49PM (#269614) Journal
    Like... (correct me if I'm talking BS) if I receive an ECN packet from some IP, I store that IP in a table (maybe saved to disk every now and then) and only use ECN for hosts in that list. That would be the "opt-in" version.

    An "opt-out" version could be made too, but I guess an external maintainer would be needed for such a list -- it wouldn't be desirable for every other connection to drop in the process of building what supposedly is a performance booster.
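
    A minimal sketch of the "opt-in" table, just to show the bookkeeping. The class `EcnOptInTable` and its methods are hypothetical, nothing like this exists in the kernel as far as I know, and JSON is only one arbitrary choice for the "saved to disk" part.

```python
import json


class EcnOptInTable:
    """Remember peers that have sent us ECN traffic (hypothetical helper)."""

    def __init__(self, peers=()) -> None:
        self._peers = set(peers)

    def saw_ecn_from(self, ip: str) -> None:
        """Record that a packet using ECN arrived from this address."""
        self._peers.add(ip)

    def should_use_ecn(self, ip: str) -> bool:
        """Use ECN only towards peers already known to speak it."""
        return ip in self._peers

    def to_json(self) -> str:
        """Serialise the table so it can be saved to disk now and then."""
        return json.dumps(sorted(self._peers))

    @classmethod
    def from_json(cls, text: str) -> "EcnOptInTable":
        """Rebuild a table from a previously saved JSON string."""
        return cls(json.loads(text))
```

    The table only grows when some peer uses ECN towards us first, so it does nothing for connections we initiate to hosts that have never contacted us.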

  • Some paranoid routers Hotmail was using thought ECN was some kind of security exploit and screwed up all communications _trying_ to use it, i.e. those attempting from ECN-enabled Linux 2.4 hosts.

    Hey, since MS owns Hotmail, I am sure that someone there thinks that they are not under any obligation to help out by accepting ECN.

    ;-)

    "Bill, do you think we should use this ECN stuff?"
    "I don't know, do we own it?"
    "Nope"
    "Does NOT accepting this Screw up Linux?"
    "Yep"
    "Can you read my Mind?"
    "Yep!"

    Of course, I would never accuse anyone of being negligent, or of being underhanded. Me? never!

    Check out the Vinny the Vampire [eplugz.com] comic strip

  • It is highly debatable if forcing the use of a third party relay is a good thing or not. My own opinion is that the intention should be to eliminate these.

    However, it's worth pointing out that this isn't trying to force the user to use an arbitrary third-party relay. Instead, this is trying to get dialup users to relay through their own ISP's mail server. If properly configured, the result is to increase accountability. Some ISPs add headers to identify the message source and, even if they don't, they've got server logs to allow them to track things in the event of spamming.

  • by Cerlyn (202990) on Monday April 23, 2001 @08:14PM (#269620)

    All right, this is a flame. Dareth I answer it...

    ECN is *NOT* a standard, nor even standards track.

    The fact ECN is written up as a request for comments document (RFC) means it *is* well on its way to becoming an Internet standard. Even the process itself of becoming an Internet standard is written up as an RFC. Look at the main web page at www.ietf.org [ietf.org] and click on the link marked "The Internet Standards Process." Look at what is there! RFC 2026!

    Many protocols in modern use never became an Internet "standard"; these include things like Mobile IP and 802.11 wireless Ethernet. The identification protocol used by almost any IRC server is specified in RFCs 1413 and 931; neither became a full standard. The Internet standards actually issued number fewer than 70. The IETF itself doesn't link to them much anymore, since there is normally an RFC representing the final form of each one.

    [The] systems that you have 'problems' with are systems that support ECN, not systems that don't support ECN

    Sorry! Thanks for playing. If the client says it supports ECN by flagging that fact with the bits once reserved for future use, it will not run into problems if the other side says it does not. The routers, firewalls, load balancers and/or servers on the other side that do not know simply to leave those bits alone and continue normally are the ones at fault. The TCP protocol said those bits might be used later, but many programmers did not heed that warning. Instead, they drop packets using the once-reserved bits, send TCP or ICMP reset messages, etc.

    So in a way, it is the client's fault for supporting a newer extension of TCP/IP than the other end does. The extension works fine -- as long as the other end still tries to establish a connection regardless of ECN support!

    The reason you have trouble with these sites is because you have a client which respects the ECN bit, and there are thousands/millions of other clients which don't, which has the effect of you never reaching the site, since you always back off in deference to those clients which don't.

    Major sites must be busy to the point their links are congested, aren't they? I hope not. Read the article; the problem is routers, firewalls, and other devices seeing the bits marked "for future use" being used, and considering the packets invalid. Again, the fact that an ECN host tries contacting a host that does not support ECN is irrelevant; as long as the packets get through, the ECN-aware end will realize the other end is not, and revert to normal congestion behavior.

    If no device on the other end spoke ECN, you wouldn't have this problem, as it wouldn't have any way to know to treat an ECN aware client differently than one that wasn't.

    The ECN aware client is in charge, at least in the failure cases cited by the article. In most failure cases (at least those I have seen), it is the *client* requesting that the connection use ECN in the first place (although servers are welcome to as well). If after the initial handshake it discovers the remote host does not know ECN, it uses the old-style of TCP throttling behavior in response to bad packets. The ECN extension was designed to allow backwards compatibility with older clients; the people who designed it were not that foolish.

    Get an education before you start posting pretending you know what you're talking about.

    Is the fact I have a bachelor's degree in Electrical and Computer Engineering (with honors), 99% of the work for a master's degree in the same, and the fact I was accepted to one of the top doctoral schools in the country enough education? I have spent many many years studying network protocol theory, and several years administering servers. I even wrote my own IRC client at one point in time based off the RFC documents on it, and that protocol is hardly "experimental" anymore...

  • by Cerlyn (202990) on Monday April 23, 2001 @06:22PM (#269621)

    Let me just say that it is the systems that do *not* handle ECN that are at fault, not the systems that *do* support it. Read the RFC specification here [ietf.org], or from your nearest RFC mirror (#2481). Note how bits marked as "presently unused" and "reserved for future use" are used for explicit congestion notification.

    Any protocol implementation with a bit of sanity would know to leave reserved bits it did not know how to handle unchanged. Unfortunately, many systems do not do this. Some firewalls see reserved bits being used as a threat, and reset connections. Other systems have no clue how to react if a reserved bit is not the default value.

    A partial list of sites I know have trouble with ECN enabled (thank goodness they are the minority of web sites out there) is below. But this is like the Y2K bug; it never really should have existed.

    Sites with known ECN problems (that I've seen, anyway)

    • www.zdnet.com
    • www.theregister.co.uk
    • www.returnbuy.com
    • www.uscourts.gov (the entire tree, more or less)
    • www.burstnet.net (at least I don't see their ads!)
    • www.time.com
    • www.latimes.com
    • www.e3expo.com

    (These are only sites I visit rarely, thank goodness; I typically surf another 20+ websites daily without incident)

  • Dropping packets silently is more secure. Don't ask me why, I asked one time, and they just said, "dropping packets silently is more secure, now shut up and sit down, you non-NANOG-reading luser, whilst I upgrade my routers to the latest FreeBSD-STABLE ".

