

squiggleslash (241428)

squiggleslash
  (email not shown publicly)
http://disneytrademarklawyers.com/

squiggleslash at yahoo dot com

Journal of squiggleslash (241428)

(Crosspost) Has TCP/IP changed and I wasn't told?

[ #202678 ]
Monday May 12, @09:52AM
Networking

Update - 14th May: Problem solved. It looks like a bug in OpenBSD's "pf" packet filter (one that was fixed in a later version): it was ignoring the "wscale" (window scale) option in the TCP connection setup packets. That wasn't an issue until people actually started using wscale values other than zero, which is a fairly recent phenomenon. I'm going to use this as a final excuse to abandon OpenBSD. Anyway, thanks to everyone who helped, particularly LarsG and jesup.
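The model below is a minimal, simplified sketch of why ignoring wscale breaks things. It is not pf's actual code.

The model assumes a window-tracking filter that remembers the largest receive window a host has offered and drops segments that end beyond it. Under RFC 1323 (since revised as RFC 7323), the window field in a SYN is never scaled. Window fields in later segments are shifted left by the negotiated wscale.

The numbers are taken from the comments further down:

- an initial window of 5808
- a later "win 122" with "wscale 7"
- a 1440-byte segment starting 4643 bytes in

The function names are hypothetical.

```python
# Simplified model of a window-tracking packet filter (hypothetical,
# NOT pf's real implementation) showing why ignoring wscale breaks TCP.

MAX_WSCALE = 14  # RFC 7323 caps the window-scale shift count at 14


def believed_window(syn_win, later_wins, wscale, honor_wscale):
    """Largest receive window the filter thinks the receiver has offered.

    The window field of a SYN is never scaled; window fields of later
    segments are shifted left by wscale, but only if the filter
    actually honours the option.
    """
    shift = min(wscale, MAX_WSCALE) if honor_wscale else 0
    return max([syn_win] + [w << shift for w in later_wins])


def filter_passes(seg_start, seg_len, syn_win, later_wins, wscale, honor_wscale):
    """Pass a segment only if it ends inside the believed window.

    seg_start is the offset of the segment's first byte from the last
    byte the receiver acknowledged (a simplification of real tracking).
    """
    window = believed_window(syn_win, later_wins, wscale, honor_wscale)
    return seg_start + seg_len <= window


# Numbers from the thread: SYN window 5808, later "win 122" with wscale 7.
print(believed_window(5808, [122], 7, honor_wscale=True))   # 15616
print(believed_window(5808, [122], 7, honor_wscale=False))  # 5808
print(filter_passes(4643, 1440, 5808, [122], 7, True))      # True
print(filter_passes(4643, 1440, 5808, [122], 7, False))     # False (dropped)
```

With the option honoured, the 1440-byte segment fits easily. A filter that reads "win 122" as 122 bytes is left with only the SYN's 5808, so it drops the segment. That is roughly where the captures in the comments stop receiving data.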

So here's the thing: a few months ago my wife bought a new PC running Vista, and it doesn't connect properly to the outside world through my network. The relevant part of the network looks like this:

Vista Box <Ethernet> OpenBSD NAT gateway/router <PPPoE> DSL modem <DSL> Earthlink

This was the first time I'd ever noticed a machine on my network having these kinds of problems, and I thought it was just a Vista thing. My other OpenBSD box, similarly connected, has no problem. Well, except for Multiply and Wikipedia, where normal queries work fine but trying to post anything is a PITA. I'll talk about that in a moment.

After a while I just set up a SOCKS proxy on the OpenBSD server and that was enough to get things working. But I also noticed that my HD DVD player was having problems connecting to the outside world - and get this, when pointed at my internal web proxy, it worked too.

Skip forward to Ubuntu 8.04. I install this, and suddenly my laptop - which worked fine with 7.0something - has exactly the same problems as the Vista box. And probably the same issue as the HD DVD player - I haven't yet taken advantage of Toshiba's offer to send me the source code to the latter.

My network is the same as it ever was. The version of OpenBSD I'm running is ancient. I. Have. Not. Changed. Anything. Just upgraded one operating system on a device that no longer works and added two other devices. Anything that was working is still working. Anything "new", be it a new PC with Vista or an old PC with an upgraded version of Linux is not.

Conclusion: something has changed. Someone's introduced something new and funky into the way TCP/IP works, that's excited Microsoft and the Linux people, and it's totally screwed up my connection.

I've Googled around. Nobody else seems to be having the same issue, or perhaps they are and I'm not using the right keywords. PPPoE has a known MTU/MRU issue, but the PPP client I use rewrites the TCP MSS option in outgoing connection setup packets (so-called MSS clamping) to force the other end to use the smaller packet size. Until "the upgrade" this worked fine.
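The arithmetic behind the PPPoE workaround is short:

- PPPoE adds 8 bytes: a 6-byte PPPoE header plus the 2-byte PPP protocol field.
- That drops the link MTU from 1500 to 1492.
- Subtracting the 20-byte IPv4 header and the 20-byte TCP header leaves an MSS of 1452.

1452 is also the MSS value in the Linux box's SYN packets captured in the comments.

The sketch below is a generic illustration of MSS clamping on a SYN's raw TCP option bytes. It is not the author's PPP client's code, and the function and constant names are made up. A real implementation would also have to recompute the TCP checksum.

```python
import struct

# Header sizes used to derive the clamped MSS for a PPPoE link.
ETH_MTU = 1500
PPPOE_OVERHEAD = 8   # 6-byte PPPoE header + 2-byte PPP protocol field
IPV4_HEADER = 20     # without IP options
TCP_HEADER = 20      # without TCP options

PPPOE_MTU = ETH_MTU - PPPOE_OVERHEAD                # 1492
CLAMPED_MSS = PPPOE_MTU - IPV4_HEADER - TCP_HEADER  # 1452

TCPOPT_EOL, TCPOPT_NOP, TCPOPT_MSS = 0, 1, 2


def clamp_mss(options: bytes, limit: int = CLAMPED_MSS) -> bytes:
    """Return a copy of a SYN's TCP option bytes with MSS lowered to limit.

    Only the MSS option is changed, and only if it exceeds limit.
    (A real implementation must also fix up the TCP checksum.)
    """
    out = bytearray(options)
    i = 0
    while i < len(out):
        kind = out[i]
        if kind == TCPOPT_EOL:
            break                      # end of option list
        if kind == TCPOPT_NOP:
            i += 1                     # single-byte padding option
            continue
        if i + 1 >= len(out):
            break                      # truncated option list
        length = out[i + 1]
        if length < 2:
            break                      # malformed option, stop parsing
        if kind == TCPOPT_MSS and length == 4 and i + 4 <= len(out):
            (mss,) = struct.unpack_from("!H", out, i + 2)
            if mss > limit:
                struct.pack_into("!H", out, i + 2, limit)
        i += length
    return bytes(out)


# MSS 1460, NOP, NOP, SACK-permitted, as a client on plain Ethernet might send.
syn_opts = bytes([2, 4, 0x05, 0xB4, 1, 1, 4, 2])
print(clamp_mss(syn_opts).hex())  # 020405ac01010402 (MSS now 1452)
```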

I'm going to temporarily post this in my old Slashdot journal too in case anyone there can think of anything. What's changed? What was introduced in the last year or so that could be causing problems with a NAT/PPPoE connection?

  • seekrit government routing monitoring packet cooties. All I can think of. Or looking for "pirate" content to get onto the throttling down list. The biggest thing lately is ISPs mucking around with who knows what, selling advanced mined content to advertisers, snooping around, etc.
    • Nope. Everything that was working is still working. We can safely rule out activity by my ISP or the NSA (well, the NSA may have submitted patches to Linux that made it into Toshiba's HD DVD players and Ubuntu 8.04, but whether it's a conspiracy or not, the point is something changed there and I need to know what it is.)

      BTW Malda, YOUR FUCKING WEBSITE JUST TRIED TO PUT UP A POPUP. I am so glad I avoid this worthless piece of junk. You fuck up D1, insult those who have problems with it, and you whore your
  • But could it be that your ancient OpenBSD simply isn't fully implementing TCP/IP, and the newer systems are now trying to implement it fully even though nobody had the resources to all those years back? And again, though I know you are used to this by now, I am very much speaking out of my arse, but could it be some sort of cap imposed by OpenBSD which made sense years ago but doesn't anymore? Some type of bandwidth constrictor?

    I ask as I think about the motherboard for my play computer at home. It has 5

    • Well, if they're suddenly now implementing some feature of TCP/IP that's been ignored for the last thirty years, I'd still like to know what it is. The point is something has changed, and it appears to be an outside world thing rather than anything on my network, given everything that was working is still working.

      As far as bandwidth caps et al go, nah. That would affect everything, not just stuff running newer software.

  • Feel like providing a tcpdump? This sounds fun. :)
    • Ok, here's a first stab:

      14:50:18.998683 10.0.0.12.43631 > 66.35.250.150.80: F 1491635601:1491635601(0) ack 2800832567 win 122 <nop,nop,timestamp 210135543 999222130> (DF)
      14:50:19.128394 66.35.250.150.80 > 10.0.0.12.43631: R 2800832567:2800832567(0) win 0 (DF)
      14:50:20.578680 10.0.0.12.33543 > 66.35.250.150.80: S 1678693152:1678693152(0) win 5808 <mss 1452,sackOK,timestamp 210135938 0,nop,wscale 7> (DF)
      14:50:20.719969 66.35.250.150.80 > 10.0.0.12.33543: S 3093837837:3093837837(0) a

      • Raw dumps are better, generally. But when debugging routing problems, you really need to trace BOTH sides of the router. You need a dump of the PPPOE link to match up with the plain TCP side. Run a tcpdump on both interfaces at the same time on the BSD box.

        That said, the big difference I see is the handling of win. Perhaps this is tickling a bug in the OpenBSD box's support for slow-start? This is a TCP/IP tuning thing, which modern OSes will have played with, not a change in the protocol.

        I VERY strong
        • I'm not an Earthlink tech so there's absolutely no chance of me running tcpdump on their side I'm afraid. The only thing I can trace is on my end.

          There may be a bug in OpenBSD's PPPoE/PPP clients, but given this bug has only just surfaced, something clearly has changed in the way Linux and Windows do things that's causing the bug to show, hence the question.

        • Did some Googling on "openbsd window size" and noticed 4.3 had a change that might be relevant, though it's a stretch. The line goes:

          Make the TCP input code take the window size into account from the first ACK packet.

          Now the problem with this is that this would make sense if the computer having problems was the OpenBSD box, but it's fine, it can connect to everything, and indeed my workaround right now is for my machines to use the OpenBSD boxes as proxies, either Squid or SOCKS depending upon the appl

        • ...and we're definitely on to something. I deleted the default route on one of the affected computers, and entered:

          sudo route add default gw router window 16384 mss 1460

          ...and now it comes up without problems. You're obviously right about the problem being in the OpenBSD box, as there's no way the smaller window size should be a problem. Now the question is can this be made to work via a configuration change, or does something need to be upgraded? Urgh!

          • Notice the initial window size of 10.0.0.12 is 5808. You stop receiving packets from 66.35.250.150 at the packet where it would go beyond the initial window size (4643 + 1440 > 5808).

            My guess is your NAT gateway (or some other connection-tracking firewall on the path) doesn't understand wscale and drops packets that it thinks are outside the window. Do a tcpdump on your pppoe interface and compare: if it receives one additional packet from 66.35.250.150 but doesn't forward it to 10.0.0.12, then that's your problem.
            • And the "Has TCP/IP changed and I wasn't told?" would be that modern IP stacks use large values for wscale (shift win x bits to the left), thus making devices that do not understand wscale think that the window size is silly low.

              wscale 7 win 1 = wscale 0 win 128.
              • The system I'm using is OpenBSD 3.2. From the OpenBSD 3.3 changes list [openbsd.org]:

                Fixes to pf(4)'s TCP window scaling support

                Alas, pf is a kernel feature, so I have to upgrade the entire operating system to fix this.

                Anyway, thanks to everyone for their help. I'm probably going to redo the server closet now. The cheapest system I could put together (that isn't reconditioned and the wrong size and shape) is a Core Duo with 1G of RAM and a 160G HD which is way larger than what's necessary for a PPPoE NAT router/fire

                • Fixes to pf(4)'s TCP window scaling support

                  [voice style="Adam Savage"] Well, there's your problem! [/voice]

                  Alas, pf is a kernel feature, so I have to upgrade the entire operating system to fix this.

                  Two potential work-arounds: Disable window scaling in the tcp stack on all clients, or write some pf rule to drop wscale from the initial syn/syn-ack packets. Drawback is that window will be limited to 64K, which will degrade performance on high-speed/high-latency links.

                  Core Duo with 1G of RAM and a 160G HD

                  Gotta love Moore's law eh? Virtualbox, xen, vmware would probably all work well. Otoh, something like a wrt54gl running Tomato would be just fine in most situations.

                  • Yeah, right now the only workaround is to set a window size on the clients I have access to. Unfortunately while my HD DVD player runs Busybox, I don't actually have shell access to it (damn you Toshiba! I thought the TOSLINK cable was bad enough, but this!) and my wife's PC runs Vista and, frankly, I don't know where to begin with that one.

                    I don't see any way to write a pf.conf rule to modify anything, let alone the window size. I've searched the manpage all over, and there's nothing obvious - the only

    • Ok, second stab, this time ensuring all traffic, not just TCP port 80, is shown as long as it is to and from the Ubuntu 8.04 box and isn't ssh or telnet traffic:

      15:18:01.480222 10.0.0.12.56356 > 10.0.1.2.53: 41441+ A? slashdot.org. (30) (DF)
      15:18:01.483286 10.0.1.2.53 > 10.0.0.12.56356: 41441 1/3/3 A 66.35.250.150 (156)
      15:18:01.485281 10.0.0.12.33471 > 66.35.250.150.80: S 1975151567:1975151567(0) win 5808 <mss 1452,sackOK,timestamp 210551201 0,nop,wscale 7> (DF)
      15:18:01.603958 66.35.250.15

    • For comparison's sake, here's an example of it working, using my other OpenBSD box:

      15:25:25.450106 10.0.1.1.9006 > 10.0.1.2.53: 47769+ A? slashdot.org. (30)
      15:25:25.452835 10.0.1.2.53 > 10.0.1.1.9006: 47769 1/3/3 A 66.35.250.150 (156)
      15:25:25.463428 10.0.1.1.8845 > 66.35.250.150.80: S 452356023:452356023(0) win 16384 <mss 1460,nop,nop,sackOK,nop,wscale 0,nop,nop,timestamp 1546222779 0> (DF)
      15:25:25.590173 66.35.250.150.80 > 10.0.1.1.8845: S 3123104914:3123104914(0) ack 452356024 win 5792

  • Turned out that several machines all used the same default IP address, so nothing worked.

    Another time, I mixed up a crossover cable with a straight-through one ... gee, that didn't work too well either ...

    Hopefully it will be something equally annoying ...
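The diagnosis in the thread came down to comparing SYN packets. The failing Ubuntu 8.04 box advertises "wscale 7", while the working OpenBSD box advertises "wscale 0".

Below is a small sketch for pulling those fields out of tcpdump text. It assumes the older tcpdump output format quoted in the comments; newer tcpdump versions print flags and options differently. The regular expression and function names are hypothetical.

```python
import re

# Matches a SYN line in the older tcpdump format quoted in this thread,
# capturing the window value and the <...> option list.
SYN_RE = re.compile(r"\sS\s.*?\swin (\d+) <([^>]*)>")


def syn_summary(line):
    """Return the window, MSS and wscale advertised in a tcpdump SYN line.

    Returns None if the line is not a SYN with an option list; mss or
    wscale is None when that option is absent.
    """
    m = SYN_RE.search(line)
    if m is None:
        return None
    opts = {}
    for opt in m.group(2).split(","):
        parts = opt.split()
        if len(parts) == 2 and parts[0] in ("mss", "wscale"):
            opts[parts[0]] = int(parts[1])
    return {"win": int(m.group(1)), "mss": opts.get("mss"),
            "wscale": opts.get("wscale")}


def misread_factor(wscale):
    """How many times too small a wscale-unaware filter reads later windows."""
    return 1 << wscale if wscale else 1


# The failing (Ubuntu 8.04) and working (OpenBSD) SYNs from the thread.
ubuntu = ("14:50:20.578680 10.0.0.12.33543 > 66.35.250.150.80: S "
          "1678693152:1678693152(0) win 5808 <mss 1452,sackOK,"
          "timestamp 210135938 0,nop,wscale 7> (DF)")
openbsd = ("15:25:25.463428 10.0.1.1.8845 > 66.35.250.150.80: S "
           "452356023:452356023(0) win 16384 <mss 1460,nop,nop,sackOK,"
           "nop,wscale 0,nop,nop,timestamp 1546222779 0> (DF)")

for name, line in (("ubuntu", ubuntu), ("openbsd", openbsd)):
    s = syn_summary(line)
    print(name, s, "misread x", misread_factor(s["wscale"]))
```

Run on the two SYNs, it reports a 128x factor for the Ubuntu client (wscale 7). A scale-unaware filter such as the old pf would get every later window that many times too small. For the OpenBSD client (wscale 0) the factor is 1x, so the same filter never notices.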