The inter-frame spacing is mostly the same. It's tiny compared to contention window handling and the actual frame duration, so it's not really the main reason for drastic slowdowns in mixed networks.
The main issues in mixed networks are:
* having to enable RTS/CTS or CTS-to-self frame protection to interoperate with legacy stations that don't understand MCS rates, and
* the sheer length of non-aggregate frames (i.e., 11abg frames, and 11n stations that aren't doing aggregation - e.g. if they're sending voice data that isn't being aggregated into A-MPDU or A-MSDU for whatever broken reason they have.)
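To put rough numbers on those two points, here's a back-of-the-envelope airtime sketch. It assumes 11a/g OFDM timings at 5GHz (16us SIFS, 34us DIFS, 20us preamble, 4us symbols), control frames sent at 24Mbit, and no backoff or retries. It also models an aggregate as one long burst, ignoring A-MPDU delimiters, the HT preamble and block-ACK framing. The function names are made up for illustration; treat the results as ballpark only.

```python
import math

# Assumed 802.11a/g OFDM timings at 5GHz, in microseconds.
SIFS_US = 16.0
DIFS_US = 34.0
PREAMBLE_US = 20.0   # legacy preamble + SIGNAL field
SYMBOL_US = 4.0

RTS_BYTES = 20
CTS_BYTES = 14
ACK_BYTES = 14
CONTROL_RATE_MBPS = 24.0   # assumption: control frames go out at 24Mbit


def ofdm_frame_us(length_bytes, rate_mbps):
    """Airtime of one OFDM frame of length_bytes at rate_mbps."""
    bits_per_symbol = rate_mbps * SYMBOL_US
    bits = 16 + 8 * length_bytes + 6   # SERVICE bits + payload + tail bits
    return PREAMBLE_US + SYMBOL_US * math.ceil(bits / bits_per_symbol)


def exchange_us(frame_bytes, rate_mbps, n_frames=1, protect=False):
    """Airtime of DIFS + optional RTS/CTS + data burst + ACK (no backoff)."""
    total = DIFS_US
    if protect:
        total += ofdm_frame_us(RTS_BYTES, CONTROL_RATE_MBPS) + SIFS_US
        total += ofdm_frame_us(CTS_BYTES, CONTROL_RATE_MBPS) + SIFS_US
    # Simplification: n_frames back to back in a single burst.
    total += ofdm_frame_us(frame_bytes * n_frames, rate_mbps) + SIFS_US
    total += ofdm_frame_us(ACK_BYTES, CONTROL_RATE_MBPS)
    return total


def goodput_mbps(payload_bytes, frame_bytes, rate_mbps, n_frames=1, protect=False):
    """Payload bits delivered per microsecond of airtime (equals Mbit/s)."""
    bits = 8 * payload_bytes * n_frames
    return bits / exchange_us(frame_bytes, rate_mbps, n_frames, protect)


if __name__ == "__main__":
    # 1500-byte payload in a 1536-byte frame at 54Mbit.
    for n, protect in [(1, False), (1, True), (16, False), (16, True)]:
        print(n, protect, round(goodput_mbps(1500, 1536, 54.0, n, protect), 1))
```

Under these assumptions a lone 1500-byte frame at 54Mbit costs about 326us of airtime, or about 414us once RTS/CTS is added. Sending 16 of them per channel access spreads that fixed cost thin enough that protection barely matters.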
The other major thing is that most consumer-grade APs don't do fair scheduling very well, so when you have multiple stations all passing traffic, they can end up with an uneven balance of airtime, causing drastic reductions in throughput. I won't go into the handwave details unless people care; I've written about it before.
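As a rough illustration of the kind of imbalance involved, here's a toy model. It's an assumed, simplified case and not necessarily the exact one meant above: each station always has traffic queued, there is no per-frame overhead, and the scheduler either hands out one frame per station per round or splits airtime equally. The function names and the 130Mbit/6Mbit rates are just illustrative.

```python
def per_frame_round_robin(rates_mbps, frame_bits=12000):
    """Each station sends one equal-sized frame per round.

    Returns per-station throughput in Mbit/s. Every station ends up with
    the same throughput, dragged down by the slowest one.
    """
    # bits / (Mbit/s) gives microseconds of airtime per frame.
    round_us = sum(frame_bits / rate for rate in rates_mbps)
    return [frame_bits / round_us for _ in rates_mbps]


def equal_airtime(rates_mbps):
    """Each station gets an equal share of airtime instead of frame count."""
    share = 1.0 / len(rates_mbps)
    return [rate * share for rate in rates_mbps]


if __name__ == "__main__":
    rates = [130.0, 6.0]   # one fast 11n station, one slow legacy station
    rr = per_frame_round_robin(rates)
    fair = equal_airtime(rates)
    print("per-frame round robin:", [round(x, 1) for x in rr], "total", round(sum(rr), 1))
    print("equal airtime:        ", [round(x, 1) for x in fair], "total", round(sum(fair), 1))
```

In this model the slow station's frames eat most of each round, so the fast station gets nowhere near its link rate under per-frame scheduling. Sharing airtime instead keeps the fast station at half its rate.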
Now, _I_ get ~170 Mbit/s TCP throughput on FreeBSD -> FreeBSD Atheros 11n devices (AR9280 2x2, 5GHz) _WITH_ RTS/CTS and legacy interoperability enabled. Things just tend to slow down when multiple stations show up.