It's not that supporting the old stuff slows things down; it's that it doesn't speed anything up. It does cause some problems, because several fields in the X11 protocol are only 8 bits wide and a significant chunk of that space is taken up by legacy features that no one uses anymore, but newer extensions largely work around that.
If you're in a world where most applications are sending commands like 'draw line from x,y to x1,y1' then X11 network transparency is really fast. At the protocol layer, anyway - if you use Xlib then performance will suck unless network latency is very low, because it layers a synchronous API on top of an asynchronous protocol, so every call that needs a reply blocks for a full round trip (XCB fixes this). Modern applications don't send drawing commands like that; they typically render into pixmaps and just have the X server composite them. X11 can still do a reasonable job here: with XDAMAGE, XFIXES, and XRENDER you can keep most of a pixmap (a Picture, in fact) on the server, update image data in selected parts, and do all of the compositing in the server. The problem is that none of the X11 toolkits actually do this very well. Wayland doesn't solve this at all - it simply says 'well, grab an OpenGL context and send drawing commands'. That works okay - the OpenGL protocol allows you to copy textures to the server (and the GPU) and composite them very fast. The catch is that this approach also works fine in X11, and with X11 you get network transparency on top (which works reasonably well).
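To put rough numbers on the synchronous-vs-asynchronous point, here's a toy cost model in Python. It's a back-of-the-envelope sketch, not a benchmark: the function names, the request count, and the latency figures are all made up for illustration, and it only models requests that need a reply.

```python
# Toy cost model, not a benchmark. Every name and number here is an
# assumption chosen for illustration; nothing is measured from a real X server.

def blocking_time(n_requests, rtt, per_request):
    """Synchronous style: each request that needs a reply waits for it
    before the next request is sent, so every request pays a round trip."""
    return n_requests * (rtt + per_request)


def pipelined_time(n_requests, rtt, per_request):
    """Asynchronous style: send all the requests first, then collect the
    replies, so the round-trip latency is paid roughly once."""
    return rtt + n_requests * per_request


if __name__ == "__main__":
    n = 100               # hypothetical: 100 reply-bearing requests at startup
    per_request = 20e-6   # hypothetical processing cost per request, in seconds
    for label, rtt in [("local", 100e-6), ("LAN", 1e-3), ("WAN", 50e-3)]:
        sync = blocking_time(n, rtt, per_request)
        pipe = pipelined_time(n, rtt, per_request)
        print(f"{label:>5}: blocking {sync * 1e3:8.2f} ms, pipelined {pipe * 1e3:8.2f} ms")
```

With these assumptions the blocking cost grows with the number of requests times the round-trip time, while the pipelined cost pays the round trip about once, which is why the difference barely matters on a local socket and dominates over a slow link.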
The main criticism I'd have of X11 is that it puts too much state on the server. There is no way, at the X protocol layer (or even in the low-level X libraries), to say 'disconnect this window from this display, reconnect it here', or 'oh, my X server has crashed, recreate my state on this newly restarted version'. The latter worked fine in BeOS almost 20 years ago and works fine in Windows today. The former worked on NeWS 30 years ago. Both are use cases that I'd love to see addressed for modern devices. The Wayland solution to this is 'write a web app'.
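To make the 'recreate my state' idea concrete, here's a toy sketch in Python of a client that keeps its own authoritative copy of its window state, so it can replay that state onto a different or restarted server. `FakeServer` and `ClientSession` are hypothetical stand-ins invented for this example; they don't model any real X11, Wayland, BeOS, or NeWS API, and they say nothing about how those systems actually did it.

```python
# Toy illustration only. FakeServer stands in for a display server and
# ClientSession for a toolkit; both are hypothetical and model no real API.

class FakeServer:
    def __init__(self):
        self.windows = {}     # server-side id -> window attributes
        self._next_id = 1

    def create_window(self, attrs):
        wid = self._next_id
        self._next_id += 1
        self.windows[wid] = dict(attrs)
        return wid


class ClientSession:
    """Keeps the authoritative window state on the client side."""

    def __init__(self, server):
        self.server = server
        self.windows = {}     # client-side name -> window attributes
        self.ids = {}         # client-side name -> current server-side id

    def create_window(self, name, **attrs):
        self.windows[name] = attrs
        self.ids[name] = self.server.create_window(attrs)

    def reattach(self, new_server):
        """Recreate every window on a different (or restarted) server."""
        self.server = new_server
        self.ids = {name: new_server.create_window(attrs)
                    for name, attrs in self.windows.items()}


if __name__ == "__main__":
    session = ClientSession(FakeServer())
    session.create_window("main", width=800, height=600)
    replacement = FakeServer()    # pretend the first server crashed
    session.reattach(replacement)
    print(replacement.windows)
```

The point of the sketch is just where the state lives: because the client holds the description of its windows, the server-side ids become disposable and can be regenerated on reconnect.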