Ultimately, I cannot explain to you it's not possible; absence of a solution is not proof of it's non-existence.
I can enumerate the known difficulties.
1. Fastest rendering method we have nowadays is direct rendering on the GPU hardware. Second fastest is often server side software rendering. X11 server side rendering often comes last.
The issue with methods 1 and 2 is that they include the application sending large amounts of pixmaps to the display server (akin to playing video).
Only effective use of method 3 makes X11's network capabilities useful.
But, when faced with the decision, application developers tend to favor 1 or 2 and ignore 3.
History teaches us this: it was happening before the Xrender extension; it happened again now, as Qt5 does not have an effective server side rendering backend.
Thus, making network capability a core part of the display system can be an exercise in futility, as application developers may simply not make use of it.
2. Server side rendering, which as I've shown is an integral part of X11's network capabilities, adds considerable complexity to the display server.
This is an huge issue in Wayland. In Wayland, the display server, the window manager and the compositor are rolled-in into a single program, known as the Wayland compositor.
To make this work, Wayland compositors must be relatively simple, not much more complicated than X11 window manager/compositors.
This is not compatible with supporting the sophisticated server side rendering features provided by X11. The X.org display server is a huge program, in part because the X11 protocol is complex.