First bear in mind the attacker has local code execution. If they can put up a fake screengrabber, it's just a logout/reboot away from running a trojaned compositor (if you use Wayland), a trojaned screenlocker (if you use X) and on either system without even a reboot, a trojaned browser, terminal, ssh program and so on and so forth. So to say this is a serious flaw with X is hyperbole.
The next case is that you also claim Wayland is secure. Therefore X11 running on Wayland is secure. Therefore in that case X11 is being run in a secure manner. I claim that if that is the case, then X11 could very easily be secured, because it's eassy to see it in operation nowrunning in a way that the additional insecuritu doesn't break things.
I'm not really sure how creating yet another way for a "designated program" to monitor input events is supposed to address the problem that any X11 client can monitor keyboard events on any window in the absence of a grab, unless you intend to rewrite all existing software to grab the keyboard on receiving input focus, and force all the desktop environments to implement support for the extension and move their global keybindings into a specially designated client. At that point you might was well switch to a system designed for secure I/O from day oneâ"like Wayland.
OK, I'm lightly lost so I'm going to swing back to the original point.
First there's the one about server grabs which prevent other windows from opening. Well, you could easily have a protocol extension that allows only one connected client to bring up windows anyway. The continuation of the grab could either be faked to the grabber, or killed outright (the latter feature---killing grabs---was removed from Xorg by the wayland people because they decided we didn't need it!). Let's say it's first come, first serve, so that the first client to request this feature is the only one to get it. Or the screenlocker could get that command. This requires the WM and screenlocker to be run on boot before a trojan, but as I pointed out, if the system is that deeply trojanned anyway, then this is all pointless.
That requires some rewriting to whichever screenlockers you want to add the feature to, hardly a major undertaking since there's about 3 in common use and a few, more obscure, ones.
The other problem---a designated screen lock key combo. Well, if the screen locker has a passive grab on ctrl-alt-delete, then the fake screenlocker can't grab that, so that already works.
It's easy to implement the insecure X11 model on top of a secure system. The reverse is much more difficult.
Why? Why not have exactly the same security model? You haven't explained, only asserted, that your chosen security feature couldn't be easily available under X.
In fact when it comes to locking things down, there are things like the X security protocol, which blocks untrusted programs from executing various protocol commands. This already exists and could (I haven't checked if it does) easily block things like receiving events from a window on another connection, reparenting or redirecting a window on another connection, diddling with the global keymap and so on.
Anyway if there's unsanboxed local code execution, you're basically screwed on any system.