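There are a number of alternatives. Flushing the BTB on every ring switch (e.g. each user-to-kernel transition) seems a reasonable starting point: branch targets trained in user mode would no longer be available to steer the kernel's speculation, which should eliminate most privilege escalations.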
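To illustrate why a flush on ring switch helps, here is a toy model of branch target injection. It assumes a hypothetical direct-mapped, untagged BTB indexed by the low 12 bits of the branch address; real BTBs use undocumented, hashed, partially tagged schemes, and every address below is made up. A user-mode branch that aliases a kernel indirect branch poisons the kernel's prediction unless the BTB is flushed on the transition.

```python
from __future__ import annotations

# Hypothetical: index width of the toy BTB. Real index functions differ.
INDEX_BITS = 12


class ToyBTB:
    """Direct-mapped, untagged BTB: branches sharing low address bits alias."""

    def __init__(self, index_bits: int = INDEX_BITS) -> None:
        self.mask = (1 << index_bits) - 1
        self.entries: dict[int, int] = {}  # index -> last observed target

    def _index(self, branch_addr: int) -> int:
        return branch_addr & self.mask

    def train(self, branch_addr: int, target: int) -> None:
        # Record the target an indirect branch actually jumped to.
        self.entries[self._index(branch_addr)] = target

    def predict(self, branch_addr: int) -> int | None:
        # Target the CPU would speculate to, or None if no entry exists.
        return self.entries.get(self._index(branch_addr))

    def flush(self) -> None:
        self.entries.clear()


def run(flush_on_ring_switch: bool) -> int | None:
    btb = ToyBTB()
    kernel_branch = 0xFFFF_8000_0000_1234  # victim indirect branch (made up)
    user_branch = 0x0000_4000_0000_1234    # attacker branch, same low 12 bits
    gadget = 0xFFFF_8000_00AB_C000         # attacker-chosen target (made up)

    btb.train(user_branch, gadget)         # attacker trains the entry in user mode
    if flush_on_ring_switch:
        btb.flush()                        # flush at the user -> kernel transition
    return btb.predict(kernel_branch)      # what the kernel branch speculates to


print(hex(run(False)))  # poisoned: the kernel branch is predicted to the gadget
print(run(True))        # None: the flush discarded the attacker's entry
```

The model deliberately ignores tags, hashing and multi-level structure; it only shows that aliasing plus a shared predictor is what the flush breaks.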
Making the address randomization affect bits outside the range seen by the BTB indexing scheme would also make the attack much more difficult, though it would require some non-trivial OS kernel changes.
The BTBs themselves can be multi-level and pretty large. They could be made part of the process context, but they'd add several kbytes to it. There is no hardware support for saving and restoring this state, and it would have to be *fast* to be of any use. For the paranoid, flushing the BTB on every process (not thread) switch would pretty much stop this attack in its tracks, with a small performance penalty; threads of one process already share an address space, so there is nothing to protect by flushing between them.
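A sketch of the flush-on-process-switch policy as a hypothetical scheduler hook: a flush is issued only when the incoming task belongs to a different process than the outgoing one. `Task` and `count_btb_flushes` are illustrative names, not a real kernel API, and the "flush" is just a counter standing in for whatever hardware mechanism would actually clear the BTB.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    pid: int  # process ID: all threads of a process share one address space
    tid: int  # thread ID


def count_btb_flushes(schedule: list[Task]) -> int:
    """Count flushes under a flush-on-process-switch policy (hypothetical hook).

    Switching between threads of the same process keeps the BTB warm;
    only a change of process (address space) triggers a flush.
    """
    flushes = 0
    prev: Task | None = None
    for task in schedule:
        if prev is not None and task.pid != prev.pid:
            flushes += 1  # stand-in for a real hardware BTB flush
        prev = task
    return flushes


# Made-up schedule: thread switch inside process 1, then 1 -> 2 -> 1.
schedule = [Task(1, 1), Task(1, 2), Task(2, 3), Task(2, 3), Task(1, 1)]
print(count_btb_flushes(schedule))  # 2
```

Only the two switches that cross a process boundary pay for a flush; the thread switch inside process 1 does not, which is where the "small performance penalty" comes from.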
It's also not obvious that making the BTB part of the process context would make things faster overall: you'd get better prediction after each switch, but higher context-switch overhead, and I can't tell which effect would win.
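One way to frame that tradeoff is a first-order cost model. It is an assumption, not measured behaviour: saving the outgoing BTB and restoring the incoming one moves its state in both directions at some bandwidth, and the payoff is the mispredictions avoided because the restored BTB is already warm. All parameter names are hypothetical, and the example numbers are made up purely for illustration.

```python
def btb_save_restore_pays_off(
    btb_state_bytes: int,
    bytes_per_cycle: float,
    mispredicts_avoided: float,
    mispredict_penalty_cycles: float,
) -> bool:
    """Return True if keeping the BTB in the process context is a net win.

    Cost per switch: save + restore = 2 * btb_state_bytes / bytes_per_cycle.
    Gain per switch: mispredicts_avoided * mispredict_penalty_cycles.
    """
    switch_cost = 2 * btb_state_bytes / bytes_per_cycle
    prediction_gain = mispredicts_avoided * mispredict_penalty_cycles
    return prediction_gain > switch_cost


# Made-up inputs: 8 KB of BTB state moved at 32 bytes/cycle costs 512 cycles.
print(btb_save_restore_pays_off(8192, 32, 20, 20))  # False: 400 < 512
print(btb_save_restore_pays_off(8192, 32, 50, 20))  # True: 1000 > 512
```

The answer flips with the workload: processes that run briefly between switches and re-execute the same hot branches favour saving the BTB, while long time slices let a cold BTB retrain and make the save/restore cost dominate.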