"They only need to be prepared to change state information if the context pointer changes." How do you know when the context pointer changes? Assuming a context pointer is passed on each API call, what do you compare it to in order to detect the change? Either a thread-local copy of the previous pointer (in which case you're still using thread-local storage, and you might as well keep enough thread-local state to avoid the comparison), or an identifier maintained by the hardware (in which case you've incurred the cost of a synchronous access to the hardware state). I suppose you could be assuming that there would be a single global rendering context for the entire process (as opposed to each thread), but that's a performance loss for apps that need to render from multiple points of view (think of CAD apps rendering multiple-axis views simultaneously, or games that want to render environment and shadow maps before performing a final scene rendering).
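To make the first option concrete, here is a rough sketch of what "compare it to a thread-local copy" looks like. Everything in it (`Context`, `draw_call`, `last_context`, `state_reloads`) is a made-up name for illustration, not part of any real graphics API, and the "reload" is just a counter standing in for whatever state revalidation a driver would actually do.

```cpp
// Hypothetical context type; a real one would carry rendering state.
struct Context { int id; };

// Per-thread copy of the last context pointer this thread passed in.
// This is exactly the thread-local storage the explicit pointer was
// supposed to make unnecessary.
thread_local const Context* last_context = nullptr;

// Per-thread count of state reloads (illustration only).
thread_local int state_reloads = 0;

// Hypothetical API entry point that receives the context on every call.
void draw_call(const Context* ctx) {
    if (ctx != last_context) {  // the comparison in question
        ++state_reloads;        // stand-in for revalidating state
        last_context = ctx;
    }
    // ... the actual command would be issued here ...
}

// Helper so a caller can observe how often this thread switched.
int reloads_on_this_thread() { return state_reloads; }
```

Calling `draw_call` twice with the same context and then once with a different one bumps the counter twice on that thread: once for the first use and once for the switch. The point is that the check itself requires per-thread state, so passing the pointer explicitly hasn't removed the thread-local lookup.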
"The same thing can happen if two different threads access the GPU without any explicit calls by the API signaling a context change. The answer is the same, don't do that." You don't have any choice. Two different (unrelated) processes can access the GPU without any explicit calls signaling a context change, and you have to be prepared to handle that. On workstations and modern general-purpose systems (as opposed to some gaming systems), this is common enough to need support. There are always multiple graphics processes running, and if they're maintaining interactive performance levels, unsynchronized context switches happen all the time.