I'd say the fundamental problem is that the specifications themselves are a patchwork of code changes described in natural language.
The original specification is either written before the driver code is modified, or derived from an existing driver for one hardware system and then recoded as a new driver for another. With other device drivers, such as networking, each extension specification is instead written in a high-level language that can be processed directly into device driver code.
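As a rough illustration of that difference, here is a toy sketch. It assumes a hypothetical machine-readable spec format in which each capability is declared as data, and the driver's validation logic is derived mechanically from that table instead of being hand-coded per extension. The names `SPEC` and `build_validator`, and the format/usage strings, are invented for illustration and do not correspond to any real driver toolchain.

```python
from typing import Callable, Dict, FrozenSet, Tuple

# Hypothetical declarative spec: for each texture format, the set of
# usages the hardware supports. In a real toolchain this might be a
# separate file processed at build time; here it is just a dict.
SPEC: Dict[str, FrozenSet[str]] = {
    "RGBA8": frozenset({"sample", "render_target"}),
    "DXT1": frozenset({"sample"}),  # compressed format: sampling only
}


def build_validator(spec: Dict[str, FrozenSet[str]]) -> Callable[[str, str], bool]:
    """Derive a validation function mechanically from the spec table."""
    # Flatten the table into the set of allowed (format, usage) pairs once.
    allowed: FrozenSet[Tuple[str, str]] = frozenset(
        (fmt, usage) for fmt, usages in spec.items() for usage in usages
    )

    def is_allowed(fmt: str, usage: str) -> bool:
        # Any combination not declared in the spec is rejected by default.
        return (fmt, usage) in allowed

    return is_allowed


validate = build_validator(SPEC)
print(validate("RGBA8", "render_target"))  # True
print(validate("DXT1", "render_target"))   # False
```

With this approach, supporting a new format or usage means editing the table, rather than hunting down every code path that might combine the two.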
Direct3D has the advantage that the hardware must match the software specification, whereas OpenGL is more a matter of extension layered on extension across different hardware. Since each vendor has different hardware and supports different extensions, implementing one extension may or may not affect others. For example, you could support FBOs (framebuffer objects) that use textures as render destinations. But if you then implement compressed textures, those textures can't be used with FBOs, so additional code has to be added to prevent that combination. Usually the reason you can't use a particular combination of extensions is simply that the hardware logic for it hasn't been implemented yet.
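A toy model of the FBO/compressed-texture situation might look like the following. Everything here is hypothetical: `Texture`, `attach_to_fbo`, `FramebufferError`, and the format names are invented for illustration, not taken from any real driver. The point is that the FBO code path came first, and the compressed-texture restriction had to be bolted on afterwards as a special case.

```python
from dataclasses import dataclass


@dataclass
class Texture:
    # Hypothetical texture object: an internal format name and a flag.
    internal_format: str
    compressed: bool = False


class FramebufferError(Exception):
    """Raised when a texture cannot be used as an FBO attachment."""


def attach_to_fbo(attachments: list, tex: Texture) -> None:
    """Attach a texture as a render target of a (hypothetical) FBO."""
    # Patch added after compressed textures were implemented: this
    # hypothetical hardware has no logic for rendering into a compressed
    # format, so the combination must be rejected explicitly.
    if tex.compressed:
        raise FramebufferError(
            f"{tex.internal_format} is compressed and cannot be a render target"
        )
    # Original FBO code path: any texture could be a destination.
    attachments.append(tex)


fbo = []
attach_to_fbo(fbo, Texture("RGBA8"))
try:
    attach_to_fbo(fbo, Texture("DXT1", compressed=True))
except FramebufferError as e:
    print("rejected:", e)
print(len(fbo))  # 1
```

In actual OpenGL, an application would typically learn about such a combination by calling `glCheckFramebufferStatus`, which reports the framebuffer as incomplete, rather than through an exception.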