No, it absolutely does come from designing the API differently.
The key conceptual difference is that in OpenGL, you change state, change state, change state, change state, render, change state, change state, change state, change state, change state, change state, render. You do this every render loop.
In {Vulkan | Direct3D 12 | Metal | Mantle}, you define two states at program launch, then each render loop, you do: bind existing state, render, bind existing state, render.
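To make the difference in call pattern concrete, here's a rough toy model in Python. It is not real graphics code: `FakeDriver`, its methods, and the figure of 6 state changes per draw are all made up for illustration. It only counts how many times the application calls into the driver per frame.

```python
# Toy model only: FakeDriver and its methods are hypothetical, not a real API.
class FakeDriver:
    def __init__(self):
        self.calls = 0  # total calls made into the "driver"

    def set_state(self, name, value):
        # Stands in for one individual state-change call.
        self.calls += 1

    def bind_pipeline(self, pipeline):
        # Stands in for binding one prebuilt state object.
        self.calls += 1

    def draw(self):
        self.calls += 1


def frame_state_machine(driver, draws, states_per_draw):
    # change state, change state, ..., render
    for _ in range(draws):
        for i in range(states_per_draw):
            driver.set_state(f"state{i}", i)
        driver.draw()


def frame_prebuilt(driver, draws, pipeline):
    # bind existing state, render
    for _ in range(draws):
        driver.bind_pipeline(pipeline)
        driver.draw()


gl_style = FakeDriver()
frame_state_machine(gl_style, draws=1000, states_per_draw=6)

explicit_style = FakeDriver()
frame_prebuilt(explicit_style, draws=1000, pipeline=object())

print(gl_style.calls, explicit_style.calls)  # 7000 2000
```

With these assumed numbers, one frame of 1000 draws costs 7000 driver calls in the state-machine style and 2000 in the prebuilt-state style.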
There are two gains here:
1) A small one - you're calling fewer driver functions per frame. Many of these cross the user/kernel boundary, and as such are actually fairly large performance drags. When you're talking about doing a few thousand renders per frame, cutting out 2 kernel calls per render is a significant win. Cutting out 6 library calls per render is a less significant, but still reasonable, win.
2) The calls to change state can do much less. Like, unbelievably less. In OpenGL, the driver does not know when a render call is going to come, so it has two choices: either (a) it can do all the work for a state update every time a state change call comes in, or (b) it can cache all the state changes until a render call comes in. In case (a), this means it does a lot of duplicated work; in case (b), this means render calls lag a really long time. In {Vulkan | Direct3D 12 | Metal | Mantle}, instead, the driver can do all of the state verification and preparation work only once at application launch.
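The second gain can be sketched with another toy model. Everything here is hypothetical (the three driver classes, and the idea that each validation costs one unit of work); it just tracks how much validation work each strategy does and how much of it lands inside the draw call.

```python
# Toy model: all class names and the "one unit per validation" cost are made up.
class EagerDriver:
    """Validates the full state on every state-change call."""
    def __init__(self):
        self.validations = 0

    def set_state(self, key, value):
        self.validations += 1

    def draw(self):
        return 0  # validation work done inside this draw call


class LazyDriver:
    """Records state changes and validates only when a draw arrives."""
    def __init__(self):
        self.validations = 0
        self.dirty = False

    def set_state(self, key, value):
        self.dirty = True

    def draw(self):
        if not self.dirty:
            return 0
        self.validations += 1
        self.dirty = False
        return 1  # the draw call itself paid for the validation


class PipelineDriver:
    """Validates once, when a pipeline object is created up front."""
    def __init__(self):
        self.validations = 0

    def create_pipeline(self, state):
        self.validations += 1
        return dict(state)

    def draw(self, pipeline):
        return 0


FRAMES, DRAWS, CHANGES = 3, 2, 4

eager, lazy = EagerDriver(), LazyDriver()
eager_in_draw = lazy_in_draw = 0
for _ in range(FRAMES):
    for d in range(DRAWS):
        for c in range(CHANGES):
            eager.set_state(c, d)
            lazy.set_state(c, d)
        eager_in_draw += eager.draw()
        lazy_in_draw += lazy.draw()

explicit = PipelineDriver()
pipelines = [explicit.create_pipeline({c: d for c in range(CHANGES)})
             for d in range(DRAWS)]
explicit_in_draw = 0
for _ in range(FRAMES):
    for p in pipelines:
        explicit_in_draw += explicit.draw(p)

print(eager.validations, lazy.validations, explicit.validations)  # 24 6 2
print(eager_in_draw, lazy_in_draw, explicit_in_draw)              # 0 6 0
```

In this sketch the eager driver does the most total work, the lazy driver pushes all of its work into the draw calls, and the prebuilt-pipeline driver does two validations at launch and none afterwards, no matter how many frames run.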
Why is state verification and preparation work so expensive, I hear you ask? Well, a state change can have surprisingly large knock-on effects. For example, on many graphics cards, blending is implemented as a frame buffer fetch, plus a few instructions at the end of the shader. That means that if you change the blending state, the driver actually has to recompile the shader. Similarly, if you change the vertex format (e.g. to normalised vertices), again, that's implemented as a few instructions on the front of the vertex shader, so... gotta recompile and relink again.
Basically, a surprising amount of stuff requires a complete recompilation of the shader, and that's really costly. OpenGL makes the driver do this lots of times per frame. {Vulkan | Direct3D 12 | Metal | Mantle} do not.
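Here's a toy illustration of the recompilation point, assuming blending is baked into the end of the shader and the vertex format into the front of it. `Compiler`, `StateMachineDriver`, and the comment strings standing in for generated instructions are all hypothetical. The state-machine driver here deliberately keeps only its most recent binary; a real driver may cache variants, which would reduce the number of compiles but still leaves it checking state on every draw.

```python
# Toy model: Compiler, StateMachineDriver and the variant strings are made up.
class Compiler:
    def __init__(self):
        self.count = 0  # number of full shader compiles performed

    def compile(self, src, blend, vertex_format):
        self.count += 1
        # Vertex format becomes a few instructions on the front of the shader.
        prologue = f"// decode {vertex_format} vertex input\n"
        # Blending becomes a framebuffer fetch plus instructions on the end.
        epilogue = "" if blend == "off" else f"\n// fetch framebuffer, blend: {blend}"
        return prologue + src + epilogue


class StateMachineDriver:
    """Keeps only the most recently built binary (a simplification)."""

    def __init__(self, compiler, src):
        self.compiler, self.src = compiler, src
        self.blend, self.vertex_format = "off", "float"
        self.built_for, self.binary = None, None

    def set_blend(self, blend):
        self.blend = blend

    def draw(self):
        key = (self.blend, self.vertex_format)
        if key != self.built_for:  # state changed since the last build
            self.binary = self.compiler.compile(self.src, *key)
            self.built_for = key
        return self.binary


SRC = "color = texture(tex, uv);"
FRAMES = 3

# State-machine style: blend state flips between draws, every frame.
gl_compiler = Compiler()
gl = StateMachineDriver(gl_compiler, SRC)
for _ in range(FRAMES):
    for blend in ("off", "alpha"):
        gl.set_blend(blend)
        gl.draw()

# Explicit style: both variants are compiled once, before the first frame.
pso_compiler = Compiler()
pipelines = [pso_compiler.compile(SRC, blend, "float") for blend in ("off", "alpha")]
compiles_at_launch = pso_compiler.count
for _ in range(FRAMES):
    for binary in pipelines:
        pass  # bind + draw would reuse the prebuilt binary; nothing to compile
compiles_in_frames = pso_compiler.count - compiles_at_launch

print(gl_compiler.count, compiles_at_launch, compiles_in_frames)  # 6 2 0
```

Over three frames the toy state-machine driver compiles six times, while the prebuilt version compiles twice at launch and never again.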