If you ever write an extension module that has to interact with the GIL, you'll find yourself trawling through the interpreter source code trying to figure out when you should acquire and release it. I can say the experience was pretty fun and educational, but it seems like concurrency within the interpreter itself was an afterthought. I would think that a relatively modern language like Python would have first-class support for threading in its "official" runtime (CPython). I agree with the original post that the GIL is a huge shortcoming.
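To make the acquire/release dance a bit more concrete, here is a minimal sketch of the bracket an extension uses around code that touches Python objects. It pokes at the real C API calls `PyGILState_Ensure`, `PyGILState_Release`, and `PyGILState_Check` through `ctypes.pythonapi`, so it assumes CPython. The helper names `run_with_gil` and `holds_gil` are made up for illustration, and real extension code would make these calls from C, typically on a thread the interpreter did not create.

```python
import ctypes

# ctypes.pythonapi is a PyDLL handle: unlike CDLL, calls made through it
# keep the GIL held, which is what the C API functions below expect.
api = ctypes.pythonapi
api.PyGILState_Ensure.restype = ctypes.c_int   # PyGILState_STATE is a C enum
api.PyGILState_Ensure.argtypes = []
api.PyGILState_Release.restype = None
api.PyGILState_Release.argtypes = [ctypes.c_int]
api.PyGILState_Check.restype = ctypes.c_int
api.PyGILState_Check.argtypes = []


def run_with_gil(callback):
    """Hypothetical helper: mirror the Ensure/Release pair used in C code."""
    state = api.PyGILState_Ensure()      # acquire (a nested no-op if already held)
    try:
        return callback()                # safe to touch Python objects here
    finally:
        api.PyGILState_Release(state)    # always pair with the matching Ensure


def holds_gil():
    """Hypothetical helper: ask the runtime whether this thread holds the GIL."""
    return api.PyGILState_Check() == 1
```

Called from ordinary Python code the GIL is already held, so the pair only nests and unwinds. The point is the shape: every `PyGILState_Ensure` needs its matching `PyGILState_Release`, even on the error path.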
As someone who's also written a lot of .NET wrappers around C++ libraries, I can tell you that in theory it's great, but in practice it's nowhere near as nice as having a fully "managed" implementation. For starters, portability now relies on that wrapper being available on your target platform. If you own the entire stack, that's fine, but there's still an additional maintenance cost. Also, a lot of people writing Python code don't have the expertise to simply drop into C/C++, rewrite a critical section of code, THEN write the language binding. Not to mention that when they're debugging, their code goes into this sort of "black hole" method call and comes back out... hopefully in a good state.
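Here is a rough sketch of the portability point, using `ctypes` to bind the C library's `strlen`. The function names `load_c_strlen` and `byte_length` are invented for this example, and loading via `ctypes.CDLL(None)` is an assumption that holds on typical POSIX systems but not on Windows, which is exactly why the fallback is there.

```python
import ctypes


def load_c_strlen():
    """Return the C library's strlen via ctypes, or None if unavailable here."""
    try:
        # On POSIX, CDLL(None) gives a handle to symbols already loaded
        # into the process, which includes the C library.
        libc = ctypes.CDLL(None)
        strlen = libc.strlen
    except (OSError, TypeError, AttributeError):
        return None                      # the native side is missing on this platform
    strlen.argtypes = [ctypes.c_char_p]
    strlen.restype = ctypes.c_size_t
    return strlen


def byte_length(data: bytes) -> int:
    """Length of data up to the first NUL byte, native if possible."""
    strlen = load_c_strlen()
    if strlen is None:
        # Pure-Python fallback with the same semantics as strlen.
        return len(data.split(b"\x00", 1)[0])
    return strlen(data)                  # the "black hole" call into native code
```

Even for a one-function binding you end up maintaining two code paths and declaring the argument and return types by hand. A debugger stepping through `byte_length` also can't see inside the native call.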