Unfortunately, the GIL does not only inhibit performance scaling with threads, it causes performance to scale *negatively* on multicore systems. Don't believe me? Try to run a threaded code on a single core system, and then on multicore systems (`cpusets` come in handy for this test). That holds for I/O bound threads just as well as for CPU bound ones or mixtures thereof. Singlecore systems are hard to find nowadays, and cpusets are not always available (in terms of permission and kernel support), and it impacts process based scaling obviously. Negative scaling is hard to accept (for my use cases at least).
Further, the GIL causes some strangeness (to put it mildly) with respect to signals and interrupt handling -- simply put, signals and interrupts can get lost in multithreaded python codes, which makes mix of threading and processing *very* cumbersome and unstable.
*Only* using multiprocessing is hard, too, for two reasons: not having threads makes async programming difficult (think GUIs). It makes I/O or event driven applications much harder. And, multiprocessing has its own quirks: ever tried to spawn a process in a daemon (hits an `assert`)? Ever tried to pass around a process handle to a central process manager (`wait()` will cease to work)? And some other funny things. Python does not make any serious attempt to provide POSIX level process handling, in my opinion, and that shows for non-trivial use cases.
In summary: people like me whining about the GIL have more reasons than performance. But, performance is one, too. Threading in Python is in a sorry state, and that is a shame. There is no solution in sight.
Python is a neat language. I would not ever (again) recommend it for production code.