Also, why is a microkernel OS so apparently difficult to construct?
It isn't, really. There are plenty of successful microkernel OSes out there, like QNX (which has made plenty of noise about how it runs nuclear reactors and such). Hell, even Windows was completely microkernel at one point.
The main problem is performance, and it comes from two sources: repeated kernel requests, and IPC.
Kernel requests happen because device drivers run at application level (which provides great isolation). The trouble is that device drivers tend to need a lot of stuff at the kernel level (which is why they're typically in the kernel...): interrupts, physical memory access, DMA, memory allocations (both physical and virtual), and so on. A driver can't do any of those things on its own, because it's just an application; if applications could do those things, your microkernel would be no better than DOS, and the whole goal is to isolate things from each other. So each one becomes a kernel API call: one to request an interrupt, one to register an event object (the interrupt handler runs in the driver server as an interrupt thread), one to get memory mappings installed, and so on. Each API call is ultimately a system call, and system calls are generally expensive things because they require context saving and switching (some microkernel OSes use "thread migration" to mitigate this) and so forth.
The second problem is IPC. All the servers are isolated from each other and can only communicate through IPC mechanisms, so a microkernel ends up being a message routing and forwarding service as well. Say an application wants to read a file it has open. It calls read(), which traps into the kernel (it's a system call, after all). The kernel passes the message to the server that can handle the call, the filesystem server, then switches back to user mode so the filesystem server can handle it. The filesystem server translates the request into a block and issues a read to the partition driver (which, if it's a separate server, is yet another user-kernel-user transition), which then goes to the disk driver (u-k-u). From there it goes to the bus handler (because said disk could be on SATA, IDE, USB, or FireWire), where the transfer actually happens, and then the message winds its way back through the disk driver, the partition driver, and the filesystem server to the application.
Switching from user to kernel is expensive: it generally requires a software interrupt (the system call), which vectors into the kernel's exception handler, which then has to decode the request. Switching back is generally cheaper (usually just a return instruction that sets the proper mode bits), but you're still taking several mode switches per API call.
No big surprise, these things add up to a ton of cycles.
Microkernel OSes have developed ways to alleviate the issue, thread migration being a big one. Typically a server is implemented as a thread waiting on a mailbox: it gets a message, then handles it. With thread migration, the application's thread context isn't saved and a server thread woken up; instead, the calling thread migrates into the kernel and then on into the servers as necessary. Rather than waking up threads and running a server loop, the whole thing behaves more like a chain of (expensive) function calls, almost like RPC, except the handler executes on the thread that called it.
In a monolithic OS like Linux, all those messages and IPC reduce down to function calls (usually through function pointers), so the application's system call is the only transition: the virtual filesystem handles the call, calls the filesystem driver, which calls the partition driver, which calls the disk driver, which calls the bus driver, ... and then they return up the stack like a typical subroutine call.
Oh, and Windows NT 3.51 ran its graphics subsystem in user mode this way as well. Guess what? Graphics performance sucked, which is why in NT 4 Microsoft moved the graphics driver into ring 0 (kernel mode), thus creating the ability for poorly written graphics drivers to crash the entire OS. But graphics got faster, because you're not shuffling so many messages around. I think Windows has steadily put more and more of the graphics stack in the kernel since then, too.