The article implies that it's easy to do and that there was simply never a need before. I seriously doubt that it's a trivial thing to accomplish a four-fold increase in bandwidth on existing infrastructure.
It's not, as you have pointed out. My interpretation is that, on the contrary, phase and polarization diversity (which I'll lump together as "coherent" optical transmission) are hard enough to do that you'll try all the other possibilities first: DWDM, high symbol rates, differential-phase modulation... All these avenues have now been exploited, so we have to bite the bullet and go coherent. However, on coherent systems, some problems actually become simpler.
Polarization has a habit of wandering around in fiber.
Quite so. Therefore, on a classical system, you use only polarization-independent devices. (Yes, erbium-doped amplifiers are essentially polarization-independent because you have many erbium ions in different configurations in the glass; Raman amplifiers are something else, but sending two pump beams along orthogonal polarizations should take care of it.)
For a coherent system, you want to separate polarizations whose axes have turned any which way. Have a look at Wikipedia's article on optical hybrids, especially figure 1. You need four photoreceivers (two for each balanced detector), and you reconstruct the actual signal by digital signal processing. And that's just for a single polarization; double this for polarization diversity and use a 2x2 MIMO technique to untangle the two polarizations.
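To make the receiver arithmetic concrete, here is a minimal sketch of what the hybrid and the 2x2 step do, under heavy assumptions: an ideal lossless 90-degree hybrid, unit-responsivity photodiodes, no noise, and a fiber Jones matrix that is already known (a real receiver has to estimate it adaptively). The function names and the numbers are mine, purely for illustration.

```python
import cmath
import math

def hybrid_outputs(signal, lo):
    """Four output fields of an ideal, lossless 90-degree hybrid (assumed model)."""
    return (
        (signal + lo) / 2,
        (signal - lo) / 2,
        (signal + 1j * lo) / 2,
        (signal - 1j * lo) / 2,
    )

def coherent_detect(signal, lo):
    """Two balanced detectors (four photodiodes) give one complex sample I + jQ."""
    p1, p2, p3, p4 = (abs(f) ** 2 for f in hybrid_outputs(signal, lo))
    # Balanced subtraction cancels the |signal|^2 and |lo|^2 terms,
    # leaving signal * conj(lo).
    return complex(p1 - p2, p3 - p4)

def fiber_jones(theta, phi):
    """Hypothetical unitary Jones matrix: the fiber rotating the polarization axes."""
    c, s = math.cos(theta), math.sin(theta)
    return ((c * cmath.exp(1j * phi), -s), (s, c * cmath.exp(-1j * phi)))

def apply2x2(m, v):
    """Multiply a 2x2 matrix by a 2-element vector."""
    return (m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1])

def conj_transpose(m):
    """Inverse of a unitary 2x2 matrix."""
    return (
        (m[0][0].conjugate(), m[1][0].conjugate()),
        (m[0][1].conjugate(), m[1][1].conjugate()),
    )

lo = 2.0 + 0j                                   # local oscillator field (assumed)
tx = (cmath.exp(1j * math.pi / 4),              # QPSK symbol on X polarization
      cmath.exp(-3j * math.pi / 4))             # QPSK symbol on Y polarization
J = fiber_jones(0.7, 1.1)                       # arbitrary rotation, assumed known

rx_fields = apply2x2(J, tx)                     # polarizations mixed by the fiber
rx_samples = tuple(coherent_detect(f, lo) / lo.conjugate() for f in rx_fields)
recovered = apply2x2(conj_transpose(J), rx_samples)  # 2x2 "MIMO" unmixing
```

Here `recovered` matches `tx` up to rounding. The hard part in practice is everything this sketch leaves out: tracking `J` as it drifts, LO phase noise, and doing it all at line rate.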
That's why it's so expensive compared to a classical system: the coherent receiver is much more complex. Additionally, you need DSP and especially ADCs working at tens of gigasamples per second. This is only just now becoming possible.
Phase-encoding has similar problems. Dispersion, the fact that different frequencies travel at different velocities (this leads to prisms separating white light into rainbows), will distort the pulse shape and shift the modulation envelope with respect to the phase. You either need very low dispersion fibers, and they already need to use the best available, or have some fancy processing at a receiver or repeater.
Indeed. We are at the limit of the "best available" fibers (which are not zero-dispersion, actually, to alleviate nonlinear effects, but that's another story). Now we need the "fancy processing". And lo, when we use it, the dispersion problem becomes much more tractable! Currently, you need dispersion-compensating fibers every 100 km, and they're not precise enough beyond 40 Gbaud (thus 40 Gbit/s for conventional systems). A coherent receiver recovers the full optical field, amplitude and phase, rather than just the intensity, so dispersion is a purely linear channel characteristic, which you can correct straightforwardly in the spectral domain using FFTs. Then the limit becomes how much processing power you have at the receiver.
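As a rough sketch of that spectral-domain correction: dispersion acts as an all-pass filter with a phase quadratic in frequency, so the receiver multiplies the spectrum by the conjugate phase. Everything numeric below is my assumption for illustration (beta2 of about -21.7 ps^2/km, typical of standard single-mode fiber near 1550 nm, 1000 km of fiber, 56 GSa/s), the sign of the phase depends on your Fourier convention, and the small radix-2 FFT is only there to keep the example self-contained; you would use a library FFT in practice.

```python
import cmath
import math
import random

def fft(x, inverse=False):
    """Recursive radix-2 FFT, unnormalised; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    sign = 1 if inverse else -1
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def ifft(spectrum):
    """Inverse FFT with the 1/N normalisation applied."""
    n = len(spectrum)
    return [v / n for v in fft(spectrum, inverse=True)]

def dispersion_phase(n, sample_rate_hz, beta2_s2_per_m, length_m):
    """Per-bin phase (radians) of the all-pass dispersion response."""
    phases = []
    for k in range(n):
        # FFT bin k corresponds to a signed frequency offset from the carrier.
        f = (k if k < n // 2 else k - n) * sample_rate_hz / n
        omega = 2 * math.pi * f
        phases.append(0.5 * beta2_s2_per_m * omega ** 2 * length_m)
    return phases

def apply_phase(signal, phases, sign):
    """Multiply the spectrum by exp(sign * j * phase) and go back to time."""
    spectrum = fft(signal)
    shaped = [s * cmath.exp(sign * 1j * p) for s, p in zip(spectrum, phases)]
    return ifft(shaped)

rng = random.Random(0)  # fixed seed so the example is repeatable
symbols = [cmath.exp(1j * (math.pi / 4 + rng.randrange(4) * math.pi / 2))
           for _ in range(128)]                       # QPSK symbols
tx = [s for s in symbols for _ in range(2)]           # 2 samples per symbol

phases = dispersion_phase(len(tx), 56e9, -21.7e-27, 1_000e3)  # assumed values
dispersed = apply_phase(tx, phases, sign=+1)          # what the fiber does
recovered = apply_phase(dispersed, phases, sign=-1)   # what the DSP undoes
```

The block-wise FFT here treats the signal as periodic, which is fine for a toy; a real receiver would use overlap-save or an equivalent scheme, and the filter length (hence the processing cost) grows with the accumulated dispersion.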
The article downplays how hard these problems are. It implies that the engineers simply didn't think it through the first time around, but that's far from the case. A huge amount of money and effort goes into more efficiently encoding information in fiber. There probably is no drop-in solution, but very clever design in new repeaters and amplifiers might squeeze some bonus bandwidth into existing cable.
Well, yes, much effort has been devoted to the problem. After all, how many laboratories are competing to break transmission speed records and be rewarded with the prestige of a postdeadline paper at conferences such as OFC and ECOC ;-)?
As for how much bandwidth can be squeezed into fibers, keep in mind that current systems have a spectral efficiency of around 0.2 bit/s/Hz. There's at least an order of magnitude left for improvement; I don't have Essiambre's paper handy, but according to his simulations, I think the lower bound for capacity is around 7-8 bit/s/Hz.
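For a feel of the numbers, here is the back-of-the-envelope arithmetic. The 10 Gbit/s channel on a 50 GHz grid is my own illustrative assumption for where a figure like 0.2 bit/s/Hz comes from; the 7 bit/s/Hz is the low end of the range quoted above from memory, not a value checked against the paper.

```python
def spectral_efficiency(bit_rate_bps, channel_spacing_hz):
    """Bits per second carried per hertz of occupied spectrum."""
    return bit_rate_bps / channel_spacing_hz

# Assumed example channel: 10 Gbit/s on a 50 GHz grid.
current = spectral_efficiency(10e9, 50e9)

# Low end of the 7-8 bit/s/Hz range recalled above (not verified).
recalled_bound = 7.0

headroom = recalled_bound / current
print(f"current: {current:.1f} bit/s/Hz, headroom: about {headroom:.0f}x")
```

So even the conservative end of that range leaves a factor of a few tens, which is where "at least an order of magnitude" comes from.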