To paraphrase, you can't be too rich, too thin, or have too many bits of precision in a calculation. With single precision you have to be enormously careful not to drop digits even in comparatively modest loops; with double precision you can many digits before you run out. You can see it in almost any computations involving trig and pi -- single precision pi degrades in series much faster than double precision pi. It isn't just a matter of not using forward recursion to evaluate bessel functions, which is unstable in any precision (or for that matter, using book definitions of e.g. spherical bessel functions in terms of trig functions) or reordering series to avoid subtracting big numbers and running small to big instead of big to small -- there is simply a big difference between cumulating a random walk with a random digit at the 16th place and one at the 8th place.
A second problem is the exponent. 10^38 just isn't "big" in a modern large scale computation. It is easy to overflow or underflow a single precision computation. 10^308 is a whole lot closer to big, even expressed in decibels. One can concentrate a lot more on writing simple code, and a lot less on handling exponent problems as they emerge.
A final problem is random numbers. This is actually a rather big problem, as lots of code (all Monte Carlo, for example) relies on a stream of algorithmically random numbers that (for example) do not have a period less than the duration of the computation and that do not have significant bunching on low dimensional hyperplanes or other occult correlations. It is much more difficult to build a good random number generator on fewer bits, because the periods of the discretized iterated maps scale (badly) with reduced numbers of bits and it is more difficult to find acceptable moduli for various classes of generators from the significantly smaller discretized space. You can watch this problem emerge quite trivially by building a Mandelbrot set generator in float and rubberbanding in -- oops, you hit bottom rather quickly! Rebuild it in double and you at least have to work to rubberband in to where it all goes flat. You have to build it in a dynamically rescaleable precision to rubberband in "indefinitely" as the details you wish to resolve eventually become smaller than any given finite precision. This actually illustrates the overall problem with single precision quite nicely -- the emergent flat patches in an graphical representation of an iterated map are isomorphic to the establishment of unintended correlations in long runs of iterated maps in a random number generator and the clipping of the graphical representation of small numbers illustrates the problems with mere underflow in real computations of interest.
Personally, I dream of default quad precision and 128 bit processors. 34 decimal digits of precision means that a random walk with n unit steps (which accumulates like \sqrt{n}) require (10^30)^2 = 10^60 steps to get to where I don't still have 4 significant digits. Even a rather large cluster running a rather long time would have a hard time generating 10^60 add operations. In contrast, with only (say) 8 decimal digits a mere 10^16 operations leaves you with no digits at all, assuming you haven't overflowed already. I've run computations with a lot more than this number of operations. I also like the idea of having overflow around 10^5000. It takes quite a while adding numbers at the overflow of double precision to hit overflow, and one basically could add overflow scale single precision floats forever and never reach it. That gives me comfort. It would also make writing a Mandelbrot set explorer tool where one would be likely to give up before rubber banding all the way to the "bottom" -- there are a whole lot of halvings of scale in there to play with that still leave you with much more resolution than needed on the screen.
rgb