First, I'd love to cite an extremely good video on this topic: https://www.xiph.org/video/vid...
I'll try to distil down the relevant portion here.
Nyquist showed us that a band-limited signal sampled by a discrete-time system can be reproduced perfectly, provided it is sampled at more than 2n samples per second, where n is the bandwidth of the signal in hertz.
Perfectly isn't hyperbole here. That is a mathematical result, not an approximation.
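A minimal numpy sketch of that claim (the tone frequency, sample count, and evaluation point are all arbitrary choices of mine): sample a band-limited tone, then evaluate the Whittaker-Shannon interpolation formula at a time *between* two samples and compare with the true continuous signal.

```python
import numpy as np

fs = 48_000                        # sample rate (Hz)
f = 1_000                          # tone frequency, far below fs/2
n = np.arange(2048)                # sample indices
x = np.sin(2 * np.pi * f * n / fs) # the discrete samples

def reconstruct(t, samples, fs):
    """Whittaker-Shannon: evaluate the reconstructed signal at time t (s)."""
    k = np.arange(len(samples))
    return np.sum(samples * np.sinc(t * fs - k))

# Ask for the value halfway BETWEEN samples 1000 and 1001 -- a point we
# never measured -- and compare with the true continuous signal.
t = 1000.5 / fs
err = abs(reconstruct(t, x, fs) - np.sin(2 * np.pi * f * t))
print(err)   # tiny; limited only by the finite number of samples
```

The only error left is from truncating the (theoretically infinite) sinc sum to 2048 samples; with infinitely many samples it would be exactly zero.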
The other half of digital audio is the accuracy of measurement of those discrete samples: "bit depth", or bits per sample. While we can reproduce a signal perfectly from perfect samples, rounding each sample to a finite number of bits (quantization) adds some noise. This quantization noise behaves much like tape hiss, and it can be decorrelated from the signal and pushed toward less noticeable frequencies using dithering and noise shaping.
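A rough illustration of the bit-depth point (the tone level and RNG seed here are arbitrary): quantize a tone to 16 bits, with and without triangular (TPDF) dither, and measure the error power relative to the signal.

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 48_000
t = np.arange(fs) / fs                       # one second of audio
x = 0.5 * np.sin(2 * np.pi * 1000 * t)       # 1 kHz tone at -6 dBFS

step = 2.0 / 2**16                           # 16-bit quantizer step size

plain = np.round(x / step) * step            # straight 16-bit quantization
tpdf = (rng.random(x.size) - rng.random(x.size)) * step   # TPDF dither, +/-1 LSB
dithered = np.round((x + tpdf) / step) * step

def noise_db(y):
    """Error power relative to signal power, in dB."""
    e = y - x
    return 10 * np.log10(np.mean(e ** 2) / np.mean(x ** 2))

print(noise_db(plain))     # roughly -92 dB for this -6 dBFS tone
print(noise_db(dithered))  # a few dB higher, but now signal-independent hiss
```

The dither costs a few dB of noise floor, but in exchange the error stops being correlated with the signal (distortion) and becomes benign, hiss-like noise, which is the trade the text describes.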
Digital audio can and does faithfully reproduce the original signal with noise below human perception even at a meager 16-bit depth and a 48 kHz sampling rate (44.1 kHz is also very popular, but 48 kHz leaves more room for the low-pass filter and so makes it easier to design).
The stair-steps don't come out of the audio jack; the smooth signal is rebuilt by the DAC's reconstruction (anti-imaging) filter.
Fast attacks that fall "in between" the samples are NOT delayed or lost: again by Nyquist, the band-limited signal is perfectly reproduced, timing and all (and this is demonstrated directly in the video).
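A quick sketch of that "in-between the samples" point (frequency and phase offset deliberately chosen by me so that no sample lands on a peak): the raw samples all undershoot the waveform's true peaks, yet sinc reconstruction on a finer grid recovers them.

```python
import numpy as np

fs = 48_000
f = 20_000                                # high tone: only 2.4 samples per cycle
n = np.arange(1024)
x = np.sin(2 * np.pi * f * n / fs + 0.3)  # phase chosen so no sample hits a peak

print(x.max())                            # ~0.975: every sample misses the peaks

# Reconstruct on a 16x finer grid (interior region only, to avoid the
# truncation error near the edges of our finite sample window).
fine = np.arange(100, 924, 1 / 16)        # positions in units of sample periods
rec = np.array([np.sum(x * np.sinc(p - n)) for p in fine])
print(rec.max())                          # ~1.0: the in-between peaks are back
```

The peaks were never "lost"; they are implied by the samples around them, and the reconstruction filter puts them back at exactly the right times.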
There is a lot of myth and misunderstanding when it comes to digital audio, and there is a lot of truth too. The loudness wars, as other posters have pointed out, have done more to damage the reputation of digital audio than anything else, and there are plenty of examples of compressed audio (in both senses of the word) sounding just terrible: lossy compression at too low a data rate through a terrible encoder, and dynamic-range compression that squeezes the music into a small fraction of the available range. Those are real issues, but they aren't fundamental to signal reproduction.
Hope that explains some of it!