Oh please, another software engineer? Amplifiers are by their very nature non-linear devices (they just happen to have a linear region which we can make use of). The amplifiers in question are operated within their linear region wherever possible, but requirements like efficiency force designers to drive the transistor partly into its non-linear region (closer to P1dB). Some non-linearity is tolerated; how much is dictated by the FCC, ETSI or CRTC in the form of emissions masks, or by the wireless standard in the form of modulation quality. The only way to ensure the amplifier always stays inside the linear region under all conditions would be to back off from P1dB by so much that your efficiency tanks. That is simply not feasible for cellular design...consumers like long battery life and carriers like low operating costs.
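To put rough numbers on "efficiency tanks", here is a minimal sketch using the textbook ideal class A and class B efficiency formulas. These are my assumptions for illustration only: a real cellular PA is neither ideal nor necessarily either class, and the back-off here is measured from the ideal full-swing output rather than from a measured P1dB. The function names are hypothetical.

```python
import math


def backoff_power_fraction(backoff_db: float) -> float:
    """Output power relative to peak for a given back-off in dB."""
    return 10 ** (-backoff_db / 10)


def ideal_class_a_efficiency(backoff_db: float) -> float:
    """Ideal class A: DC power is constant, so efficiency scales
    linearly with output power, from a 50% maximum at full swing."""
    return 0.5 * backoff_power_fraction(backoff_db)


def ideal_class_b_efficiency(backoff_db: float) -> float:
    """Ideal class B: efficiency scales with output voltage amplitude,
    from a pi/4 (about 78.5%) maximum at full swing."""
    return (math.pi / 4) * 10 ** (-backoff_db / 20)


if __name__ == "__main__":
    for backoff in (0, 3, 6, 10):
        print(
            f"{backoff:>2} dB back-off: "
            f"class A {ideal_class_a_efficiency(backoff):.1%}, "
            f"class B {ideal_class_b_efficiency(backoff):.1%}"
        )
```

Even in these idealized models, 10 dB of back-off takes class A from 50% to 5%, which is the battery-life problem in a nutshell.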
Now, getting back to your comments. "As long as the mismatch is within spec, the only problem will be reduced efficiency". Amplifiers (or to be more specific, the transistors used in amplifiers) do not have purely real input and output impedances. The real (resistive) component will generally not equal the desired characteristic impedance (usually 50 ohms) and can be quite small (sometimes a few ohms or even tenths of an ohm). The imaginary (reactive) component will also be non-zero (undesirable, but a fact of life), and its sign tells you whether the output (or input) is capacitive or inductive. Real "high power" amplifiers (I say "high power" to describe the condition where the amplifier is operated towards the upper bounds of the linear region) are not simply matched for maximum power transfer and you're done. Matching for maximum power transfer is called conjugate matching (where you set the real parts equal and negate the reactive part). The input is often matched this way, since you would like to ensure any power available to the transistor is actually taken into the device to be amplified...this is different for low noise amplifiers.
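As a quick illustration of conjugate matching, the sketch below takes a made-up transistor input impedance of 2 - j5 ohms (a hypothetical value, chosen only because it is small and capacitive) and compares the fraction of available power delivered when it is driven from its conjugate versus straight from 50 ohms. The function names are mine.

```python
def conjugate_match(z_device: complex) -> complex:
    """Source impedance for a conjugate match: same real part,
    reactive part negated."""
    return z_device.conjugate()


def delivered_fraction(z_load: complex, z_source: complex) -> float:
    """Fraction of the source's available power delivered to the load,
    computed from the power-wave reflection coefficient."""
    gamma = (z_load - z_source.conjugate()) / (z_load + z_source)
    return 1.0 - abs(gamma) ** 2


if __name__ == "__main__":
    z_in = complex(2.0, -5.0)  # hypothetical device input impedance, ohms
    z_src = conjugate_match(z_in)
    print("conjugate source impedance:", z_src)
    print("delivered, conjugate match:", delivered_fraction(z_in, z_src))
    print("delivered, raw 50 ohm source:", delivered_fraction(z_in, complex(50.0, 0.0)))
```

With these assumed numbers, the unmatched 50 ohm case delivers only about 15% of the available power, which is why the input gets a matching network at all.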
On the output a different set of techniques is used. Load-pull is one technique which allows you to design your output matching network not only for linearity, but also for efficiency or any other characteristic you can measure. The output matching network that produces the best efficiency (which is what we are talking about here) is most likely not the same as the one that produces the best P1dB or linearity. Also note that conjugate matching or other types of matching do not mean zero reflection (or VSWR=1). By the nature of the networks, the resulting VSWR (albeit a low one) is actually part of the desired characteristics of certain matching networks. Put another way, having the best VSWR response (i.e. zero reflection) will not get you the best efficiency (this is the aspect of your post that I take issue with). Reactive components do not dissipate energy (well, if you consider their small parasitic resistance they do, but this is orders of magnitude smaller than the other resistive components).
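For anyone who wants to see how VSWR relates to the impedance actually presented, here is a small sketch using the standard reflection coefficient and VSWR formulas in a 50 ohm system. The 40 + j10 ohm value is a hypothetical example, not a load-pull result, and the function names are mine.

```python
import math


def reflection_coefficient(z: complex, z0: float = 50.0) -> complex:
    """Voltage reflection coefficient of impedance z in a z0 system."""
    return (z - z0) / (z + z0)


def vswr(gamma: complex) -> float:
    """VSWR from a reflection coefficient; infinite for total reflection."""
    mag = abs(gamma)
    if mag >= 1.0:
        return math.inf
    return (1.0 + mag) / (1.0 - mag)


if __name__ == "__main__":
    z_presented = complex(40.0, 10.0)  # hypothetical impedance, ohms
    gamma = reflection_coefficient(z_presented)
    print(f"|gamma| = {abs(gamma):.3f}, VSWR = {vswr(gamma):.2f}")
```

An impedance chosen for efficiency rather than for zero reflection will land somewhere off 50 ohms like this, so a VSWR a bit above 1 is expected rather than a defect.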
All this being said, another way to look at it is that reflections which occur as part of the matching network can be tolerated, since they are an inherent part of the design. Reflections after you have reached 50 ohms (i.e. between the matching network and the antenna) can be devastating to an amplifier. This is why designers place a circulator or isolator directly after the matching network in most cases...this allows the output of the matching network to see at least a 20 dB return loss (depending on the circulator) regardless of what happens after it (the antenna breaks, the cable breaks, etc.). This prevents potentially devastating power from returning to the amplifier.
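To show what the 20 dB figure buys you, the sketch below converts return loss into reflected power. The 40 W forward power is a hypothetical number, and treating a broken antenna as 0 dB return loss (everything reflected) is a simplifying assumption.

```python
def reflected_fraction(return_loss_db: float) -> float:
    """Fraction of forward power that comes back for a given return loss."""
    return 10 ** (-return_loss_db / 10)


def reflected_power_w(forward_w: float, return_loss_db: float) -> float:
    """Reflected power in watts for a given forward power and return loss."""
    return forward_w * reflected_fraction(return_loss_db)


if __name__ == "__main__":
    forward_w = 40.0  # hypothetical forward power, watts
    # Broken antenna with no isolator: assume everything comes back.
    print("no isolator, open antenna:", reflected_power_w(forward_w, 0.0), "W")
    # With an isolator the amplifier still sees a 20 dB match.
    print("behind a 20 dB isolator:", reflected_power_w(forward_w, 20.0), "W")
```

Under those assumptions the amplifier sees about 0.4 W coming back instead of the full 40 W, with the rest ending up in the isolator's termination.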