So, one problem is that there isn't always more data. In my field, we have a surplus of some kinds of data, but other kinds require hundreds of thousands of hours of human input, and we only have so much of that to go around. Processing all of it is easy enough; getting more is not.
Also, by "effective", I should have made it clear that I meant "an effective overall solution to the problem", which includes all costs of training a wider, lower-precision network. This includes input data collection, storage and processing, all of the custom software to handle this odd floating point format, including FP16-specific test code and documentation, run time server costs and latency, any increased risks introduced by using code paths in training and , etc.
I'm not saying I don't believe it's possible; I've just seen no evidence that this is a significant win in most cases, or even in a sizable fraction of them, or that it represents a "best practice" in the field. Our own experiments have shown severe degradation in performance when using these nets without a complete retraining, the software engineering costs would be nontrivial, and much of the hardware we're forced to run on doesn't even support this functionality.
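For what it's worth, the kind of degradation I mean is easy to demonstrate on a toy. Here's a minimal sketch (NumPy, with a hypothetical two-layer net using random stand-in weights, not our actual system) of the numerical drift you get just from naively casting trained FP32 weights and inputs to FP16 without any retraining; on a real task, this drift compounds with depth and shows up as lost accuracy:

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy two-layer MLP; random weights stand in for a "trained" FP32 model.
    W1 = rng.standard_normal((256, 128)).astype(np.float32)
    W2 = rng.standard_normal((128, 10)).astype(np.float32)
    x = rng.standard_normal((1000, 256)).astype(np.float32)

    def forward(x, w1, w2):
        h = np.maximum(x @ w1, 0.0)  # ReLU hidden layer
        return h @ w2

    ref = forward(x, W1, W2)

    # Naive post-training cast: weights and inputs in FP16, no retraining.
    half = forward(x.astype(np.float16),
                   W1.astype(np.float16),
                   W2.astype(np.float16)).astype(np.float32)

    rel_err = np.abs(half - ref) / (np.abs(ref) + 1e-8)
    print(f"median relative error: {np.median(rel_err):.2e}")
    print(f"max relative error:    {np.max(rel_err):.2e}")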
As an analogy, when we use integer-based nets and switch between 16-bit and 8-bit integers, we see an unacceptable level of degradation, even though there's a modest speedup and we can use slightly larger neural nets. I'm very wary of anything with a mantissa much smaller than 16 bits for that reason; those few bits seem to make a significant difference, at least for what we're doing.

We're solving a very difficult constrained optimization problem using Markov chains in real time, and if the observational features are lower fidelity, the optimization search runs out of time to explore the search space effectively before the result has to be returned to the rest of the system. It's possible that the sensitivity of our optimization algorithm to input quality is the issue here, not the fundamental usefulness of FP16, but I'm still quite skeptical. If this were a "slam dunk", I'd expect to see it move through the literature in a wave, the way the Restricted Boltzmann Machine did.
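To make the bit-width point concrete, here's a rough sketch (again NumPy, with a synthetic uniform feature vector standing in for our observational features) comparing the worst-case rounding error of 8-bit and 16-bit integer quantization against FP16 and FP32 storage:

    import numpy as np

    rng = np.random.default_rng(1)

    # Synthetic stand-in for observational features in [-1, 1).
    features = rng.uniform(-1.0, 1.0, size=100_000)

    def int_quantize(x, bits):
        """Uniform symmetric quantization to a signed integer grid."""
        scale = 2 ** (bits - 1) - 1
        return np.round(x * scale) / scale

    candidates = {
        "int8":  int_quantize(features, 8),
        "int16": int_quantize(features, 16),
        "fp16":  features.astype(np.float16).astype(np.float64),
        "fp32":  features.astype(np.float32).astype(np.float64),
    }

    for name, q in candidates.items():
        err = np.max(np.abs(q - features))
        print(f"{name:5s} max abs error: {err:.2e}")

Even on this toy, each big cut in bit width costs orders of magnitude in fidelity; the int16-to-int8 gap here is roughly the cliff we fall off in practice, though the real impact obviously depends on how sensitive the downstream optimizer is.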
Oh, and thank you for the link (great reading) and the thoughtful reply. That's not always easy to find on niche topics online.