is part of a healthy and balanced diet.
So, one problem is that there is not always more data. In my field, we have a surplus of some kinds of data, but other data requires hundreds of thousands of hours of human input, and we only have so much of that to go around. Processing all of it is easy enough; getting more is not.
Also, by "effective", I should have made it clear that I meant "an effective overall solution to the problem", which includes all the costs of training a wider, lower-precision network. That includes input data collection, storage, and processing; all of the custom software needed to handle this odd floating-point format, including FP16-specific test code and documentation; runtime server costs and latency; any increased risks introduced by using code paths in training and , etc.
I'm not saying I don't believe it's possible; I've just seen absolutely no evidence that this is a significant win in most, or even a sizable fraction, of cases, or that it represents a "best practice" in the field. Our own experiments have shown severe performance degradation when using these nets without a complete retraining, the software engineering costs would be nontrivial, and much of the hardware we are forced to run on doesn't even support this functionality.
As an analog, when we use integer-based nets and switch from 16-bit to 8-bit integers, we see an unacceptable level of degradation, even though there is a modest speedup and we can use slightly larger neural nets. I'm very wary of anything with a mantissa much smaller than 16 bits for that reason. Those few bits seem to make a significant difference, at least for what we're doing. We're solving a very difficult constrained optimization problem using Markov chains in real time, and if the observational features are lower fidelity, the optimization search runs out of time to explore the search space effectively before the result is returned to the rest of the system. It's possible that the sensitivity of our optimization algorithm to input quality is the issue here, not the fundamental usefulness of FP16, but I'm still quite skeptical. If this were a "slam dunk", I'd expect to see it move through the literature in a wave, the way the Restricted Boltzmann Machine did.
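To put a rough number on the 16-bit vs 8-bit point, here is a minimal Python sketch. It assumes simple symmetric uniform quantization of values normalized to [-1, 1]. The actual quantization scheme in these nets isn't stated, so the `quantize` and `max_error` helpers are hypothetical. The sketch measures the worst-case round-trip error at each width.

```python
import random


def quantize(x: float, bits: int, max_abs: float = 1.0) -> float:
    """Hypothetical helper: symmetric uniform quantization of x in
    [-max_abs, max_abs] to a signed integer of `bits` width, then back to float."""
    qmax = 2 ** (bits - 1) - 1           # 127 for int8, 32767 for int16
    scale = max_abs / qmax               # size of one integer step
    q = max(-qmax, min(qmax, round(x / scale)))  # round, then clamp to range
    return q * scale


def max_error(bits: int, samples: list[float]) -> float:
    """Worst-case absolute round-trip error over the samples."""
    return max(abs(x - quantize(x, bits)) for x in samples)


random.seed(0)
samples = [random.uniform(-1.0, 1.0) for _ in range(10_000)]
err8 = max_error(8, samples)
err16 = max_error(16, samples)
print(f"int8  max error: {err8:.2e}")
print(f"int16 max error: {err16:.2e}")
```

Under this assumption, the worst-case rounding error of int8 is bounded by about 258 times that of int16 (32767 / 127). That is before counting any knock-on effect on a downstream search that is sensitive to input fidelity.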
Oh, and thank you for the link (great reading) and the thoughtful reply. That's not always easy to find on niche topics online.
It seems like these systems are exploitative by design, even if exploitation wasn't explicitly the goal. They use every available algorithm and data source to maximize labor output at the lowest possible cost. Individual workers operate under extreme information asymmetry, against a system that does not negotiate and offers only a take-it-or-leave-it choice.
This is by far the best comment I've ever seen regarding this sort of algorithmic labor management.
Normally I'm all for this sort of thing (my company is a client and uses it to handle large bursts of data processing quickly), but the information asymmetry argument is a powerful one. Also, there doesn't seem to be much competition in this space, which might otherwise ameliorate a lot of the problems caused by the "take it or leave it" bargaining approach.
The analysis provided by the article is absurd, but yours seems to lead to the inescapable conclusion that some kind of regulation is necessary to prevent blatant exploitation. Maybe that means reducing information asymmetry in some way, or requiring publicly available reports on the website showing the effective wages paid to workers as a fraction of the minimum and average wages in their respective countries. Surely someone can find an answer to this.
So they're all excited about the lowest-precision, smallest-size floating point math in IEEE 754?
FP16 is good enough for neural nets. Do you really think the output voltage of a biological neuron has 32 bits of precision and range? For any given speed, FP16 lets you run NNs that are wider and deeper, and/or use bigger datasets. That is way more important than the precision of individual operations.
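For reference, IEEE 754 binary16 (FP16) has 1 sign bit, 5 exponent bits, and 10 stored mantissa bits. Its largest finite value is therefore 65504, and the gap between 1.0 and the next representable number is 2^-10. Here is a minimal Python sketch that rounds values to FP16 using the standard library's `struct` half-precision format (`'e'`). The `to_fp16` helper and the two constants are illustrative names, not from any library.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary16 value and convert back."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


FP16_MAX = (2 - 2 ** -10) * 2 ** 15  # largest finite binary16 value
FP16_EPS = 2 ** -10                  # gap between 1.0 and the next binary16 value

print(to_fp16(FP16_MAX))              # 65504.0
print(to_fp16(0.1))                   # 0.0999755859375
print(to_fp16(1.0 + FP16_EPS / 4))    # 1.0 (the small increment is rounded away)

try:
    to_fp16(70000.0)                  # outside the binary16 range
except OverflowError as exc:
    print("overflow:", exc)
```

Values beyond the binary16 range can't be represented at all, which is one reason FP16 training setups often rescale values to keep them in range.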
There's a lot of rounding error with FP16. The neural networks I use are based on 16-bit integers, which work much, much better, at least for the work I'm doing. Also, do you have a good citation showing that FP16 neural networks are, overall, more effective than FP32 networks, as you've described?
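Here is one way to see why 16-bit integers can beat FP16. The sketch compares round-trip error between the two formats, assuming values are normalized to [-1, 1) and stored as Q15 fixed point (15 fractional bits). That is an illustrative assumption, not necessarily the integer scheme described above. `to_fp16`, `to_q15`, and `max_err` are hypothetical helpers.

```python
import random
import struct


def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE 754 binary16 value and convert back."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_q15(x: float) -> float:
    """Round x in [-1, 1) to signed 16-bit fixed point with 15 fractional bits (Q15)."""
    q = max(-32768, min(32767, round(x * 32768)))  # clamp to the int16 range
    return q / 32768


def max_err(convert, xs):
    """Worst-case absolute round-trip error of `convert` over the samples."""
    return max(abs(x - convert(x)) for x in xs)


random.seed(0)
large = [random.uniform(0.5, 1.0) for _ in range(10_000)]      # magnitudes near 1
small = [random.uniform(0.001, 0.002) for _ in range(10_000)]  # small magnitudes

for label, xs in [("[0.5, 1.0)", large), ("[0.001, 0.002)", small)]:
    print(f"{label:>15}  fp16: {max_err(to_fp16, xs):.2e}  q15: {max_err(to_q15, xs):.2e}")
```

For magnitudes near 1, Q15's uniform 2^-15 step beats FP16's 2^-11 spacing in [0.5, 1). For very small magnitudes, FP16 wins because its exponent keeps relative precision roughly constant. Which effect matters more depends on how the values in a particular network are distributed.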
In 2005 I was a sysadmin at the IceCube Neutrino Observatory headquarters in Madison, Wisconsin. The biggest project I worked on was porting RS-485 serial drivers from a legacy Unix system to Linux 2.6 and setting up the HP rack servers, which we then shipped down to the pole from New Zealand on a C-130 Hercules. I also built a data visualization system in Python+Django that ran over a 1 km-long DSL network between the drilling site and the South Pole base. I never got to go down there myself (my FTE boss did), but it was a fun project for a student and looks good on the resume and all. Did I mention SSH connections over satellite to Antarctica are pretty slow?
"Catch a wave and you're sitting on top of the world." - The Beach Boys