Comment Re:Seriously though... (Score 1) 24
If you train a system on protected content in a way that makes it possible to produce a duplicate of that content, or an identifiable, non-trivial portion of it, under any conditions, your system is a derived work of the input. You cannot use it to create additional derived works of the input without permission of the copyright holder unless whatever you are doing qualifies as fair use, which seems relatively unlikely if your system can replicate its inputs wholesale and those inputs have commercial value as such.
Systems like that which process images, audio, or video in particular tend to look like a walking, talking copyright violation of the first order. Unless you basically want Congress or whoever to eliminate copyright protection as we know it, someone had better figure out a way for such systems to identify what is uniquely protectable about everything they scan (brute facts probably are not, for example) and either not scan it or quit parroting anything that is. The same rules that any human author, creator, or publisher is required to abide by, in other words. Just because you consume enough electricity to boil the oceans does not give you an exemption from federal law.