Anthropic wouldn't have gotten into any trouble at all with this if they hadn't torrented a metric shit ton of books. If they had simply purchased a secondhand copy of each one, they would have been in the clear training their models on those.
Even though I have a subscription, I cannot stand having to sit through ads anymore.
Should and will are two entirely different things. If people have to take a drug to actually lose weight, let them take it!
So right now the Doomsday Clock is closer to midnight than it was during the Cuban Missile Crisis (when it was at 7 minutes to midnight)?
Excuse me if I don't take it very seriously.
>They make copies in order to do training.
Yes, but making copies in order to compile information about a protected work was already litigated and won in the Authors Guild v. Google case.
"The court's summary of its opinion is:
In sum, we conclude that:
Google's unauthorized digitizing of copyright-protected works, creation of a search functionality, and display of snippets from those works are non-infringing fair uses. The purpose of the copying is highly transformative, the public display of text is limited, and the revelations do not provide a significant market substitute for the protected aspects of the originals. Google's commercial nature and profit motivation do not justify denial of fair use.
Google's provision of digitized copies to the libraries that supplied the books, on the understanding that the libraries will use the copies in a manner consistent with the copyright law, also does not constitute infringement."
So at what point could the copyright infringement even happen? Is it in training on the data and adjusting the weights? Is it the fact that those weights exist in a state that can somewhat reproduce the data they were trained on? Or is it when we prompt the model to reproduce that data? My bet is that, if anything, the courts will come down against the last one. Training on the data is already sufficiently transformative as to not infringe copyright. It's only when the model actually spits out something substantially similar to an existing work that copyright is violated.
Incidental copying is allowed under copyright. If no copy of the source exists in the LLM, there is no copyright infringement.
>Without a license, purely for developing software, and en masse, yes.
You don't need a license to compile data about an article, and that's all that an LLM is: data about the frequency of words and their order. It's highly transformative, and this sort of thing was already fought and won by Google when it digitized and made searchable millions of books without their copyright holders' permission and for commercial purposes. An LLM doesn't even retain the original work.
https://en.wikipedia.org/wiki/....
"The court's summary of its opinion is:
In sum, we conclude that:
1. Google's unauthorized digitizing of copyright-protected works, creation of a search functionality, and display of snippets from those works are non-infringing fair uses. The purpose of the copying is highly transformative, the public display of text is limited, and the revelations do not provide a significant market substitute for the protected aspects of the originals. Google's commercial nature and profit motivation do not justify denial of fair use.
2. Google's provision of digitized copies to the libraries that supplied the books, on the understanding that the libraries will use the copies in a manner consistent with the copyright law, also does not constitute infringement.
Nor, on this record, is Google a contributory infringer."
I think you're looking at the post id and not the user id. I did the same thing at first.
Check the edits on the page just from today. Lol.
"Facts are stupid things." -- President Ronald Reagan (a blooper from his speech at the '88 GOP convention)