Reading copyrighted material is part of learning (Score 1)
We go to school and read copyrighted material. We are tested on how well we can remember the copyrighted material. We are given essay assignments where we are to analyze and draw conclusions about the copyrighted material. Learning simply involves ingesting copyrighted material and making sense of it. There's no getting around it. We all did it in grade school.
Copyright holders do not seem to be lining up to sue doctors who have memorized copyrighted material. The only reason they would even entertain the idea of suing AI companies is the perception that their material singularly caused the model to make money. In reality, we don't know how the model learns. When and why does it get good at math? When does it get good at translation? What causes it to understand logic problems? If we knew exactly how it worked, we could probably get the same effect with synthetic data. For now, we should treat the ingestion of copyrighted material just as we do for students. It's part of an education in how the world is.