Comment GNU Virility Thought Experiment (Score 4, Insightful) 54
I'll preface this by saying I don't think LLM creators should be able to use content without permission/license. This is just an interesting discussion.
LLMs generally do not reproduce text. They can be made to do so with specifically crafted prompts, but no current LLM is just going to regurgitate "Free as in Freedom" unless asked to. Instead it uses statistical matching to apply the text to probable contexts, a very crude version of what we do. LLMs are starting to approach the way we meat sacks use books: we take in the information and then apply it to problems. Where do we cross the line? Where do we say anything (or anyone) trained on (i.e., that has read) this material is now required to do its work for free, because the knowledge from that book is part of its training set?
It seems a little preposterous, but that's where this is headed logically. It's a shift from "You can't reproduce this book" to something closer to "You can't use the knowledge in this book except under the conditions we dictate." That's dangerous.