Comment Re:At least they are consistent (Score 1) 48
If I ask it for the lyrics to "Silver Springs" and it spits them out, that means it got the lyrics from one of the million-plus websites it crawled, or did audio recognition on a copy of the song posted somewhere; neither source is likely licensed by Warner Brothers or whoever... and I'm sure that WB didn't license OpenAI for this.
Nor did they need to.
AI training has already been ruled fair use, which is logical: you can't reasonably argue that the resulting network isn't about as transformative as a work can get.
In the end, it boils down to this: LLM-AI is quite a bit like the predictive text on your cell phone when you text someone (I type "I was going to say", and it offers some possibilities for the next word)... the LLM-AI is just doing that on a much larger scale, referencing not just the last few texts I sent but entire books, encyclopedias, and whatever else you want to name.
No.
Text prediction is usually done with a simple Markov chain.
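For what it's worth, a phone-style Markov predictor really is just next-word frequency counts. Here's a toy sketch (the corpus and function names are mine, not anyone's actual keyboard code):

```python
from collections import Counter, defaultdict

def build_markov(corpus_words):
    """First-order Markov chain: count which word follows which."""
    chain = defaultdict(Counter)
    for prev, nxt in zip(corpus_words, corpus_words[1:]):
        chain[prev][nxt] += 1
    return chain

def suggest(chain, word, k=3):
    """Offer the k most frequent next words, like phone predictive text."""
    return [w for w, _ in chain[word].most_common(k)]

corpus = "i was going to say that i was going to call you later".split()
chain = build_markov(corpus)
print(suggest(chain, "going"))  # words observed after "going" in the corpus
```

That lookup-table behavior is the whole trick; there's no learned representation anywhere, which is exactly why it isn't a fair analogy for an LLM.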
The LLM references precisely nothing when coming up with your answer.
Your input tokens trace a path through the latent space that leads to the progressive generation of the output tokens.
You could literally never hope to extract any sequence of tokens from that network without providing the correct input tokens and running the network.
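The "path through the network" point can be shown with a toy autoregressive loop. The model below is a stand-in arithmetic function, not a real network, but the structure is the same: each new token is computed from the entire sequence so far, so without the right input tokens there is no sequence to "extract":

```python
def toy_model(context):
    """Stand-in for a trained network: deterministically maps a context
    to the next token id. A real LLM computes this with learned weights;
    nothing is looked up in a stored database of text."""
    return sum(context) * 31 % 100

def generate(prompt_tokens, n_new):
    """Autoregressive generation: every new token depends on the whole
    path taken so far, prompt included."""
    tokens = list(prompt_tokens)
    for _ in range(n_new):
        tokens.append(toy_model(tokens))
    return tokens

# A different prompt takes a different path and yields different tokens:
print(generate([1, 2, 3], 4))
print(generate([4, 5, 6], 4))
```

Change one input token and the entire continuation changes, which is why the output only exists as a function of the input, not as a retrievable record.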
Yeah, someone is gonna say there are hundreds of layers to it... it all boils down to predictive text. It's not intelligent; it's a big database of everything they could steal from every website out there that gets referenced for each user.
Wrong.
You can keep repeating yourself, but you'll never not be wrong.