Statistically speaking, the room full of monkeys, given an infinite timeline, might eventually type Shakespeare.
True, and irrelevant.
A LLM-AI simply regurgitates what it found in its city-sized database.
Incorrect.
Incorrect.
intelligence (n)
the ability to acquire and apply knowledge and skills.
it does not make decisions on its own
Demonstrably and idiotically incorrect.
You're arguing that the sky is neon black. What angle are you going for?
it's not working to solve world hunger without any human input of any kind...
This is just completely wrong.
Though I certainly wouldn't test it. Without alignment training, I'd say it's about as likely to end human hunger via genocide.
it's not writing the next great American novel when it's not busy regurgitating 'how to solve long division'.
They're quite capable of writing a novel. Also composing music. Designing a building.
It responds to queries however it's programmed to, that's it.
Incorrect.
It is not programmed to do shit.
It responds to a set of tokens by mapping them to N-dimensional embedding vectors and running those through a network that was randomly initialized, then tuned via gradient descent to produce a certain set of output probabilities. The solutions it comes up with for doing so are bounded only by the standard Turing limits, limits that no human has ever been demonstrated to be free of.
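To make that concrete, here's a toy sketch of the pipeline just described. Everything here is invented for illustration: hypothetical token ids, a random embedding table, and a single linear layer standing in for the full transformer stack. The real thing tunes these weights via gradient descent instead of leaving them random, but the point stands: there are no hand-written rules anywhere in it.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, dim = 50, 16          # real models: ~100k tokens, thousands of dims
token_ids = [7, 42, 3]            # hypothetical ids for an input prompt

# Randomly initialized weights. Training nudges these numbers via
# gradient descent; nobody programs behavior into them by hand.
E = rng.normal(size=(vocab_size, dim))   # embedding table: token id -> vector
W = rng.normal(size=(dim, vocab_size))   # output projection: vector -> logits

context = E[token_ids].mean(axis=0)  # crude stand-in for the transformer stack
logits = context @ W
probs = np.exp(logits - logits.max())
probs /= probs.sum()                 # probability of each possible next token
```

The output is a probability over every token in the vocabulary; sampling from it, appending, and repeating is all "responding to a query" is at this level.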
It's a simulation of a conversation, even when it "hallucinates", that's it.
I really think you might just be an idiot.
The only way you can define "conversation" to make what the LLM does a simulation, and what you do real, is by specifically defining "conversation" in anthropocentric terms- i.e., saying it's only a conversation if a human does it.
That's circular and idiotic. Are you an idiot?
So, being that it's a computer, it's legal for it to cough up entire sections of text from The Stand and not pay royalties to Stephen King?
Is it legal for a set of monkeys to do so?
Right, you don't have a right to perform a copyrighted work, but because it's a computer/cell phone mostly used in the privacy of your home, it's legal for it to spew copyrighted info without ever paying for it.
In short, yes.
Of course, you don't really use it in the privacy of your own home. I can tell from your general level of ignorance on the topic that you aren't the kind of person who can afford to. Sure, you might be able to run a lobotomized model on your 3060 or some shit, but you're just playing.
"It didn't copy material": Your words...
Correct.
"To be clear: Reproducing exact texts is a training failure. It's a mistake." That implies copying...
Incorrect. It means the model's weights were trained only to the point of the first descent, without continuing into double descent, where generalization happens. Verbatim reproduction is memorization, i.e. overfitting, not evidence of a stored copy.
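The single/double descent distinction is easy to see in a toy setting. This sketch uses made-up data and random cosine features, with the minimum-norm least-squares fit that `np.linalg.lstsq` returns; in setups like this, test error typically spikes near the interpolation threshold (feature count roughly equal to training-point count, where the model memorizes the noise) and then falls again as the model keeps growing. That second fall is the double-descent regime.

```python
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test = 40, 200

def make_data(n):
    # noisy samples of a smooth target function (invented for illustration)
    x = rng.uniform(-1, 1, n)
    y = np.sin(2 * np.pi * x) + 0.3 * rng.normal(size=n)
    return x, y

x_tr, y_tr = make_data(n_train)
x_te, y_te = make_data(n_test)

test_errors = {}
for p in [5, 20, 40, 80, 400]:   # model size; p == n_train is the threshold
    freqs = rng.normal(scale=10, size=p)
    phases = rng.uniform(0, 2 * np.pi, size=p)
    phi = lambda x: np.cos(np.outer(x, freqs) + phases)  # random features
    # lstsq returns the minimum-norm solution when p > n_train
    w, *_ = np.linalg.lstsq(phi(x_tr), y_tr, rcond=None)
    test_errors[p] = np.mean((phi(x_te) @ w - y_te) ** 2)

for p, err in sorted(test_errors.items()):
    print(p, round(err, 3))
```

Stop training a setup like this at the memorization peak and you get verbatim regurgitation; push past it and you get generalization. That's the failure mode being described.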
Are you seriously trying to argue that anything that can produce a set of text has copied it?
That not only fails to meet the legal definition, it doesn't even pass fucking logical muster.
if it can reproduce exact texts, that means it has a copy of a book in the database someplace.
That is absurd, and incorrect.
If I can produce the digits of pi, does that mean I have an exact copy of them stored someplace?
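The pi analogy is exact: a short program can emit correct digits indefinitely while storing none of them. A minimal sketch using Gibbons' unbounded spigot algorithm (a standard algorithm, reproduced here from memory, so treat the details as an illustration):

```python
def pi_digits():
    """Yield decimal digits of pi one at a time via Gibbons'
    unbounded spigot algorithm. No table of digits is stored
    anywhere; each digit is derived on the fly from six integers."""
    q, r, t, k, m, x = 1, 0, 1, 1, 3, 3
    while True:
        if 4 * q + r - t < m * t:
            yield m  # this digit is now settled; emit it
            q, r, t, k, m, x = (10 * q, 10 * (r - m * t), t, k,
                                (10 * (3 * q + r)) // t - 10 * m, x)
        else:
            q, r, t, k, m, x = (q * k, (2 * q + r) * x, t * x, k + 1,
                                (q * (7 * k + 2) + r * x) // (t * x), x + 2)

gen = pi_digits()
digits = [next(gen) for _ in range(15)]  # 3, 1, 4, 1, 5, 9, ...
```

The program is a few hundred bytes; its output is unbounded. "Can emit the text" and "has a copy of the text in a database" are simply different claims.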
Same thing with the code for a text box...
Wrong.
it was trained (read: crawled) the web
The problem here is that you don't actually know what the word "trained" means in this context. You're using it the way you understand it, but you have the understanding of a 5-year-old, and it's leading you to misuse it.
most likely including GitHub... it doesn't create new code, it spews what it was trained on.
Demonstrably false.
You're out of your depth.