Your implication here is that an imperfect or lossy copy isn't a
copy. I'd have to disagree with that.
I wasn't implying that. I was just saying that, based on information
theory, these LLMs can't store a perfect copy of all the training data.
That's it. The compression can't be reversed to recover the original data.
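To put rough numbers on it (these are made-up round figures chosen only to show the scale, not measurements of any particular model):

```python
# Back-of-envelope capacity check with assumed round numbers:
# a 7B-parameter model stored at 16 bits per parameter, trained
# on ~1 trillion tokens at roughly 4 bytes of raw text per token.
params = 7e9                  # parameters (assumed)
bits_per_param = 16           # fp16/bf16 weights
model_bits = params * bits_per_param

tokens = 1e12                 # training tokens (assumed)
bytes_per_token = 4           # rough average for English text
data_bits = tokens * bytes_per_token * 8

print(f"model capacity: {model_bits:.1e} bits")          # ~1.1e+11
print(f"training data:  {data_bits:.1e} bits")           # ~3.2e+13
print(f"data/capacity:  {data_bits / model_bits:.0f}x")  # ~286x
```

Under those assumptions the corpus would have to compress nearly 300x to fit verbatim, far beyond what lossless compression achieves on text.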
A hard disk may theoretically store only X TB of information, and X*Y
TB with compression, but in practice, using side-channel methods, we
can recover double or even triple that by contrasting the analog
values and finding interference patterns left behind by previously
stored data.
I don't see the point of this. Technically there's close to infinite
"information" on the drive just based on the arrangement of its atoms;
it's just that we didn't write it, so we don't know how to access it,
or what to make of it if we could.
But aside from that, we have lossy copies all over the place in life
and in computing.
So you are saying the LLM has a lossy copy of the data, where "lossy"
just means that some of the data can be recovered. Well, I guess I can't
disagree. The point of the research was to get the LLM to regurgitate
some of its training data; therefore, some of the data is effectively
stored. That's not surprising.
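The probe in that kind of extraction work is roughly this; a minimal sketch using Hugging Face transformers with greedy decoding (the model name, prefix, and expected continuation here are placeholders, not the actual research setup):

```python
# Sketch of a training-data extraction probe: prompt with a prefix
# known to appear in the training set, decode greedily, and check
# whether the continuation reproduces the known source text.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in for whichever model is being probed
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prefix = "We the People of the United States, in Order to"
inputs = tok(prefix, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=20, do_sample=False)
continuation = tok.decode(out[0][inputs["input_ids"].shape[1]:])

expected = " form a more perfect Union"  # known continuation
print("regurgitated" if continuation.startswith(expected) else continuation)
```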
You are giving a circular definition by defining an average as giving
an average. An average is meant to record the significant data in the
values: it is an abbreviated summary of that information, in other
words, a lossy memorization.
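A mean makes the point concretely (a trivial sketch):

```python
# A mean is a lossy one-number record of a dataset: some questions
# can still be answered from it, but the originals cannot be
# recovered, because infinitely many datasets share the same mean.
values = [3, 7, 7, 9, 14]          # original data: five numbers
avg = sum(values) / len(values)    # stored summary: one number

print(avg)                # 8.0  -> the "significant" information kept
print(avg * len(values))  # 40.0 -> the exact total is still recoverable

# Not recoverable: the individual values.
# [8, 8, 8, 8, 8] and [0, 0, 0, 0, 40] both average to 8.0.
```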
No, I'm using the word "average" in two different ways. You could
call it equivocation, but it's not really meant to confuse the issue,
and I don't think it confused you. I guess you're upset that I hijacked
your example, but it was kind of silly. Your playing semantic games is
also a bit silly.
Not really; they can only generate recombinations of their training data.
I guess you could say that, but it's pretty meaningless. Their
training data contains almost every word in existence, and they are
returning a combination of words...
Honestly, it sounds like you have never used one of the
sophisticated LLMs in a significant way. Ask it to do something silly.
I just asked GPT-3.5 to pretend it was from Shakespeare's time and
tell me how to clean my tankless water heater. That's probably the
first time that output text has ever existed.
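Anyone can rerun that test; a sketch with the openai Python client (assumes the v1+ openai package and an OPENAI_API_KEY in the environment):

```python
# Re-running the "Shakespeare explains tankless water heater
# maintenance" prompt. Requires the openai package (v1+) and an
# OPENAI_API_KEY environment variable.
from openai import OpenAI

client = OpenAI()
resp = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system",
         "content": "Pretend you are a writer from Shakespeare's time."},
        {"role": "user",
         "content": "Tell me how to clean my tankless water heater."},
    ],
)
print(resp.choices[0].message.content)
```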