I've been making analogies trying to get you to understand my point and it's not working. This will be my last attempt.
Paraphrasing your first post:
1. Over 2 billion images got processed into 2 gigabytes of data (I'm just using your numbers here, the specific numbers don't really matter for what I'm trying to convey)
2. That's an average of approximately 1 byte per image
3. Therefore each source image contains 1 byte of creativity
4. Therefore the source images are not deserving of copyright protection
And here is what I'm saying:
#1 and #2 are ok. The numbers don't lie.
#3. No. A bunch of images were processed into a 2 GB dataset, and you can compute that this averages out to 1 byte per source image, but it does not follow that each original image only had one byte of creativity. The original images may only have contributed 1 byte each, on average, to your dataset, but that has everything to do with the algorithm used to create the dataset, and it cannot be used to infer the creativity that went into creating the original images, because the algorithm wasn't designed for that. The logical error is saying that because their contributions average out to 1 byte each, all 2 billion source images were equally non-creative ("less than 1 byte of originality" each!).
#4. No. The false conclusion in #3 leads to another false conclusion here. You say something correct like "It is a human judgement as to how much creative effort was put in" and then erroneously assume that your computation in #3 of 1 byte per source image has anything to do with a human judgement of how creative the source image is. They're not related. Even if someone came up with an algorithm where the input is a source image and the output is a creativity score, that's not how judges evaluate copyright protections in court today.
If your "creative work" contains less than 1 byte of originality, how can that even qualify for copyright protection in the first place?
Because those copyrights are judged by other people, who don't know or care about stable diffusion, or machine learning, or AI, or whatever. They just apply the law the way it's already written. Those images existed before someone created the 2 GB dataset. If any of them are copyrighted, that already happened using a process that has nothing to do with your computations here. I don't know why that's so hard to understand. Maybe the rules will change one day but right now it's the way it is.
Only frequently-repeated motifs, and even then, only to a limited degree
Certainly more than one byte. Try "fox playing cards", it uses hand-drawn elements of foxes and cards there that take a lot more than one byte to represent digitally.
It can barely even reproduce a US flag, and that's everywhere
Irrelevant. The current limitations of the copyright-infringing technology don't make it any less infringing when the source image used is copyrighted. The US flag is in the public domain, but if another source image is used, and it's copyrighted, and the output looks sufficiently similar to the original (that's up to a judge to decide, not up to you), then it might be considered a derivative work, and if you're selling that derivative work for money (via Shutterstock) you are violating the original author's copyright. It doesn't matter how you generated that image.
If it's a copyrighted image, then pretty much by definition it's not going to be a common motif
That depends on whether the people who put together the dataset took care to mark copyrighted images appropriately. Try "charlize theron walking in forest": the images of the woman in the forest look enough like actual photos of Charlize Theron that a professional photographer who took those photos can recognize them and maybe sue for infringement. Your explanation doesn't hold up here.
The average image contributes less than a byte to the weightings.
Irrelevant. You can run an algorithm on someone's work and if that algorithm ends up using only one byte of that work in its intermediate data representation, it doesn't mean there wasn't anything original in the work. It's just how that algorithm works. And you already agreed above that some "common motifs" take more than one byte to represent, and I've already shown that some of those "common motifs" come from copyrighted images. The intermediate data representation used by the algorithm doesn't have any bearing on whether the source image would be considered creative or original or worthy of copyright protection by the people who decide these things.
If your creativity can be summed up in less than 1 byte, then you have offered no relevant creativity.
I agree with you. The values in the range [0, 255] are well-known.
But where we disagree is 1) whether the algorithm being discussed here actually does that, and it's obviously using more than one byte of people's original work or else that work wouldn't be recognizable in the output and there wouldn't be a lawsuit; and 2) whether an algorithm processing a bunch of images has any bearing on the creativity in the original images.
Consider an algorithm that sorts through a bunch of images, computes the overall brightness of each image, and for each source image stores a single bit -- 0 if it's overall a dark image, and 1 if it's overall a light image. Every image gets reduced to one bit, but that says nothing about the creativity in the original image. The algorithm wasn't designed to measure that. An even simpler one processes the images and stores one bit indicating whether the image fits in a 1024x1024 box or not. Again, the one bit stored by the algorithm has no bearing on the creativity in the original image.
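A minimal sketch of the first toy algorithm, for concreteness; the image names and pixel values are made up for illustration, not from any real dataset:

```python
# Toy version of the "brightness bit" algorithm described above.
# Image names and pixel values are made up for illustration.

def brightness_bit(pixels):
    """Return 0 for an overall dark image, 1 for an overall light one."""
    avg = sum(pixels) / len(pixels)  # mean grayscale value in 0..255
    return 1 if avg >= 128 else 0

images = {
    "night_sky.png": [5, 12, 30, 8],         # mostly dark pixels
    "snow_field.png": [240, 250, 230, 255],  # mostly light pixels
}

dataset = {name: brightness_bit(px) for name, px in images.items()}
print(dataset)  # {'night_sky.png': 0, 'snow_field.png': 1}
```

One bit per image comes out the other end, no matter how much creativity went into each source image.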
Yes, these are very simple example algorithms, nothing like stable diffusion. But none of them, including stable diffusion, are designed to measure the creativity in someone's work. You are reading too much into the data used by image generating algorithm. It wasn't designed to measure creativity, it was designed to generate new images based on existing ones. The one byte, or whatever, doesn't mean anything about the creativity in the original images.
If you're arguing that a JPEG which contains less than 1 byte of image data - aka, an 8x8 box of static - violates your copyright,
I'm not saying that. I'm saying if you reduce a JPEG image to one byte, it's a one-way trip. You can't reproduce the original image from that. So if the only contribution of the original image to your dataset is that one byte, it's not likely to be an issue. But if your output looks a lot like the original image, then obviously more than one byte of it got into your dataset.
then you're claiming that your image has no more creativity than an 8x8 box of static.
No, that's your claim, not mine. You're saying the original image only contributed one byte of creativity. I'm saying that if you reduce an image to one byte, and you can't reproduce the original image from that one byte, then that one byte has no bearing on the creativity of the original image. The one byte in your dataset is not a measure of the creativity of the original image.
It is a human judgement as to how much creative effort was put in
Exactly. And a judge doesn't decide that by counting up bytes of originality.
but nobody is going to grant a copyright on a work that could be summed up in less than one character
If someone is suing for copyright infringement, the judge is going to consider the original work and the alleged infringement and apply some criteria, which are somewhat subjective, to decide whether it's actually infringing or not. Looking at a stable diffusion dataset and counting up bytes of originality isn't part of that process. Whatever the algorithm does with its intermediate data is not relevant.
You also can't restore an image from the weightings dataset alone. pixelz.ai has limited free play with some different algorithms, including stable diffusion. The results obviously reuse significant parts of the original images.
Nobody is trying to sue people for hashing images.
Exactly. They are suing because the generated images contain significant bits of copyrighted images for which they didn't give a license.
You mentioned some number as the size of the weightings dataset, and then made a claim about the amount of originality in the sources based on that, and this is wrong.
In addition to the hash analogy I made, you can think about whether reducing the quality in a JPEG and being able to save a version of the image in a much smaller size (as small as you can) means there is less originality in the source image. It doesn't mean that. You can't go the other way. Something was lost, obviously.
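As a toy illustration of that one-way loss, here is a sketch that quantizes 8-bit pixel values down to 2 bits and then tries to reconstruct them. The numbers and function names are made up, and real JPEG compression is far more involved, but the point is the same: the small version cannot give you back the original.

```python
# Toy lossy reduction: quantize 8-bit pixel values down to 4 levels (2 bits),
# then attempt to reconstruct. Values and names are illustrative only.

def quantize(pixels, levels=4):
    step = 256 // levels
    return [p // step for p in pixels]        # much smaller representation

def dequantize(quantized, levels=4):
    step = 256 // levels
    return [q * step + step // 2 for q in quantized]  # best-effort guess

original = [7, 100, 130, 250]
small = quantize(original)
restored = dequantize(small)
print(small)     # [0, 1, 2, 3]
print(restored)  # [32, 96, 160, 224] -- in the right ballpark, but the
                 # original values are gone for good
```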
I'll try to answer your question again:
If your "creative work" contains less than 1 byte of originality, how can that even qualify for copyright protection in the first place?
Copyright protection was invented long before computers, academic work on information theory, and the movie Office Space were invented/written/produced. Therefore "bytes of originality" and "pieces of flair" are not requirements to qualify for a copyright.
If someone were to propose an amendment to the law in that direction, how will we set a minimum? Is 37 enough? Does it need to be compared to the sum of all work produced so far, even work you're not aware of, and get a high enough score to qualify? Will it be a moving target and we'll need another AI to tell us for each work if it qualifies for copyright protection?
less than 1 byte of originality
I can hash the same image data into a single value of 16, 20, 32, 48, or 64 bytes depending on which algorithm I use. Does that mean the image contains 16, 20, 32, 48, or 64 bytes of originality, depending on the algorithm? No, it doesn't. You're mixing up what the algorithm does with what humans value, but they are not the same.
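For example, with Python's standard hashlib, the digest size depends only on the algorithm chosen, not on anything about the input:

```python
import hashlib

# The same input data, hashed with different algorithms, yields
# digests of 16, 20, 32, 48, and 64 bytes respectively.
data = b"the exact same image bytes every time"

for algo in ("md5", "sha1", "sha256", "sha384", "sha512"):
    digest = hashlib.new(algo, data).digest()
    print(algo, len(digest))
```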
Applications that access restricted user data (encryption or access control enforced by the platform or remote system) have these two options:
Ask for the password (or other credential) every time. This might be practical for a command-line tool that doesn't keep any state between invocations, but it's not practical for desktop or web applications where the user interacts with them across more than one action. Asking the user to provide a credential on every action is annoying, and it trains the user to mindlessly enter or authorize their credential at every prompt, which is a bad habit that can itself contribute to security incidents.
Ask for a password (or other credential) once and then exchange it for a limited-time token to be used for subsequent requests. This generally solves the usability issue (depending on how it's implemented and configured) but introduces new issues that must be considered, such as: how long is the token valid, can it be renewed (and if so, what is the maximum duration of access), can it be revoked, should there be an idle timer that automatically revokes access, and is it valid only from the client IP address that initially requested it or from any IP address?
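A minimal sketch of that credential-for-token exchange, assuming an in-memory store; the names here (issue_token, validate, TOKEN_TTL, the check_password stub) are illustrative, not any real API, and a real system would persist tokens, handle refresh, and so on:

```python
import secrets
import time

# Hypothetical in-memory token exchange. All names are illustrative.

TOKEN_TTL = 15 * 60      # seconds a token stays valid
_tokens = {}             # token -> (user, expiry timestamp)

def check_password(user, password):
    # Stand-in for the platform's real credential check.
    return (user, password) == ("alice", "s3cret")

def issue_token(user, password):
    """Exchange a credential once for a limited-time token."""
    if not check_password(user, password):
        raise PermissionError("bad credentials")
    token = secrets.token_urlsafe(32)
    _tokens[token] = (user, time.time() + TOKEN_TTL)
    return token

def validate(token):
    """Return the user for a live token, or None if unknown or expired."""
    entry = _tokens.get(token)
    if entry is None or entry[1] < time.time():
        _tokens.pop(token, None)  # expired or unknown: drop it
        return None
    return entry[0]
```

Every knob mentioned above (validity period, renewal, revocation, IP binding) would hang off this basic shape, and each is a configuration decision, not a flaw in the concept.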
Each application may have different needs, so the default behavior, and what users are allowed to customize and to what extent, can all vary from application to application. Using a computer with full disk encryption, signing in to a website and getting a session cookie, using OAuth, or SSO to the employee portal on a company network -- these are all different.
If your complaint is that a refresh token in some or most systems allows clients to maintain access "forever and a day", that's a complaint about the configuration of those systems and not about the concept of access tokens and refresh tokens as a whole. It's like saying all passwords are bad because some applications use or store them insecurely, or mobile phones are bad because you don't have reception everywhere, or that employment is bad because some employers don't have good benefits.
can be anonymously transacted
You can be anonymous with cash, gloves, and other precautions.
Perfect anonymity is hard. People have been identified who used cash, and also in cryptocurrency if someone is able to link your public key to your identity, your past transactions may also be exposed via the blockchain. That's also happened.
virtually immune from inflation
This was never true, it was a bogus claim from the beginning. Inflation is a general increase in prices and fall in the purchasing value of money.
Inflation can happen in an economy when important resources become scarce -- whatever currency is being used will become inflated as people charge more for those scarce resources and that causes a cascade of other prices increasing. When this happens it doesn't matter if your currency is based on a metal, or a government IOU, or cryptography and consensus protocols.
As you mentioned, any currency is only as valuable as participants in a transaction agree, so when the value of Bitcoin falls from $60,000 to $20,000 that's a general 3x increase in prices for anyone using Bitcoin. You can compare that to the dollar's fluctuations, or to what happened in Zimbabwe, but the fact is that Bitcoin and other cryptocurrency values do fluctuate and this happens because people decide the value.
Your "mathematical certainty" that only a certain amount of Bitcoin will be minted is also subject to the people involved deciding to change the rules. Changing that number is just a code change, and if enough people agree to it they redefine Bitcoin, and if not enough people agree they can fork it and define a new Bitcoin-2 with whatever rule changes they want.
a currency that another buyer in the marketplace can print at will for free
The government is not just "another buyer" in a dollar marketplace. When the government prints more dollars, and especially when it prints substantially too many, there is definitely a price, and the consequences are an ongoing concern here.
Ideally the government would act responsibly but the government is made of people, and those people come and go, and some are more responsible than others, so it's not a certainty and people have to keep watch.
The vast majority of people using Bitcoin wouldn't be able to use it independently of the core network of people driving it -- developers, operators, and influencers. So the role of government is replaced by a different network of people that everyone else has to trust.
Imagine being so stupid as to advocate for... the dollar
When there's something better, I'll advocate for it. Cryptocurrency, at least in its current forms, isn't better.
1) not invented here, 2) not enough XML involved in that spec, 3) it's nearly two decades old so there must be a better and more modern way of setting a bit by now, 4) it's strictly binary and this time around we want to make it more inclusive, 5) junior developer we recently hired recommended doing this with React instead of vanilla TCP/IP, 6) bits can be flipped so it should be securely recorded on the blockchain instead plus we could create an entire NFT economy from this to replace that archaic insurance policy, 7) it will be better with AI this time, 8) that old RFC is merely an April fool's joke, but now we're serious!!
Isn't that information that just goes against the government narrative?
Misinformation is information that is false or inaccurate. It's not defined by who is disseminating it. There are people who claim anything opposing their own agenda is misinformation... And that's also misinformation. You can find some of these people in government, and you can also find them outside of government.
You have to apply critical thinking if you want to defend yourself against misinformation. If you think it's only government, or that it's never government, you've already been manipulated. Wake up and recheck your beliefs. Or don't.
#2 should be rephrased as people who have too many distractions at home. Hating your family can certainly be a distraction but there are many others. Noises and interruptions from spouse, kids, pets, nearby construction work, loud music at neighbors, people ringing the doorbell, sirens from emergency vehicles, etc.
Why do you say nobody cares? Haven't you read any articles about it? At least all those authors care, plus many of their readers. Maybe that's still a minority of people compared to the general public, but it's certain that some people care.
That said, you're right that winning such a lawsuit cannot fix all this person's problems that were mentioned in the summary, but maybe he can get some compensation and maybe more people will care.
I don't know much about it, but it seems to me that 1) water level being lower would expose more of the stone, and it's possible some of the commemoration of the worst droughts would be etched into the lowest part of the stones that were exposed at that time, but underwater during lesser droughts; and 2) geologists can get more information about past water levels from the exposed ground, and it's not just about whether the stones were visible, because there can still be a significant range of "how bad was it".
You're saying they have no evidence but the article was light on details, and they probably have lots of evidence that wasn't mentioned, that they used in their analysis. Maybe there is or will be a paper published with all of it for people to review. But let's for a moment pretend there isn't any evidence about how bad the drought was in past years. In that case, on what do you base your assertion that it's about as bad as a "once-in-50-years" drought?
Staff meeting in the conference room in %d minutes.