Researchers Warn of 'Model Collapse' As AI Trains On AI-Generated Content (venturebeat.com)
schwit1 shares a report from VentureBeat: [A]s those following the burgeoning industry and its underlying research know, the data used to train the large language models (LLMs) and other transformer models underpinning products such as ChatGPT, Stable Diffusion and Midjourney comes initially from human sources -- books, articles, photographs and so on -- that were created without the help of artificial intelligence. Now, as more people use AI to produce and publish content, an obvious question arises: What happens as AI-generated content proliferates around the internet, and AI models begin to train on it, instead of on primarily human-generated content?
A group of researchers from the UK and Canada have looked into this very problem and recently published a paper on their work in the open access journal arXiv. What they found is worrisome for current generative AI technology and its future: "We find that use of model-generated content in training causes irreversible defects in the resulting models." Specifically looking at probability distributions for text-to-text and image-to-image AI generative models, the researchers concluded that "learning from data produced by other models causes model collapse -- a degenerative process whereby, over time, models forget the true underlying data distribution ... this process is inevitable, even for cases with almost ideal conditions for long-term learning."
"Over time, mistakes in generated data compound and ultimately force models that learn from generated data to misperceive reality even further," wrote one of the paper's leading authors, Ilia Shumailov, in an email to VentureBeat. "We were surprised to observe how quickly model collapse happens: Models can rapidly forget most of the original data from which they initially learned." In other words: as an AI training model is exposed to more AI-generated data, it performs worse over time, producing more errors in the responses and content it generates, and producing far less non-erroneous variety in its responses. As another of the paper's authors, Ross Anderson, professor of security engineering at Cambridge University and the University of Edinburgh, wrote in a blog post discussing the paper: "Just as we've strewn the oceans with plastic trash and filled the atmosphere with carbon dioxide, so we're about to fill the Internet with blah. This will make it harder to train newer models by scraping the web, giving an advantage to firms which already did that, or which control access to human interfaces at scale. Indeed, we already see AI startups hammering the Internet Archive for training data." schwit1 writes: "Garbage in, garbage out -- and if this paper is correct, generative AI is turning into the self-licking ice cream cone of garbage generation."
Nice to have it confirmed (Score:4, Insightful)
This is an obvious enough problem that people have been predicting it for a long time. It's still critical to have what was just a hunch confirmed by a serious study. The worrisome part is that AI generated stuff is rarely marked as such, so it will be very difficult for corpus curators to filter it out. It will be crucial to maintain an old, uncontaminated corpus.
Re: (Score:3)
It will be crucial to maintain an old, uncontaminated corpus.
Perhaps, but it will become dated quickly. In 2030, resetting to 2023 even to remove the discussed artifacts will be of limited utility.
Re: (Score:2)
But would it? Are we not already seeing this with ordinary human intelligence?
Sure, there are areas in the hard sciences where progress is being made; however, in the space of general knowledge and culture, are we actually better off than we were before the commercial internet?
I think we have a ton of people whose intelligence has been trained on total crap they repeat on reddit, that other people feed on and repeat... In the span of three decades we have arrived at a point where 1/3 of the general public can't
Re: (Score:2)
"1/3 of the general public can't or won't say which sex in their species gives birth"
Care to back that up with an actual random poll or other research, or is this just an excellent example of somebody repeating crap they saw on the internet?
Hmmm. Wondering about the details (Score:2)
Presumably, each individual LLM instance could be designed to recognize and filter out its OWN output from its training data. But it would still imbibe the output of different LLMs.
Then the premise that the information will get degenerate seems to assume that the percentage of hallucinated misinformation in the AI outputs is higher than the percentage of misinformation in a genera
Re: (Score:2)
If multiple independently trained LLMs learn from each others' output, wouldn't that initially maybe strengthen the knowledge bases?
No. For now, there is nothing substantial to tell an LLM that what it's saying is wrong. All it does is collect information and spit it out when asked. It doesn't associate ideas and concepts like you or I do.
As someone further up said, copy a photograph, then copy the copy, and so on. The system will degrade the more it is used. The same here. Since, as mentioned abov
Re: (Score:2)
With statistics from enough examples, the semantic s
Re: (Score:2)
Imagine they all have the same code; then being fed your own output as input would yield a perfect fit, meaning you wouldn't alter the model a single bit. So I would imagine that as time goes on, the model just converges to something. With some noise, to account for new information.
Now, they're not all the same. But they're probably similar enough that it doesn't matter.
Re: (Score:3)
It's the input (i.e. training) data set (and the order it is encountered, to some extent) which gives each LLM model instance its unique character.
By different LLMs, I'm not even talking about whether the bog-simple neural net training and traversal algorithms are the same or slightly different. I'm talking about LLMs that have been trained on different (non identical corpuses) input data and/or different sequenc
Re: (Score:2)
If multiple independently trained LLMs learn from each others' output, wouldn't that initially maybe strengthen the knowledge bases?
The content produced is going to reflect the information encoded. That's true. That's the whole point. The problem is that content will also contain error. Remember that what is encoded by any model is just some of the information from the training data. (The model is necessarily imperfect.) In the absolute best case, which is extremely unlikely, a model trained exclusively on the output from another would be an equivalent model, error and all.
Then the premise that the information will get degenerate seems to assume that the percentage of hallucinated misinformation in the AI outputs is higher than the percentage of misinformation in a general corpus of human discourse. Which I would strongly guess is false.
You've misunderstood the problem. LLMs do not encode facts
Re: (Score:2)
So what about the case of multiple models? Absolutely nothing changes. Imagine training three models on different data sets and one model on all three.
Indeed: multiple models are in fact equivalent to one, larger model.
Re: (Score:2)
Neural net learning turns large numbers of examples which contain similarities (
Re: (Score:3)
you may be missing that it generally encodes (more heavily weights) the repeated patterns (i.e. arguably the important semantics) in the input (in the music case, the weighting distribution in the neural net comes to represent the "common musicality" of it
What you're calling "common" will drift. When you train a model, you only capture some of the information in the training data. What your model generates will also not be perfectly representative. You will lose information and you will introduce error with each incestuous generation. The model is guaranteed to degrade.
You don't have to take my word for it. Do the experiment I've suggested and see for yourself. It really won't take long. You could also try reading the paper [arxiv.org]. The results of both, as i
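For anyone who doesn't want to dig, here is a toy version of that kind of experiment (my own sketch, not the paper's setup): each "generation" fits a simple Gaussian only to samples drawn from the previous generation's fit, and the fit drifts away from the original data within a handful of rounds.

    # Toy sketch of model collapse: each generation is fit only to samples
    # produced by the previous generation, never to the original data.
    # A 1-D Gaussian stands in for the "model"; this illustrates the general
    # effect, not the paper's actual experiment.
    import numpy as np

    rng = np.random.default_rng(0)

    # Generation 0: "human" data from the true distribution N(0, 1)
    data = rng.normal(loc=0.0, scale=1.0, size=1000)
    mu, sigma = data.mean(), data.std()

    for gen in range(1, 11):
        # Train the next model only on a finite synthetic sample from the
        # current one, so estimation error compounds every round.
        synthetic = rng.normal(loc=mu, scale=sigma, size=200)
        mu, sigma = synthetic.mean(), synthetic.std()
        print(f"gen {gen:2d}: mu={mu:+.3f} sigma={sigma:.3f}")

    # The mean wanders and sigma tends to shrink: the tails of the original
    # distribution get forgotten, which is the "model collapse" pattern.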
Re: Nice to have it confirmed (Score:2)
"confirmed by a ~~serious~~ noise study"
Can I fix that for you?
Re: (Score:2)
The resultant effect will be much like training humans on facebook.
The infinite photocopy problem (Score:5, Interesting)
Unless the original is perfect, every generation introduces new flaws on top of old ones.
Nature takes care of this with feedback mechanisms - random variation is curated by natural selection. Novel AI models need a selection mechanism outside of their base training model, to prune the bad results.
Of course, this is a problem because the whole point is to get the AI to do something so a human doesn't have to... but having a human tuning an AI model as it is formed results in an AI you can copy and use forever, so it's not really that big a barrier.
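A rough sketch of what such a selection step could look like (the "critic" below is a toy stand-in for human review, unit tests or a reward model): generate candidates, score them with something outside the generator, and only let the survivors back into the training pool.

    # Sketch of selection outside the generator: candidates are scored by a
    # separate critic and only the best survive for further training.
    # Both the generator and the critic here are toy stand-ins.
    import random

    random.seed(0)

    def generate_candidates(n):
        # Stand-in for model output: noisy guesses at a "true" value of 0.
        return [random.gauss(0.0, 1.0) for _ in range(n)]

    def critic_score(sample):
        # Stand-in for an external judge (human rating, tests, reward model).
        return -abs(sample)

    candidates = generate_candidates(100)
    survivors = sorted(candidates, key=critic_score, reverse=True)[:10]  # keep top 10%
    print("kept for retraining:", [round(s, 2) for s in survivors])
    # Only the survivors are fed back into training, so bad outputs are
    # pruned instead of compounding from one generation to the next.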
Re: The infinite photocopy problem (Score:3)
Before AI, was there no garbage on the internet?
Re: (Score:2)
And that was MY mistake. I meant "unless the copying is perfect".
Too funny (Score:4, Funny)
I'm not sure what this is called in computer terms, but in human terms this is what's known as a circle jerk.
Re: (Score:2)
Or "Groupthink".
Re: in human terms (Score:2)
What's the difference between circle jerk, echo chamber, and group think?
Linguistic incest (Score:5, Insightful)
I'm seeing a parallel here between biological inbreeding and its AI equivalent. Lower IQ and mental deficiencies often result from inbreeding, and it seems something analogous may happen with LLM's.
It would be cool if at some point this "incestuousness" also compromised the hardware that the LLM's are running on - kind of an "art imitates life" phenomenon. It won't happen, but it's fun to speculate.
Re: (Score:2)
There could be other parallels, like prion disease, but they only sound similar, like your incest analogy. Probably says more about you ;)
I'd say it's more like a steady diet of Fox News, quite literally. It produces SuperKendall models, forcing out everything that doesn't support the world view.
Re:Linguistic incest (Score:5, Insightful)
I'm seeing a parallel here between biological inbreeding and its AI equivalent.
I see an even more direct parallel between human dialogue and learning and AI model training. We increasingly see people trapping themselves in media bubbles and social media echo chambers where there is little or no correction from objective reality, and thinking based on distorted input produces even more distorted mental models which generate more distorted output... repeat ad absurdum.
The proliferation of media options and the ability of individuals to isolate themselves online in a group (albeit a group often containing millions of co-believers) has made this possible in many ways that it wasn't previously -- though obviously it did happen before in walled-off communities with little outside interaction, and has always and probably will always happen in some ways, even in the best of circumstances.
"Model collapse" seems like a good description of what happens to the brains of, say, cult members. Or Q-anoners, or flat earthers, or 9/11 truthers, or Twitter users (kidding... sort of).
Re: (Score:2)
Human dialogue is a great example. "I could care less" is a shining example of a stupid statement that should have been corrected, yet through lack of correction it became part of education itself. Now when you are surrounded by people who say "I could care less" you start to think it's correct and start using it yourself.
Re: (Score:2)
or Democrat party members
Re: (Score:2)
"Or Q-anoners, or flat earthers, or 9/11 truthers, or Twitter users (kidding... sort of)."
Oh, I see... you're not at all a part of "a group (albeit a group often containing millions of co-believers)".
Much better to have one narrative, controlled by the right people (my side) for all. Right, comrade?
Am I the only one (Score:2)
who looked at the headline and thought it was to do with AI controlling Model Trains?
(Like HO scale)
Well no shit (Score:2)
Don't these things all have a random number generator in series with their output?
Anyone old enough to remember the days before room cancellation filters when the microphone got a little too close to the speaker? Same idea.
Re: (Score:2)
"Don't these things all have a random number generator in series with their output?"
I seem to remember that RAH thought that random numbers were important to AI
(Mike, Gay Deceiver, Minerva/Athena)
Re: (Score:2)
Not really, but often they amplify input noise. Feedback loops are a bitch.
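A toy numerical version of the microphone-and-speaker point (the numbers are made up): once the loop gain hits 1, the loop amplifies its own noise instead of damping it.

    # Toy feedback loop: the output is fed back into the input with some
    # gain, like a microphone picking up its own speaker.
    def run_loop(gain, steps=20, noise=0.01):
        y = 0.0
        for _ in range(steps):
            y = noise + gain * y  # new output = fresh noise + amplified old output
        return y

    for gain in (0.5, 1.0, 1.5):
        print(f"gain={gain}: output after 20 steps = {run_loop(gain):.4f}")
    # gain < 1 settles near a small value; gain >= 1 keeps growing.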
I always viewed these things as (Score:5, Insightful)
Useful? Yes. But, fundamentally, it’s basically an automated way of treading old ground. For the time being, humans still need to expand the boundaries of knowledge. It’s gonna decimate the jobs that involve writing short amounts of text about already-done-stuff. In other words, a LOT of jobs.
Re: (Score:2)
That is pretty accurate. These things cannot create anything original. Sometimes averages can _look_ original (see stable diffusion), but then most art is already derivative, so it is not that visible what the machine actually does.
This means that these systems cannot generate new ideas. They can help with data collection in some cases and that is very useful. They will likely be able to do simple, no-insight white-collar jobs in the near future with good accuracy and that is indeed threatening a lot of job
Re: I always viewed these things as (Score:2)
Can they combine concepts in ways no one thought of?
Re: (Score:2)
By pure random chance, yes. But it cannot filter that for merit. So, say, every 1'000'000 hallucinations you may actually get some combination that is valid and has some merit. But somebody has to recognize and filter that, the "AI" cannot do it. This process does not work in practice.
Re: I always viewed these things as (Score:2)
Why can't an AI test its own insight? Do LLMs enable us to teach AIs to evaluate for themselves?
Re: (Score:2)
ChatAI has no insight. LLMs are purely statistical. It cannot be taught specific behaviors based on insight.
Re: (Score:2)
Why can't an AI test its own insight?
The lack of an appropriate mechanism.
Do LLMs enable us to teach AIs to evaluate for themselves?
Obviously not.
sounds like an opportunity (Score:2)
An approach to training that provides immunity to such degradation would be both conceivable and incredibly valuable. A nice patent opportunity.
Re: (Score:3)
It is Math. As in "even you exceptionally greedy scum cannot patent it".
No surprise (Score:5, Insightful)
ChatAI already messes it up frequently when trained on real data. Hence training it on data from ChatAI just amplifies the nonsense and reduces the depth of "understanding" it has even further.
Re: No surprise (Score:2)
How accurate is real data? Was misinformation a thing long before large language models?
Re: (Score:2)
How accurate is real data? Was misinformation a thing long before large language models?
Hello computer. Please tell me which came first LLM or Fox News.
Re: (Score:2)
This is about a different thing. Overall, ChatAI is always _less_ accurate than its training data, as it combines things without understanding or cross-checks. It also creates additional inaccuracies by combining things that cannot be combined. These effects add inaccuracies. Hence iterating the process corrupts more and more of the training data produced and used in every step.
Re: No surprise (Score:2)
If you give it the right prompts, can you learn a combination of concepts you never thought of before?
Re: (Score:2)
Irrelevant for the discussion at hand. And no. Because you cannot find the right prompts except by pure random chance.
Ooooh... do I detect a whiff of... (Score:3)
blockchain in the air?
Re: (Score:2)
If by "Blockchain" you mean the use of signed timestamps on the data to prove its existence before a particular date...
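Something like this minimal sketch, say (HMAC with a shared secret stands in here for a real digital signature or a trusted timestamping service, and the key handling is deliberately naive): hash the content, attach the date, and sign the pair so "this existed by then" can be checked later.

    # Sketch of a timestamp attestation for a piece of training data.
    # HMAC with a shared secret stands in for a proper signature scheme or
    # timestamping authority; this is an illustration, not a design.
    import hashlib, hmac, json, time

    SECRET = b"not-a-real-key"  # placeholder key for the sketch

    def attest(content: bytes) -> dict:
        record = {
            "sha256": hashlib.sha256(content).hexdigest(),
            "timestamp": int(time.time()),
        }
        payload = json.dumps(record, sort_keys=True).encode()
        record["mac"] = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
        return record

    def verify(content: bytes, record: dict) -> bool:
        payload = json.dumps(
            {"sha256": record["sha256"], "timestamp": record["timestamp"]},
            sort_keys=True).encode()
        expected = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
        return (hmac.compare_digest(record["mac"], expected)
                and hashlib.sha256(content).hexdigest() == record["sha256"])

    doc = b"human-written article text"
    rec = attest(doc)
    print(verify(doc, rec))  # True: the content existed by rec["timestamp"]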
Error correction (Score:2)
Just an idle thought, but since this is largely a statistical model, and the desire is to have 'correctness', could the errors not be used intentionally as an anti-seed to prevent future regressions along a similar path - similar to error correction encoding, of sorts?
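Something like this, maybe (a deliberately crude sketch; word-overlap similarity stands in for any real detector): keep a set of known-bad generated samples and drop future training candidates that look too much like them.

    # Sketch of an "anti-seed" filter: known-bad generated samples screen
    # out similar candidates before retraining. Jaccard overlap over word
    # sets is a crude stand-in for a real similarity or detection model.
    def word_set(text):
        return set(text.lower().split())

    def too_similar(candidate, anti_seeds, threshold=0.6):
        cand = word_set(candidate)
        for bad in anti_seeds:
            bad_words = word_set(bad)
            overlap = len(cand & bad_words) / max(len(cand | bad_words), 1)
            if overlap >= threshold:
                return True
        return False

    anti_seeds = ["the moon is made of green cheese according to nasa"]
    candidates = [
        "the moon is made of green cheese nasa says",
        "the apollo missions returned samples of lunar basalt",
    ]
    kept = [c for c in candidates if not too_similar(c, anti_seeds)]
    print(kept)  # only the second candidate survives the filter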
Re:Error correction (Score:4, Insightful)
...could the errors not be used intentionally as an anti-seed to prevent future regressions along a similar path...
Potentially, but it would still require human labor to determine what is true and what is not. The LLM's are incapable of making that determination, and always will be.
LLM's cannot ever exceed their basic programming and purpose, no matter how much OpenAI wishes it. The current LLM craze is snake oil and Blockchain mixed together: total bullshit just waiting to collapse.
Re: (Score:2)
Potentially, but it would still require human labor to determine what is true and what is not. The LLM's are incapable of making that determination, and always will be.
"Always will be" is too strong. Well, I suppose maybe it's okay if you restrict your comment to LLMs, but there's no reason to believe that AI will always be less capable than humans at doing research to separate fact from fiction. Even with LLMs, it's hard to be certain just how capable they might become.
In any case, current LLMs are clearly incapable of doing it.
Re: (Score:2)
LLM's cannot ever exceed their basic programming and purpose, no matter how much OpenAI wishes it.
The limiting factor is computation itself. Simple machinery like GAs can accomplish anything with sufficient computational resources. It for example created people from dirt.
Neural networks can accomplish tasks with far less resources by exploiting learned experience to achieve results with less trials. Phase transitions have already occurred in large models where capabilities far in excess of linear expectation have emerged. Things nobody on earth had any a-priori clue would happen.
Likewise it has been
Re: (Score:2)
Likewise it has been demonstrated trained models can exceed the capabilities of the model by applying simple reflective techniques.
Can you elaborate? Maybe some references for a layman?
Re: (Score:2)
Can you elaborate? Maybe some references for a layman?
Here are two:
"For example, Reflexion achieves a 91% pass@1 accuracy on the HumanEval coding benchmark, surpassing the previous state-of-the-art GPT-4 that achieves 80%"
https://arxiv.org/pdf/2303.113... [arxiv.org]
"Experiments on three large language models show that chain-of-thought prompting improves performance on a range of arithmetic, commonsense, and symbolic reasoning tasks. The empirical gains can be striking. For instance, prompting a PaLM 540B with just eight chain-of-thought exemplars achieves state-of-the-art accuracy on the GSM8K benchmark of math word problems."
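For a layman, the "chain-of-thought" part is mostly about what goes into the prompt: worked examples that spell out the intermediate reasoning before the answer. A rough sketch of assembling such a prompt (the exemplar wording here is paraphrased for illustration; the papers use their own curated exemplar sets):

    # Sketch of building a chain-of-thought prompt: each exemplar shows the
    # intermediate reasoning, not just the final answer.
    EXEMPLARS = [
        {
            "q": "Roger has 5 tennis balls. He buys 2 cans of 3 balls each. "
                 "How many balls does he have now?",
            "steps": "Roger started with 5 balls. 2 cans of 3 balls is 6 balls. "
                     "5 + 6 = 11.",
            "a": "11",
        },
    ]

    def build_prompt(question):
        parts = []
        for ex in EXEMPLARS:
            parts.append(f"Q: {ex['q']}\nA: {ex['steps']} The answer is {ex['a']}.")
        parts.append(f"Q: {question}\nA:")
        return "\n\n".join(parts)

    print(build_prompt("A bakery bakes 7 trays with 12 rolls per tray. "
                       "How many rolls is that?"))
    # The model is then expected to continue with its own reasoning steps
    # before stating a final answer, which is where the reported gains come from.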
Re: (Score:2)
Likewise it has been demonstrated trained models....
That training is on data from human creativity. It makes perfect sense that neural networks trained on human data will increasingly reach a point where use of that data will become highly efficient. However, the training data represent the limits of what a neural network can do.
No neural network will ever transcend its training, as the NN is limited by the very nature of computation. You cannot compute inspiration. You can iterate over all possible combinations of a dataset, and neural nets may be able to m
Re: (Score:2)
The current LLM craze is snake oil and Blockchain mixed together: total bullshit just waiting to collapse.
Not being capable of becoming a true AI does not make LLMs snake oil. Even in its current form it has very real, practical and useful applications. Unlike blockchain, which after nearly a decade is still looking for a problem to solve.
LLMs are not capable of creativity; they only summarise and repackage the information they already have. However, that kind of pointless busywork consumes a scary amount of our human brainpower, and LLMs are already potentially capable of surpassing humans in many of those jobs.
Ha Ha, ePrions (Score:2)
...eat it up, bots!
Also why AI generated code will always suck. (Score:2)
PAY ATTENTION, MIDDLE MANAGEMENT: AI that can read and write can't solve society's literacy problems. At some point, near the top of the knowledge chain, there has to be someone who actually knew what they were doing, or nothing will work and nobody will even be smart enough to notice, let alone fix it. There would be an inevitable downward spiral of deterioration of competence and understanding across the board, no matter how slightly you angle the spiral. Deal with it.
Re: Also why AI generated code will always suck. (Score:2)
Are you making the argument for anarchy?
Not a journal! (Score:3)
Signal to Noise Ratio (Score:2)
This is the classic signal-to-noise ratio problem. When AI starts generating noise and feeding it back to itself as the signal, you end up with more noise.
Elementary school gossip game (Score:2)
In grade 2 our class was taken outside and lined up. A teacher whispered a phrase into the ear of the person at one end of the line, told them to pass the phrase on to the next person in line, then to remember what they heard and repeat it later when asked. The phrase was "rubber baby buggy bumpers". It survived recognizably through maybe 6 tellings.
In grades 3 and 4 arithmetic, and in all later grades, we were told "show your work". A fuzzy correlator cannot do this: Bender's "stochastic parrot". Neither can
Re: Elementary school gossip game (Score:2)
How is this slashdot article and comments not another example of noise outcompeting any signal?
Sometime in the not-too-distant future (Score:3)
Human: Is it OK to marry my cousin?
AI: Boy howdy! Go to town.
Human: Is there anything else you can tell me?
AI: That depends. Would you like to make moonshine or become the king of Spain?
"Blah" (Score:2)
"Just as we've strewn the oceans with plastic trash and filled the atmosphere with carbon dioxide, so we're about to fill the Internet with blah.
Yeah... "about to"... What, we're going to litter on our landfill?
Paper does not address real world usage (Score:2)
Expectation of cascading generational loss is not reasonable, as humans are not likely to accept the resulting auto-generated garbage.
In the real world, those using tools like SD to produce imagery are likely to spend time developing prompts, iterating and spot-editing so the resulting imagery reflects what they expect.
Likewise for automated document writing a human is likely to read/review material to make sure it has the required quality and meaning.
Whatever amount of laziness is induced by better tools enabli
Re: (Score:2)
We've already seen a real world case where AI delusions were submitted unchecked to a court in a legal filing.
AI output is not going to be double checked consistently. At best the degeneration will just take a little longer.
Re: Paper does not address real world usage (Score:2)
As Theodore Sturgeon would say, 90% of everything is crap.
AI generated outputs will likely be no different. From what I have seen, a lot (and I do mean a lot) of it is generic porn. Actual decent quality outputs still do take some level of time that is noticeably higher than the bargain bin trash that fills the subreddits, ergo they will likely never outnumber them in the long run.
Boy if we thought the internet was filled with garbage, it is going to get much worse.
Re: (Score:2)
We've already seen a real world case where AI delusions were submitted unchecked to a court in a legal filing.
We've seen a real world case of it not being acceptable to humans.
AI output is not going to be double checked consistently. At best the degeneration will just take a little longer.
My argument is there is a floor to what people are willing to accept. This applies concurrently to both the automation tools themselves as well as work products.
GIGO still applies (Score:3)
Garbage in, Garbage out
Re: (Score:2)
Garbage in, Garbage out
Actually, we already have Data in, Garbage out, so it will become Garbage in, even more Garbage out.
Sentience (Score:2)
So this is what AI sentience looks like?
Echo chambers and cults. (Score:2)
How do people not already know that Frank Herbert was right? It's beyond obvious.
Re: Echo chambers and cults. (Score:2)
Are you calling for a jihad? Because it sounds like you're calling for a jihad...
Thermodynamics (Score:2)
The stupidity of humankind is unimaginable (Score:2)
I give you the story of how Belgian drivers became some of the worst in western Europe. Young drivers were allowed to learn driving from their parents, who learned how to drive from their parents and forgot everything that's useful for safe driving. And those parents also learned from their parents, who didn't really learn how to drive; they just got in a car and drove (because there was no official license).
And now you have to do resear
...& IP holders breathe a sigh of relief (Score:2)
So, what's likely to become valuable now is detecting how "pure", i.e. completely human generated, content is.
So? (Score:2)
Scientists right now train on faked scientific papers written by other 'scientists' with fake statistics, beautified results and other crap.
So, it's like Reddit? (Score:2)
Yes, models can access data after 2021 for processing; you just wouldn't use it for training.
Here we are now... (Score:2)
GIGO (Score:2)
Sounds to me like the old "garbage in, garbage out" problem. Old is what's new!
Vindicates (Score:2)
AI is NOT IMMUNE to Garbage IN = Garbage OUT
Re:Confusion (Score:5, Insightful)
Go find a copy machine.
Make a copy of a photograph.
Make a copy of the copy.
Do that about 5 times.
Compare what you started with against what you ended with.
You'll understand the problem referenced here pretty quickly.
Re: Confusion (Score:2)
How do you know the original was any better to start with?
Re: (Score:2)
It's the age-old problem repeating itself -> Garbage in, Garbage out (exponentially)
Re: (Score:3, Insightful)
Ageism adds so much to the conversation. Kudos, SuperKendall. All your heroes are boomers, though.
"AI is going to kill us how, and how that would work exactly."
Imagine an AI telling us that we have natural immunity to a disease we've never seen but that is as lethal as ebola, like you did with COVID, only with a 100% death rate. In other words, by being as stupid as you but with the ability to convince people.
But those sounding the alarm are just people not in a position to profit obscenely. Everyone knows that i
Re: Confusion (Score:2)
The problem is the words "AI". You do not have AI. We have LLMs that create a search tree based on data. Modify the input data and the output changes in unpredictable ways, as you don't know what the weighted tree data is set up as.
A super intelligent AI is a long way away. All current AI is generally stupider than an ant and is less useful, except on the exact data it was trained on.
Need a new idea? Start by loading up new training data.
Re: (Score:3)
Oh I understand the problem described, better than you.
A bit presumptuous, but whatever.
Now do the other side of the coin where AI boomers claim AI is going to kill us how, and how that would work exactly.
Individualized human behavior is infinitely more complex than describing how an ML model trained on ML generated output could result in a badly trained model.
If I had to summarize from broad observation, I'd describe it as a generalized fear of how they'll maintain and even improve their current standard of living in a future with prevalent usage of AI systems, with intersections of both financial and privacy based manifestations. It's really no different than any other tech
Re: (Score:2, Troll)
Oh I understand the problem described, better than you.
+5 funny!
We don't call you StuporKenDull for nothing!
The guy who worships billionaires and masturbates to Apple products thinks he understands something! Ho! And he thinks he understands the problem better than someone who gave a clear example of the problem!
The hubris is too much! I'm going to die laughing!
Re: Confusion (Score:2)
It will end up playing banjo in the rural wetlands
I've explained this many times already. (Score:3)
This shouldn't come as any surprise. I've explained the problem on this site countless times already. This shouldn't be a revelation to anyone, least of all a properly credentialed AI researcher!
I've called AI generated content "poison" for future models for this exact reason. Here's the executive summary: Models are both necessarily imperfect and can't produce new information. If you train a new model on content produced by another model, the new model can, at best, contain the same information as wa
Re: (Score:2)
Please tell me what I am supposed to fear next, I'm getting a bit confused now.
The fact that you think everything you read is supposed to scare you means nothing should scare you anymore. Clearly your brain is already broken.
Re: (Score:2)
Will it also achieve perfect compression down to a single bit?