AI Language Models Can Exceed PNG and FLAC in Lossless Compression, Says Study (arstechnica.com) 57

In an arXiv research paper titled "Language Modeling Is Compression," researchers detail their discovery that the DeepMind large language model (LLM) called Chinchilla 70B can perform lossless compression on image patches from the ImageNet image database to 43.4 percent of their original size, beating the PNG algorithm, which compressed the same data to 58.5 percent. For audio, Chinchilla compressed samples from the LibriSpeech audio data set to just 16.4 percent of their raw size, outdoing FLAC compression at 30.3 percent. From a report: In this case, lower numbers in the results mean more compression is taking place. And lossless compression means that no data is lost during the compression process. It stands in contrast to a lossy compression technique like JPEG, which discards some data during encoding and reconstructs it with approximations during decoding to significantly reduce file sizes. The study's results suggest that even though Chinchilla 70B was mainly trained to deal with text, it's surprisingly effective at compressing other types of data as well, often better than algorithms specifically designed for those tasks. This opens the door for thinking about machine learning models not just as tools for text prediction and writing but also as effective ways to shrink the size of various types of data.
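For a concrete sense of what those percentages mean, here is a quick back-of-the-envelope sketch using only the figures quoted above (the 1 MB input size is a made-up round number, not the paper's actual chunk size):

```python
# Compression rate = compressed size / original size, so lower is better.
ORIGINAL = 1_000_000  # hypothetical 1 MB of raw data

rates = {
    "Chinchilla 70B on ImageNet patches": 0.434,
    "PNG on ImageNet patches": 0.585,
    "Chinchilla 70B on LibriSpeech audio": 0.164,
    "FLAC on LibriSpeech audio": 0.303,
}

for name, rate in rates.items():
    print(f"{name:38s} {ORIGINAL:,} B -> {int(ORIGINAL * rate):,} B")
```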

  • Subject (Score:5, Interesting)

    by Artem S. Tashkinov ( 764309 ) on Thursday September 28, 2023 @02:09PM (#63883963) Homepage

    This is an interesting concept albeit almost completely useless and quite energy/resources wasteful at that.

    Outside PNG we have WebP and JPEG XL, both supporting lossless compression and sometimes doing so several times better than PNG. AV1 also supports lossless compression in the form of AVIF, but it's not well optimized yet and loses to WebP. VVC is supposed to support lossless image compression as well, but AFAIK it's not yet implemented by any available encoder.

    And since they are classic compression algorithms, they don't need a GPU to compress/decompress images, nor several gigabytes (terabytes? petabytes? not sure what LLMs operate with) of dictionaries to boot.

    And WebP/JPEG XL are not even the best in this regard, but they are quite efficient. There are experimental compression algorithms such as paq8px which take ages to compress/decompress data but are pretty much unbeatable.

    As for FLAC, it is not the most efficient audio compression algorithm either (e.g. OptimFrog compresses a whole lot better but it's very CPU intensive for both compression and decompression) but it has a very good tradeoff between speed and efficiency. I'm afraid this LLM when applied to audio/image compression will be as slow as molasses.

    • Re:Subject (Score:4, Insightful)

      by algaeman ( 600564 ) on Thursday September 28, 2023 @02:24PM (#63884013)
      AFAIK, PNG uses DEFLATE (the same algorithm behind gzip and zlib) internally. This is kinda important, since the device decompressing the data may only have 8 KB of memory, and yet needs to be able to extract that data without having to pick from 70 different compression mechanisms that this LLM may be using.
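To make the memory point concrete: DEFLATE lets the encoder choose the history-window size, which directly bounds how much state the decoder needs. A minimal sketch with Python's zlib (the window size and input here are my own illustration, not anything PNG mandates):

```python
import zlib

data = b"some repetitive payload " * 200  # stand-in for image data

# wbits=9 restricts the match window to 2**9 = 512 bytes, so a decoder
# only needs a few hundred bytes of history instead of the 32 KB default.
enc = zlib.compressobj(level=9, method=zlib.DEFLATED, wbits=9)
compressed = enc.compress(data) + enc.flush()

# The decompressor must allow at least the same window size.
dec = zlib.decompressobj(wbits=9)
assert dec.decompress(compressed) == data
print(len(data), "->", len(compressed), "bytes")
```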
    • Rather than keep adding file formats (which gums up conversions), maybe have a format that allows a wide enough variety of rendering methods and put the onus on the compression engines to squeeze out better compression scores/results.

      For example, let's say there are 7 different ways to encode pixels/vectors for a given image type (or even image section, as one type may not fit all parts of an image well). While there may be more than 7 known, they are close enough to the 7 to not bother adding them, in the name of simplicity.

    • by gweihir ( 88907 )

      This is an interesting concept albeit almost completely useless and quite energy/resources wasteful at that.

      Indeed. Probably just some people trying to keep the AI hype going so the dollars of the clueless keep rolling in.

      You are also completely correct that efficiency and speed matter very much.

      • Re:Subject (Score:5, Insightful)

        by Rei ( 128717 ) on Thursday September 28, 2023 @03:09PM (#63884131) Homepage

        I disagree - there very much are applications for extreme compression even at the cost of high computational loads. Transmission from spacecraft, for example. Very-long-wave transmission through water. Down here on Earth, GSM-by-satellite is in theory coming over the next few years, and the data rate on that is expected to be *awful* - enough for text, but not pictures or video (except at extremely low quality). Being able to squeeze down media by throwing compute at it is very much a useful task.

        And honestly, I'm not sure why people assume "neural network = incredibly wasteful". This particular one may be - they're basing it on a 70B parameter model designed for text, after all - but at the most basic level, there's nothing really inefficient about the logic processes used by neural networks, and they parallelize really well. I imagine you could still get great performance with a vastly smaller network (maybe quantized to 3 bits or so) optimized to the task.

        • by gweihir ( 88907 )

          Not really. These are not orders of magnitude better. They are just "somewhat" better.

          • by Rei ( 128717 )

            It's also a text-based LLM being used for something it's not remotely designed or trained for, and doing lossless compression. They're just showing it byte sequences and having it - *based on its training on text, not images* - guess what the next byte will be, and then turning those predictions into bits with arithmetic coding.
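The mechanics are simple enough to sketch. Below, a toy order-1 byte predictor stands in for Chinchilla (an assumption purely for illustration), and instead of a real arithmetic coder the sketch just totals the ideal code length, -log2 of the probability the model assigned to each actual next byte, which is the size an arithmetic coder would approach to within a couple of bits:

```python
import math
from collections import defaultdict

def prob(counts, prev, b):
    """P(next byte = b | previous byte = prev), with add-one smoothing."""
    total = sum(counts[prev].values()) + 256
    return (counts[prev][b] + 1) / total

def ideal_compressed_bits(data: bytes) -> float:
    counts = defaultdict(lambda: defaultdict(int))  # counts[prev][next]
    bits, prev = 0.0, 0
    for b in data:
        p = prob(counts, prev, b)  # the "model's" probability of the true next byte
        bits += -math.log2(p)      # an arithmetic coder spends roughly this many bits
        counts[prev][b] += 1       # adaptive update; a frozen LLM would skip this step
        prev = b
    return bits

sample = b"the quick brown fox jumps over the lazy dog " * 200
print(f"{ideal_compressed_bits(sample) / 8:,.0f} bytes ideal vs {len(sample):,} raw")
```

The better the predictor, the fewer bits each byte costs, which is the "compression is prediction" point made elsewhere in this thread.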

      • This idea is pretty much already the very definition of what an autoencoder is supposed to do.
    • by znrt ( 2424692 )

      my thoughts as well. albeit a remarkable finding, the practical application seems unclear to me. first, for cost reasons: such an engine is very costly to create and maintain just to be used as a utility or general-purpose compressor, for which we already have more than enough alternatives. maybe for very specific applications where the amount of data is really huge it could make sense.

      second, besides the catchy tagline (which actually echoes a deep conceptual realization about language models, but in a different sense) ...

  • 1. Is the compression genuinely lossless or merely perceptually identical?

    2. Will it work on any data, or merely data it has been trained on?

    3. Do you need to decompress to be able to see the results, or is the new data directly displayable/playable?

    • by gweihir ( 88907 )

      I strongly expect it will be far worse than PNG and FLAC on data it has not been trained on (so basically "almost all"). Overfitting is a bitch. It will also probably have massive, massive overhead in compression and maybe decompression.

    • If it's perceptually identical, who cares if it is genuinely lossless? Isn't any image made up of pixels inherently lossy?
      • "Image" is not always the same as "picture I took with a camera".

        "Image" could be human drawn art, camera picture, randomly generated, or not even in the human visible spectrum. Maybe it's IR security footage. Maybe it's graphs and charts from excel, or pdfs.

      • If you don't care about lossiness, then you can use JPEG or MP3 (or whatever your lossy compressor of choice is; there's lots of things better than those, but those are the ones people recognize immediately) and get something "good enough", lossy, and much better than the lossless equivalents for anything complex (PNG might win for a screenshot of squares, but an image of nature will be quite a bit smaller in default JPEG compression than in lossless PNG). If you're just saying it's a new lossy compressor, then it should be compared against JPEG and MP3, not against lossless formats.
    • (1) Unless they're lying, lossless means lossless.

      (2) Unless the summary is deceptive, it works on any A/V data. How well is another question.

      (3) If by "directly displayable" you mean "will serve as input for a DAC driving a physical monitor or speaker", the answer is no, which is the answer for every A/V compression scheme.

  • When comparing to FLAC, you need to measure the tradeoff between compression ratio and CPU time to decode. I'd also like to see a few other (more problematic) datasets used as source audio to see how consistent the compression ends up being. A dataset named LibriSpeech leaves me wondering how things will work on something nastier, like dubstep/harpsichord music.
  • Snap! / "The Power": "Agathe Bauer"

    Michael Jackson / "Dirty Diana": "Da geht der Gärtner"
    Queen / "Flash Gordon", narrator: "Gordon's alive!" -- people heard "Gurkensalat"
    Hot Chocolate / "Alle Lieben Mirko"

    Using AI, what can go wrong .. when even humans sometimes fail.

    And I just don't want to think about what happens when these codecs record speeches and a Mr. Peter File is called. ("The IT Crowd", Season 2, Episode 4)

  • by PhrostyMcByte ( 589271 ) <phrosty@gmail.com> on Thursday September 28, 2023 @02:58PM (#63884103) Homepage

    LZ and other compression algorithms work by maintaining a dictionary of patterns to reference. Usually this is megabytes in size; some have a fixed dictionary, others have one that evolves over the course of the data.

    It seems like the AI model in this case is being used as a gigabytes-sized fixed dictionary. The trick is you need to download the AI model too in order to decompress your files.
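For a small-scale analogue of "both sides share prior knowledge", zlib already supports preset dictionaries; the dictionary never travels with the data, exactly like the model weights wouldn't here (the sample strings below are made up):

```python
import zlib

# Shared "prior knowledge" held by both the compressor and the decompressor.
shared_dict = b'{"status": "ok", "error": null, "items": ['

message = b'{"status": "ok", "error": null, "items": [4, 8, 15, 16, 23, 42]}'

plain = zlib.compress(message, 9)

enc = zlib.compressobj(level=9, zdict=shared_dict)
with_dict = enc.compress(message) + enc.flush()

dec = zlib.decompressobj(zdict=shared_dict)
assert dec.decompress(with_dict) == message

print(len(message), len(plain), len(with_dict))  # the zdict version should be smallest
```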

    • by evanh ( 627108 )

      This! It's a perfect demonstration of just how gimmicky LLMs really are. Total waste of resources.

  • by Chelloveck ( 14643 ) on Thursday September 28, 2023 @03:17PM (#63884157)

    I think most of the commenters (so far) are missing the point.

    I don't think the point here has anything to do with finding a better compression algorithm. The article claims that compression and prediction are functionally equivalent. Given that fact, we can better understand how LLMs work by examining them as if they were compression engines, which we already understand pretty well.

    So forget all the posts about "this other algorithm compresses better" or "this is too expensive to be practical". It's not about practicality as a compression engine. It's about a practical technique for understanding something that's pretty abstract. It's science, not engineering.

    • by grmoc ( 57943 )

      Compression is prediction. This is true.

      The more predictable something is, the more easily it compresses.
      A.k.a. the lower the "entropy," the better the prediction.

      Knowing what patterns are most likely by being pre-fed things is a fabulous way to lower entropy.

      It is just surprising that this is an insight. It follows naturally from the definition.
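As a tiny numerical illustration of that last point (the two byte streams are made up): zeroth-order Shannon entropy is the floor on average bits per byte for any lossless coder, and it falls as the data gets more predictable.

```python
import math
import os
from collections import Counter

def entropy_bits_per_byte(data: bytes) -> float:
    """Zeroth-order Shannon entropy: -sum(p * log2(p)) over byte frequencies."""
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())

predictable = b"abab" * 1000       # only two symbols, evenly split
unpredictable = os.urandom(4096)   # incompressible noise

print(entropy_bits_per_byte(predictable))    # 1.0 bit per byte
print(entropy_bits_per_byte(unpredictable))  # close to 8.0 bits per byte
```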

    • by ceoyoyo ( 59147 )

      Equivalent might be a little too strong. The goal of a compression algorithm is to find a representation that is sufficient to reconstruct the input, but smaller than that input. The goal of a generative model is to find a representation that is sufficient to reconstruct samples from the input distribution but is smaller than an actual reasonable set of samples. Both need to identify and exploit structure in the input.

    • Not only that. Thanks to this study we now have a bound on how much generic images and sound can be compressed without loss in practice (in the real world, not in theory). This can serve, for example, as a benchmark for future improvements. Say you use a generic compression algorithm to pre-compute a gigabyte-sized dictionary from a set of images, taking as long as it takes to train an LM. What final compression ratios would we achieve? Would they be close to using a neural network, or would the network still be well ahead?
  • Compression effectiveness depends on the amount of state fed into the compressor.
    The more state fed in, the more compression is possible.

    One of the ways to "cheat" here is to already have a library/dictionary of things that you've "fed" the compressor and decompressor.
    If you have an image that was similar to one of the dictionary images, then you'll have lots of similarity, and a better compression ratio.

      These 'dictionaries' can take many forms: algorithms, prior data, or pre-processed data such as model weights.

  • You mean having grad students spend hundreds of hours hand-tuning fractal compression is no longer the most effective compression technique? Now what will the grad students do to fill their time?

    Should we be teaching Generative AI how to do fractal compression? It seems to have a lot of spare time... unlike me.

  • As long as the authors are training and testing on stock images in various databases, a sufficiently large AI model should be able to losslessly compress any image down to a handful of bytes (less than 100). The neural networks are fully capable of storing and reproducing known images.

    At a certain point, any AI compression algorithm boils down to an image recognition / database lookup algorithm, with the "details" hidden in the neural network model.

    It's really hard to know precisely what these researchers are actually measuring here.

  • by Great_Geek ( 237841 ) on Thursday September 28, 2023 @03:55PM (#63884273)
    Table 1 on compression rates shows TWO different rates. Everyone is getting excited over the "raw" compression rate, which is pretty good, while ignoring the "adjusted" compression rate, which is pretty bad.

    The adjusted rate takes into account the size of the model, and Chinchilla 70B goes from an 8.3% "raw compression rate" to a 14,008.3% "adjusted compression rate".

    It's well known that "classical" compressors can be improved (by a lot) just by adding a pre-defined dictionary/model, but that would hugely bloat the programs, so people don't do it. These LLM models do exactly that, so of course they score better. Nothing new.
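For anyone wondering how a figure like 14,008.3% can arise, the back-of-the-envelope below assumes the adjusted rate simply charges the size of the model (roughly 140 GB for 70B parameters at 2 bytes each) against the ~1 GB benchmark chunk, on top of the compressed output; the exact accounting belongs to the paper, this is just the arithmetic that reproduces the quoted number:

```python
raw_size   = 1e9               # ~1 GB benchmark chunk
model_size = 70e9 * 2          # 70B parameters in fp16 = roughly 140 GB of weights
compressed = 0.083 * raw_size  # the 8.3% "raw" compression rate

adjusted = (compressed + model_size) / raw_size
print(f"{adjusted:.1%}")       # 14008.3%
```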
  • This isn't a fair comparison. The amazing LLM "compression" is simply the LLM reproducing an image that it has been trained on and is recalling by reference. That's not compression the way most people think about it.

    There are already compression algorithms that require a large shared data library on the decompression side that is used to improve efficiency. That can drastically improve on zip, arj, rar, gz, etc., but it's sort of cheating. Compression algorithms that don't use a library must embed the equivalent information in the compressed data itself.

  • ...On many different images, as well.
  • I can beat the pants off it with an md5 hash of all the images, but my compressor would be kind of a hefty download. ie, to what extent is the compressor full of "quotations", and thus a rather large download, which would make it impractical?

    • by isomer1 ( 749303 )
      Yes! Thank you! How come nobody talks about this? Don't they have to download/install the entire LLM to recover the compressed information? Or transfer back and forth to a cloud service to do the same?
  • ...can it beat off middle-out compression?
  • requiring a model instead of a key
  • One of the interesting features of LLMs is that they normally incorporate some random seed value so a defined input does not always return the same result. This means I can ping the ChatGPT-3.5-turbo API 50 times with the same prompt and get back 50 different results. My experience is that the results will be similar, but it's rare to get a word-for-word duplicate. I wasn't aware that was something you could turn off, and now I have to figure out how they did it. :)
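If it helps: the variation comes from sampling during decoding, not from the model's probability outputs, which are deterministic for a given input. Most APIs let you switch sampling off; a minimal sketch against the OpenAI Python client (the model name and prompt are placeholders, and hosted models can still show occasional nondeterminism from their serving stack):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

resp = client.chat.completions.create(
    model="gpt-3.5-turbo",  # placeholder model name
    messages=[{"role": "user", "content": "Describe FLAC in one sentence."}],
    temperature=0,          # no sampling noise: effectively the most likely token each step
)
print(resp.choices[0].message.content)
```

Compression schemes like the one in the paper don't sample at all; they feed the per-token probabilities straight into an arithmetic coder.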

  • Given that ImageNet is already JPEG-compressed, is it possible their model just learned to discard the same information as JPEG?

  • These things aren't language models, they're information models trained on language. Someday we'll distill them down to their innermost core and suddenly be able to build information-integration/comprehension/analysis systems that behave like magic.

  • why do you have to keep reminding me you exist
