Ten Dropbox Engineers Build BSD-licensed, Lossless 'Pied Piper' Compression Algorithm

Slashdot is powered by your submissions, so send in your scoop

Ten Dropbox Engineers Build BSD-licensed, Lossless 'Pied Piper' Compression Algorithm 174

Posted by Soulskill on Friday August 28, 2015 @04:15PM from the what-if-i-don't-have-a-wind-instrument-handy dept.

An anonymous reader writes: In Dropbox's "Hack Week" this year, a team of ten engineers built the fantasy Pied Piper algorithm from HBO's Silicon Valley, achieving 13% lossless compression on Mobile-recorded H.264 videos and 22% on arbitrary JPEG files. Their algorithm can return the compressed files to their bit-exact values. According to FastCompany, "Its ability to compress file sizes could actually have tangible, real-world benefits for Dropbox, whose core business is storing files in the cloud."The code is available on GitHub under a BSD license for people interested in advancing the compression or archiving their movie files.

This discussion has been archived. No new comments can be posted.

Ten Dropbox Engineers Build BSD-licensed, Lossless 'Pied Piper' Compression Algorithm

Load All Comments

Search 174 Comments Log In/Create an Account

Comments Filter:

From TFA: bit-exact or not? (Score:5, Interesting)

by QuietLagoon ( 813062 ) writes: on Friday August 28, 2015 @04:22PM (#50412317)

...Horn and his team have managed to achieve a 22% reduction in file size for JPEG images without any notable loss in image quality....
Without any notable loss in image quality.
.
Hmmm... that does not sound like "bit-exact" to me.

Share
twitter facebook
- Re: (Score:2)
  
  by suutar ( 1860506 ) writes:
  
  bit-exact is easier to test than "image quality". I suspect a less than tech-savvy reporter heard "no loss" and stuck in "notable".
  - Re: (Score:2)
    
    by lq_x_pl ( 822011 ) writes:
    
    SSIM does a pretty decent job though
- Re: (Score:3, Interesting)
  
  by JoeMerchant ( 803320 ) writes:
  
  If you are viewing images on an LCD monitor, the first thing you can do is strip them down from 24bit color to 18bit color, because your sub-$1000 monitors don't display more than 6 bits per color channel.
  - Re: (Score:3)
    
    by mentil ( 1748130 ) writes:
    
    Even cheap TN monitors use FRC to interpolate to 8-bit, which is better than nothing. IPS monitors can be had for $120, with an 8-bit color panel. Several gaming monitors use native 8-bit with FRC to 10-bit for less than $800, and a few even use native 10-bit.
    - Re: (Score:2)
      
      by JoeMerchant ( 803320 ) writes:
      
      Sorry, I was probably off on the price point - technology has moved on. Still, it wasn't a widely advertised fact that almost all "gaming" LCD monitors sold before IPS were 6 bit, or 6 bit with "dithering" which is not really much better.
    - - Re:From TFA: bit-exact or not? (Score:4, Interesting)
        
        by pla ( 258480 ) writes: on Friday August 28, 2015 @06:34PM (#50413189) Journal
        
        Interpolation is WORSE than nothing. you're discarding signal then adding noise in the hopes that it matches up with what should've been there kinda okay.
        
        1, 2, 3, X, 5, 6. Guess the value of X... Congratulations, you just interpolated the right answer.
        
        In the case of what the GP described, though, it works out even better than that, because the panel actually "knows" the right answer, so it hasn't "thrown away" information; it just lacks the luminance resolution to display it. It can, however, interpolate in the temporal domain way, way faster than the human eye can tell, to create a color we perceive as the correct value.
        
        / Go ahead, twitch gamers, tell us all about your ability to resolve sub-millisecond 1.5% color changes. XD
        
        Parent Share
        twitter facebook
        
        Re: (Score:2)
        
        by Lisandro ( 799651 ) writes:
        
        1, 2, 3, X, 5, 6. Guess the value of X... Congratulations, you just interpolated the right answer.
        
        Cool. Was it 3782?
        
        Re: (Score:2)
        
        by serviscope_minor ( 664417 ) writes:
        
        I get x=10
        My interpolating formula is roughly:
        0.10323x^6 + -1.66775x^5 + 9.56461x^4 + -22.37135x^3 + 14.13956x^2 + 16.90876 x -15.67704
      - Re:From TFA: bit-exact or not? (Score:5, Informative)
        
        by Megol ( 3135005 ) writes: on Friday August 28, 2015 @07:47PM (#50413481)
        
        Interpolation isn't about adding noise.
        6 bit (per component) LCDs have for at least 10 years and probably much longer used dithering techniques to produce effective 16.2M colors (compared to a true 8 bit panel with 16.7M colors). This works very well for almost all use cases and provides smooth gradients but have the disadvantage that some image patters can produce flashing due to interference with the dithering algorithm.
        Dithering isn't about adding noise either BTW.
        
        Parent Share
        twitter facebook
        
        Re: (Score:2)
        
        by subreality ( 157447 ) writes:
        
        Dithering isn't about adding noise either BTW.
        "Dither is an intentionally applied form of noise used to randomize quantization error..." -- https://en.wikipedia.org/wiki/... [wikipedia.org]
        It's also unrelated to interpolation ("...a method of constructing new data points within the range of a discrete set of known data points..."). No new data points are generated - the monitor knows the exact RGB values it wants to display; instead, it's about doing the best job presenting them within the limits of the hardware.
        Regardless, your original point is still correct: 6-bit
        
        Re: (Score:2)
        
        by thegarbz ( 1787294 ) writes:
        
        One of the great ways Wikipedia can contradict itself.
        They call dithering noise
        They describe the process of dithering as adding a pattern.
        They describe noise a stochastic and therefore random phenomenon.
        All three can't be right, and in this case it's the first. Dithering can be achieved by adding noise or by adding engineered patterns and in visual processing it's usually the latter. In audio processing it's usually the former.
        
        Re: (Score:2)
        
        by serviscope_minor ( 664417 ) writes:
        
        Quantization is also noise. Quantization + dithering doesn't necessarily add more noise overall.
        Interpolation is something else entirely and can replicate the exact fourier spectrum (i.e. be noiseless).
        
        Re: (Score:2)
        
        by Khyber ( 864651 ) writes:
        
        "Quantization + dithering doesn't necessarily add more noise overall."
        Have you ever used node-based texture generation? Quantization + dithering = INSANE FUCKING AMOUNTS OF NOISE.
        
        Re: (Score:2)
        
        by serviscope_minor ( 664417 ) writes:
        
        No I haven't (I don't know what node-based texture generation is) but my comment was "not necessarily", not "never".
        Here's some code (octave) which generates a signal, quantizes it to 10 levels with and without dithering. If you run it, you'll see that you start getting substantial extra noise from dithering below about 1E-3. I've put noise in too, and 1e-3 corresponds to noise with a scale of 0.03, which is much less than the quantization error of 0.1
        xs = -1000:1000; % A signal ys = min((xs/50).^2, 100) / 1
  - Re: (Score:2)
    
    by UnknownSoldier ( 67820 ) writes:
    
    That's complete nonsense and easy to disprove.
    Your are claiming only 18-bit color - a total of (2^6)^3 = 262144 colors; or 2^6 = 64 colors for primary RGB gradients. That would mean every 4 colors out of 256 primaries you wouldn't be able to tell the difference! Since one can easily tell the difference between:
    0xFF, 0xFE, 0xFD, 0XFC, 0xFB
    That means your claim is complete bullshit.
    QED.
    - Re: (Score:2)
      
      by JoeMerchant ( 803320 ) writes:
      
      Depends on your monitor, of course, but a whole (recent) generation of "LCD gaming screens" only showed 6 bits of color depth:
      http://compreviews.about.com/o... [about.com]
      Also, even when you show people the bottom 2 bits, they usually don't perceive them:
      http://rahuldotgarg.appspot.co... [appspot.com]
      - Re: (Score:2)
        
        by UnknownSoldier ( 67820 ) writes:
        
        And the specific manufacturers and models listed with 18-bit are listed where again???
        Oh wait, they aren't.
        Stop spouting bullshit. At least link to an article with hard facts and a timestamp.
        
        Re: (Score:2)
        
        by AaronW ( 33736 ) writes:
        
        This information is often transmitted over EDID from the monitor to the host computer. Some graphics cards can use this information to automatically turn on and configure temporal dithering. The Linux nVidia driver can do this with the nvidia-settings utility. It will also report what the monitor is actually capable of.
        
        Re: (Score:3)
        
        by UnknownSoldier ( 67820 ) writes:
        
        Look, it is real easy to prove whether a monitor is 24-bit or 18-bit:
        * https://imgur.com/XF3LBOz [imgur.com]
        Do you see Mach banding in the rows? (Easiest to tell in the greens and grays)
        Yes - your monitor is 24-bit
        No - your monitor is 18-bit
        Show me proof of _any_ LCD monitors that are 18-bit.
        
        Re: (Score:2)
        
        by viperidaenz ( 2515578 ) writes:
        
        Cool, my laptop is 24 bit
        
        Re: (Score:2)
        
        by Khyber ( 864651 ) writes:
        
        Sweet. I see the mach banding all over the fucking place.
        Perhaps you should ditch your old 6-bit Apple monitors, sonny, and catch up with current technology.
        Even my MONOCHROME CRT shows all the mach banding. 24-bit GREYSCALE FTW.
        
        Re: (Score:2)
        
        by Chelloveck ( 14643 ) writes:
        
        The original poster wasn't clear. In the linked image the top half of each row of color is 24-bit. The bottom half of each row is 18-bit. So on a 24-bit display you should see color banding in the bottom half of each row, but not in the top half. I had to zoom way in before I realized that each row was split in half.
        
        Re: (Score:2)
        
        by Bengie ( 1121981 ) writes:
        
        I can immediately see the mach banding if I'm about 1.5' away, but if I sit back in my chair, about 2.5' away, I don't see it at all for the first few seconds, but it can be see after a few seconds. I do have a cheap $130 LED LCD 2ms gaming monitor. Very power efficient though, 23 watts.
      - Re: (Score:2)
        
        by AaronW ( 33736 ) writes:
        
        There are a couple settings in the nVidia tool for Linux to turn on the temporal dithering so it can be done in the graphics card when the monitor doesn't do it. It's easy to turn on in nvidia-settings.
      - Re: (Score:2)
        
        by UnknownSoldier ( 67820 ) writes:
        
        I said **primary RGB gradients**, as in, Red, Green, Blue or White for a reason.
        Quit changing the topic to steganography which is an apples to oranges comparison and no one gives a shit about _that_ to tell if your monitor is 24-bit.
        
        Re: (Score:2)
        
        by Khyber ( 864651 ) writes:
        
        "no one gives a shit about _that_ to tell if your monitor is 24-bit."
        As I said before, get with real fucking technology. Catch up, you're way the fuck behind. 10-bit (that's 30-bit A-RGB colorspace) 4K monitors, S-IPS, 28" for $600.
        Yawn. As I told you before, we're not in the days of your shitty Apple displays.
    - Re: (Score:2)
      
      by Bruce Perens ( 3872 ) writes:
      
      There used to be a web page called "Your Eyes Suck at Blue". You might find it on the Wayback machine.
      You can tell the luminance of each individual channel more precisely than you can perceive differences in mixed color. This is due to the difference between rod and cone cells. Your perception of the color gamut is, sorry, imprecise. I'm sure that you really can't discriminate 256 bits of blue in the presence of other, varying, colors.
- Re:From TFA: bit-exact or not? (Score:5, Informative)
  
  by danielreiterhorn ( 4241309 ) writes: on Friday August 28, 2015 @04:42PM (#50412471)
  
  I'm the author of the algorithm and it's bit-exact. It has no quality loss. I just committed a description of the algorithm https://raw.githubusercontent.... [githubusercontent.com] It is bit exact and lossless: you can get the exact bits of the file back :-)
  
  Parent Share
  twitter facebook
  - Re: (Score:3)
    
    by thesupraman ( 179040 ) writes:
    
    OK, As you are the author..
    Care to comment to the performance and window length of your encode/decode?
    As of course there is an innate difference between algorithms that must run streaming (for example... h264) and ones
    that can consider all of the content - the same for computational complexity - for video to be useful it must decode in
    real time on 'normal' machines.. Memory footprint for the compression window also matters a lot..
    My guess is that your decode overhead is not high, but you need a LOT of memor
    - Re: (Score:2)
      
      by thesupraman ( 179040 ) writes:
      
      And just to reply to myself.. it is generally a BAD idea to imply you have an encoding method better than arithmetic (lets hope
      the article horrible miss quoted you there..
      'yet it is well known that applying an additional arithmetic coder to existing JPEG files brings a further 10% reduction in file size at no cost to the file," he says. "Our Pied Piper algorithm aims to go even further with a more efficient encoding algorithm that maps perfectly back to existing formats."'
      As of course it is a numerical impo
      - Re:From TFA: bit-exact or not? (Score:5, Informative)
        
        by danielreiterhorn ( 4241309 ) writes: on Friday August 28, 2015 @05:49PM (#50412913)
        
        We also use arithmetic coding...but the gist of the improvement is that we have a much better model and a much better arithmetic coder (the one that VP8 uses) than JPEG did back then. I tried putting the JPEG arithmetic coder into the algorithm and compression got several percent worse, because that table-driven Arithmetic Coder just isn't quite as accurate as keeping counts as the VP8 one.
        
        Parent Share
        twitter facebook
        
        Re: (Score:2)
        
        by thesupraman ( 179040 ) writes:
        
        You seem to be confused as to what arithmetic coding is..
        What you seem to be talking about is the accuracy of the token counts being used to drive the arith coder.. arithmetic coding says nothing about those, except that they have to exist.
        Beating a given implementation? of course, there are several ways..
        But claiming to have better arithmetic coding itself is silly, what you have is better token distribution figures.
        Want to pony up some estimates on performance and memory requirements?
        
        Re: (Score:2)
        
        by AaronW ( 33736 ) writes:
        
        That jives with my experience when I took a class that covered compression back in college. The professor, Glen Langdon held a bunch of patents at the time on arithmetic coding. Encoding efficiency could be improved by having it forget old data and making it more dynamic as I recall.
        
        Re: (Score:2)
        
        by fnj ( 64210 ) writes:
        
        That jives with my experience
        It is always jarring to find a college grad who is not fluent in the difference between such common words as jive and jibe.
        "Harlem jive is the argot of jazz."
        "Your belief does not jibe with reality."
      - Re:From TFA: bit-exact or not? (Score:4, Interesting)
        
        by Cassini2 ( 956052 ) writes: on Friday August 28, 2015 @06:08PM (#50413025)
        
        The grandparent poster is talking about compressing videos. If something is known about the data being encoded, then it is trivial to show that you can exceed the performance of arithmetic coding, because arithmetic coding makes no assumptions about the underlying message.
        For instance, suppose I was encoding short sequences of positions that are random integer multiples of pi. Expressed as decimal or binary numbers, the message will seem highly random, because of the multiplication by an irrational number (pi). However, if I can back out the randomness introduced by pi, then the compression of the resulting algorithm can be huge.
        The same applies to video. If it is possible to bring more knowledge of the problem domain to the application, then it is possible to do better on encoding. Especially with real-life video, there are endless cheats to optimize compression. Also, Dropbox may not be limited by real-time encoding. Drop-box might not even need intermediate frames to deal with fast-forward and out-of-order viewing. Dropbox may be solely interested in creating an exact image of the original file. Knowing the application affects compression dramatically.
        Lastly, application specific cheats can save real-world companies and individuals money and time. Practical improvements count as advancements too.
        
        Parent Share
        twitter facebook
        
        Re:From TFA: bit-exact or not? (Score:5, Insightful)
        
        by Bruce Perens ( 3872 ) writes: <bruce@perens.com> on Friday August 28, 2015 @08:06PM (#50413559) Homepage Journal
        
        Rather than abuse every commenter who has not joined your specialty on Slashdot, please take the source and write about what you find.
        Given that CPU and memory get less expensive over time, it is no surprise that algorithms work practically today that would not have when various standards groups started meeting. Ultimately, someone like you can state what the trade-offs are in clear English, and indeed whether they work at all, which is more productive than trading naah-naahs.
        
        Parent Share
        twitter facebook
        
        Re: (Score:2)
        
        by phantomfive ( 622387 ) writes:
        
        Well said.
        
        Re: (Score:2)
        
        by Solandri ( 704621 ) writes:
        
        Given that CPU and memory get less expensive over time, it is no surprise that algorithms work practically today that would not have when various standards groups started meeting.
        I remember when the preliminary JPEG standard first showed up in the early 1990s, a 640x480 8-bit GIF would decode and display in about a second on my PC. A 640x480 24-bit JPEG took about 30 seconds. JPEG's strength back then was its much smaller file size. Aforementioned GIF was about 200 kB, while the JPEG was about 35 kB wit
        
        Re: (Score:2)
        
        by Khyber ( 864651 ) writes:
        
        "Welcome to the real world. This has been looked at many times, and the questions that matter are well established."
        And were answered about ten years ago when we got better fucking technology.
        I've bothered to compile the source package and look at the straight-forward code. You very fucking obviously have not.
    - Re:From TFA: bit-exact or not? (Score:5, Interesting)
      
      by danielreiterhorn ( 4241309 ) writes: on Friday August 28, 2015 @05:47PM (#50412897)
      
      Very insightful comments... let me go into detail
      I would say we have several advantages over H.264
      a) Pied Piper has more memory to work with than an embedded device (bigger model)
      b) Pied Piper does not need to seek within a 4 Megabyte block (though it must be able to stream through that block on decode) whereas H.264 requires second-by-second seekability (more samples in model).
      c) Pied Piper does not need to reset the decoder state on every few macroblocks (known as a slice), whereas H.264 requires this for hardware encoders (again, more samples per model).
      d) As opposed to a committee that designed H.264, Pied Piper had a team of 10 creative Dropboxers and guests, spending a whole week identifying correlations between the decoder state and the future stream. That is a big source of creativity! (design by commit, not committee)
      Our algorithm is, however streaming---and it's happiest to work with 4 MB videos or bigger
      Our decode window is a single previous frame--so we can pull in past information about the same macroblock-- but we only work in frequency space right now (there are some pixel space branches just to play with, but none has yielded any fruit so far) so the memory requirements are quite small.
      We are doing this streaming with just the previous frame as our state--- and it may matter--but we have a lot of work to do to get very big wins on CABAC... but given that we're not limited by the very small window and encoding parallelization requirements that CABAC is tied to, Pied Piper could well be useful soon!
      
      Parent Share
      twitter facebook
      - Re: (Score:3)
        
        by thesupraman ( 179040 ) writes:
        
        Its good that you understand that bold claims require clear evidence.. Thank you for replying.
        It is not surprising you can compress h264 using a 4mb block and token decode/recode, because of course that means you are using more resources than it (as you state) and removing functionality..
        I refer you to the following, hopefully you are aware of it..
        http://mattmahoney.net/dc/text.html
        Perhaps you should try your core modeling/tokenising against that, then consider how the ones that beat you do so.. not as an i
        
        Re: (Score:2)
        
        by danielreiterhorn ( 4241309 ) writes:
        
        Fair, but not all streaming use cases require seeking within a 4MB block (depends on the application). For those applications that require sub-4MB seeking, this won't be a good algorithm.
        
        Also there is a branch off the master repo that is exactly "h264 that is extended to use similar resources." (branch name h264priors) So yes--great idea.
        h264priors does pretty well, but not quite as good as the master branch--we're still getting to the bottom of how it does on a representative set of videos-- this is a
        
        Re: (Score:2)
        
        by thesupraman ( 179040 ) writes:
        
        I would really REALLY suggest you spend a little more time researching those other compressors you so easily consider to be 'text streams', they are not.
        for example, one of them also happens to hold the current record for non lossy image compression..
        Its all a matter of feeding them the right models, and I can guarantee that a good PPM or CM set of models will do much better than a weeks worth
        of model development - but of course they reason they WILL is because they take care of the downstream details - the
  - Re: (Score:3)
    
    by Punto ( 100573 ) writes:
    
    That's nice but did you ever find out what is the optimal way to jerk off all those people?
    - Re: (Score:2)
      
      by JWSmythe ( 446288 ) writes:
      
      I believe this was covered in the documentary [youtube.com].
  - Re: (Score:2)
    
    by AmiMoJo ( 196126 ) writes:
    
    So, correct me if I'm wrong, but you are basically fixing a few known limitations of JPEG and mobile recorded video files.
    For example JPEG uses RLE, and for decades we have been able to shave about the same as you do off their size by replacing that with a more efficient compression scheme in a lossless way. Mobile recorded video makes similar compromises to reduce processing overhead.
    To be clear, you have no invented a really new, revolutionary compression algorithm like the TV show. No 4k uncompressed vid
    - Re:From TFA: bit-exact or not? (Score:5, Interesting)
      
      by danielreiterhorn ( 4241309 ) writes: on Friday August 28, 2015 @06:07PM (#50413021)
      
      No one has tried to undo and redo compression of video files before. There are still doom9 forum posts asking for this feature from 12 years ago. I would say that saving lossless percentage points off of real world files is novel and important. And, since it's open source, if someone else gets more %age improvement than what we have, it could become as transformative as you describe.
      But the point is that we have something that's currently useful. It's out there and ready to be improved. It's lossless. And it has never before been tried.
      Also we did the entire algorithm in a week and aren't out of ideas!
      Besides we never claimed it was a revolution--leave that sort of spin to the marketeers...
      we're engineers trying to make things more efficient, a few percentage points at a time :-)
      
      Parent Share
      twitter facebook
      - Re: (Score:2)
        
        by AmiMoJo ( 196126 ) writes:
        
        Sure, I'm not saying it isn't useful, it certainly is... But Pied Piper, really?
        It's a good project, no need to over-sell it.
        
        Re: From TFA: bit-exact or not? (Score:2)
        
        by sonicmerlin ( 1505111 ) writes:
        
        Stop whining jesus.
        
        Re: (Score:2)
        
        by SirSlud ( 67381 ) writes:
        
        May I interest you in a tall glass of perspective?
      - Re: (Score:2)
        
        by serviscope_minor ( 664417 ) writes:
        
        No one has tried to undo and redo compression of video files before.
        Great job! I've thought about it before, as clearly have others given those doom9 posts. I'm glad to hear it works well and that someone's done it! It sounds like you were going for 4mb blocks if I gather correctly?
        How much do you gain/lose going to 8 or 2mb block.
      - Re: (Score:2)
        
        by Khyber ( 864651 ) writes:
        
        "No one has tried to undo and redo compression of video files before."
        I'm sorry, that's just nonsense. What do you think a format converter does?
  - - Re:From TFA: bit-exact or not? (Score:5, Funny)
      
      by dskoll ( 99328 ) writes: on Friday August 28, 2015 @04:54PM (#50412571) Homepage
      
      Compress his comment and all the redundancy will be gone.
      
      Parent Share
      twitter facebook
  - - Re: (Score:2)
      
      by Khyber ( 864651 ) writes:
      
      "I guess that the problem is that this sounds like a PR stunt: Ten engineers at Dropbox do what many more engineers have been doing for years and only in a week!"
      Technology advances. What took a dedicated team of people a couple of years to do 20 years ago (2D game design) takes maybe a month or two at most with a couple of people now days. MY current 2D game is proof of that. There's two bits of code to fix and then it's ready to release.
- Re: (Score:2)
  
  by gtwrek ( 208688 ) writes:
  
  Both the summary and the article are a little light on details, however the article mentions replacing, (or extending) the arithmetic (lossless) encoder - i.e. Huffman - used within the JPEG and H264 standards.
  This would result in a lossless reduction in size of those files.
  Again, short on details. Any size reduction claims are sorta hand wavy without more details.
  But I'd think the loss-less label (or bit-exact) are ok in this context. Loss less from Jpeg -> DropJpeg.
  - Re: (Score:3, Informative)
    
    by danielreiterhorn ( 4241309 ) writes:
    
    This is an excellent summary and spot on! Our movie reduction claims are still early on. We'll need to find a more comprehensive set of H.264 movies to test on--and that requires the algorithm to understand B-slices and CABAC. These are both very close, but the code was only very recently developed. We're confident about the JPEG size reduction, however. If you want to learn more about how the JPEG stuff works, you can start with the open source repository from Matthias Stirner here http://www.matthiasst [matthiasstirner.com]
    - Re: (Score:2)
      
      by fredgiblet ( 1063752 ) writes:
      
      I'm curious to know if you've tried this on a video compressed using the H.264 lossless compression settings.
- - Re: (Score:2)
    
    by davester666 ( 731373 ) writes:
    
    There's always somebody who will say they can tell the difference between the original file and the compressed/decompressed copy...
    - - Re: (Score:2)
        
        by arglebargle_xiv ( 2212710 ) writes:
        
        Loseless PNG images just lack the warmth and vibrance of raw BMP images. Even RLE introduces noticable artifacts.
        That only works if you use a tube-based computer to do the decoding.
        
        Re: (Score:2)
        
        by Existential Wombat ( 1701124 ) writes:
        
        You need to use a high performance digital video cable for best results. I recommend Monster.
"Lossless" on lossy encodes? (Score:2)

by bradgoodman ( 964302 ) writes:

"22% better compression" without "notable" quality loss on files which are ALREADY compressed in formats in which loss may be apparent is a far cry from their ultimate "goal" of "lossless" compression.
- Bad Car Analogy (Score:2)
  
  by PPH ( 736903 ) writes:
  
  We put a spoiler on a Prius.
  - Re: (Score:2)
    
    by tepples ( 727027 ) writes:
    
    We put a spoiler on a Prius.
    Which one? "The Lone Gunmen are dead"? "Snape kills Dumbledore"?
- bah, I've got it down to 50% compression: (Score:3)
  
  by Thud457 ( 234763 ) writes:
  
  switch (rand % 2) { case 0 : /* here's some pr0n */; break; case 1 : /* here's a funny cat pikshur */; default : /* are you're really sure you aren't looking for some pr0n? */ }
- Re: (Score:2)
  
  by SirSlud ( 67381 ) writes:
  
  There is nothing confusing here. And yet you've managed to be completely confused.
naysayers are missing the point (Score:5, Informative)

by Ionized ( 170001 ) writes: on Friday August 28, 2015 @04:35PM (#50412417) Journal

comparing this to PNG or h.265 is missing the point - this is not a compression algorithm for creating new files. this is a way to take files you already have and make them smaller. users are going to upload JPG and h.264 files to dropbox, that is a given - so saying PNG is better is moot.

Share
twitter facebook
- Re: (Score:2)
  
  by Dahamma ( 304068 ) writes:
  
  Except unless standard DECODERS can handle them it's fairly useless in practice.
  From what I can tell from the source & description posted it does NOT conform to H.264, so what's the point? SOMETHING has to decode it, and it's clearly not going to be standard hardware decoders. So it's useless as CDN storage. Same applies to PNGs for most usage.
  And besides, H.265 implements everything they did and MUCH more. And if you want even further lossless compression that humans can't notice there are proprie
  - Re: (Score:3)
    
    by mobby_6kl ( 668092 ) writes:
    
    It's not useless - it can be decoded by dropbox when serving the files.
    Seriously, it's that simple: users upload existing files to dropbox, they get loslessly compressed by this algorithm, and decompressed on access. Bam.
  - Re: (Score:2)
    
    by SirSlud ( 67381 ) writes:
    
    Jesus dude. Dropbox controls the in and the out of the pipe. So their client can compress further on upload and decompress when downloading/streaming. I don't understand how a simple business case can be so confusing for people.
Stacker (Score:2)

by HalAtWork ( 926717 ) writes:

Time for the new wave of Stacker clones, maybe a new DoubleSpace err DriveSpace?
- Re: (Score:2)
  
  by denis-The-menace ( 471988 ) writes:
  
  I'd settle for Mr. 7-Zip (Igor Pavlov) to add this method to his program.
  With a BSD license, he should be able to do it. (Time permitting)
Can it compress 3d videos? (Score:4, Funny)

by leipzig3 ( 528671 ) writes: on Friday August 28, 2015 @04:51PM (#50412547)

Can it compress 3d videos? That seems to be a real challenge.

Share
twitter facebook
- Re: (Score:2)
  
  by danielreiterhorn ( 4241309 ) writes:
  
  If you organize the pixel data of the 3d movie into a spiral, then the algorithm my 9 colleagues and I put together will operate on it "middle-out". This can allow us to compress movies with a Weissman score that's off the chart!
- - Re: (Score:2)
    
    by Anne Thwacks ( 531696 ) writes:
    
    I have a more efficient system: put them on VHS tapes and shove them under a road roller!
Image/Video libraries (Score:2)

by phorm ( 591458 ) writes:

I wonder if somebody can develop this into a transparent kernel-module.
13-22% of a video library could mean saving several hundred GB on a multi-terabyte collection. Depending on if it decompresses on-the-fly and how hard it is on a CPU, it may also reduce disk I/O somewhat.
- Re: (Score:2)
  
  by godrik ( 1287354 ) writes:
  
  Indeed, CPU cost is a real problem. The key in a compression algorithm for a network storage server is to be able to perform compression/decompression without much impact on latency and bandwidth. Also in these days of massive server farm, the impact on energy consumption might be interesting to see; but it is difficult to measure directly as compression might result in less disk spinning, machine kept on, but more CPU usage.
  An interesting engineering problem overall.
- Re: (Score:2)
  
  by AaronW ( 33736 ) writes:
  
  This would be better suited for FUSE.
Call me back ... (Score:2)

by quenda ( 644621 ) writes:

when they have made Nip Alert a reality.
This isn't really a new thing. (Score:2)

by An Ominous Cow Erred ( 28892 ) writes:

Lossy codecs typically have two major stages -- the lossy parts (e.g. dct while throwing out some component frequencies, motion prediction, etc.) -- followed by lossless entropy coding (e.g. Huffman in JPEG) to further compress the resultant data.
These compression algorithms just decompress the lossless part of the process and then recompress it with a more efficient lossless algorithm. On decompression, it then recompresses with the standard algorithm. In some cases (e.g. JPEG) you can keep a copy of the H
- Re: (Score:2)
  
  by Dahamma ( 304068 ) writes:
  
  Postprocessing software like Beamr (look it up yourself...) can often do even better for video. Basically the H.264 codecs are fairly conservative on their quantizers, with a minimum that's way above what they could get away with. Way better off throwing away useless data than figuring out how to compress it.
- Re:Real Numbers? (Score:4, Insightful)
  
  by HornWumpus ( 783565 ) writes: on Friday August 28, 2015 @04:31PM (#50412391)
  
  How much CPU time to compress/decompress. Standard compression is hardly the best, just a good compromise between compression and usability.
  
  Parent Share
  twitter facebook
  - Re: (Score:3, Funny)
    
    by Anonymous Coward writes:
    
    Meh, doesn't matter. Any processing load will be moved to an unoptimized javascript implementation that runs in the end users browser.
- Re: (Score:2)
  
  by Dracos ( 107777 ) writes:
  
  Exactly. We need to see the Weissman score.
- Re: (Score:3)
  
  by Bengie ( 1121981 ) writes:
  
  Would be nice to compare it against PNG, but the context is if you're storing other people's data and you have no control of what format they use.
- Re: (Score:3, Insightful)
  
  by harrkev ( 623093 ) writes:
  
  And yet you can download the source code yourself and compile it.
- Re: (Score:3, Informative)
  
  by danielreiterhorn ( 4241309 ) writes:
  
  Link to a layman's description of the algorithm here: https://raw.githubusercontent.... [githubusercontent.com] It's bit exact and lossless. We haven't done comprehensive studies, but on the included test files it gets 13% compression on H.264 movies. Similarly the not-committed, but similar JPEG algorithm gets 22% on a comprehensive sample set of photos from a variety of devices.
  - Re: No description (Score:2)
    
    by thePig ( 964303 ) writes:
    
    Superb work, Danielreiterhorn . Amazing work, and amazing, providing it as open source.
    Would you mind if I ask for the motivation to put it as open source?
    When it provides 10-20% compression, it would be worth a bit of money, right. In such a case why are you keeping it under BSD licence?
    I am in awe of people who do great things without expecting anything in return. Because try as I may, I can never be truly altruistic. So, I try to pick the brains of the ones who are to really understand their motivations
    - Re: (Score:3, Interesting)
      
      by danielreiterhorn ( 4241309 ) writes:
      
      It depends if the goal is to a) market a hip algorithm or b) store movies more efficiently.
      
      Open source makes it easy for anyone to contribute to the algorithm.
      The more people contribute, the better the code will be at compressing movies.
      
      The better it is at compressing movies, the fewer resources it will take to store them.
      This isn't a zero-sum game we're talking about: it's about making the world a more efficient place, one bit at a time.
      
      But the bottom line is that, it's a lot easier for many org
- Re:No description (Score:4, Insightful)
  
  by ottothecow ( 600101 ) writes: on Friday August 28, 2015 @06:13PM (#50413059) Homepage
  
  Yeah, but I've got to say that it is nice to see a bunch of comments actually talking about the compression algorithm.
  The tiny bit of slashdot community that is left still talks about the actual things. If this were on Reddit, it would just be a stream of lame, overused references to the Silicon Valley show. Somebody would say "This guy fucks". Somebody else would make a joke about "Optimal tip-to-tip efficiency". Then somebody would ask "Do you know what tres commas means".
  Those things were hilarious when put forth by a group of comedic actors. They are incredibly lame when they are overused every single time something even comes tangentially close to referencing them.
  So while this particular story still sucks...it could be a lot worse.
  
  Parent Share
  twitter facebook
  - Re: (Score:2)
    
    by AuMatar ( 183847 ) writes:
    
    This guy fucks.
    Had to be done.
- Re:Hard to believe (Score:4, Informative)
  
  by unrtst ( 777550 ) writes: on Friday August 28, 2015 @04:54PM (#50412575)
  
  H.264 and JPEG are supposed to output random-looking bytes, by definitions.
  If you can compress those, something is very wrong.
  Where'd you get that idea?
  $ bzip2 test.jpg
  $ gzip -9 test.jpg
  $ ls -la
  -rw-r--r-- 1 me me 1519279 Feb 7 2012 test.jpg
  -rw-r--r-- 1 me me 1430059 Aug 28 16:42 test.jpg.bz2
  -rw-r--r-- 1 me me 1427872 Aug 28 16:44 test.jpg.gz ... I also tried it on a max-compressed file. Opened that test.jpg up in gimp, then saved with quality at 0 (lowest), and re-did the compressing on both:
  -rw-rw-r-- 1 me me 189230 Aug 28 16:50 test2.jpg
  -rw-rw-r-- 1 me me 111623 Aug 28 16:50 test2.jpg.bz2
  -rw-rw-r-- 1 me me 117971 Aug 28 16:51 test2.jpg.gz
  Feel free to try the same experiment yourself on random jpg's you find online, or your own.
  The goal of H.264 and JPEG isn't minimum file size at all costs. It's also not encryption. Your premise is wrong, and even old tech can compress this stuff further than it may already be.
  
  Parent Share
  twitter facebook
  - Re: (Score:2)
    
    by sjames ( 1099 ) writes:
    
    Actually, perfect compression WILL make the output resemble random bits if looked at statistically. What OP was missing is that perfect compression is hard and that most compression features a number of compromises for CPU speed, memory requirements, seekability, resilience, etc.
  - Re: (Score:2)
    
    by Dahamma ( 304068 ) writes:
    
    The goal of H.264 and JPEG isn't minimum file size at all costs. It's also not encryption. Your premise is wrong, and even old tech can compress this stuff further than it may already be.
    True, but that's obvious to you and me - which does reinforce the point that the article & Dropbox "innovation" is pretty stupid.
    Not to mention JPEG and H.264 are old news - if you want to compare "new" development JPEG2000 and H.265 are the benchmarks...
  - - Re: (Score:3)
      
      by ttucker ( 2884057 ) writes:
      
      Try the -h param for ls, calculating is for computers.
      Not really useful in this context, because it truncates significant digits.
- Re:Hard to believe (Score:4, Interesting)
  
  by Kjella ( 173770 ) writes: on Friday August 28, 2015 @05:09PM (#50412657) Homepage
  
  H.264 and JPEG are supposed to output random-looking bytes, by definitions. If you can compress those, something is very wrong.
  Well, it seems to be applied per codec not a general compression algorithm like zip. And they probably say mobile-encoded for a reason, simple encoders have to work on low power and in real time, random JPGs from the Internet is probably the same. From what I can gather the algorithm basically take a global scan of the whole media and applies an optimized variable-length transformation making commonly used values shorter at the expense of making less commonly used values longer. Nothing you couldn't do with a proper two-pass encoding in the codec itself, the neat trick is doing it to someone else's already compressed media afterwards in a bit-reversible way. Very nice when you're a third party host, assuming the increase in CPU time is worth it but not so useful for everyone else.
  
  Parent Share
  twitter facebook
  - Re: (Score:2)
    
    by Dahamma ( 304068 ) writes:
    
    And they probably say mobile-encoded for a reason, simple encoders have to work on low power and in real time,
    Actually, the encoders are rarely limited by power or CPU cycles. The decoders are, but the great thing about lossy encoding like JPEG/H.264/H.265 is the encoders can continually be improved without affecting the decoders.
    That said, the reason this article is pointless is you can't USE the results - it breaks H.264 standards so HW decoders can't handle it, and no one wants to decode some proprietary format on the fly to stream to standard H.264 decoders...
    - Re: (Score:2)
      
      by dbIII ( 701233 ) writes:
      
      it breaks H.264 standards so HW decoders can't handle it
      It's not pointless for dropbox, since they can store it compressed a bit more and decompress it when a user asks for it. It's also not pointless for software decoders such as VLC that have access to a bit more memory and CPU capability to deal with it.
      Two points is a bit more than pointless by my count.
      
      Along those lines it doesn't seem that long ago that arguments about using floating point in mp3 decoding was seen as a flaw.
- Re: (Score:3)
  
  by sribe ( 304414 ) writes:
  
  H.264 and JPEG are supposed to output random-looking bytes, by definitions.
  Bullshit. JPEG, *by its definition*, after the quantization step, uses a fairly modest & inefficient compression algorithm, because it was designed to be run on embedded systems with very modest processing power.
- - Re: (Score:2)
    
    by knorthern knight ( 513660 ) writes:
    
    > Weissman Score you fucking prick!
    You have a +10 Wiseguy Score.

There may be more comments in this discussion. Without JavaScript enabled, you might want to turn on Classic Discussion System in your preferences instead.

From TFA: bit-exact or not? (Score:5, Interesting)

Re: (Score:2)

Re: (Score:2)

Re: (Score:3, Interesting)

Re: (Score:3)

Re: (Score:2)

Re:From TFA: bit-exact or not? (Score:4, Interesting)

Re: (Score:2)

Re: (Score:2)

Re:From TFA: bit-exact or not? (Score:5, Informative)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:3)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re:From TFA: bit-exact or not? (Score:5, Informative)

Re: (Score:3)

Re: (Score:2)

Re:From TFA: bit-exact or not? (Score:5, Informative)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re:From TFA: bit-exact or not? (Score:4, Interesting)

Re:From TFA: bit-exact or not? (Score:5, Insightful)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re:From TFA: bit-exact or not? (Score:5, Interesting)

Re: (Score:3)

Re: (Score:2)

Re: (Score:2)

Re: (Score:3)

Re: (Score:2)

Re: (Score:2)

Re:From TFA: bit-exact or not? (Score:5, Interesting)

Re: (Score:2)

Re: From TFA: bit-exact or not? (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re:From TFA: bit-exact or not? (Score:5, Funny)

Re: (Score:2)

Re: (Score:2)

Re: (Score:3, Informative)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

"Lossless" on lossy encodes? (Score:2)

Bad Car Analogy (Score:2)

Re: (Score:2)

bah, I've got it down to 50% compression: (Score:3)

Re: (Score:2)

naysayers are missing the point (Score:5, Informative)

Re: (Score:2)

Re: (Score:3)

Re: (Score:2)

Stacker (Score:2)

Re: (Score:2)

Can it compress 3d videos? (Score:4, Funny)

Re: (Score:2)

Re: (Score:2)

Image/Video libraries (Score:2)

Re: (Score:2)

Re: (Score:2)

Call me back ... (Score:2)

This isn't really a new thing. (Score:2)