ZeoSync Makes Claim of Compression Breakthrough
dsb42 writes: "Reuters is reporting that ZeoSync has announced a breakthrough in data compression that allows for 100:1 lossless compression of random data. If this is true, our bandwidth problems just got a lot smaller (or our streaming video just became a lot clearer)..." This story has been submitted many times due to the astounding claims - ZeoSync explicitly claims to have superseded Claude Shannon's work. The "technical description" from their website is less than impressive. I think the odds of this being true are slim to none, but here you go, math majors and EEs - something to liven up your drab, dull existence today. Update: 01/08 13:18 GMT by M : I should include a link to their press release.
Current ratio? (Score:2, Interesting)
Re:Current ratio? (Score:3, Informative)
For truly random data? 1:1 at the absolute best.
Re:Current ratio? (Score:2, Redundant)
Re:Current ratio? (Score:5, Informative)
For lossless (e.g. zip, not jpg, mpg, divx, mp3 etc.) you are looking at about 2:1 for typical 8-bit binary data, much better (50:1?) for ASCII text (which is 7-bit and far from random).
If you're willing to accept loss, then the sky's the limit; MP3 at 128 kbps is about 11:1 compared to a 44.1 kHz 16-bit stereo WAV.
Re: Information theory says 1 (Score:2, Interesting)
Re:Current ratio? (Score:5, Informative)
Bit-mapped graphic files (BMP) vary widely in compressibility depending on the complexity of the graphics, and whether you are willing to lose more-or-less invisible details. A BMP of black text on white paper is likely to zip (losslessly) by close to 100:1 -- and fax machines perform a very simple compression algorithm (sending white * number of pixels, black * number of pixels, etc.) that also approaches 100:1 ratios for typical memos. Photographs (where every pixel is colored a little differently) don't compress nearly as well; the JPEG format exceeds 10:1 compression, but it does so by discarding a little fine detail. And JPEGs compress by less than 10% when zipped.
IMHO, 100:1 as an average (compressing your whole harddrive, for example), is far beyond "pretty damn good" and well into "unbelievable". I know of only two situations where I'd expect 100:1. One is the case of a bit-map of black and white text (e.g., faxes), the other is with lossy compression of video when you apply enough CPU power to use every trick known.
Re:Current ratio? (Score:2)
The mathematical implications alone of such a breakthrough would be impressive. 100:1 compression of truly random data? Wow.
Re:Current ratio? (Score:2, Insightful)
how can this be? (Score:3, Informative)
Re:how can this be? (Score:4, Insightful)
Try compressing a WAV or MPEG file with gzip. It doesn't work too well, because the data is "random", at least in the sense of the raw numbers. When you look at the patterns that the data forms (i.e. pictures, and relative motion), then you can "compress" that.
Here's my test for random compression
$ dd if=/dev/urandom of=random bs=1M count=10
$ du -ch random
11M random
11M total
$ gzip -9 random
$ du -ch random.gz
11M random.gz
11M total
$
no pattern == no compression
prove me wrong, please
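For what it's worth, here's the same experiment in Python (just a sketch using the standard os and gzip modules), which shows the exact byte counts that du's rounding hides:

import gzip, os

data = os.urandom(10 * 1024 * 1024)               # 10 MiB of random bytes
packed = gzip.compress(data, compresslevel=9)
print(len(data), len(packed))                     # the "compressed" output is slightly *larger*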
Re:how can this be? (Score:2)
Re:how can this be? (Score:5, Funny)
So a Perl program can't be compressed?
Re:how can this be? (Score:5, Funny)
Re:how can this be? (Score:2)
* If the data was represented a different way (say, using bits instead of byte-sized data) then patterns might emerge, which would then be compressible. Of course, the $64k question is: will it be smaller than the original data?
* If the set of data doesn't cover all possibilities of the encoding (i.e. only 50 characters out of 256 are actually present), then a recoding might be able to compress the data using a smaller "byte" size. In this case, 6 bits per character instead of 8. The problem with this one is that you have to scan through all of the data before you can determine the optimal byte size... and then it still may end up being 8. (See the sketch below.)
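A back-of-the-envelope sketch of that second idea in Python (the helper below is hypothetical, just to illustrate the trade-off): count the distinct byte values actually present, work out the minimum bits per symbol, and remember that you also have to ship the translation table to the decoder.

import math

def repacked_size_bits(data: bytes) -> int:
    # Hypothetical recoder: bits needed if each byte is re-encoded using just
    # enough bits for the distinct values present, plus the table itself.
    alphabet = sorted(set(data))
    bits_per_symbol = max(1, math.ceil(math.log2(len(alphabet))))
    table_bits = len(alphabet) * 8        # the decoder needs the code -> byte mapping
    return len(data) * bits_per_symbol + table_bits

sample = b"hello world " * 100            # only 8 distinct byte values -> 3 bits each
print(repacked_size_bits(sample), "vs", len(sample) * 8)
# For tiny inputs the table alone can wipe out the saving, which is the point above.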
Re:how can this be? (Score:3, Insightful)
Shannon's work on information theory is over 1/2 a century old and has been re-examined by thousands of extremely well-qualified people, so I'm finding it rather hard to accept that ZeoSync aren't talking BS.
Re:how can this be? (Score:2)
Simple, it can't be (Score:5, Insightful)
100:1 average compression on all data is just impossible. And I don't mean "improbable" or "I don't believe that"; it is impossible. The reason is the pigeonhole principle. For simplicity, assume we are talking about 1000-bit files: although you can compress some of these 1000-bit files to just 10 bits, you cannot possibly compress all of them to 10 bits, since 10 bits give only 1024 different configurations while 1000 bits call for 2^1000 different configurations. If you compress the first 1024 files, there is simply no room left to represent the remaining 2^1000 - 1024 files.
So every lossless compression algorithm that can map some files to files shorter than the original must expand some other files. Higher compression on some files means the number of files that do not compress at all is also greater. An average compression ratio other than 1 is only achievable if there is some redundancy in the original encoding. I guess you can call that redundancy "a pattern." Rar, zip, gzip etc. all achieve less than 1 compressed/original length on average because there is redundancy in the originals: programs that have some instructions and prefixes with common occurrence, pictures that are stored as a full dword per pixel although they use only a few thousand colors, sound files almost devoid of very low and very high values because of recording conditions, etc. No compression algorithm can achieve a ratio less than 1 averaged over all possible strings. It is a simple consequence of the pigeonhole principle and cannot be tricked.
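Here's the same counting argument in a few lines of Python (just a sketch; big integers make it painless to check):

files_1000_bits = 2 ** 1000                          # distinct 1000-bit files
files_shorter = sum(2 ** n for n in range(1000))     # every file of 0..999 bits combined
print(files_shorter == files_1000_bits - 1)          # True: one fewer than the number of inputs
print(files_shorter < files_1000_bits)               # so even using *all* shorter files as outputs,
                                                     # at least one input cannot shrink at all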
Re:how can this be? (Score:3, Interesting)
Re:how can this be? (Score:2, Informative)
However, in truly random data such patterns will exist from time to time. For example, I'm going to randomly type on my keyboard now (promise this isn't fixed...):
oqierg qjn.amdn vpaoef oqleafv z
Look at the data. No patterns. Again....
oejgkjnfv,cm v;aslek [p'wk/v,c
Now look - two occurrences of 'v,c'. Patterns have occurred in truly random data.
Personally, I'd tend to agree with you and consider this not possible. But I can see how patterns might crop up in random data, given a sufficiently large amount of source data to work with.
Cheers,
Ian
Re:how can this be? (Score:3, Informative)
Re:how can this be? (Score:3, Interesting)
Here's a proposal for a compression scheme that has the following properties:
1. It works on all bit strings of more than one bit.
2. It is lossless and reversible.
3. It never makes the string larger. There are some strings that don't get smaller, but see item #4.
4. You can iterate it, to reduce any string down to 1 bit! You can use this to deal with pesky strings that don't get smaller. After enough iterations, they will be compressed.
OK, here's my algorithm:
Input: a string of N bits, numbered 0 to N-1.
If all N bits are 0, the output is a string of N-1 1's. Otherwise, find the lowest numbered 1 bit. Let its position be i. The output string consists of N bits, as follows:
Bits 0, 1, ... i-1 are 1's. Bit i is 0. Bits i+1, ..., N-1 are the same as the corresponding input bits.
Again, let me emphasize that this is not a usable compression method! The fun is finding the flaw.
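Since the scheme is short, here it is transcribed into Python (a sketch of the description above, nothing more); iterating it on a few strings is a good way to hunt for the flaw without having it spoiled:

def step(bits: str) -> str:
    n = len(bits)
    assert n > 1, "defined for strings of more than one bit"
    if '1' not in bits:
        return '1' * (n - 1)                 # all zeros -> N-1 ones (one bit shorter)
    i = bits.index('1')                      # lowest-numbered 1 bit
    return '1' * i + '0' + bits[i + 1:]      # 1's before it, a 0 at position i, rest unchanged

s = '0010'
while len(s) > 1:
    print(s)
    s = step(s)
print(s)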
Re:how can this be? (Score:5, Informative)
Re:how can this be? (Score:2)
As far as we can tell, the digits of Pi appear statistically random. They are also, however, based on mathematical relationships which can be modeled to find patterns in the digits. There are formulae to calculate any individual digit of Pi in both hexadecimal and decimal number systems, as well as known relations like e^(i*Pi) = -1.
Anyway, the press release says that the algorithm is effective for practically random data. I'm not sure exactly what this means, but I would guess that it applies to data that is in some way human-generated. Text files might contain, say, many instances of the text strings "and" and "the", no matter what their overall content. Even media files have loads of patterns, both in their structure (16 bit chunks of audio, or VGA-sized frames) and in their content (the same background from image to image in a video, for example). Even in something as complex as a high resolution video (which we'll take to be "practically random"), there are many patterns which can be exploited for compression.
Re:how can this be? (Score:5, Informative)
Well firstly I'd say the press release gives a pretty clear picture of the reality of their technology: it has such an overuse of supposedly TM'd (anyone want to double-check the filings? I'm going to guess that there are none) "technoterms" like "TunerAccelerator" and "BinaryAccelerator" that it just screams hoax (or creative deception), not to mention a use of Flash that makes you want to punch something. Note that they give themselves huge openings such as always saying "practically random" data: what the hell does that mean?
I think one way to understand it (because all of us at some point or another have thought up some half-assed, ridiculous way of compressing any data down to 1/10th -> "Maybe I'll find a denominator and store that with a floating point representation of..."), and I'm saying this as neither a mathematician nor a compression expert: Let's say for instance that this compression ratio is 10 to 1 on random data, and I have every possible random document 100 bytes long -> That means I have 6.6680144328798542740798517907213e+240 different random documents (256^100). So I compress them all into 10-byte documents, but the maximum number of variations of a 10-byte document is 1208925819614629174706176 (256^10): There isn't the entropy in a 10-byte document to store 6.6680144328798542740798517907213e+240 different possibilities (it is simply impossible, no matter how many QuantumStreamTM HyperTechTM TechnoBabbleTM TermsTM). You end up needing, tada, 100 bytes to have the entropy to possibly store all variants of a 100-byte document, and of course most compression routines add various bits of bookkeeping and actually increase the size of the document. In the case of the ZeoSync claim, though, at 100:1 they're apparently claiming that somehow you'll represent 6.6680144328798542740798517907213e+240 different variations in a single byte: So somehow 64 tells you "Oh yeah, that's variation 5.5958572359823958293589253e+236!". Maybe they're using SubSpatialQuantumBitsTM.
"practically random data" (Score:3, Funny)
I think I can beat their 100:1 compression ratio with this scheme.
Re:how can this be? (Score:4, Funny)
Re:how can this be? Answer: BitPerfectTM (Score:4, Insightful)
"Singular-bit-variance" and "single-point-variance" mean errors.
The trick is that they aren't randomly throwing away data. They are introducing a carefully selected error to change the data to a version that happens to compress really well. If you have 3 bits, and introduce a 1 bit error in just the right spot, it will easily compress to 1 bit.
000 and 111 both happen to compress really well, so...
000: leave as is. Store it as a single zero bit
001: add error in bit 3 turns it into 000
010: add error in bit 2 turns it into 000
011: add error in bit 1 turns it into 111
100: add error in bit 1 turns it into 000
101: add error in bit 2 turns it into 111
110: add error in bit 3 turns it into 111
111: leave as is. Store it as a single one bit.
They are using some pretty hairy math for their list of strings that compress the best. The problem is that there is no easy way to find the string almost the same as your data that just happens to be really compressible. That is why they are having "temporal" problems for anything except short test cases.
Basically it means they *might* have a breakthrough for audio/video, but it's useless for executables etc.
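Here is the parent's table as code (my reading of the "single-bit-variance" idea, i.e. a majority vote toward 000 or 111 on 3-bit blocks; it is lossy, and certainly not a claim about ZeoSync's actual method):

def lossy_compress(block: str) -> str:
    return '1' if block.count('1') >= 2 else '0'     # pick the nearer of 000 / 111

def expand(bit: str) -> str:
    return bit * 3                                   # at most one bit differs from the original

for block in ['000', '001', '010', '011', '100', '101', '110', '111']:
    print(block, '->', lossy_compress(block), '->', expand(lossy_compress(block)))

Coding folks will recognize the 3-bit repetition code here; as the parent says, the hard part is finding a nearby string that happens to compress well for arbitrary data.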
100:1 ? I don't think so... (Score:5, Insightful)
compress(A) = B
Now, B is 1/100th the size of A, right, but it too is random, right? (Say A is 1,000,000 bytes, so B is 10,000.)
On we go:
compress(B) = C (size is now 100)
compress(C) = D (size 1).
So everything compresses down to 1 byte.
Or am I missing something?
Mr Thinly Sliced
Re:100:1 ? I don't think so... (Score:5, Funny)
Re:100:1 ? I don't think so... (Score:3, Informative)
I'm very, very skeptical of 100:1 claims on "random" data -- it must either be large enough that even being random, there are lots of repeated sequences, or the test data is rigged.
Or, of course, it could all be a big pile of BS designed to encourage some funding/publicity.
Xentax
Re:100:1 ? I don't think so... (Score:5, Insightful)
Re:100:1 ? I don't think so... (Score:3, Insightful)
I suspect that when they say "random" data, they are using marketing-speak random, not math-speak random. Therefore, by 'random', they mean "data with lots of repetition like music or video files, which we'll CALL random because none of you copyright-infringing IP thieving pirates will know the difference"
Actually, if you change the domain you can get what appears to be impressive compression. Consider a bitmapped picture of a child's line drawing of a house. Replace that by a description of the drawing commands. Of course you have not violated Shannon's theorem because the amount of information in the original drawing is actually low.
At one time commercial codes were common. They were not used for secrecy, but to transmit large amounts of information when telegrams were charged by the word. The recipient looked up the code number in his codebook and reconstructed a lengthy message: "Don't buy widgets from this bozo. He does not know what he is doing."
If you have a restricted set of outputs that appear to be random but are not, i.e. white noise sample #1, white noise sample #2... all you need to do is send 1, 2... and voila!
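The telegraph-codebook idea in two lines of Python (a toy dictionary; the "compression" is really just a shared codebook, so the information has merely moved into the book both sides already hold):

codebook = {1: "Don't buy widgets from this bozo. He does not know what he is doing."}
print(codebook[1])      # the "message" on the wire was just the number 1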
Re:100:1 ? I don't think so... (Score:3, Interesting)
Re:100:1 ? I don't think so... (Score:2)
Re:100:1 ? I don't think so... (Score:3, Informative)
Yes you are... (Score:2)
B is not random. It is a description (in some format) of A.
But, what you say does have merit, and this is why compressing a ZIP doesn't do much - there is a limit on repeated compression, because the particular algorithm will output data which it itself is very bad at compressing further (if it didn't, why not iterate once more and produce a smaller file internally?).
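You can watch that limit with ordinary gzip; a quick sketch using Python's standard gzip module: the first pass squeezes out the redundancy, and every later pass operates on near-random output and only adds header overhead.

import gzip

data = b"all work and no play makes jack a dull boy\n" * 20000
for i in range(4):
    out = gzip.compress(data, compresslevel=9)
    print(f"pass {i + 1}: {len(data):>9} -> {len(out):>9} bytes")
    data = out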
Re:100:1 ? I don't think so... (Score:4, Funny)
01101011
Pop that baby in an executable shell script. It's a self-extracting
./configure
./make
./make install
Shh. Don't tell anyone.
Mr Thinly Sliced
Re:100:1 ? I don't think so... (Score:2)
BTW, someone shoot them for using so many TMs...
Re:100:1 ? I don't think so... (Score:4, Funny)
So everything compresses into 1 byte.
Duh, are you like an idiot or something?
When you send me a one-byte copy of, say, The Matrix, you also have to tell me how many times it was compressed so I know how many times to run the decompressor!
So everything compresses to *two* bytes. Maybe even three bytes if something is compressed more than 256 times. That's only required for files whose initial size is more than 100^256, though, so two bytes should do it for most applications.
Jeez, the quality of math and CS education has really gone down the tubes.
Re:100:1 ? I don't think so... (Score:4, Funny)
You're the moron, moron. When you get the one byte compressed file, you run the decompressor once to get the number of additional times to run the decompressor.
What are they teaching the kids today? Shannon-shmannon nonsense, no doubt. They should be doing useful things, like Marketing and Management Science. There's no point in being able to count if you don't have any money.
Re:100:1 ? I don't think so... (Score:3, Funny)
I don't need to encode the number of compressions, every decompression consists of decompressing 256 times.
I think you mean at most 256 times. Supposing I had to perform 10 compressions to compress to a single byte. After you had decompressed 10 times, you'd have the data; the next decompression would make some other file 100 times larger than The Matrix. So if you could recognize the correct file when you saw it, I could avoid transmitting the decompression count.
So, I just have to prepend a string saying "This is it!" before compressing!
Also, it occurred to me after my previous posting (and to another poster, I saw) that if we can compress to a single byte, why not to a single bit? This is a great advance, which I believe I shall patent quickly before that other poster does, because now I can give you my copy of The Matrix over the phone! I can just tell you if it's a 1 or 0. For that matter, I don't even have to tell you -- you can just try both possibilities!
So my question now is, does the decompressor only produce strings of bits that exist somewhere and were once compressed, or does it produce anything? Can I just think "I want a great term paper..." and then try decompressing both 1 and 0 until I get it (in no more than 8 or ten iterations of the decompressor, 'cause I want a paper, not a novel).
Re:100:1 ? I don't think so... (Score:5, Funny)
Step 1: Steal Underpants
Step 3: Profit!
We're still working on step 2
Re:100:1 ? I don't think so... (Score:5, Insightful)
ZeoSync has developed the TunerAccelerator(TM) in conjunction with some traditional state-of-the-art compression methodologies. This work includes the advancement of Fractals, Wavelets, DCT, FFT, Subband Coding, and Acoustic Compression that utilizes synthetic instruments. These are methods that are derived from classical physics and statistical mechanics and quantum theory, and at the highest level, this mathematical breakthrough has enabled two classical scientific methods to be improved, Huffman Compression and Arithmetic Compression, both industry standards for the past fifty years.
They just threw in a bunch of compression buzzwords without even bothering to check whether they have anything to do with lossless compression...
Re:100:1 ? I don't think so... (Score:3, Funny)
BS
Re:100:1 ? I don't think so... (Score:3, Funny)
Conserve Bandwidth? (Score:2, Funny)
Time for a new law of information theory? (Score:5, Funny)
Tech details from the crappy Flash-only website (Score:5, Informative)
Given a number of pigeons within a sealed room that has a single hole, and which allows only one pigeon at a time to escape the room, how many unique markers are required to individually mark all of the pigeons as each escapes, one pigeon at a time?
After some time a person will reasonably conclude that:
"One unique marker is required for each pigeon that flies through the hole, if there are one hundred pigeons in the group then the answer is one hundred markers". In our three dimensional world we can visualize an example. If we were to take a three-dimensional cube and collapse it into a two-dimensional edge, and then again reduce it into a one-dimensional point, and believe that we are going to successfully recover either the square or cube from the single edge, we would be sorely mistaken.
This three-dimensional world limitation can however be resolved in higher dimensional space. In higher, multi-dimensional projective theory, it is possible to create string nodes that describe significant components of simultaneously identically yet different mathematical entities. Within this space it is possible and is not a theoretical impossibility to create a point that is simultaneously a square and also a cube. In our example all three substantially exist as unique entities yet are linked together. This simultaneous yet differentiated occurrence is the foundation of ZeoSync's Relational Differentiation Encoding(TM) (RDE(TM)) technology. This proprietary methodology is capable of intentionally introducing a multi-dimensional patterning so that the nodes of a target binary string simultaneously and/or substantially occupy the space of a Low Kolmogorov Complexity construct. The difference between these occurrences is so small that we will have for all intents and purposes successfully encoded lossley universal compression. The limitation to this Pigeonhole Principle circumvention is that the multi-dimensional space can never be super saturated, and that all of the pigeons can not be simultaneously present at which point our multi-dimensional circumvention of the pigeonhole problem breaks down.
The real "Pigeon hole principle" (Score:3, Informative)
I don't recall any of this crap about pigeons flying out of boxes. Or am I getting old?
Re:I think their investment model requires pigeons (Score:5, Interesting)
If you look at this sequence as a one-dimensional series: 00101101, it's pretty hard (at least for a processor) to distinguish a pattern there... it's a pseudo-random sequence. But if I paint it this way, in 2d: (0,0) (1,0) (1,1) (0,1), I can step back and see a square with sides of length one.
AFAIK, what these people are claiming is that they've developed a way to step WAY back, to n-dimensions, and have patterns emerge from seemingly random data.
It's not the random-number generation that's significant here... it's the purported ability to compress a seemingly random sequence. RLE typically doesn't fare very well with pure random data because it only looks for certain types of redundancy.
If I haven't missed the boat here, it's really a very interesting achievement.
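Just to make the parent's re-interpretation concrete, here's the 2-D reading of that bit string (a toy sketch); whether spotting such shapes ever buys you anything still runs into the counting arguments elsewhere in this thread, since the description of the shape has to be shorter than the bits it replaces:

bits = "00101101"
points = [(int(bits[i]), int(bits[i + 1])) for i in range(0, len(bits), 2)]
print(points)   # [(0, 0), (1, 0), (1, 1), (0, 1)] -- the corners of a unit square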
Is this April 1st? (Score:3, Informative)
The punchline to the joke was always along the lines of
Re:Is this April 1st? (Score:2, Funny)
Please note they claim to be able to compress data 100:1, but do not say they can decompress the resultant data back to the original.
By the way, so can I.
Give me your data, of any sort, of any size, and i will make it take up zero space.
Just don't ask for it back.
Press Release here (Score:2, Informative)
http://www.zeosync.com/flash/pressrelease.htm [zeosync.com]
randomness (Score:2)
- Derwen
No Way... (Score:2, Redundant)
Re:No Way... (Score:3, Insightful)
It "probably" will not.
The reason is that in a random stream you may get repeating patterns (although you may not), and it's these repeating patterns which deflate uses.
Any encoding that saves space by compressing repeating data, also adds overhead for data that doesn't repeat -- at least as much overhead as you saved on the repetition, over the long run.
There ain't no such thing as a free lunch.
Re:No Way... (Score:3, Funny)
*Reads FAQ* *Blushes*
OK, so I went the "negligible housekeeping" route. Maybe I should get a job in the patent office.
Re:No Way... (Score:3, Insightful)
Bullshit. There will be patterns, but the point is, all patterns are equally likely, so this does not help you. Don't believe me? Test it yourself. Pull say a megabyte of your
The odds are very high (as in 99.999%++) that none of the compressors will manage to shrink the file by a single byte. In fact they will probably all cause it to grow very slightly.
The proof's in the pudding. (Score:5, Funny)
ZeoSync announced today that the "random data" they were referencing is a string of all zeros. Technically this could be produced randomly, and our algorithm reduces this to just a couple of characters, a 100 times compression!!
The pressrelease (Score:4, Informative)
International Team of Scientists Have Discovered
How to Reduce the Expression of Practically Random Information Sequences
WEST PALM BEACH, Fla. - January 7, 2001 - ZeoSync Corp., a Florida-based scientific research company, today announced that it has succeeded in reducing the expression of practically random information sequences. Although currently demonstrating its technology on very small bit strings, ZeoSync expects to overcome the existing temporal restraints of its technology and optimize its algorithms to lead to significant changes in how data is stored and transmitted.
Existing compression technologies are currently dependent upon the mapping and encoding of redundantly occurring mathematical structures, which are limited in application to single or several pass reduction. ZeoSync's approach to the encoding of practically random sequences is expected to evolve into the reduction of already reduced information across many reduction iterations, producing a previously unattainable reduction capability. ZeoSync intentionally randomizes naturally occurring patterns to form entropy-like random sequences through its patent pending technology known as Zero Space Tuner(TM). Once randomized, ZeoSync's BinaryAccelerator(TM) encodes these singular-bit-variance strings within complex combinatorial series to result in massively reduced BitPerfect(TM) equivalents. The combined TunerAccelerator(TM) is expected to be commercially available during 2003.
According to Peter St. George, founder and CEO of ZeoSync and lead developer of the technology: "What we've developed is a new plateau in communications theory. Through the manipulation of binary information and translation to complex multidimensional mathematical entities, we are expecting to produce the enormous capacity of analogue signaling, with the benefit of the noise free integrity of digital communications. We perceive this advancement as a significant breakthrough to the historical limitations of digital communications as it was originally detailed by Dr. Claude Shannon in his treatise on Information Theory." [C.E. Shannon. A Mathematical Theory of Communication. Bell System Technical Journal, 27:379-423, 623-656, 1948]
"There are potentially fantastic ramifications of this new approach in both communications and storage," St. George continued. "By significantly reducing the size of data strings, we can envision products that will reduce the cost of communications and, more importantly, improve the quality of life for people around the world regardless of where they live."
Current technologies that enable the compression of data for transmission and storage are generally limited to compression ratios of ten-to-one. ZeoSync's Zero Space Tuner(TM) and BinaryAccelerator(TM) solutions, once fully developed, will offer compression ratios that are anticipated to approach the hundreds-to-one range.
Many types of digital communications channels and computing systems could benefit from this discovery. The technology could enable the telecommunications industry to massively reduce huge amounts of information for delivery over limited bandwidth channels while preserving perfect quality of information.
ZeoSync has developed the TunerAccelerator(TM) in conjunction with some traditional state-of-the-art compression methodologies. This work includes the advancement of Fractals, Wavelets, DCT, FFT, Subband Coding, and Acoustic Compression that utilizes synthetic instruments. These are methods that are derived from classical physics and statistical mechanics and quantum theory, and at the highest level, this mathematical breakthrough has enabled two classical scientific methods to be improved, Huffman Compression and Arithmetic Compression, both industry standards for the past fifty years.
All of these traditional methods are being enhanced by ZeoSync through collaboration with top experts from Harvard University, MIT, University of California at Berkley, Stanford University, University of Florida, University of Michigan, Florida Atlantic University, Warsaw Polytechnic, Moscow State University and Nankin and Peking Universities in China, Johannes Kepler University in Lintz Austria, and the University of Arkansas, among others.
Dr. Piotr Blass, chief technology advisor at ZeoSync, said "Our recent accomplishment is so significant that highly randomized information sequences, which were once considered non-reducible by the scientific community, are now massively reducible using advanced single-bit- variance encoding and supporting technologies."
"The technologies that are being developed at ZeoSync are anticipated to ultimately provide a means to perform multi-pass data encoding and compression on practically random data sets with applicability to nearly every industry," said Jim Slemp, president of Radical Systems, Inc. "The evaluation of the complex algorithms is currently being performed with small practically random data sets due to the analysis times on standard computers. Based on our internally validated test results of these components, we have demonstrated a single-point-variance when encoding random data into a smaller data set. The ability to encode single-point-variance data is expected to yield multi-pass capable systems after temporal issues are addressed."
"We would like to invite additional members of the scientific community to join us in our efforts to revolutionize digital technology," said St. George. "There is a lot of exciting work to be done."
About ZeoSync
Headquartered in West Palm Beach, Florida, ZeoSync is a scientific research company dedicated to advancements in communications theory and application. Additional information can be found on the company's Web site at www.ZeoSync.com or can be obtained from the company at +1 (561) 640-8464.
This press release may contain forward-looking statements. Investors are cautioned that such forward-looking statements involve risks and uncertainties, including, without limitation, financing, completion of technology development, product demand, competition, and other risks and uncertainties.
Buzzwordtastic (Score:2, Interesting)
I simply can't believe that this method of compression/encoding is so new that it requires a completely new dictionary (of words we presumably are not allowed to use).
I can do better than that! (Score:2, Funny)
The _real_ trick is getting 100% compression. It's actually really easy; there's a module built in to do it on your average Unix.
Simply run all your backups to the New Universal Logical Loader and perfect compression is achieved. The device driver is, of course, loaded as
In this house we obey the 2nd law of thermodynamics (Score:3, Insightful)
I could see it working in a specific context (Score:2)
Instead of assuming that data is static, think of it as constantly moving. Even in random data, the idea is that moving data can be compressed because it is constantly streaming along. It is sort of like when a herd of people files into a hall. Sure, everyone is unique, but you could organize them and say, "Hey, five red shirts now", "ten blue shirts now".
And I think that is what they are trying to achieve: move the dimensions into a different plane. However, and this is what I wonder about, how fast will it actually be? I am not referring to the mathematical requirements, but the data will stream and hence you will attempt to organize it. Does that organization mean that some bytes have to wait?
Re:I could see it working in a specific context (Score:2)
For lossless compression, simply saying "there were 5 red shirts and 7 blue shirts" isn't enough: you'd have to also store information on exactly where those 5 red shirts and 7 blue shirts were in the sample to be able to recreate the situation exactly as it was. Because of this it is impossible to "compress" truly random data without actually increasing the size of the file.
Of course if you're talking lossy then everything changes: who cares where the shirts are, just tell 'em how many there were. Unfortunately lossy is only relevant for images and sounds.
What's random? (Score:2)
If so, they're a bunch of twits.
Been there, done that... (Score:4, Informative)
This isn't limited to the field of compression of course. There are people that come up with "unbreakable" encryption, infinite-gain amplifiers (is that gain in V and I?), and all sorts of perpetual motion machines. The sad fact is that compression and encryption are not well enough understood for these ideas to be killed before a company is started or staked on the claims.
Blah! (Score:2, Funny)
It's about /practically/ random data (Score:2)
There may be something to that. However, there are also many points that make me sceptical, but maybe the press release has just not been reviewed carefully enough.
This new algorithm cannot break Shannon's limit (that is impossible), so the phrase about the "historical limitations" is a hoax...
Buzz-word ALERT! (Score:2)
I think they have made a buzz-word compression routine; even our sales people would have difficulty putting this many buzz-words in a press release.
Some background reading: (Score:5, Interesting)
I wonder if... (Score:2)
Now, what if they look for bit sequences (not only 8-bit sequences but maybe odd lengths) in order to find patterns?
I guess this could be a way to significantly compress data, but it would imply a huge number of data reads in order to achieve the best result possible.
Note they may also do this in more than one pass, but then their compression would be really lengthy.
Reminder to Self... (Score:2)
"Either this research is the next 'Cold Fusion' scam that dies away or it's the foundation for a Nobel Prize. I don't have an answer to which one it is yet," said David Hill, a data storage analyst with Boston-based Aberdeen Group.
Wonder which category he expects them to win in...
Physics, Chemistry, Economics, Physiology / Medicine, Peace or Literature
There is no Nobel category for pure mathematics, or computing theory.
And they got funding ... (Score:2, Funny)
"It was just data, you know," the sobbing wretch was reportedly told, "just ones and zeros. And hey - you can look at it as a proof of principle. We'll have the general application out
Not random data (Score:4, Redundant)
ZeoSync is not claiming to reduce random data 100-to-1. They are claiming to reduce "practically random" data 100-to-1, and Reuters appears to have misreported it. What "practically random" data should mean is data randomly selected from that used in practice. What ZeoSync may mean by "practically random" is data randomly selected from that used in their intended applications. So their press release is not mathematically impossible; it just means they've found a good way to remove more information redundancy in some data.
The proof that 100-to-1 compression of random data is impossible is so simple as to be trivial: There are 2^N files of length N bits. There are 2^(N/100) files of length N/100 bits. Clearly not all 2^N files can be compressed to length N/100.
Egads... (Score:5, Funny)
The company's claims, which are yet to be demonstrated in any public forum...
Call the editors at Wired... I think we have an early nominee for the 2k2 vaporware list.
ZeoSync expects to overcome the existing temporal restraints of its technology
Ah... So even if it's not outright bullshit, it's too slow to use?
"Either this research is the next 'Cold Fusion' scam that dies away or it's the foundation for a Nobel Prize," said David Hill...
Somehow I think this is going to turn out more Pons-and-Fleischmann than Watson-and-Crick. Almost anytime there's a press release with such startling claims but no peer review or public demonstration, someone has forgotten to stir the jar.
When they become laughingstocks, and their careers are forever wrecked, I hope they realize they deserve it. And I hope their investors sue them.
I should really post after I've had my coffee... I sound mean...
OK,
- B
Re:Egads... (Score:5, Funny)
See you all later - I have some coding to do!
OK,
- B
What is compression (Score:3, Interesting)
So, if there is no redundancy, there is nothing to remove (if you want to remain lossless).
When you use some text, you may compres by remving some letter evn if tht lead to bad ortogrph. That is because English (as other languages) is redundant. When compressing some periodic signal, you may give only one period and tell that the signal is then repeated. When compressing bytes, there are specific methods (RLE, Huffman trees, ...)
But, in all these situations, there was some redundancy to remove...
A compression algorithm may not be perfect (it usually has to add some info to tell how the original data was compressed). Then, recompressing with another compression algorithm (or sometimes, the same will do the trick) may improve the compression. But the information quantity inside the data is the lower limit.
Now, take a true random data stream of n+1 bits. Even if you know the value of the first n bits, you can't predict the value of bit n+1. In other words, there is no way to express these n+1 bits with n (or fewer) bits. By definition, true random data can't be compressed.
And, to finish, a compression ratio of 100:1 can easily be achieved with some data... take a sequence of 200 bytes of 0x00: it may be compressed to 0xC8 0x00. A compression ratio is really only meaningful when comparing different algorithms compressing the same data stream.
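The 0xC8 0x00 example above is just run-length encoding; a minimal sketch (a toy (count, value) format, not any particular standard) also shows the flip side, that data without runs gets bigger:

def rle_encode(data: bytes) -> bytes:
    out = bytearray()
    i = 0
    while i < len(data):
        run = 1
        while i + run < len(data) and data[i + run] == data[i] and run < 255:
            run += 1
        out += bytes([run, data[i]])          # (count, value) pairs
        i += run
    return bytes(out)

print(rle_encode(b"\x00" * 200).hex())        # 'c800': 200 zero bytes -> 2 bytes (100:1)
print(len(rle_encode(bytes(range(200)))))     # 400: no runs at all -> twice the size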
Wow, it's not 100:1 (Score:2)
Might be possible... but I doubt it... (Score:3, Interesting)
Take very large prime numbers and the like, huge strings of almost random numbers that can often be written as a trivial (2^n)-1 type formula. Maybe the massaging of the figures is simply finding a very large number that can be expressed like the above with an offset other than "-1" to get the correct "BitPerfect" data. I was toying around with this idea when there was a fad for expressing DeCSS code in unusual ways, but ran out of math before I could get it to work.
The above theory may be bull when it comes to the crunch, but if it could be made to work, then the compression figures are bang in the ballpark for this. They laughed at Goddard, remember? But I have to admit, I think replacing Einstein with the Monty Python foot better fits my take on this at present...
Silly web site (Score:2)
What happens when you run it backwards? (Score:4, Funny)
They are using time travel! (Score:5, Funny)
Using time travel, high compression of arbitrary data is trivial. Simply record the location (in both space and time) of the computer with the data, and the name of the file, and then replace the file with a note saying when and where it existed. To decompress, you just pop back in time and space to before the time of the deletion and copy the file.
Directed evolution (Score:5, Funny)
Just think of it as an innumeracy tax on venture capitalists.
ZeoTech Scientific Team fake? (Score:4, Insightful)
I've not even had time to check the rest yet.
Re:ZeoTech Scientific Team fake? (Score:5, Informative)
Well, that's because they misspelled his name. Seriously, I bet they are really trying to refer to Wlodzimierz Holsztynski, who posts to Polish newsgroups from the address "sennajawa@yahoo.com". His last contribution to the one Usenet thread that mentions "zeosync" and his name uses the word "nonsens" ("nonsense") a lot [google.com], also the phrase "nie autoryzowalem" ("I did not authorize"), and the sentence "Bylem ich konsultantem, moze znowu bede, a moze nie, z nimi nie wiadom." (roughly, "I was their consultant; maybe I will be again, maybe not; with them you never know."). Somebody who really knows Polish could probably have a field day with this and other posts...
I'm getting the idea that some people on the scientific team might be better termed "random people we sent email to who actually responded once or twice".
Re:ZeoTech Scientific Team fake? (Score:5, Informative)
Confirmed with my Polish speaking coworkers (Score:3, Informative)
Their claims are 100% accurate (Score:3, Interesting)
Their claims are 100% accurate (they can compress random data 100:1) only if (by their definition) random data comprises a very small percentage of all possible data sequences. The other 99.9999% of "non-random" sequences would need to expand. You can show this by a simple counting argument.
This is covered in great detail in the comp.compression [faqs.org] FAQ. Take a look at the information on the WEB Technologies DataFiles/16 compressor (notice the similarity of claims!) if you're unconvinced. You can find it in Section 8 of Part 1 [faqs.org] of the FAQ.
--Joe
team members (Score:3, Interesting)
so either someone has lent their names to weirdoes without paying attention or there is something of substance hidden behind the PR ugliness. after all the PR is aimed toward investors, not toward sentient human beings, and is most probably not under the control of the scientific team.
How to compress ANY data to one bit (Score:3, Funny)
(Of course, this DOES create all sorts of other problems, but I'm going to ignore those, because they'd go and spoil things.)
Infinite monkey compression. (Score:4, Funny)
It's rare to see such a baldfaced scam (Score:4, Interesting)
The beauty of this scam is that ZeoSync claims that they can't even do it themselves, yet. They've only managed to compress very short strings. So, they can't be called on to compress large random files because, well gosh, they just haven't gotten the big-file compressor to work yet. So, you can't prove that they are full of shit.
Beautiful flash animation, though. I particularly like the fact that clicking the 'skip intro' button does absolutely nothing -- you get the flash garbage anyway.
thad
Not possible (Score:5, Informative)
The proof goes like this: suppose I claim to be able to compress any two-byte message down to one byte.
There are then 65536 possible input messages, but only 256 possible outputs. So it is mathematically certain that over 99.6% of the messages cannot be represented in 1 byte (regardless of how I choose to encode them).
These claims surface every so often. They're bullshit every time. It's even a FAQ entry on comp.compression.
From the press release: Huh? (Score:3, Interesting)
Anyone remember the OWS hoax? (Score:5, Interesting)
Back in 1991 or 1992, in the days of 2400 bps modems, MS-DOS 5.0, and BBS'es, a "radical new compression tool" called OWS made the rounds. It claimed to have been written by some guy in Japan and use breakthroughs in fractal compression, often achieving 99% compression! "Better than ARJ! Better than PKzip!" Of course all my friends and I downloaded it immediately. Now we can send gam^H^H^Hfiles to each other in 10 minutes instead of 10 hours!
Now I was in the ninth grade, and compression technology was a complete mystery to me then, so I suspected nothing at first. I installed it and read the docs. The commands and such were pretty much like PKzip. I promptly took one of my favorite ga^H^Hdirectories, *copied it to a different place*, compressed it, deleted it, and uncompressed it without problems. The compressed file was exactly 1024 bytes. Hmm, what a coincidence!
The output looked kind of funny though:
Compressing file abc.wad by 99%.
Compressing file cde.wad by 99%.
Compressing file start.bat by 99%.
etc. Wait, start.bat is only 10 characters, that's like one bit! And why is *every* file compressed by 99%? Oh well, must be a display bug.
So I called my friend and arranged to send him this g^Hfile via Zmodem, and it took only a few seconds. But he couldn't uncompress it on the other side. "Sector Not Found", he said. Oh well, try it again. Same result. Another bug.
So I decided that this wasn't working out and stopped using OWS. Their user interface needed some work anyway, plus I was a little suspicious of compression bugs. The evidence was right there for me to make the now-obvious conclusion, but it didn't hit me until a few *weeks* later when all the BBS sysops were posting bulletins warning that OWS was a hoax.
As it turns out, OWS was storing the FAT information in the compressed files, so that when people do reality checks it will appear to re-create the deleted files, as it did for me. But when they try to uncompress a file that actually isn't there or has had its FAT entries moved around, you get the "Sector Not Found" error and you're screwed. If I hadn't tried to send a compressed file to a friend I might have been duped into "compressing" and deleting half my software or more.
All in all, a pretty cruel but effective joke. If it happened today somebody would be in federal pound-me-in-the-ass prison. Maybe it happened then too...
(Yes, this is slightly off-topic, but where else am I going to post this?)
Re:scientific method, fact... goes out the window, (Score:2, Funny)
On the contrary! (Score:3, Insightful)
Quite the contrary: if they had claimed to be achieving 100:1 compression on truly random data, they would be provably talking total rubbish. Consider the number of possible bit strings of length N. Now consider the number of possible bit strings of length N/100. There are fewer of the latter, right? Therefore, if you can compress every length-N string into a length-N/100 string, at least two inputs must map to the same output. Hence, you can't uniquely recover the input from the output - and the compression cannot be lossless.
The fact that they hedge and talk about "practically" random sequences is the only thing that makes it possible they're telling the truth!
Re:Practically Random (Score:2)
I was thinking about submitting the ZeoSync release, and then I thought, nah, it's just fluff, no one will be interested... It's true that a press release is usually written by suits, not scientists, so you can't expect too much real meat - but "ZeoSync's approach to the encoding of practically random sequences is expected to evolve into the reduction of already reduced information" is a real winner; if you're "reducing information", it's not lossless compression! I smell a rat. The whole thing sounds like it could have been written by the Onion [theonion.com], for Crom's sake.
Re:Compression to one bit (Score:3, Informative)
YES! Ditto. Seconded. Somebody mod this guy up.
Here's a bit to whet your appetite:
9.1 Introduction
It is mathematically impossible to create a program compressing without loss
*all* files by at least one bit (see below and also item 73 in part 2 of this
FAQ). Yet from time to time some people claim to have invented a new algorithm
for doing so. Such algorithms are claimed to compress random data and to be
applicable recursively, that is, applying the compressor to the compressed
output of the previous run, possibly multiple times. Fantastic compression
ratios of over 100:1 on random data are claimed to be actually obtained.
Such claims inevitably generate a lot of activity on comp.compression, which
can last for several months. Large bursts of activity were generated by WEB
Technologies and by Jules Gilbert. Premier Research Corporation (with a
compressor called MINC) made only a brief appearance but came back later with a
Web page at http://www.pacminc.com. The Hyper Space method invented by David
C. James is another contender with a patent obtained in July 96. Another large
burst occured in Dec 97 and Jan 98: Matthew Burch applied
for a patent in Dec 97, but publicly admitted a few days later that his method
was flawed; he then posted several dozen messages in a few days about another
magic method based on primes, and again ended up admitting that his new method
was flawed. (Usually people disappear from comp.compression and appear again 6
months or a year later, rather than admitting their error.)
Other people have also claimed incredible compression ratios, but the programs
(OWS, WIC) were quickly shown to be fake (not compressing at all). This topic
is covered in item 10 of this FAQ.