First Hutter Prize Awarded 191

Posted by kdawson on Sunday October 29, 2006 @10:07PM from the compress-this dept.

stefanb writes, "The Hutter Prize for Lossless Compression of Human Knowledge, an ongoing challenge to compress a 100-MB excerpt of the Wikipedia, has been awarded for the first time. Alexander Ratushnyak managed to improve the compression factor to 5.86 and will receive a 3,416-Euro award. Being able to compress knowledge well is believed to be related to acting intelligently." The Usenet announcement notes that at Ratushnyak's request, part of the prize will go to Przemyslaw Skibinski of the University of Wroclaw Institute of Computer Science, for his early contributions to the PAQ compression algorithm.

This discussion has been archived. No new comments can be posted.

First Hutter Prize Awarded

Load All Comments

Search 191 Comments Log In/Create an Account

Comments Filter:

Seems like a strange contest (Score:5, Interesting)

by Salvance ( 1014001 ) * writes: on Sunday October 29, 2006 @10:17PM (#16637656) Homepage Journal

It turns out that Alexander Ratushnyak was the only person to even meet the challenge's guidelines, and one of only 3 people who submitted entries. Seems like a strange thing to award up to 50,000 Euro for ... his compression was only 20% smaller than what I just quickly did with gzip. I'm rather surprised that more people didn't try though ... it's not like people are throwing money at data compression researchers.

Share
twitter facebook
- Re: (Score:3, Informative)
  
  by Barny ( 103770 ) writes:
  
  20% of a few petabytes is.... a lot :)
  
  And yes, offering cash in this way is a great incentive for programers.
  
  Also, if its a cpu friendly (read: reasonable cpu useage or even able to be offloaded on a processing card) method it could potentially add 20% onto your bandwidth :)
  - Re:Seems like a strange contest (Score:5, Insightful)
    
    by Salvance ( 1014001 ) * writes: on Sunday October 29, 2006 @10:35PM (#16637822) Homepage Journal
    
    20% is a lot, when the compression/decompression is fast. gzip/WinRAR/WinZip all compress this file to the 22-24MB mark in a minute or two on my desktop (which is comparable to their test machine). The winner's algorithm compressed the data to 16.9MB, but spent 5 hours and 900MB of RAM doing so. The contest would be far more interesting if it added a reasonable time/RAM restriction (e.g. 10 minutes and 500MB of RAM).
    
    Parent Share
    twitter facebook
    - Re: (Score:2)
      
      by Skuto ( 171945 ) writes:
      
      Well, there *is* a restriction on the amount of time/CPU. Just read the website.
      
      Whatever is "reasonable" is just personal preference.
      - Re: (Score:2)
        
        by r3m0t ( 626466 ) writes:
        
        That's a limit on the decompression program.
    - Re:Seems like a strange contest (Score:4, Informative)
      
      by vidarh ( 309115 ) writes: <vidar@hokstad.com> on Monday October 30, 2006 @04:50AM (#16639693) Homepage Journal
      
      You miss the point. The goal isn't to achieve better usable compression, but to encourage research they believe will lead to advances in AI.
      
      Parent Share
      twitter facebook
    - Re: (Score:2)
      
      by aug24 ( 38229 ) writes:
      
      The point of the exercise is not really to worry about how much time/memory it took, but to improves mechanisms for finding and then understanding structure within data.
      
      Perhaps two classes would be interesting: one with, and one without, time/space limitations.
      
      Justin.
    - Re: (Score:2)
      
      by archeopterix ( 594938 ) writes:
      
      The contest would be far more interesting if it added a reasonable time/RAM restriction (e.g. 10 minutes and 500MB of RAM).
      Maybe it would be more interesting, but it would also be a totally different contest. It isn't a contest for generic file compressors, goddamit! It's supposed to drive research in the field of knowledge representation, based on the supposition that in order to compress knowledge well, you have to find good representation for it. The compression factor is supposed to be a kind of benchm
      - Re: (Score:2)
        
        by Iron Condor ( 964856 ) writes:
        
        It's supposed to drive research in the field of knowledge representation, based on the supposition that in order to compress knowledge well, you have to find good representation for it.
        The problem that I see here is that we're precisely not compressing knowledge, but a certain, precisely-drawn-but-arbitrary slice of it. It might be possible to represent, say, all knowledge of classical electrodynamics in form of a bunch of equations, but how do you represent the subset of that knowledge contained in an a
        
        Virtue from necessity (Score:2)
        
        by Baldrson ( 78598 ) * writes:
        
        Since you have to choose at least one representation from which you derive your knowledge English text is a pretty good place to start. Wikipedia consists of more than English text of course and that means those representations will need to be compressed too.
        The result of this is pretty much what you need for epistemology, software and legal disciplines: Optimal language models telling you precisely how language maps to knowledge.
        There was some debate about using the multilingual versions of Wikipedi
    - Re: (Score:2)
      
      by GreatBunzinni ( 642500 ) writes:
      
      The contest would be far more interesting if it added a reasonable time/RAM restriction (e.g. 10 minutes and 500MB of RAM).
      
      What's the point of adding processing and/or memory requirements if the sole point of this prize is to squeeze the most information into the smallest possible package? After all in this case it's the end product that matters, not what it takes to get there.
    - Re: (Score:2)
      
      by gurps_npc ( 621217 ) writes:
      
      Next year, his algorithm, running on a standard desktop will do it in 3 hours. By 2015, a standard computer using his algorithm will do it in that same minute or two.
  - - Re: (Score:2)
      
      by Barny ( 103770 ) writes:
      
      Uh, someone did it didn't they? :)
- Re: (Score:3, Interesting)
  
  by Raul654 ( 453029 ) writes:
  
  There aren't many Wikipedia related things I don't know about (I'm a Wikipedia administrator, arbitrator, and bureacrat, featured article director, and I'm on the Wikimedia Foundation's Press Committee), and this is the first time I've ever heard of this contest. I think it's fair to say it's been relatively low-profile.
  - Re:Seems like a strange contest (Score:4, Insightful)
    
    by rbarreira ( 836272 ) writes: on Monday October 30, 2006 @05:23AM (#16639817) Homepage
    
    There aren't many women related things I don't know about (I have a girlfriend, I've had sex, my mother and my sisters are all women), and this is the first time I've ever heard of the "women's olympic weighlifting" contest. I think it's fair to say it's been relatively low-profile.
    
    Parent Share
    twitter facebook
    - Re: (Score:2)
      
      by exp(pi*sqrt(163)) ( 613870 ) writes:
      
      There aren't many women related things I don't know about
      
      That's the funniest thing I've read in days!
  - - Re: (Score:2)
      
      by Raul654 ( 453029 ) writes:
      
      Why use Wikipedia then? There must be dozens (if not hundreds or thousands) of free sources of regular text out there.
      - Re: (Score:2)
        
        by rbarreira ( 836272 ) writes:
        
        More importantly: why not use wikipedia? Read their FAQ before making irrelevant comments and questions here...
  - - - Re: (Score:2)
        
        by r3m0t ( 626466 ) writes:
        
        How about mine?
        
        http://en.wikipedia.org/wiki/User:R3m0t [wikipedia.org]
        
        Seriously: not everybody's page is like that.
  - - - Re: (Score:2)
        
        by QuantumG ( 50515 ) writes:
        
        No, it was on Slashdot when it was announced, I remember because I bookmarked the "Universal AI" book that it featured. here it is.. August 13. [slashdot.org]
- Re: (Score:2)
  
  by QuantumFTL ( 197300 ) writes:
  
  It turns out that Alexander Ratushnyak was the only person to even meet the challenge's guidelines
  
  Wow, that makes me really wish I had entered. There are some great multipass lossless text compression systems that would work well for Wikipedia...
  - - Re: (Score:2)
      
      by muszek ( 882567 ) writes:
      
      but rm is lossy. you'd end up as one of those that didn't meet criteria guidelines.
      - Re: (Score:2)
        
        by Breakfast Pants ( 323698 ) writes:
        
        And conversely, modding up your post would be like adding a negative number, so it would be a net loss.
  - - Re: (Score:2)
      
      by dch24 ( 904899 ) writes:
      
      Does anyone know if WinRK [wikipedia.org] has been submitted? It's probably pretty slow, and may go over the 10 hour time limit.
      - Re: (Score:2)
        
        by Kris_J ( 10111 ) * writes:
        
        WinRK uses PAQ technology. Since the winner in this story used a tweaked PAQ engine, I doubt that WinRK would perform better. Better than the WinZip, et al, examples people have been posting, but not better than the winning entry.
- Re: (Score:2)
  
  by Shados ( 741919 ) writes:
  
  Keep in mind that the rule was that the file had to be a self executable. If you just did default gziping, you'd have to include the overhead of a self executable in the total.
  - Re: (Score:2)
    
    by gameforge ( 965493 ) writes:
    
    The overhead for both compression and decompression on my box anyway looks to be about 50k (the size of my /bin/gzip).
    
    No clue whether it requires more code to compress than to decompress, but I would guess your overhead would be less than 30k. On an order of maybe 20MB of compressed data, that's not much.
    
    GZIP is a bad example because it also works with binary data; it's too much. If you only have to worry about encountering 128 possible values, compression can get real interesting, and there are a lot mor
    - Re: (Score:2)
      
      by Firehed ( 942385 ) writes:
      
      The only problem is that if your heuristics are designed for this specific chunk of data, it'll end up being largely useless for a full database crunching. Mind you, they could still be useful, but you don't want them to be sample-specific when you're talking about orders of magnitude more data that will need to get compressed.
      
      I really don't know jack about how compression is done, but I can toss out an idea. Scan article text for all of the used characters... expect to find your A-Z a-z 0-9 and some char
      - Re: (Score:2)
        
        by Kris_J ( 10111 ) * writes:
        
        My Javascript Packer does that. [krisjohn.net] To actually compete with the recent winner you need to start exploring far more complicated stuff.
- Re: (Score:2)
  
  by complete loony ( 663508 ) writes:
  
  Finding a way to compress just a little bit more, becomes an increasingly harder problem. You begin to hit entropy limits, and have to use methods that are only relevant to the type of data being compressed. For example, a good lossless audio compressor won't handle text data very well, and vice versa.
  Most compressors in this space use a prediction algorithm to generate a signal, and then only compress the differences between the generated signal and the actual data. To get the best possible compression, yo
- Re: (Score:2, Informative)
  
  by CryoPenguin ( 242131 ) writes:
  
  Since the criteria for entry say that any new submission must beat the current record, it's no surprise that only 3 people are listed. You're not seeing any of the people who didn't win.
- Re: (Score:2)
  
  by pizza_milkshake ( 580452 ) writes:
  
  I tried. It's hard.
Lossless from Lossy? (Score:2, Funny)

by Anonymous Coward writes:

"The Hutter Prize for Lossless Compression of Human Knowledge, an ongoing challenge to compress a 100-MB excerpt of the Wikipedia"

Wikipedia? Knowledge? Isn't that already a lossy compression mechanism?
Hmm... (Score:4, Funny)

by Cinder6 ( 894572 ) writes: on Sunday October 29, 2006 @10:25PM (#16637732)

Am I the only one who finds it slightly ironic that (as of this writing), there is no entry for the Hutter Prize on Wikipedia?

Share
twitter facebook
- Re: (Score:2)
  
  by iamwoodyjones ( 562550 ) writes:
  
  Give me a few minutes...;-)
- Re: (Score:2)
  
  by Evil Pete ( 73279 ) writes:
  
  But there is for the PAQ algorithm (see link in summary) with mention of the awarding of the Hutter prize.
- Re: (Score:3, Informative)
  
  by CastrTroy ( 595695 ) writes:
  
  While this is not article on the Hutter Prize itself, you will be relieved to know that it is mentioned in the article on Marcus Hutter [wikipedia.org]
For comparison .... (Score:4, Informative)

by Ignorant Aardvark ( 632408 ) writes: <cydeweys@gma i l . c om> on Sunday October 29, 2006 @10:26PM (#16637738) Homepage Journal

For comparison purposes, WinRAR on its best setting only gets this down to 24MB. Doubtless 7zip could get even lower, but I don't think either could crack the 17MB mark. And certainly neither of those would be self-extracting, which this contest requires.

Share
twitter facebook
- WIkipedia-specific compression algorithms (Score:2)
  
  by billstewart ( 78916 ) writes:
  1. Itemize the articles in the Wikipedia extract.
  2. Edit them down to stubs.
  3. Compress File
  4. ...
  5. Profit!
  6. ...
  7. Other Wikipedia authors replace and augment the stubs.
- Re: (Score:2)
  
  by thue ( 121682 ) writes:
  
  I tried compressing it with gzip, bzip2, and lzma programs. I tried posting the results, but they do not fit within the lameness filter :(.
  
  The best result was with lzma, the algorithm used by 7zip, which got it down to 25,188,131 bytes. So the 17MB achieved in this contest is pretty impressive.
makes one wonder (Score:2)

by v1 ( 525388 ) writes:

How slow this code is. Usually when you are trying to squeak out another 1% of space in a file your algorythm triples in size and quadruples in runtime to get that improvement. I wonder how practical this new algorythm really is. Probably not so much an issue nowadays but I also wonder how big the compression program is. This used to be a really big deal many years ago when the compression programs could easily be larger than the data they were compressing.
- Re: (Score:2)
  
  by CastrTroy ( 595695 ) writes:
  
  Accoring to somebody who posted above (he probably read it in the article, but i'm not about to look there), the program took, 5 hours and 900MB of RAM. I find it's interesting that he only got down to 17MB, while consuming all those resources. It makes it seem like it isn't even worth it. I'm sure you could find some pretty small program that would compute pi to some very large number of digits, and find the wikipedia excerpt in the results, but It would take a very long time to run.
  - Re: (Score:3, Informative)
    
    by AxelBoldt ( 1490 ) writes:
    
    I'm sure you could find some pretty small program that would compute pi to some very large number of digits, and find the wikipedia excerpt in the results, but It would take a very long time to run.
    In all likelihood, that trick won't work: your program would need to know the start index of the Wikipedia text in the digit sequence of pi, and that index would be so astronomically huge that writing it down would be about as long as writing down Wikipedia itself. As a little example of the effect, think about
    - Re:makes one wonder (Score:5, Funny)
      
      by Henry V .009 ( 518000 ) writes: on Sunday October 29, 2006 @11:58PM (#16638307) Journal
      
      Assuming a random distribution of digits in pi (and why does everyone assume this? -- there is certainly no proof), the odds are that you'll see one sequence of 4 zeros every 10,000 (decimal) digits. On the other hand, you'd expect to see 4 zeros once every 16 binary digits. Wikipedia, at 100MBs would be expected to be found once every 2^800,000,000 binary digits of pi.
      
      Then again, maybe we're lucky, and this universe is God's way of storing the universal encyclopedia in the digits of pi. Wikipedia might be up near the front somewhere.
      
      Parent Share
      twitter facebook
      - Re: (Score:2)
        
        by Fyz ( 581804 ) writes:
        
        But suppose it's somewhere near the front, except there's a speling mistake every place "speling" occurs. Would that be proof that God isn't infallible?
        
        Re: (Score:2)
        
        by prichardson ( 603676 ) writes:
        
        It would be proof that God has a sense of humor.
        
        Re: (Score:2)
        
        by shadow_slicer ( 607649 ) writes:
        
        It'd be proof that man can't spell
      - Re: (Score:2)
        
        by AxelBoldt ( 1490 ) writes:
        
        Also, (I could be wrong on this, but) since pi is irrational, it can't have a pattern.
        Irrational numbers can't have a repeating pattern such as 0.123012301230123..., but they can still have a pattern, for instance 0.1230123001230001230000123...
    - Re: (Score:2)
      
      by arrrrg ( 902404 ) writes:
      
      Tangentially related and mildly amusing trivia:
      
      Feynman Point [wikipedia.org]. From the Wikipedia article: "The Feynman point comprises the 762nd through 767th decimal places of pi ... The name refers to a remark made by the physicist Richard Feynman, expressing a wish to memorise the digits of as far as that point so that when reciting them, he would be able to end with '... nine, nine, nine, nine, nine, nine, and so on.'"
      
      So, I dunno about "0000," but for some interesting sequences we may get lucky :).
    - Re: (Score:2)
      
      by dido ( 9125 ) writes:
      
      But again, this assumes that Pi is a normal number [everything2.com]. So far, nobody knows for sure whether or not this is true. I don't know why everyone seems to make this unfounded assumption [everything2.com] when dealing with Pi. Attempts to establish the truth of this matter are one of the main reasons why mathematicians are engaged in calculating Pi to billions and billions of digits; there is no other practical use for that many decimal places of accuracy; 39 digits of the number [everything2.com] would be enough to calculate the circumference of th
    - Re: (Score:2)
      
      by x2A ( 858210 ) writes:
      
      How about storing how long it would take a room full of monkeys to type it?
    - Re: (Score:2)
      
      by CastrTroy ( 595695 ) writes:
      
      I was just pointing out how resource intensive this program really was. I mean, sure you could use this program for compression (i'm interested on how it would perform on other pieces of data, or whether or not there was specific hacks put in for the data set), but it consumes a ton of resources and doesn't really give you something that much better than using gzip. At least not good enough that it's worth spending 5 hours decompressing 100 MB worth of data. Oh, and you could fit very large numbers in a
      - Re: (Score:2)
        
        by AxelBoldt ( 1490 ) writes:
        
        Oh, and you could fit very large numbers in a very small space, as long as you use the correct notation.
        That works only for very few numbers; you would have to be extraordinarily lucky for this trick to work with the starting index of Wikipedia in the digit sequence of pi.
- Re: (Score:2)
  
  by mh101 ( 620659 ) writes:
  
  FTFA:
  
  Restrictions: Must run in less than 10 hours on a 2GHz P4 with 1GB RAM and 10GB free HD.
  If someone's program takes close to 10 hours to compress 100MB it better get considerably more than just an additional 1% out of it!
  - Re: (Score:2)
    
    by arth1 ( 260657 ) writes:
    
    If someone's program takes close to 10 hours to compress 100MB it better get considerably more than just an additional 1% out of it!
    Indeed. Who in their right mind would spend 10 hours compressing instead of simply moving the entire dataset in a fraction of the time?
    
    Oh, and for a good compression scheme:
    
    uuencode filename filename | mail external@stora.ge
    
    Then all the decompresser needs to do is fetch the mail and uudecode it.
    
    Regards,
    --
    *Art
- Re: (Score:2, Insightful)
  
  by Alaria Phrozen ( 975601 ) writes:
  
  If you'd RTFA you'd find the running times ranged from 30 minutes to 5 hours. They have a whole table and everything.
  
  The whole point of the challenge was to create a self-executing compression program that made a perfect copy of their 100MB file. Final file sizes were in the 16MB range. Geeze, seriously RTFA.
compress knowledge = intelligence (Score:3, Insightful)

by Profound ( 50789 ) writes: on Sunday October 29, 2006 @10:56PM (#16637998) Homepage

Being able to compress knowledge well is believed to be related to acting intelligently. - IHNPTTT (I Have Not Passed The Turing Test), but while my brain is good at remember the gist of knowlege, but really bad at losslessly recalling it.

Share
twitter facebook
- Re: (Score:2)
  
  by illuminatedwax ( 537131 ) writes:
  
  intelligence != acting like a human
  - Re: (Score:2)
    
    by Kris_J ( 10111 ) * writes:
    
    intelligence != acting like a human
    
    Truer words were never typed.
  - Re: (Score:2)
    
    by qbwiz ( 87077 ) * writes:
    
    Intelligence != being able to store Wikipedia in a small space. In fact, according to Wikipedia, programs are already able to predict what comes next in English text [wikipedia.org] approximately as well as humans, but they aren't as intelligent as humans.
    
    If we want to make something less intelligent more intelligent, it seems likely that at some point, when its intelligence is in some sense the same as the average human's, that its behavior would be more similar to a human's than it is now. Of course, that's hardly a cer
    - Error in Wikipedia (Score:2)
      
      by Baldrson ( 78598 ) * writes:
      
      A more precise Wikipedia link is to the erroneous section on Entropy as information content [wikipedia.org] which reads:
      Experiments with human predictors show an information rate of between 1.1 and 1.6 bits per character, depending on the experimental setup; the PPM compression algorithm can achieve a compression ratio of 1.5 bits per character.
      The range for Shannon's experiments are actually between .6 and 1.3 bit per character.
      This error even caught Dr. Dobbs compression expert, Mark Nelson [marknelson.us] so I guess it isn't too
    - Re: (Score:2)
      
      by Iron Condor ( 964856 ) writes:
      
      If we want to make something less intelligent more intelligent, it seems likely that at some point, when its intelligence is in some sense the same as the average human's, that its behavior would be more similar to a human's than it is now.
      Why would it?
      The first flying machines we built weren't very good. Today we have flying machines that fly much better than birds. At now point in the intervening time did we ever have a machine that behaved anywhere similar to a bird.
      A point could be made that airp
- - Re: (Score:2)
    
    by epine ( 68316 ) writes:
    
    I can think of lots of stuff which would realistically count towards intelligence, but "compress knowledge" is the kind of thing that just sounds unbelievably stupid.
    Apparently, sounds deceive. You might want to practice your hearing. A good start would be subtracting out the sound of your own voice droning on at the drop of a pin about subjects you haven't made an effort to fully appreciate.
    
    Kolmogorov-Chaitin complexity [wikipedia.org] is the deepest theory going about the inter-relationship between compression and pred
    - Here's some clue for you (Score:2)
      
      by Moraelin ( 679338 ) writes:
      
      Here's some free, complimentary clue for you: "All that glitters is not gold." Just because someone wrote a ton of maths about compression, doesn't mean it's anywhere near applicable to building an AI.
      
      And in fact, noone actually built yet anything even vaguely resembling intelligence yet, so maybe, just maybe, all those hypotheses are missing some vital point. Maybe, just maybe, just because you've compressed something to the minimum number of bits, doesn't mean jack squat about being able to use it effecti
  - Re: (Score:2)
    
    by Abcd1234 ( 188840 ) writes:
    
    I can think of lots of stuff which would realistically count towards intelligence, but "compress knowledge" is the kind of thing that just sounds unbelievably stupid.
    
    Ooooh! Well, if you say it's stupid, then it must be true! I mean, I'm sure you hold a PhD in information theory and artificial intelligence, and as such, are in a perfect position to criticize this work, right? I'd hate to think you were attacking the work of someone far more educated than you in this subject matter out of sheer ignorance a
    - Re: (Score:2)
      
      by Moraelin ( 679338 ) writes:
      
      So, by the same logic, if you're not holding a Ph.D. in Economics, you're not allowed to say that an apology of Communism is bogus, right? They had plenty of Ph.D.'s in Moscow writing books about how great that system is. You wouldn't go and attack the work of people more educated than you out of sheer ignorance and stupidity, right?
      
      Here's a fun concept for you: appeal to false authority. Because that's the fallacy you're committing there.
      
      Show me someone who's actually built a working AI, and then we'll tal
      - Re: (Score:2)
        
        by Abcd1234 ( 188840 ) writes:
        
        Here's a fun concept for you: appeal to false authority. Because that's the fallacy you're committing there.
        
        And the fallacy you've committed is decrying a theory without even reading about it, let alone understanding it. Tell me, which is worse?
        
        Anything else is just taking a guess. So exactly what there says that you should automatically stop thinking for yourself and take his unproven ideas for absolute truth?
        
        I don't recall saying that. My point is that, with absolutely no basis in actual fact, you've de
        
        Re: (Score:2)
        
        by Moraelin ( 679338 ) writes:
        
        BTW, I might point out that people like you used to discount quantum theory. "The world around me doesn't work that way, so it can't possibly be true!" they would say. Guess who was right?
        "They laughed at Einstein. They laughed at the Wright Brothers. But they also laughed at Bozo the Clown." - Carl Sagan
        
        Did I mention fallacies? I think I did. In this case the Association Fallacy (sharing one quality or in this case one group of enemies with quantum physics doesn't actually prove this one true too), salted
Done before.... (Score:2)

by gmuslera ( 3436 ) writes:

... by Douglas Adams. Wonder what rate of compression is putting the meaning of life, universe and everything in just the number 42.
what matters most... (Score:2)

by ElitistWhiner ( 79961 ) writes:

is what you get when you run the new compression backwards!

This reminds me of the cryptographer paid to crypto-compress datastreams into musical notation. The work laid the foundation to real time packet sniffing of the Internet

Cryptographer? Got atta boys
Done on 9/25/06? (Score:3, Interesting)

by SillySnake ( 727102 ) writes: on Monday October 30, 2006 @01:02AM (#16638629)

According to http://prize.hutter1.net/ [hutter1.net] this happened on Sept 25 of 2006.

The site also gives some of the requirements..
Create a compressed version (self-extracting archive) of the 100MB file enwik8 of less than 17MB. More precisely:

* Create a Linux or Windows executable of size S L := 17'073'018 = previous record.
* If run, it produces (without input from other sources) a 108 byte file that is identical to enwik8.
* If we can verify your claim, you are eligible for a prize of 50'000×(1-S/L). Minimum claim is 500.
* Restrictions: Must run in 10 hours on a 2GHz P4 with 1GB RAM and 10GB free HD.

Share
twitter facebook
- Re: (Score:2)
  
  by Skuto ( 171945 ) writes:
  
  >According to http://prize.hutter1.net/ [hutter1.net] this happened on Sept 25 of 2006.
  
  There is waiting period for public comment/verification etc...
Interesting related webpage (Score:2)

by Skuto ( 171945 ) writes:

http://www.cs.fit.edu/~mmahoney/compression/text.h tml [fit.edu]

Info about contenders and results of common compression programs on the testset. (All the "just use gzip/rar/winrk/..." fools can stop jabbering now...)
Know what to compress, not how. (Score:2)

by fahrbot-bot ( 874524 ) writes:

Being able to compress knowledge well is believed to be related to acting intelligently.
Bah! Knowing what to compress is more intelligenter...
This is the AI problem (Score:2)

by LesPaul75 ( 571752 ) writes:

Being able to compress knowledge well is believed to be related to acting intelligently.
There's actually a little more to it than that. The creators of the Hutter prize believe that intelligence is the ability to compress knowledge. That's why they're offering this prize -- to solve the artificial intelligence problem. I'm not saying I buy into it, but that's their claim. That's why they call it "The most crucial technology prize of all." Here [geocities.com] is the page describing the origin of the prize.
- - Re: (Score:2)
    
    by rbarreira ( 836272 ) writes:
    
    Nice story ;) It would have been even better if:
    
    At a time there was a bot that beat all the other bots. Well, an analysis of the .class file turned out that it was actually loading the other bots' .class files and simulating them. :-)
    ... unknowingly, two such bots had been made and entered an infinite loop. That would teach their authors ;)
some standard compression tools tested (Score:2)

by elmartinos ( 228710 ) writes:

I have tested some of the standard compression tools, here are the compressed sizes:

100000000 | original size
36445241 | gzip --best
29008758 | bzip2
28860178 | rzip -9
24864529 | 7zr -mx=9
24753293 | rar a -m5 -mdG

7z does not do so well, I think because not so much tuned for text compression, it is much better at compressing binaries. For text compression PPMD and the variants are quite good, so I guess you will see good results with WinRK, Compressia, PAQ and the like.
Hutter Prize (Score:2)

by johansalk ( 818687 ) writes:

I really misread this as a Hitler Prize.
Netopsystems FEAD Optimizer (Score:2)

by Schraegstrichpunkt ( 931443 ) writes:

Recomposing...
For all the people laughing at this contest (Score:2)

by rbarreira ( 836272 ) writes:

For all the people laughing at this contest: Read this [fit.edu] and this [hutter1.net] before making any more posts about how ridiculous this contest seems, how stupid it is to compress wikipedia, how compression is such a dumb process, and how bad the compressor was for taking a lot of time to process the text.
- Re: (Score:2)
  
  by Viol8 ( 599362 ) writes:
  
  These sorts of algorithms might not be ridiculus but they have limited scope. To compress text in this way you have to understand the language and its idioms. So the algo that did this would for example be useless at compressing French and possibly even English written in some colloquial form. It might have uses in some specific instances but as a general purpose text compression algorithm it leaves a lot to be desired.
  - Re: (Score:2)
    
    by rbarreira ( 836272 ) writes:
    
    We can't know that until the algorithms are invented. And I guess the important thing here is not the algorithms, but the theories behind them (which hopefully will be possible to apply in a general way to all the languages).
Am I the only one that read the news as (Score:2)

by lord_rob the only on ( 859100 ) writes:

First Hustler Prize Awarded ?

Maybe I am the only Slashdot reader that enjoys pr0n ...
Unfortunately for Ratushnyak... (Score:2)

by Bohnanza ( 523456 ) writes:

...the prize was compressed by a factor of 5.86 as well.
I can make it 99% small!! (Score:2)

by lordmage ( 124376 ) writes:

See I have a reference file Zipper. That means that I take long strings and make a tag outta them. Right now it has 1 tag and using one huge 100 meg string reference file. I will update this later to include more. I think I should get the money! It also takes 20 seconds to run!
- Re:what the hell? (Score:4, Funny)
  
  by Barny ( 103770 ) writes: on Sunday October 29, 2006 @10:25PM (#16637728) Journal
  
  Hehe, and remember the study a few months back that found that the order of letters within a word is unimportant, so long as the first and last are correct? Potentially big words could be listed merely by how many of each letter they have, then randomly mix them up inside, order doesn't matter :)
  
  Parent Share
  twitter facebook
  - Re: (Score:2)
    
    by Carewolf ( 581105 ) writes:
    
    Not a bad idea, you could always rearrange them to the right order using a dictionary.
    - Re: (Score:2)
      
      by Trogre ( 513942 ) writes:
      
      Which would work fine, until it had to distinguish words like salt and slat.
      
      Or words not in a dictionary, like GetPidOf()
      - Re: (Score:2)
        
        by Schraegstrichpunkt ( 931443 ) writes:
        
        "GetPidOf()"... wasn't that renamed to "Bribe()" some time ago?
  - Re: (Score:2)
    
    by SamSim ( 630795 ) writes:
    
    Yeah, and remember when that turned out to be baloney [everything2.com]?
  - - - Re: (Score:2)
        
        by McFadden ( 809368 ) writes:
        
        It took me 10 minutes to figure out "ioiretnr" :-/
        Presumably this was after you had already figured out "reversing" and still drew a blank.
- Re: (Score:2)
  
  by Skuto ( 171945 ) writes:
  
  Read the website. (Oh wait, this is /., nobody does that).
  
  A system that understands the text it is compressing will compress better than one that doesn't.
  
  The winner improved upon previous compression programs by adding semantic modelling. Improving the modeling further would improve the results. You're heading towards language understanding.
- Re:Compression related to acting intelligently? (Score:5, Interesting)
  
  by adrianmonk ( 890071 ) writes: on Monday October 30, 2006 @02:58AM (#16639191)
  
  How does a computer algorithm for compression relate to the idea that human compression of knowledge corresponds to intelligent action? I think the summary is making a mistake in terminology. An intelligent person is able to compress information and access it quickly, but that doesn't really have much to do with a more efficient compression algorithm, unless we're talking some seriously advanced AI. I just really don't understand the correlation or what the summary is trying to imply...
  
  Nope, I had heard about the contest before seeing it today on Slashdot. The summary is fine. That is in fact the thesis that the contest is designed to investigate.
  
  As for why this is the case, I spent some time studying compression two or three years ago, and one of the things that you quickly realize is that there is no such thing as a truly general-purpose compression algorithm. Compression algorithms work on the principle that there is some underlying pattern or structure to your data. Once the structure is understood, the data can be transformed into a smaller format. Think of compressed files as a set of parameters to a magic function which generates a larger file. The real art is in finding the function.
  
  To give a concrete example, one of the simplest forms of compression is Huffman coding. You analyze your stream of symbols, and you realize that some occur more often than others. You then assign shorter bit strings to more frequently-occurring symbols and longer ones to less frequent symbols. This gives you a net gain. You were able to do this because you had the insight that in human language, some letters (like "e") occur more frequently than others (like "q").
  
  There are, of course, other patterns that can be exploited. You can take the frequency distribution trick above up to another level by noting that the frequency of a symbol is not really independent of the symbol before it. For example, the most likely symbol after a "q" is "u". Sure, you could have other things. But "u" is the most likely. You can exploit that to get better compression.
  
  But of course, you might be compressing something other than English text. There are different techniques for whatever kind of data you're trying to compress. A row of pixels in a faxed (1-bit) image tends to be very close to the same as the previous row. Each pixel is a bit. If you represent a row of pixels not as on or off directly but instead represent it as its bit value XORed with the pixel above it, you end up with 0 bits for every pixel that hasn't changed and 1 bits for every value that hasn't. Presto, you have just managed to skew the probability distribution radically towards 0 bits, and you can use further tricks to gain compression from that.
  
  The point of all this is that to come up with these tricks, you have to understand something about the data. One of the better definitions I've ever heard for data compression was simply "applied modeling".
  
  Given that, you have to ask a question: have we reached the point today where compression algorithms are as good as they're going to get at compressing English text based on its surface-level characteristics such as character frequencies and repeated strings? Have we exhausted the low-level modeling that is possible? There has been a lot of work on this, so it's possible that we may have. And if so, then any further gains in the compression ratio could very well be the result of some sort of higher-level modeling. Maybe even modeling the knowledge rather than just the language. And that is what this contest is about, as I understand it.
  
  It's not for sure that a winning contest entry necessarily requires the submitter to have developed some kind of machine intelligence. But it's an intriguing enough idea that it might be worthwhile to run the contest just to see if something interesting does happen.
  
  By the way, for some interesting reading on this subject, look up Kolmogorov Complexity. The wikipedia seems to have a pretty decent article [wikipedia.org] on it (although I haven't read the whole thing).
  Read the rest of this comment...
  
  Parent Share
  twitter facebook
- Re: (Score:2)
  
  by dido ( 9125 ) writes:
  
  The idea is basically an extension of what scientific theories are supposed to do. To understand any phenomenon, it is necessary to compress it. A scientific theory, at its core, explains the data from many different experiments by means of a formula simpler than the data. Say, we did a thousand experiments measuring the energy released from the annihiliation of large numbers of electrons and positrons. We could use something like the Lagrange polynomial [wikipedia.org] of the data from the experiment, and then come up
- Re: (Score:2)
  
  by Skuto ( 171945 ) writes:
  
  They were certainly not, and they are right.
  
  The contest produces a hard, verifiable result with a hard restriction on resources that you can use to attain it.
  
  If you look at the state of AI research, then you will understand that introducing some cold hard numbers wont hurt.
- Re: (Score:2)
  
  by rbarreira ( 836272 ) writes:
  
  I recommend that you read this [fit.edu] before making any further irrelevant comments.
  - - OK, I'll bite (Score:2)
      
      by rbarreira ( 836272 ) writes:
      
      Have you actually read the links I sent you? There's a mathematical proof that compressing natural language text implies being able to pass the Turing test, which is commonly thought to be an AI breakthrough AI when it happens. While that may not imply "intelligence" for all its definitions, it's surely something worthy of note, which your original comment doesn't seem to acknowledge.
      Finally, a real challenge would be the self-inventing compression method, not the self-extracting algorithm.
      So you're saying
      - Re: (Score:2)
        
        by smallfries ( 601545 ) writes:
        
        OK. I'm not the OP that you're arguing with but you cme across as someone who thinks they understand Hutter's proof so I've got a few questions for you. From the link that you gave, I've read the background before and skmmed through the ECML paper to remember what he'd done.
        
        The general AI algorithm seems to suffer from two main problem:
        1. Specification. The measurement critera is "wooly". If we assume that certain input symbols are nominated rewards from the environment then we are limiting ourselfs to cert
- Re: (Score:2)
  
  by LordEd ( 840443 ) writes:
  
  It would not be feasible in this competition because either the archive must be self-extracting, or that the decompression app's size is included in the final result.
  
  A lookup table of 32 bits would immediately cause an overhead of over 20 GB bytes (assuming an average of 5 letters per word, which is probably too small for the number of word options in 32 bits).
- And the answer is... (Score:2)
  
  by rumblin'rabbit ( 711865 ) writes:
  
  The answer in hex is 2A. Or 42, if you prefer decimal,
  
  Gotter down to 1 byte. Not bad at all.

There may be more comments in this discussion. Without JavaScript enabled, you might want to turn on Classic Discussion System in your preferences instead.

Seems like a strange contest (Score:5, Interesting)

Re: (Score:3, Informative)

Re:Seems like a strange contest (Score:5, Insightful)

Re: (Score:2)

Re: (Score:2)

Re:Seems like a strange contest (Score:4, Informative)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Virtue from necessity (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:3, Interesting)

Re:Seems like a strange contest (Score:4, Insightful)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2, Informative)

Re: (Score:2)

Lossless from Lossy? (Score:2, Funny)

Hmm... (Score:4, Funny)

Re: (Score:2)

Re: (Score:2)

Re: (Score:3, Informative)

For comparison .... (Score:4, Informative)

WIkipedia-specific compression algorithms (Score:2)

Re: (Score:2)

makes one wonder (Score:2)

Re: (Score:2)

Re: (Score:3, Informative)

Re:makes one wonder (Score:5, Funny)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2, Insightful)

compress knowledge = intelligence (Score:3, Insightful)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Error in Wikipedia (Score:2)

Re: (Score:2)

Re: (Score:2)

Here's some clue for you (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Done before.... (Score:2)

what matters most... (Score:2)

Done on 9/25/06? (Score:3, Interesting)

Re: (Score:2)

Interesting related webpage (Score:2)

Know what to compress, not how. (Score:2)

This is the AI problem (Score:2)

Re: (Score:2)

some standard compression tools tested (Score:2)

Hutter Prize (Score:2)

Netopsystems FEAD Optimizer (Score:2)

For all the people laughing at this contest (Score:2)