
 




Compress Wikipedia and Win AI Prize 324

Posted by CmdrTaco
from the what-does-this-mean dept.
Baldrson writes "If you think you can compress a 100M sample of Wikipedia better than paq8f, then you might want to try winning some of a (at present) 50,000 Euro purse. Marcus Hutter has announced the Hutter Prize for Lossless Compression of Human Knowledge, the intent of which is to incentivize the advancement of AI through the exploitation of Hutter's theory of optimal universal artificial intelligence. The basic theory, for which Hutter provides a proof, is that after any set of observations the optimal move by an AI is to find the smallest program that predicts those observations and then assume its environment is controlled by that program. Think of it as Ockham's Razor on steroids. Matt Mahoney provides a writeup of the rationale for the prize including a description of the equivalence of compression and general intelligence."
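One way to picture the link between prediction and compression mentioned in the summary: a model that assigns probability p to the next byte can, in principle, encode that byte in about -log2(p) bits, so better prediction means a shorter encoding. The sketch below is only an illustration under simple assumptions (an adaptive byte-context model with add-one smoothing, and a made-up sample string); it is not paq8f or anything taken from Hutter's or Mahoney's material.

```python
import math
from collections import defaultdict

def code_length_bits(data: bytes, order: int) -> float:
    """Ideal code length of `data` under an adaptive order-`order` byte model.

    Each byte costs -log2(p) bits, where p is the model's probability for that
    byte given the preceding `order` bytes. Counts use add-one smoothing over
    the 256 possible byte values and are updated after each byte is coded.
    """
    counts = defaultdict(lambda: defaultdict(int))  # context -> byte -> count
    totals = defaultdict(int)                       # context -> total count
    bits = 0.0
    for i, byte in enumerate(data):
        context = data[max(0, i - order):i]
        p = (counts[context][byte] + 1) / (totals[context] + 256)
        bits -= math.log2(p)
        counts[context][byte] += 1
        totals[context] += 1
    return bits

# Hypothetical sample text, repeated so the models have something to learn.
text = b"the cat sat on the mat. the cat ate the rat. " * 500

order0 = code_length_bits(text, 0)  # predicts from byte frequencies only
order2 = code_length_bits(text, 2)  # predicts from the two preceding bytes

print(f"raw:     {8 * len(text)} bits")
print(f"order-0: {order0:.0f} bits")
print(f"order-2: {order2:.0f} bits")
```

On this toy input the order-2 model, which predicts better, needs fewer bits than the order-0 model, and both beat the raw 8 bits per byte.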

  • Comparison (Score:2, Informative)

    by ronkronk (992828) on Sunday August 13, 2006 @06:08PM (#15899828) Journal
    There are some amazing compression programs out there; the trouble is they tend to take a while and consume lots of memory. PAQ [fit.edu] gives some impressive results, but the latest benchmark figures [maximumcompression.com] are regularly improving. Let's not forget that compression is no good unless it is integrated into a usable tool. 7-zip [7-zip.org] seems to be the new archiver on the block at the moment. A closely related, but different, set of tools are the archivers [freebsd.org], of which there are lots, with many older formats still not supported by open source tools.
  • by kfg (145172) * on Sunday August 13, 2006 @06:15PM (#15899855)
    a) how big the compressed size was

    18MB

    b) how many bytes was wikipedia before it was compressed

    A sample of 100MB

    Your goal:
    .

    KFG
  • Wrong contest (Score:4, Informative)

    by Baldrson (78598) * on Sunday August 13, 2006 @06:55PM (#15899977) Homepage Journal
    That's another contest that is useless for the reason you cite.

    The contest for the Hutter Prize requires the compressed corpus to be a self-extracting archive -- or failing that to add the size of the compressor to the compressed corpus.
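A rough sketch of why the decompressor has to be counted: if only the archive were measured, an entry could hide the whole corpus inside its decompressor and claim a zero-byte archive. The sample corpus, the two "decompressor" source strings, and the `entry_size` helper below are all hypothetical, and the sketch ignores the size of the Python interpreter and zlib itself, which a real entry could not do.

```python
import zlib

def entry_size(archive: bytes, decompressor: bytes) -> int:
    """Size that counts: the archive plus the program needed to unpack it."""
    return len(archive) + len(decompressor)

# Hypothetical stand-in for the real corpus.
corpus = b"the quick brown fox jumps over the lazy dog\n" * 1000

# Honest entry: a small generic decompressor plus a real archive.
honest_archive = zlib.compress(corpus, 9)
honest_decompressor = (
    b"import sys, zlib; "
    b"sys.stdout.buffer.write(zlib.decompress(sys.stdin.buffer.read()))"
)

# "Cheating" entry: empty archive, corpus embedded in the decompressor.
cheat_archive = b""
cheat_decompressor = (
    b"import sys; sys.stdout.buffer.write(" + repr(corpus).encode() + b")"
)

honest_total = entry_size(honest_archive, honest_decompressor)
cheat_total = entry_size(cheat_archive, cheat_decompressor)

print(f"corpus:       {len(corpus)} bytes")
print(f"honest total: {honest_total} bytes")
print(f"cheat total:  {cheat_total} bytes (archive alone: {len(cheat_archive)})")
```

Measured by archive alone the cheating entry looks unbeatable; measured by the combined size it comes out larger than the corpus it started from.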

  • by DrJimbo (594231) on Sunday August 13, 2006 @07:13PM (#15900036)
    Harmonious Botch said:
    This - in humans, at least - can lead to the cyclic reinforcement of one's belief system. The belief system that explains observations initially is used to filter observations later.
    I encourage you to read E. T. Jaynes' book: Probability Theory: The Logic of Science [amazon.com]. It used to be available on the Web in pdf form before a published version became available.

    In it, Jaynes shows that an optimal decision maker shares this same tendency of reinforcing existing belief systems. He even gives examples where new information reinforces the beliefs of optimal observers who have reached opposite conclusions (due to differing initial sets of data). Each observer believes the new data further supports their own view.

    Since even an optimal decision maker has this undesirable trait, I don't think the existence of this trait is a good criterion for rejecting decision making models.

  • by KiloByte (825081) on Sunday August 13, 2006 @07:37PM (#15900119)
    Why so? The test file is exactly 10^8 bytes.
    I downloaded the corpus, and indeed, you're right -- it's 10^8 bytes. The article is incorrect, it says 100M where it means 95.3M.

    This inconsistency doesn't have any effect on the challenge, though -- that 50kEUR[1] is offered for compressing the given data corpus, not for compressing a string of 100MB.

    [1] 1kEUR=1000EUR. 1M EUR=1000000EUR. 1KB=1024B. 1MB=1048576B.
    And by the way, what about fixing Slash to finally allow Unicode -- either natively or at least as HTML entities?
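The unit arithmetic in the comment above can be checked in a few lines, assuming the binary megabyte from its footnote (1MB=1048576B). The variable names are just for illustration.

```python
corpus_bytes = 10**8          # size of the test file, per the comment
MB = 1024**2                  # 1048576 B, the binary megabyte from the footnote

size_in_mb = corpus_bytes / MB
shortfall = 100 * MB - corpus_bytes   # bytes missing from a "true" 100MB

print(f"{corpus_bytes} B = {size_in_mb:.2f} MB")
print(f"100 MB would be {100 * MB} B, i.e. {shortfall} B more")
```

So 10^8 bytes is a little under 95.4 binary megabytes, about 4.86 million bytes short of 100MB in that sense.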
  • by Baldrson (78598) * on Sunday August 13, 2006 @07:49PM (#15900155) Homepage Journal
    In it, Jaynes shows that an optimal decision maker shares this same tendency of reinforcing existing belief systems. He even gives examples where new information reinforces the beliefs of optimal observers who have reached opposite conclusions (due to differing initial sets of data). Each observer believes the new data further supports their own view.

    I think what Hutter has shown is that there is a solution which unifies the new data with the old within a new optimum, which is most likely unique. I think it is based on the idea that Kolmogorov complexity is a unique value for any string and is most likely represented by a single optimum program (the "self-extracting archive" of the string).
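Kolmogorov complexity itself cannot be computed, but any real compressor gives an upper bound on it (up to the fixed size of the decompressor, which this sketch leaves out). As a hedged illustration, the hypothetical helper below takes the best result from three standard-library compressors; the two sample inputs are made up.

```python
import bz2
import lzma
import random
import zlib

def complexity_upper_bound(data: bytes) -> int:
    """Smallest compressed size in bytes among a few stdlib compressors.

    This is only an upper bound on the length of the shortest description:
    a better compressor could always exist, and the decompressor's own size
    is not included here.
    """
    candidates = (
        zlib.compress(data, 9),
        bz2.compress(data, 9),
        lzma.compress(data),
    )
    return min(len(c) for c in candidates)

rng = random.Random(0)  # fixed seed so the run is repeatable
structured = b"0123456789" * 10_000
noisy = bytes(rng.getrandbits(8) for _ in range(100_000))

structured_bound = complexity_upper_bound(structured)
noisy_bound = complexity_upper_bound(noisy)

print(f"structured: {len(structured)} B -> at most {structured_bound} B")
print(f"noisy:      {len(noisy)} B -> at most {noisy_bound} B")
```

The highly regular string gets a tiny bound, while the pseudo-random one barely shrinks, even though a short program (the seed plus the generator) could reproduce it. That gap is the kind of slack a smarter model is supposed to close.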

  • by Baldrson (78598) * on Sunday August 13, 2006 @09:24PM (#15900439) Homepage Journal
    See the detailed rules for specifics [fit.edu] but generally the rules are just what you would expect: The program runs (and completes in a reasonable time) on a relatively recent system running Windows (currently XP) or Linux with no external inputs, e.g. no dynamically loaded libraries not included in the submission, no net communication and no disk I/O that isn't generated by the program itself.

    Points are not awarded for attempting to circumvent the intent of the competition. I expect such attempts would result in future submissions from the same source being ignored.
