
 




Compress Wikipedia and Win AI Prize 324

Baldrson writes "If you think you can compress a 100M sample of Wikipedia better than paq8f, then you might want to try winning some of a (at present) 50,000 Euro purse. Marcus Hutter has announced the Hutter Prize for Lossless Compression of Human Knowledge, the intent of which is to incentivize the advancement of AI through the exploitation of Hutter's theory of optimal universal artificial intelligence. The basic theory, for which Hutter provides a proof, is that after any set of observations the optimal move by an AI is to find the smallest program that predicts those observations and then assume its environment is controlled by that program. Think of it as Ockham's Razor on steroids. Matt Mahoney provides a writeup of the rationale for the prize, including a description of the equivalence of compression and general intelligence."
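The compression-prediction equivalence the summary mentions can be sketched with a little arithmetic: a model that assigns probability p to each observed symbol needs about -log2(p) bits to encode it, so better prediction directly means smaller output. A minimal order-0 illustration of this idea (illustrative only; paq8f itself uses far more sophisticated context-mixing models):

```python
import math
from collections import Counter

def ideal_code_length(data: bytes) -> float:
    """Bits an ideal coder would need under an order-0
    (per-byte frequency) model: sum of -log2(p) per symbol."""
    counts = Counter(data)
    total = len(data)
    return sum(-c * math.log2(c / total) for c in counts.values())

sample = b"the quick brown fox jumps over the lazy dog"
bits = ideal_code_length(sample)
print(f"model: {bits:.1f} bits, raw: {8 * len(sample)} bits")
```

The model-based size is well below the raw 8 bits per byte, and a better predictor would shrink it further; in the limit, the best possible predictor corresponds to the smallest program generating the data.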
This discussion has been archived. No new comments can be posted.


Comments Filter:
  • Comparison (Score:2, Informative)

    by ronkronk ( 992828 ) on Sunday August 13, 2006 @07:08PM (#15899828) Journal
    There are some amazing compression programs out there; the trouble is that they tend to take a while and consume lots of memory. PAQ [fit.edu] gives some impressive results, but the latest benchmark figures [maximumcompression.com] are regularly improving. Let's not forget that compression is no good unless it is integrated into a usable tool. 7-zip [7-zip.org] seems to be the new archiver on the block at the moment. A closely related, but different, set of tools is the archivers [freebsd.org], of which there are many, with older formats still not supported by open source tools.
  • by kfg ( 145172 ) * on Sunday August 13, 2006 @07:15PM (#15899855)
    a) how big the compressed size was

    18MB

    b) how many bytes was wikipedia before it was compressed

    A sample of 100MB

    Your goal:
    .

    KFG
  • Wrong contest (Score:4, Informative)

    by Baldrson ( 78598 ) * on Sunday August 13, 2006 @07:55PM (#15899977) Homepage Journal
    That's another contest that is useless for the reason you cite.

    The contest for the Hutter Prize requires the compressed corpus to be a self-extracting archive -- or failing that to add the size of the compressor to the compressed corpus.

  • by DrJimbo ( 594231 ) on Sunday August 13, 2006 @08:13PM (#15900036)
    Harmonious Botch said:
    This - in humans, at least - can lead to the cyclic reinforcement of one's belief system. The belief system that explains observations initially is used to filter observations later.
    I encourage you to read E. T. Jaynes' book: Probability Theory: The Logic of Science [amazon.com]. It used to be available on the Web in pdf form before a published version became available.

    In it, Jaynes shows that an optimal decision maker shares this same tendency of reinforcing existing belief systems. He even gives examples where new information reinforces the beliefs of optimal observers who have reached opposite conclusions (due to differing initial sets of data). Each observer believes the new data further supports their own view.

    Since even an optimal decision maker has this undesirable trait, I don't think the existence of this trait is a good criterion for rejecting decision making models.

  • by KiloByte ( 825081 ) on Sunday August 13, 2006 @08:37PM (#15900119)
    Why so? The test file is exactly 10^8 bytes.
    I downloaded the corpus, and indeed, you're right -- it's 10^8 bytes. The article is incorrect: it says 100M where it means 95.3M.

    This inconsistency doesn't have any effect on the challenge, though -- that 50kEUR[1] is offered for compressing the given data corpus, not for compressing a string of 100MB.

    [1] 1kEUR=1000EUR. 1M EUR=1000000EUR. 1KB=1024B. 1MB=1048576B.
    And by the way, what about fixing Slash to finally allow Unicode -- either natively or at least as HTML entities?
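The mismatch the parent describes is just decimal versus binary prefixes; a quick sketch of the arithmetic, using the binary megabyte the footnote defines:

```python
# 10^8 decimal bytes expressed in binary megabytes
# (per the footnote above: 1MB = 1024 * 1024 B = 1048576 B).
corpus_bytes = 10**8
mb_binary = corpus_bytes / (1024 * 1024)
print(f"{corpus_bytes} B = {mb_binary:.2f} MB (binary)")
```

So "100M" is exact in decimal units and rounds to 95.3M in binary ones; both describe the same 10^8-byte corpus.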
  • by Baldrson ( 78598 ) * on Sunday August 13, 2006 @08:49PM (#15900155) Homepage Journal
    In it, Jaynes shows that an optimal decision maker shares this same tendency of reinforcing existing belief systems. He even gives examples where new information reinforces the beliefs of optimal observers who have reached opposite conclusions (due to differing initial sets of data). Each observer believes the new data further supports their own view.

    I think what Hutter has shown is that there is a solution which unifies the new data with the old within a new optimum, which is most likely unique. I think it is based on the idea that Kolmogorov complexity is a unique value for any string and is most likely represented by a single optimum program (the "self-extracting archive" of the string).

  • by Baldrson ( 78598 ) * on Sunday August 13, 2006 @10:24PM (#15900439) Homepage Journal
    See the detailed rules for specifics [fit.edu], but generally the rules are just what you would expect: the program runs (and completes in a reasonable time) on a relatively recent system running Windows (currently XP) or Linux, with no external inputs -- e.g., no dynamically loaded libraries not included in the submission, no net communication, and no disk I/O that isn't generated by the program itself.

    Points are not awarded for attempting to circumvent the intent of the competition. I expect such attempts would result in future submissions from the same source being ignored.
