Saurabh Kumar's fast-cmix wins €5187 Hutter Prize Award!
Saurabh Kumar has just raised the bar by 1.04% on the Hutter Prize for Lossless Compression of Human Knowledge with his "fast cmix" entry. If you would like to supplement Marcus Hutter's monetary award of €5187, one way is to send BTC to Saurabh at bc1qr9t26degxjc8kvx8a66pem70ye5sgdw7u4tyjy or contact Marcus Hutter directly.
Before describing Saurabh's contribution, there are two salient facts required to understand the importance of this competition:
1) It is more important than a language modeling competition. It is knowledge comprehension. To quote Gregory Chaitin, "Compression is comprehension."
- Every programming language is described in Wikipedia.
- Every scientific concept is described in Wikipedia.
- Every mathematical concept is described in Wikipedia.
- Every historic event is described in Wikipedia.
- Every technology is described in Wikipedia.
- Every work of art is described in Wikipedia — with examples.
- There is even the Wikidata project, which provides Wikipedia with a substantial amount of digested statistics about the real world.
Are you going to argue that comprehension of all that knowledge is insufficient to generatively speak the truth consistent with all that knowledge — and that this notion of "truth" will not be at least comparable to that generatively spoken by large language models such as ChatGPT?
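One way to make the "compression is comprehension" claim concrete: an arithmetic coder spends roughly -log2(p) bits on a symbol to which its model assigned probability p, so a model that predicts the text better produces a shorter file. The following is a minimal Python sketch of that relationship, not code from fast-cmix or any Hutter Prize entry. The function name `code_length_bits`, the toy sample string, and the choice of Laplace smoothing over the characters present in the text are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def code_length_bits(text: str, order: int) -> float:
    """Ideal code length, in bits, of `text` under an adaptive character model
    that conditions each character on the `order` characters before it."""
    # Assumption for this sketch: the alphabet is the set of characters that
    # occur in the text (a real compressor would have to fix or transmit it).
    alphabet_size = len(set(text))
    counts = defaultdict(lambda: defaultdict(int))  # context -> char -> count
    totals = defaultdict(int)                       # context -> total count
    bits = 0.0
    for i, ch in enumerate(text):
        ctx = text[max(0, i - order):i]
        # Laplace-smoothed probability of `ch` given what was seen so far.
        p = (counts[ctx][ch] + 1) / (totals[ctx] + alphabet_size)
        bits += -math.log2(p)  # cost an ideal arithmetic coder would pay
        # Update the model only after coding, as a decoder would.
        counts[ctx][ch] += 1
        totals[ctx] += 1
    return bits


sample = "the cat sat on the mat. " * 40
print("order 0:", round(code_length_bits(sample, 0)), "bits")
print("order 2:", round(code_length_bits(sample, 2)), "bits")
```

On this toy input the order-2 model, which has picked up more of the text's structure, needs fewer bits than the order-0 model. Entries such as cmix push the same idea much further with far richer predictors.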
2) The above also applies to Matt Mahoney's Large Text Compression Benchmark, which, unlike the Hutter Prize, allows unlimited computing resources. However, the Hutter Prize is geared toward research in that it restricts computation resources to the most general-purpose hardware that is widely available.
Why?
As described in the seminal paper "The Hardware Lottery" by Sara Hooker, AI research is biased toward algorithms optimized for existing hardware infrastructure. This hardware bias is justified for engineering (applying existing scientific understanding to the "utility function" of making money), but, to quote Sara Hooker, it "can delay research progress by casting successful ideas as failures".
Saurabh Kumar's Contribution
Saurabh's fast-cmix README describes how he went about substantially increasing the speed of the prior Hutter Prize algorithms, most recently Artemiy Margaritov's SorTing ARticLes by sImilariTy (STARLIT).
The complaint that this is "mere" optimization ignores the fact that this was done on general-purpose computation hardware, and is therefore in line with the spirit of Sara Hooker's admonition to researchers in "The Hardware Lottery". By showing how to optimize within the constraint of general-purpose computation, Saurabh's contribution may help point the way toward future directions in hardware architecture.
Compression != comprehension (Score:2)
I keep shaking my head trying to come up with the right explanation, but other than superficially... compression is not comprehension.
Could comprehension be used to compress? Sure... But this feels like the Chinese Room. Yes, the compression program does the original task, but that doesn't mean it's "thinking" or could generate novel content it wasn't programmed to generate in the first place.
Isn't novel application the point of comprehension? You can't ask a data compressor to suggest anything "new yet appropr