
Comment Re:What's the size again? (Score 2) 22

kvezach writes: To put it differently: suppose that the text was 10 bytes long.

A better way of thinking about data scaling is to ask: "How many digits of Pi would it take before, say, John Tromp's 401-bit binary lambda calculus program that generates Pi becomes a better model than the literal string of those apparently random digits?" (And by "better" I mean not only that it is shorter than those digits, but that it extrapolates to, i.e. "predicts", the next digit of Pi.)
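
Purely as illustration, here is the back-of-envelope arithmetic behind that question (a sketch only: the 401-bit figure is the one cited above, and the ~3.32 bits per decimal digit packing is an assumption about how the "literal string" would be coded):

```python
import math

# Back-of-envelope crossover: at what length does a fixed-size generator beat
# the literal digit string?
PROGRAM_BITS = 401                 # size of the Pi-generating program cited above
BITS_PER_DIGIT = math.log2(10)     # ~3.32 bits per decimal digit, optimally packed

crossover = math.ceil(PROGRAM_BITS / BITS_PER_DIGIT)
print(f"literal digits cost {BITS_PER_DIGIT:.2f} bits each")
print(f"the generator wins once the string exceeds ~{crossover} digits")
# Past that point the program is not merely shorter -- it also "predicts"
# every subsequent digit, which no literal prefix of the digits can do.
```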

In terms of how much data humans require: this is, as I said, something about which everyone has an opinion (you obviously included, and you are entitled to yours) but on which there is no settled science. Hence the legitimacy of the 1GB limit on a wide range of human knowledge for research purposes.

Concerns about the bias implied by "The Hardware Lottery" are not particularly relevant to engineering or business decisions, but the path dependencies implicit in the economics of the world should always be suspected of biasing research directions away from more viable models and, in the present instance, meta-models.

Comment Re:What's the size again? (Score 2) 22

There is only one Hutter Prize contest and it's for 1GB. 100MB was the original size for the Hutter Prize starting in 2006, but it was increased to 1GB in 2020, along with a factor of 10 increase in the payout per incremental improvement. See the "Hutter Prize History".

Insofar as the size is concerned: the purpose of the Hutter Prize is research into radically better means of automated, data-driven model creation, not biased by what Sara Hooker has called "The Hardware Lottery". One of the primary limitations of current machine learning techniques is that their data efficiency is low compared to what some theories speculate natural intelligence attains. Everyone has an opinion on this, of course, but it is far from "settled science".

In particular, the use of ReLU activations suggests that machine learning currently relies heavily on piece-wise linear interpolation in constructing its world model from language. Any attempt to model causality has to identify system dynamics (including cognitive dynamics) in order to extrapolate to future observations (i.e. predictions) from past observations (i.e. "the data in evidence"). Although there is reason to believe Transformers can do something like dynamics within their context windows despite using ReLU (and that this is what gives them their true potential for "emergence at scale"), it wasn't until people started moving to State Space Models that the field returned to dynamical systems identification (under another name, as academics are wont to do).
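
To make "dynamical systems identification" concrete, here is a toy sketch in the simplest linear case (made-up system matrix, ordinary least squares, nothing to do with any particular SSM architecture): recover the dynamics from an observed trajectory, then extrapolate past the data in evidence.

```python
import numpy as np

# Toy system identification: recover the update matrix A of a linear system
# x_{t+1} = A @ x_t from a noisy observed trajectory, then predict one step ahead.
rng = np.random.default_rng(0)
A_true = np.array([[0.9, 0.2],
                   [-0.2, 0.9]])          # a slowly rotating, decaying system

x = np.zeros((50, 2))
x[0] = [1.0, 0.0]
for t in range(49):
    x[t + 1] = A_true @ x[t] + 0.01 * rng.standard_normal(2)  # noisy observations

# Least-squares fit of the one-step map from consecutive observation pairs.
M, *_ = np.linalg.lstsq(x[:-1], x[1:], rcond=None)
A_hat = M.T

x_pred = A_hat @ x[-1]                     # extrapolation beyond the data
print("recovered A:\n", np.round(A_hat, 3))
print("predicted next state:", np.round(x_pred, 3))
```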

Submission + - Kaido Orav's fx-cmix Wins 6911€ Hutter Prize Award! (google.com)

Baldrson writes: Kaido Orav has just improved on the Hutter Prize for Lossless Compression of Human Knowledge by 1.38% with his “fx-cmix” entry.

The competition seems to be heating up, with this winner coming a mere 6 months after the prior winner. This is all the more impressive since each improvement brings the benchmark closer to the (unknown) minimum size, the Kolmogorov complexity of the data.
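
For a rough sense of how the award tracks the improvement, here is a sketch only: it assumes the published payout formula of 500,000€ × (L - S)/L for previous record size L and new entry size S, and ignores any time-based bonus or rounding in the official rules.

```python
# Rough sanity check of the award size under the assumed payout formula
# award = Z * (L - S) / L, with the post-2020 fund Z = 500,000 euros.
Z = 500_000
improvement = 0.0138           # the ~1.38% improvement reported above

award = Z * improvement
print(f"~{award:.0f} euros")   # ~6900, in the ballpark of the 6911 euro award
```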

Comment Re:No, that's not 114 megabytes (Score 1) 64

You didn't submit an executable archive that expands into a file matching enwik9 bit for bit. While your entry also failed to qualify in other ways, that is the first bar you must clear before any further investment of the judging committee's time.

Or are you Yann LeCun out to pull my leg again?

https://twitter.com/ylecun/sta...

Comment Re:Silicon Valley anyone? (Score 1) 64

One big mistake they made early on with the Hutter Prize was not insisting that all contestants make their entries Open Source.

IIRC, only one entry was closed source. You may be thinking of Matt Mahoney's Large Text Compression Benchmark where the top contender is frequently closed source.

That the machine learning world has yet to recognize lossless compression as the most principled loss function is a tragedy, but it is due to a lot more than that one entry. This failure stretches back to when Solomonoff's proof was overshadowed by Popper's falsification dogma in the popularization of the philosophy of science:

When a model's prediction is wrong, under Popper's falsification dogma the model is "falsified", whereas under Solomonoff the model is penalized not merely by a measurement of the error (such as least squared error) but by the literal encoding of the error within the context of the model. The significance of this subtle difference is hard for people to grasp, and that lack of understanding derailed the principled application of Moore's Law to science. Instead we got an explosion of statistical "information criteria" for model selection, all of them less principled than the Algorithmic Information Criterion, and now we have ChatGPT hallucinating us into genuinely treacherous territory.
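
A toy sketch of the difference (crude two-part coding with made-up constants, not the actual Hutter Prize machinery): each candidate model is charged for describing itself plus for encoding its residual errors, so a wrong prediction costs bits rather than "falsifying" anything.

```python
import numpy as np

# Two-part code: total bits = bits to describe the model + bits to encode its errors.
rng = np.random.default_rng(1)
x = np.linspace(0, 1, 200)
y = 2.0 * x + 0.3 + 0.05 * rng.standard_normal(x.size)   # truth: a noisy line

BITS_PER_PARAM = 32   # arbitrary fixed precision for this illustration
DELTA = 1e-3          # residuals are coded to this precision

def two_part_bits(degree):
    coeffs = np.polyfit(x, y, degree)
    residuals = y - np.polyval(coeffs, x)
    sigma = residuals.std() + 1e-12
    model_bits = BITS_PER_PARAM * (degree + 1)
    # Shannon code length of the quantized residuals under a Gaussian model:
    residual_bits = x.size * (0.5 * np.log2(2 * np.pi * np.e * sigma ** 2)
                              - np.log2(DELTA))
    return model_bits + residual_bits

for degree in (1, 9):
    print(f"degree {degree}: ~{two_part_bits(degree):.0f} bits total")
# The straight line wins: a handful of parameters plus cheaply coded errors beat
# the high-degree fit, even though neither model predicts the data perfectly.
```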

Submission + - Saurabh Kumar's fast-cmix wins €5187 Hutter Prize Award! 1

Baldrson writes: Marcus Hutter's tweet makes it official:

Saurabh Kumar has just raised the bar 1.04% on the Hutter Prize for Lossless Compression of Human Knowledge with his "fast-cmix" entry. If you would like to supplement Marcus's monetary award of €5187, one way is to send BTC to Saurabh at bc1qr9t26degxjc8kvx8a66pem70ye5sgdw7u4tyjy, or contact Marcus Hutter directly.

Before describing Saurabh's contribution, there are two salient facts required to understand the importance of this competition:

1) It is more important than a language modeling competition. It is knowledge comprehension. To quote Gregory Chaitin, "Compression is comprehension."

  • Every programming language is described in Wikipedia.
  • Every scientific concept is described in Wikipedia.
  • Every mathematical concept is described in Wikipedia.
  • Every historic event is described in Wikipedia.
  • Every technology is described in Wikipedia.
  • Every work of art is described in Wikipedia — with examples.
  • There is even the Wikidata project, which provides Wikipedia with a substantial amount of digested statistics about the real world.

Are you going to argue that comprehension of all that knowledge is insufficient to generatively speak truth consistent with all that knowledge — and that this notion of "truth" will not be at least comparable to what is generatively spoken by large language models such as ChatGPT?

2) The above also applies to Matt Mahoney's Large Text Compression Benchmark, which, unlike the Hutter Prize, allows unlimited computing resources. The Hutter Prize, however, is geared toward research in that it restricts computational resources to the most general-purpose hardware that is widely available.

Why?

As described in Sara Hooker's seminal paper "The Hardware Lottery", AI research is biased toward algorithms optimized for existing hardware infrastructure. While this hardware bias is justified for engineering (applying existing scientific understanding to the "utility function" of making money), it, to quote Hooker, "can delay research progress by casting successful ideas as failures".

Saurabh Kumar's Contribution

Saurabh's fast-cmix README describes how he went about substantially increasing the speed of the prior Hutter Prize algorithms, most recently Artemiy Margaritov's SorTing ARticLes by sImilariTy (STARLIT).

The complaint that this is "mere" optimization ignores the fact that it was done on general-purpose computing hardware, and is therefore in line with the spirit of Sara Hooker's admonition to researchers in "The Hardware Lottery". By showing how to optimize within the constraint of general-purpose computation, Saurabh's contribution may help point the way toward future directions in hardware architecture.

Submission + - Artemiy Margaritov Wins €9000 In the First 10x Hutter Prize Award

Baldrson writes: The Hutter Prize for Lossless Compression of Human Knowledge has now awarded €9000 to Artemiy Margaritov, the first winner since the 10x expansion of the prize announced over a year ago in conjunction with a Lex Fridman podcast!

Artemiy Margaritov's STARLIT algorithm achieved a 1.13% improvement, clearing the 1% hurdle required to beat the previous benchmark, set by Alexander Rhatushnyak. He also receives a bonus in proportion to the time elapsed since that benchmark was set, raising his award by 60% to €9000.

Congratulations to Artemiy Margaritov for his winning submission!

Submission + - Rule Update: 5'000 Byte Per Day Relaxation of Award Threshold

Baldrson writes: Marcus Hutter lowers the bar for the Hutter Prize for Lossless Compression of Human Knowledge. It has been a year since the prize increased by a factor of 10, and there have been no new entries. By decreasing the compression threshold for an award by 5,000 bytes per day, Hutter hopes to increase the rate and fairness of prize awards, and hence progress toward artificial general intelligence. From the Hutter Prize FAQ: Why do you grant a 'discount' of 5'000 Byte per day?

The contest went big early 2020, but so far no-one was able to beat the baseline. The discount has been introduced to ease participation and guarantee eventually at least one winner. The discount is around 1.5% per year, so should allow a first winner within a year, or at most two. The reason for the incremental discount is to prevent a quick win by whoever notices the discount first.
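
The arithmetic behind the "around 1.5% per year" figure, as a sketch (the 116 MB baseline below is only an approximation of the enwik9 record size at the time; the official rules give the exact number and start date):

```python
# How the 5,000 byte/day discount translates into a yearly percentage.
BASELINE_BYTES = 116_000_000   # approximate record size at the time (assumption)
DISCOUNT_PER_DAY = 5_000

def relaxed_threshold(days_elapsed):
    """Size an entry must beat after the given number of days of discount."""
    return BASELINE_BYTES - DISCOUNT_PER_DAY * days_elapsed

yearly_fraction = DISCOUNT_PER_DAY * 365 / BASELINE_BYTES
print(f"discount per year: {yearly_fraction:.2%}")            # ~1.57%
print(f"threshold after one year: {relaxed_threshold(365):,} bytes")
```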

Comment Re:just use BERT? oh wait... (Score 1) 65

retchdog writes:

If BERT can magically sneak in some spooky black magic, then you're admitting that enwik9 is not an adequate data set for testing, end of story.

No. All BERT must do is beat the Large Text Compression Benchmark -- a benchmark that uses enwik9 as the corpus.

Comment Re:Annoying (Score 1) 65

Megol asks:

But let's assume the information is important, why should the raw text be repeated instead of the myriad variants that express the same thing in a different way?

For the same reason scientists don't throw out the raw measurements in their experiments just because they depart from theory. One can assert that the form of the expression is irrelevant to the underlying knowledge, but that assertion is itself a sub-theory based on the agent's comprehension of the entire corpus. This becomes particularly relevant when attempting to assign latent identities to sources of information so as to model their bias.

Comment Re:just use BERT? oh wait... (Score 1) 65

Separating the two questions: 1) "Why not just use BERT?" and 2) "Why exclude parallel hardware?"

1) BERT's algorithmic opacity leaves open an enormous "argument surface" regarding "algorithmic bias". There are >100,000 hits for "algorithmic bias", owing to the inadequately principled model selection criteria used by the ML community, and thence by Google's algorithms. Algorithmic Information Theory cuts that argument surface down to its mathematical definition of natural science: Solomonoff induction. Basically, it is provable that, if classical (Cartesian) causality holds, distilling a dataset of observations to its Kolmogorov complexity factors out all the bias you possibly can while still claiming to be "data-driven" in your policies. The BERT folks have yet to present a compressed enwik8 (100MB) or enwik9 (1GB) to demonstrate that theirs is a superior, unbiased model. They should do so. They don't have to submit an entry to the Hutter Prize for this, so they are free to use Google's entire TPU farm, or whatever. Let this be a challenge to them if they are serious about dealing with accusations of bias in their algorithms.
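
Operationally, "presenting a compressed enwik8" just means reporting a total code length: the compressed payload plus the (de)compressor itself. A minimal sketch using off-the-shelf LZMA as a crude baseline, assuming enwik8 has already been downloaded (e.g. from the Large Text Compression Benchmark page) into the working directory:

```python
import lzma, os, sys

# Report the total cost of a code for the corpus: compressed payload plus the
# program that reproduces it. LZMA here is only a crude general-purpose baseline.
CORPUS = "enwik8"   # assumed to be present locally

with open(CORPUS, "rb") as f:
    raw = f.read()

compressed = lzma.compress(raw, preset=9 | lzma.PRESET_EXTREME)
decompressor_size = os.path.getsize(sys.argv[0])   # count this script as the "decompressor"

print(f"raw:        {len(raw):>12,} bytes")
print(f"compressed: {len(compressed):>12,} bytes")
print(f"total cost: {len(compressed) + decompressor_size:>12,} bytes")
# A model claiming to "comprehend" the corpus should drive this total lower.
```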

2) If one reads the relevant Hutter Prize FAQ answer, it becomes apparent that even a 2080Ti (or other GPU/TPU) may not be affordable to some of the best minds on the planet.
