jabowery OP t1_jdvkjt0 wrote
Reply to comment by 1stuserhere in [D] Definitive Test For AGI by jabowery
Imputation can make interpolation appear to be extrapolation.
So, to fake AGI's capacity for accurate extrapolation (data efficiency), one may take a big pile of money and throw it at expanding the training set toward infinity and expanding the matrix multiplication hardware toward infinity. This supplies more datapoints among which one may interpolate, covering a larger knowledge space.
But it is fake.
If, on the other hand, you actually understand the content of Wikipedia (the Hutter Prize's very limited, high-quality corpus), you may deduce (extrapolate) the larger knowledge space. This follows from the best current mathematical definition of AGI: AIXI, where the utility function of the sequential decision-theoretic engine is to minimize the algorithmic description length of the training data (Solomonoff induction), used as the prediction oracle in the AGI.
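A toy illustration of the prediction-oracle idea (not AIXI itself, which is incomputable): use an off-the-shelf compressor as a crude stand-in for the universal prior, and predict whichever continuation gives the shortest description of the whole sequence. The function name and alphabet here are hypothetical choices for the sketch, and zlib is a very weak approximation of Kolmogorov complexity.

```python
import zlib

def predict_next(history: bytes, alphabet: list[bytes]) -> bytes:
    """Compression-based prediction: prefer the continuation that yields
    the shortest compressed description of history + continuation.
    zlib stands in (very weakly) for the incomputable universal prior."""
    return min(alphabet, key=lambda sym: len(zlib.compress(history + sym, 9)))

history = b"ab" * 5000  # a highly regular sequence, ending in b"b"
guess = predict_next(history, [b"a", b"b", b"c"])
print(guess)
```

On a sequence this regular, a continuation that preserves the pattern compresses at least as well as one that introduces a never-seen symbol, which is the sense in which shorter descriptions encode better extrapolations.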
jabowery OP t1_jdvjoi8 wrote
Reply to comment by Deep-Station-1746 in [D] Definitive Test For AGI by jabowery
Information quality may be measured in terms of its signal-to-noise ratio. Now, agreed, too dense a signal may appear to be noise to some audiences, and this is part of the art of writing. However, an advantage of interactive media over, say, a book, is that the audience is present -- hence [D] is possible. What I've presented to you, while not understandable to the general audience as signal, is nevertheless profoundly true. It may therefore be a good starting point for [D].
jabowery OP t1_jdvetot wrote
Reply to comment by Matthew2229 in [D] Definitive Test For AGI by jabowery
Optimal lossless compression isn't just another task. It's central to the very definition of Artificial General Intelligence. See this presentation by one of the founders of DeepMind.
jabowery OP t1_jdte6cz wrote
Reply to comment by ttkciar in [D] Definitive Test For AGI by jabowery
Here ya go:
print("I am the AGI you've been waiting for.")
Submitted by jabowery t3_1234air in MachineLearning
jabowery t1_jdm16ig wrote
Algorithmic information theory: the smallest model that losslessly reproduces all the data is optimal. "Large" is only there because of the need to expand in order to compress. Think: decompressing a gz archive in order to re-compress it with bz2. Countering over-fitting with over-informing (bigger data) yields interpolation, sacrificing extrapolation.
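The gz/bz2 point can be sketched directly (a minimal demo, with made-up sample data): applying bz2 to gzip's output gains little, because compressed bytes look random, while expanding first lets bz2 find the original regularity.

```python
import bz2
import gzip

# Highly redundant sample data standing in for "the training set".
raw = b"the quick brown fox jumps over the lazy dog. " * 4500

gz = gzip.compress(raw)                       # already compressed once
direct = bz2.compress(gz)                     # bz2 over gz bytes: near-random input
expanded = bz2.compress(gzip.decompress(gz))  # expand first, then re-compress

print(len(raw), len(gz), len(direct), len(expanded))
```

The expand-then-recompress result comes out far smaller than recompressing the gz bytes directly, which is the sense in which a model may need to "get large" on the way to a smaller final description.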
If you understand all of the above you'll be light years beyond the current ML industry including the political/religious bias of "algorithmic bias experts".
jabowery t1_je107nj wrote
Reply to [N] OpenAI may have benchmarked GPT-4’s coding ability on it’s own training data by Balance-
See these entries in the Hutter Prize FAQ:
Why aren't cross-validation or train/test-set used for evaluation?
Why is (sequential) compression superior to other learning paradigms?
Why is Compressor Length superior to other Regularizations?
Why not use Perplexity, as most big language models do?
Is Ockham's razor and hence compression sufficient for AGI?