activatedgeek t1_jbfjafv wrote on March 8, 2023 at 6:06 PM

Reply to what exactly is Variance(Xt) during the Forward Process in Diffusion model ? by eugene129

From (1), it looks like V(x_t) is the conditional variance of x_t given x_{t-1} (for the forward process defined by q).

activatedgeek t1_j9tq7qo wrote on February 24, 2023 at 2:27 PM

Reply to [D] Best Way to Measure LLM Uncertainty? by _atswi_

Came across this recently - Semantic Uncertainty: Linguistic Invariances for Uncertainty Estimation in Natural Language Generation

activatedgeek t1_j9lz6ib wrote on February 22, 2023 at 10:50 PM

Reply to comment by GraciousReformer in [D] "Deep learning is the only thing that currently works at scale" by GraciousReformer

I literally gave an example of how (C)NNs have better inductive bias than random forests for images.

You need to ask more precise questions than just a "why".

activatedgeek t1_j9lux6q wrote on February 22, 2023 at 10:22 PM

Reply to Unit Normalization instead of Cross-Entropy Loss [Discussion] by thomasahle

You are implying that the NN learns exp(logits) instead of the logits without really constraining the outputs to be positive. It probably won't be a proper scoring rule though might appear to work.

In some ways, this is similar to how you can learn classifiers with the mean squared error by regressing directly to the one-hot vector of class label (here also you don't care about positive output). It works, and also implies a proper scoring rule called the Brier score.

activatedgeek t1_j9lu7q7 wrote on February 22, 2023 at 10:18 PM

Reply to comment by GraciousReformer in [D] "Deep learning is the only thing that currently works at scale" by GraciousReformer

All model classes have inductive biases. e.g. random forests have the inductive bias of producing axis-aligned region splits. But clearly, that inductive bias is not good enough for image classification because a lot of information in the pixels is spatially correlated that axis-aligned regions cannot capture as specialized neural networks, under the same budget. By budget, I mean things like training time, model capacity, etc.

If we have infinite training time and infinite number of image samples, then probably random forests might be just as good as neural networks.

activatedgeek t1_j9lou0s wrote on February 22, 2023 at 9:44 PM

Reply to comment by ThrowRA39384749 in [D] Simple Questions Thread by AutoModerator

It is. See Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences

activatedgeek t1_j9lobdv wrote on February 22, 2023 at 9:41 PM

Reply to comment by theidiotrocketeer in [D] Simple Questions Thread by AutoModerator

It is not uncommon anymore to model images as patches of tokens, and then send in the sequence to a transformer-based model. So not psychotic at all.

See An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale.

activatedgeek t1_j9lnhvv wrote on February 22, 2023 at 9:36 PM

Reply to comment by chief167 in [D] "Deep learning is the only thing that currently works at scale" by GraciousReformer

See Theorem 2 (Page 34) of The Supervised Learning No-Free-Lunch Theorems.

It conditions "uniformly" averaged over all "f" the input-output mapping, i.e. the function that generates the dataset (this is a noise-free case). It also provides "uniformly averaged over all P(f)", a distribution over the data-generating functions.

So while you could still have different data-generating distributions P(f), the result is defined over all such distributions uniformly averaged.

The NFL is sort of a worst-case result, and I think it pretty meaningless and inconsequential for the real world.

Let me know if I have misinterpreted this!

activatedgeek t1_j9k58ev wrote on February 22, 2023 at 4:03 PM

Reply to comment by activatedgeek in [D] "Deep learning is the only thing that currently works at scale" by GraciousReformer

Not only dataset, the Transformer architecture itself seems to be amenable to in-context learning. See https://arxiv.org/abs/2209.11895

activatedgeek t1_j9k4z4o wrote on February 22, 2023 at 4:01 PM

Reply to comment by red75prime in [D] "Deep learning is the only thing that currently works at scale" by GraciousReformer

Very much indeed. See https://arxiv.org/abs/2205.05055

activatedgeek t1_j9jvj8h wrote on February 22, 2023 at 2:25 PM

Reply to [D] "Deep learning is the only thing that currently works at scale" by GraciousReformer

For generalization (performing well beyond the training), there’s at least two dimensions: flexibility and inductive biases.

Flexibility ensures that many functions “can” be approximated in principle. That’s the universal approximation theorem. It is a descriptive result and does not prescribe how to find that function. This is not something very unique to DL. Deep Random Forests, Fourier Bases, Polynomial Bases, Gaussian processes all are universal function approximators (with some extra technical details).

The part unique to DL is that somehow their inductive biases have helped match some of the complex structured problems including vision and language that makes them generalize well. Inductive bias is a loosely defined term. I can provide examples and references.

CNNs provide the inductive bias to prefer functions that handle translation equivariance (not exactly true but only roughly due to pooling layers). https://arxiv.org/abs/1806.01261

Graph neural networks provide a relational inductive bias. https://arxiv.org/abs/1806.01261

Neural networks overall prefer simpler solutions, embodying Occam’s razor, another inductive bias. This argument is made theoretically using Kolmogorov complexity. https://arxiv.org/abs/1805.08522

activatedgeek t1_j9jt721 wrote on February 22, 2023 at 2:08 PM

Reply to comment by chief167 in [D] "Deep learning is the only thing that currently works at scale" by GraciousReformer

I think the no free lunch theorem is misquoted here. The NFL also assumes that all datasets from the universe of datasets are equally likely. But that is objectively false. Structure is more likely than noise.