Submitted by GraciousReformer t3_118pof6 in MachineLearning
chief167 t1_j9jev01 wrote
Define scale
Language models? Sure. Images? Sure. Huge amounts of transaction data to search for fraud? Xgboost all the way lol.
The no free lunch theorem: there is no single approach that is best for every possible problem. Jeez, I hate it when marketing takes over. You learn this principle in the first chapter of literally every data course.
activatedgeek t1_j9jt721 wrote
I think the no free lunch theorem is misquoted here. The NFL also assumes that all datasets from the universe of datasets are equally likely. But that is objectively false. Structure is more likely than noise.
chief167 t1_j9ku5mq wrote
I don't think it implies that all datasets are equally likely. I think it only implies that given all possible datasets, there is no best approach to modelling them. All possible != All are equally likely
But I don't have my book with me, and I don't trust the internet, since searches seem to lead to random blog posts instead of the original paper (Wikipedia gave a 404 in the footnotes).
activatedgeek t1_j9lnhvv wrote
See Theorem 2 (Page 34) of The Supervised Learning No-Free-Lunch Theorems.
Theorem 2 averages "uniformly" over all "f", the input-output mappings, i.e. the functions that generate the dataset (this is the noise-free case). It also gives a version "uniformly averaged over all P(f)", i.e. over distributions on the data-generating functions.
So while you could still have different data-generating distributions P(f), the result is defined over all such distributions uniformly averaged.
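To make the uniform-averaging point concrete, here is a small self-contained sketch (the domain, the toy learner, and all names are my own invention, not from the paper): if you enumerate every possible labeling f of a tiny domain and average off-training-set accuracy uniformly over them, any fixed learner lands at exactly chance level:

```python
from itertools import product

# Tiny domain: 4 points; train on the first 3, test on the held-out one.
X = [0, 1, 2, 3]
train, test = X[:3], X[3:]

def memorize_and_guess_zero(labels_on_train):
    # A toy "learner": memorizes training labels, predicts 0 elsewhere.
    def predict(x):
        return labels_on_train.get(x, 0)
    return predict

# Average off-training-set accuracy uniformly over all 2^4 labelings f.
total = 0.0
count = 0
for f in product([0, 1], repeat=len(X)):
    labels = dict(zip(X, f))
    h = memorize_and_guess_zero({x: labels[x] for x in train})
    acc = sum(h(x) == labels[x] for x in test) / len(test)
    total += acc
    count += 1

print(total / count)  # 0.5: exactly chance off the training set
```

Swapping in any other deterministic learner gives the same 0.5, which is the "uniformly averaged over all f" claim; the real-world escape hatch is that actual data-generating functions are not uniformly distributed.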
The NFL is sort of a worst-case result, and I think it's pretty meaningless and inconsequential for the real world.
Let me know if I have misinterpreted this!
GraciousReformer OP t1_j9jfvfy wrote
Then what will be the limitation of transformers?
LowLook t1_j9jprdu wrote
Inventing them
GraciousReformer OP t1_j9jpu94 wrote
?