Submitted by GraciousReformer t3_118pof6 in MachineLearning
relevantmeemayhere t1_j9ij6cc wrote
Lol. The fact that we use general linear models in every scientific field, and have been for decades, should tell you all you need to know about this statement.
adventuringraw t1_j9in5sj wrote
I mean... the statement specifically uses the phrase 'arbitrary functions'. GLMs are a great tool in the toolbox, but the function family they optimize over is very far from 'arbitrary'.
I think the statement mostly means 'find very nonlinear functions of interest when dealing with very large numbers of samples from very high dimensional sample spaces'. GLMs are used in every scientific field, but certainly not for every application. Some form of deep learning really is still the only game in town for certain kinds of problems, at least.
relevantmeemayhere t1_j9kin48 wrote
I agree with you. I was just pointing out that it's foolish to say they're the only solution, as the quote implied.
The quote could also have been taken out of context, so grain of salt.
adventuringraw t1_j9ll3fp wrote
I can see how the quote could be made slightly more accurate. In particular, tabular data in general is still better tackled with something like XGBoost instead of deep learning, so deep learning certainly hasn't turned everything into a nail for one universal hammer yet.
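Roughly what I have in mind, as a toy sketch (synthetic data, not a real benchmark; assumes xgboost and scikit-learn are installed, and which model wins will depend on the data):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from xgboost import XGBClassifier

# synthetic "tabular" data: a handful of informative features, no spatial or sequential structure
X, y = make_classification(n_samples=20_000, n_features=30, n_informative=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

xgb = XGBClassifier(n_estimators=300, max_depth=6, learning_rate=0.1).fit(X_tr, y_tr)
mlp = MLPClassifier(hidden_layer_sizes=(128, 64), max_iter=300, random_state=0).fit(X_tr, y_tr)

print("xgboost test accuracy:", xgb.score(X_te, y_te))
print("mlp test accuracy:    ", mlp.score(X_te, y_te))
```

On data with image- or text-like structure the picture usually flips, which is kind of the point.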
Featureless_Bug t1_j9iy4yq wrote
Haven't heard of GLMs being successfully used for NLP or CV recently. And those are about the only things in ML that would be described as large scale. The statement is completely correct - even stuff like gradient boosting doesn't work at scale in that sense.
chief167 t1_j9kt5ho wrote
We use gradient boosting at quite a big scale. Not LLM big, but still big. It's just not NLP or CV at all. It's for fraud detection on large transactional tabular datasets, and it outperforms basically all neural network approaches, shallow or deep.
Featureless_Bug t1_j9kuu22 wrote
Large scale is somewhere north of 1-2 TB of data. Even if you had that much data, in most cases tabular data has such a simple structure that you wouldn't need that much of it to achieve the same performance - so I wouldn't call any kind of tabular data large scale, to be frank.
relevantmeemayhere t1_j9ki2x1 wrote
Because they are useful for some problems and not others, like every algorithm? Nowhere in my statement did I say they are monolithic in their use across all subdomains of ML.
The statement was that deep learning is the only thing that works at scale. It's not, lol. Deep learning struggles in a lot of situations.
Featureless_Bug t1_j9kvek5 wrote
Ok, name one large scale problem where GLMs are the best prediction algorithm possible.
relevantmeemayhere t1_j9kygtx wrote
Any problem where you want things like effect estimates, lol. Or error estimates. Or models that generate joint distributions.
So, literally a ton of them. Which industries don't like things like that?
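A minimal sketch of the kind of thing I mean, using statsmodels (made-up data and effect sizes, purely illustrative):

```python
import numpy as np
import statsmodels.api as sm

# toy question: how does exposure affect the odds of an event, controlling for age?
rng = np.random.default_rng(0)
n = 5_000
exposure = rng.normal(size=n)
age = rng.normal(size=n)
logit = -1.0 + 0.8 * exposure + 0.3 * age            # "true" effects
y = rng.binomial(1, 1 / (1 + np.exp(-logit)))

X = sm.add_constant(np.column_stack([exposure, age]))
fit = sm.GLM(y, X, family=sm.families.Binomial()).fit()

print(fit.params)      # effect estimates (log-odds scale)
print(fit.bse)         # standard errors
print(fit.conf_int())  # 95% confidence intervals
```

You get the effect estimates, their standard errors, and intervals directly from the fit - exactly the stuff stakeholders ask for.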
VirtualHat t1_j9j2gwx wrote
Large linear models tend not to scale well to large datasets if the solution is not in the model class. Because of this lack of expressivity, linear models tend to do poorly on complex problems.
relevantmeemayhere t1_j9khp8m wrote
As you mentioned, this is highly dependent on the functional relationship in the data.
You'd want to use domain knowledge to determine that.
Additionally, nonlinear models tend to have their own drawbacks - lack of interpretability and high variance being some of them.
GraciousReformer OP t1_j9j7iwm wrote
>Large linear models tend not to scale well to large datasets if the solution is not in the model class
Could you provide me a reference?
VirtualHat t1_j9j8805 wrote
Linear models assume the solution is of the form y = ax + b. If the solution is not of this form, then the best fit is likely to be a poor one.
I think Emma Brunskill's notes explain this quite well. Essentially, the model will underfit because it is too simple. I am making an assumption, though, that a large dataset implies a more complex, non-linear solution, but this is generally the case.
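A quick sketch of the underfitting point (toy data; the boosted trees are just a stand-in for any more expressive model class):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import GradientBoostingRegressor

# toy target that is NOT of the form y = ax + b
rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=(2_000, 1))
y = np.sin(2 * x[:, 0]) + 0.1 * rng.normal(size=2_000)

lin = LinearRegression().fit(x, y)
gbr = GradientBoostingRegressor().fit(x, y)

print("linear   R^2:", lin.score(x, y))   # stays low no matter how much data you add
print("boosting R^2:", gbr.score(x, y))   # can actually represent the curve
```

More data never rescues the linear fit here, because sin(2x) simply isn't in its hypothesis space.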
relevantmeemayhere t1_j9kifhu wrote
Linear models are often preferred for the reasons you mentioned. Underfitting is almost always preferred to overfitting.
VirtualHat t1_j9ll5i2 wrote
Yes, that's right. For many problems, a linear model is just what you want. I guess what I'm saying is that the dividing line between when a linear model is appropriate vs when you want a more expressive model is often related to how much data you have.
GraciousReformer OP t1_j9j8bsl wrote
Thank you. I understand the math. But I meant a real-world example where "the solution is not in the model class."
VirtualHat t1_j9j8uvr wrote
For example, in the Iris dataset, the class label is not a linear combination of the inputs. Therefore, if your model class is all linear models, you won't find the optimal solution - or, in this case, even a good one.
If you extend the model class to include non-linear functions, then your hypothesis space at least contains a good solution, but finding it might be a bit more tricky.
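If you want to check this sort of thing empirically, something like the following works (sketch only - how big the gap actually is will depend on the dataset):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

X, y = load_iris(return_X_y=True)

linear = LogisticRegression(max_iter=1_000)                                        # linear decision boundaries
nonlin = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2_000, random_state=0)   # nonlinear boundaries

print("logistic regression:", cross_val_score(linear, X, y, cv=5).mean())
print("small MLP:          ", cross_val_score(nonlin, X, y, cv=5).mean())
```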
GraciousReformer OP t1_j9jgdmc wrote
But DL is not a linear model. Then what is the limit of DL?
terminal_object t1_j9jp51j wrote
You seem confused as to what you yourself are saying.
GraciousReformer OP t1_j9jppu7 wrote
"Artificial neural networks are often (demeneangly) called "glorified regressions". The main difference between ANNs and multiple / multivariate linear regression is of course, that the ANN models nonlinear relationships."
PHEEEEELLLLLEEEEP t1_j9k691x wrote
Regression doesn't just mean linear regression, if that's what you're confused about.
Acrobatic-Book t1_j9k94l4 wrote
The simplest example is the XOR problem (exclusive or). It's also why multilayer perceptrons, the basis of deep learning, were created in the first place: a linear model cannot solve it.
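A concrete sketch: no linear boundary can get all four XOR points right, but a tiny two-layer network with hand-picked weights solves it exactly:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])  # XOR

# a linear classifier can classify at most 3 of the 4 points correctly
lin = LogisticRegression().fit(X, y)
print("linear accuracy:", lin.score(X, y))  # at most 0.75

# a 2-unit ReLU hidden layer with hand-picked weights gets XOR exactly
W1 = np.array([[1, 1], [1, 1]]); b1 = np.array([0, -1])
w2 = np.array([1, -2])
hidden = np.maximum(0, X @ W1 + b1)   # ReLU hidden layer
print("MLP output:", hidden @ w2)     # [0, 1, 1, 0]
```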
VirtualHat t1_j9lkto4 wrote
Oh wow, super weird to be downvoted just for asking for a reference. r/MachineLearning isn't what it used to be I guess, sorry about that.
BoiElroy t1_j9ioqcz wrote
Yeah, you should always exhaust existing classical methods before reaching for deep learning.
[deleted] t1_j9ijb65 wrote
[deleted]
relevantmeemayhere t1_j9ilsax wrote
?
sdmat t1_j9izc3q wrote
Not exactly but close enough?
Fancy-Jackfruit8578 t1_j9j5v25 wrote
Because every NN is just basically a big linear function… with a nonlinearity at the end.
hpstring t1_j9jbhwv wrote
This is correct for two-layer NNs, not general NNs.
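To make that concrete (numpy sketch): once there's more than one hidden layer, the nonlinearities are interleaved between the affine maps, so the network can't be rewritten as one big linear function followed by a single nonlinearity:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(0, z)

x = rng.normal(size=(4, 8))
W1, W2, W3 = (rng.normal(size=(8, 8)) for _ in range(3))

shallow = relu(x @ W1)                     # one affine map, then one nonlinearity
deep = relu(relu(relu(x @ W1) @ W2) @ W3)  # a nonlinearity after every affine map
```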
TeamRocketsSecretary t1_j9jemik wrote
Wrong.