Submitted by GraciousReformer t3_118pof6 in MachineLearning
relevantmeemayhere t1_j9ij6cc wrote
Lol. The fact that we use general linear models in every scientific field, and have been for decades, should tell you all you need to know about this statement.
adventuringraw t1_j9in5sj wrote
I mean... the statement specifically uses the phrase 'arbitrary functions'. GLMs are a great tool in the toolbox, but the function family they optimize over is very far from 'arbitrary'.
I think the statement mostly means 'find very nonlinear functions of interest when dealing with very large numbers of samples from very high dimensional sample spaces'. GLMs are used in every scientific field, but certainly not for every application. Some form of deep learning really is still the only game in town for certain kinds of problems, at least.
relevantmeemayhere t1_j9kin48 wrote
I agree with you. I was just pointing out that it's foolish to say they're the only solution, as the quote implied.
The quote could also have been taken out of context, so grain of salt.
adventuringraw t1_j9ll3fp wrote
I can see how the quote could be made slightly more accurate. In particular, tabular data in general is still better tackled with something like XGBoost instead of deep learning, so deep learning certainly hasn't turned everything into a nail for one universal hammer yet.
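Roughly what I have in mind, as a toy sketch (synthetic data, not a real benchmark; assumes xgboost and scikit-learn are installed, and which model wins will depend on the data):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from xgboost import XGBClassifier

# synthetic "tabular" data: a handful of informative features, no spatial or sequential structure
X, y = make_classification(n_samples=20_000, n_features=30, n_informative=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

xgb = XGBClassifier(n_estimators=300, max_depth=6, learning_rate=0.1).fit(X_tr, y_tr)
mlp = MLPClassifier(hidden_layer_sizes=(128, 64), max_iter=300, random_state=0).fit(X_tr, y_tr)

print("xgboost test accuracy:", xgb.score(X_te, y_te))
print("mlp test accuracy:    ", mlp.score(X_te, y_te))
```

On data with image- or text-like structure the picture usually flips, which is kind of the point.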
Featureless_Bug t1_j9iy4yq wrote
Haven't heard of GLMs being successfully used for NLP or CV recently. And those are about the only things in ML that would be described as large scale. The statement is completely correct - even stuff like gradient boosting doesn't work at scale in that sense.
chief167 t1_j9kt5ho wrote
We use gradient boosting at quite a big scale. Not LLM big, but still big. It's just not NLP or CV at all. It's for fraud detection on large transactional tabular datasets, and it outperforms basically all neural network approaches, shallow or deep.
Featureless_Bug t1_j9kuu22 wrote
Large scale is somewhere north of 1-2 TB of data. Even if you had that much data, in most cases tabular data has such a simple structure that you wouldn't need that much of it to achieve the same performance - so I wouldn't call any kind of tabular data large scale, to be frank.
relevantmeemayhere t1_j9ki2x1 wrote
Because they are useful for some problems and not others, like every algorithm? Nowhere in my statement did I say they are monolithic in their use across all subdomains of ML.
The statement was that deep learning is the only thing that works at scale. It's not, lol. Deep learning struggles in a lot of situations.
Featureless_Bug t1_j9kvek5 wrote
Ok, name one large scale problem where GLMs are the best prediction algorithm possible.
relevantmeemayhere t1_j9kygtx wrote
Any problem where you want things like effect estimates, lol. Or error estimates. Or models that generate joint distributions.
So, literally a ton of them. Which industries don't like things like that?
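A minimal sketch of the kind of thing I mean, using statsmodels (made-up data and effect sizes, purely illustrative):

```python
import numpy as np
import statsmodels.api as sm

# toy question: how does exposure affect the odds of an event, controlling for age?
rng = np.random.default_rng(0)
n = 5_000
exposure = rng.normal(size=n)
age = rng.normal(size=n)
logit = -1.0 + 0.8 * exposure + 0.3 * age            # "true" effects
y = rng.binomial(1, 1 / (1 + np.exp(-logit)))

X = sm.add_constant(np.column_stack([exposure, age]))
fit = sm.GLM(y, X, family=sm.families.Binomial()).fit()

print(fit.params)      # effect estimates (log-odds scale)
print(fit.bse)         # standard errors
print(fit.conf_int())  # 95% confidence intervals
```

You get the effect estimates, their standard errors, and intervals directly from the fit - exactly the stuff stakeholders ask for.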
VirtualHat t1_j9j2gwx wrote
Large linear models tend not to scale well to large datasets if the solution is not in the model class. Because of this lack of expressivity, linear models tend to do poorly on complex problems.
relevantmeemayhere t1_j9khp8m wrote
As you mentioned, this is highly dependent on the functional relationship in the data.
You'd want to use domain knowledge to determine that.
Additionally, nonlinear models tend to have their own drawbacks - lack of interpretability and high variance being some of them.
GraciousReformer OP t1_j9j7iwm wrote
>Large linear models tend not to scale well to large datasets if the solution is not in the model class
Could you provide me a reference?
VirtualHat t1_j9j8805 wrote
Linear models assume the solution is of the form y = ax + b. If the solution is not of this form, then the best fit is likely to be a poor one.
I think Emma Brunskill's notes explain this quite well. Essentially, the model will underfit because it is too simple. I am making an assumption, though, that a large dataset implies a more complex, non-linear solution, but this is generally the case.
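A quick sketch of the underfitting point (toy data; the boosted trees are just a stand-in for any more expressive model class):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import GradientBoostingRegressor

# toy target that is NOT of the form y = ax + b
rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=(2_000, 1))
y = np.sin(2 * x[:, 0]) + 0.1 * rng.normal(size=2_000)

lin = LinearRegression().fit(x, y)
gbr = GradientBoostingRegressor().fit(x, y)

print("linear   R^2:", lin.score(x, y))   # stays low no matter how much data you add
print("boosting R^2:", gbr.score(x, y))   # can actually represent the curve
```

More data never rescues the linear fit here, because sin(2x) simply isn't in its hypothesis space.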
relevantmeemayhere t1_j9kifhu wrote
Linear models are often preferred for the reasons you mentioned. Underfitting is almost always preferred to overfitting.
VirtualHat t1_j9ll5i2 wrote
Yes, that's right. For many problems, a linear model is just what you want. I guess what I'm saying is that the dividing line between when a linear model is appropriate vs when you want a more expressive model is often related to how much data you have.
GraciousReformer OP t1_j9j8bsl wrote
Thank you. I understand the math. But I meant a real-world example where "the solution is not in the model class."
VirtualHat t1_j9j8uvr wrote
For example, in the Iris dataset, the class label is not a linear combination of the inputs. Therefore, if your model class is all linear models, you won't find the optimal solution - or, in this case, even a good one.
If you extend the model class to include non-linear functions, then your hypothesis space at least contains a good solution, but finding it might be a bit more tricky.
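If you want to check this sort of thing empirically, something like the following works (sketch only - how big the gap actually is will depend on the dataset):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

X, y = load_iris(return_X_y=True)

linear = LogisticRegression(max_iter=1_000)                                        # linear decision boundaries
nonlin = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2_000, random_state=0)   # nonlinear boundaries

print("logistic regression:", cross_val_score(linear, X, y, cv=5).mean())
print("small MLP:          ", cross_val_score(nonlin, X, y, cv=5).mean())
```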
GraciousReformer OP t1_j9jgdmc wrote
But DL is not a linear model. Then what is the limit of DL?
terminal_object t1_j9jp51j wrote
You seem confused as to what you yourself are saying.
GraciousReformer OP t1_j9jppu7 wrote
"Artificial neural networks are often (demeneangly) called "glorified regressions". The main difference between ANNs and multiple / multivariate linear regression is of course, that the ANN models nonlinear relationships."
PHEEEEELLLLLEEEEP t1_j9k691x wrote
Regression doesn't just mean linear regression, if that's what you're confused about.
Acrobatic-Book t1_j9k94l4 wrote
The simplest example is the XOR problem (exclusive or). It's also why multilayer perceptrons, the basis of deep learning, were created in the first place: a linear model cannot solve it.
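A concrete sketch: no linear boundary can get all four XOR points right, but a tiny two-layer network with hand-picked weights solves it exactly:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])  # XOR

# a linear classifier can classify at most 3 of the 4 points correctly
lin = LogisticRegression().fit(X, y)
print("linear accuracy:", lin.score(X, y))  # at most 0.75

# a 2-unit ReLU hidden layer with hand-picked weights gets XOR exactly
W1 = np.array([[1, 1], [1, 1]]); b1 = np.array([0, -1])
w2 = np.array([1, -2])
hidden = np.maximum(0, X @ W1 + b1)   # ReLU hidden layer
print("MLP output:", hidden @ w2)     # [0, 1, 1, 0]
```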
VirtualHat t1_j9lkto4 wrote
Oh wow, super weird to be downvoted just for asking for a reference. r/MachineLearning isn't what it used to be I guess, sorry about that.
BoiElroy t1_j9ioqcz wrote
Yeah, you should always exhaust existing classical methods before reaching for deep learning.
[deleted] t1_j9ijb65 wrote
[deleted]
relevantmeemayhere t1_j9ilsax wrote
?
sdmat t1_j9izc3q wrote
Not exactly but close enough?
Fancy-Jackfruit8578 t1_j9j5v25 wrote
Because every NN is just basically a big linear function… with a nonlinearity at the end.
hpstring t1_j9jbhwv wrote
This is correct for two-layer NNs, not general NNs.
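To make that concrete (numpy sketch): once there's more than one hidden layer, the nonlinearities are interleaved between the affine maps, so the network can't be rewritten as one big linear function followed by a single nonlinearity:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(0, z)

x = rng.normal(size=(4, 8))
W1, W2, W3 = (rng.normal(size=(8, 8)) for _ in range(3))

shallow = relu(x @ W1)                     # one affine map, then one nonlinearity
deep = relu(relu(relu(x @ W1) @ W2) @ W3)  # a nonlinearity after every affine map
```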
TeamRocketsSecretary t1_j9jemik wrote
Wrong.