
DreamMidnight t1_jc5om1y wrote

What is the basis of this rule of thumb in regression:

"a minimum of ten observations per predictor variable is required"?

What is the origin of this idea?

3

LeN3rd t1_jcgqzvo wrote

If you have more variables than datapoints, you will run into problems: your model starts memorizing the training data instead of learning from it. Your model overfits to the training data: https://en.wikipedia.org/wiki/Overfitting

You can either reduce the number of parameters in your model, or apply a prior (a constraint on your model parameters) to improve test dataset performance.
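For intuition, here's a rough sketch of that (assuming scikit-learn; the numbers are made up): with more predictors than training points, plain least squares nails the training data but generalizes badly, while a ridge penalty (a prior/constraint on the weights) does noticeably better on held-out data.

```python
# Sketch: ordinary least squares vs. a ridge penalty when there are
# more predictors than data points (illustrative numbers only).
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
n_train, n_test, n_features = 20, 200, 50   # more predictors than training points

X = rng.normal(size=(n_train + n_test, n_features))
true_coef = np.zeros(n_features)
true_coef[:5] = 1.0                          # only 5 predictors actually matter
y = X @ true_coef + rng.normal(scale=0.5, size=n_train + n_test)

X_tr, X_te = X[:n_train], X[n_train:]
y_tr, y_te = y[:n_train], y[n_train:]

ols = LinearRegression().fit(X_tr, y_tr)     # fits the training data (almost) perfectly
ridge = Ridge(alpha=10.0).fit(X_tr, y_tr)    # constraint on the model parameters

print("OLS   train/test R^2:", ols.score(X_tr, y_tr), r2_score(y_te, ols.predict(X_te)))
print("Ridge train/test R^2:", ridge.score(X_tr, y_tr), r2_score(y_te, ridge.predict(X_te)))
```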

Since neural networks (the standard empirical machine learning tools nowadays) impose structure on their parameters, they can have many more parameters than simple linear regression models, but they still seem to run into problems when the number of parameters in the network matches the number of datapoints. This is only shown empirically; I do not know of any mathematical proofs for it.

1

DreamMidnight t1_jchxtfy wrote

Yes, although I am specifically asking about the reasoning behind "at least 10 datapoints per variable."

What is the mathematical reasoning of this minimum?

1

LeN3rd t1_jcislrk wrote

I have not heard this before. Where is it from? I know that you should have more datapoints than parameters in classical models.

1

LeN3rd t1_jct6arv wrote

Ok, so all of these are linear (or logistic) regression models, for which it makes sense to have more data points, because the weights aren't as constrained as in, e.g., a convolutional layer. But it is still a rule of thumb, not exactly a proof.

1

VS2ute t1_jd1irhb wrote

If you have random noise on a variable, it can have a substantial effect on the fit when there are too few samples.
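A quick illustration (plain NumPy, made-up numbers): fit a slope to noisy data over and over, and look at how much the estimate jumps around with 5 samples versus 50.

```python
# Sketch: spread of a fitted slope under noise, small vs. larger sample.
import numpy as np

rng = np.random.default_rng(1)

def slope_spread(n, trials=2000, noise=1.0):
    slopes = []
    for _ in range(trials):
        x = rng.normal(size=n)
        y = 2.0 * x + rng.normal(scale=noise, size=n)   # true slope = 2
        slope, _ = np.polyfit(x, y, 1)                  # [slope, intercept]
        slopes.append(slope)
    return np.std(slopes)

print("std of slope estimate, n=5: ", slope_spread(5))
print("std of slope estimate, n=50:", slope_spread(50))
```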

1

jakderrida t1_jcotnis wrote

The basis of this rule of thumb is that having too few observations relative to the number of predictor variables can lead to unstable estimates of the model parameters, making it difficult to generalize to new data. In particular, if the number of observations is small relative to the number of predictor variables, the model may fit the noise in the data rather than the underlying signal, leading to overfitting.
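A rough sketch of that instability (assuming scikit-learn; the setup is made up): with only 2 observations per predictor the estimated coefficients land far from the true ones on average, while at 10 observations per predictor they are much more stable.

```python
# Sketch: coefficient instability at ~2 vs. ~10 observations per predictor.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
n_features = 10
true_coef = rng.normal(size=n_features)

def coef_error(obs_per_predictor, trials=500):
    errors = []
    for _ in range(trials):
        n = obs_per_predictor * n_features
        X = rng.normal(size=(n, n_features))
        y = X @ true_coef + rng.normal(size=n)
        fit = LinearRegression().fit(X, y)
        errors.append(np.linalg.norm(fit.coef_ - true_coef))
    return np.mean(errors)

print("mean coefficient error, 2 obs/predictor: ", coef_error(2))
print("mean coefficient error, 10 obs/predictor:", coef_error(10))
```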

1