TheBrain85 t1_j7fymn9 wrote on February 6, 2023 at 2:58 PM

Reply to comment by Benediktas in [OC] European attitudes towards Muslims and Jews by Udzu

The log scale is also a silly choice for this data.

TheBrain85 t1_iyp1qrz wrote on December 3, 2022 at 1:17 AM

Reply to comment by SherbertTiny2366 in [R] Statistical vs Deep Learning forecasting methods by fedegarzar

Because if there's overlap between the datasets, or they contain similar data, the exact ensemble you use is essentially a hyperparameter optimized for that specific data. That is exactly why any hyperparameter optimization is done with cross-validation on a set separate from the test set. So picking the ensemble based on its M4 results and then evaluating it on M3 is akin to optimizing hyperparameters on the test set, which is a form of overfitting.

The datasets are from the same author, same series of competitions: https://en.wikipedia.org/wiki/Makridakis_Competitions#Fourth_competition,_started_on_January_1,_2018,_ended_on_May_31,_2018

"The M4 extended and replicated the results of the previous three competitions"
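
To make the selection argument above concrete, here is a toy simulation. It is only a sketch under made-up assumptions, not a model of the actual M3/M4 data: every candidate "ensemble" has the same true error of 1.0, its measured error on each dataset is that plus Gaussian noise, and a hypothetical `overlap` parameter sets how correlated the noise is between the selection dataset (A) and the evaluation dataset (B). The function name and all numbers are illustrative.

```python
import random
import statistics


def selection_bias(overlap, n_candidates=50, n_trials=2000, seed=0):
    """Mean measured error on dataset B of the candidate that scored best on A.

    Every candidate has a true error of exactly 1.0, so any result below 1.0
    is selection bias, not skill. `overlap` in [0, 1] is the correlation
    between a candidate's luck on A and its luck on B (a stand-in for shared
    or similar series in the two datasets).
    """
    rng = random.Random(seed)
    picked_errors_on_b = []
    for _ in range(n_trials):
        best_a, best_b = None, None
        for _ in range(n_candidates):
            luck_a = rng.gauss(0.0, 0.1)   # noise in the score on A
            fresh = rng.gauss(0.0, 0.1)    # noise unique to B
            # Mix shared and fresh noise so corr(luck_a, luck_b) == overlap.
            luck_b = overlap * luck_a + (1 - overlap**2) ** 0.5 * fresh
            err_a, err_b = 1.0 + luck_a, 1.0 + luck_b
            # "Pick the best ensemble" using dataset A only.
            if best_a is None or err_a < best_a:
                best_a, best_b = err_a, err_b
        picked_errors_on_b.append(best_b)
    return statistics.mean(picked_errors_on_b)


if __name__ == "__main__":
    print("independent datasets:", round(selection_bias(overlap=0.0), 3))
    print("overlapping datasets:", round(selection_bias(overlap=0.8), 3))
```

In this toy setup, the winner picked on A should score close to its true error of 1.0 on a fully independent B, but noticeably better than 1.0 when the two datasets are correlated, even though no candidate has any real advantage. That gap is the "optimizing on the test set" effect.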

TheBrain85 t1_iymw874 wrote on December 2, 2022 at 4:11 PM

Reply to comment by SherbertTiny2366 in [R] Statistical vs Deep Learning forecasting methods by fedegarzar

Pretty biased selection method: the best ensemble in the M4 competition, evaluated on the M3 competition. Although I'm not familiar with these datasets, they're from the same author, so presumably they have significant overlap and similarity. The real question is how hard it is to find such an ensemble without overfitting to the dataset.