In our understanding, choosing the right Machine Learning model is one thing; choosing the right attributes is another. The good detection accuracy of the Bayesian classifier on this dataset comes down to the type of study we did. Most past papers extract linguistic features from the article text; some approach the problem from a social media perspective, i.e., looking at Twitter profiles and classifying tweets as fake or real on that basis.
Month is not the ONLY attribute we used; the type of news (Political, World News, US News) was another factor.
Choosing the right model, the right attributes, and the right methodology is a specific skill. Models based on linguistic-feature extraction, for example, are more complicated in nature, yet in most of the previous work we saw they still cannot discern real news from fake news very well... the accuracy is in the 70s. For us, getting the right performance out of the right selection of attributes was critical, and we feel we have done a decent job at that.
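As a rough illustration of how a Bayesian classifier can combine simple categorical attributes like publication month and news type, here is a minimal Naive Bayes sketch. The data, feature names, and labels below are synthetic and purely illustrative; this is a generic sketch of the technique, not the pipeline or dataset from the preprint.

```python
# Generic categorical Naive Bayes over two attributes: month and news type.
# The toy data at the bottom is synthetic, NOT the paper's dataset.
from collections import defaultdict
import math

def train_nb(rows, labels, alpha=1.0):
    """Fit class priors and per-feature value counts (Laplace-smoothed)."""
    classes = sorted(set(labels))
    priors = {c: labels.count(c) / len(labels) for c in classes}
    counts = {c: [defaultdict(int) for _ in rows[0]] for c in classes}
    totals = {c: 0 for c in classes}
    for row, c in zip(rows, labels):
        totals[c] += 1
        for i, value in enumerate(row):
            counts[c][i][value] += 1
    # Distinct values seen per feature, for the smoothing denominator.
    vocab = [{row[i] for row in rows} for i in range(len(rows[0]))]
    return priors, counts, totals, vocab, alpha

def predict_nb(model, row):
    """Return the class with the highest log-posterior for one row."""
    priors, counts, totals, vocab, alpha = model
    best, best_lp = None, float("-inf")
    for c in priors:
        lp = math.log(priors[c])
        for i, value in enumerate(row):
            num = counts[c][i][value] + alpha
            den = totals[c] + alpha * len(vocab[i])
            lp += math.log(num / den)
        if lp > best_lp:
            best, best_lp = c, lp
    return best

# Synthetic toy data: (month, news_type) -> fake/real
rows = [("oct", "Political"), ("nov", "Political"), ("oct", "Political"),
        ("mar", "World News"), ("apr", "US News"), ("may", "World News")]
labels = ["fake", "fake", "fake", "real", "real", "real"]
model = train_nb(rows, labels)
print(predict_nb(model, ("oct", "Political")))   # -> fake (on this toy data)
print(predict_nb(model, ("mar", "World News")))  # -> real (on this toy data)
```

The point of the sketch is that even coarse categorical attributes can carry a strong signal once the class-conditional counts differ enough; no linguistic features are needed for the model itself to work.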
The "why" should be left to your interpretation. I have already said what I said: it is political in nature. More than that we cannot say.
I share your concern. In this case, the date is the article's publication date or last-modified date; it is not the date on which the data collector found the article.
The reason we didn't delve into the "why" in the research paper itself is that it is meant to be a Machine Learning paper with a scientific community reading it. Our group of writers is of the opinion that the reasoning is highly political in nature.
loosefer2905 OP t1_iqw45vd wrote
Reply to comment by KellinPelrine in [R] An easy-to-read preprint on Fake News Detection during US 2016 elections - Accuracy of 95%+ by loosefer2905