Submitted by regenerated_lawyer t3_zc7oar in MachineLearning
Hi all,
I am working on fine-tuning some language models for tasks specific to current news, and I was wondering if there are any good datasets of articles from a variety of top publications like NYT, WSJ, The Economist, etc. I haven't been able to find a varied dataset like that so far.
beezlebub33 t1_iywqxfg wrote
It will be hard to find a free source of data for publications like that, because the publications themselves are not free.
However, there is GDELT: https://www.gdeltproject.org/ (Wikipedia: https://en.wikipedia.org/wiki/GDELT_Project). It's a project that monitors news from a wide variety of sources and collects events and other data from them.
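If you go the GDELT route, here is a minimal sketch of pulling article lists for specific outlets. It assumes GDELT's DOC 2.0 API endpoint (`https://api.gdeltproject.org/api/v2/doc/doc`) with `mode=ArtList` and `format=json`, and that its `domain:` query filter works as I remember. The helper names (`build_query_url`, `fetch`, `extract_articles`) are hypothetical, so check the GDELT docs before relying on any of this. GDELT gives you metadata and links, not the full article text, so for paywalled outlets you would still need licensed access to the text itself.

```python
import json
import urllib.parse
import urllib.request

# Assumed GDELT DOC 2.0 API endpoint; verify against the GDELT documentation.
DOC_API = "https://api.gdeltproject.org/api/v2/doc/doc"


def build_query_url(keywords: str, domains: list[str], max_records: int = 75) -> str:
    """Build a (hypothetical helper) ArtList query restricted to publisher domains."""
    # OR-ed terms in GDELT queries are assumed to need surrounding parentheses.
    domain_filter = " OR ".join(f"domain:{d}" for d in domains)
    if len(domains) > 1:
        domain_filter = f"({domain_filter})"
    params = {
        "query": f"{keywords} {domain_filter}".strip(),
        "mode": "ArtList",
        "format": "json",
        "maxrecords": str(max_records),
    }
    return f"{DOC_API}?{urllib.parse.urlencode(params)}"


def fetch(url: str) -> dict:
    """Download and decode one JSON response (requires network access)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


def extract_articles(payload: dict) -> list[dict]:
    """Keep the fields useful for a dataset; these are links, not article bodies."""
    return [
        {"domain": a.get("domain"), "title": a.get("title"), "url": a.get("url")}
        for a in payload.get("articles", [])
    ]
```

Usage would be something like `extract_articles(fetch(build_query_url("inflation", ["nytimes.com", "wsj.com", "economist.com"])))`, run over many keywords and date ranges to build up a varied list of URLs.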