proto-n t1_j30xu2t wrote

Is there anywhere one can download the training dataset for GPT-2 (or an equivalent)? Or do you have to crawl it yourself for legal reasons?

Nvm, found some after an hour of searching: Common Crawl, OpenWebText2, The Pile.
