Comments

jobeta t1_iyl87di wrote

What kind of data is it?

2

Fun_Country_4193 OP t1_iyl8an7 wrote

It's all text data, made up of data from the Pile and some other datasets. It's about 1 TB in total, but you can train on randomly pulled batches from the overall set (about 2 GB), which works about as well as trying to train on the whole dataset.
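
For anyone wondering what pulling a random batch could look like, here is a minimal sketch. It assumes the corpus is split into shard files with known sizes; the shard names, the `sample_shards` helper, and the byte budget are all hypothetical, not the actual pipeline.

```python
import random


def sample_shards(shard_sizes, budget_bytes, seed=0):
    """Pick a random subset of shards whose total size fits in budget_bytes.

    shard_sizes: dict mapping a (hypothetical) shard name to its size in bytes.
    Returns the chosen shard names and their combined size.
    """
    rng = random.Random(seed)
    names = sorted(shard_sizes)  # sort first so the result depends only on the seed
    rng.shuffle(names)

    picked, total = [], 0
    for name in names:
        size = shard_sizes[name]
        # Skip any shard that would push the sample over the budget.
        if total + size <= budget_bytes:
            picked.append(name)
            total += size
    return picked, total


# Example: 50 made-up shards of 100 bytes each, with a 1,000-byte budget.
sizes = {f"shard_{i}": 100 for i in range(50)}
picked, total = sample_shards(sizes, budget_bytes=1000, seed=42)
```

Changing the seed gives a different random subset, so each training run can draw a fresh batch from the full set.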

1

jobeta t1_iyl8mib wrote

"Data from the pile"? Why don't you organize a Kaggle challenge?

2

Fun_Country_4193 OP t1_iyl9s1d wrote

I just checked, and the minimum cost is 50,000. I could probably do about 20k, but 50k is a lot.

1

jobeta t1_iym2upg wrote

Oh, OK. I guess they have some costs on their end too. What did you mean by "data from the pile"? I'm happy to give it a shot if you think ~1 GB of data would be enough.

1