Submitted by elf7979 t3_10de78o in deeplearning
elf7979 OP t1_j4o90u3 wrote
Reply to comment by suflaj in Is 100 mega byte text corpus big enough to train? by elf7979
I think transcripts from companies' conference calls have certain characteristics of their own, since business professionals may use particular verbs or expressions. I haven't checked out the w2v datasets you mentioned yet. Is there an existing corpus that's business-oriented?

What if the dataset size increases to 1 gigabyte? Would that be big enough?
suflaj t1_j4pdd6o wrote
You're closer, but not quite there yet: the smaller Google News dataset that W2V is trained on is 10 GB. The full one is around 300 GB, IIRC.