JackBlemming t1_j39baqq wrote
Reply to comment by jrmylee in [D] I recently quit my job to start a ML company. Would really appreciate feedback on what we're working on. by jrmylee
Per 2. yes, exactly right. Some of my datasets are millions of images with metadata. As you can imagine, uploading and consuming this magnitude is slow and tedious, and then integrating it with the remote machine actually running the training script.
jrmylee OP t1_j39dpr0 wrote
Got it, appreciate it the feedback!
i_ikhatri t1_j3xlleh wrote
Just to add onto this feedback (because I think /u/JackBlemming is 100% correct) you would probably benefit from storing some of the most popular datasets (ImageNet, MS COCO, whatever is relevant to the fields you're targeting) somewhere in the cloud where you can provide fast read access (or fast copies) to any number of training workers that get spun up.
Research datasets tend to be fairly standardized so I think you could get a high amount of coverage by just having a few common datasets available. I only gave computer vision examples because that's what I'm most familiar with but if you get a few CV datasets, a few NLP ones etc. you should be able to provide a killer UX.
Bonus points if you're somehow able to configure the repos to read from the centralized datastore properly automatically (though this is probably difficult/impossible).
Viewing a single comment thread. View all comments