Submitted by redditnit21 t3_xzkmlr in MachineLearning
PassionatePossum t1_irmr78w wrote
You seem to be quite new at this (no offense, but otherwise you wouldn't be asking for code for such a trivial task), so I'd like to give you some advice on how to do this right. Others have already told you how to implement a random split, which is generally good advice. However, it rests on the assumption that the images are not somehow correlated with one another.
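For reference, a random split boils down to something like the following. This is a minimal sketch using only the Python standard library; the function name `random_split` and the file names are made up for illustration. It is only valid if the images are independent of one another.

```python
import random

def random_split(items, test_fraction=0.2, seed=0):
    """Shuffle a copy of `items` and cut it into (train, test) lists.

    Only appropriate when the items are independent of one another.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    images = [f"img_{i:03d}.jpg" for i in range(10)]
    train, test = random_split(images, test_fraction=0.2)
    print(len(train), len(test))  # 8 2
```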
I've actually seen people take video frames (and of course each frame looks almost the same as the previous one), randomly sample them into training/test sets, and then brag about their incredibly good performance. Of course any performance measurement on such a dataset is worthless, because the test set is full of near-duplicates of the training images.
So think carefully about how you sample your training/test data, i.e. whether the training, validation and test sets are actually independent of one another.
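One cheap sanity check, assuming you can trace every image back to its source (e.g. the video a frame came from), is to verify that no source shows up in more than one split. The sketch below is hypothetical: `shared_sources` and the `source_of` callback are illustrative names, and the file-naming scheme in the example is an assumption.

```python
from collections import defaultdict

def shared_sources(splits, source_of):
    """Return {source: set of split names} for sources found in more than one split.

    splits:    dict mapping split name -> list of items
    source_of: function mapping an item to the source it came from
    """
    seen = defaultdict(set)  # source -> names of the splits it appears in
    for split_name, items in splits.items():
        for item in items:
            seen[source_of(item)].add(split_name)
    return {src: names for src, names in seen.items() if len(names) > 1}


if __name__ == "__main__":
    # Hypothetical naming scheme "<video>_<frame>": the video ID is the source.
    splits = {
        "train": ["vidA_f1", "vidA_f2", "vidB_f1"],
        "test": ["vidA_f3", "vidC_f1"],
    }
    leaks = shared_sources(splits, lambda name: name.split("_")[0])
    print(sorted(leaks))  # ['vidA']
```

An empty result does not prove independence (frames could be correlated in ways the naming doesn't capture), but a non-empty one is a clear sign of leakage.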
So under the assumption that the images are independent of one another, a random split would be a good idea. If that isn't the case (and without more information, nobody here can tell you whether it is), you need some other way to split the data, e.g. by video, so that all frames of one video end up in the same set.
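Splitting by video can be sketched as a group-wise split: shuffle the video IDs rather than the individual frames, then assign each whole video to either train or test. This is a minimal standard-library sketch; `group_split`, the `group_of` callback and the frame-naming scheme are assumptions for illustration. Note that `test_fraction` here counts videos, not frames. scikit-learn's `GroupShuffleSplit` offers a ready-made version of the same idea.

```python
import random
from collections import defaultdict

def group_split(items, group_of, test_fraction=0.2, seed=0):
    """Split items so that each group lands entirely in train or entirely in test.

    test_fraction is the fraction of *groups* (e.g. videos) put into the test set.
    """
    groups = defaultdict(list)
    for item in items:
        groups[group_of(item)].append(item)
    if len(groups) < 2:
        raise ValueError("need at least two groups to make a split")
    group_ids = sorted(groups)  # sort first so the seeded shuffle is reproducible
    random.Random(seed).shuffle(group_ids)
    # At least one test group, and at least one group left for training.
    n_test = min(len(group_ids) - 1, max(1, round(len(group_ids) * test_fraction)))
    test_ids = set(group_ids[:n_test])
    train = [item for g in group_ids if g not in test_ids for item in groups[g]]
    test = [item for g in group_ids if g in test_ids for item in groups[g]]
    return train, test


if __name__ == "__main__":
    # Hypothetical frames: 5 videos (A..E) with 4 frames each.
    frames = [f"vid{v}_frame{f:03d}.jpg" for v in "ABCDE" for f in range(4)]
    train, test = group_split(frames, lambda name: name.split("_")[0])
    print(len(train), len(test))  # 16 4
```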