Submitted by redditnit21 t3_xzkmlr in MachineLearning
Street_Excitement_14 t1_irmupgk wrote
u/in-your-own-words explained it nicely, but I feel you lack the basics, so here are my two cents:
- You have two different entities: A. a CSV file giving the path and output class of each image, and B. the images themselves.
- Keep your CSV file in a dataframe (i.e. in Pandas) and shuffle it.
- Create a new column that indicates whether the image in that row is for training or for testing. (You can take the indexes and split them, or you can do a stratified split, i.e. take 20% test data from each class.) If you get stuck, search the web or ask on Stack Overflow. Pandas, sklearn and numpy are your friends.
- Now you have a dataframe that has the image path, image class and train/test indicator. Using this dataframe, you can easily create train and test folders and copy/move the corresponding images into them. You can also create two separate CSV files from your dataframe, one for train and one for test, and save them as well.
- Alternatively, you can feed data directly from your dataframe to the learning algorithm. The loader function takes an image path as input, loads the image, and feeds it to the algorithm.
Do not ask for code :) As I have said, pandas is your main friend here.
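For anyone reading along, the shuffle, split, and indicator-column steps listed above could look roughly like the sketch below. It is only one way to do it: the column names (`path`, `label`, `split`), the file names, and the toy data are assumptions for illustration, and the stratified split uses pandas' `groupby(...).sample` rather than sklearn.

```python
import pandas as pd

# Hypothetical stand-in for the CSV of image paths and classes;
# in practice this would come from something like pd.read_csv("labels.csv").
df = pd.DataFrame({
    "path": [f"images/img_{i}.png" for i in range(20)],
    "label": ["class_1"] * 10 + ["class_2"] * 10,
})

# Shuffle the rows; a fixed seed keeps the split reproducible.
df = df.sample(frac=1.0, random_state=0).reset_index(drop=True)

# Stratified split: draw 20% of the rows from each class for testing.
test_idx = df.groupby("label").sample(frac=0.2, random_state=0).index

# New column marking each row as train or test.
df["split"] = "train"
df.loc[test_idx, "split"] = "test"

# One dataframe per split, which can optionally be saved as separate CSVs.
train_df = df[df["split"] == "train"]
test_df = df[df["split"] == "test"]
# train_df.to_csv("train.csv", index=False)
# test_df.to_csv("test.csv", index=False)
```

Sampling per class keeps the class ratio the same in train and test, which a plain index split does not guarantee.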
in-your-own-words t1_irmv7ca wrote
My pedagogical method is more socratic and from an engineering perspective. I think discovering how to find the names of the mainstream tools, how to find the documentation, and how to learn to read, understand, and rely on it, is ultimately the most beneficial and empowering to the developer.
redditnit21 OP t1_irmvsly wrote
Thanks for such a good answer. I will keep in mind everything you said and start learning the basics. I don't know why the other guys are just straight up criticising me. What I did was split the images into 2 folders, train and test, and then further sort them into class folders (Class 1 and Class 2). Now I am thinking of using a train data generator for training.
Street_Excitement_14 t1_irmwhvt wrote
You are welcome. My advice is: do not do anything manually, do it with Pandas. E.g. you can use the pandas '.loc' indexer to filter the training data and write that data to the training folder, etc. If you get stuck at any point, search the internet or ask. Good luck, have a nice day :)
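A minimal sketch of that '.loc' idea, assuming a dataframe with `path`, `label` and `split` columns (hypothetical names) and a `<split>/<label>/` folder layout like the one described earlier in the thread. The helper `copy_split` and the placeholder files are made up for the example so that it runs on its own.

```python
import shutil
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical setup: a temporary directory with a few empty placeholder "images".
root = Path(tempfile.mkdtemp())
src = root / "images"
src.mkdir()
rows = []
for i, (label, split) in enumerate(
    [("class_1", "train"), ("class_2", "train"), ("class_1", "test"), ("class_2", "test")]
):
    path = src / f"img_{i}.png"
    path.write_bytes(b"")  # empty placeholder file
    rows.append({"path": str(path), "label": label, "split": split})
df = pd.DataFrame(rows)


def copy_split(df, split, out_root):
    """Copy every image of the given split into out_root/<split>/<label>/."""
    subset = df.loc[df["split"] == split]  # boolean filter with .loc
    for row in subset.itertuples(index=False):
        dest = Path(out_root) / split / row.label
        dest.mkdir(parents=True, exist_ok=True)
        shutil.copy(row.path, dest)
    return len(subset)


n_train = copy_split(df, "train", root)
n_test = copy_split(df, "test", root)
```

Swapping `shutil.copy` for `shutil.move` would move the files instead of copying them.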
redditnit21 OP t1_irmwspv wrote
Just one last question: do you know any good resources for learning the basics of Pandas?
Street_Excitement_14 t1_irmygtj wrote
Trust me, it pays off well (even outside the data science field, it gives you the ability to manage tabular data effectively):
https://www.kaggle.com/learn/pandas
https://www.coursera.org/specializations/data-science-python
redditnit21 OP t1_irn75ut wrote
Thanks a lot man! I am really sorry for asking you to write code.